The Most Obscure Arithmetic Run-Time Error Contest

by Yannick Moy in Formal Verification – September 22, 2016

Something that many developers do not realize is the number of run-time checks that occur in innocent looking arithmetic expressions. Of course, everyone knows about overflow checks and range checks (although many people confuse them) and division by zero. After all, these are typical errors that do show up in programs, so programmers are aware that they should keep an eye on these. Or do they?

Let's start with the typical examples of overflows and division by zero:

   Z := X + Y;
   Z := X / Y;

Here, X, Y and Z might be integers, fixed-point numbers or floating-point numbers. If X and Y are too big, X + Y won't fit into a machine integer or floating-point, hence the overflow error. If Y is zero, X / Y has no meaningful value, hence the division by zero error.

Let's look now at a more exotic overflow, which happens when you negate a signed integer:

   Z := -X;

In the specific case where X is the minimal machine integer of this size (say, -2147483648 for a 32 bits signed integer), negating it results in a value that's one more than the maximal integer of that size (2147483647 for a 32 bits signed integer). This is because machine signed integers are asymmetric: there is one more negative value than there are positive values.

This error is rather common, because it is such a special case, so programmers tend to overlook it. Take for example the original C code of the Crazyflie small drone that we translated into SPARK. They discovered the bug after debugging a scenario that causes the drone to spin uncontrollably.

The same reason explains why there might be an overflow on absolute value operation:

   Z := abs X;

Indeed, if X is -2147483648 here, its absolute value would be 2147483648 which is more than the maximal 32 bits machine integer.

There is an even more obscure overflow, which happens when you divide integers:

   Z := X / Y;

In the specific case where X is again the minimal machine integer of this size (-2147483648 for a 32 bits signed integer), and Y is -1, dividing X by Y is the same as negating X, so we are back to the previous run-time error.

Now, let's open the curtain on the most obscure arithmetic run-time error: the possibility that an exponentiation on a floating-point value results in a division by zero. Yes, you may have values for X and N such that X**N results in a division by zero. How? Take X=0.0 and N=-1. The Ada standard says that X**N in that case is the same as 1/(X**(-N))... but X is zero, so X**(-N) is zero, so... yes, we have a division by zero. Amazing!

Well, it turned out that this was also a surprise for the SPARK developers. We did not know this rule of Ada, so we did not implement the check that detects this case in SPARK. This is now fixed thanks to Rod Chapman who noticed that. At this point, you should wonder how one can ever make sure to get rid of all run-time errors in her code? Are coding standard, reviews and testing going to detect all possible problems? I don't believe so. I believe that the only way to detect all possible such cases is to use a static analysis tool like SPARK. If we were able to miss one case, while we earn a living for focusing on such checking, no amount of attention and cleverness is going to make a developer catch all such cases in a real life application.

comments powered by Disqus