Consider the following piece of C++ code:

```cpp
#include <iostream>
#include <cmath>

using namespace std;

int main() {
    cout.precision(1000000000);
    float a, b, c;
    a = 1;
    b = -1;
    c = pow(2, -50);
    cout << "a = " << a << endl;
    cout << "b = " << b << endl;
    cout << "c = " << c << endl;
    float ab = a + b;
    float bc = b + c;
    float abc = ab + c;
    float bca = bc + a;
    cout << "a + b = " << ab << endl;
    cout << "b + c = " << bc << endl;
    cout << "(a + b) + c = " << abc << endl;
    cout << "(b + c) + a = " << bca << endl;
    return 0;
}
```

Which yields the output:

```
a = 1
b = -1
c = 8.8817841970012523233890533447265625e-16
a + b = 0
b + c = -1
(a + b) + c = 8.8817841970012523233890533447265625e-16
(b + c) + a = 0
```

Why is b + c = -1?

I am not getting my head around this effect of the IEEE 754 standard.

To my understanding, the exponent ranges from -126 to 127 (8 bits for the biased exponent, with a bias of 127).

So 2^(-50) is representable without an issue, as are 1 and -1. None of them is a subnormal (denormalized) number, if I understand the standard correctly.

But why does the addition -1 + 2^(-50) result in -1, i.e. why is the smaller number discarded?

Thanks in advance for any help!

## Answer

IEEE 754 single precision (`float`) has 1 sign bit, 8 exponent bits, and 23 stored mantissa bits, giving 24 significant bits once the implicit leading 1 is counted. Before the addition, the significand of the smaller operand is shifted right to align its exponent with that of the larger one, so the significand of 2^-50 must be shifted 50 bits relative to -1. That puts every one of its bits outside the 24-bit significand of the result, so it contributes nothing and the sum rounds back to exactly -1. You can verify this by repeating your experiment with 2^-23 or 2^-24: -1 + 2^-24 is still exactly representable (the spacing of floats just below 1 in magnitude is 2^-24), while any power of two smaller than that is rounded away.