Floating-Point in Practice: Absorption, Non-Associativity and Comparisons
Floating-point numbers have finite precision, so every operation may round its result, and the outcome of a sequence of operations can depend on the order in which they are performed. This page discusses three common pitfalls (non-associativity, absorption, and equality comparison) and how to handle them properly.
Non-Associativity
Floating-point operations are **not associative**, meaning that `(a + b) + c` may produce a different result than `a + (b + c)` due to rounding differences.
Example in C:
    #include <stdio.h>

    int main(void) {
        /* The f suffix keeps the literals in single precision. */
        float val1 = 100.0f, val2 = 0.05f, val3 = 0.05f;
        printf("sum 1: %f\n", (val1 + val2) + val3);
        printf("sum 2: %f\n", val1 + (val2 + val3));
        return 0;
    }
Output:
    sum 1: 100.100006
    sum 2: 100.099998
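These results differ slightly because each intermediate sum is rounded to the nearest representable `float` before the next addition. In the output shown, the two sums are adjacent single-precision values, about 7.6 × 10⁻⁶ apart at this magnitude.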
Example in Java:
    public class Absorption {
        public static void main(String[] args) {
            float value1 = 100.0f;
            float value2 = 0.05f;
            float value3 = 0.05f;
            System.out.println("sum 1: " + ((value1 + value2) + value3));
            System.out.println("sum 2: " + (value1 + (value2 + value3)));
            // Without parentheses, Java evaluates left to right, like sum 1.
            System.out.println("sum 3: " + (value1 + value2 + value3));
        }
    }
Output:
    sum 1: 100.100006
    sum 2: 100.1
    sum 3: 100.100006
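Even though the mathematical result is identical, rounding differences lead to unequal binary representations. Sum 1 and sum 3 agree because Java evaluates `value1 + value2 + value3` from left to right, that is, as `(value1 + value2) + value3`. Java prints `100.1` for sum 2 where C's `%f` shows `100.099998` for the same value, because Java's float-to-string conversion emits only as many digits as are needed to distinguish the value from its neighbouring floats.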
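To make the "unequal binary representations" visible, the following minimal sketch prints the exact stored values and raw bit patterns of the two sums. The class name `SumBits` and its helper methods are hypothetical names chosen for this illustration; it assumes IEEE 754 single-precision arithmetic, which Java specifies for `float`.

```java
import java.math.BigDecimal;

// Hypothetical demo class: inspects the two float sums from the example above.
final class SumBits {
    // (a + b) + c, rounded to float after each addition
    static float leftFirst(float a, float b, float c) {
        return (a + b) + c;
    }

    // a + (b + c), rounded to float after each addition
    static float rightFirst(float a, float b, float c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        float s1 = leftFirst(100.0f, 0.05f, 0.05f);
        float s2 = rightFirst(100.0f, 0.05f, 0.05f);

        // BigDecimal shows the exact decimal expansion of the stored binary value.
        System.out.println("sum 1 exact: " + new BigDecimal(s1));
        System.out.println("sum 2 exact: " + new BigDecimal(s2));

        // Raw IEEE 754 bit patterns of the two results.
        System.out.println("sum 1 bits:  0x" + Integer.toHexString(Float.floatToIntBits(s1)));
        System.out.println("sum 2 bits:  0x" + Integer.toHexString(Float.floatToIntBits(s2)));

        // Distance between the results, measured in units in the last place (ulps).
        System.out.println("gap in ulps: " + (s1 - s2) / Math.ulp(s2));
    }
}
```

Measuring the gap in ulps rather than in absolute terms is a deliberate choice here: it shows that the two results are as close as two distinct floats can be, yet still fail an `==` test.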
Absorption Effect
When adding a very small number to a very large one, the small number may be **absorbed** and have no effect due to limited precision in the mantissa.
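Example (single precision):
1.0 × 10⁶ + 0.001 = 1.0 × 10⁶
This occurs because adjacent single-precision values near one million are 0.0625 apart. An addend of 0.001 is less than half that spacing, so the sum rounds back to 1.0 × 10⁶. Double precision has enough digits to keep this particular sum distinct, but the same effect appears once the ratio between the operands is large enough.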
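The absorption example can be reproduced with a short sketch. The class `AbsorptionDemo` and the helper `addRepeatedly` are hypothetical names introduced for illustration. The second half of the demo shows a consequence that follows from the same effect: the order in which many small terms are added to a large one changes the result.

```java
// Hypothetical demo class illustrating absorption in single precision.
final class AbsorptionDemo {
    // Adds 'small' to 'start' one addition at a time, 'count' times.
    static float addRepeatedly(float start, float small, int count) {
        float sum = start;
        for (int i = 0; i < count; i++) {
            sum += small; // each addition is rounded to float
        }
        return sum;
    }

    public static void main(String[] args) {
        float big = 1.0e6f;
        float small = 0.001f;

        // Spacing between adjacent floats at this magnitude.
        System.out.println("ulp(1e6f):          " + Math.ulp(big));
        System.out.println("absorbed in float:  " + (big + small == big));
        System.out.println("absorbed in double: " + (1.0e6 + 0.001 == 1.0e6));

        // Adding the small term 1000 times to the large value: every addition is absorbed.
        float largeFirst = addRepeatedly(big, small, 1000);
        // Summing the small terms first lets them build up before meeting the large value.
        float smallFirst = big + addRepeatedly(0.0f, small, 1000);

        System.out.println("large first: " + largeFirst);
        System.out.println("small first: " + smallFirst);
    }
}
```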
Comparison Problems
Due to rounding and absorption, **floating-point numbers must not be compared using direct equality (==)**. Even values that appear identical may differ slightly.
Faulty example:
    final float EPSILON = 1e-6f;
    if (Math.abs(value1 - value2) < EPSILON) {
        // equal?
    }
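This fixed “epsilon check” works only for values whose magnitude is close to 1. For large values the spacing between adjacent floats exceeds EPSILON, so the check degenerates into exact equality. For values much smaller than EPSILON, any two numbers pass as equal, even if one is twice the other. The difference threshold must therefore scale with magnitude.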
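A minimal sketch of how the fixed threshold misbehaves at both ends of the range. The class `FixedEpsilonDemo` and the method `absoluteEqual` are hypothetical names wrapping the check shown above; the test values are chosen only for illustration.

```java
// Hypothetical demo class: the fixed-epsilon check at different magnitudes.
final class FixedEpsilonDemo {
    static final float EPSILON = 1e-6f;

    // The absolute-difference check from the faulty example.
    static boolean absoluteEqual(float a, float b) {
        return Math.abs(a - b) < EPSILON;
    }

    public static void main(String[] args) {
        // Two adjacent floats near ten million: as close as distinct floats can be,
        // but their difference is far larger than EPSILON.
        float large = 1.0e7f;
        float next = Math.nextUp(large);
        System.out.println("adjacent large values equal? " + absoluteEqual(large, next));

        // Two tiny values where one is double the other: the difference is
        // below EPSILON, so the check accepts them.
        System.out.println("1e-10 vs 2e-10 equal?        " + absoluteEqual(1.0e-10f, 2.0e-10f));

        // Near magnitude 1 the check behaves as intended.
        System.out.println("adjacent values near 1 equal? " + absoluteEqual(1.0f, Math.nextUp(1.0f)));
    }
}
```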
Scaled (Relative) Comparison
A more reliable approach compares the *relative difference* between two values:
Formula:
( |a − b| ) / min(|a|, |b|) < ε
If a or b ≈ 0, use an absolute error threshold instead.
Example: for a = 100.0999985 and b = 100.100006, |a − b| = 7.5×10⁻⁶, so |a − b| / min(|a|, |b|) ≈ 7.5×10⁻⁸. With ε = 10⁻⁶ this is below the threshold, and the two values are treated as equal.
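The relative comparison can be sketched as follows. The class `RelativeCompare`, the method `relativelyEqual`, and the separate `absEps` parameter are assumptions made for this illustration; in particular, using `absEps` both as the "near zero" cutoff and as the absolute threshold is one possible reading of the fallback rule, not the only one.

```java
// Hypothetical helper: relative comparison with an absolute fallback near zero.
final class RelativeCompare {
    // Returns true if |a - b| / min(|a|, |b|) < relEps.
    // If the smaller magnitude is below absEps (assumed cutoff for "near zero"),
    // falls back to the absolute check |a - b| < absEps.
    static boolean relativelyEqual(float a, float b, float relEps, float absEps) {
        // Work in double so the comparison itself adds no float rounding.
        double diff = Math.abs((double) a - (double) b);
        double smaller = Math.min(Math.abs((double) a), Math.abs((double) b));
        if (smaller < absEps) {
            return diff < absEps;
        }
        return diff / smaller < relEps;
    }

    public static void main(String[] args) {
        float sum1 = (100.0f + 0.05f) + 0.05f;
        float sum2 = 100.0f + (0.05f + 0.05f);
        System.out.println("sum1 == sum2:        " + (sum1 == sum2));
        System.out.println("relatively equal:    " + relativelyEqual(sum1, sum2, 1e-6f, 1e-12f));
        System.out.println("100 vs 101:          " + relativelyEqual(100.0f, 101.0f, 1e-6f, 1e-12f));
        System.out.println("adjacent near 1e7:   " + relativelyEqual(1.0e7f, Math.nextUp(1.0e7f), 1e-6f, 1e-12f));
    }
}
```

Suitable values for `relEps` and `absEps` depend on the application and on how much rounding error the preceding computation can accumulate; the values above are placeholders.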
Pseudocode for Safe Comparison
    diff ← |a − b|
    if a == b then
        return true
    else if a == 0 or b == 0 or (|a| + |b| < min_normal_value) then
        return diff < (ε × min_normal_value)
    else
        return (diff / min(|a| + |b|, max_float_value)) < ε
    end if
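A possible Java rendering of this pseudocode is sketched below. The class and method names are hypothetical. It takes `diff` as the magnitude of the difference, |a − b|, which is assumed to be the intent, and maps `min_normal_value` and `max_float_value` to `Float.MIN_NORMAL` and `Float.MAX_VALUE`. Unlike the formula given earlier, this variant normalizes by |a| + |b| rather than by min(|a|, |b|).

```java
// Hypothetical helper class: a Java version of the safe-comparison pseudocode.
final class SafeCompare {
    static boolean nearlyEqual(float a, float b, float epsilon) {
        final float absA = Math.abs(a);
        final float absB = Math.abs(b);
        final float diff = Math.abs(a - b);

        if (a == b) {
            // Shortcut: also handles equal infinities.
            return true;
        } else if (a == 0 || b == 0 || (absA + absB < Float.MIN_NORMAL)) {
            // One value is zero or both are extremely close to it:
            // a relative error is meaningless here, so use an absolute bound.
            return diff < (epsilon * Float.MIN_NORMAL);
        } else {
            // Relative error; the cap guards against overflow of absA + absB.
            return diff / Math.min(absA + absB, Float.MAX_VALUE) < epsilon;
        }
    }

    public static void main(String[] args) {
        float sum1 = (100.0f + 0.05f) + 0.05f;
        float sum2 = 100.0f + (0.05f + 0.05f);
        System.out.println("sum1 vs sum2: " + nearlyEqual(sum1, sum2, 1e-6f));
        System.out.println("1 vs -1:      " + nearlyEqual(1.0f, -1.0f, 1e-6f));
        System.out.println("100 vs 101:   " + nearlyEqual(100.0f, 101.0f, 1e-6f));
    }
}
```

The comparison of 1 and −1 is worth keeping as a test case: any variant that compares magnitudes instead of the signed difference would wrongly report these as equal.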
Practical Note
In engineering and scientific contexts, exact equality is rarely needed. Instead, results are compared using inequalities (>, <) or checked against acceptable tolerances.
Example:
- In structural analysis, whether a beam stress is *below* a safety limit is more relevant than exact equality.
However, rounding errors can accumulate across repeated computations or iterative algorithms, sometimes leading to significant deviations if they are not properly accounted for.
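A small sketch of such accumulation: adding 0.1 ten million times should give one million, but in single precision the running total eventually becomes so large that each added 0.1 is rounded to a coarser step. The class `AccumulationDemo` and its methods are hypothetical names; the term and iteration count are chosen only to make the effect visible.

```java
// Hypothetical demo class: rounding error accumulating over many additions.
final class AccumulationDemo {
    // Sums 'term' 'count' times in single precision.
    static float sumFloat(float term, int count) {
        float sum = 0.0f;
        for (int i = 0; i < count; i++) {
            sum += term; // rounded to float on every iteration
        }
        return sum;
    }

    // Same loop in double precision, for comparison.
    static double sumDouble(double term, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        System.out.println("expected:   1000000.0");
        System.out.println("float sum:  " + sumFloat(0.1f, n));
        System.out.println("double sum: " + sumDouble(0.1, n));
    }
}
```

On an IEEE 754 implementation the `float` total ends up several percent above one million, while the `double` total stays very close to it. Using a wider type for the accumulator is one simple mitigation; it reduces the error but does not eliminate it.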