Floating-Point in Practice: Absorption, Non-Associativity and Comparisons

Floating-point numbers have finite precision, so every arithmetic operation may round its result, and the outcome of a sequence of operations can depend on the order in which they are evaluated. This page discusses three practical pitfalls (non-associativity, absorption, and equality comparison) and how to handle them properly.

Non-Associativity

Floating-point operations are **not associative**, meaning that `(a + b) + c` may produce a different result than `a + (b + c)` due to rounding differences.

Example in C:

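#include <stdio.h>

int main(void) {
    /* The f suffix keeps the literals in single precision. */
    float val1 = 100.0f, val2 = 0.05f, val3 = 0.05f;
    /* Same three operands, grouped differently. */
    printf("sum 1: %f\n", (val1 + val2) + val3);
    printf("sum 2: %f\n", val1 + (val2 + val3));
    return 0;
}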

Output:

sum 1: 100.100006  
sum 2: 100.099998

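The results differ in the last digits because each intermediate sum is rounded to the nearest representable float before the next addition, and the two groupings round different intermediate values.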

Example in Java

public class Absorption {
  public static void main(String[] args) {
    float value1 = 100.0f;
    float value2 = 0.05f;
    float value3 = 0.05f;

    System.out.println("sum 1: " + ((value1 + value2) + value3));
    System.out.println("sum 2: " + (value1 + (value2 + value3)));
    System.out.println("sum 3: " + (value1 + value2 + value3));
  }
}

Output:

sum 1: 100.100006  
sum 2: 100.1  
sum 3: 100.100006

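Even though the three expressions are mathematically identical, rounding leads to two different binary results. Sum 3 matches sum 1 because Java evaluates `+` from left to right. Sum 2 is the same float as in the C run: Java prints the shortest decimal that uniquely identifies the value, hence `100.1` rather than `100.099998`.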

Absorption Effect

When adding a very small number to a very large one, the small number may be **absorbed** and have no effect due to limited precision in the mantissa.

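Example (32-bit float):

1.0 × 10⁶ + 0.001 = 1.0 × 10⁶

This occurs because adjacent 32-bit floats near one million are 0.0625 apart, so an addend smaller than half that spacing is rounded away. In 64-bit double precision this particular sum is not absorbed, but the same effect reappears once the ratio between the two magnitudes is large enough.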
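
The absorption described above can be reproduced with a short Java sketch. The class name `AbsorptionDemo` and the helper `isAbsorbed` are illustrative names, not part of the original page; `Math.ulp` is the standard library call that reports the spacing between a float and its next larger neighbor.

```java
public class AbsorptionDemo {
    // Hypothetical helper: true when adding `small` leaves `large` unchanged.
    static boolean isAbsorbed(float large, float small) {
        return large + small == large;
    }

    public static void main(String[] args) {
        float large = 1.0e6f;
        float small = 0.001f;

        // Spacing between `large` and the next larger float.
        System.out.println("spacing near 1e6: " + Math.ulp(large));

        // The addend is below half the spacing, so it is rounded away.
        System.out.println("absorbed in float: " + isAbsorbed(large, small));

        // A double is spaced far more finely at this magnitude.
        System.out.println("absorbed in double: " + (1.0e6 + 0.001 == 1.0e6));
    }
}
```

An addend larger than half the spacing, such as 0.1f, does change the sum, although the result is still rounded to a multiple of the spacing.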

Comparison Problems

Due to rounding and absorption, **floating-point numbers must not be compared using direct equality (==)**. Even values that appear identical may differ slightly.

Faulty example:

final float EPSILON = 1e-6f;
if (Math.abs(value1 - value2) < EPSILON) {
    // equal?
}

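This “epsilon check” works only when the values are of moderate magnitude, roughly around 1. For large numbers the spacing between adjacent floats exceeds EPSILON, so the check behaves exactly like `==`. For numbers much smaller than EPSILON it reports every pair as equal. The difference threshold must therefore scale with magnitude.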
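
A small sketch makes the limits of a fixed threshold visible. It reuses the EPSILON value from the fragment above; the class name `AbsoluteEpsilonDemo`, the method `absoluteEquals`, and the sample values are assumptions chosen for illustration.

```java
public class AbsoluteEpsilonDemo {
    static final float EPSILON = 1e-6f;

    // The fixed-threshold check from the faulty example.
    static boolean absoluteEquals(float a, float b) {
        return Math.abs(a - b) < EPSILON;
    }

    public static void main(String[] args) {
        // Large magnitude: even the closest possible neighbor is rejected,
        // because adjacent floats near 1e6 are 0.0625 apart.
        float big = 1.0e6f;
        float bigNeighbor = Math.nextUp(big);
        System.out.println("adjacent large values equal: "
                + absoluteEquals(big, bigNeighbor));

        // Small magnitude: one value is twice the other, yet they pass.
        float tiny1 = 1.0e-8f;
        float tiny2 = 2.0e-8f;
        System.out.println("tiny values equal: " + absoluteEquals(tiny1, tiny2));
    }
}
```

The first check prints false and the second prints true, which is the opposite of what a useful "approximately equal" test should report in both cases.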

Scaled (Relative) Comparison

A more reliable approach compares the *relative difference* between two values:

Formula:

( |a − b| ) / min(|a|, |b|) < ε

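If a or b ≈ 0, the division becomes meaningless (or a division by zero), so use an absolute error threshold instead.

Example: For a = 100.0999985, b = 100.100006 and ε = 10⁻⁶, |a − b| = 7.5×10⁻⁶, so |a − b| / min(|a|, |b|) ≈ 7.5×10⁻⁸ < 10⁻⁶ and the two values are treated as equal.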
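
The worked example can be checked with a few lines of Java. This is a minimal sketch of the formula above only: it does not handle values at or near zero, and the names `RelativeCompareDemo` and `relativeDifference` are illustrative.

```java
public class RelativeCompareDemo {
    // Relative difference |a - b| / min(|a|, |b|).
    // Assumption: neither argument is zero or close to zero.
    static double relativeDifference(double a, double b) {
        return Math.abs(a - b) / Math.min(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        double a = 100.0999985;
        double b = 100.100006;
        double eps = 1e-6;

        double rel = relativeDifference(a, b);
        System.out.println("relative difference: " + rel);
        System.out.println("equal within eps: " + (rel < eps));
    }
}
```

Because the difference is divided by the magnitude of the operands, the same ε can be used for values near 100 and for values near 10⁶.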

Pseudocode for Safe Comparison

diff ← |a − b|
if a == b then
    return true
else if a == 0 or b == 0 or (|a| + |b| < min_normal_value) then
    return diff < (ε × min_normal_value)
else
    return (diff / min(|a| + |b|, max_float_value)) < ε
end if
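
One possible Java rendering of this pseudocode is sketched below. It maps min_normal_value to `Float.MIN_NORMAL` and max_float_value to `Float.MAX_VALUE`, which is an assumption about what the pseudocode intends, and it computes diff as the absolute value of a − b so that values of opposite sign are never reported as equal. The class and method names are illustrative.

```java
public class NearlyEqual {
    // Sketch of the comparison; epsilon is the relative tolerance.
    static boolean nearlyEqual(float a, float b, float epsilon) {
        final float absA = Math.abs(a);
        final float absB = Math.abs(b);
        final float diff = Math.abs(a - b);

        if (a == b) {
            // Shortcut for exact matches, including equal infinities.
            return true;
        } else if (a == 0 || b == 0 || (absA + absB < Float.MIN_NORMAL)) {
            // One value is zero or both are extremely close to it:
            // a relative error is meaningless here.
            return diff < (epsilon * Float.MIN_NORMAL);
        } else {
            // Relative error, with the divisor capped to avoid overflow.
            return diff / Math.min(absA + absB, Float.MAX_VALUE) < epsilon;
        }
    }

    public static void main(String[] args) {
        System.out.println(nearlyEqual(100.1f, 100.100006f, 1e-6f)); // adjacent floats
        System.out.println(nearlyEqual(1.0f, -1.0f, 1e-6f));         // opposite signs
        System.out.println(nearlyEqual(1.0f, 1.1f, 1e-6f));          // clearly different
    }
}
```

Note that this version divides by |a| + |b| rather than by min(|a|, |b|), so for the same ε it is roughly twice as tolerant as the formula in the previous section.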

Practical Note

In engineering and scientific contexts, exact equality is rarely needed. Instead, results are compared using inequalities (>, <) or checked against acceptable tolerances.

Example:

In structural analysis, whether a beam stress is *below* a safety limit is more relevant than exact equality.

However, rounding errors can accumulate across repeated computations or iterative algorithms, and the accumulated error can become significant if it is not accounted for.
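
The accumulation effect can be observed by adding the same small step many times. The following sketch uses illustrative names (`AccumulationDemo`, `sumFloat`, `sumDouble`) and an arbitrarily chosen step of 0.1 with ten million repetitions, whose exact sum would be 1,000,000.

```java
public class AccumulationDemo {
    // Adds `step` to a float accumulator n times.
    static float sumFloat(int n, float step) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += step; // each addition is rounded to float precision
        }
        return sum;
    }

    // Same loop with a double accumulator for comparison.
    static double sumDouble(int n, double step) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += step;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        System.out.println("float  total: " + sumFloat(n, 0.1f));
        System.out.println("double total: " + sumDouble(n, 0.1));
    }
}
```

The float total ends up far from 1,000,000 because, once the accumulator is large, each added 0.1 is rounded to a multiple of the accumulator's spacing. The double total stays close to the exact value for this number of steps, although it is not exact either.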
