IEEE 754 Formats, Special Values and Rounding Modes
This page covers the standard IEEE 754 formats for floating-point numbers, including their bit structure, special values, and rounding behaviors.
IEEE 754 Formats
IEEE 754 defines several binary floating-point formats. The three most common are:
Format | Total bits | Sign bits | Exponent bits | Mantissa bits | Exponent bias
------- | ---- | ---- | -------- | -------- | ----
Single precision | 32 | 1 | 8 | 23 | 127
Double precision | 64 | 1 | 11 | 52 | 1023
Quadruple precision | 128 | 1 | 15 | 112 | 16383
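For normalized numbers the leading 1 of the mantissa is implied rather than stored (the *hidden bit*), so the effective precision is one bit more than the stored mantissa field: 24 bits for single, 53 for double, and 113 for quadruple precision.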
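As an illustration of the layout in the table above, the following Python sketch splits a single-precision value into its three fields and rebuilds the value from them. The helper name `decode_single` is made up for this example; only the standard `struct` module is used.

```python
import struct

def decode_single(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) fields of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with bias 127
    mantissa = bits & 0x7FFFFF        # 23 stored bits, hidden bit not included
    return sign, exponent, mantissa

sign, exponent, mantissa = decode_single(-6.5)

# Rebuild the value of a normalized number: the "1 +" is the hidden bit.
value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

print(sign, exponent, f"{mantissa:023b}", value)
# 1 129 10100000000000000000000 -6.5
```

Here −6.5 is −1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the mantissa field holds only the fraction .625 (binary .101).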
Special Values
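Certain bit patterns are reserved for special cases. The exponent fields below are shown for single precision (8 bits); the other formats follow the same scheme with wider fields: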
Exponent | Mantissa | Value | Description
------------------|----------|-------------------|---------------------------
00000000 | 000...0 | ±0 | Zero (positive or negative)
00000000 | ≠0 | ±0.m × 2^(1−bias) | Denormalized small number
00000001–11111110 | any | ±1.m × 2^(e−bias) | Normalized number
11111111 | 000...0 | ±∞ | Infinity
11111111 | ≠0 | NaN | Not a Number
Special values are important in computation, as they define behavior for overflows, underflows, and invalid operations.
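The classification in the table can be sketched as a small Python function over raw 32-bit patterns. The function name `classify_single` and the sample bit patterns are chosen for this example only.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0x00:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"

samples = {
    0x80000000: "negative zero",
    0x00000001: "smallest positive denormalized number",
    0x3F800000: "1.0",
    0x7F800000: "positive infinity",
    0x7FC00000: "a NaN",
}
for bits, note in samples.items():
    print(f"{bits:#010x}  {classify_single(bits):<12}  ({note})")
```

The sign bit plays no role in the classification, which is why both zeros and both infinities fall into the same rows of the table.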
Comments on Special Values
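- **Zero (±0):** The two zeros compare equal, but the sign records the direction from which a result underflowed and affects some operations (e.g., 1 ÷ +0 = +∞ while 1 ÷ −0 = −∞).
- **Infinity (±∞):** Used when a result exceeds the representable range, or when a non-zero number is divided by zero.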
- **NaN (Not a Number):** Represents undefined or invalid results (e.g., 0/0).
- **Denormalized values:** Extend range for very small numbers but reduce precision and can slow computation.
Example behaviors:
- +∞ + 3 = +∞
- 5 ÷ 0 = +∞
- 0 ÷ 0 = NaN
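Some of these behaviors can be observed with Python floats, which are IEEE 754 doubles on common platforms. This is a sketch rather than a conformance test, and it includes one Python-specific deviation: float division by zero raises an exception instead of returning ±∞ or NaN.

```python
import math

# Signed zero: the two zeros compare equal but keep distinct signs.
print(0.0 == -0.0)                     # True
print(math.copysign(1.0, -0.0))        # -1.0

# Underflow keeps the sign of the exact result.
tiny = -1e-200 * 1e-200                # exact value lies far below the smallest denormal
print(tiny, math.copysign(1.0, tiny))  # -0.0 -1.0

# Overflow produces infinity, and infinity absorbs finite addends.
big = 1e308 * 10.0
print(big, big + 3 == math.inf)        # inf True

# NaN compares unequal to everything, including itself.
print(math.nan == math.nan)            # False

# Python-specific caveat: float division by zero raises an exception.
try:
    5.0 / 0.0
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)
```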
IEEE 754 Operations with Special Values
Operation | Result
--------------|--------
finite x ÷ ±∞ | ±0
±∞ × ±∞ | ±∞
±non-zero ÷ 0 | ±∞
±0 ÷ ±0 | NaN
∞ + ∞ | ∞
∞ − ∞ | NaN
±∞ ÷ ±∞ | NaN
±∞ × 0 | NaN
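Most rows of this table can be checked directly with Python floats, as in the sketch below. The division-by-zero rows are left out because plain Python floats raise `ZeroDivisionError` there rather than following the table.

```python
import math

inf = math.inf

results = {
    "1 / inf":    1.0 / inf,
    "inf * inf":  inf * inf,
    "-inf * inf": -inf * inf,
    "inf + inf":  inf + inf,
    "inf - inf":  inf - inf,
    "inf / inf":  inf / inf,
    "inf * 0":    inf * 0.0,
}

for expression, value in results.items():
    print(f"{expression:<11} -> {value}")
```

The last three expressions yield NaN, matching the rows for ∞ − ∞, ±∞ ÷ ±∞ and ±∞ × 0.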
Rounding Modes
Because the exact result of an operation often cannot be represented in the target format, it has to be rounded to a representable value. IEEE 754 defines five rounding modes:
Mode | Description
--------------------------------------| ------------
Round to nearest, ties to even | Round to nearest representable value; if exactly halfway, choose the one with even least significant bit (default).
Round to nearest, ties away from zero | Round to nearest; if exactly halfway, choose the value with the larger magnitude (away from zero).
Round toward zero | Truncates fractional bits (rounds toward zero).
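Round toward +∞ | Always rounds up, to the smallest representable value not less than the exact result.
Round toward −∞ | Always rounds down, to the largest representable value not greater than the exact result.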
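The default mode can be seen at work in binary doubles. The sketch below assumes CPython, where converting an integer to `float` rounds to nearest with ties to even. Above 2^53 only every second integer is representable, so odd integers fall exactly halfway between two doubles.

```python
base = 2**53  # from here on, doubles are spaced 2 apart

# base + 1 is a tie between base and base + 2; base has the even last mantissa bit.
print(float(base + 1) == base)        # True

# base + 3 is a tie between base + 2 and base + 4; base + 4 has the even last bit.
print(float(base + 3) == base + 4)    # True
```

Ties therefore go down in the first case and up in the second, which avoids the systematic drift that always rounding ties in one direction would introduce.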
Rounding Examples
Rounding mode | +11.5 | +12.5 | −11.5 | −12.5
--------------------------------| ------| ----- | ----- | -------
To nearest, ties to even | +12.0 | +12.0 | −12.0 | −12.0
To nearest, ties away from zero | +12.0 | +13.0 | −12.0 | −13.0
Toward zero | +11.0 | +12.0 | −11.0 | −12.0
Toward +∞ | +12.0 | +13.0 | −11.0 | −12.0
Toward −∞ | +11.0 | +12.0 | −12.0 | −13.0
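The table can be reproduced with Python's `decimal` module, whose rounding constants correspond to the five modes. This is only an illustration with decimal numbers rounded to integers; it does not change how binary floating-point hardware rounds. The dictionary `MODES` and the helper `round_to_integer` are names made up for this sketch.

```python
from decimal import (
    Decimal,
    ROUND_CEILING,
    ROUND_DOWN,
    ROUND_FLOOR,
    ROUND_HALF_EVEN,
    ROUND_HALF_UP,
)

# Mapping from the mode names used above to the decimal module's constants.
MODES = {
    "To nearest, ties to even": ROUND_HALF_EVEN,
    "To nearest, ties away from zero": ROUND_HALF_UP,
    "Toward zero": ROUND_DOWN,
    "Toward +inf": ROUND_CEILING,
    "Toward -inf": ROUND_FLOOR,
}

def round_to_integer(value: str, rounding: str) -> Decimal:
    """Round a decimal string to an integer using the given rounding constant."""
    return Decimal(value).quantize(Decimal("1"), rounding=rounding)

inputs = ["11.5", "12.5", "-11.5", "-12.5"]
for name, rounding in MODES.items():
    row = [str(round_to_integer(v, rounding)) for v in inputs]
    print(f"{name:<32} | " + " | ".join(f"{r:>4}" for r in row))
```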
Significance of Rounding
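- Each rounding step introduces a small error, and these errors can accumulate over repeated operations.
- Different CPUs or compilers can produce slightly different results, for example because of differing rounding mode settings, intermediate precision, or the order in which operations are evaluated.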
- Non-associativity: (a + b) + c may not equal a + (b + c).
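Two of these points can be demonstrated in a few lines of Python with double-precision floats. The values 0.1, 0.2 and 0.3 are chosen only because none of them is exactly representable in binary.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Non-associativity: the grouping changes which intermediate result gets rounded.
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)   # 0.6000000000000001 0.6 False

# Accumulation: adding 0.1 ten times does not give exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)                 # False

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
print(math.fsum([0.1] * 10) == 1.0) # True
```

An explicit loop is used for the accumulation because the built-in `sum` may apply its own error compensation in recent Python versions.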