IEEE 754 Formats, Special Values and Rounding Modes

This page covers the standard IEEE 754 binary floating-point formats: their bit layout, their special values, and the rounding modes the standard defines.

IEEE 754 Formats

IEEE 754 defines several binary floating-point formats. The three most common are:

Format              | Bits | Sign | Exponent | Mantissa | Bias
-------             | ---- | ---- | -------- | -------- | ----
Single precision    | 32   | 1    | 8        | 23       | 127
Double precision    | 64   | 1    | 11       | 52       | 1023
Quadruple precision | 128  | 1    | 15       | 112      | 16383

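Normalized numbers carry an implicit leading 1 (the *hidden bit*) that is not stored. As a result, the effective precision is one bit more than the stored mantissa width: 24 bits for single, 53 for double, and 113 for quadruple precision.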
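To make the field layout concrete, here is a minimal Python sketch that splits a value into its single-precision sign, exponent and mantissa fields using the standard `struct` module, then rebuilds a normalized number by restoring the hidden bit. The helper names `float32_fields` and `float32_value` are hypothetical and exist only for this illustration. The rebuild step assumes a normalized exponent field.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into (sign, biased exponent, stored mantissa)."""
    # Pack as a big-endian IEEE 754 single, then reread the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF       # 23 stored bits; the hidden 1 is not stored
    return sign, exponent, mantissa

def float32_value(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a normalized number: (-1)^s * 1.m * 2^(e - 127)."""
    assert 1 <= exponent <= 254, "normalized exponent fields only"
    significand = 1 + mantissa / 2**23   # prepend the hidden bit
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

print(float32_fields(1.0))                 # (0, 127, 0)
print(float32_fields(-6.5))                # (1, 129, 5242880)
print(float32_value(1, 129, 5242880))      # -6.5
```

For −6.5 = −1.625 × 2², the stored exponent is 2 + 127 = 129, and the mantissa field holds only the fraction .625 (0.625 × 2²³ = 5242880). The leading 1 is implied.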

Special Values

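Exponent fields of all zeros and all ones are reserved for special values. The table uses the 8-bit single-precision exponent field; the other formats follow the same pattern with wider fields. Here m is the stored mantissa and e the biased exponent field.

Exponent (8-bit)  | Mantissa | Value             | Description
------------------|----------|-------------------|---------------------------
00000000          | 000...0  | ±0                | Zero (positive or negative)
00000000          | ≠0       | ±0.m × 2^(1−bias) | Subnormal (denormalized) number
00000001–11111110 | any      | ±1.m × 2^(e−bias) | Normalized number
11111111          | 000...0  | ±∞                | Infinity
11111111          | ≠0       | NaN               | Not a Number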
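The reserved patterns can be checked by reinterpreting raw 32-bit patterns as single-precision values. The sketch below applies the table row by row using Python's `struct` module. The helpers `from_bits32` and `classify32` are hypothetical names for this illustration, and the code is a sketch rather than a complete decoder.

```python
import math
import struct

def from_bits32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def classify32(bits: int) -> str:
    """Classify a single-precision bit pattern by its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0x00:
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0xFF:
        return "infinity" if mantissa == 0 else "nan"
    return "normal"

# +0, -0, smallest subnormal (2^-149), smallest normal (2^-126), 1.0, +inf, -inf, a NaN
for bits in (0x00000000, 0x80000000, 0x00000001, 0x00800000,
             0x3F800000, 0x7F800000, 0xFF800000, 0x7FC00000):
    print(f"{bits:#010x}", classify32(bits), from_bits32(bits))
```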

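Special values let a computation continue past overflow, underflow, and invalid operations instead of halting. Under the default (non-trapping) exception handling, overflow yields ±∞ in round-to-nearest mode, underflow yields a subnormal number or ±0, and an invalid operation yields NaN.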
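These default results can be seen with Python floats. The sketch assumes they are IEEE 754 double precision, which holds on mainstream platforms. Python raises exceptions for some cases, such as float division by zero, so the sketch uses only operations that follow the default non-trapping behavior.

```python
import math
import sys

# Overflow: the product exceeds the largest finite double, so the result is +inf.
largest = sys.float_info.max
overflowed = largest * 2
print(overflowed)                          # inf

# Gradual underflow: results below the smallest normal number become
# subnormals instead of jumping straight to zero.
smallest_normal = sys.float_info.min       # 2**-1022
tiny = smallest_normal / 2**10             # 2**-1032, a subnormal
print(0 < tiny < smallest_normal)          # True

# Underflow to zero: 2**-1075 lies exactly halfway between 0 and the
# smallest subnormal 2**-1074, and ties-to-even picks 0.
smallest_subnormal = math.ldexp(1.0, -1074)
print(smallest_subnormal / 2)              # 0.0

# Invalid operation: the default result is NaN.
print(math.inf - math.inf)                 # nan
```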

Comments on Special Values

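  • **Zero (±0):** The sign of zero is kept, so the direction of an underflow or a limit is not lost. +0 and −0 compare equal, but 1 ÷ +0 = +∞ while 1 ÷ −0 = −∞.
  • **Infinity (±∞):** Produced when a result overflows the largest finite value (in round-to-nearest mode), or by an exact operation such as dividing a non-zero number by zero.
  • **NaN (Not a Number):** Represents an undefined or invalid result (e.g., 0/0 or ∞ − ∞). A NaN compares unequal to every value, including itself, and propagates through most operations.
  • **Denormalized (subnormal) values:** Fill the gap between zero and the smallest normalized number (gradual underflow). They have fewer significant bits than normalized numbers, and many processors handle them much more slowly.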
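The signed-zero and NaN rules above can be observed with Python floats, which the sketch assumes are IEEE 754 doubles. Python raises `ZeroDivisionError` instead of returning ±∞ for 1 ÷ ±0. To show that the sign of zero still matters, the sketch uses `math.atan2`, whose result for a zero first argument and a negative second argument depends on the sign of that zero.

```python
import math

pos_zero, neg_zero = 0.0, -0.0
print(pos_zero == neg_zero)                     # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))             # -1.0: the sign bit is still there
# The sign of zero selects the side of the branch cut in atan2.
print(math.atan2(pos_zero, -1.0) == math.pi)    # True
print(math.atan2(neg_zero, -1.0) == -math.pi)   # True

nan = math.inf - math.inf
print(nan == nan, nan != nan)                   # False True
print(math.isnan(nan))                          # True
print(math.isnan(nan + 1.0))                    # True: NaN propagates
```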

Example behaviors:

  • +∞ + 3 = +∞
  • 5 ÷ 0 = +∞
  • 0 ÷ 0 = NaN
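Not every language exposes these defaults directly. In Python, `math.inf + 3` behaves as listed, but float division by zero raises `ZeroDivisionError`. The standard `decimal` module performs decimal rather than binary arithmetic, but it signals the same conditions. With its `DivisionByZero` and `InvalidOperation` traps disabled, it returns the default results instead of raising. The sketch below uses that module only to illustrate those results.

```python
import math
from decimal import Decimal, DivisionByZero, InvalidOperation, localcontext

# +inf + 3 = +inf works directly with Python floats.
print(math.inf + 3)  # inf

# Python floats raise ZeroDivisionError rather than returning +inf or NaN.
raised = False
try:
    5.0 / 0.0
except ZeroDivisionError:
    raised = True
print("5.0 / 0.0 raised ZeroDivisionError:", raised)  # True

# With the traps disabled, decimal returns the default results instead of raising.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = False
    ctx.traps[InvalidOperation] = False
    five_over_zero = Decimal(5) / Decimal(0)
    zero_over_zero = Decimal(0) / Decimal(0)
print(five_over_zero, zero_over_zero)  # Infinity NaN
```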

IEEE 754 Operations with Special Values

Operation     | Result
--------------|--------
finite x ÷ ±∞ | ±0
±∞ × ±∞       | ±∞
±non-zero ÷ 0 | ±∞
±0 ÷ ±0       | NaN
∞ + ∞         | ∞
∞ − ∞         | NaN
±∞ ÷ ±∞       | NaN
±∞ × 0        | NaN
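Most rows of the table can be checked with Python floats, which the sketch assumes are IEEE 754 doubles. The two division-by-zero rows are the exception: Python raises `ZeroDivisionError` for them instead of returning the IEEE default.

```python
import math

inf = math.inf
print(7.0 / inf, -7.0 / inf)             # 0.0 -0.0   (finite x / ±inf = ±0)
print(inf * -inf)                        # -inf       (sign follows the usual rules)
print(inf + inf)                         # inf
print(inf - inf, inf / inf, inf * 0.0)   # nan nan nan
```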

Rounding Modes

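Because the exact result of an operation often cannot be represented in the destination format, it has to be rounded. Since its 2008 revision, IEEE 754 defines five rounding modes. Round to nearest, ties to even is the default, and ties away from zero is required only for decimal formats: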

Mode                                  | Description
--------------------------------------| ------------
Round to nearest, ties to even        | Round to the nearest representable value; if exactly halfway, choose the one whose least significant bit is even (0). This is the default.
Round to nearest, ties away from zero | Round to the nearest representable value; if exactly halfway, choose the one with the larger magnitude.
Round toward zero                     | Discard the excess bits (truncation).
Round toward +∞                       | Round to the closest representable value ≥ the exact result (ceiling).
Round toward −∞                       | Round to the closest representable value ≤ the exact result (floor).
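The default mode can be observed in binary arithmetic. On mainstream platforms, a Python float is an IEEE 754 double with a 53-bit significand. Between 2^53 and 2^54, therefore, only even integers are representable, and every odd integer in that range is an exact tie. The sketch assumes CPython rounds int-to-float conversions to nearest, ties to even, which is how it behaves:

```python
# Between 2**53 and 2**54 the spacing of doubles is 2, so odd integers are ties.
base = 2**53
print(float(base + 1) == base)       # True: tie between 2**53 and 2**53 + 2; 2**53 has the even significand
print(float(base + 3) == base + 4)   # True: tie between 2**53 + 2 and 2**53 + 4; 2**53 + 4 is even
print(float(base + 5) == base + 4)   # True: tie between 2**53 + 4 and 2**53 + 6; 2**53 + 4 is even
```

Under ties away from zero, 2^53 + 1 would instead round up to 2^53 + 2. Ties to even avoids a systematic upward bias when many ties occur.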

Rounding Examples

Rounding mode                   | +11.5 | +12.5 | −11.5 | −12.5
--------------------------------| ------| ----- | ----- | -------
To nearest, ties to even        | +12.0 | +12.0 | −12.0 | −12.0
To nearest, ties away from zero | +12.0 | +13.0 | −12.0 | −13.0
Toward zero                     | +11.0 | +12.0 | −11.0 | −12.0
Toward +∞                       | +12.0 | +13.0 | −11.0 | −12.0
Toward −∞                       | +11.0 | +12.0 | −12.0 | −13.0
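The table can be reproduced with Python's standard `decimal` module, whose rounding constants correspond to the five modes. `ROUND_HALF_UP` is the module's name for ties away from zero, `ROUND_DOWN` rounds toward zero, `ROUND_CEILING` rounds toward +∞, and `ROUND_FLOOR` rounds toward −∞. `decimal` does decimal arithmetic rather than binary. Because all four inputs are exactly representable in both bases, rounding them to an integer gives the same results in either.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

MODES = {
    "To nearest, ties to even": ROUND_HALF_EVEN,
    "To nearest, ties away from zero": ROUND_HALF_UP,  # ties go away from zero
    "Toward zero": ROUND_DOWN,
    "Toward +inf": ROUND_CEILING,
    "Toward -inf": ROUND_FLOOR,
}
VALUES = [Decimal("11.5"), Decimal("12.5"), Decimal("-11.5"), Decimal("-12.5")]

# Round every value to an integer under every mode.
table = {
    name: [v.to_integral_value(rounding=mode) for v in VALUES]
    for name, mode in MODES.items()
}
for name, row in table.items():
    print(f"{name:<32}", [int(r) for r in row])
```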

Significance of Rounding

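  • The basic operations (+, −, ×, ÷, √) are correctly rounded, so under round to nearest each one is off by at most half a unit in the last place (ulp). These small errors can still accumulate over long chains of operations.
  • Different CPUs or compilers can produce slightly different results. The default rounding mode (round to nearest, ties to even) is the same everywhere. The differences usually come from extended-precision intermediates (such as the x87 80-bit registers), fused multiply-add contraction, operation reordering by optimizations such as fast-math flags, and differing math-library implementations.
  • Non-associativity: because each addition is rounded separately, (a + b) + c may not equal a + (b + c).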
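Both effects show up with ordinary Python floats, which the sketch assumes are IEEE 754 doubles. It adds in an explicit loop because recent CPython versions make the built-in `sum()` more accurate for floats, which would hide the accumulated error. `math.fsum` is the standard library's exactly rounded summation, included for comparison.

```python
import math

# Accumulated rounding error: 0.1 has no exact binary representation,
# and every addition rounds again.
total = 0.0
for _ in range(10):
    total += 0.1
print(total, total == 1.0)          # 0.9999999999999999 False
print(math.fsum([0.1] * 10))        # 1.0

# Non-associativity: the same three numbers, grouped differently.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)   # 0.6000000000000001 0.6 False
```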