Legacy Encodings & Code Pages (EBCDIC, ISO-8859, CP1252)

ASCII was revolutionary but limited to English. As computing spread internationally, new encodings extended ASCII to support more characters. This page explains these historical encodings, how they evolved, and why Unicode replaced them.

The problem with ASCII

ASCII uses 7 bits per character, giving only 128 symbols. This excludes letters with accents (é, ä, ñ), non-Latin alphabets (Cyrillic, Greek, Hebrew), and scripts like Arabic or Thai.
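
As a quick illustration of that limit, the Python sketch below checks whether a few sample strings fit into ASCII. The helper name `fits_in_ascii` is made up for this example.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Plain English fits; accented Latin and Greek text do not.
for sample in ["cafe", "café", "Ελληνικά"]:
    print(sample, fits_in_ascii(sample))
```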

When 8-bit bytes became standard, the extra bit was used to add 128 new symbols (128–255). Unfortunately, there was no single global standard — every region and vendor created its own **code page**.

EBCDIC

**EBCDIC (Extended Binary Coded Decimal Interchange Code)** was developed by IBM in 1963.

It was primarily used on IBM mainframes and punched cards.

- 8-bit encoding, unrelated to ASCII
- Designed to minimize long runs of punched holes
- Not compatible with ASCII: letter codes differ completely
- Still occasionally found in legacy financial and government systems

Example (partial comparison):

| Character | ASCII (decimal) | EBCDIC (decimal) |
| ------ | ------ | ------ |
| A | 65 | 193 |
| B | 66 | 194 |
| a | 97 | 129 |
| 0 | 48 | 240 |

EBCDIC’s incompatibility made text exchange difficult between IBM systems and others.
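
The values in the comparison can be checked with Python's built-in `cp037` codec, which implements one EBCDIC code page (US/Canada). This is only a sketch under that assumption: EBCDIC exists in many variants, and the helper name `byte_values` is invented for the example.

```python
def byte_values(ch: str) -> tuple[int, int]:
    """Return (ASCII value, EBCDIC cp037 value) for a single character."""
    return ch.encode("ascii")[0], ch.encode("cp037")[0]


for ch in "ABa0":
    ascii_val, ebcdic_val = byte_values(ch)
    print(f"{ch!r}: ASCII {ascii_val:3d}  EBCDIC {ebcdic_val:3d}")

# The same bytes only make sense when read with the codec that wrote them.
ebcdic_bytes = "HELLO".encode("cp037")
print(ebcdic_bytes.decode("cp037"))    # round-trips to the original text
print(ebcdic_bytes.decode("latin-1"))  # read as Latin-1: unrelated accented letters
```

The last two lines show the exchange problem in miniature: a non-IBM system that assumes an ASCII-based encoding sees different text in the same bytes.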

ISO-8859 series (Latin-1 and friends)

To address regional diversity, ISO standardized several 8-bit extensions to ASCII. Each supported specific language groups.

- Positions 0–127 identical to ASCII
- Positions 128–255 defined for regional characters

Common variants:

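| Code Page | Alias | Region / Languages |
| ------ | ------ | ----------------- |
| ISO-8859-1 | Latin-1 | Western European |
| ISO-8859-2 | Latin-2 | Central and Eastern European |
| ISO-8859-5 | Latin/Cyrillic | Russian and other Cyrillic-script languages |
| ISO-8859-7 | Latin/Greek | Greek |
| ISO-8859-15 | Latin-9 | Western European, with the Euro sign |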

Windows Code Pages (CP125x)

Microsoft extended the ISO series with its own encodings called **Windows Code Pages**. The most common was **CP1252 (Windows Latin-1)**, which added extra characters such as typographic quotes and the Euro symbol.

Differences:

- ISO-8859-1 leaves positions 128–159 without printable characters (the range is reserved for control codes)
- CP1252 uses most of these positions for printable symbols such as €, –, ‘, ’, “ and ”

CP1252 example (hex → character):

```
0x80 €
0x91 ‘
0x92 ’
0x93 “
0x94 ”
0x95 •
0x96 –
0x97 —
```
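
The difference is easy to see by decoding the same bytes with both codecs. The sketch below uses Python's built-in `cp1252` and `latin-1` codecs; the sample bytes are chosen for illustration only.

```python
# A Euro sign and a quoted "Hi", written using bytes from the 0x80-0x9F range.
data = bytes([0x80, 0x93, 0x48, 0x69, 0x94])

as_cp1252 = data.decode("cp1252")   # printable symbols
as_latin1 = data.decode("latin-1")  # Python maps 0x80-0x9F to control characters here

print(as_cp1252)
print(repr(as_latin1))  # repr() makes the invisible control characters visible
```

Text typed on Windows and later labelled as ISO-8859-1 is a common real-world source of this mismatch, which is why curly quotes are often the first characters to break.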

Code page chaos

Because every region used different mappings for the same byte values:

- A document encoded in CP1252 might display gibberish if opened as ISO-8859-2.
- Multi-language systems (e.g., English + Russian) could not mix characters safely.
- File exchange required explicit knowledge of the code page used.

This lack of interoperability led to **encoding mismatches** and the need for a universal standard.
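
A minimal sketch of such a mismatch, using Python's built-in codecs: one word is stored as CP1252 bytes and then read back under three different code page assumptions. The sample word is arbitrary.

```python
original = "España"
raw = original.encode("cp1252")  # the ñ is stored as the single byte 0xF1

# Byte 0xF1 means a different letter in each code page.
for codec in ("cp1252", "iso8859_2", "iso8859_5"):
    print(f"{codec:10s} {raw.decode(codec)}")
```

Nothing in the bytes themselves says which reading is correct, so no decoding raises an error; the text is simply wrong.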

Transition to Unicode

Unicode unified all scripts under one standard:

- Every character has a unique code point (e.g. U+00E9 for “é”).
- Unicode includes ASCII (0–127) and can represent characters from all code pages simultaneously.
- UTF-8 became the dominant encoding for the web and most modern systems.
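
These points can be sketched in a few lines of Python. The helper `can_encode` is hypothetical and exists only to make the comparison readable.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(f"U+{ord('é'):04X}")    # the code point of é
print("é".encode("utf-8"))    # two bytes in UTF-8
print("A".encode("utf-8"))    # ASCII characters stay a single byte

# Latin, Cyrillic and Greek in one string: no single legacy code page covers it.
mixed = "café, кофе, καφές"
for codec in ("cp1252", "iso8859_5", "iso8859_7", "utf-8"):
    print(f"{codec:10s} {can_encode(mixed, codec)}")
```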

Today:

- Legacy code pages persist in old software and file archives.
- Modern systems default to UTF-8 or UTF-16.
- Tools like `iconv` or `recode` can convert between encodings.
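
The same kind of conversion can be sketched in Python. This is not how `iconv` or `recode` work internally, only a rough equivalent; the function name `transcode` is an assumption for this example, and the source encoding has to be known in advance.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode data from the source encoding and re-encode it in the target encoding."""
    return data.decode(source).encode(target)


# The word café in curly quotes, as stored by a CP1252 application.
legacy = b"\x93caf\xe9\x94"
converted = transcode(legacy, "cp1252")

print(len(legacy), len(converted))  # the UTF-8 version needs more bytes
print(converted.decode("utf-8"))
```

Passing the wrong `source` usually does not fail; it silently produces the wrong characters, so the original code page should be recorded alongside archived files.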

Summary

- EBCDIC was IBM’s early alternative to ASCII.
- ISO-8859 extended ASCII for regional languages.
- Windows CP125x added further characters for specific markets.
- Code pages were incompatible with each other.
- Unicode and UTF-8 replaced them with a universal, consistent model.
