ASCII: Code Chart, Control Codes & End-of-Line Conventions

This page explains the ASCII standard — how it encodes characters as numbers, the role of control codes, and how text structure (like newlines) is represented.

What is ASCII

ASCII (American Standard Code for Information Interchange) was developed in the early 1960s to unify how computers represent text. Before ASCII, each manufacturer used its own incompatible encoding.

ASCII defines a mapping between numbers (0–127) and symbols:

  • Letters A–Z and a–z
  • Digits 0–9
  • Punctuation
  • Control characters (for formatting and transmission)

Example:

Text:  H e l l o ,   w o r l d !
Codes: 72 101 108 108 111 44 32 119 111 114 108 100 33

Each character corresponds to a 7-bit value, fitting conveniently in one byte with the highest bit unused.
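
The mapping can be checked directly in code. The snippet below is a minimal Python sketch (the language and the variable names are choices made for this illustration, not part of the ASCII standard); it relies on the built-ins `ord()` and `chr()`, which agree with ASCII for codes 0–127.

```python
text = "Hello, world!"

# ord() returns the code of a character; for ASCII text this is its ASCII value.
codes = [ord(ch) for ch in text]
print(codes)

# chr() is the inverse: it turns a code back into a character.
restored = "".join(chr(code) for code in codes)

# Every ASCII code fits in 7 bits, so the highest bit of each byte is 0.
encoded = text.encode("ascii")
top_bits_clear = all((byte & 0x80) == 0 for byte in encoded)
print(restored, top_bits_clear)
```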

Structure of the ASCII table

Range      Type               Examples
0–31       Control codes      NUL, HT (tab), LF, CR
32–47      Punctuation        Space, !"#$%&'()*+,-./
48–57      Digits             0–9
58–64      Punctuation        :;<=>?@
65–90      Uppercase letters  A–Z
91–96      Symbols            [\]^_`
97–122     Lowercase letters  a–z
123–126    Symbols            {|}~
127        Control            DEL
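
The ranges in the table above translate naturally into a lookup. The following is a small Python sketch; `ASCII_RANGES` and `classify` are hypothetical names introduced here, and the category labels simply mirror the table.

```python
# (first code, last code, category), following the table above.
ASCII_RANGES = [
    (0, 31, "control"),
    (32, 47, "punctuation"),
    (48, 57, "digit"),
    (58, 64, "punctuation"),
    (65, 90, "uppercase"),
    (91, 96, "symbol"),
    (97, 122, "lowercase"),
    (123, 126, "symbol"),
    (127, 127, "control"),
]


def classify(code: int) -> str:
    """Return the table category for an ASCII code in the range 0-127."""
    for first, last, category in ASCII_RANGES:
        if first <= code <= last:
            return category
    raise ValueError(f"{code} is outside the ASCII range 0-127")


print(classify(ord("A")), classify(ord("7")), classify(10))

# Upper- and lowercase letters sit exactly 32 positions apart.
case_offset = ord("a") - ord("A")
```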

Control characters

ASCII uses the range 0–31 and 127 for non-printable control codes that manage text flow and communication. Common ones still relevant today:

Code | Abbrev | Meaning | Typical use
---- | ------ | ------- | -----------
8 | BS | Backspace | Move cursor one position left
9 | HT | Horizontal Tab | Indentation or alignment
10 | LF | Line Feed | Move to next line (Unix newline)
13 | CR | Carriage Return | Move to line start (classic Mac OS newline)
3 | ETX | End of Text | Interrupt signal (Ctrl-C)
4 | EOT | End of Transmission | End of input (Ctrl-D on Unix)
26 | SUB | Substitute | End of input on Windows (Ctrl-Z)
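
The Ctrl-key pairings listed above follow a simple pattern: on conventional terminals, holding Ctrl while typing a letter sends that letter's code with all but the low five bits cleared. The Python sketch below illustrates this; `ctrl` is a hypothetical helper written for this example, not a standard library function.

```python
def ctrl(letter: str) -> int:
    """Code conventionally sent by Ctrl+<letter>: keep only the low 5 bits."""
    return ord(letter.upper()) & 0x1F


print(ctrl("C"), ctrl("D"), ctrl("Z"))  # 3 4 26 (ETX, EOT, SUB)

# Python's escape sequences name several of the same control codes.
escapes = {"\b": 8, "\t": 9, "\n": 10, "\r": 13}
escapes_match = all(ord(ch) == code for ch, code in escapes.items())
```

The same rule explains why Ctrl-H acts as Backspace (8), Ctrl-I as Tab (9), Ctrl-J as Line Feed (10), and Ctrl-M as Carriage Return (13).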

End-of-line conventions

Different systems use different characters to indicate line endings:

  • Unix/Linux/macOS: LF (Line Feed, code 10)
  • Classic Mac OS (version 9 and earlier): CR (Carriage Return, code 13)
  • Windows: CR LF (Carriage Return + Line Feed, codes 13 and 10)

Example:

Unix:     Hello\nWorld
Windows:  Hello\r\nWorld
Mac OS 9: Hello\rWorld

Many modern editors and libraries handle all formats transparently, but mixed line endings can cause parsing errors.
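
When a file is suspected of mixing conventions, it helps to count each kind of line ending and then normalize. The following is a minimal Python sketch working on raw bytes; `count_line_endings` and `normalize_to_lf` are hypothetical helper names chosen for this example.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF. CRLF must be replaced first."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"Hello\r\nWorld\nfoo\rbar"
print(count_line_endings(mixed))  # {'CRLF': 1, 'CR': 1, 'LF': 1}
print(normalize_to_lf(mixed))     # b'Hello\nWorld\nfoo\nbar'

# str.splitlines() already recognizes all three conventions.
lines = mixed.decode("ascii").splitlines()
```

Replacing CRLF before lone CR matters: doing it in the other order would turn each Windows line ending into two LFs.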

Historical background: “Carriage Return”

The term originates from typewriters:

  • The **carriage** held the paper and moved horizontally while typing.
  • Returning the carriage (CR) moved it back to the left margin.
  • Feeding the line (LF) rolled the paper up one line.

Teletypes, which inspired early terminals, used the same terminology, and it carried over into ASCII and computing.

Limitations of ASCII

  • Only 128 codes: enough for English, but not for languages that need accented letters or non-Latin scripts.
  • Cannot represent characters like ä, é, ñ, ß, α, or й.
  • These gaps led to the creation of **extended ASCII** encodings (ISO-8859 series, CP1252, etc.), which use values 128–255 for regional symbols.
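
Both limitations are easy to observe in practice. The Python sketch below (an illustration using Python's built-in codecs, with variable names chosen for this example) shows that strict ASCII rejects é, and that one byte value above 127 decodes to different characters depending on which extended encoding is assumed.

```python
word = "é"

# Strict ASCII has no code for é, so encoding fails.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ISO-8859-1 (Latin-1) assigns é a single byte in the 128-255 range.
latin1_bytes = word.encode("iso-8859-1")
print(ascii_ok, latin1_bytes)  # False b'\xe9'

# The same byte means something else under a different regional encoding.
as_latin1 = latin1_bytes.decode("iso-8859-1")
as_cyrillic = latin1_bytes.decode("iso-8859-5")
same_character = as_latin1 == as_cyrillic
```

Because the byte alone does not say which table applies, text in an extended ASCII encoding is only readable if the encoding is known or guessed correctly.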

Why ASCII still matters

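  • ASCII forms the foundation of modern text encoding: the first 128 Unicode code points are the ASCII characters, and many legacy encodings (the ISO-8859 series, CP1252) are ASCII supersets as well.
  • ASCII characters (0–127) keep the same numeric values in UTF-8, each encoded as a single byte, so any pure-ASCII file is also valid UTF-8.
  • Most programming languages and file formats assume UTF-8 or another ASCII-compatible encoding.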
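
The compatibility with UTF-8 can be verified with a few lines of Python. This is a minimal sketch using only built-in string and bytes methods; the variable names are illustrative.

```python
ascii_text = "Hello"

# Pure-ASCII text produces identical bytes under ASCII and UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Non-ASCII characters become multi-byte sequences in UTF-8,
# and every byte of such a sequence has its highest bit set.
encoded = "é".encode("utf-8")
all_high = all(byte >= 0x80 for byte in encoded)
print(same_bytes, len(encoded), all_high)  # True 2 True

# str.isascii() (Python 3.7+) reports whether every character is in 0-127.
checks = (ascii_text.isascii(), "héllo".isascii())
```

Since bytes below 128 never appear inside a multi-byte UTF-8 sequence, tools that only look for ASCII delimiters such as LF or a comma keep working on UTF-8 data.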

Summary

  • ASCII maps 128 characters to numbers 0–127.
  • Control codes (0–31, 127) manage structure and transmission.
  • Text files differ in line endings across systems.
  • ASCII remains the core subset of Unicode and UTF-8.
  • Understanding ASCII is essential for debugging, protocol analysis, and encoding conversions.