Typographic Concepts: Graphemes, Glyphs, Ligatures, Fonts
Before text can be stored digitally, it must first be understood linguistically and visually. This page introduces the key typographic and linguistic terms that underlie all character encoding systems.
Grapheme
A grapheme is the smallest unit of a writing system of a given language. In computing, it roughly corresponds to what we call a **character**.
- Examples:
  * “A”: a Latin alphabet grapheme
  * “あ”: a Japanese hiragana grapheme
  * “ß”: a single grapheme representing “ss” in German
Graphemes are logical symbols, independent of how they are drawn on screen or paper.
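The "roughly corresponds" caveat above can be made concrete with a minimal Python sketch. It assumes only the standard `unicodedata` module; the "é" example and the helper name `code_points` are illustrative additions, not part of the original text. It shows that the three example graphemes each map to one code point, while another single grapheme can be stored as two.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return the code points of `text` in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

# The three example graphemes from the list above: one code point each.
for grapheme in ["A", "あ", "ß"]:
    print(grapheme, code_points(grapheme), unicodedata.name(grapheme))

# Assumed extra example: the grapheme "é" can be stored in two ways.
precomposed = "\u00e9"   # one code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

print(code_points(precomposed))  # ['U+00E9']
print(code_points(decomposed))   # ['U+0065', 'U+0301']

# len() counts code points, not graphemes.
print(len(precomposed), len(decomposed))  # 1 2

# NFC normalization maps the two-code-point form back to the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both spellings of "é" display as the same grapheme, which is why "character" and "grapheme" only roughly coincide in computing.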
Glyph
A glyph is the visual representation of a grapheme — its actual shape. The same character can have many glyphs depending on the font or writing style.
- Example: The letter “A” in Arial vs Times New Roman are different glyphs but the same grapheme.
- The reverse also holds: Latin “A”, Greek “Α”, and Cyrillic “А” are distinct graphemes from different writing systems, even though their glyphs look nearly identical.
Ligature
A ligature is a single glyph representing a combination of two or more graphemes. Ligatures improve visual flow and aesthetics in typography.
- Example: “ﬁ” (U+FB01) is a single glyph representing “f” + “i”
- “æ” historically represents “a” + “e”
- Some fonts use ligatures automatically when rendering text.
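Most ligatures are applied by the font at render time and never appear in the stored text. Unicode does, however, carry a few ligatures as code points of their own. The sketch below, which assumes only Python's standard `unicodedata` module, inspects the “ﬁ” ligature code point and contrasts it with “æ”. Treat it as an illustration of the idea, not as a description of how any particular font or renderer works.

```python
import unicodedata

ligature = "\ufb01"  # the single code point for the "fi" ligature

print(unicodedata.name(ligature))  # LATIN SMALL LIGATURE FI
print(len(ligature))               # 1

# Compatibility normalization (NFKC) expands the ligature into its letters.
expanded = unicodedata.normalize("NFKC", ligature)
print(expanded, len(expanded))     # fi 2

# "æ" began as a ligature, but Unicode treats it as a letter:
# normalization leaves it unchanged.
ae = "\u00e6"
print(unicodedata.name(ae))                     # LATIN SMALL LETTER AE
print(unicodedata.normalize("NFKC", ae) == ae)  # True
```

One practical consequence: a plain search for "fi" will not match text that contains U+FB01 unless the text is normalized first.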
Homoglyphs
Homoglyphs are glyphs that look alike but correspond to different characters or code points.
- Example: Latin “A”, Greek “Α”, and Cyrillic “А” are homoglyphs.
- This visual similarity can cause confusion or even be exploited in **phishing attacks** (e.g., domain “раypal.com” using Cyrillic letters instead of Latin ones).
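The look-alike letters above can be told apart by their code points. The following sketch assumes Python's standard `unicodedata` module. The helper `scripts_in` is a hypothetical, deliberately crude heuristic that takes the first word of each letter's Unicode name as its script, and the spoofed label is built here from explicit escapes for illustration. Production checks would more likely rely on dedicated confusable-detection data.

```python
import unicodedata

# Three homoglyphs, written as escapes so the difference is visible in source.
look_alikes = ["A", "\u0391", "\u0410"]
for ch in look_alikes:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A

def scripts_in(label: str) -> set[str]:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

genuine = "paypal"
spoofed = "\u0440\u0430ypal"  # Cyrillic "er" and "a", then Latin "ypal"

print(genuine == spoofed)           # False
print(sorted(scripts_in(genuine)))  # ['LATIN']
print(sorted(scripts_in(spoofed)))  # ['CYRILLIC', 'LATIN']
```

A label that mixes scripts, as the spoofed one does, is a common warning sign, although mixed scripts alone do not prove malicious intent.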
Font and Typeface
A **typeface** is a collection of glyphs that share a consistent design. A **font** is one specific instance of a **typeface**, fixed in size, weight, and style.
- Typeface examples: Times New Roman, Helvetica, Courier New
- Font examples:
  * Helvetica Bold 12pt
  * Times New Roman Italic 10pt
Fonts translate the logical concept of characters (graphemes) into visual shapes (glyphs) for display or printing.
Character
The word **character** in computing is ambiguous and can mean:
- A unit of textual information (the “A” in “ABC”)
- A code point (numeric identifier in a coded character set like Unicode)
- A glyph (visual representation)
In this context, we use **character** to mean “a unit of textual data” — something that can be represented, stored, or transmitted digitally.
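To make the distinction concrete, here is a small Python sketch of one character viewed as a unit of text, as a code point, and as stored bytes. The choice of “あ” and of the UTF-8 and UTF-16 encodings is an assumption made for illustration. The glyph is not shown, because it depends on the font used at display time.

```python
ch = "あ"  # a unit of textual data

# As a code point: its numeric identifier in Unicode.
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+3042

# As stored or transmitted data: the bytes depend on the chosen encoding.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
print(utf8.hex(" "))   # e3 81 82
print(utf16.hex(" "))  # 30 42

# Decoding with the matching encoding recovers the same character.
print(utf8.decode("utf-8") == ch)       # True
print(utf16.decode("utf-16-be") == ch)  # True
```

The same character therefore has one code point but several possible byte representations, which is the distinction that character encodings exist to manage.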
Summary
- A **grapheme** is the abstract letter or symbol.
- A **glyph** is the visual shape.
- A **ligature** combines multiple graphemes into one glyph.
- **Homoglyphs** look alike but are distinct characters.
- A **font** is a styled implementation of a typeface.
- Correctly distinguishing these terms is essential for understanding character encoding, rendering, and text processing.