Strings in Assembly: Buffers, Length with $ and equ, Concatenation
Strings in assembly language are simply sequences of bytes in memory. Each byte may represent a character depending on the encoding (commonly ASCII or UTF-8).
What Is a String in Assembly?
A string is not a special data type — it is an array of bytes. The assembler does not automatically track the string’s length or add an end marker.
Example:
SECTION .data
Msg: db "Eat at Joe's!", 10 ; define bytes for text plus newline
Each character occupies one byte. The `10` adds a line feed (LF, '\n') at the end.
Important points:
- No automatic end-of-string marker (unlike C’s `0x00`).
- Length must be tracked manually or computed.
- Each byte can be accessed by its address in memory.
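The last point can be illustrated with a minimal sketch, assuming x86-64 Linux, NASM syntax, and a non-PIE static link (for example `nasm -f elf64` followed by `ld`). It reads and overwrites individual bytes of the string through their addresses before printing it. The offsets are counted by hand from the text of `Msg`.

```nasm
; Sketch under the assumptions stated above; label names are illustrative.
SECTION .data
Msg:    db "Eat at Joe's!", 10
MsgLen: equ $ - Msg             ; 14 bytes: 13 characters plus the newline

SECTION .text
global _start
_start:
    mov al, [Msg]               ; load the first byte, 'E' (0x45)
    add al, 32                  ; ASCII lowercase is 32 higher: 'e' (0x65)
    mov [Msg], al               ; store it back over the first byte
    mov byte [Msg + 7], 'M'     ; overwrite the byte at offset 7 ('J')

    mov rax, 1                  ; sys_write
    mov rdi, 1                  ; stdout
    mov rsi, Msg                ; address of the first byte
    mov rdx, MsgLen             ; number of bytes to write
    syscall                     ; writes "eat at Moe's!" and a newline

    mov rax, 60                 ; sys_exit
    xor rdi, rdi                ; exit status 0
    syscall
```

Nothing in the program treats `Msg` as a "string"; the write call simply receives an address and a byte count.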
Defining Strings
You can define strings using `db`:
db "Hello, world!", 10
db 'A', 'B', 'C', 0
You can also mix numeric and text data:
db "A=", 65, 10
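As a small illustration of why mixing works, the fragment below lays out the same four bytes twice, once with a numeric value and once as pure text. The labels `Mixed`, `MixedLen`, and `Same` are made up for this sketch.

```nasm
SECTION .data
Mixed:    db "A=", 65, 10       ; bytes: 0x41 0x3D 0x41 0x0A
MixedLen: equ $ - Mixed         ; 4
Same:     db "A=A", 10          ; the same four bytes, written as text
```

The assembler stores a quoted character and its numeric code identically, so 65 and 'A' are interchangeable in a `db` list.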
Uninitialized String Buffers
In the **.bss** section, you can reserve space for strings without giving them content yet.
Example:
SECTION .bss
Buffer: resb 32 ; reserve 32 bytes for a string
The reserved space takes no room in the executable file. On Linux, the loader allocates it and fills it with zeros when the program starts.
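One common use of such a buffer is to hold input. The following is a minimal sketch, assuming x86-64 Linux system calls (`sys_read` is 0, `sys_write` is 1); it reads up to 32 bytes from standard input into `Buffer` and echoes them back. The name `BufSize` is introduced here for illustration.

```nasm
BufSize equ 32                  ; illustrative constant for the buffer size

SECTION .bss
Buffer: resb BufSize            ; reserve BufSize bytes, no initial content

SECTION .text
global _start
_start:
    mov rax, 0                  ; sys_read
    mov rdi, 0                  ; stdin
    mov rsi, Buffer             ; where to store the bytes
    mov rdx, BufSize            ; read at most BufSize bytes
    syscall                     ; rax = bytes read, 0 at EOF, negative on error

    cmp rax, 0
    jle .done                   ; nothing to echo

    mov rdx, rax                ; length = number of bytes actually read
    mov rax, 1                  ; sys_write
    mov rdi, 1                  ; stdout
    mov rsi, Buffer
    syscall
.done:
    mov rax, 60                 ; sys_exit
    xor rdi, rdi                ; exit status 0
    syscall
```

Because the buffer has no length of its own, the byte count returned in `rax` is the only record of how much of it holds valid data.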
Calculating String Length
NASM provides a special symbol, `$`, which evaluates to the address of the current line during assembly, that is, the position just past everything emitted so far in the section.
If you subtract two labels, NASM computes the byte distance between them — useful for string lengths.
Example:
SECTION .data
Msg: db "Eat at Joe's!"
MsgLen: equ $ - Msg ; calculates the length automatically
With this definition, `mov rdx, MsgLen` assembles to the same instruction as `mov rdx, 13`, but the value updates automatically if the string changes.
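Since `$` is simply the current position, where the `equ` line sits matters. The fragment below is a sketch of the usual pitfall; `Other` and `BadLen` are hypothetical names added for the illustration.

```nasm
SECTION .data
Msg:     db "Eat at Joe's!"
MsgLen:  equ $ - Msg            ; 13: $ is the address just past the '!'

Other:   db "Come again!", 10
BadLen:  equ $ - Msg            ; 25: also counts the 12 bytes of Other
```

Placing the length definition on the line directly after the string it measures avoids this.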
The equ Directive
`equ` assigns a symbolic name to a value. Used with `$`, it allows you to automatically define a string length constant.
Example:
Text: db "Assembly Rocks!", 10
TextLen: equ $ - Text
Later in your code:
mov rdx, TextLen
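`equ` is not limited to lengths. As a sketch, the program below also names the system call numbers and the file descriptor. `SYS_WRITE`, `SYS_EXIT`, and `STDOUT` are names chosen for this example, not symbols that NASM predefines, and the numeric values assume x86-64 Linux.

```nasm
; Illustrative constants; equ symbols occupy no memory in the program.
SYS_WRITE equ 1
SYS_EXIT  equ 60
STDOUT    equ 1

SECTION .data
Text:    db "Assembly Rocks!", 10
TextLen: equ $ - Text           ; 16: 15 characters plus the newline

SECTION .text
global _start
_start:
    mov rax, SYS_WRITE
    mov rdi, STDOUT
    mov rsi, Text
    mov rdx, TextLen            ; same as writing mov rdx, 16
    syscall

    mov rax, SYS_EXIT
    xor rdi, rdi                ; exit status 0
    syscall
```

The assembler substitutes each name with its value, so the generated code is the same as if the numbers had been typed directly.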
String Concatenation
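Multiple strings or values can be joined using commas in one `db` statement. Consecutive `db` lines are also laid out back to back in memory, so they form one continuous run of bytes.
Separate lines:
db "Eat at Joe's!", 10
db "Come again!", 10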
Or combined:
TwoLine: db "Eat at Joe's ...", 10, "... tonight!", 10
Output:
Eat at Joe's ...
... tonight!
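The same layout can be spread over several `db` lines and still measured as one unit, because the bytes are contiguous. A minimal sketch, assuming x86-64 Linux and with `TwoLineLen` as an illustrative name:

```nasm
SECTION .data
TwoLine:    db "Eat at Joe's ...", 10   ; 17 bytes
            db "... tonight!", 10       ; 13 bytes, stored right after
TwoLineLen: equ $ - TwoLine             ; 30: spans both db lines

SECTION .text
global _start
_start:
    mov rax, 1                  ; sys_write
    mov rdi, 1                  ; stdout
    mov rsi, TwoLine            ; start of the first line
    mov rdx, TwoLineLen         ; covers both lines
    syscall                     ; one call prints both lines

    mov rax, 60                 ; sys_exit
    xor rdi, rdi                ; exit status 0
    syscall
```

Only the first `db` needs a label; the second line is reached simply by being within the byte count.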
Strings and Encodings
Remember that strings are just bytes. How they are interpreted depends on the encoding (e.g., ASCII, UTF-8). Always ensure your output mechanism (syscall, terminal, etc.) expects the same encoding.
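One practical consequence is that the byte count and the character count can differ. The fragment below is a sketch that spells out the UTF-8 encoding of "é" by hand so the result does not depend on how the source file is saved; `Cafe` and `CafeLen` are illustrative names.

```nasm
SECTION .data
Cafe:    db "caf", 0xC3, 0xA9, 10   ; "café" plus LF; é is C3 A9 in UTF-8
CafeLen: equ $ - Cafe               ; 6 bytes for 5 characters
```

`$ - Label` always counts bytes, which is what a write system call expects, but it is not the number of characters a UTF-8 terminal will display.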
Example Program
SECTION .data
Greeting: db "Hello, world!", 10
GreetLen: equ $ - Greeting
SECTION .text
global _start
_start:
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
mov rsi, Greeting ; address of the bytes to write
mov rdx, GreetLen ; number of bytes to write
syscall
mov rax, 60 ; sys_exit
xor rdi, rdi ; exit status 0
syscall
Summary
- Strings are arrays of bytes — no null terminator by default.
- `$ - Label`, placed right after the data, gives the number of bytes (string length).
- `equ` defines constants, such as lengths.
- Strings can be split or concatenated using commas.
- Always know your encoding when printing or processing strings.