Strings in Assembly: Buffers, Length with $ and equ, Concatenation

Strings in assembly language are simply sequences of bytes in memory. How those bytes map to characters depends on the encoding: in ASCII each byte is one character, while in UTF-8 a single character may take several bytes.

What Is a String in Assembly?

A string is not a special data type — it is an array of bytes. The assembler does not automatically track the string’s length or add an end marker.

Example:

SECTION .data
Msg: db "Eat at Joe's!", 10     ; define bytes for text plus newline

Each ASCII character occupies one byte. The `10` adds a line feed (LF, '\n') at the end.

Important points:

  • No automatic end-of-string marker (unlike C string literals, which end with a `0x00` byte).
  • Length must be tracked manually or computed.
  • Each byte can be accessed by its address in memory.
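
As a rough illustration of the last point, the sketch below loads one byte of the string by its address and returns it as the process exit status. It assumes Linux x86-64; the offset 7 and the use of the exit status are choices made only for this example.

```nasm
SECTION .data
Msg: db "Eat at Joe's!", 10

SECTION .text
global _start
_start:
    movzx edi, byte [rel Msg + 7]   ; byte at offset 7 is 'J' (74); zero-extend into EDI
    mov   eax, 60                   ; sys_exit, with the loaded byte as exit status
    syscall
```

After running the program, `echo $?` in the shell should print 74, the ASCII code for 'J'.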

Defining Strings

You can define strings using `db`:

db "Hello, world!", 10
db 'A', 'B', 'C', 0

You can also mix numeric and text data:

db "A=", 65, 10
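
To make the byte layout concrete, the two definitions below should assemble to identical bytes. The labels `Mixed` and `Plain` are invented for this illustration.

```nasm
SECTION .data
Mixed: db "A=", 65, 10      ; bytes: 0x41 0x3D 0x41 0x0A
Plain: db "A=A", 10         ; the same four bytes, since 65 is the ASCII code for 'A'
```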

Uninitialized String Buffers

In the **.bss** section, you can reserve space for strings without giving them content yet.

Example:

SECTION .bss
Buffer: resb 32       ; reserve 32 bytes for a string

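The assembler only records how much space to reserve; no bytes are stored in the object file. On Linux, the loader zero-fills the **.bss** section when the program starts.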
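
A minimal sketch of how such a buffer might be used, assuming Linux x86-64: the program reads up to 32 bytes from standard input into `Buffer` and writes the same bytes back out. The local label `.done` is a name chosen for this example.

```nasm
SECTION .bss
Buffer: resb 32             ; 32 bytes of scratch space

SECTION .text
global _start
_start:
    xor eax, eax            ; sys_read (0)
    xor edi, edi            ; fd 0 = stdin
    lea rsi, [rel Buffer]   ; destination address
    mov edx, 32             ; read at most 32 bytes
    syscall                 ; RAX = bytes read, or a negative error code
    test rax, rax
    jle .done               ; end of input or error: skip the echo

    mov rdx, rax            ; write exactly as many bytes as were read
    mov eax, 1              ; sys_write
    mov edi, 1              ; fd 1 = stdout
    lea rsi, [rel Buffer]   ; source address
    syscall

.done:
    mov eax, 60             ; sys_exit
    xor edi, edi            ; exit status 0
    syscall
```

The byte count returned by the read call serves as the string length here, since the buffer itself carries no end marker.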

Calculating String Length

NASM provides the special symbol `$`, which evaluates to the address of the start of the line currently being assembled.

If you subtract two labels, NASM computes the byte distance between them — useful for string lengths.

Example:

SECTION .data
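Msg: db "Eat at Joe's!"
MsgLen: equ $ - Msg    ; length in bytes, computed at assembly time

`MsgLen` is the constant 13 here, so `mov rdx, MsgLen` assembles the same as `mov rdx, 13`, but the value updates automatically if the string changes. The `equ` line must come immediately after the string, because `$ - Msg` counts every byte emitted since `Msg`.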
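
Because `$` is simply the current position, where the length definition sits matters. The sketch below contrasts a correctly placed length with one placed too late; the labels `Other` and `TooLong` are hypothetical.

```nasm
SECTION .data
Msg:     db "Eat at Joe's!"     ; 13 bytes
MsgLen:  equ $ - Msg            ; 13: measured right after the string

Other:   db "Come again!"       ; 11 bytes
TooLong: equ $ - Msg            ; 24: also counts the bytes of Other
```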

The equ Directive

`equ` assigns a symbolic name to a constant value at assembly time and emits no bytes of its own. Combined with `$`, it defines a string-length constant that stays in sync with the string.

Example:

Text: db "Assembly Rocks!", 10
TextLen: equ $ - Text

Later in your code:

mov rdx, TextLen
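
`equ` is not limited to string lengths. As a small sketch, the hypothetical constant `BufSize` below sizes a buffer and is reused as an instruction operand, so the value 32 appears in only one place.

```nasm
BufSize equ 32              ; a plain constant: no bytes are emitted for it

SECTION .bss
Buffer: resb BufSize        ; reserve BufSize (32) bytes

SECTION .text
    mov edx, BufSize        ; assembles the same as mov edx, 32
```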

String Concatenation

Multiple strings or values can be joined using commas in one `db` statement.

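Consecutive `db` lines are stored back to back in memory:

db "Eat at Joe's!", 10
db "Come again!", 10

Or combined in a single statement:

TwoLine: db "Eat at Joe's ...", 10, "... tonight!", 10

Output when `TwoLine` is written to the terminal:

Eat at Joe's ...
... tonight!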
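
Since adjacent definitions sit next to each other in memory, subtracting labels can measure either one part or the whole. The labels in this sketch (`Line1`, `Line2`, `Line1Len`, `BothLen`) are invented for illustration.

```nasm
SECTION .data
Line1:    db "Eat at Joe's ...", 10
Line2:    db "... tonight!", 10
Line1Len  equ Line2 - Line1     ; 17: bytes in the first line only
BothLen   equ $ - Line1         ; 30: both lines, because they are adjacent
```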

Strings and Encodings

Remember that strings are just bytes. How they are interpreted depends on the encoding (e.g., ASCII, UTF-8). A write syscall passes the bytes through unchanged, so make sure whatever displays them (terminal, file viewer, etc.) expects the same encoding you used when defining the string.
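
One practical consequence is that byte length and character count can differ. The sketch below spells out the UTF-8 bytes for "é" explicitly so the result does not depend on how the source file is saved; the labels `Cafe` and `CafeLen` are hypothetical.

```nasm
SECTION .data
Cafe:    db "caf", 0xC3, 0xA9, 10   ; "café" plus LF; é is the two UTF-8 bytes C3 A9
CafeLen: equ $ - Cafe               ; 6 bytes, although there are only 5 characters
```

The length computed with `$` is always a byte count, which is what a write syscall expects.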

Example Program

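; Assumed build on Linux x86-64 (file name is an example):
;   nasm -f elf64 hello.asm && ld hello.o -o hello
SECTION .data
Greeting: db "Hello, world!", 10
GreetLen: equ $ - Greeting   ; 14 bytes, including the newline

SECTION .text
global _start
_start:
    mov rax, 1           ; sys_write
    mov rdi, 1           ; fd 1 = stdout
    mov rsi, Greeting    ; address of the first byte
    mov rdx, GreetLen    ; number of bytes to write
    syscall

    mov rax, 60          ; sys_exit
    xor rdi, rdi         ; exit status 0
    syscall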

Summary

  • Strings are arrays of bytes — no null terminator by default.
  • `$ - Label`, placed immediately after the data, gives the number of bytes (string length).
  • `equ` defines constants, such as lengths.
  • Strings can be split or concatenated using commas.
  • Always know your encoding when printing or processing strings.