What Is ASCII?
ASCII (American Standard Code for Information Interchange) is a character encoding standard that assigns numeric values to letters, digits, punctuation marks and control characters. First published in 1963 by the American Standards Association (ASA, the predecessor of today's ANSI), ASCII remains the foundation of virtually every modern text encoding system, including UTF-8, which is backward-compatible with ASCII for the first 128 code points.
The standard ASCII table contains 128 characters numbered 0 through 127. Each character can be represented using 7 bits of binary data, which is why early computer systems used 7-bit data words. The 8th bit was often used as a parity bit for error detection. When 8-bit bytes became standard, the upper half (128–255) was used for various “extended ASCII” character sets such as ISO 8859-1 (Latin-1) and Windows-1252.
ASCII Table Structure
The 128 ASCII characters fall into three logical groups:
Control Characters (0–31 and 127)
The first 32 codes (0–31) and code 127 (DEL) are non-printable control characters. They were originally designed to control teletypewriters and data transmission equipment. Some remain critically important today:
- 0 (NUL) — The null character, used as a string terminator in C and many other programming languages.
- 9 (HT) — Horizontal Tab, the familiar Tab key on your keyboard.
- 10 (LF) — Horizontal Line Feed, the newline character (\n) on Unix and Linux systems.
- 13 (CR) — Carriage Return, used together with LF on Windows (\r\n) for line endings.
- 27 (ESC) — Escape, used as the start of ANSI escape sequences for terminal colors and formatting.
- 127 (DEL) — Delete. On paper tape systems, punching all holes (1111111 in binary) would delete a character.
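These control codes are easy to inspect from any language; here is a short Python sketch (using only the standard library) that checks a few of the values listed above:

```python
# Look at a few ASCII control characters by code point.
for code in (0, 9, 10, 13, 27, 127):
    ch = chr(code)
    # Control characters are non-printable.
    print(f"{code:>3}  {ch!r:>8}  printable? {ch.isprintable()}")

# LF-only vs CR+LF line endings, as raw bytes:
print(b"unix line\n")
print(b"windows line\r\n")
```

Running it shows, for example, that code 10 is the character Python writes as '\n' and that none of these six codes count as printable.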
Printable Characters (32–126)
Codes 32 through 126 are the printable characters that make up the visible text you read every day:
- 32 — The Space character (often overlooked, but it is an actual character with a code).
- 48–57 — Digits 0 through 9. Notice that the digit 0 has ASCII value 48, not 0. This distinction is a common source of bugs in C programs.
- 65–90 — Uppercase letters A through Z. The difference between uppercase and lowercase is exactly 32 (one bit flip: bit 5), a deliberate design choice that makes case conversion efficient.
- 97–122 — Lowercase letters a through z.
- 33–47, 58–64, 91–96, 123–126 — Punctuation and symbols such as !, @, #, $, %, ^, &, *, brackets, braces, and more.
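The fixed gap of 32 between the uppercase and lowercase ranges can be verified in a couple of lines; a minimal Python sketch:

```python
# Uppercase and lowercase ASCII letters differ by exactly 32 (bit 5).
assert ord('a') - ord('A') == 32

for upper, lower in zip("ABC", "abc"):
    # Adding 32 to an uppercase code point yields the lowercase letter.
    assert chr(ord(upper) + 32) == lower

print(ord('A'), ord('a'))  # prints: 65 97
```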
Extended ASCII (128–255)
The term “Extended ASCII” refers to 8-bit character sets that use the upper 128 positions (128–255) for additional characters. There is no single “extended ASCII” standard — multiple encoding schemes exist, including ISO 8859-1 (Latin-1) for Western European languages and Windows-1252 (a superset of Latin-1 commonly used on Windows). These extensions include accented characters like é, ü, and ñ, as well as currency symbols, mathematical operators and box-drawing characters.
Number Representations
Every ASCII character can be expressed in multiple number bases:
- Decimal (base 10) — The most human-friendly form. The letter ‘A’ is 65 in decimal.
- Hexadecimal (base 16) — Compact and byte-aligned. ‘A’ is 0x41. Hex is the most common notation in programming contexts.
- Octal (base 8) — Used historically in Unix file permissions and some C escape sequences (\101 for ‘A’).
- Binary (base 2) — The raw bit pattern. ‘A’ is 01000001. Standard ASCII uses 7 bits; the 8th bit is zero for all standard characters.
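The four representations above can be produced directly with Python's format specifiers (note that Python prefixes octal with 0o rather than C's backslash escape):

```python
# The letter 'A' (code 65) in the four common number bases.
code = ord('A')
print(f"decimal: {code}")      # 65
print(f"hex:     {code:#x}")   # 0x41
print(f"octal:   {code:#o}")   # 0o101
print(f"binary:  {code:08b}")  # 01000001
```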
ASCII in Programming
Understanding ASCII values is fundamental in many programming scenarios:
- String comparison — When you sort strings “alphabetically” in most programming languages, you are actually sorting by ASCII/Unicode code points. This means uppercase letters come before lowercase (‘Z’ = 90 < ‘a’ = 97), which can lead to unexpected sort orders.
- Character arithmetic — In C, Java and many other languages, you can perform arithmetic on characters: 'A' + 1 == 'B', 'a' - 'A' == 32. This is used extensively in algorithms for case conversion, Caesar ciphers, and digit-to-integer conversion (ch - '0').
- Input validation — Checking whether a character is a digit (ch >= 48 && ch <= 57) or a letter is a common operation in parsers and validators.
- Escape sequences — Control characters like tab (\t, ASCII 9), newline (\n, ASCII 10), and carriage return (\r, ASCII 13) are used in every programming language for formatting text output.
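Python characters are one-element strings rather than integers, so the same idioms go through ord() and chr(); a small sketch of character arithmetic and the ch - '0' digit conversion (the helper name digit_value is illustrative, not from any standard library):

```python
# Character arithmetic, mirroring C's 'A' + 1 == 'B'.
assert chr(ord('A') + 1) == 'B'
assert ord('a') - ord('A') == 32

def digit_value(ch: str) -> int:
    """The ch - '0' idiom: convert a digit character to its integer value."""
    if not ('0' <= ch <= '9'):  # same range check as ch >= 48 && ch <= 57
        raise ValueError(f"not a digit: {ch!r}")
    return ord(ch) - ord('0')

print(digit_value('7'))  # prints: 7
```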
ASCII vs Unicode vs UTF-8
ASCII handles only 128 characters, which is sufficient for English text but inadequate for the world’s diverse writing systems. Unicode was created to solve this problem by assigning a unique code point to every character in every script — over 149,000 characters as of Unicode 15.1.
UTF-8 is the dominant encoding for Unicode text. It is a variable-width encoding that uses 1 to 4 bytes per character. The crucial design decision of UTF-8 is that the first 128 characters (code points U+0000 through U+007F) are encoded identically to ASCII, using a single byte. This means all existing ASCII text is automatically valid UTF-8, which is why the transition from ASCII to UTF-8 was so smooth.
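This backward compatibility is easy to observe: encoding pure ASCII text as UTF-8 produces exactly the same bytes as encoding it as ASCII, while non-ASCII characters expand to 2, 3, or 4 bytes. A quick Python check:

```python
# ASCII text is byte-for-byte identical in ASCII and UTF-8.
text = "Hello"
assert text.encode("utf-8") == text.encode("ascii")

# Non-ASCII code points take 2-4 bytes in UTF-8.
print("é".encode("utf-8"))    # 2 bytes
print("€".encode("utf-8"))    # 3 bytes
print("🙂".encode("utf-8"))   # 4 bytes
```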
When working with modern software, you are almost certainly using UTF-8 encoding. But since UTF-8 is backward-compatible with ASCII, understanding the ASCII table remains essential — the first 128 code points of Unicode are the ASCII table.
Common ASCII Values Worth Memorizing
Experienced programmers often have a few key ASCII values committed to memory:
- 0 — NUL (null terminator)
- 10 — LF (newline on Unix, Linux, and modern macOS)
- 13 — CR (carriage return, part of Windows newline)
- 32 — Space
- 48 — ‘0’ (start of digits)
- 65 — ‘A’ (start of uppercase)
- 97 — ‘a’ (start of lowercase)
- 127 — DEL (delete)
Knowing that uppercase and lowercase letters differ by exactly 32 (binary: one bit flip) is particularly useful. To convert ‘A’ to ‘a’, add 32. To convert ‘a’ to ‘A’, subtract 32. In binary terms, you toggle bit 5 (ch ^ 0x20).
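The bit-5 toggle mentioned above works in any language; here is a minimal Python version (note it is only meaningful for ASCII letters, since ^ 0x20 maps other characters to unrelated code points):

```python
def flip_case(ch: str) -> str:
    """Toggle bit 5 (0x20) to flip the case of an ASCII letter."""
    return chr(ord(ch) ^ 0x20)

print(flip_case('A'))  # prints: a
print(flip_case('a'))  # prints: A
```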
Frequently Asked Questions
What is the difference between ASCII and Unicode?
ASCII defines 128 characters using 7 bits. Unicode defines over 149,000 characters from all the world’s writing systems. The first 128 Unicode code points (U+0000 to U+007F) are identical to ASCII. UTF-8, the most common Unicode encoding, encodes these first 128 characters in a single byte, making it backward-compatible with ASCII.
Why does ASCII start at 0 and not 1?
ASCII code 0 (NUL) was intentionally included as a “no operation” character for paper tape and data transmission systems. The NUL character could be used to fill gaps or serve as padding without affecting the data. In C programming, NUL (code 0) serves as the string terminator, marking the end of a null-terminated string.
Why is the Space character code 32?
The ASCII table was designed so that control characters occupy positions 0–31 and printable characters start at 32. Space is the first printable character. This layout allows software to distinguish control characters from printable ones simply by checking whether the code is below 32 (DEL, code 127, is the one control character above that range).
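That classification rule is a one-line check; a sketch in Python (the function name is illustrative):

```python
def is_ascii_control(code: int) -> bool:
    """Control characters are codes 0-31 plus DEL (127); 32-126 are printable."""
    return code < 32 or code == 127

print(is_ascii_control(10))   # prints: True  (LF)
print(is_ascii_control(32))   # prints: False (Space)
print(is_ascii_control(127))  # prints: True  (DEL)
```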
How do I find the ASCII code of a character?
Use the search box in the ASCII table above to look up any character. In most programming languages, you can also get the code directly: "A".charCodeAt(0) in JavaScript returns 65, ord("A") in Python returns 65, and (int)'A' in C/Java gives 65.
What is extended ASCII?
Extended ASCII refers to any 8-bit encoding that uses codes 128–255 for additional characters beyond the standard 128. There is no single “extended ASCII” standard — multiple incompatible encodings exist (ISO 8859-1, Windows-1252, etc.). This ambiguity is one of the reasons Unicode was created to provide a single, universal character set.