Escape sequences
Escape sequences are used to represent certain special characters within string literals and character literals.
The following escape sequences are available:
Escape sequence |
Description | Representation |
---|---|---|
Simple escape sequences | ||
\'
|
single quote | byte 0x27 in ASCII encoding
|
\"
|
double quote | byte 0x22 in ASCII encoding
|
\?
|
question mark | byte 0x3f in ASCII encoding
|
\\
|
backslash | byte 0x5c in ASCII encoding
|
\a
|
audible bell | byte 0x07 in ASCII encoding
|
\b
|
backspace | byte 0x08 in ASCII encoding
|
\f
|
form feed - new page | byte 0x0c in ASCII encoding
|
\n
|
line feed - new line | byte 0x0a in ASCII encoding
|
\r
|
carriage return | byte 0x0d in ASCII encoding
|
\t
|
horizontal tab | byte 0x09 in ASCII encoding
|
\v
|
vertical tab | byte 0x0b in ASCII encoding
|
Numeric escape sequences | ||
\nnn
|
arbitrary octal value | byte nnn
|
\xnn
|
arbitrary hexadecimal value | byte nn
|
Conditional escape sequences[1] | ||
\c
|
Implementation-defined | Implementation-defined |
Universal character names | ||
\unnnn
|
arbitrary Unicode value; may result in several code units |
code point U+nnnn
|
\Unnnnnnnn
|
arbitrary Unicode value; may result in several code units |
code point U+nnnnnnnn
|
- ↑ Conditional escape sequences are conditionally-supported. The character
c
in each conditional escape sequence is a member of basic source character set (until C++23)basic character set (since C++23) that is not the character following the\
in any other escape sequence.
Range of universal character names
If a universal character name corresponds to a code point that is not 0x24 ( |
(until C++11) |
If a universal character name corresponding to a code point of a member of basic source character set or control characters appear outside a character or string literal, the program is ill-formed. If a universal character name corresponds surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed. If a universal character name used in a UTF-16/32 string literal does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive), the program is ill-formed. |
(since C++11) (until C++20) |
If a universal character name corresponding to a code point of a member of basic source character set or control characters appear outside a character or string literal, the program is ill-formed. If a universal character name does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive) or corresponds to a surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed. |
(since C++20) (until C++23) |
If a universal character name corresponding to a scalar value of a character in the basic character set or a control character appear outside a character or string literal, the program is ill-formed. If a universal character name does not correspond to a scalar value of a character in the translation character set, the program is ill-formed. |
(since C++23) |
Notes
\0 is the most commonly used octal escape sequence, because it represents the terminating null character in null-terminated strings.
The new-line character \n has special meaning when used in text mode I/O: it is converted to the OS-specific newline representation, usually a byte or byte sequence. Some systems mark their lines with length fields instead.
Octal escape sequences have a limit of three octal digits, but terminate at the first character that is not a valid octal digit if encountered sooner.
Hexadecimal escape sequences have no length limit and terminate at the first character that is not a valid hexadecimal digit. If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char8_t (since C++20), char16_t, char32_t (since C++11), or wchar_t), the result is unspecified.
A universal character name in a narrow string literal or a 16-bit string literal may map to more than one code unit, e.g. \U0001f34c is 4 char code units in UTF-8 (\xF0\x9F\x8D\x8C) and 2 char16_t code units in UTF-16 (\xD83C\xDF4C). |
(since C++11) |
The question mark escape sequence \? is used to prevent trigraphs from being interpreted inside string literals: a string such as "??/" is compiled as "\", but if the second question mark is escaped, as in "?\?/", it becomes "??/". As trigraphs have been removed from C++, the question mark escape sequence is no longer necessary. It is preserved for compatibility with C++14 (and former revisions) and C. (since C++17)
Example
#include <iostream> int main() { std::cout << "This\nis\na\ntest\n\nShe said, \"Sells she seashells on the seashore?\"\n"; }
Output:
This is a test She said, "Sells she seashells on the seashore?"
Defect Reports
The following behavior-changing defect reports were applied retroactively to previously published C++ standards.
DR | Applied to | Behavior as published | Correct behavior |
---|---|---|---|
CWG 505 | C++98 | the behavior was undefined if the character following a backslash was not one of those specified in the table |
made conditionally supported (semantic is implementation-defined) |