mblen
Defined in header <stdlib.h>
int mblen( const char* s, size_t n );
Determines the size, in bytes, of the multibyte character whose first byte is pointed to by s.
If s is a null pointer, resets the global conversion state and (until C23) determines whether shift sequences are used.
This function is equivalent to the call mbtowc((wchar_t*)0, s, n), except that the conversion state of mbtowc is unaffected.
Parameters
s | - | pointer to the multibyte character
n | - | limit on the number of bytes in s that can be examined
Return value
If s is not a null pointer, returns the number of bytes contained in the multibyte character, -1 if the first bytes pointed to by s do not form a valid multibyte character, or 0 if s points at the null character '\0'.
If s is a null pointer, resets the internal conversion state to represent the initial shift state and (until C23) returns 0 if the current multibyte encoding is not state-dependent (does not use shift sequences) or a non-zero value if the current multibyte encoding is state-dependent (uses shift sequences).
Notes
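Each call to mblen updates the internal global conversion state (a static object of type mbstate_t, known only to this function). If the multibyte encoding uses shift states, care must be taken to avoid backtracking or multiple scans. In any case, multiple threads should not call mblen without synchronization: mbrlen may be used instead. (until C23)
mblen is not allowed to have an internal state. (since C23)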
Example
#include <string.h>
#include <stdlib.h>
#include <locale.h>
#include <stdio.h>

// the number of characters in a multibyte string is the sum of mblen()'s
// note: the simpler approach is mbstowcs(NULL, str, sz)
size_t strlen_mb(const char* ptr)
{
    size_t result = 0;
    const char* end = ptr + strlen(ptr);
    mblen(NULL, 0); // reset the conversion state
    while(ptr < end)
    {
        int next = mblen(ptr, end - ptr);
        if(next == -1)
        {
            perror("strlen_mb");
            break;
        }
        ptr += next;
        ++result;
    }
    return result;
}

void dump_bytes(const char* str)
{
    const char* end = str + strlen(str);
    for (; str != end; ++str)
    {
        printf("%02X ", (unsigned char)str[0]);
    }
    printf("\n");
}

int main(void)
{
    setlocale(LC_ALL, "en_US.utf8");
    const char* str = "z\u00df\u6c34\U0001f34c";
    printf("The string \"%s\" consists of %zu characters, but %zu bytes: ",
           str, strlen_mb(str), strlen(str));
    dump_bytes(str);
}
Possible output:
The string "zß水🍌" consists of 4 characters, but 10 bytes: 7A C3 9F E6 B0 B4 F0 9F 8D 8C
References
- C17 standard (ISO/IEC 9899:2018):
- 7.22.7.1 The mblen function (p: 260)
- C11 standard (ISO/IEC 9899:2011):
- 7.22.7.1 The mblen function (p: 357)
- C99 standard (ISO/IEC 9899:1999):
- 7.20.7.1 The mblen function (p: 321)
- C89/C90 standard (ISO/IEC 9899:1990):
- 4.10.7.1 The mblen function
See also
mbtowc | converts the next multibyte character to wide character (function)
mbrlen (C95) | returns the number of bytes in the next multibyte character, given state (function)