mbrtowc() — Convert a Multibyte Character to a Wide Character (Restartable)

Format

#include <wchar.h>
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);

Language Level: ANSI

Threadsafe: Yes, if ps is not NULL

Locale Sensitive: The behavior of this function might be affected by the LC_CTYPE category of the current locale. This function might also be affected by the LC_UNI_CTYPE category of the current locale if LOCALETYPE(*LOCALEUCS2) or LOCALETYPE(*LOCALEUTF) is specified on the compilation command. This function is not available when LOCALETYPE(*CLD) is specified on the compilation command. For more information, see Understanding CCSIDs and Locales.

Wide Character Function: See Wide Characters for more information.

Description

This function is the restartable version of the mbtowc() function.

If s is a null pointer, the mbrtowc() function determines the number of bytes necessary to enter the initial shift state (zero if encodings are not state-dependent or if the initial conversion state is described). In this situation, the value of the pwc parameter will be ignored and the resulting shift state described will be the initial conversion state.
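The null-pointer form can be wrapped in a small helper when a caller wants to abandon a partially converted sequence and start over. The following is a minimal sketch, not part of the library: the name reset_conversion_state is hypothetical, and the comment about a zero return assumes an encoding that is not state-dependent.

```c
#include <wchar.h>

/* Hypothetical helper (not a library function): returns the conversion
   state in *ps to the initial state and passes back whatever mbrtowc()
   returned for the null-pointer call. */
size_t reset_conversion_state(mbstate_t *ps)
{
    /* With s == NULL the pwc argument is ignored.  For an encoding that
       is not state-dependent the expected return value is 0. */
    return mbrtowc(NULL, NULL, 0, ps);
}
```

After the call, mbsinit(ps) should report that the object describes the initial conversion state.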

If s is not a null pointer, the mbrtowc() function determines the number of bytes that are in the multibyte character (and any leading shift sequences) pointed to by s, produces the value of the corresponding wide character and, if pwc is not a null pointer, stores that value in the object pointed to by pwc. If the corresponding wide character is the null wide character, the resulting state will be reset to the initial conversion state.

This function differs from its corresponding internal-state multibyte character function in that it has an extra parameter, ps, of type pointer to mbstate_t. This parameter points to an object that can completely describe the current conversion state of the associated multibyte character sequence. If ps is NULL, this function uses an internal static variable for the state.
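Because the caller owns the mbstate_t object, the same state can be carried from one call to the next while a whole string is converted. The following is a minimal sketch under stated assumptions: mbs_to_wide is a hypothetical helper, not a library function, and it treats an incomplete trailing character the same as an encoding error.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the null-terminated multibyte string src
   into dst, which has room for dstlen wide characters including the
   terminator.  Returns the number of wide characters stored, not counting
   the terminator, or (size_t)-1 on an encoding error, an incomplete
   trailing character, or insufficient space in dst. */
size_t mbs_to_wide(const char *src, wchar_t *dst, size_t dstlen)
{
    mbstate_t state;
    size_t remaining = strlen(src) + 1;   /* include the terminating null */
    size_t count = 0;

    memset(&state, 0, sizeof state);      /* initial conversion state */

    while (count < dstlen) {
        size_t used = mbrtowc(&dst[count], src, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete input */
        if (used == 0)
            return count;                 /* null wide character stored */

        src += used;                      /* skip the bytes just consumed */
        remaining -= used;
        count++;
    }
    return (size_t)-1;                    /* dst too small */
}
```

A caller might declare wchar_t out[8] and call mbs_to_wide("abc", out, 8). Passing a local state object rather than NULL also avoids the internal static state, which is what makes the call threadsafe.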

At most, n bytes of the multibyte string are examined.

Return Value

If s is a null pointer, the mbrtowc() function returns the number of bytes necessary to enter the initial shift state. The value returned is less than the value of the MB_CUR_MAX macro.

If a conversion error occurs, errno might be set to ECONVERT.

If s is not a null pointer, the mbrtowc() function returns one of the following:

0
If the next n or fewer bytes form the multibyte character that corresponds to the null wide character.
positive
If the next n or fewer bytes form a valid multibyte character. The value returned is the number of bytes that constitute the multibyte character.
(size_t)-2
If the next n bytes form an incomplete (but potentially valid) multibyte character, and all n bytes have been processed. It is unspecified whether this can occur when the value of n is less than the value of the MB_CUR_MAX macro.
(size_t)-1
If an encoding error occurs (when the next n or fewer bytes do not form a complete and correct multibyte character). The value of the macro EILSEQ is stored in errno, but the conversion state is unchanged.
Note:
When a -2 value is returned, the string could contain redundant shift-out and shift-in characters or a partial UTF-8 character. To continue processing the multibyte string, increment the pointer by the value n, and call mbrtowc() again.
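The restart behavior described in the note can be sketched by feeding the input one byte at a time (n is 1), so that every multibyte character longer than one byte passes through the (size_t)-2 case before it completes. This is an illustrative sketch only: count_chars_bytewise is a hypothetical helper, and it counts a null wide character as one character.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: feeds len bytes to mbrtowc() one byte at a time and
   returns the number of complete characters seen, or (size_t)-1 on an
   encoding error.  A trailing incomplete character is not counted. */
size_t count_chars_bytewise(const char *bytes, size_t len)
{
    mbstate_t state;
    size_t count = 0;
    size_t i;

    memset(&state, 0, sizeof state);      /* initial conversion state */

    for (i = 0; i < len; i++) {
        wchar_t wc;
        size_t rc = mbrtowc(&wc, &bytes[i], 1, &state);

        if (rc == (size_t)-1)
            return (size_t)-1;            /* encoding error */
        if (rc == (size_t)-2)
            continue;                     /* incomplete: the state holds the
                                             partial character, so advance
                                             by n and call again */
        count++;                          /* rc is 0 or 1: one character
                                             was completed */
    }
    return count;
}
```

The same pattern applies when input arrives in larger chunks, for example from a file or socket read: keep the mbstate_t object alive between chunks and pass the next chunk when (size_t)-2 is returned.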

Example that uses mbrtowc()

/* This program is compiled with LOCALETYPE(*LOCALE) and   */
/* SYSIFCOPT(*IFSIO)                                       */

#include  <stdio.h>
#include  <stdlib.h>
#include  <locale.h>
#include  <wchar.h>
#include  <errno.h>
#include  <string.h>     /* memset() */

#define  LOCNAME     "/qsys.lib/JA_JP.locale"
#define  LOCNAME_EN  "/qsys.lib/EN_US.locale"

int main(void)
{
    int length, sl = 0;
    char  string[10];
    wchar_t buffer[10];
    mbstate_t ps = 0;
    memset(string, '\0', 10);
    string[0] = 0xC1;
    string[1] = 0x0E;
    string[2] = 0x41;
    string[3] = 0x71;
    string[4] = 0x41;
    string[5] = 0x72;
    string[6] = 0x0F;
    string[7] = 0xC2;
    /* In this first example we will convert                  */
    /* a multibyte character when the CCSID of locale         */
    /* associated with LC_CTYPE is 37.                        */
    /* For single byte cases the state will always            */
    /* remain in the initial state  0                         */

    if (setlocale(LC_ALL, LOCNAME_EN) == NULL)
        printf("setlocale failed.\n");

    length = mbrtowc(buffer, string, MB_CUR_MAX, &ps);

    /* In this case length is 1, and C1 is converted to 0x00C1 */

    printf("length = %d, state = %d\n\n", length, ps);
    printf("MB_CUR_MAX: %d\n\n", MB_CUR_MAX);

    /* Now let's try a multibyte example.  We first must set the */
    /* locale to a multibyte locale.  We choose a locale with    */
    /* CCSID 5026.                                               */

    if (setlocale(LC_ALL, LOCNAME) == NULL)
        printf("setlocale failed.\n");

    length = mbrtowc(buffer, string, MB_CUR_MAX, &ps);

    /* The first is single byte so length is 1 and             */
    /* the state is still the initial state 0.  C1 is converted*/
    /* to 0x00C1     */

    printf("length = %d, state = %d\n\n", length, ps);
    printf("MB_CUR_MAX: %d\n\n", MB_CUR_MAX);

    sl += length;

    length = mbrtowc(&buffer[1], &string[sl], MB_CUR_MAX, &ps);

    /* The next character is a mixed byte.  Length is 3 to     */
    /* account for the shiftout 0x0e.  State is                */
    /* changed to double byte state.  0x4171 is copied into    */
    /* the buffer   */

    printf("length = %d, state = %d\n\n", length, ps);

    sl += length;

    length = mbrtowc(&buffer[2], &string[sl], MB_CUR_MAX, &ps);

    /* The next character is also a double byte character.     */
    /* The state is changed to initial state since this was   */
    /* the last double byte character.  Length is 3 to         */
    /* account for the ending 0x0f shiftin.  0x4172 is copied  */
    /* into the buffer.  */

    printf("length = %d, state = %d\n\n", length, ps);

    sl += length;

    length = mbrtowc(&buffer[3], &string[sl], MB_CUR_MAX, &ps);

    /* The next character is single byte so length is 1 and    */
    /* state remains in initial state.  0xC2 is converted to   */
    /* 0x00C2.   The buffer now has the value:                 */
    /* 0x00C14171417200C2                                      */

    printf("length = %d, state = %d\n\n", length, ps);

}
/*  The output should look like this:

length = 1, state = 0

MB_CUR_MAX: 1

length = 1, state = 0

MB_CUR_MAX: 4

length = 3, state = 2

length = 3, state = 0

length = 1, state = 0
                                   */                 
