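How do I convert UTF-16 to UTF-32 and print the resulting wchar_t in C?

I am trying to print a string of UTF-16 characters. I posted this question a while ago, and the suggestion given was to use iconv to convert to UTF-32 and print the result as a wchar_t string.

I did some research and managed to write the following code: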

    // *c is the pointer to the characters (UTF-16) i'm trying to print
    // sz is the size in bytes of the input i'm trying to print

    iconv_t icv;
    char in_buf[sz];
    char* in;
    size_t in_sz;
    char out_buf[sz * 2];
    char* out;
    size_t out_sz;

    icv = iconv_open("UTF-32", "UTF-16");

    memcpy(in_buf, c, sz);

    in = in_buf;
    in_sz = sz;
    out = out_buf;
    out_sz = sz * 2;

    size_t ret = iconv(icv, &in, &in_sz, &out, &out_sz);
    printf("ret = %d\n", ret);
    printf("*** %ls ***\n", ((wchar_t*) out_buf));

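The iconv call always returns 0, so I assume the conversion itself is fine?

However, the printing seems to be hit or miss. Sometimes the converted wchar_t string prints fine. Other times, printf seems to run into a problem while printing the wchar_t string and aborts the call entirely, so that not even the trailing "***" is printed.

I also tried using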

    wprintf(((wchar_t*) "*** %ls ***\n"), out_buf));

but nothing was printed at all.

Am I missing something?

Reference: How to print UTF-16 characters in C?

UPDATE

I have incorporated some of the suggestions from the comments.

Updated code:

    // *c is the pointer to the characters (UTF-16) i'm trying to print
    // sz is the size in bytes of the input i'm trying to print

    iconv_t icv;
    char in_buf[sz];
    char* in;
    size_t in_sz;
    wchar_t out_buf[sz / 2];
    char* out;
    size_t out_sz;

    icv = iconv_open("UTF-32", "UTF-16");

    memcpy(in_buf, c, sz);

    in = in_buf;
    in_sz = sz;
    out = (char*) out_buf;
    out_sz = sz * 2;

    size_t ret = iconv(icv, &in, &in_sz, &out, &out_sz);
    printf("ret = %d\n", ret);
    printf("*** %ls ***\n", out_buf);
    wprintf(L"*** %ls ***\n", out_buf);

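Still the same result: not all of the UTF-16 strings get printed, with either printf or wprintf.

What else could I be missing?

By the way, I am on Linux and have verified that wchar_t is 4 bytes.

Here is a short program that converts UTF-16 to a wide character array and then prints it.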

    #include <endian.h>
    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <wchar.h>

    #define FROMCODE "UTF-16"

    /* Pick the UTF-32 variant that matches the host byte order, so the
       output can be read directly as wchar_t (4 bytes on Linux). */
    #if (BYTE_ORDER == LITTLE_ENDIAN)
    #define TOCODE "UTF-32LE"
    #elif (BYTE_ORDER == BIG_ENDIAN)
    #define TOCODE "UTF-32BE"
    #else
    #error Unsupported byte order
    #endif

    int main(void)
    {
        void *tmp;
        char *outbuf;
        const char *inbuf;
        size_t converted = 0;
        wchar_t *out = NULL;
        int status = EXIT_SUCCESS;
        size_t n, inbytesleft, outbytesleft, size;
        /* "Hello, World!" in UTF-16, preceded by a little-endian BOM */
        const char in[] = {
            (char)0xff, (char)0xfe,
            'H', 0x0, 'e', 0x0, 'l', 0x0, 'l', 0x0, 'o', 0x0, ',', 0x0,
            ' ', 0x0, 'W', 0x0, 'o', 0x0, 'r', 0x0, 'l', 0x0, 'd', 0x0,
            '!', 0x0
        };
        iconv_t cd = iconv_open(TOCODE, FROMCODE);

        if ((iconv_t)-1 == cd) {
            if (EINVAL == errno) {
                fprintf(stderr, "iconv: cannot convert from %s to %s\n",
                        FROMCODE, TOCODE);
            } else {
                fprintf(stderr, "iconv: %s\n", strerror(errno));
            }
            goto error;
        }

        size = sizeof(in) * sizeof(wchar_t);
        inbuf = in;
        inbytesleft = sizeof(in);

        while (1) {
            /* always keep room for one extra wchar_t: the terminator */
            tmp = realloc(out, size + sizeof(wchar_t));
            if (!tmp) {
                fprintf(stderr, "realloc: %s\n", strerror(errno));
                goto error;
            }
            out = tmp;
            outbuf = (char *)out + converted;
            outbytesleft = size - converted;

            n = iconv(cd, (char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft);
            if ((size_t)-1 == n) {
                if (EINVAL == errno) {
                    /* junk at the end of the buffer, ignore it */
                    break;
                } else if (E2BIG != errno) {
                    /* unrecoverable error */
                    fprintf(stderr, "iconv: %s\n", strerror(errno));
                    goto error;
                }
                /* increase the size of the output buffer */
                converted = size - outbytesleft;
                size <<= 1;
            } else {
                /* done */
                break;
            }
        }

        /* from here on, converted counts wide characters, not bytes */
        converted = (size - outbytesleft) / sizeof(wchar_t);
        out[converted] = L'\0';
        fprintf(stdout, "%ls\n", out);

        /* flush the iconv buffer */
        iconv(cd, NULL, NULL, &outbuf, &outbytesleft);

    exit:
        if (out) {
            free(out);
        }
        /* only close a descriptor that was opened successfully */
        if ((iconv_t)-1 != cd) {
            iconv_close(cd);
        }
        exit(status);

    error:
        status = EXIT_FAILURE;
        goto exit;
    }

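Because UTF-16 is a variable-length encoding, you are only guessing at how big your output buffer needs to be. A correct program should handle the case where the output buffer is not large enough to hold the converted data.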
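To make the variable-length point concrete, here is a minimal sketch of decoding UTF-16 by hand. The function name `utf16_to_utf32` is hypothetical, and the sketch assumes the input is already in host byte order with no byte order mark; in a real program iconv does this work for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of iconv): decodes n UTF-16 code units,
 * assumed to be in host byte order without a BOM, into out. out must have
 * room for n code points, which is the worst case.
 * Returns the number of code points written, or (size_t)-1 if the input
 * contains an unpaired surrogate. */
size_t utf16_to_utf32(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t written = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        uint32_t u = in[i];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* high surrogate: a low surrogate must follow */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;
            uint32_t lo = in[++i];
            /* two 16-bit units collapse into one code point */
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* low surrogate with no preceding high surrogate */
            return (size_t)-1;
        }
        out[written++] = u;
    }
    return written;
}
```

Four input units can yield three code points (when two of them form a surrogate pair) or four (when none do), so the exact output length is only known after decoding. That is why the program above grows its buffer on E2BIG instead of trusting a fixed estimate.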

You should also note that iconv does not NUL-terminate your output buffer for you.
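A minimal sketch of adding the terminator yourself might look like the following. The helper names `utf32_host_name` and `utf16le_to_wcs` are hypothetical, and the sketch assumes a 4-byte wchar_t (as on Linux), little-endian UTF-16 input without a BOM, and an iconv implementation that knows the names "UTF-16LE", "UTF-32LE" and "UTF-32BE".

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

_Static_assert(sizeof(wchar_t) == 4, "sketch assumes a 4-byte wchar_t");

/* Returns the iconv name of UTF-32 in host byte order. */
static const char *utf32_host_name(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe ? "UTF-32LE" : "UTF-32BE";
}

/* Hypothetical helper: converts in_bytes bytes of UTF-16LE into out and
 * appends the terminator that iconv itself never writes. out_cap is the
 * capacity of out in wchar_t elements, terminator included.
 * Returns the number of wide characters stored (without the terminator),
 * or (size_t)-1 on any failure. */
size_t utf16le_to_wcs(const char *in, size_t in_bytes,
                      wchar_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL && out_cap > 0);

    iconv_t cd = iconv_open(utf32_host_name(), "UTF-16LE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;   /* iconv wants char **, it only reads */
    char *dst = (char *)out;
    size_t src_left = in_bytes;
    size_t dst_left = (out_cap - 1) * sizeof *out;   /* reserve one slot */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* dst now points just past the last byte iconv wrote */
    size_t count = (size_t)(dst - (char *)out) / sizeof *out;
    out[count] = L'\0';
    return count;
}
```

Reserving the last slot before calling iconv means the terminator always fits, even when the converted text fills the rest of the buffer exactly.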

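iconv is a stream-oriented processor, so you need to flush the iconv_t if you want to reuse it for another conversion (the sample code does this near the end). If you want to do stream processing, you would handle the EINVAL error by copying whatever bytes are left in the input buffer to the beginning of a new input buffer and then calling iconv again.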
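The EINVAL handling described above could be sketched roughly as follows. Everything named `utf16_stream...` is hypothetical, as are the `CHUNK_MAX` and `PENDING_MAX` limits; the sketch assumes UTF-16LE input without a BOM, and it simply reports E2BIG and EILSEQ to the caller instead of recovering from them.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK_MAX = 256, PENDING_MAX = 4 };

/* Hypothetical streaming state: an open descriptor plus the incomplete
 * tail of the previous chunk (at most 3 bytes for UTF-16). */
typedef struct {
    iconv_t cd;
    char pending[PENDING_MAX];
    size_t pending_len;
} utf16_stream;

/* Returns the iconv name of UTF-32 in host byte order. */
static const char *utf32_host_name(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe ? "UTF-32LE" : "UTF-32BE";
}

/* Opens the stream. Returns 0 on success, -1 on failure. */
int utf16_stream_open(utf16_stream *s)
{
    s->pending_len = 0;
    s->cd = iconv_open(utf32_host_name(), "UTF-16LE");
    return s->cd == (iconv_t)-1 ? -1 : 0;
}

/* Converts one chunk, appending UTF-32 output at *out. An incomplete
 * sequence at the end of the chunk is saved for the next call.
 * Returns 0 on success, -1 on any error other than a truncated tail. */
int utf16_stream_feed(utf16_stream *s, const char *chunk, size_t len,
                      char **out, size_t *out_left)
{
    char work[PENDING_MAX + CHUNK_MAX];

    assert(len <= CHUNK_MAX);

    /* leftover bytes from the previous call go first, then the new data */
    memcpy(work, s->pending, s->pending_len);
    memcpy(work + s->pending_len, chunk, len);

    char *src = work;
    size_t src_left = s->pending_len + len;
    s->pending_len = 0;

    if (iconv(s->cd, &src, &src_left, out, out_left) == (size_t)-1) {
        if (errno != EINVAL || src_left > PENDING_MAX)
            return -1;
        /* EINVAL: the input ended mid-character, keep the tail */
        memcpy(s->pending, src, src_left);
        s->pending_len = src_left;
    }
    return 0;
}
```

A caller would open the stream once, call `utf16_stream_feed` for each chunk it reads, and treat a non-zero `pending_len` at end of input as truncated data. The descriptor in `s.cd` still needs an `iconv_close` when the stream is finished.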