Tag: utf-8

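Should I remove TCHAR from my Windows code?: I am modifying some very old (10-year-old) C code. The code compiles with GCC on Unix/Mac and is cross-compiled for Windows with MinGW. At the moment TCHAR strings are used throughout. I would like to get rid of TCHAR and use C++ strings instead. Is it still necessary to use the Windows wide-character functions, or can I now do everything with Unicode and UTF-8?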

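C: Most efficient way to determine how many bytes a UTF-16 string will need, given a UTF-8 string: I have seen some very clever code for converting between Unicode code points and UTF-8, so I wonder whether anyone has (or would enjoy devising) this. Given a UTF-8 string, how many bytes does the UTF-16 encoding of the same string need? Assume the UTF-8 string has already been validated. It has no BOM, no overlong sequences, no invalid sequences, and is null-terminated. It is not CESU-8. Full UTF-16 with surrogates must be supported. Specifically, I would like to know whether there is a shortcut for telling when a surrogate pair is needed without fully converting the UTF-8 sequence to a code point. The best UTF-8-to-code-point code I have seen uses vectorization techniques, so I wonder whether that is possible here too.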
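The shortcut asked about here follows from the shape of UTF-8 lead bytes: in validated UTF-8, every byte that is not a continuation byte (`10xxxxxx`) starts one code point, and only 4-byte sequences (lead byte `0xF0` or above) encode code points beyond U+FFFF, which are the ones that need a surrogate pair. A minimal scalar sketch, assuming validated, NUL-terminated input as the question states (the function name `utf16_bytes_for_utf8` is hypothetical, and no vectorization is attempted):

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of bytes the UTF-16 encoding of `utf8` would need,
 * excluding any BOM and terminator.
 * Assumption: `utf8` is already validated and NUL-terminated. */
size_t utf16_bytes_for_utf8(const char *utf8)
{
    assert(utf8 != NULL);
    const unsigned char *p = (const unsigned char *)utf8;
    size_t units = 0; /* count of 16-bit code units */

    for (; *p != 0; ++p) {
        /* Every non-continuation byte starts a code point: one unit. */
        units += (*p & 0xC0) != 0x80;
        /* A 4-byte lead byte means a supplementary-plane code point,
         * which needs a second unit (surrogate pair). */
        units += (*p >= 0xF0);
    }
    return units * 2;
}
```

Because each byte is classified independently, with no state carried between bytes, the loop should be a reasonable candidate for compiler auto-vectorization or a hand-written SIMD version, though that is not shown here.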

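Simplest way to convert a Unicode code point to UTF-8: What is the simplest way to convert a Unicode code point into a UTF-8 byte sequence in C? The only approach that comes to mind is using iconv to map from the UTF-32LE code page to UTF-8, but that seems like overkill.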
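For a single code point, the conversion can be done by hand with a few shifts and masks instead of iconv. The following is a minimal sketch rather than a definitive implementation; the function name `utf8_encode` and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes code point `cp` as UTF-8 into `out` (room for 4 bytes required).
 * Returns the number of bytes written, or 0 if `cp` is a surrogate
 * (U+D800..U+DFFF) or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, U+20AC (€) encodes to the three bytes `E2 82 AC`.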

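How to "decode" a UTF-8 character?: Let's assume I want to write a function that compares two Unicode characters. How should I do that? I have read some articles (like this one) but still don't get it. Let's take € as input. It is in the range 0x0800 to 0xFFFF, so it is encoded using 3 bytes. How do I decode it? With bitwise operations that take the 3 bytes from a wchar_t and store them into 3 chars? Example code in C would be great. Here is my C code for "decoding", but it obviously decodes the wrong Unicode value … #include #include void printbin(unsigned n); int length(wchar_t c); void print(struct Bytes *b); // support for UTF8 which encodes up to 4 bytes only struct Bytes { char v1; char v2; char v3; char v4; }; int main(void) { struct Bytes bytes = { […]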
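Decoding works in the opposite direction from what the question's snippet attempts: the lead byte tells how long the sequence is and carries the top bits, and each continuation byte contributes six more. A minimal sketch, assuming the input is an array of `unsigned char` holding UTF-8 (the name `utf8_decode` is hypothetical; overlong forms and surrogates are deliberately not rejected to keep it short):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the UTF-8 sequence starting at `s` into `*cp`.
 * Returns the number of bytes consumed, or 0 if the lead byte or a
 * continuation byte is malformed. Overlong forms and surrogates are
 * NOT rejected in this sketch. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    size_t len;
    uint32_t value;

    if (s[0] < 0x80)                { len = 1; value = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; value = s[0] & 0x07; }
    else return 0; /* stray continuation byte or invalid lead byte */

    for (size_t i = 1; i < len; ++i) {
        /* Each continuation byte must look like 10xxxxxx; a NUL
         * terminator fails this check, so a truncated sequence in a
         * NUL-terminated string is not read past its end. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

With this, the bytes `E2 82 AC` decode to U+20AC (€), and comparing two characters reduces to comparing the two decoded `uint32_t` values.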

How to use UTF-8 in C code?: My setup: gcc-4.9.2, UTF-8 environment. The following C program works with ASCII but not with UTF-8. Create the input file: echo -n 'привет мир' > /tmp/вход Here is test.c: #include #include #include #define SIZE 10 int main(void) { char buf[SIZE+1]; char *pat = "привет мир"; char str[SIZE+2]; FILE *f1; FILE *f2; f1 = fopen("/tmp/вход","r"); f2 = fopen("/tmp/выход","w"); if (fread(buf, 1, SIZE, f1) > 0) { buf[SIZE] = 0; if (strncmp(buf, pat, SIZE) == 0) […]

How to do SubString, limit in C?: No. 1 #include #include #include #include int main(int argc, char **argv) { static const unsigned char text[] = "000ßh123456789"; int32_t current=1; int32_t text_len = strlen(text)-1; ///////////////////////////////// printf("Result : %s\n",text); ///////////////////////////////// printf("Lenght : %d\n",text_len); ///////////////////////////////// printf("Index0 : %c\n",text[0]); printf("Index1 : %c\n",text[1]); printf("Index2 : %c\n",text[2]); printf("Index3 : %c\n",text[3]);//==> why show this ` `? printf("Index4 : %c\n",text[4]);//==> why show […]

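Is it (im)possible to use special characters in C source?: At some point I wanted to use the µ character in a function name in a C project. Is that impossible? The error I get looks like error: stray '\302' in program I tried adding the options: -fexec-charset=UTF-8 -finput-charset=UTF-8 to my build script, but I cannot tell what those enable. I am running this version of gcc: arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]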

Simple UTF8->UTF16 string conversion with iconv: I want to write a function that converts a UTF8 string to UTF16 (little-endian). The problem is that the iconv function does not seem to let you know in advance how many bytes are needed to store the output string. My solution is to allocate 2*strlen(utf8) first, then run iconv in a loop, growing that buffer with realloc when necessary: static int utf8_to_utf16le(char *utf8, char **utf16, int *utf16_len) { iconv_t cd; char *inbuf, *outbuf; size_t inbytesleft, outbytesleft, nchars, utf16_buf_len; cd = iconv_open("UTF16LE", "UTF8"); if (cd == (iconv_t)-1) { printf("!%s: iconv_open failed: %d\n", __func__, errno); return -1; } inbytesleft = strlen(utf8); if (inbytesleft == 0) { printf("!%s: empty string\n", __func__); […]

Unicode problem in C++, but not C: I am trying to write a Unicode string to the screen in C++ on Windows. I changed the console font to Lucida Console and set the output code page to CP_UTF8, i.e. 65001. I run the following code: #include //notice this header file.. #include #include int main() { SetConsoleOutputCP(CP_UTF8); const char text[] = "Россия"; printf("%s\n", text); } It prints out just fine! However, if I do this: #include //the C++ version of the header.. #include #include int main() { SetConsoleOutputCP(CP_UTF8); const char text[] = "Россия"; printf("%s\n", text); } it prints: I don't know why.. Another thing is when I do: #include #include int main() […]

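Unicode hello world in C?: I want to output things like 안, 蠀, ☃ from C #include int main() { fwprintf(stdout, L"안, 蠀, ☃\n"); return 0; } The output is ?, ?, ? How can I print these characters? Edit: #include #include int main() { setlocale(LC_CTYPE, ""); fwprintf(stdout, L"안, 蠀, ☃\n"); return 0; } That did it. The output is 안, 蠀, ☃. Except that the Chinese character and the snowman show up as boxes in my urxvt, probably because I do not have those locales enabled. $ locale -a C en_US en_US.iso88591 en_US.iso885915 en_US.utf8 ja_JP.utf8 ko_KR ko_KR.euckr ko_KR.utf8 korean korean.euc POSIX zh_CN.utf8 Which locale must I additionally enable to display the Chinese character and the snowman? Maybe I need a font? Will the above program work on Windows?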