Character Encoding Conversion Techniques in C++

// Convert UTF-8 to GB2312
char* ConvertUTF8ToGB(const char* utfInput) {
    int bufferSize = MultiByteToWideChar(CP_UTF8, 0, utfInput, -1, NULL, 0);
    wchar_t* wideBuffer = new wchar_t[bufferSize+1];
    memset(wideBuffer, 0, (bufferSize+1)*sizeof(wchar_t));
    MultiByteToWideChar(CP_UTF8, 0, utfInput, -1, wideBuffer, bufferSize);
    ...
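For comparison, here is a rough Java analogue of the same two-stage idea: decode the UTF-8 input into an intermediate UTF-16 form (the role the wchar_t buffer plays above), then encode that into GB2312. This is a sketch, not the post's method. The excerpt is cut off before its second stage, so how the original finishes is not shown here. The class name Utf8ToGb2312 and method convert are hypothetical, and the sketch assumes the Java runtime ships the GB2312 charset (standard JDK builds do, but trimmed runtimes may not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8ToGb2312 {
    // Hypothetical helper: UTF-8 bytes in, GB2312 bytes out.
    static byte[] convert(byte[] utf8Input) {
        // Stage 1: decode UTF-8 into a String, which Java exposes as UTF-16
        // code units. Malformed UTF-8 sequences are replaced with U+FFFD.
        String text = new String(utf8Input, StandardCharsets.UTF_8);
        // Stage 2: encode into GB2312. Characters GB2312 cannot represent are
        // replaced with the encoder's default replacement (typically '?').
        // Throws UnsupportedCharsetException if the runtime lacks GB2312.
        return text.getBytes(Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        // Two CJK ideographs, written as escapes so the source stays ASCII.
        byte[] utf8 = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);
        byte[] gb = convert(utf8);
        for (byte x : gb) {
            System.out.printf("%02X ", x & 0xFF);
        }
        System.out.println(); // D6 D0 CE C4
    }
}
```

Unlike the Windows version, there is no manual buffer sizing or delete[] to remember: both stages allocate their own arrays.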

Posted on Mon, 22 Jun 2026 17:59:29 +0000 by Reformed

Decoding CJK String Length and Display Width in Java

Java represents strings as sequences of UTF-16 code units, and String.length() returns the number of those 16-bit units, not the number of characters. Most common CJK characters occupy a single code unit (rarer ideographs outside the Basic Multilingual Plane take two), yet their visual representation in monospaced terminals or grid interfaces typically spans two horizontal cells. Consequently, one Chinese character functions as the equival ...
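A minimal sketch of the width calculation described above: iterate over code points rather than char units, and give each wide character two columns. The class and method names are hypothetical, and the isWide ranges are a hand-picked approximation of the Unicode East Asian Width property (UAX #11), not the full table. Combining marks and zero-width characters are not handled.

```java
class DisplayWidth {
    // Approximation (assumption): code points in these ranges are treated as
    // two columns wide; everything else counts as one column.
    static boolean isWide(int cp) {
        return (cp >= 0x1100 && cp <= 0x115F)    // Hangul Jamo initial consonants
            || (cp >= 0x2E80 && cp <= 0xA4CF)    // CJK radicals through Yi syllables
            || (cp >= 0xAC00 && cp <= 0xD7A3)    // Hangul syllables
            || (cp >= 0xF900 && cp <= 0xFAFF)    // CJK compatibility ideographs
            || (cp >= 0xFF00 && cp <= 0xFF60)    // full-width ASCII variants
            || (cp >= 0xFFE0 && cp <= 0xFFE6)    // full-width signs
            || (cp >= 0x20000 && cp <= 0x3FFFD); // CJK extension planes
    }

    // Sums column widths per code point, so a surrogate pair counts once.
    static int displayWidth(String s) {
        return s.codePoints().map(cp -> isWide(cp) ? 2 : 1).sum();
    }

    public static void main(String[] args) {
        String s = "ab\u4E2D\u6587";         // "ab" followed by two CJK ideographs
        System.out.println(s.length());      // 4 UTF-16 code units
        System.out.println(displayWidth(s)); // 6 terminal columns
    }
}
```

Counting via codePoints() matters for ideographs outside the Basic Multilingual Plane: such a character has length() 2 but should still occupy only two columns, not four.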

Posted on Fri, 05 Jun 2026 18:01:01 +0000 by asolell

Java Basics: Characters and Strings

The char data type is used to represent a single character. Character literals are enclosed in single quotes.
```
char a = 'A';
char b = '4';
char c = '\u0041'; // Unicode escape for A (exactly four hex digits)
```
String literals must be enclosed in double quotes, while character literals are single characters enclosed in single quotes. Thus, "A" is a string, and 'A' is ...
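A short sketch of the distinctions drawn above, using only the core language. A char is a 16-bit numeric code unit, so the character '4' is not the number 4, and "A" (a String) and 'A' (a char) are different types that are compared through charAt or String.valueOf. The class and helper names are illustrative.

```java
class CharLiterals {
    static final char A_LITERAL = 'A';
    static final char A_ESCAPE = '\u0041'; // same code unit as 'A'

    // A digit character's code minus the code of '0' gives its numeric value.
    static int digitValue(char digit) {
        return digit - '0'; // char arithmetic promotes both operands to int
    }

    // Casting back to char turns the int result into a character again.
    static char next(char ch) {
        return (char) (ch + 1);
    }

    public static void main(String[] args) {
        char b = '4';
        System.out.println(A_LITERAL == A_ESCAPE);   // true
        System.out.println((int) b);                 // 52, the code of the digit character
        System.out.println(digitValue(b));           // 4
        System.out.println(next(A_LITERAL));         // B

        String s = "A";                              // a String holding one char
        System.out.println(s.charAt(0) == A_LITERAL);            // true
        System.out.println(s.equals(String.valueOf(A_LITERAL))); // true
    }
}
```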

Posted on Fri, 15 May 2026 15:18:46 +0000 by davey_b_