@maxim2511
I think mixing Win32 API functions and C/C++ Runtime functions like this can give "unexpected" results 😨
That's because the Win32 API function SetConsoleOutputCP(CP_UTF8) only tells the terminal to interpret the bytes it receives as UTF-8, but there is absolutely no guarantee that C/C++ facilities like std::cout, std::wcout, puts(), _putws(), printf() or wprintf() will actually write UTF-8 data to stdout. Also, do not rely on any of them to pass the given string through to stdout in a "1:1" fashion!
Spoiler: They don't always do so 😕
So, here is the proper pure Win32 API (no CRT) solution:
    #include <Windows.h>

    char *utf16_to_utf8(const wchar_t *input); // helper, defined below

    int main()
    {
        const wchar_t *const text = L"\x0393\x03b9\x03ac\x03bd\x03b7\x03c2 \x0392\x03b1\x03c1\x03bf\x03c5\x03c6\x03ac\x03ba\x03b7\x03c2"; // Γιάνης Βαρουφάκης
        SetConsoleOutputCP(CP_UTF8); // terminal will interpret our bytes as UTF-8
        char *utf8_text = utf16_to_utf8(text);
        if (utf8_text)
        {
            DWORD written;
            WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), utf8_text, lstrlenA(utf8_text), &written, NULL);
            LocalFree(utf8_text);
        }
    }
We use a little helper function to convert from UTF-16 (wchar_t) to UTF-8:
    char* utf16_to_utf8(const wchar_t *const input)
    {
        // First call: query the required buffer size in bytes (incl. the terminating NUL)
        int buff_size = WideCharToMultiByte(CP_UTF8, 0, input, -1, NULL, 0, NULL, NULL);
        if (buff_size > 0)
        {
            char *buffer = (char*)LocalAlloc(LPTR, buff_size);
            if (buffer)
            {
                // Second call: perform the actual conversion
                int result = WideCharToMultiByte(CP_UTF8, 0, input, -1, buffer, buff_size, NULL, NULL);
                if ((result > 0) && (result <= buff_size))
                {
                    return buffer; // caller must release this with LocalFree()
                }
                LocalFree(buffer);
            }
        }
        return NULL;
    }
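To make it a bit more concrete what kind of bytes such a conversion produces, here is a rough portable sketch of a UTF-16 to UTF-8 encoder. This is not how WideCharToMultiByte() is implemented. The name utf16_to_utf8_portable is made up for illustration, and replacing unpaired surrogates with U+FFFD is simply my assumption for this sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Win32 or the CRT):
// encodes a sequence of UTF-16 code units as UTF-8 bytes.
std::string utf16_to_utf8_portable(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // High surrogate followed by low surrogate: combine into one code point
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // assumption: unpaired surrogate becomes U+FFFD
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp); // 1 byte (ASCII)
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6)); // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12)); // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18)); // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the first letter of the test string, Γ (U+0393), comes out as the two bytes 0xCE 0x93, and those bytes are what the terminal has to interpret as UTF-8.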
You can also write UTF-16 to the terminal directly via WriteConsoleW(), but stdout redirection to a file will not work this way:
    #include <Windows.h>

    int main()
    {
        const wchar_t *const text = L"\x0393\x03b9\x03ac\x03bd\x03b7\x03c2 \x0392\x03b1\x03c1\x03bf\x03c5\x03c6\x03ac\x03ba\x03b7\x03c2"; // Γιάνης Βαρουφάκης
        DWORD written;
        WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), text, lstrlenW(text), &written, NULL);
    }
In fact, I would recommend something like:
    BOOL write_text(HANDLE handle, const wchar_t *const text)
    {
        BOOL result = FALSE;
        DWORD written;
        // Is our handle connected to a terminal?
        // Note: FILE_TYPE_CHAR also matches other character devices (e.g. NUL);
        // WriteConsoleW() simply fails for those and we return FALSE.
        if (GetFileType(handle) == FILE_TYPE_CHAR)
        {
            result = WriteConsoleW(handle, text, lstrlenW(text), &written, NULL);
        }
        else
        {
            // Redirected to a file or pipe: write UTF-8 bytes instead
            char *const utf8_text = utf16_to_utf8(text);
            if (utf8_text)
            {
                result = WriteFile(handle, utf8_text, lstrlenA(utf8_text), &written, NULL);
                LocalFree(utf8_text);
            }
        }
        return result;
    }
Alternatively, here is the pure C/C++ solution (using some MSVC-specific extensions):
    #include <cstdio>
    #include <iostream>
    #include <io.h>
    #include <fcntl.h>

    int main()
    {
        const wchar_t *const text = L"\x0393\x03b9\x03ac\x03bd\x03b7\x03c2 \x0392\x03b1\x03c1\x03bf\x03c5\x03c6\x03ac\x03ba\x03b7\x03c2"; // Γιάνης Βαρουφάκης
        _setmode(_fileno(stdout), _O_U8TEXT); // <-- note: you could use _O_U16TEXT instead here
        _putws(text);
        std::wcout << text << std::endl;
    }
Yes, functions like std::wcout or _putws() implicitly convert from UTF-16 to UTF-8 if the file mode was set accordingly!
Note: I have escaped all non-ASCII characters in the string literal, because otherwise there is yet another pitfall – the character encoding in which the C/C++ source code file is stored, vs. the character encoding that the C/C++ compiler assumes 🙄
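If you want the compiler to catch that kind of mismatch for you, one option is to pin down the expected code units at compile time. A minimal sketch, with the names kName and kLength being my own invention:

```cpp
#include <cstddef>

// Escaped literal: the source file contains ASCII only, so the result does
// not depend on which source encoding the compiler assumes.
constexpr wchar_t kName[] = L"\x0393\x03b9\x03ac\x03bd\x03b7\x03c2"; // Γιάνης

// Number of wchar_t code units, not counting the terminating NUL
constexpr std::size_t kLength = sizeof(kName) / sizeof(kName[0]) - 1;

static_assert(kLength == 6, "unexpected number of code units");
static_assert(kName[0] == 0x0393, "first unit should be U+0393 (capital gamma)");
static_assert(kName[5] == 0x03c2, "last unit should be U+03C2 (final sigma)");
```

The same checks applied to a non-escaped literal would break the build whenever the compiler decodes the source file with the wrong encoding, because the number of code units or their values would no longer match.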
(As others have pointed out, all of this can still be screwed up, if the selected console font doesn't support the required characters!)