ninja01 wrote: "This UTF-8 is really a pain, I still cannot figure it out."
I think it's quite nice actually.
On Linux the following code just works* without me having to do anything special.
* Assuming the font used supports these characters.
UTF-8 has become the most widely used encoding in recent years. Even this cplusplus.com HTML page is encoded in UTF-8.
On Windows I can imagine it being a bit painful, at least for "console applications". I have had no problems porting games that used SDL from Linux to Windows but that was because I read all text from UTF-8 encoded text files and used SDL functions with UTF-8 support to draw the text.
Note that I'm not necessarily talking about using char8_t/std::u8string. Those might become useful in the future but for the moment I'm ignoring their existence.
ninja01 wrote: "I thought we cannot know what char's size is"
You can use CHAR_BIT.
#include <iostream>
#include <climits>

int main()
{
    std::cout << "A char is " << CHAR_BIT << " bits.\n";
}
char is 8 bits on all mainstream computers. You'll need to make an effort to find a computer architecture where that is not the case. If you find something it will not be your ordinary mobile/laptop/desktop computer. It will either be something very old or something very specialized.
Note that char is the smallest size an object can have: sizeof(char) is 1 by definition, and the size of every other object is a multiple of it. If char were larger than 8 bits, there would be no 8-bit integer type available. This would make the platform incompatible with a lot of existing things, or at least less efficient at handling 8-bit formats, so it would probably be hard to advertise such a product.
This might seem strange if you just think of char as a type to store characters, but despite its name it's more than that. Historically it has essentially been the "byte" type in C and C++. There are even special rules that allow you to use char pointers to inspect the underlying data of other data types, something that is normally not allowed, and this obviously has very little to do with characters/text.
So I don't think you should feel bad if you assume char is 8 bits when writing a program. A lot of software does that in one way or another.
If you really feel like making it explicit you could use static_assert to test and document your assumption.
#include <climits>
static_assert(CHAR_BIT == 8);
This will give you a compilation error if char is not 8 bits.
Note that everyone who uses std::int8_t or std::uint8_t implicitly makes this assumption, since those types are only provided on implementations where char is 8 bits.