Can a UTF-16 file be BOM-less?

A UTF-8 encoded file may or may not have a BOM in it. Does the same apply to UTF-16?

Can there be a UTF-16 file without a BOM?
I mean, is it a usual scenario, like with UTF-8?

I have not seen such a file so far, which is what made me ask.
I'm no expert, but from the wiki on it:

For the IANA registered charsets UTF-16BE and UTF-16LE, a byte order mark should not be used because the names of these character sets already determine the byte order. If encountered anywhere in such a text stream, U+FEFF is to be interpreted as a "zero width no-break space".

and a bit later:
"The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian."
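For what it's worth, the rule quoted above could be sketched in Python roughly like this. The function names `detect_utf16_bom` and `decode_utf16` and the `default` parameter are made up for illustration; only the `codecs.BOM_UTF16_LE` and `codecs.BOM_UTF16_BE` constants come from the standard library. It assumes the data is already known to be some form of UTF-16.

```python
import codecs


def detect_utf16_bom(data: bytes, default: str = "utf-16-be"):
    """Return (encoding, bom_length) for bytes assumed to be UTF-16.

    Hypothetical helper: without a BOM it falls back to `default`,
    which is big-endian here, following the quoted rule.
    """
    # Caveat: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE,
    # so this check is only safe if UTF-32 has been ruled out.
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le", 2
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be", 2
    return default, 0


def decode_utf16(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UTF-16 bytes, dropping a leading BOM if there is one."""
    encoding, bom_length = detect_utf16_bom(data, default)
    return data[bom_length:].decode(encoding)
```

Passing `default="utf-16-le"` would switch the no-BOM fallback to little-endian for platforms where that is the usual convention.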
Thanks, I see. So it is possible for a UTF-16 file to be without a BOM.

Also interesting from the wiki:
>Files local to a computer for which the native byte ordering is little-endian, for example, might be argued to be encoded as UTF-16LE implicitly. Therefore, the presumption of big-endian is widely ignored.

This confirms my observation on Windows, which uses UTF-16LE (the Win API functions use the same), so on Windows it's reasonable in practice to assume a file is UTF-16LE if no BOM is present.

And it seems an alternative way to double-check, if wanted, is to scan for ASCII characters:

>If there is no BOM, it is possible to guess whether the text is UTF-16 and its byte order by searching for ASCII characters (i.e. a 0 byte adjacent to a byte in the 0x20-0x7E range, also 0x0A and 0x0D for CR and LF)
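A rough Python sketch of that guess, just to show the idea. The name `guess_utf16_byte_order` is made up, and the "whichever side has more hits wins" rule is my own simplification, not something the wiki prescribes. It cannot decide anything for text with no ASCII characters in it.

```python
def guess_utf16_byte_order(data: bytes):
    """Guess 'utf-16-le' or 'utf-16-be' from ASCII patterns, else None.

    Hypothetical heuristic: counts 2-byte units that look like an
    ASCII character padded with a zero byte on one side or the other.
    """
    def is_ascii_text(byte: int) -> bool:
        # Printable ASCII range, plus LF (0x0A) and CR (0x0D).
        return 0x20 <= byte <= 0x7E or byte in (0x0A, 0x0D)

    le_hits = 0
    be_hits = 0
    # Walk the data one UTF-16 code unit (two bytes) at a time.
    for i in range(0, len(data) - 1, 2):
        first, second = data[i], data[i + 1]
        if second == 0 and is_ascii_text(first):
            le_hits += 1  # ASCII byte first, zero second: little-endian
        elif first == 0 and is_ascii_text(second):
            be_hits += 1  # zero first, ASCII byte second: big-endian

    if le_hits > be_hits:
        return "utf-16-le"
    if be_hits > le_hits:
        return "utf-16-be"
    return None  # no evidence either way
```

A result of `None` means the scan found no evidence (or equal evidence) for either byte order, so the caller would still need some fallback, such as the platform convention.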