Section 0) Introduction
This article covers how to be Unicode-friendly when programming with WinAPI.
I don't normally encourage programming in WinAPI, since you're typically better off with a cross-platform widget library such as wxWidgets or Qt. But a lot of people still like to use WinAPI directly, so I should at least point them in the right direction. Besides, a lot of the stuff here applies to wx as well (and possibly to Qt, though I've never used Qt so I can't say for sure).
I didn't put a lot of work into formatting or proofreading this article, so my apologies there. I still think it gets the idea across pretty well, even if my thoughts are unorganized.
Unicode forever! Spread the love!
Section 1) The UNICODE macro
The UNICODE macro (and/or the _UNICODE macro; usually both) is checked throughout the WinAPI headers. UNICODE is the one the Windows headers look at, while _UNICODE is the one the C runtime headers (such as tchar.h) look at. Depending on whether they are defined, some types and functions are redefined to use either char* strings (if not defined) or wchar_t* Unicode strings (if defined).
If you use Visual Studio, these macros are usually defined for you through the project settings when you set the project's character set to Unicode. Otherwise you can do it yourself by #defining them before you include Windows.h:
#ifndef UNICODE
#define UNICODE
#endif
#ifndef _UNICODE
#define _UNICODE
#endif
#include <Windows.h>
You don't need to #define either of them to use Unicode in your program. It just changes around some types to make it easier to use the Unicode parts of WinAPI.
Further in this article, "Unicode build" refers to UNICODE and _UNICODE being defined, whereas "ANSI build" refers to neither of them being defined.
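As a quick sanity check, something like the following can tell you at compile time which kind of build you ended up with. This is only a sketch: kUnicodeBuild and build_flavor are names made up for illustration, and only the UNICODE and _UNICODE macros themselves come from the headers.

```cpp
// Refuse to compile if only one of the two macros is defined, since that
// leaves the Windows headers and the C runtime headers disagreeing.
#if defined(UNICODE) != defined(_UNICODE)
#error "Define both UNICODE and _UNICODE, or neither"
#endif

// Hypothetical helper names, not part of WinAPI.
#ifdef UNICODE
constexpr bool kUnicodeBuild = true;
#else
constexpr bool kUnicodeBuild = false;
#endif

// Returns a human-readable name for the current build flavor.
inline const char* build_flavor() {
    return kUnicodeBuild ? "Unicode build" : "ANSI build";
}
```

Printing build_flavor() once at startup is an easy way to confirm that the project settings actually did what you expected.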
Section 2) LPSTR, LPCSTR, LPTSTR, LPCTSTR, LPWTFISALLTHIS
Anybody who's looked at WinAPI has probably seen the above types... but what exactly are they?
An inexperienced C/C++ coder might think they're strings, like std::string. It can certainly look that way from the documentation and examples. And since the WinAPI pages don't ever really seem to tell you exactly what they are, it's a logical conclusion.
However, this is not the case. All of the above are typedefs (type aliases) for plain pointer types.
Now you might look at "LPCTSTR" and see the "STR" in there, but the rest might look like random letter combinations that make no sense. Rest assured there's a method to the madness.
- The starting 'LP' stands for "Long Pointer". Without getting too much into what a Long Pointer is (or really what it used to be, it doesn't have as much meaning in modern computing), we'll just say that this is basically a pointer. This means that the LP is telling you that this type is not a string by itself, but is a POINTER to a string (or really, a C-style string).
- The 'C' means that the string is constant
- The 'W' means the string is wide (Unicode)
- The 'T' means the string is TCHARs (see section on TCHAR below)
So really, the definitions boil down to the following:
typedef char* LPSTR;
typedef const char* LPCSTR;
typedef wchar_t* LPWSTR;
typedef const wchar_t* LPCWSTR;
typedef TCHAR* LPTSTR;
typedef const TCHAR* LPCTSTR;
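If you want to convince yourself that these names are nothing more than pointer types, a few static_asserts will do it. This is a sketch under one assumption: on non-Windows compilers it falls back to stand-in typedefs (mirroring an ANSI build) so that it compiles anywhere. The is_empty helper is a made-up name for illustration.

```cpp
#include <type_traits>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-in typedefs so the sketch compiles off Windows. Assumption:
// these mirror what the Windows headers give you in an ANSI build.
typedef char TCHAR;
typedef char* LPSTR;
typedef const char* LPCSTR;
typedef wchar_t* LPWSTR;
typedef const wchar_t* LPCWSTR;
typedef TCHAR* LPTSTR;
typedef const TCHAR* LPCTSTR;
#endif

// Each alias is exactly the pointer type its letters spell out.
static_assert(std::is_same<LPSTR, char*>::value, "LPSTR is char*");
static_assert(std::is_same<LPCSTR, const char*>::value, "LPCSTR is const char*");
static_assert(std::is_same<LPWSTR, wchar_t*>::value, "LPWSTR is wchar_t*");
static_assert(std::is_same<LPCWSTR, const wchar_t*>::value, "LPCWSTR is const wchar_t*");
static_assert(std::is_same<LPTSTR, TCHAR*>::value, "LPTSTR is TCHAR*");
static_assert(std::is_same<LPCTSTR, const TCHAR*>::value, "LPCTSTR is const TCHAR*");

// Hypothetical helper: since an LPCSTR is a plain pointer, it can be null,
// and ordinary C-string code works on it.
inline bool is_empty(LPCSTR s) { return s == nullptr || *s == '\0'; }
```

The practical upshot is that an LPCSTR has none of std::string's conveniences: no size(), no ownership, and it may well be null.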
Section 3) TCHAR, _T(), _TEXT(), TEXT()
TCHAR is defined as either char or wchar_t, depending on whether or not the UNICODE macro was defined.
By using TCHARs properly, you can create both ANSI and Unicode builds of your program. All you have to do is #define UNICODE if you want a Unicode build, or don't define it if you want an ANSI build.
This presents a bit of a problem, though. String literals in C++ can take 2 forms, either char or wchar_t:
const char* a = "Foo";
const wchar_t* b = L"Bar"; // <-- note the L. That makes it wide.
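To make the difference between the two literal forms concrete, here is a small sketch that asks the compiler what each literal really is. The function names narrow_bytes and wide_bytes are made up for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an array of its character type, including the
// terminating null character: 3 letters + 1 terminator = 4 elements.
static_assert(std::is_same<decltype("Foo"), const char (&)[4]>::value,
              "narrow literal is an array of char");
static_assert(std::is_same<decltype(L"Bar"), const wchar_t (&)[4]>::value,
              "wide literal is an array of wchar_t");

// Size in bytes of each literal. The wide one is larger because each
// wchar_t takes more than one byte (2 bytes on Windows).
inline std::size_t narrow_bytes() { return sizeof("Foo"); }
inline std::size_t wide_bytes()   { return sizeof(L"Bar"); }
```

Since the element types differ, there is no implicit conversion between the two kinds of literal, which is why the mismatched assignments are rejected.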
The compiler doesn't auto-detect... so things like this would throw compiler errors:
const char* a = L"Foo"; // <-- error, can't point char* to a wide string
const wchar_t* b = "Bar"; // <-- error, can't point wchar_t* to a non-wide string
So what about this?
const TCHAR* c = "Foo";
Remember that TCHAR is char or wchar_t depending on whether UNICODE is defined. So the above code will work only if you are not building Unicode. If you are building Unicode, you'll get an error.
Likewise, the following won't work unless you're building Unicode:
const TCHAR* c = L"Foo";
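To get around this problem, the headers provide some other macros: _T() and _TEXT() (from tchar.h) and TEXT() (from the Windows headers), all of which do the same thing. In a Unicode build, they put the L before the string literal to make it wide, and in a non-Unicode build, they do nothing. (_T() keys off _UNICODE and TEXT() keys off UNICODE, which is one reason to define both.) Therefore they will always work hand in hand with TCHARs: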
const TCHAR* d = _T("foo"); // works in both Unicode and ANSI builds
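If you're curious how a macro can "put the L before the string literal", here is a rough sketch of the mechanism using token pasting. All of the names here (my_tchar, MY_TEXT, MY_TEXT_IMPL, GREETING_LITERAL, greeting, greeting_length) are made up for illustration. The real definitions live in the Windows headers and may differ in detail.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for TCHAR and TEXT().
#ifdef UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT_IMPL(quote) L##quote   // paste an L onto the literal
#else
typedef char my_tchar;
#define MY_TEXT_IMPL(quote) quote      // leave the literal alone
#endif
// The extra level of indirection lets an argument that is itself a macro
// expand to its literal before the L gets pasted on.
#define MY_TEXT(quote) MY_TEXT_IMPL(quote)

#define GREETING_LITERAL "foo"

// Compiles unchanged in both build flavors.
inline const my_tchar* greeting() { return MY_TEXT(GREETING_LITERAL); }

// Length in characters (not bytes) of the greeting.
inline std::size_t greeting_length() {
    return std::char_traits<my_tchar>::length(greeting());
}
```

Either way the string has three characters. Only the type of each character changes between builds.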
Section 4) Function and Structure Name Aliases
A lot of Windows functions take strings as parameters. But because char and wchar_t strings are two distinctly different types, the same function can't be used for both of them.
Take for example, the WinAPI function "DeleteFile" which takes a single parameter. Let's say you want to delete "myfile.txt":
DeleteFile( _T("myfile.txt") ); // notice _T because DeleteFile takes a LPCTSTR
The trick here is that the function DeleteFile doesn't really exist! There are actually two different functions:
BOOL DeleteFileA( LPCSTR lpFileName ); // ANSI version, taking an LPCSTR
BOOL DeleteFileW( LPCWSTR lpFileName ); // Unicode version, taking an LPCWSTR
DeleteFile is actually a macro defined as either DeleteFileA or DeleteFileW, depending on whether or not this is a Unicode build.
As such... for WinAPI functions that take a C style string... there are, in a sense, 3 different versions, each taking a different type of C string:
DeleteFile  <- Takes a TCHAR string (LPCTSTR)
DeleteFileA <- Takes a char string (LPCSTR)
DeleteFileW <- Takes a wchar_t string (LPCWSTR)
This is true of virtually all WinAPI functions that take a C string as a param.
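The A/W naming trick is easy to reproduce in miniature. The sketch below uses a made-up function pair (CountCharsA and CountCharsW) instead of a real WinAPI call so that it compiles anywhere. MY_TEXT and demo are likewise hypothetical names, and the real headers may arrange things differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical functions standing in for an A/W pair such as
// DeleteFileA/DeleteFileW. They just count characters.
inline std::size_t CountCharsA(const char* s) {
    return std::char_traits<char>::length(s);
}
inline std::size_t CountCharsW(const wchar_t* s) {
    return std::char_traits<wchar_t>::length(s);
}

// The "generic" name is only a macro that picks one of the two.
#ifdef UNICODE
#define CountChars CountCharsW
#define MY_TEXT(quote) L##quote
#else
#define CountChars CountCharsA
#define MY_TEXT(quote) quote
#endif

// Same source line in both builds; the preprocessor picks the function
// and the literal type together, so they always match.
inline std::size_t demo() { return CountChars(MY_TEXT("myfile.txt")); }
```

Note that nothing stops you from calling CountCharsA or CountCharsW by name in either build. The macro only affects code that uses the generic name.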
But it doesn't stop there! There are also some structs that have strings in them. For instance, the OPENFILENAME structure contains various C strings for use with the open file dialog box. As you might expect, there are 3 versions of this struct as well:
OPENFILENAME  <- has TCHAR strings
OPENFILENAMEA <- has char strings
OPENFILENAMEW <- has wchar_t strings
And again, note that OPENFILENAME doesn't really exist as its own struct, but is just an alias for one of the other two, depending on the build.
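The same idea can be sketched for structs. The struct names below (FILEINFOA, FILEINFOW, FILEINFO) and the fileinfo_is_wide helper are made up for illustration and are not part of WinAPI. The sketch assumes the generic name is selected with a typedef.

```cpp
#include <type_traits>

// Hypothetical structs standing in for a pair like
// OPENFILENAMEA/OPENFILENAMEW.
struct FILEINFOA { const char* name; };
struct FILEINFOW { const wchar_t* name; };

// The generic name is just an alias for one of the two.
#ifdef UNICODE
typedef FILEINFOW FILEINFO;
#else
typedef FILEINFOA FILEINFO;
#endif

// True when the generic name currently refers to the wide struct.
constexpr bool fileinfo_is_wide() {
    return std::is_same<FILEINFO, FILEINFOW>::value;
}
```

Because FILEINFO is only an alias, a FILEINFO variable can be passed anywhere the matching A or W struct is expected, with no conversion involved.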