a unicode char problem

Hello all.
I have a Unicode character at code point 7990 (Unicode is independent of the locale, isn't it?).
I want to build a wstring from this character. How should I do it? I am trying ...


wchar_t *a1=new (wchar_t)7990; // does not work ...
char *a2=new (char)7990; // does not work ...

I don't know how to approach this... Any ideas?


Disch (13742)

Sadly, C++ lacks standard support for Unicode (at least currently -- it is planned in "C++0x"). Whether or not you can output Unicode depends on the library you're using to output.

Standard library output methods (like cout) are not guaranteed to support Unicode, though sometimes they do.

Since you didn't specify, I'm going to assume you are trying to output Unicode to the Windows Console. If that is the case I can refer you to this thread:

http://cplusplus.com/forum/windows/9797/

Note: you can skip to the last page for the solution I found (which worked on my machine, but apparently the other guy had problems getting it working).

If you need Unicode for something else (opening unicode file names, *nix console, GDI output, etc) that thread won't be of any help. Like I say it all depends on the lib you're using. If you can be more specific about what you need I might be able to help further.

ps: 7990 decimal is U+1F36 -> ἶ
is that what you want? or did you mean 0x7990 hex -- U+7990 -> 禐


dkaip (196)

Good morning.
"ps: 7990 decimal is U+1F36 -> ἶ" is what I am looking for. I use Windows XP Greek edition, Code::Blocks and the gcc compiler.
What I need is to replace a character in a wstring with the ἶ character.
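
A minimal sketch of that replacement, assuming the code point fits in a single wchar_t (U+1F36 does, even where wchar_t is only 16 bits wide, as on Windows). The name replace_at is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite the character at index pos with the
// character whose numeric value is code_point.
// Assumption: code_point fits in one wchar_t (true for 7990, i.e. U+1F36).
std::wstring replace_at(std::wstring text, std::size_t pos, unsigned int code_point)
{
    assert(pos < text.size());                     // pos must be a valid index
    text[pos] = static_cast<wchar_t>(code_point);  // the decimal value is used as-is
    return text;
}
```

Calling replace_at(L"abc", 1, 7990) returns a three-character string whose middle character is U+1F36. No conversion to hex is involved, because decimal and hexadecimal are only two ways of writing the same number.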


writetonsharma (1461)

hahahaha.. I was going to refer to the same thread..

we did so much in that thread.. right, Disch? :P

dkaip (196)

I found a solution ..


ofstream outputFile("log.txt",ios::binary);
wchar_t example[] = L"\x1F36";
outputFile<<WChar_to_UTF8(example)<<endl;
wstring asd(example);
outputFile<<WChar_to_UTF8(asd.c_str())<<endl;
wstring asd1(L"\x1F36");
outputFile<<WChar_to_UTF8(asd1.c_str())<<endl;

WChar_to_UTF8 is from helios.
outputFile<< writes the ἶ character. Now must I convert 7990 to hex?
Is there a way like wchar_t example[] = L"\x....7990"; or something like the code below?


unsigned int i=7990;
wstring asd1( ???? );

If I try ...

wchar_t outStr[16]; // room for the hex digits plus the terminating null
int inDec = 7990;
swprintf(outStr,L"%x",inDec);
wstring asd2(outStr);
outputFile<<WChar_to_UTF8(asd2.c_str())<<endl;

it writes 1f36 to the log.txt file instead of ἶ.
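
The %x format turns the number into the text of its hexadecimal digits, which is why the four characters "1f36" end up in the file. The value itself never needs converting: 7990 and 0x1F36 are the same number, so a cast to wchar_t is enough. A minimal sketch, with from_code_point as an illustrative name and assuming the value fits in one wchar_t:

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Build a one-character wide string from a numeric code point.
// Assumption: the value fits in a single wchar_t, so no surrogate
// pair is needed (true for 7990 on both Windows and Linux).
std::wstring from_code_point(unsigned int code_point)
{
    assert(code_point <= static_cast<unsigned long>(WCHAR_MAX));
    return std::wstring(1, static_cast<wchar_t>(code_point));
}
```

With this, the placeholder in wstring asd1( ???? ) could be filled as wstring asd1(from_code_point(i)), or directly as wstring asd1(1, static_cast<wchar_t>(i)).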


Disch (13742)

if your source file is UTF-8 encoded, and your compiler doesn't mangle the encoding when it compiles your source, you can just input the characters normally:

char myutf8string[] = "Example: ἶ, 禐\n";
outputfile << myutf8string;

 // or just output the string directly
outputfile << "Example: ἶ, 禐\n";

EDIT:
Of course... 'outputfile' would also need to be a utf8 encoded text file for this to work. Whether or not that is the case depends entirely on how the text editor you're using determines it. If you're using something like *gag* MS Notepad, you might need to put a pseudo-BOM at the start of the file to indicate that the file is UTF-8:

ofstream outfile("myfile.txt");

// output pseudo-BOM (the bytes EF BB BF)
outfile << "\357\273\277";
outfile << "Example: ἶ, 禐\n"; // Notepad will now treat this as UTF-8 when you view it

The 'BOM' here is the U+FEFF character but in UTF-8 encoding. It's really only supposed to be used with UTF-16 files (so you can distinguish between big and little endian), but Notepad and similar programs sometimes use it in UTF-8 files.
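
A minimal sketch of the same idea as a reusable function. The UTF-8 signature is U+FEFF encoded as the three bytes EF BB BF. The name write_utf8_with_bom is made up for illustration, and the text passed in is assumed to be UTF-8 already:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Write the three-byte UTF-8 signature (U+FEFF encoded as EF BB BF),
// followed by the text. The text is assumed to be UTF-8 already;
// nothing is converted here.
void write_utf8_with_bom(std::ostream& out, const std::string& utf8_text)
{
    assert(out.good());  // the stream must be usable before writing
    out << "\xEF\xBB\xBF" << utf8_text;
}
```

The signature should be written once, at the very start of the file. Opening the file with ios::binary, as in the earlier log.txt example, keeps the stream from translating line endings on Windows.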


helios (17607)

If that's my routine, you're using it improperly.
WChar_to_UTF8() returns heap arrays. If you just pass the returned pointer to operator<<(), you create a memory leak.

Disch's method will work only for static strings. If you need to output a character determined at run time, you can do this:

wchar_t character[]={/*value goes here*/,0};
char *str=WChar_to_UTF8(character);
outputfile <<str;
delete[] str; //free memory!


Disch (13742)

perhaps WChar_to_UTF8 should return a std::string instead of a char* so that you don't have to delete[] ?
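
A rough sketch of what such a variant might look like. This is not helios's routine (its source is not shown in this thread); to_utf8 is a hypothetical stand-in, and it assumes every wchar_t holds a complete code point no larger than U+FFFE range of the Basic Multilingual Plane (at most U+FFFF), with no surrogate pairs:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for WChar_to_UTF8 that returns std::string,
// so the caller has nothing to delete[].
// Assumption: each wchar_t is a complete code point <= U+FFFF
// (no surrogate pairs), so at most three bytes are needed per character.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(wide[i]);
        assert(cp <= 0xFFFF);
        if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For U+1F36 this produces the three bytes E1 BC B6, and the result can be streamed directly with outputfile << to_utf8(text).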

helios (17607)

The application I wrote it for was going to use it to convert strings as long as 5 MiB. I couldn't just go around making copies as I pleased.
Besides, I think it's silly to copy a whole array just because the coder was too lazy to add one more line of memory management. The coder is of course expected not to forget to do it.

Disch (13742)

I would just assume std::string reference counts so there'd be no copy. I'd hope so anyway. If it doesn't, it really is the worst string lib on the planet.

I'm just a big fan of RAII style setups. One of the beauties of them is that it's impossible to forget (on top of being safer regarding exceptions -- what if << throws an exception before you can delete[]?)
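
A small sketch of that RAII point, assuming a C++11 or later compiler for std::unique_ptr. Both make_heap_string and write_owned are hypothetical names; the first only stands in for a routine that, like WChar_to_UTF8, returns an array the caller must delete[]:

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a routine that returns a new[]-allocated,
// null-terminated array which the caller is expected to delete[].
char* make_heap_string(const char* text)
{
    assert(text != nullptr);
    char* copy = new char[std::strlen(text) + 1];
    std::strcpy(copy, text);
    return copy;
}

// The unique_ptr takes ownership immediately. delete[] runs when it goes
// out of scope, whether the function returns normally or operator<< throws.
void write_owned(std::ostream& out, const char* text)
{
    std::unique_ptr<char[]> owned(make_heap_string(text));
    out << owned.get();
}
```

The array is still allocated only once and never copied, so this keeps the no-copy property helios wanted while removing the manual delete[].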

Anyway whatever. Just spouting out random crap. Feel free to ignore me. XD


helios (17607)

I would just assume std::string reference counts so there'd be no copy.

*Benchmarks string "copy"*
Well, I- Umm... Hm.
Okay, now I feel stupid. That was a tremendous waste of time. Seriously, a good deal of the application is wide text parsing.
Oh, well. At least I know I have enough training not to produce a memory leak ever again.


dkaip (196)

If that's my routine, you're using it improperly.
WChar_to_UTF8() returns heap arrays....

The routine is yours, and in my program it works just fine; I get excellent results in the files.
I am trying to convert a program from BASIC to C++, and I am now reading about the STL etc...
I have no experience with UTF-8. I must study it in the very little time that I have.
Thanks very much.
