Problem: Reading a string with umlauts from cin and writing it to a file

Hello everybody!

I have had the following problem for roughly a year:

I want to read a name from stdin/cin and write this name to a file. The problem is that the name can contain umlauts (ÄäÖöÜüß).

I am working with Cygwin. Until about a year ago, when I updated it, this worked fine. On an older machine with the old Cygwin it still works, but on my current machine with the newest Cygwin packages it doesn't. I could also reproduce the same wrong output on a current Ubuntu 10.10 system.

This is the code I used:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <stdio.h>
#include <string.h>   // strncpy

using namespace std;

int main ()
{
  string mystr;
  char name[21];

  cout << "Name: ";
  getline (cin,mystr);

  strncpy(name,mystr.data(),20);
  name[20]='\0';

  // -------------------------------------

  cout << "      12345678901234567890" << endl;
  cout << "Mystr:" << mystr << endl;
  cout << "Name: " << name << endl;

  // -------------------------------------

  ofstream myfile ("example.txt");
  if (myfile.is_open())
  {
    myfile << "Mystr:" << mystr << endl;
    myfile << "Name: " << name << endl;
    myfile.close();
  }
  else cout << "Unable to open file";

  // -------------------------------------

  FILE * pFile;

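  pFile = fopen ("myfile.txt","w");
  if (pFile != NULL)   // fopen returns NULL if the file cannot be opened
  {
    fprintf (pFile, "Mystr:%s\n",mystr.data());
    fprintf (pFile, "Name: %s\n",name);
    fclose (pFile);
  }
  else cout << "Unable to open file";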

  // -------------------------------------

  return 0;
}


This is how I compiled it:

 
g++ -o test test.C


And these are some examples of the output:

1. Everything Okay

$ ./test.exe ; cat example.txt myfile.txt
Name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
      12345678901234567890
Mystr:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: aaaaaaaaaaaaaaaaaaaa
Mystr:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: aaaaaaaaaaaaaaaaaaaa
Mystr:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: aaaaaaaaaaaaaaaaaaaa


2. One shift for every umlaut

$ ./test.exe ; cat example.txt myfile.txt
Name: äaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
      12345678901234567890
Mystr:äaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: äaaaaaaaaaaaaaaaaaa
Mystr:äaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: äaaaaaaaaaaaaaaaaaa
Mystr:äaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: äaaaaaaaaaaaaaaaaaa


$ ./test.exe ; cat example.txt myfile.txt
Name: äääaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
      12345678901234567890
Mystr:äääaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: äääaaaaaaaaaaaaaa
Mystr:äääaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: äääaaaaaaaaaaaaaa
Mystr:äääaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Name: äääaaaaaaaaaaaaaa


$ ./test.exe ; cat example.txt myfile.txt
Name: ÄäÖöÜüßääööüüßßöäüöäüßßßüäößüöääöÄü
      12345678901234567890
Mystr:ÄäÖöÜüßääööüüßßöäüöäüßßßüäößüöääöÄü
Name: ÄäÖöÜüßääö
Mystr:ÄäÖöÜüßääööüüßßöäüöäüßßßüäößüöääöÄü
Name: ÄäÖöÜüßääö
Mystr:ÄäÖöÜüßääööüüßßöäüöäüßßßüäößüöääöÄü
Name: ÄäÖöÜüßääö


I have no idea how to solve this.

So any help will be appreciated.

Many thanks.

Regards

DrDee
How is the file encoded? (UTF-8? MS "Extended ASCII"? etc?)

What code do you use to read the file?
How do you store the string? (std::string? wchar_t s[N]? etc?)
How do you display it? (cout? wxSomething? etc?) [code please]
Hi Duoas,

thank you for taking some time with my problem.

The only code I use is the code above. I don't have any extra code!

That means I don't read any file. I just take the input from cin and write it to a file as you can see above. The result of the new file is read with 'cat' - if that's what you asked for.

The files are written directly from the program. So I don't know where I can set the type of file encoding.

Regards

DrDee
Oh. Sorry...

It seems I wasn't paying enough attention the first time around.

The problem is that your OS handles input using UTF-8 encoding, in which the letter 'ä' takes two chars (bytes) to encode rather than one. Your code assumes that every letter takes exactly one char, so it breaks as soon as the name contains an umlaut.

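Since your code is concerned with space requirements, it is behaving correctly: strncpy(name, mystr.data(), 20) copies at most 20 bytes, not 20 letters, so every umlaut costs you one visible letter. That is why your last example stops after ten letters (10 × 2 bytes = 20 bytes).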

Hope this helps.