I'm trying to read in a large file of data. Sadly, the people who released this data did a terrible job at formatting it, making it very hard to read. My default reader (a method I found referenced on this forum a few months back) already gives back a better result than I anticipated. However, I still need to handle strings of letters and numbers combined. There are two types:
1. From the string "CAPACITY : 100" I need to extract the "100". The number could be any size, so there is no guarantee that it's the final 3 characters. It is guaranteed to be the only numerical value in that string, and the non-numerical part (here: "CAPACITY : ") will be identical for each instance. If possible, the ability to read a similar string with non-identical text and two numbers at arbitrary locations in that string would be handy, but it is not required.
2. From the string " 1 82 76" I need to extract the second and third numbers. The number of digits in each number can vary, but the numbers will always be present (so at least one digit each) and they'll always be separated by a space. They will always be integers (but if there's an easy way to make it extendable to other numerical types, that could be handy); they can be positive or negative (shown by a minus sign in front of the number itself).
The easiest way to handle this would be, without a doubt, using regular expressions. Regular expressions were not a standard feature of C++ before C++11 (which added the <regex> header), but the Boost C++ Libraries provide such functionality with Boost.Regex.
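As a rough sketch of what the regular-expression route could look like: this uses std::regex from the <regex> header (standard since C++11) instead of Boost.Regex, and findInts is a made-up name for illustration. It collects every optionally negative integer in a string, so it would cover both "CAPACITY : 100" and " 1 82 76".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every (optionally negative) integer in text,
// in the order in which it appears.
std::vector<int> findInts(const std::string &text) {
    static const std::regex number("-?[0-9]+");
    std::vector<int> values;
    for (std::sregex_iterator it(text.begin(), text.end(), number), end;
         it != end; ++it) {
        // std::stoi throws std::out_of_range if the number does not fit an int.
        values.push_back(std::stoi(it->str()));
    }
    return values;
}
```

For the second string you would then take elements 1 and 2 of the returned vector. Note that a minus sign glued to the previous number (as in "3-5") is read as the sign of the next number.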
You can use std::stringstream for this.
For your first problem you can do it like this (let text be the string containing "CAPACITY : 100"):
#include <sstream>
#include <string>

std::string tmp;   // receives "CAPACITY"
char c;            // receives ':'
int value;         // receives 100
std::stringstream ss(text);
ss >> tmp >> c >> value;
tmp will be "CAPACITY", c will be ':' and value will be 100.
But this only works because CAPACITY and : are separated by a space. If they are not, tmp takes the colon along with it ("CAPACITY:") and you don't need the >> c part.
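If you don't want to depend on whether there is a space before the colon, one possible workaround is to search for the ':' yourself and only parse what follows it. This is just a sketch; valueAfterColon is a hypothetical name, and it assumes the number you want comes right after the first colon in the string.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parses the integer that follows the first ':' in text.
// Returns false if there is no ':' or no integer directly after it.
bool valueAfterColon(const std::string &text, int &value) {
    const std::string::size_type colon = text.find(':');
    if (colon == std::string::npos) {
        return false;
    }
    std::istringstream ss(text.substr(colon + 1));
    // operator>> skips the whitespace between ':' and the number, if any.
    return static_cast<bool>(ss >> value);
}
```

This accepts "CAPACITY : 100" as well as "CAPACITY:100", and the return value tells you whether a number was actually found.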
For your second problem (let text be the string containing "1 82 76"):
int a;
int b;
int c;
std::stringstream ss(text);
ss >> a >> b >> c;
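A slightly more defensive variant of the same idea, as a sketch (readThreeInts is a made-up name): it reports whether all three extractions succeeded. Since operator>> skips any run of leading whitespace by default, extra padding spaces between the numbers should not matter, and a minus sign is handled by the int extraction itself.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads three whitespace-separated integers from text.
// Returns false if fewer than three integers could be read.
bool readThreeInts(const std::string &text, int &a, int &b, int &c) {
    std::istringstream ss(text);
    return static_cast<bool>(ss >> a >> b >> c);
}
```

Changing the int parameters to double (or making the function a template) would be one way to extend this to other numerical types.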
I've quickly reviewed the files and it seems the spacing is quite consistent in these files. I've got other files where it's not so consistent (spaces used as padding so the numbers visually form a neat column, so you get " 3" and " 30", for example), so I guess I'll have to familiarize myself with regular expressions anyway, but for now the stringstream approach will do.
Trying it now; I'll let you know how it worked out.
If it doesn't work because of spacing and whatnot, maybe you could read a single char to see if it is a digit; if so, use the method above, otherwise move the file cursor forward with std::istream::seekg.
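Roughly what that could look like, as a sketch only: extractInts is a hypothetical name, and it uses peek() and ignore() to look at and skip single characters rather than seekg, which amounts to the same thing on a stringstream. It handles arbitrary text with any count of numbers at any position.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: walks through text one character at a time and
// extracts an int wherever a digit or a minus sign starts.
std::vector<int> extractInts(const std::string &text) {
    std::vector<int> values;
    std::istringstream in(text);
    int next;
    while ((next = in.peek()) != std::char_traits<char>::eof()) {
        if (std::isdigit(next) || next == '-') {
            int value;
            if (in >> value) {
                values.push_back(value);
                continue;
            }
            in.clear();  // not a number after all (e.g. a lone '-'): reset
        }
        in.ignore();     // skip one character and look again
    }
    return values;
}
```

For "CAPACITY : 100" this gives a vector holding just 100; for a line with a label and two numbers you get both numbers in order.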
That would definitely work, but I'd still recommend that he look into regular expressions, because they would help in situations where he can't get away with something as simple as that.
Mathes' solution was sufficient for my current problem, so I've used that for now, and hanst99's Regular Expressions seem like the best option when a completely flexible reader is required, so I've added that to my "To Read" bookmarks.
How it looks now:
Three methods, one for each "format" present in my data files:
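#include <sstream>
#include <string>

// Reads "<label> : <int>", e.g. "CAPACITY : 100", and returns the integer.
int catchInt(const std::string &s) {
    std::stringstream ss(s);
    std::string tmp;   // label, e.g. "CAPACITY"
    char c;            // the ':' separator
    int value = 0;     // stays 0 if the extraction fails
    ss >> tmp >> c >> value;
    return value;
}

// Reads "<index> <x> <y>" and stores the second and third numbers.
void catchCoords(const std::string &s, int &x, int &y) {
    int a;   // leading index, discarded
    std::stringstream ss(s);
    ss >> a >> x >> y;
}

// Reads "<index> <demand>" and stores the demand as a float.
void catchDemand(const std::string &s, float &d) {
    int a;   // leading index, discarded
    std::stringstream ss(s);
    ss >> a >> d;
}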
which are called by passing the string that contains the data and the containers for the values to be saved:
nnode = catchInt(file_f[3][0]);
// Fetch node data
for (int l = 0; l < nnode; ++l) {
    catchCoords(file_f[l+7][0], x[l], y[l]);
    catchDemand(file_f[l+8+nnode][0], d[l]);
}
thereby saving the values where they belong.
I was pleasantly surprised to see that while 'd' is actually defined as a vector of floats, it has no problems with the mostly integer values in the file. I was afraid it would skip non-float values.
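A small sketch of why that works, with made-up helper names (readFloat, readInt): extracting into a float accepts plain integer text such as "82", because an integer is also a valid floating-point literal. The reverse is less forgiving: extracting "82.5" into an int stops at the decimal point and leaves ".5" in the stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads the first value in text as a float.
float readFloat(const std::string &text) {
    std::istringstream ss(text);
    float f = 0.0f;
    ss >> f;
    return f;
}

// Hypothetical helper: reads the first value in text as an int.
// For input like "82.5" only the "82" part is consumed.
int readInt(const std::string &text) {
    std::istringstream ss(text);
    int i = 0;
    ss >> i;
    return i;
}
```

So a float container is the safer choice if a column might contain both integer and decimal values.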