String Parsing using known delimiters? [C++]

The age-old issue of string parsing comes up again...
I have a text file that contains lines that are SUPPOSED to follow a set format, specifically:
string, string, long string int string double int

The delimiters are therefore:
Comma (,) for the first two fields
Spaces for all other fields

Strings like this would be valid:
Jon, Jack, 100 CPN 5 KTE 1.00 10
Jon, Jack 100 CPN 5 KTE 1.00 10 // notice the extra spaces

Whereas something like these would be considered invalid:
Jon Jack 100 CPN 5 KTE 1.00 10 // missing the commas
Jon, Jack, 100 CPN 5 KTE 1.00 // missing the last field "10"
Jon, Jack, 100CPN 5 KTE 1.00 10 // missing space between "100" and "CPN"

The goal is to EXTRACT each section and store them, and if possible determine when a string is INVALID (does not follow format).
I have a class with the following data members:
class A
{
private:
	// Record
	string A;
	string B;
	long C;
	string D;
	string E;
	string F;
	double G;
	int H;

public:
	A(string sLine);	// constructor
};

A::A(string sLine)
{
	// somehow parse the string here and determine if it is valid //
}



So, how can I parse the string (sLine) and extract each piece into its component (A, B, C, D, E, F, G, H)?
I was thinking of using the old method of simply doing substring searches, but I find it very error-prone and long-winded. Is there a better way to accomplish this?

Anything anyone would recommend?
Any help would be much appreciated...
Thanks,
closed account (S6k9GNh0)
Use istream::get() and set the delimiter to whatever you want.
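
A minimal sketch of the delimiter idea, applied to the format in the question. It uses std::getline (which also accepts a delimiter) rather than get(), and splits the rest on whitespace with operator>>. The names Record, trim, convert and parseLine are made up for illustration, and the field types follow the format line in the question (string, string, long, string, int, string, double, int); adjust them if your class differs.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type mirroring the format line in the question.
struct Record {
    std::string a, b;
    long c = 0;
    std::string d;
    int e = 0;
    std::string f;
    double g = 0.0;
    int h = 0;
};

// Strip leading and trailing spaces/tabs.
static std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Convert a whole token to T; fails if any characters are left over,
// so "100CPN" is rejected as a number.
template <typename T>
static bool convert(const std::string& token, T& value) {
    std::istringstream ss(token);
    if (!(ss >> value)) return false;
    return ss.peek() == std::char_traits<char>::eof();
}

// Returns true and fills `out` only if the whole line follows the format.
bool parseLine(const std::string& line, Record& out) {
    std::istringstream in(line);
    Record r;

    // The first two fields end at a comma. If getline reaches the end of
    // the input instead, eof() is set, which means the comma was missing.
    if (!std::getline(in, r.a, ',') || in.eof()) return false;
    if (!std::getline(in, r.b, ',') || in.eof()) return false;
    r.a = trim(r.a);
    r.b = trim(r.b);
    if (r.a.empty() || r.b.empty()) return false;

    // The remaining six fields are whitespace-separated; operator>> skips
    // any number of spaces, so extra spaces are harmless.
    std::string c, e, g, h, extra;
    if (!(in >> c >> r.d >> e >> r.f >> g >> h)) return false;
    if (in >> extra) return false;  // too many fields

    if (!convert(c, r.c) || !convert(e, r.e) ||
        !convert(g, r.g) || !convert(h, r.h)) return false;

    out = r;
    return true;
}
```

In the constructor from the question you could call something like parseLine(sLine, rec) and then either copy the fields across or flag the object as invalid when it returns false.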
Hi...

Use the strtok function!

#include <string.h>
#include <stdio.h>

char string[] = "A string\tof ,,tokens\nand some  more tokens";
char seps[]   = " ,\t\n";
char *token;

int main( void )
{
   printf( "%s\n\nTokens:\n", string );
   /* Establish string and get the first token: */
   token = strtok( string, seps );
   while( token != NULL )
   {
      /* While there are tokens in "string" */
      printf( " %s\n", token );
      /* Get next token: */
      token = strtok( NULL, seps );
   }
}
For this I would use a regular expression (boost::regex). It will do what you want without having to write any parsing code, and the regular expression to capture your format is trivial.
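
A rough sketch of the regular-expression approach. It uses std::regex from the C++11 standard library instead of boost::regex (the interfaces are similar, but this is a substitution, not the Boost code itself). The pattern is only a guess at the format: it assumes the string fields are single words made of letters, digits or underscores, and that the numbers are non-negative. ParsedLine and matchLine are made-up names.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the eight captured fields.
struct ParsedLine {
    std::string a, b;
    long c = 0;
    std::string d;
    int e = 0;
    std::string f;
    double g = 0.0;
    int h = 0;
};

// Returns true and fills `out` only if the entire line matches the pattern.
bool matchLine(const std::string& line, ParsedLine& out) {
    // word , word , integer word integer word decimal integer
    // \s* after a comma tolerates extra spaces; \s+ between the other
    // fields requires at least one space, so "100CPN" does not match.
    static const std::regex pattern(
        R"(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\d+)\s+(\w+)\s+(\d+)\s+(\w+)\s+(\d+(?:\.\d+)?)\s+(\d+)\s*)");

    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;

    out.a = m[1].str();
    out.b = m[2].str();
    // The pattern guarantees digits only; std::stol/std::stoi can still
    // throw std::out_of_range for values that do not fit the type.
    out.c = std::stol(m[3].str());
    out.d = m[4].str();
    out.e = std::stoi(m[5].str());
    out.f = m[6].str();
    out.g = std::stod(m[7].str());
    out.h = std::stoi(m[8].str());
    return true;
}
```

Because std::regex_match only succeeds when the whole string matches, a line with missing commas, a missing field or trailing junk is rejected in one step, and the capture groups give you the pieces directly.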
