Basically, the code above reads a text file and then tokenizes its contents using the dropped delimiters I specify. For example, if the text contains a sentence such as "Hey how , is your ! day?", the tokenizer separates the words from one another, producing individual tokens such as "Hey" and "how".
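To make the intended behaviour concrete, here is a minimal standard-library sketch of splitting on dropped delimiters. It is not the tokenizer used in the code above: `tokenize` is a hypothetical helper, and the delimiter set `" ,!?"` is an assumption chosen to match the example sentence.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character found in
// `dropped_delims`. The delimiters themselves are discarded, and runs of
// consecutive delimiters never produce empty tokens.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& dropped_delims) {
    std::vector<std::string> tokens;
    // Skip any leading delimiters to find the start of the first token.
    std::string::size_type start = text.find_first_not_of(dropped_delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the text.
        std::string::size_type end = text.find_first_of(dropped_delims, start);
        // substr clamps its count, so end == npos takes the rest of the text.
        tokens.push_back(text.substr(start, end - start));
        // Move past the delimiter run to the start of the next token.
        start = text.find_first_not_of(dropped_delims, end);
    }
    return tokens;
}
```

With this sketch, `tokenize("Hey how , is your ! day?", " ,!?")` yields the five tokens "Hey", "how", "is", "your" and "day".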
Those characters need a \ before them: http://msdn.microsoft.com/en-us/library/h21280bw%28v=vs.80%29.aspx
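As a small illustration, assuming the characters in question are the double quote and the backslash (the thread does not show them here), a delimiter string could be written like this. Both function names are hypothetical.

```cpp
#include <string>

// Hypothetical delimiter set: space, comma, double quote, backslash.
// Inside an ordinary string literal, " is written as \" and \ as \\.
std::string escaped_delims() {
    return " ,\"\\";
}

// The same four characters as a raw string literal (C++11 and later),
// where nothing between the parentheses needs escaping.
std::string raw_delims() {
    return R"( ,"\)";
}
```

Both functions return the same four-character string, so either form can be passed to a tokenizer as its delimiter list.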
For the others, if they are in the ASCII range (character codes 0 to 127), you should not have any problems. If not, you may have encoding-related problems to solve too.
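One quick way to see whether a delimiter string stays inside that range is to check every byte. This is only a sketch, and `is_ascii` is a hypothetical helper name:

```cpp
#include <string>

// Hypothetical helper: returns true when every byte of `s` is in the
// ASCII range 0..127. Bytes are read as unsigned char so that values
// above 127 are not mistaken for negative numbers.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 127) {
            return false;
        }
    }
    return true;
}
```

If this returns false for your delimiters, the file's encoding and the encoding of the source code both have to be taken into account before the tokenizer can match them reliably.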