This seems like a hard task. I've never done anything like this, but if I had to, I would probably try something like this:
The first, easier part is to detect the basic parts of speech (nouns, verbs, etc., maybe in more detail, such as verb form). This can be done with a dictionary that stores each word along with flags indicating which parts of speech it can be. The dictionary should be kept in a container that allows fast lookup, such as a hash table or a balanced binary tree.
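To make the dictionary step concrete, here is a minimal sketch in Python, where the built-in dict is a hash table. The lexicon contents, the tag names, and the `tag_words` helper are all made up for illustration, and the sketch assumes each word has exactly one part of speech, which real English does not guarantee.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag.
# A Python dict is a hash table, so lookups are fast on average.
LEXICON = {
    "andy": "NOUN",
    "football": "NOUN",
    "like": "VERB",
    "likes": "VERB",
    "does": "DO",
}

def tag_words(sentence):
    """Return (word, tag) pairs; words missing from the lexicon get "UNKNOWN"."""
    # Lowercase the sentence and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [(word, LEXICON.get(word, "UNKNOWN")) for word in words]
```

For example, `tag_words("Does Andy like football?")` yields the tag sequence DO, NOUN, VERB, NOUN. A real dictionary would need several flags per word, since many English words can be more than one part of speech.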
Now the difficult part. Once you know what part of speech each word in the sentence is, you could write a grammar parser to detect the basic parts of the sentence (subject, object, etc.). In this parser, you would simply list the possible sentence structures, taking advantage of the fact that English is a fairly well-structured language. Something like this (I use the Yacc/Bison parser generator):
sentence
: NOUN VERB NOUN { subject($1); object($3); } /* e.g. "Andy is a singer." */
| DO NOUN VERB NOUN { subject($2); object($4); } /* e.g. "Does Andy like football?" */
;
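If you want to try the two rules without a parser generator, the same idea can be sketched in Python as a plain lookup table of tag sequences. This is not what Yacc/Bison does internally (it generates a real LALR parser); it is only an exact-match stand-in, and the `PATTERNS` table and `parse_sentence` function are hypothetical names. The input is assumed to be already tagged as (word, tag) pairs.

```python
# Hypothetical pattern table mirroring the two grammar rules: each tag
# sequence maps to the positions of the subject and the object.
PATTERNS = {
    ("NOUN", "VERB", "NOUN"): (0, 2),        # statement
    ("DO", "NOUN", "VERB", "NOUN"): (1, 3),  # question starting with "do/does"
}

def parse_sentence(tagged):
    """Take a list of (word, tag) pairs; return the roles, or None if no rule matches."""
    tags = tuple(tag for _, tag in tagged)
    roles = PATTERNS.get(tags)
    if roles is None:
        return None  # none of the listed sentence structures matches
    subject_index, object_index = roles
    return {
        "subject": tagged[subject_index][0],
        "object": tagged[object_index][0],
    }

question = [("Does", "DO"), ("Andy", "NOUN"), ("like", "VERB"), ("football", "NOUN")]
print(parse_sentence(question))  # {'subject': 'Andy', 'object': 'football'}
```

A flat table like this cannot express nested or optional phrases, which is where a grammar with recursive rules starts to pay off.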
You get the idea. I don't know whether this method is really practical or even feasible. It would also require quite a lot of work and a good knowledge of English grammar.