Frequency

What is the best method to get the frequency of words in a file? I have this so far; however, it only grabs the first word in the file and checks it against itself.
#include <iostream>
#include <fstream>					
#include <string>
using namespace std;

int main()
{
	string filename;
	ifstream Line("KeyWordsOnLineHelp.txt");

	if (Line.fail())				
	{
		cout << "Error: main(): Failed to open the file: ";
		cout << "KeyWordsOnLineHelp.txt" << endl;
	}
	else
	{
		while(!Line.eof())
		{
			string word1;
			string word2;
			getline(Line,word1);
			if (!Line.eof())
			{
				bool Found = false;
				while(!Line.eof() && !Found)
				{
					getline(Line,word2);
					if (!Line.eof())
					{
						if(word1 == word2)
						{
							Found = true;
						}
					}
				}
				if(Found = true)
				{
					cout <<      word1    << endl;
				}
			}
		}
		cout << endl;
	}
	cout << endl;
	system("pause");
	return 0;
}
Grab the word in a temp string. Compare it against the words you've already grabbed and stored in a vector. If it's not found, push_back() that word into the vector. Repeat.

EDIT: Of course you'll actually want to use a custom class or struct to store the variables so that you can associate an int with the word to count how many times it occurred.

EDIT 2: Did you want pseudocode? I don't want to just give you everything because some people enjoy a challenge.
I would love something to work off, yeah.
Ok, here it goes:
#include <iostream>
#include <fstream>
#include <string>

#include <vector> // New header file to include

// code code code

class Occurances // Our new class; objects are great
{
public:
    Occurances() : Freq(0) {}
    Occurances(std::string Arg) : Word(Arg), Freq(1) {}

    std::string Word;
    int         Freq;
};



int main()
{
    // More of your code

    Occurances Counts;             // Our new object
    std::vector<Occurances> OcVec; // Our vector
    std::string Temp;              // Temp variable




    while(!Line.eof()) // Don't test for eof()
    {
        Line >> Temp; // Extraction operator is fine in this instance

        bool Found = false;
        for(std::size_t i = 0; i < OcVec.size(); i++)
        {
            if(Temp == OcVec[i].Word)
            {
                OcVec[i].Freq++;
                Found = true;
                break;
            }
        }

        if(!Found) // Add the word only after checking every stored entry
        {
            Occurances Another(Temp);
            OcVec.push_back(Another);
        }
    }



    system("pause");
    return 0;
}


I didn't test this, and it's a lot more code than I intended to give you, but hopefully you understand it.

EDIT: Feel free to bombard me with questions BTW.
A few things: I understand most of what you did, other than the concept of the class. Is there a way to do it with a function such as void(stuff) rather than a class? I haven't touched classes yet in what I am doing. Also, why on line 32 do you say not to check for eof?
- Objects aren't terribly complicated, and the tutorial on this site would give you a great starting point. I like to use them in this case because you have to keep track of two different variables for each word (what the word is and the number of times it occurs). By using a class (or struct) we can tell the computer that the variables are related, and we can access them in relation to each other.

- I said don't check for eof() because my method of getting the words was to use the extraction operator. If you had whitespace at the end of your file, reading the last word would stop before that whitespace, so eof() would still be false and the loop would run one more time even though there is nothing left to read.

Ok, so what is put in place of the eof? Also, what does the public mean up in the class function?
- I've always liked:
 
while(Line >> Temp)

This form only runs the loop body when a word was actually read, so the last word is not fed through the loop a second time at the end of the file, which is what can happen with the eof() test. You can still blank "Temp" at the end of each iteration if you want to be safe.


- In the case of a class object, the data members and functions are private by default. This means that no operation outside the scope of the object can reference these pieces. By writing "public:" I am telling the compiler that I want operations outside of the scope of the class to be able to refer to these members.
Alright, I understand the class part now thanks to the tutorial and your help. The only thing I am stuck on now is blanking the Temp after each iteration. It would go in the for loop, correct? I am also confused as to what "blanking" it means. Is it closing it or assigning it a zero value?

Also, when while(Line >> Temp) is put in, it says that Line is undefined; however, if I put in an ifstream, it doesn't pick it up.
- Yes, blanking means that you are setting the variable 'Temp' to an empty string or something like that.

- Using Line like that should not produce any errors. Can I see your current code?
#include <iostream>
#include <fstream>					
#include <string>
#include <vector> 


class Occurances 
{
	public:
	Occurances() {}
	Occurances(std::string Arg) : Word(Arg), Freq(1) {}

	std::string Word;
	int Freq;
}



int main()
{
	Occurances Counts; 
	std::vector<Occurances> OcVec; 
	std::string Temp;
	while(Line >> Temp) 
	{
		Line >> Temp;

		for(int i = 0; i < OcVec.size(); i++)
		{
			if(Temp == OcVec[i].Word)
			{
				OcVec[i].Freq++;
			}
			else
			{
				Occurances Another(Temp);
				OcVec.push_back(Another);
			}
			Temp = ' ';
		}
	}

	cout << endl;
	system("pause");
	return 0;
}
Oops, my bad. I didn't mean for you to delete your declarations. Most of the stuff you had should have stayed. I wrote "//More Of Your Code" because I didn't want to type it out again.

EDIT: So you still need the line of code that reads:
ifstream Line("KeyWordsOnLineHelp.txt");
Alright, I put it back, however on both
string filename;
	ifstream Line("KeyWordsOnLineHelp.txt");
it says string and ifstream are undefined, which is what confused me to begin with.
Ok, you also need to type using namespace std; or else you need to change your declarations to:
std::string filename;
std::ifstream Line("KeyWordsOnLineHelp.txt");

There's another section in the tutorials on namespaces and why this is if you're interested.
Haha, that is my fault, I should have noticed it got removed there.

Does the Temp = ' '; at the end of the for loop do what it needs to?

Not exactly. Temp = ' '; assigns a single space character, so you need to use Temp = "";//Notice The Use Of DOUBLE QUOTES Not Single since "Temp" is a std::string.

EDIT: There's also no need for the variable on Line 21 called "Counts". I'm just not the best at cleaning stuff like this up :p .
Also, the last thing I need is to cout the words that were repeated and how many times. Would that statement go before the while or before the for statement?
That would go AFTER the while(...) statement. This code needs to read through the entire document before it will know how many times each word is repeated. You'd use a loop to iterate through the vector and output the data.
Not sure what you mean, so I would use a while loop to have it output the words and the number of times they appear? Also, is Temp what will be put into the vector to iterate through?
A "std::vector<T>" container is a dynamic array whose element type T is a template parameter. Don't worry if that sounds scary; it's simpler than the description implies. See here for a complete reference: http://www.cplusplus.com/reference/stl/vector/

As with any array, you can access any of its elements by its index number. I suggest using a for loop for this, as it allows you to first initialize an integer variable to zero, then compare that variable against a condition (in this case, while the integer is less than the size of the vector), and finally increment the integer by one. Inside the loop you would use the integer to access the elements of the array sequentially, that is, starting at index number 0 and going to the end of the array.

The member function size() is already part of the std::vector container and always returns the current number of elements, so it stays correct as the container grows or shrinks.