Display only the 20 most used words [word frequency]

by Ong Mark Anthony L.
You should consider using a dictionary that maps a word to a count to hold your data. Also, use strings rather than char arrays.

The algorithm is something like:

open file
while there's more to read
    read next word
    have we seen it?
        YES: increment the count
        NO: store it with a count of 1

traverse the dictionary
    display word, count
OK, thanks, I get it now. I modified my code; based on what I researched, I came up with this:

#include <iostream>
#include <map>
#include <fstream>
#include <string>
#include <cstdlib>  // for system()

using namespace std;

typedef map<string, int> word_count_list;

int main()
{

    word_count_list word_count;
    string filename;

    // Get the filename.
    cout << "Enter the file you wish to have searched:\n";
    cin >> filename;

    // Open file.
    ifstream file(filename.c_str());

    // Read in all the words.
    string word;

    while (file >> word)
    {
        // Remove punctuation.
        string::size_type index;  // same type find_first_of returns, so the npos comparison is reliable
        while ((index = word.find_first_of(".,!?\\;-*+")) != string::npos)
        {
            word.erase(index, 1);
        } 

        ++word_count[word];
    }

    // Print out the word counts.
    word_count_list::const_iterator current(word_count.begin());

    while (current != word_count.end())
    {
        cout << "The word '" << current->first << "' appears " << current->second << " times" << endl;

        ++current;
        
    }
    system("pause"); 
}


Now my problem is how to sort or rank them from the highest count to the lowest.

I think I have a lot of trial and error to do. Haha.

If you have any tips, feel free to post.
Use the STL; it has sort and all kinds of fun stuff.

http://www.cplusplus.com/reference/algorithm/sort/

And please use code tags in your posts.
Are your counts correct or are they off by one?
They are definitely correct; test it yourself.


Come on, just one more problem and it will be over. I just need to sort them.

Is it possible to reverse it to map<int,string> so that it sorts by count?
OK, here is the program I changed based on what I had, but my problem now is that when words have the same count, only the first one is displayed and the others with the same count are skipped.
Here is the program:


#include <iostream>
#include <map>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <utility>
#include <iterator>
#include <set>

using namespace std;
struct sortPairSecond
{
   bool operator()(const pair<string, int> &lhs, const pair<string, int> &rhs) const  // const so the set can call it through a const comparator
   {
       return rhs.second < lhs.second;
   }
};

typedef map<string,int> word_count_list;

int main()
{

    word_count_list word_count;
    string filename;

    // Get the filename.
    cout << "Enter the file you wish to have searched:\n";
    cin >> filename;

    // Open file.
    ifstream file(filename.c_str());

    // Read in all the words.
    string word;

    while (file >> word)
    {
        // Remove punctuation.
        string::size_type index;  // same type find_first_of returns, so the npos comparison is reliable
        while ((index = word.find_first_of(".,!?\\;-*+")) != string::npos)
        {
            word.erase(index, 1);
        } 

        ++word_count[word];
    }

    // Print out the word counts.
    
    word_count_list::const_iterator current(word_count.begin());            
        
    set<pair<string,int>, sortPairSecond > mySet;
    for(map<string, int>::const_iterator it = word_count.begin(); it != word_count.end(); ++it)
    {
        mySet.insert(*it); 
    }

    cout << "\nSet Order:\n--------------\n";
    for(set<pair<string, int>, sortPairSecond >::const_iterator it = mySet.begin(); it != mySet.end(); ++it)
    {
        cout << it->first << " = " << it->second << "\n";
    }
    system("pause");
}
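
A likely explanation for the missing words: std::set treats two elements as equivalent when neither compares less than the other, and sortPairSecond looks only at the count, so a second word with the same count is rejected on insert. One possible fix, sketched below, is to break ties on the word itself. ByCountThenWord, ranked_words and rank_words are hypothetical names, not part of the program above:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Orders by count, highest first. Equal counts fall back to the word,
// so two different words are never equivalent as far as std::set is concerned.
struct ByCountThenWord
{
    bool operator()(const std::pair<std::string, int>& lhs,
                    const std::pair<std::string, int>& rhs) const
    {
        if (lhs.second != rhs.second)
            return lhs.second > rhs.second;
        return lhs.first < rhs.first;
    }
};

typedef std::set<std::pair<std::string, int>, ByCountThenWord> ranked_words;

// Hypothetical helper: copies every (word, count) entry into the ranked set.
ranked_words rank_words(const std::map<std::string, int>& counts)
{
    ranked_words ranked;
    for (std::map<std::string, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it)
    {
        ranked.insert(std::make_pair(it->first, it->second));
    }
    return ranked;
}
```

With this comparator, words that share a count all stay in the set and come out in alphabetical order within each count.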
   

A map is sorted by key, which is the word in your case. It might be easier to copy the content into a vector and sort that. There's no point in trying to bend a map into something it isn't.
Okay... this is not very elegant... maybe someone else could provide a prettier solution.

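#include <iostream>
#include <fstream>
#include <algorithm>
#include <functional>  // binary_function
#include <iterator>    // back_inserter
#include <string>
#include <map>
#include <utility>     // pair
#include <vector>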

using namespace std;

typedef map<string,int> word_count_list;

struct val_lessthan : binary_function < pair<string,int>, pair<string,int>, bool >
{
  bool operator() (const pair<string,int>& x, const pair<string,int>& y) const
    {return x.second<y.second;}
}val_lt;

int main()
{
    word_count_list word_count;
    string filename;
    
    // Get the filename.
    cout << "Enter the file you wish to have searched:\n";
    cin >> filename;
    
    // Open file.
    ifstream file(filename.c_str());
    
    // Read in all the words.
    string word;
    
    while (file >> word){
        // Remove punctuation.
        string::size_type index;  // same type find_first_of returns, so the npos comparison is reliable
        while ((index = word.find_first_of(".,!?\\;-*+")) != string::npos)
            word.erase(index, 1);
    
        ++word_count[word];
    }
    
    //copy pairs to vector
    vector<pair<string,int> > wordvector;
    copy(word_count.begin(), word_count.end(), back_inserter(wordvector));

    //sort the vector by second (value) instead of key
    sort(wordvector.begin(), wordvector.end(), val_lt);
    
    for(int i=0; i<wordvector.size(); ++i)
        cout << wordvector[i].first << " = " << wordvector[i].second << endl;
    
    return 0;
}
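
A note for newer compilers: std::binary_function was deprecated in C++11 and removed in C++17, so the functor above may not compile there as written. A rough alternative using a lambda is sketched below, assuming C++11 or later. The names word_entry and sorted_by_count are invented for this sketch, and unlike the code above it sorts the highest count first:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> word_entry;

// Hypothetical helper: returns the map's entries sorted by count, highest first.
std::vector<word_entry> sorted_by_count(const std::map<std::string, int>& counts)
{
    // The range constructor converts each pair<const string, int> to word_entry.
    std::vector<word_entry> words(counts.begin(), counts.end());

    // stable_sort keeps the map's alphabetical order among equal counts.
    std::stable_sort(words.begin(), words.end(),
                     [](const word_entry& x, const word_entry& y)
                     { return x.second > y.second; });
    return words;
}
```

Using stable_sort rather than sort is a design choice here: it makes the order of tied words predictable, which matters once the output is cut off at a fixed number of entries.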

What's ugly about that? It's clear in its intent.

Thanks a lot. Enough said, it's very clear. Thanks.

Now I'm trying to limit the output to 20; in other words, display the 20 most used words.

I will post the code later.
turbozedd: Was it really necessary to provide a solution? That isn't help, that's doing it for him/her. Please don't post solutions in future.
kbw: I agree with you in general, and I do it too much... but in some cases it is good for them to see a cleaner implementation of their own code. I agree that in this case I did too much; it is a laziness issue.

markrezak: USE CODE TAGS! The # icon on the right will do it for you.
I tried changing this code, for(int i=0; i<wordvector.size(); ++i, to for(int i=0; i<=20; ++i

but nothing happens. Any tips?
How many words are in your test file? Also, you need to make sure that 20 is not bigger than the size of the vector. And then decide what to do in the case of ties, especially when they are tied for 20th.

Try adding cout << i << ":\n"; inside the for loop to see what is going on. You can also add a while loop inside the for loop to handle ties.
for(int i=0; i<wordvector.size(); ++i){
    while(number of occurrences is the same){
       cout...
       skip++;
    }
    add skip to i
}
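
One possible way to handle both points (a limit larger than the vector, and ties at the cutoff) is sketched below. It is not necessarily what the pseudocode above had in mind: it assumes the vector is already sorted by count, highest first, and word_entry and top_n_with_ties are names made up for the example:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> word_entry;

// Hypothetical helper. Assumes `sorted` is ordered by count, highest first.
// Returns how many leading entries to print: at most `limit`, extended so
// that words tied with the last one printed are not cut off.
std::size_t top_n_with_ties(const std::vector<word_entry>& sorted, std::size_t limit)
{
    if (limit == 0 || sorted.empty())
        return 0;
    if (sorted.size() <= limit)
        return sorted.size();  // fewer words than the limit: print them all

    std::size_t n = limit;
    const int cutoff = sorted[limit - 1].second;
    while (n < sorted.size() && sorted[n].second == cutoff)
        ++n;  // include everything tied at the cutoff
    return n;
}
```

The print loop can then run from 0 up to the returned value. Whether tied words should be included or dropped at the 20th place is a judgment call; this sketch includes them.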
Thank you very much.

I added a variable, a, to limit the loop:

for(int i=0,a=0; i<wordvector.size(),a<21; ++i,++a)
        cout <<a<< wordvector[i].first << " = " << wordvector[i].second << endl;
if(a=20)
{
break;
}


Thanks for giving me the idea.
You don't need to break it.

Use some logic (i.e. &&) in your for statement.

Plus, the break is not in the same scope as the loop, so you aren't breaking anything. This actually generates an error with the g++ compiler.

The loop you have outputs the bottom twenty, not the top twenty.

You do not need two separate variables for this logic.

Try using a reverse iterator, or a normal iterator (in that case you'll need to adjust the sort to use greater than).
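
A rough sketch of the reverse-iterator suggestion, with both bounds combined using &&. It assumes the vector is sorted by count, lowest first, as the val_lt version earlier in the thread leaves it; word_entry and top_words are hypothetical names:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> word_entry;

// Hypothetical helper. Assumes `ascending` is sorted by count, lowest first.
// Returns up to `limit` entries, highest count first.
std::vector<word_entry> top_words(const std::vector<word_entry>& ascending, std::size_t limit)
{
    std::vector<word_entry> top;

    // Walk from the back, where the largest counts are. The && stops the
    // loop at whichever bound is reached first, so no break is needed.
    for (std::vector<word_entry>::const_reverse_iterator it = ascending.rbegin();
         it != ascending.rend() && top.size() < limit; ++it)
    {
        top.push_back(*it);
    }
    return top;
}
```

Calling top_words(wordvector, 20) and printing the result would then show at most twenty entries, starting with the most frequent word, even when the file has fewer than twenty distinct words.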