Loading words from a file into an array

I am currently in intro programming and I'm really having trouble with pointers and arrays.
For this assignment, I'm not sure whether I'm loading the words into the array/pointer correctly, or whether I'm randomizing them correctly, since I keep getting segmentation faults. Any help would be GREATLY appreciated.

REQUIREMENTS:
1. In class Words, create three private int variables (minlen, maxlen, count) and a pointer to a string (string *choices)

2. Create a private function int count_candidates(). This function will open the file (file.txt) and count all the words in the file whose lengths are between minlen and maxlen. This function will return the resulting count. If you use a string to load the words from the file, you can use its public function .length() to determine the word’s length.

3. Create a private function void load_words(). In this function, create a dynamic string array of size count. Have choices point to this newly created array. This function will open the file (file.txt) again. Similar to count_candidates(), this function will go through the entire file identifying all the words whose lengths are between minlen and maxlen. Add the qualifying words into the choices array as they are identified.

4. Create a function string pick_word(). This function will return a single random string out of the choices array.

5. In the constructor, call count_candidates() and load_words().
You will need to dispose of choices in a destructor.
For this milestone, use main to instantiate an instance of Words. Ask the user for two integer values as min and max. Pass these two values into your Words instance’s constructor. Also, test to see if you can display a single word by calling pick_word().




#include <iostream>
#include <cstring>
#include <fstream>
#include <ctime>

using namespace std;

class Words
{
     private: 
         int minlen, maxlen; //input in main
         int count = 0;
         string word;
         string * choices;

     // counts words with lengths between minlen and maxlen 
     int count_candidates()
     {
         ifstream fin;
         fin.open("file.txt")
         if (fin.is_open()){ 
             while (fin >> word){
                 if(word.length() > minlen && word.length() < maxlen)
                 count++;
         }
         }
         fin.close();
         return count;
     } 

     // load the qualifying words into the array
     void load_words()
     {
         string array[count];
         choices = array;
    
         ifstream fin;
         fin.open("file.txt");
         if (fin.is_open()){
             while (fin >> word){
                 if(word.length() > minlen && word.length() < maxlen){
                   choices[count] = word;  
         }
         }             
         }
         fin.close();
     }
     
     public:
         Words(int min, int max)
         { 
             minlen = min;
             maxlen = max; 
         }

         // returns random string from choices array
         string pick_word()
         {
             load_words();
             count_candidates();
             int result;
             result = rand()%count;     
             
             return choices[result];
         }

       ~Words() {delete [] choices;}        
};  

int main()
{
     srand(time(NULL));
     int min, max;
 
     cout << "Enter min: ";
     cin >> min;
     cout << "Enter max: ";
     cin >> max;

     Words words(min, max);
     cout << words.pick_word() << endl;

     return 0;   
}



INPUT FILE SAMPLE: a file containing ALL the words valid in Words With Friends; it has around 172820 words.

adaptivities
adaptivity
adaptor
adaptors
adapts
adaxial
add
addable
addax
addaxes
added
addedly
addend
addenda

choices[count] = word;
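You never allocated memory for choices, so it points at memory you do not own (a local array that no longer exists once load_words() returns). Also, pick_word() calls load_words() before count_candidates(), so count is still 0 there and choices[count] writes past the end of that array. The point of the function int count_candidates() is to find out the number of words so that you can allocate the right amount of memory.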
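To illustrate the count-then-allocate idea, here is a minimal sketch. It assumes the words come from any std::istream (an std::ifstream opened on file.txt in the real program, an std::istringstream when testing), and the names qualifies, count_matching and load_matching are hypothetical. It uses the same strict length test as the code in this thread.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Same test as the thread's code: length strictly between minlen and maxlen.
// The length is cast to int so both sides of the comparison are signed.
bool qualifies(const std::string& word, int minlen, int maxlen)
{
    const int len = static_cast<int>(word.length());
    return len > minlen && len < maxlen;
}

// Pass 1: count the qualifying words, so we know how much memory to allocate.
std::size_t count_matching(std::istream& in, int minlen, int maxlen)
{
    std::size_t n = 0;
    std::string word;
    while (in >> word)
        if (qualifies(word, minlen, maxlen))
            ++n;
    return n;
}

// Pass 2: allocate exactly n strings with new[] and copy the qualifying
// words in. The caller owns the array and must release it with delete[].
std::string* load_matching(std::istream& in, int minlen, int maxlen, std::size_t n)
{
    std::string* words = new std::string[n];
    std::size_t index = 0;
    std::string word;
    while (index < n && in >> word)
        if (qualifies(word, minlen, maxlen))
            words[index++] = word;   // store, then advance to the next slot
    return words;
}
```

Between the two passes you need a fresh stream: either reopen file.txt as the assignment describes, or call in.clear() and in.seekg(0) on the same ifstream.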
Hello milkitaa,

Your program reads an input file. Please share this input file, or at least a good sample, so everyone will know what you are working with and not have to guess at what this file contains.

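As Thomas1965 pointed out, string array[count]; is a problem. Standard C++ does not allow a VLA (Variable Length Array): the size must be a compile-time constant, such as a variable defined as const or a number like 10 or 100, which "count" is not. Some compilers, such as g++, accept it as an extension, but it is not standard C++. You do delete [] choices in the dtor, but that memory was never allocated with new, so the delete is wrong too.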
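Since count is only known at run time, the array has to come from new[], and the delete[] in the dtor must match that new[]. A minimal sketch of that pairing, using a hypothetical class WordArray rather than the assignment's Words. Initializing the pointer to nullptr keeps the delete[] safe when nothing was allocated, and the deleted copy operations are an extra precaution the assignment does not ask for.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal owner of a dynamic string array.
class WordArray
{
public:
    // Allocate n strings at run time; with n == 0 nothing is allocated.
    explicit WordArray(std::size_t n)
        : size_(n), data_(n > 0 ? new std::string[n] : nullptr)
    {
    }

    // Matches the new[] above. delete[] on a null pointer does nothing.
    ~WordArray() { delete[] data_; }

    // Copying would make two objects delete[] the same pointer, so forbid it.
    WordArray(const WordArray&) = delete;
    WordArray& operator=(const WordArray&) = delete;

    std::size_t size() const { return size_; }
    std::string& operator[](std::size_t i) { return data_[i]; }

private:
    std::size_t size_;
    std::string* data_;
};
```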

In "main" you never create an object of the class or call any of its public functions, so you never read the file in the first place.

In the line fin.open("file.txt") you are missing the ";" at the end. A compiler should have caught this, so you could have fixed it before you posted the program.

Andy
Hello milkitaa,

Since this is an assignment for class it would also help if you post the full requirements for the program so everyone will know what you have to do.

Andy
Hello milkitaa,

I thought I had it all figured out until I reread step 5 and realized that 2 of the functions need to be called from the ctor.

I will have to work on that in the morning.

Thank you for the requested info and the input file is just fine for testing.

Andy
Thanks Andy, I appreciate it.
Hello milkitaa,

I think I have it working right, at least the way I think it should work. I am not saying it is the best way, but it works.

To start with:
#include <iostream>
#include <string>

#include <fstream>
#include <ctime>    // <--- For "time".
#include <cstdlib>  // <--- For "srand" and "rand".
//#include <cstring>

using namespace std;

I added "cstdlib". Do not count on your compiler and header files to cover this for you. I know that VS will include "cstdlib" when I include "iostream", but not all compilers will do this.

For what it's worth, these include files are in 2 sections. The first is what I have found to be used in almost all programs, although the set I normally include is a bit larger. The second group is the includes needed by this program.

In either group the order is not important, but in the first group I have found that alphabetical order helps to remind you what is missing.

The using namespace std; line is a topic all to itself. You may find this useful reading: http://www.cplusplus.com/forum/beginner/258335/ It is the most recent post about it that I have seen.

Something that you may not be aware of:
class Words
{

};

Everything between the {}s is private by default. So when you wrote:
class Words
{
	private:  // <--- Already private at this point. But OK to leave.
		int minlen, maxlen; //input in main
		int count;
		string * choices;

		int fileStatus;
		const std::string inFileName{ "file.txt" };

		int count_candidates()
		{
			std::string word;

			ifstream fin;
			fin.open(inFileName);
			//ifstream inFile(inFileName);  // <--- An alternative to what is above.

			if (!fin)
			{
				std::cout << "\n     File \"" << inFileName << "\" did not open! in function \"count_candidates\".\n";

				return 1;
			}

			while (fin >> word)
			{
				if (word.length() > minlen && word.length() < maxlen)
				{
					count++;  // <--- Part of class.
					//std::cout << word << '\n'; // <--- Used for testing. Comment or remove when finished.
				}
			}

			fin.close();

			return 0;  // <--- Function ended with no problems.
		}

The comment on the "private:" line should explain it.
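For anyone unsure what "private by default" means in practice, a tiny sketch with hypothetical types: class members are private until an access specifier says otherwise, while struct members default to public.

```cpp
class AsClass
{
    int hidden = 1;          // private: no access specifier has appeared yet
public:
    int visible = 2;         // public: after the 'public:' label
    int get_hidden() const { return hidden; }
};

struct AsStruct
{
    int visible = 3;         // public: struct members default to public
};

// Uncommenting this would not compile, because 'hidden' is private:
// int peek(const AsClass& c) { return c.hidden; }
```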

The minlen, maxlen, count and choices variables are the ones required by step 1. I found no use or need for the string word here.

I added the next 2 variables: "fileStatus" because of where the 2 private functions are called from, and the string "inFileName" to make the file name easier to use. Now you have only 1 place to change the file name if it is ever needed.

The variable "word" is defined in the function that needs it and when the function ends so does the variable. This way it is not hanging around being unused.

The commented-out line ifstream inFile(inFileName); tends to speak for itself: it defines the stream and opens the file in one step. In time this is what you are likely to use more often.

When you open a file for input it is a must to check that it is open and ready to use. The if (!fin) check is the simplest way to do this. There are other ways, but they are more work than you need.

Point 2 is covered by this private function, but I have a problem with it returning "count" when the function already has access to the private variable "count", which your original code used. So I used the return value a bit differently: 0 means the file was read and 1 means it did not open. More on that when I get to the function call.

The fin.close(); line is not needed, as the file stream closes itself when it goes out of scope at the end of the function, but it is OK if you want to leave it.
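The open check, the 0/1 status return and the automatic close can be seen together in a small sketch. The function name count_words_in_file and its out-parameter are hypothetical; unlike count_candidates() it counts every word, to keep the focus on the file handling.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Returns 0 on success and 1 if the file could not be opened, like the
// status codes count_candidates() uses in this thread.
int count_words_in_file(const std::string& fileName, int& count)
{
    std::ifstream fin(fileName);   // define and open in one step

    if (!fin)                      // true if the open failed
    {
        std::cerr << "File \"" << fileName << "\" did not open!\n";
        return 1;
    }

    std::string word;
    count = 0;
    while (fin >> word)
        ++count;

    return 0;                      // fin closes itself when it goes out of scope
}
```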

The "load_words" function is much the same with a couple of changes.
void load_words()
{
	size_t index{};
	string word;
	choices = new std::string[count];

The first 2 variables are used only in the function. There is no need to define them anywhere else.

The 3rd line covers this part of point 3:
create a dynamic string array of size count.


In the if statement I changed the assignment to choices[index++] = word;. It is not a good idea to use "count" here, because you would have to reset it to 0 first. Better to keep the value that is in "count" and use a different variable to be safe. With the post-increment ++, 1 is added to the variable after its value has been used, so no extra code is needed.

The other difference here is that I did not check if the file was opened. If it did not open in the "count_candidates" function, you will not get this far, and since this function has a "void" return type there is no way to return anything anyway.

In your original "pick_word" function you are doing more than you need to. The 2 function calls belong in the overloaded ctor, and you can simply write: return choices[rand() % count];. This does the same as what you started out with. To keep what you started with, I would write: int result = rand() % count;. This does the same as the 2 lines that you have.
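One case the one-line version does not cover: if no word in file.txt has a qualifying length, count is 0 and rand() % count divides by zero, which is undefined behavior. A sketch of a guarded version, written as a hypothetical free function so it can stand alone; returning an empty string is just one simple choice for the empty case.

```cpp
#include <cstdlib>
#include <string>

// Returns a random element of choices[0 .. count-1], or "" when there is
// nothing to pick from, so rand() % count never runs with count == 0.
std::string pick_word(const std::string* choices, int count)
{
    if (choices == nullptr || count <= 0)
        return std::string{};

    return choices[std::rand() % count];
}
```

Inside the class the same check would use the member variables, e.g. if (count <= 0) return "";.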

I added this: int GetFileStatue() { return fileStatus; } to deal with the file not opening correctly.

For the overloaded ctor:
Words(int min, int max)
{
	minlen = min;
	maxlen = max;
	count = 0;
	fileStatus = 0;
	choices = nullptr;  // <--- So the dtor's delete [] is safe if load_words() is never called.

	fileStatus = count_candidates();

	if (!fileStatus)
		load_words();
}

This is where I used the return value of the "count_candidates" function differently. Since "fileStatus" is a class variable, you just use it; nothing special is needed. The if statement determines whether the "load_words" function should be called. If not, the ctor simply returns to "main".

"main" I set up this way:
int main()
{
	//srand(time(NULL));
	srand(static_cast<size_t>(time(nullptr)));

	int min{ 3 }, max{ 10 };
	

	//cout << "Enter min: ";
	//cin >> min;

	//cout << "Enter max: ";
	//cin >> max;

	Words words(min, max);

	if (words.GetFileStatue())
		return 1;

	std::cout << "\n The random word is: " << words.pick_word() << '\n';

	//words.DisplayList(); // <--- Used for testing. Comment or remove when finished.

	return 0;
}

Starting with "size_t" and "size_type": these are aliases for an "unsigned int". The ".length()" function mentioned in the requirements, like ".size()", returns a "size_type" value, which for std::string is "size_t". The ".length()" and ".size()" functions also return the same number, i.e., the number of characters in the string.
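Because .length() returns an unsigned type while minlen and maxlen are int, a test like word.length() > minlen mixes signed and unsigned, and most compilers warn about it. The sketch below (hypothetical function names) spells out the conversion the compiler applies in that comparison whenever size_type is at least as wide as int, which is the case on common platforms, plus one way to avoid it.

```cpp
#include <string>

// What 'word.length() > minlen' does implicitly: minlen is converted to the
// unsigned size_type, so -1 becomes the largest size_type value and the test
// is false for every word.
bool unsigned_style_compare(const std::string& word, int minlen)
{
    return word.length() > static_cast<std::string::size_type>(minlen);
}

// Converting the length to a wide signed type instead compares the real values.
bool signed_style_compare(const std::string& word, int minlen)
{
    return static_cast<long long>(word.length()) > minlen;
}
```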

The way I have written "srand" tends to be the most portable way to do this, and if you watch this video you will find the same information: https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful
The video also explains the problems with using "rand", and it helps to know what to expect.
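The talk's recommendation is to use the <random> header instead of rand(). A minimal sketch of that swap, assuming a hypothetical helper random_index() that pick_word() could call as return choices[random_index(count)];. This is a different technique from the srand/rand code in this thread, not what the assignment asks for.

```cpp
#include <cstddef>
#include <random>

// Returns a random index in [0, count - 1]. Precondition: count > 0.
// The engine is created and seeded once (static), much like calling srand()
// once in main; uniform_int_distribution maps its output onto the range
// without the bias that rand() % count can have.
std::size_t random_index(std::size_t count)
{
    static std::mt19937 engine(std::random_device{}());
    std::uniform_int_distribution<std::size_t> dist(0, count - 1);
    return dist(engine);
}
```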

Giving "min" and "max" a value and commenting out the input just makes it easier to check the rest of the program without having to enter something each time the program runs. When ready, you can remove what is between the {}s for "min" and "max", leaving just the empty {}s. It is not really necessary in this program, but it is a good idea to always initialize your variables. From C++11 on, the empty {}s make this easy: int min{}; starts min at 0.
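A small sketch (hypothetical function names) of what the {}s do since C++11:

```cpp
#include <string>

// Empty braces value-initialize: an int becomes 0, a std::string becomes "".
int brace_initialized_int()
{
    int value{};
    return value;
}

std::string brace_initialized_string()
{
    std::string text{};
    return text;
}

// Braces with a value use that value, like int min{ 3 }, max{ 10 }; in main.
// They also reject narrowing, so 'int bad{ 3.5 };' would not compile.
int testing_min()
{
    int min{ 3 };
    return min;
}
```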

After getting a value for "min" and "max", what you did not do was create an object of the class; see step 5.

The if (words.GetFileStatue()) check is where I use the return value of the "count_candidates" function and the variable "fileStatus" to determine if the program should continue. If not, this is the best place to leave the program, so that everything is destroyed before the program ends.

If the program does not end, this is when you call the "pick_word" function to get a random word, and as you can see you can do this easily in the "cout" statement.

Last part: when it comes to using {}s, be consistent. Changing styles in the code makes it harder to read. Have a look at: https://en.wikipedia.org/wiki/Indentation_style#Brace_placement_in_compound_statements Of all the choices I prefer the "Allman" style. Along with proper indenting it is the easiest to read, and making the code easy to read should be your first goal. This is for your benefit first, then others'.

I realize that I was not in the class leading up to this program, so my interpretation may be different than what is expected and that you may not be able to use what I have done.

If you have a problem let me know.

Andy
Starting with "size_t", and "size_type", these are aliases for an "unsigned int".

Not quite true. The size_t and size_type are implementation defined unsigned types, but not necessarily an unsigned int, they could be any unsigned type. They are usually an alias for either an unsigned int or unsigned long, but could also be an unsigned long long.
@Andy I told you long ago that size_t is not necessarily unsigned int. It is often unsigned long as it is on my system.

Is it really unsigned int on Windows? Even on 64-bit Windows? Is int 64 bits?

If you really want unsigned int, just say unsigned (the int isn't needed, so you can save a little typing).
Regarding what size_t is, let's check cppreference:
https://en.cppreference.com/w/cpp/types/size_t

cppreference wrote:
typedef /*implementation-defined*/ size_t;

1. It is an unsigned integer type. What type is up to the implementation.

2. The bit width of size_t is not less than 16. (since C++11)

Interesting how size_t is defined in hoary old C headers.
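The two cppreference guarantees can be checked at compile time. A sketch (the function name is hypothetical) that asserts them and reports whether size_t happens to be unsigned int on the compiler at hand. With MSVC on 64-bit Windows, for example, int is 32 bits and size_t is a 64-bit type, so the answer there is no.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// Guarantee 1: size_t is an unsigned integer type.
static_assert(std::is_unsigned<std::size_t>::value,
              "size_t must be an unsigned integer type");

// Guarantee 2 (since C++11): size_t is at least 16 bits wide.
// For an unsigned type, numeric_limits<T>::digits is its number of value bits.
static_assert(std::numeric_limits<std::size_t>::digits >= 16,
              "size_t must be at least 16 bits wide");

// Which unsigned type it is remains implementation-defined; this reports
// whether it is unsigned int on the current compiler.
constexpr bool size_t_is_unsigned_int()
{
    return std::is_same<std::size_t, unsigned int>::value;
}
```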