The format of the input matters.
Does it double-space after a period? Do you need to detect and ignore non-words, like a number or an email address?
For a simple sentence, counting the spaces will work, as long as you adjust for the left and right ends.
e.g.
this has four words(endl)
----s----s----s 3 spaces, 4 words. That pattern holds up fairly well in standard text; is it good enough for yours? Do you understand how cin works and what it does when it hits a space?
For statistics, e.g. how many times 'the' appears, you do need to store something. At that point, the problem needs a lot more words to describe exactly what you need to do.
Well, the stipulations are: it has to read in a txt file, then count how many times each word appears. It can be case-insensitive or case-sensitive. It has to print the words in order from least to most frequent, with the count next to each word.
OK, that is a bit more involved. You need to apply toupper or tolower across each word to normalize the case, and store an entry for each word next to its count in some sort of data structure.
The modern C++ way to do this with minimal coder effort would be to use a map. Are you allowed to use modern tools, or is this a 'both hands tied behind my back' school problem?
It may take a bit of doing to print every word not in the file; there are online dictionaries, I guess, that you can pull from. Are you sure about that hello count zero entry :P
That's a little more specific than "counts words".
You are asking to histogram a file.
When you say "dynamically-allocated arrays" do you mean build yourself a tree or linked list that you can modify? Or just one big array that you allocate once at the beginning?
You need to define what is meant by a 'word', or get your professor etc. to define what is meant. At its simplest, a word is a sequence of chars delimited by white space (space, tab, newline), the beginning of the text, or the end of the text. Is that sufficient? Do you need to remove punctuation from within the word?
PS this question has been asked previously on this forum.
As a starter, perhaps this. It will count the number of words (delimited by white-space, converted to lower case and ignoring non-alpha chars) in a std::string and display the count for each different word. You'll need to add reading from a file, and sorting/displaying the list in count order if required. This uses dynamic memory allocation in the list class to store the words and counts. I'm assuming you can use std::string...
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Singly linked list of (word, count) pairs; new words are pushed at the head.
class MyList {
public:
    MyList() {}
    MyList(const MyList&) = delete;             // the list owns raw nodes, so forbid copying
    MyList& operator=(const MyList&) = delete;

    ~MyList() {
        while (head) {
            const auto cur {head};
            head = head->next;
            delete cur;
        }
    }

    // Increments the count if the word is already stored, otherwise adds it with count 1.
    void add(const std::string& wd) {
        for (auto cur = head; cur; cur = cur->next)
            if (cur->wd == wd) {
                ++cur->cnt;
                return;
            }

        head = new Node(wd, head);
    }

    void display() {
        for (auto cur {head}; cur; cur = cur->next)
            std::cout << cur->cnt << "  " << cur->wd << '\n';
    }

private:
    struct Node {
        Node* next {};
        std::string wd;
        std::size_t cnt {};

        Node() {}
        Node(const std::string& w, Node* nxt, std::size_t ct = 1) : next(nxt), wd(w), cnt(ct) {}
    };

    Node* head {};
};

int main() {
    const std::string text {"these are words in the sentence. These are also words in a different sentence! "};

    MyList words;
    bool st {};          // true while inside a word
    std::string wd;      // letters of the current word, lower-cased

    for (auto chp {text.c_str()}; *chp; ++chp) {
        if (std::isspace(static_cast<unsigned char>(*chp))) {
            if (st) {    // white-space ends the current word
                words.add(wd);
                wd.clear();
                st = false;
            }
        } else {
            st = true;
            if (std::isalpha(static_cast<unsigned char>(*chp)))
                wd += static_cast<char>(std::tolower(static_cast<unsigned char>(*chp)));
        }
    }

    if (!wd.empty())     // last word, if the text does not end in white-space
        words.add(wd);

    words.display();
}
1 different
1 a
1 also
2 sentence
1 the
2 in
2 words
2 are
2 these
@seeplus I don't think OP is at linked-list level yet. Or that this class is using high-level C++ constructs yet. Think:
const unsigned MAX_NUM_WORDS = 5000;      // maximum number of word/int pairs
char** words  = new char*[MAX_NUM_WORDS]; // array of (char array)
int*   counts = new int  [MAX_NUM_WORDS]; // array of int
unsigned num_words = 0;                   // how many word/int pairs do I have?
...
delete[] counts;                                    // delete array of int/count
while (num_words-- > 0) delete[] words[num_words];  // delete each word
delete[] words;                                     // delete array of words
I am unsure how OP is expected to get text from the file. One character at a time would be easy here...
If wanted, the entire file can be loaded into memory in one go:
The returned string can be treated as a 1-D array of the file's content.
auto s = read( argv[1] );
for (unsigned n = 0; n < s.size(); n++)
{
  const unsigned char c = s[n];  // the <cctype> functions need an unsigned char value
  if (std::isprint( c )) std::cout << "'" << s[n] << "' ";
  else                   std::cout << "^" << char( c ^ 0x40 ) << "  ";  // caret notation: '\n' prints as ^J
  if (std::isalnum( c )) std::cout << "is alpha-numeric.\n";
  else                   std::cout << "is neither a letter nor a digit.\n";
}
But, you can certainly just read from file directly:
char c;
while (f.get( c ))
{
  const unsigned char u = c;  // the <cctype> functions need an unsigned char value
  if (std::isprint( u )) std::cout << "'" << c << "' ";
  else                   std::cout << "^" << char( u ^ 0x40 ) << "  ";  // caret notation: '\n' prints as ^J
  if (std::isalnum( u )) std::cout << "is alpha-numeric.\n";
  else                   std::cout << "is neither a letter nor a digit.\n";
}
Yeah, who knows what he is allowed to do. This could be 5 lines or 500... and range from aggravating to simple depending on how a 'word' is defined.
I never liked the isdigit / isalpha etc. tools. You invariably end up checking the same value multiple times with those things... I prefer a lookup table of 256 entries with the desired results precomputed. That can even toupper / tolower WHILE determining what the character is, one op for all. If only Unicode would cooperate with that kind of processing :(
¿what's your issue?
¿are you able to extract all the words from input into an array?
¿are you able to extract a single word from input?
¿is the problem with expanding the array dynamically?
¿you do have the array but don't know how to count similar words?
¿you can count them but don't know how to sort them?
¿what's your issue?
> For example:
> Hello 0
> Hi 1
> There 2
> He 3
Suppose that your input was «He Hi He There He», I don't understand why your output has «Hello»
¿why did you count Hello?
I need to print the word occurrences in order from least to most frequent. Right now it prints from most to least and I can't figure it out. Any help will be greatly appreciated.
If anyone is on Discord, I'd like to join up with one of you and we can go over the issue on there. Just reply with your Discord and I'll shoot you a DM.