I don't understand what does this piece

I looks like it takes a string and makes sure the string exists in the linked list one character at a time, each node holding a character of the string.

line 11 checks if the character is in the unordered_map. If it is, don't do anything, if not, then add it.

This can be useful for keeping a dictionary of words. You can go through the linked list and check if you have a certain word available or not.

Good ways to understand code are

a) trace through the code in the debugger if the code is part of a compilable program. Look at the code flow and the values of the various variables etc.

b) trace through the code using a pen & paper. Follow the code as it would be executed and keep track of variables etc to see what is being done.

Do you guys know why the program is using an unordered_map when the second member of the map isn't used ?

Last edited on

Unordered map is just convenient for keeping an array of characters. It also let's you follow the characters to see what string you create from it.

For example, say you wanted to see if you had the word "hello" in your tree.

First, search the starting node (root node) for 'h', if you find it, enter the Tree node associated with that character. Then search for 'e' in that node, if you find it, enter the Tree node associated with that character, and so on.

Basically, every character has a Tree node which will lead you to all the other characters that come after it which will replicate the strings as they were inserted.

The second member is used.

L12 and L14 both refer to the second member. The syntax

map[x] = y

means for the key (char in this case) specified by x, set the value (ie the second member Tree*) to y.

y = map[x];

means for the key specified by x, set y to the value (ie the second member)

In both cases, if x isn't already an existing key, then the item is created.

Thanks guys for very detailed answers, another question is, I modify the program so it can add the meaning to the word (like a dictionary):

struct Tree{
    unordered_map<char, Tree*> map;
    string meaning;
};

void insert(Tree*& root, const string& str, const string& meaning) {
    Tree* temp = root;
    for (int i = 0; i < str.length(); i++) {
        char x = str[i];
        if (temp->map.find(x) == temp->map.end())
            temp->map[x] = newNode();

        temp = temp->map[x];
    }
    temp->meaning = meaning;
}

It means each character of a word is stored in a node of the tree, but for the meaning, where does the tree store them ?

Last edited on

This *can* work, but you need a boolean value as well. The boolean value will be to indicate whether the char is the last letter in the word.

So if you input "hello", 'o' would have the bool set to true and you could the string a value. Note that a bool value of true doesn't mean there aren't more letters after - there could be another word inserted that's longer, for example:

"at" and "attic".

Thank you, I understand what you're saying, so I modify my program:

struct Tree{
    unordered_map<char, Tree*> map;
    string meaning;
    bool isEndOfWord;
};

void insert(Tree*& root, const string& str, const string& meaning) {
    if (root == NULL)
        root = newNode();

    Tree* temp = root;
    for (int i = 0; i < str.length(); i++) {
        char x = str[i];
        if (temp->map.find(x) == temp->map.end())
            temp->map[x] = newNode();

        temp = temp->map[x];
    }

    temp->meaning = meaning;
    temp->isEndOfWord = true;
}

int main()
{
    Tree* root = NULL;

    insert(root, "dictionary", "a book giving information on particular subjects");
    insert(root, "computer", "a device designed to perform calculations");
}

Can you guys explain when running the main function, what will the tree look like? What is the tree's root storing ? Thanks for your help

The root will be storing the first letter of every single word you insert. Each of those characters will then have its own branch leading down to its word and other words which begin with the same letters.

For example, if you input "at" "attic" "hello" "coffee" "coke":

https://pasteboard.co/sQvWCjwDL0Lc.png

At least, I believe that's what it would look like, I haven't run the code.

Last edited on

Are these words in the same tree or different trees ?

Last edited on

Duthomhas (13310)

Adding a meaning marks the end of word. You don't need an extra boolean value.

Also, the root node has no associated character or meaning. All words begin with children of the root.

I have left this thread alone a few times without expressing the disquiet deep, deep in my soul for seeing a C++ container used with manual memory management. It hurts.

Especially as an unordered_map automagically does it for you. Whomever wrote that original piece of code is actually going out of his|her way to avoid letting C++ do things for you.

Here is how I would do it. Pretty sure this is C++11, but the oldest I could check it against was C++14. You should be compiling against the latest standard you can anyway.

#include <string>
#include <unordered_map>
#include <vector>

using Meaning = std::string;  // I would personally use a more interesting structure...
                              // but for this example simple strings will do.
struct Words
{
  using Children = std::unordered_map <char, Words> ; // (behold, automatic self-referential memory management!)
  using Meanings = std::vector <Meaning>            ; // (because words often have multiple meanings)

  Children children;
  Meanings meanings;
};

As a matter of The Stupid™ making life stupid, the GNU Standard Library has a bug in this regard. If you are compiling against it (for example, if you are compiling on Linux) then it won’t work. Either use std::map or use Boost’s implementation.

1
2
3

#include <map>
...
  using Children = std::map <char, Words> ;

1
2
3

#include <boost/unordered_map.hpp>
...
  using Children = boost::unordered_map <char, Words> ;

Maybe someday they’ll fix it. Anyhow...

This tree structure is called an "n-ary tree", meaning that each node may have any number of children, including zero.

A node with zero children is a "leaf", of course, which clearly signifies the end of a word. But as already said in this thread, a word also may end somewhere other than a leaf node.

Your desire to add information about the meaning of a word serves a good purpose: the presence of associated meaning(s) also clearly signifies the end of a word.

With those preliminaries out of the way, let us revisit the Insert function and fix a couple things:

void Insert( Words & words, const std::string & word, const Words::Meanings & meanings )
{
  // Seek (add, if necessary) nodes for each letter in the word.
  // In the end, node will point to the final letter in the word
  // (which may or may not be a leaf node).
  Words * node = &words;
  for (auto c : word)
  {
    Words & parent = *node;          // the current node is a parent node
    node = &(parent.children[ c ]);  // (or if it isn't already then make it one)
  }
  // Now that we have our 'last letter in the word' node, add the
  // word's meanings. Again C++ makes life really, really easy here.
  node->meanings = meanings;
}

That parent node bit makes things a little easier to grok, but it could have just as easily been written as:

1
2

  for (auto c : word)
    node = &(node->children[ c ]);

It compiles to exactly the same thing. The important part is the std::unordered_map::operator [], which will automatically insert a new node if the argument index does not already exist. Convenient, since that is exactly the behavior we want for the Insert function.

Finding a word is similar, but with one significant difference:

const Words::Meanings & FindWord( Words & words, const std::string & word )
{
  // If the 'word' is not in our 'words', then we have no meanings to return.
  // We cannot return a null reference, so here is our empty list of meanings:
  static Words::Meanings no_meanings;

  // Finding the last node in the word is similar to before, except now
  // we need to be a little more careful not to add a node if it isn't
  // already there.
  //
  // The original code made the find and add separate actions, but all that
  // did was make the lookup happen _twice_. Instead we will just lookup once.
  //
  Words * node = &words;
  for (auto c : word)
  {
    auto child = node->children.find( c );
    if (child == node->children.end())
      return no_meanings;
    node = &(child->second); // (this is how std::unordered_map iterators work)
  }
  // If we get this far then we can return the word's list of meanings.
  // (If it is a word, it will have a list of meanings. If it is NOT a word,
  // it will have an empty list of meanings. Get it? Life is easy.)
  return node->meanings;
}

While I am making a bit of a stink about performing two lookups, in the case of unordered_map it doesn’t actually matter much. A lookup is an O(1) (constant time) operation.

Let us put it all to the test!

#include <iostream>

int main()
{
  Words words;
  Insert( words, "hello",
  {
    "excl. A greeting, usually spoken.",
    "n.    Reference to such a greeting.",
  } );
  Insert( words, "hell",
  {
    "n.    Spiritual suffering, or a place where such suffering occurs.",
    "n.    The place where wicket souls go to suffer after death (religious).",
    "excl. Utterance used to express annoyance, surprise, and/or emphasis.",
  } );
  Insert( words, "jello",
  {
    "n.    a fruit-flavored gelatin dessert made from a commercially prepared powder. See \"Jell-O\"."
  } );
  Insert( words, "Jell-O",
  {
    "      ...is delicious. Eat some:\n"
    "        1 pkg Green Jell-O\n"
    "        1 cup cottage cheese\n"
    "        8 oz  crushed pineapple\n"
    "        Make the jello as per instructions.\n"
    "        Refrigerate for 30 minutes to 1 hour.\n"
    "        Stir in cottage cheese and pineapple.\n"
    "        Refrigerate until fully set.\n"
    "        Serve on a lettuce leaf."
  } );

  auto PrintMeanings = [&words]( auto word )
  {
    auto meanings = FindWord( words, word );
    if (meanings.empty())
      std::cout << "\n\"" << word << "\" is not a word in my vocabulary.\n";
    else
    {
      std::cout << "\n" << word << ":\n";
      for (auto meaning : meanings)
        std::cout << "  " << meaning << "\n";
    }
  };
  PrintMeanings( "hello" );
  PrintMeanings( "hell" );
  PrintMeanings( "jello" );
  PrintMeanings( "jell" );
  PrintMeanings( "Jell-O" );
}

Enjoy!

I have left this thread alone a few times without expressing the disquiet deep, deep in my soul for seeing a C++ container used with manual memory management. It hurts.

Seconded! I guess this code was found somewhere on the Internet...

The moral of the story is that code found on the Internet may not be the 'right' way to do things and should be treated with suspicion until proven innocent.

Last edited on

Adding a display function for the ADT can be instructive. Consider:

#include <string>
#include <unordered_map>
#include <vector>
#include <iostream>
#include <cctype>

using Meaning = std::string;

struct Words {
	using Children = std::unordered_map <unsigned char, Words>;
	using Meanings = std::vector <Meaning>;

	Children children;
	Meanings meanings;
};

void Insert(Words& words, const std::string& word, const Words::Meanings& meanings) {
	auto* node { &words };

	for (unsigned char c : word)
		node = &(node->children[static_cast<unsigned char>(std::tolower(c))]);

	node->meanings = meanings;
}

const Words::Meanings& FindWord(const Words& words, const std::string& word) {
	static Words::Meanings no_meanings;

	const auto* node { &words };

	for (unsigned char c : word)
		if (const auto child { node->children.find(static_cast<unsigned char>(std::tolower(c))) }; child != node->children.end())
			node = &(child->second);
		else
			return no_meanings;

	return node->meanings;
}

void display(const Words& words) {
	if (!words.meanings.empty())
		std::cout << "* ";

	for (const auto& [let, child] : words.children) {
		std::cout << let;
		display(child);
	}
}

int main() {
	Words words;

	Insert(words, "hello",
		{
		  "excl. A greeting, usually spoken.",
		  "n.    Reference to such a greeting.",
		});

	Insert(words, "hell",
		{
		  "n.    Spiritual suffering, or a place where such suffering occurs.",
		  "n.    The place where wicket souls go to suffer after death (religious).",
		  "excl. Utterance used to express annoyance, surprise, and/or emphasis.",
		});

	Insert(words, "jello",
		{
		  "n.    a fruit-flavored gelatin dessert made from a commercially prepared powder. See \"Jell-O\"."
		});

	Insert(words, "Jell-O",
		{
		  "      ...is delicious. Eat some:\n"
		  "        1 pkg Green Jell-O\n"
		  "        1 cup cottage cheese\n"
		  "        8 oz  crushed pineapple\n"
		  "        Make the jello as per instructions.\n"
		  "        Refrigerate for 30 minutes to 1 hour.\n"
		  "        Stir in cottage cheese and pineapple.\n"
		  "        Refrigerate until fully set.\n"
		  "        Serve on a lettuce leaf."
		});

	const auto PrintMeanings { [&words](const auto& word) {
		if (const auto meanings { FindWord(words, word) }; meanings.empty())
			std::cout << "\n\"" << word << "\" is not a word in my vocabulary.\n";
		else {
			std::cout << "\n" << word << ":\n";
			for (const auto& meaning : meanings)
				std::cout << "  " << meaning << "\n";
		} } };

	display(words);

	return 0;

	PrintMeanings("hello");
	PrintMeanings("hell");
	PrintMeanings("jello");
	PrintMeanings("jell");
	PrintMeanings("Jell-O");
}

which displays:


hell* o* jello* -o*

where * indicates a meaning is available.

NB This would usually be all wrapped into a class with .insert(), .find() etc methods.

Last edited on

Duthomhas (13310)

You changed it in a significant way: “jell-o” is not a word. “Jell-O” is. Capitalization matters.

Also, I bet this would be an interesting place for coroutines. Maybe I’ll mess with that in a bit (now that I’ve installed MSVC 2020, heh).

Jell-o is not an English word... It's Jelly :)

jonnin (11497)

its a word like pepsi is a word, a brand name adopted into the extended language, or a proper noun that is near universally known.

I got nothing on the container. Storing strings this way may save a little space of repeated sequences but was it worth it with all the weirdness? You would probably save more space by tying the same definition string to multiple synonyms with a token that can be replaced with the word in question (eg #w# ... "n. A greeting: Joebob gave a loud "#w#" when entering the room". replace with hello, wazzup, howdy, ... etc)
this saves like 30 characters every repeat, while hell vs hello saves 1.

Duthomhas (13310)

“Jell-O” is a proper noun (a registered trademark), and “jello” is a common noun (a generic trademark). They are distinct words. Google around "trademark degeneration" for more.

Honestly, storing a list of words this way doesn’t save you much for the grief. Just have a list of words in a hash map. Or at the very simplest, a deque you can binary search.

Point is, there’s no point trying to optimize something outside of a proper use case. The current use case is academic study.

Optimizing strings is a sticky problem, but I bet storing definitions with a dictionary could save a lot of space by simply using an indexed lookup for each word in the sentence.

[edit]
Also, I have learned C++ coroutines do not play nice with recursion, and according to Mr Chen, aren’t likely to any time soon. It can be done, but may not be worth the grief — especially as an recursive routine can be rewritten as iterative... which is easier said than done for anything non-trivial.

Last edited on