hash table, help with transfer

Pages: 12
Hi, I have a hash table consisting of char string. I do not know how to redo the numbers so I can upload to function Insert (9798), and compare them. Please help :)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
#include <stdio.h>
#include <string.h>
#include "list"
#include <stdlib.h>

class List
{
	private:
		struct ListItems
		{
			char* data;
			ListItems* next;
			ListItems* prev;
		};
		ListItems* head;

	public:
		List();
		bool Search(char* x);
		void Insert(char* x);
};

List::List()
{
	head = NULL;
}

bool List::Search(char *x)
{
	ListItems* p;
	for (p = head; p != NULL; p = p->next)
		if (p->data == x)
			return true;

	return false;
}

void List::Insert(char *x)
{
	ListItems* p;
	p = new ListItems;
	p->data = x;
	p->next = head;
	if (head != NULL)
		head->prev = p;
	head = p;
	p->prev = NULL;
}

/*
	Ukazka, jak implementovat hashovani se separatnim retezenim
	Program k clanku pro Linuxsoft
	Autor: Petr Sklenicka
*/

int Hash(char* data);
void InsertHash(char* data);
bool SearchHash(char* data);

const int n = 50;				// velikost tabulky
List HashTable[n];				// hashovaci tabulka

int main()
{
	// ukazka ulozeni a nasledneho vyhledani zaznamu

	
	InsertHash("some test");
	bool p = SearchHash("some test");
	p == true ? printf("Nalezeno\n") : printf("Nenalezeno\n");

	return 0;
}

int Hash(char* data)
{
	int i, hashKey;

	for (int i = hashKey = 0; i < 10; i++)
		hashKey = (3 * hashKey + data[i]) % n;
	
	return hashKey;
}

void InsertHash(char* data)
{
	int hashKey = Hash(data);
	HashTable[hashKey].Insert(data);
}

bool SearchHash(char* data)
{
	int hashKey = Hash(data);
	return HashTable[hashKey].Search(data);
}
I think you mean to process strings, but you are not. You are processing char pointers.

Conisder this code:
1
2
InsertHash("some test");
bool p = SearchHash("some test");

Functions InsertHash and SearchHash may or maynot be passed the same value. The compiler may chose to use String Pooling and use the same string in which case the two functions will receive the same value, or the compiler may not use that sort of optimisation and give you two different numbers.

The linked list class doesn't store strings either, it just stores char pointers.
@kbw

Actually, it looks ok, because if you look at Lines 77-82 (implementation of Hash), the OP only calculates the hash on data[i], the individual characters, so that's fine. More troubling is the hardcoding of 10 on Line 79.

As for storing char pointers, that's ok, too, if the OP isn't trying to mimic the value-semantics of STL.

@derata
The only problem I had when compiling your program was:

warning: deprecated conversion from string constant to 'char*'

on two lines.
But if I convert all the char* to const char*, the warning goes away, and your code seems to run fine (as expected).

I don't quite understand your question, though.

Are you asking about how to update Insert() so it works even if you have a Hash-collision?
(Excuse my English, I'm from Czech Republic)
Thank you. I fixed it differently. The program now reads characters from the file. But knows his words, so that puts each character into its own list. But I do not want. I need to be one word in one list. And if the word is repeated by adding it to the list.
For example: "Mouse, Mouse: Car. Computer," contains a list of Mouse, List of two computer ...

Use it for calculating the entropy so I need to know the number of different words and the same number of words.
I thought that distinguish the following words:
But it does not work, because each letter written elsewhere. How to solve this problem?


1
2
3
4
character = file->get(); //read character
   else
      return true; //other end of the word
   if (((character>'a')&&(character<'z'))||((character>'A')&&(character<'Z')))


function hash table, I use words as ASCII:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
#include <stdio.h>
#include "list"
#include <stdlib.h>
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

class List
{
private:
	struct ListItems
	{
		int data;
		ListItems* next;
		ListItems* prev;
	};
	ListItems* head;

public:
	List();
	bool Search(int x);
	void Insert(int x);
};

List::List()
{
	head = NULL;
}

bool List::Search(int x)
{
	ListItems* p;
	for (p = head; p != NULL; p = p->next)
		if (p->data == x)
			return true;

	return false;
}

void List::Insert(int x)
{
	ListItems* p;
	p = new ListItems;
	p->data = x;
	p->next = head;
	if (head != NULL)
		head->prev = p;
	head = p;
	p->prev = NULL;
}

int Hash(int* data);
void InsertHash(int data);
bool SearchHash(int data);

const int n = 1000000;				// velikost tabulky
List HashTable[n];				// hashovaci tabulka

int main()
{
    char str[256];
	cout << "Zadejte jmeno souboru: ";
    cin.get(str,256);
	
	ifstream soubor;

	soubor.open (str,ifstream::in );
	
	if (soubor.is_open())
	{
		cout << "VTisk reteyce char ye souboru primo bo to neumim vloyit do tabulky" << endl;

		while (!soubor.eof())
		{	    
			    int word;
                int character = soubor.get();       // get character from file
                if (soubor.good())
                cout << character;
				word = character;
				
				InsertHash(word);
		}
	  cout << endl;
	}

	// ukazka ulozeni a nasledneho vyhledani zaznamu

	bool p = SearchHash(98);
	p == true ? printf("Find\n") : printf("Not find\n");

	return 0;
}

int Hash(int data)
{
	int i, hashKey;


	return data;
}

void InsertHash(int data)
{
	int hashKey = Hash(data);
	HashTable[hashKey].Insert(data);
}

bool SearchHash(int data)
{
	int hashKey = Hash(data);
	return HashTable[hashKey].Search(data);
}


Last edited on
ok, I'm trying to understand your new code

so far, I've just learned a little Czech
znak == character
slovo == word

I will look at it some more, but you know you can accomplish what you want with stl map, right?

Also, can you post a sample text file that you are using to test and what you expect the output to be after you run your program?
I do not know how this is done using maps :( as you have some sample code?

Using a simple input file. test.txt 'Mouse, car, mouse"
I have a problem with the fact that each character is stored separately. I need to save the whole word as an index of the table, another index of the next word. Same words, same index or the same word + +

example

mouse - index 1 , (twice)
car - index 2

na výstupu očekávám celkový počet slov, a počet stejných slov. To be able to perform calculations of entropy

I do not know if I explained it well in English

The program sometimes works incorrectly. This is probably a mistake the size of a table and insert a character in ASCII
Last edited on
ok - this should be enough for you to get what you need with a little modification

you do, however, need to read up and understand stl map, vector, and iterators

I am posting this code because with your initial code for Hash, I assume that you are either an advanced student or you have had other programming languages before - so I expect you to be able to understand this code with a little study. If you have questions, please post.

junk.txt
Mouse Mouse Car Computer Computer


output
Car 2
Computer 3
Computer 4
Mouse 0
Mouse 1


main.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <iostream>
#include <string>
#include <fstream>
#include <map>
#include <vector>

using namespace std;

struct Position
{
  vector< unsigned > m_positions;
};

int main()
{
  map< string, Position > myMap;  // myMap is like a hash whose key is string, value is Position

  string word;
  unsigned idx = 0;  // the line-number of junk.txt starting from 0
  ifstream ifs;
  ifs.open( "junk.txt", ifstream::in );

  if ( ifs.is_open() ) {  // open file
    while ( !ifs.eof() ) {  // loop until eof 
      char character = static_cast<char>( ifs.get() );  // cast int back to char
			
      if ( ifs.good() ) {
        if ( isalpha( character )) { // still same word
          word.append( 1, character );  // add the character to end of word
        } else { // new word
          if ( myMap.find( word )==myMap.end() ) {  // not found
            Position p;  // instantiate a position record
            p.m_positions.push_back( idx );  // add idx to the end of the vector
            myMap[ word ] = p;  // assign (copy) the value p to the key word in myMap
          } else { // found
            myMap[ word ].m_positions.push_back( idx );  // add idx to the end of the vector
          }
          word = "";  // reset because we're finished with word  
          idx++;
        }
      }
    }
  }

  // output ---------------------------------------------------------------------
  map< string, Position >::const_iterator it1;  // declare an iterator for looping over map
  vector< unsigned >::const_iterator      it2;  // declare an iterator for looping over vector

  for ( it1=myMap.begin(); it1!=myMap.end(); ++it1 ) {  // iterate over key, value in myMap
    const string&   key   = it1->first;  // it1->first points to the key
    const Position& value = it1->second;  // it2->second points to the value
    for ( it2=value.m_positions.begin(); it2!=value.m_positions.end(); ++it2 ) {  // iterate over m_positions
      cout << key << " " << *it2 << endl;  // it2 points to the value (item in the vector)
    }
  }
  return 0;
}
Last edited on
Thank you very much. I'll try to remake it according to my needs. I hope I can handle it. If I had a problem I do the query.

Last edited on
np - gl

here are some useful links:

http://www.cplusplus.com/reference/stl/vector/
http://www.cplusplus.com/reference/stl/map/

vector is a contiguous chunk of memory, but it may grow - so it's a good replacement for array[]

map is a red-black tree, which technically is not a hashmap, but functions in a similar way
it allows you to map an object to another object ( string to Position ) in our case
Last edited on
Hi,I do not know why but I can send PM for you.

Could you comment on that code in detail?. To my understanding the code faster
ok - I have added more comments

please ask specific questions if there is something you do not understand
I do not know how to translate into Czech "instantiate a position record"


1
2
3
4
5
if ( myMap.find( word )==myMap.end() ) {  // not found
.........
} else { // found
            myMap[ word ].m_positions.push_back( idx );  // add idx to the end of the vector
          }


I do not know if I understand this: If I find the word. Give him the position. Assign the key value.
else:?

I have read something about the iterator and undestand. It is something as poiter for container. Listing of values ​​works: IT1 + + indicates a memory which is followed


Your code is better but harder for me to understand him :D :D
Last edited on
yes, iterators are very similar to pointers

you can think of myMap.end() as NULL so Line 1 above is saying that when I look for word in myMap, if the result is NULL, I didn't find it

so in that block you have in Line 2, I must initialize myMap[ word ] because right now, there is nothing there (not found). I initialize it by creating a Position instance and I add idx to the end of the vector inside this Position. By my Line 34, I initialize the value entry at myMap[ word ] by assigning my instance to it.

myMap[ word ] = p; // assign (copy) the value p to the key word in myMap

calls the assignment operator, but it actually copy-constructs p and puts it into myMap[word]

On my Line 36, it is simpler, because at myMap[word] (found), I already have a Position record there. Since it's there already, I can just add the new idx to the end of the vector:

myMap[ word ].m_positions.push_back( idx ); // add idx to the end of the vector
"instantiate a Position record"

ok - Lines 9-12 define a Position struct, but no memory is allocated/used

Line 32 actually allocates memory for an automatic/local variable of type Position called p
This is "instantiating a Position struct (or record)"
Last edited on
may someone help me w/hangman
idx = index?

So I understand the code (good start). In the morning, I'll write a function to list the number of words. and function of the number of identical words
(We have 2 in the morning)
Ps: I do not know what wrong with the PM ...
yes, idx is short for index

great - glad to hear that you understand it better - gl!

it's 7AM here 8^P I will need to sleep soon, too...
How to modify this section to print only the words that occurred the first time? Then "if" I would use to list quantity
I'm trying it this way. When the input "mouse mouse car"

1
2
3
4
 for ( it1=myMap.begin(); it1!=myMap.end(); ++it1 ) {
    const string&   key   = it1->first;
    const Position& value = it1->second;
    for ( it2=value.m_positions.begin(); it2!=value.m_positions.end(); ++it2 )


output would look like:

mouse - 2*
car - 1*
Total word - 3


The same quantity of words trying to use this:
1
2
3
4
5
6
if (key == key) 
		  {
			  int quantity = 1;
			  quantity++;
		      cout << quantity;
			 	 cout << key << " " << *it2 << endl; 





Last edited on

It's hard for me to learn how to work with these functions. I wanted to solve the problem of entropy differently than binary tree. But I guess I'll have to finish the tree. I do not have much time (1 day: D)
Pages: 12