Hi. I'm implementing a hash map for my own purposes, and my approach is to use a vector of maps
(vector< map<Type,Value> >). I'm running into trouble with this code while debugging:
 1  ValueT& operator[](const KeyT& key){
 2      check_expand_need();
 3      auto bucket = find_or_allocate(key);
 4      cerr << 5 << " ";
 5      auto it = table[bucket].find(key);
 6      ValueT val = ValueT();
 7      cerr << 6 << " ";
 8      if (it == table[bucket].end()){
 9          cerr << 6.5 << " ";
10          table[bucket].insert(Pair(key, ValueT())); // where it crashed
11          cerr << 7 << "\n";
12          _num_filled++;
13          return val;
14      }
15      cerr << 7.5 << "\n";
16      return it->second;
17  }
I tested with the stderr output, and it printed up to 6.5, which means it crashed on the map insertion.
Do you have any suggestions on how to fix this problem?
My project is to make a hash map just like unordered_map, using separate chaining for hash collisions, except that each bucket is an STL map. I'd also like to know how the original is implemented, especially the collision handling, and how it avoids the memory violation :<<
It's difficult to say what the problem is without seeing the rest of the code. The only obvious mistake that I can see is that you return a reference to a local variable on line 13.
Yes, line 13 gives a warning about returning a reference to a local variable, but I don't want to do a second find after the map insertion, to keep the complexity of operator[] down.
But what I'm not sure about is whether each bucket in the vector can be another STL container. Is that OK?
I have checked the code and it cannot go out of bounds, since I use a mod function on the bucket index, just like a hash map.
But I'm not sure whether each bucket can be another STL container. I only call vector.reserve(); on the bucket (the map) I just do a normal insert. So I think the problem is the insertion: it crashes with that error code on inserting the very first element.
Print the value of bucket to be certain that it isn't out of bounds.
Yes, you can store a map in a vector.
Just because the crash is at line 10 doesn't mean that the bug is there. It's entirely possible that you've corrupted the heap earlier. I'd start by fixing the reference to the local at line 13.
std::map has a [] operator that inserts a default-initialized value if none is present for the key. Why not use it?
Yes, line 13 gives a warning about returning a reference to a local variable, but I don't want to do a second find after the map insertion, to keep the complexity of operator[] down.
The insert function returns a std::pair where the first member is an iterator to the newly inserted element. Use that to return a reference to the value in the map.
auto p = table[bucket].insert(Pair(key, ValueT()));
return p.first->second;
I have checked the code and it cannot go out of bounds, since I use a mod function
Note that the % operator returns the remainder after division, which means that if one of the operands is negative you may get a negative result.
I only call vector.reserve()
Note that reserve only allocates memory; it does not construct any objects. You need to actually size the vector (using a constructor or the resize function) or insert the elements (using push_back or similar) before you can use the vector elements.
OK, now I get it. So after the allocation, the buckets contain "nullptr", not a map itself. OK, I will check it.
Ah, since this was an experiment, I made a strong test case that makes unordered_map TLE, and it doesn't contain negative numbers, so the hash value is guaranteed not to be negative.
Anyway, that is an interesting fact, so I will check it.
Yes, line 8: at first I used table.clear(), but I also wanted to free up the memory, so I used the trick of swapping with an empty vector, which I learned from Stack Overflow. But anyway, that is related to @Peter87's advice, so I will check this problem again.
Line 23, again, that may be my misunderstanding about the allocation. I thought that after calling reserve I would have some buckets for the hash map, but I didn't know that they are "nullptr".
Ah, as for find_empty_bucket, it didn't give any error. Anyway:
size_t find_empty_bucket(const KeyT& key){
    auto hash_value = _hasher(key);
    int offset = 0;
    auto bucket = hash_value & _mask;
    while (true) {
        if (table[bucket].empty()){ /// optimization
            if (offset > _max_probe_length)
                _max_probe_length = offset;
            return bucket;
        }
        if (table[bucket].size() < _limit){ /// tune up
            if (offset > _max_probe_length)
                _max_probe_length = offset;
            return bucket;
        }
        bucket++; offset++;
        if (bucket == _num_buckets) bucket = 0;
    }
}
OK, thanks guys, the problem is fixed now :)). I had a misunderstanding about reserve and allocation, which caused this failure.
But sadly :(, this hash table is very slow, even slower than unordered_map. I wanted to use the fast lookup of map; I even limited each bucket to 4 elements, but it's still slow :(
OK, now I get it. So after the allocation, the buckets contain "nullptr", not a map itself
It contains uninitialized memory where it can construct the maps (i.e. sizeof(std::map<Type,Value>) * table.capacity() bytes). Even if the maps are implemented using pointers those will not be null because they have not been initialized yet.
But sadly :(, this hash table is very slow, even slower than unordered_map. I wanted to use the fast lookup of map; I even limited each bucket to 4 elements, but it's still slow :(
Trying to beat unordered_map (which is also a hash table) will not be easy. It has been implemented by people who know their stuff much better than you (no offense). I've heard some complaints about unordered_map, that it is forced to use a linked list for each bucket, which makes it not so "cache-friendly". Using a binary search tree (which is how std::map is implemented) would have similar "problems". Anyway, I don't think it means unordered_map is terribly slow in general, especially not for small test programs where everything fits in cache.
this hash table is very slow, even slower than unordered_map. I wanted to use the fast lookup of map
If you have a good hashing function and the right number of buckets, unordered_map will be faster than map because you'll compare fewer elements.
In my experience, you usually know ahead of time roughly how many items to expect in a hash table and you can initially size it with an appropriate number of buckets. Or you know when the table is full (or nearly so) and you can adjust the number of buckets then. I aim for buckets = num_items/4.
Hash tables can be bad if hash() takes a lot more time than operator<(). For example, if the key is an array of 100 integers then you might be able to tell if one element is less than the other after comparing 1 or 2 values in the array, but hash() might always require you to look at all 100 values.
Hash tables can be bad if hash() takes a lot more time than operator<(). For example, if the key is an array of 100 integers then you might be able to tell if one element is less than the other after comparing 1 or 2 values in the array, but hash() might always require you to look at all 100 values.
The < operator of many (all?) of the standard library containers does lexicographical comparison, which means it will have to step through each pair of characters in order until it finds a mismatch (or until the end of one of the containers is reached). This is unfortunately not always optimal. In the general case it would be better to first compare the size, and only step through the elements if the sizes are the same. For file paths, which are likely to have the same initial sequence, it would also make sense to compare the characters in reverse order.
you can compare 8 bytes or less at a go with a 64 bit int. Should be a linear 8x speedup on the average if the strings are usually 8+ long.
What exactly are you looking to do again?
If you are only inserting a small # of known values, a DIY lookup table is going to be faster than a map. If the values are not known, DIY is going to be challenging.