std::map insertion causes error code 0xC0000005

Hi. I'm implementing a hash map for my own purposes, and my approach is to use a vector of maps
(vector< map<Type,Value> >), but I'm having trouble with this code when debugging:

ValueT& operator[](const KeyT& key){
    check_expand_need();
    auto bucket = find_or_allocate(key);
    cerr << 5 << " ";
    auto it = table[bucket].find(key);
    ValueT val = ValueT();
    cerr << 6 << " ";
    if (it == table[bucket].end()){
        cerr << 6.5 << " ";
        table[bucket].insert(Pair(key, ValueT())); // line 10: where it crashed
        cerr << 7 << "\n";
        _num_filled++;
        return val; // line 13
    }
    cerr << 7.5 << "\n";
    return it->second;
}


And I traced it on stderr: it printed up to 6.5, which means it crashed on the map insertion.

Do you have suggestions on how to fix this problem?

My project is to make a hash map just like unordered_map, with separate chaining for hash collisions, except that each bucket is a std::map. I'd also like to know how the original implementation handles hash collisions and avoids this kind of memory violation :<<
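Roughly, this is the layout I have in mind (a simplified sketch; the actual names in my code may differ):

#include <map>
#include <vector>

// Each bucket is a std::map, i.e. separate chaining where every
// chain is a balanced tree instead of a linked list.
template <typename KeyT, typename ValueT>
struct MyHashMap {
    using BucketT = std::map<KeyT, ValueT>;
    std::vector<BucketT> table;   // one map per bucket
    size_t _num_filled = 0;       // total number of stored elements
};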

Thanks for your help.
It's difficult to say what the problem is without seeing the rest of the code. The only obvious mistake that I can see is that you return a reference to a local variable on line 13.
Check that you're not trying to access, change, or return an element of a vector that is either out of bounds or uninitialized.
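One quick way to catch a bad index while debugging is vector::at, which bounds-checks and throws std::out_of_range instead of silently corrupting memory (a sketch against the names in the code above):

auto it = table.at(bucket).find(key); // throws if bucket is out of range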
@Peter87

Yes, line 13 gives a warning about returning a reference to a local variable, but I don't want to do a second find after the map insertion, to keep the complexity of operator[] down.

But the problem is that I'm not sure each bucket of a vector can be another STL container. That should be OK, right?
@HOOG0

I have checked the code and it cannot go out of bounds, since I have a mod function for the vector's buckets, just like a hash map.

But I'm not sure whether each bucket can be another STL container. I only call vector.reserve(); for the bucket (the map) I just do a normal insert. So I think the problem is in the insertion: it inserts the first element and crashes with that error code.
Print the value of bucket to be certain that it isn't out of bounds.

Yes, you can store a map in a vector.

Just because the crash is at line 10 doesn't mean that the bug is there. It's entirely possible that you've corrupted the heap earlier. I'd start by fixing the reference to the local at line 13.

std::map has a [] operator that inserts a default-initialized value if none is present for the key. Why not use it?
ValueT& operator[](const KeyT& key){
    check_expand_need();  
    auto bucket = find_or_allocate(key);
    return table[bucket][key];
}
Yes, line 13 gives a warning about returning a reference to a local variable, but I don't want to do a second find after the map insertion, to keep the complexity of operator[] down.

The insert function returns a std::pair where the first member is an iterator to the newly inserted element. Use that to return a reference to the value in the map.

auto p = table[bucket].insert(Pair(key, ValueT()));
return p.first->second;
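Putting the two pieces together, operator[] could look something like this (a sketch reusing the names from the original code; note that _num_filled should only grow when the key is genuinely new, which the bool in insert's return value tells you):

ValueT& operator[](const KeyT& key){
    check_expand_need();
    auto bucket = find_or_allocate(key);
    auto p = table[bucket].insert(Pair(key, ValueT()));
    if (p.second) _num_filled++; // true only when the key was newly inserted
    return p.first->second;      // a reference into the map, not into a local
}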


I have checked the code and it cannot go out of bounds, since I have a mod function

Note that the % operator returns the remainder after division, which means that if one of the operands is negative you will get a negative result.
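For example (hypothetical values; C++ integer division truncates toward zero):

int h = -7;
int b1 = h % 4;              // -3: not a valid bucket index!
int b2 = ((h % 4) + 4) % 4;  // 1: one common way to force a non-negative index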


I only call vector.reserve()

Note that reserve only allocates memory; it does not construct any objects. You need to actually size the vector (using the constructor or the resize function) or insert the elements (using push_back or similar) before you can actually use the vector's elements.
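A quick illustration of the difference (a hypothetical example):

std::vector<std::map<int, int>> v;
v.reserve(8);    // capacity() >= 8, but size() == 0: no maps constructed yet
// v[0][42] = 1; // undefined behaviour: v[0] does not exist
v.resize(8);     // now 8 empty maps actually exist
v[0][42] = 1;    // OK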
@dhayden

I have tested the bucket value and it is in bounds: I take the hash value modulo the bucket count, so the bucket index can't go out of range.

Yes, my big concern is the allocation, the heap allocation you mentioned earlier. Hmm, I think this piece of code may contain the bug:

void reserve(size_t num_elems){
    size_t required_buckets = num_elems + (num_elems >> 1) + 1;
    if (required_buckets <= _num_buckets) return;
    size_t num_buckets = 4;
    while (num_buckets < required_buckets) { num_buckets <<= 1; }
    cerr << 1 << " ";
    vector<PairT> old_table(table);  /// the temp one
    vector<PairT>().swap(table);     /// line 8: deallocate
    table.reserve(num_buckets);      /// make the reserve

    auto old_num_buckets = _num_buckets;
    if (table.capacity() != num_buckets) {
        vector<PairT>().swap(table);
        throw std::bad_alloc(); /// oh no :(
    }
    cerr << 2 << " ";

    _num_filled  = 0;
    _num_buckets = num_buckets;
    _mask        = _num_buckets - 1;
    _max_probe_length = -1;
    cerr << 3 << " ";
    /// the table is the main thing, old_table is temporary
    for (size_t src_bucket = 0; src_bucket < old_num_buckets; src_bucket++) { /// line 23
        for (auto it = old_table[src_bucket].begin();
             it != old_table[src_bucket].end(); it++) {
            auto bucket = find_empty_bucket(it->first);
            table[bucket][it->first] = it->second; /// this is where the rehashing happens
            _num_filled++;
        }
    }
    vector<PairT>().swap(old_table); /// for deletion
    cerr << 4 << " ";
}


And stderr prints up to 4, which means it goes all the way through, but the insertion still crashes on the first attempt.

So can you give me some suggestions about this?

Advice from anyone else is welcome too.
You want to use resize instead of reserve, and size instead of capacity.
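Applied to the code above, that would be something like this (a sketch, keeping the original names):

vector<PairT>().swap(table); // drop the old storage
table.resize(num_buckets);   // constructs num_buckets empty maps; throws on failure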
Line 8: Why not just do table.clear()?

At line 23, are you certain that old_table contains old_num_buckets buckets? Why are you recording old_num_buckets anyway? Why not just use old_table.size()?

Can you post the code for find_empty_bucket? Since you're in the middle of rehashing, I wonder if it loses track of how many buckets exist.
@Peter87

OK, now I get it: after the allocation the buckets contain "nullptr", not a map itself. OK, I will check it.

Ah, since this was an experiment, I made a strong test case that makes unordered_map TLE, and it doesn't contain any negative numbers, so the hash value is guaranteed not to be negative.

Anyway, that is an interesting fact, so I will check it.
@dhayden

Yes, line 8: at first I used table.clear(), but I also wanted to free up the memory, so I used the swap-with-an-empty-vector trick that I learned from Stack Overflow. Anyway, this is related to @Peter87's advice, so I will look into the problem again.

Line 23: again, that may be my misunderstanding about the allocation. I thought that after calling reserve I would have some buckets for the hash map, but I didn't know that they are "nullptr".

Ah, as for find_empty_bucket, it didn't give any error, but anyway:

size_t find_empty_bucket(const KeyT& key){
    auto hash_value = _hasher(key);
    int offset = 0;
    auto bucket = hash_value & _mask;
    while (true) {
        if (table[bucket].empty()){ /// optimization
            if (offset > _max_probe_length)
                _max_probe_length = offset;
            return bucket;
        }
        if (table[bucket].size() < _limit){ /// tune up
            if (offset > _max_probe_length)
                _max_probe_length = offset;
            return bucket;
        }
        bucket++; offset++;
        if (bucket == _num_buckets) bucket = 0;
    }
}
OK, thanks guys, the problem is fixed now :)). I had some misunderstanding about reserve and allocation, which caused this failure.

But sadly :(, this hash table is very slow, even slower than unordered_map. I wanted to use the fast lookup of map; I even limited each bucket to 4 elements, but it's still slow :(

Maybe open addressing would be better.
OK, now I get it: after the allocation the buckets contain "nullptr", not a map itself

It contains uninitialized memory where it can construct the maps (i.e. sizeof(std::map<Type,Value>) * table.capacity() bytes). Even if the maps are implemented using pointers, those will not be null, because they have not been initialized yet.

But sadly :(, this hash table is very slow, even slower than unordered_map. I wanted to use the fast lookup of map; I even limited each bucket to 4 elements, but it's still slow :(

Trying to beat unordered_map (which is also a hash table) will not be easy. It has been implemented by people who know their stuff much better than you (no offense). I've heard some complaints about unordered_map, that it is forced to use a linked list for each bucket, which makes it not so "cache-friendly". Using a binary search tree (which is how std::map is implemented) would have similar "problems". Anyway, I don't think that means unordered_map is terribly slow in general, especially not for small test programs where everything fits in cache.
this hash table is very slow, even slower than unordered_map. I wanted to use the fast lookup of map
If you have a good hashing function and the right number of buckets, unordered_map will be faster than map because you'll compare fewer elements.

In my experience, you usually know ahead of time roughly how many items to expect in a hash table and you can initially size it with an appropriate number of buckets. Or you know when the table is full (or nearly so) and you can adjust the number of buckets then. I aim for buckets = num_items/4.
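For example, with std::unordered_map you can apply that rule of thumb up front (hypothetical numbers):

std::unordered_map<std::string, int> m;
m.max_load_factor(4.0f); // about 4 items per bucket, i.e. buckets = num_items/4
m.reserve(100000);       // size the bucket array for ~100000 expected items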

Hash tables can be bad if hash() takes a lot more time than operator<(). For example, if the key is an array of 100 integers then you might be able to tell that one element is less than the other after comparing 1 or 2 values in the array, but hash() might always require you to look at all 100 values.
dhayden wrote:
Hash tables can be bad if hash() takes a lot more time than operator<(). For example, if the key is an array of 100 integers then you might be able to tell that one element is less than the other after comparing 1 or 2 values in the array, but hash() might always require you to look at all 100 values.


The < operator of many (all?) of the standard library containers does lexicographical comparison, which means it has to step through each pair of elements in order until it finds a mismatch (or until the end of one of the containers is reached). This is unfortunately not always optimal. In the general case it would be better to first compare the sizes, and only step through the elements if the sizes are the same. For file paths, which are likely to share the same initial sequence, it would also make sense to compare the characters in reverse order.

struct PathCmp
{
	bool operator()(const std::string& path1, const std::string& path2) const
	{
		// Compare sizes first; only look at characters if the sizes match.
		if (path1.size() != path2.size())
		{
			return path1.size() < path2.size();
		}
		
		// Same length: compare in reverse, since paths often share a prefix.
		for (auto it1 = path1.rbegin(), it2 = path2.rbegin(); it1 != path1.rend(); ++it1, ++it2)
		{
			if (*it1 != *it2)
			{
				return *it1 < *it2;
			}
		}
		
		return false; // equal strings: must return false for a strict weak ordering
	}
};

//std::map<std::string, int> pathToIntMap;
std::map<std::string, int, PathCmp> betterPathToIntMap;
You can compare 8 bytes or fewer at a go with a 64-bit int. That should be roughly a linear 8x speedup on average if the strings are usually 8+ characters long.
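A sketch of the idea (a hypothetical helper, not from this thread): memcpy each chunk into a 64-bit integer and compare chunk by chunk. Chunked equality is byte-order independent; a chunked less-than would additionally need a byte swap on little-endian machines.

#include <cstdint>
#include <cstring>
#include <string>

bool chunkedEqual(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    size_t i = 0;
    for (; i + 8 <= a.size(); i += 8) {
        std::uint64_t x, y;
        std::memcpy(&x, a.data() + i, 8); // memcpy avoids alignment issues
        std::memcpy(&y, b.data() + i, 8);
        if (x != y) return false;
    }
    for (; i < a.size(); ++i)             // compare any leftover tail bytes
        if (a[i] != b[i]) return false;
    return true;
}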

What exactly are you looking to do again?
If you are only inserting a small # of known values, a DIY lookup table is going to be faster than a map. If the values are not known, DIY is going to be challenging.
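For instance, if the keys are a small fixed set known at compile time (a hypothetical example):

enum Color { Red, Green, Blue, ColorCount };

// One flat array replaces the whole map; lookup is a single indexed load.
int colorValue[ColorCount] = { 0xFF0000, 0x00FF00, 0x0000FF };

int v = colorValue[Green]; // 0x00FF00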