Exit condition for 'std::unordered_map' iterator loop with a step size > 1

Suppose I have an std::unordered_map containing a large number of elements. I need to process each element and I want to speed up the overall process by using n threads, so that each thread only needs to process map.size() / n elements. The idea is that thread #id will start at an offset of id, where id is the zero-based thread index. Also, all threads need to advance their iterators with a step size of n. The iterator can be advanced by k postions by using std::next(iter, k). So, the per-thread loop would look pretty much like this:
1
2
3
4
5
6
7
8
9
10
static std::unordered_map<K, V> g_input;

void thread_main(void)
{
    for (std::unordered_map<K, V>::const_iterator iter = std::next(g_input.cbegin(), id);
        iter != g_input.cend(); iter = std::next(iter, n))
    {
        /* ... */
    }
}


But: What is the proper (recommended) way to check the loop exit condition, in this scenario?

Clearly, iter != map.cend() is wrong for a step size greater than 1.

I think that an invocation of std::next(iter, k) that would move the iterator past map.cend() is undefined behavior 😨

Also, a check like iter >= map.cend() is not supported...
Last edited on
If you use std::ranges::next (added in C++20) instead of std::next then you can pass the end iterator as a third argument. Then it will return the end iterator instead of causing undefined behaviour.

1
2
3
4
5
for (std::unordered_map<K, V>::const_iterator iter = std::next(g_input.cbegin(), id);
    iter != g_input.cend(); iter = std::ranges::next(iter, n, g_input.cend()))
{
    /* ... */
}

https://en.cppreference.com/w/cpp/iterator/ranges/next
Last edited on
Note that if id > g_input.size() then you would still run into the same problem when initializing iter so if you haven't ensured this cannot happen before entering the loop then you might want to also change the first std::next in the same way.
Last edited on
I need to process each element and I want to speed up the overall process


Have you considered std::for_each using a parallel execution policy?
https://en.cppreference.com/w/cpp/algorithm/for_each
https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag
https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t
What is in the unordered_map? What is the key type, value type; how big is each thing? How expensive, processing-wise, is the body of each iteration? I think we need more context to get a better idea of what is going on.
Maybe that fancy execution_policy stuff could help, but I imagine you will get tons of cache misses while iterating through an unordered_map.

If you could pre-sort the stuff in the unordered_map into a regular array/vector and then process on that, it may be faster.
Last edited on
If you use std::ranges::next (added in C++20) instead of std::next then you can pass the end iterator as a third argument. Then it will return the end iterator instead of causing undefined behaviour.

Didn't know about std::ranges::next(). Exactly what I was looking for 👍


Have you considered std::for_each using a parallel execution policy?

Nope. But I will give it a try!


What is in the unordered_map? What is the key type, value type; how big is each thing? How expensive, processing-wise, is the body of each iteration? I think we need more context to get a better idea of what is going on.

It contains hash values (uint64_t hash[6]) as keys, and the corresponding input strings as values.

Size is about a few millions.

I need to compare each hash value (key) to every other hash value, in order to find the pair with minimal hamming distance.

So, I actually have an "outer" and "inner" loop. The "inner" iterator starts at std::next(iter_outer).

Multi-threading the "outer" loop, as described above, gives me a decent speed-up of ~ ×10, on my 16 core machine.
Last edited on
If you could pre-sort the stuff in the unordered_map into a regular array/vector and then process on that, it may be faster.

Indeed, using separate std::vectors for the keys (hash values) and for the strings gave me another ×10 speed-up 😲

For my application, a map was not absolutely required.

(I didn't expect that iterating over an std::vector is so much faster, compared to an std::unordered_map)
Last edited on
C++23:
for( auto&& [k,v] : g_input | std::views::as_const | std::views::stride(3) ) { /* ... */ }
https://en.cppreference.com/w/cpp/ranges/stride_view
Topic archived. No new replies allowed.