I am amazed at performance promotion by using contiguous data structure (std::vector) instead of linked data structures.
I often have no choice but to use bounded non-sequential index (e.g. 1,5, 10, 12 ..) for data collections. In such a situation, associative array is well-known technique as the below,
1 2 3 4 5
std::unordered_map<int, Data> dat;
dat[1]= Data();
dat[5]= Data();
// ... and so on
// and loop access to dat
When I change it into contiguous way, I can obtain dramatic speed up for (sequential) accessing each element (probably due to cache efficiency)
1 2 3 4 5 6 7 8
std::map<int, int> index;
std::vector<Data> dat;
int ind = 0;
index[1] = ind; ind++; dat.push_back(Data());
index[5] = ind; ind++; dat.push_back(Data());
// ... and so on
// and loop access to dat via index;
My question is, is there more smart way to construct bounded non-sequential index mapping to sequential index, not by std::map(or std::unordered_map) but by cheaper way (such as hash functions).
(index range is bounded R: [min_index : max_index] (interval is irregular), R->[0,1,2,....])
Yes, it is. std::unordered_map is hash techiniques (1) hasing "key" by hash function (2) process hash table (collision check/store etc).
I wonder if my first way is to do something with unnecessary process (e.g. is hash table needed, otherwise just hash function works well? do more suitable hash table exist?).
I appologize for my ambiguous questions. But, I think this indexing problems seem common to programming. I wonder if there is more sophisticated way.
I am willing to consider the data structure, compressed sparse row (CSR) as you told.
BTW, I have noticed that some (originally) linked data structures are proposed to be represented as contiguous form, which not only save memory space, but also lead to efficient computations based on sequential & cache efficiency. But, such a contiguous form seems to entail usage limitations (e.g. size of elements is previously known, dynamic construction is inefficient, random access is inefficient, just for sequential access etc.)
I would like to be able to use them properly (with knowledge of trade-off and limitation) in case-by-case manner.
(If someone knows summary or review of such a data structure, I hope you tell me)