I was writing an abstraction for an ICU resource bundle (UTF-8 text file containing composite data). I would like to preserve the order of the elements found in the file for diff purposes but I also would like to provide efficient look up for a given key. I was originally using a std::list, which worked fine however look ups were linear. Rather than implementing a cross reference with hashes, I modified it to use a boost::unordered_map.
The "unordered map" still appears to be sorting (or at least changing the order of) my data. Is that normal or am I inserting the elements out of order?
unordered_map doesn't maintain the order of insertion. Unordered in this case means that the observable order of elements (i.e. when you enumerate them) is unspecified and arbitrary. In fact, I would expect that the order of elements in an unordered_map can change during the lifetime of the map, due to rehashing when resizing the map (that's implementation dependent though)
However...
another guy said:
Read about Boost.Multiindex. It gives you an opportunity to create a container that has both access to data by key (like std::map) and sequenced acess to data (like std::list).
//...
#include <string>
#include <boost/shared_ptr.hpp>
//...
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/identity.hpp>
//...
class Table : public Base {
// Boost.MultiIndex is used in this context to provide an interface for
// logarithmic look-up while preserving the sequence of insertion (desired
// for diff purposes)
typedef boost::multi_index::multi_index_container<
boost::shared_ptr<Base>,
boost::multi_index::indexed_by< // index
// sequenced (ordered of insertion)
boost::multi_index::sequenced<>, // 0
// sorted by boost::shared_ptr<Base> ?
boost::multi_index::ordered_unique< boost::multi_index::identity< boost::shared_ptr<Base> > > // 1
>
> value_type;
// enumeration for value_type::nth_index
enum value_type_indexes {
SEQUENCED,
SORTED
};
//...
protected:
value_type contents;
//...
};
Just for reference, I was trying to use BOOST_FOREACH for [sequenced] traversal but have found a few major oversights:
//#include <boost/foreach.hpp>
//...
// note that BOOST_FOREACH did compile below, but was not properly extended to support
// this custom container. BOOST_FOREACH always iterated through the 0 indexed interface,
// which introduced the subtle limitation that the table could not be traversed using the SORTED interface
/*BOOST_FOREACH( const value_type::nth_index<SEQUENCED>::type::value_type & i, contents )
{
// i is the contained element (shared_ptr<Base>)
}*/
// the following two examples traverse the table contents using either interface
for( value_type::nth_index<SEQUENCED>::type::iterator i( contents.get<SEQUENCED>().begin() );
i != contents.get<SEQUENCED>().end();
++i )
{
// i is an iterator to the contained element (shared_ptr<Base>)
}
//...
for( value_type::nth_index<SORTED>::type::iterator i( contents.get<SORTED>().begin() );
i != contents.get<SORTED>().end();
++i )
{
// i is an iterator to the contained element (shared_ptr<Base>)
}
Here's an example insertion:
1 2 3 4 5
boost::shared_ptr<Base> resource;
//...
// note that you do not have to use a specific interface here
//contents.get<SEQUENCED>().push_back( resource );
contents.push_back( resource );
Now, if I understand this correctly, the container (contents) has two interfaces: sequenced and sorted by element (shared_ptr<Base>).
Is there a way to sort the second one by a string member in the Base class? For example, if the element type were just Base (not shared_ptr<Base>), I could provide a less than operator.
Perhaps there's a way to specify my own function object for sorting, similar to a standard associative container. Any help is appreciated; I'll keep reading the documentation for now.
EDIT: I've shown above a way to avoid the hard-coded constants for the indexes, using multi_index::tag<>.
All you do is create an empty struct; tag<> just needs some arbitrary type to differentiate on.
Thanks! The tags make sense; I've seen something similar before.
The only things that seem to really obfuscate my code are the shared pointers. I'm using them for polymorphism, although they may not be entirely necessary. I don't think I can use boost::multi_index::member because of the extra level of indirection the pointers add.
Ok, just FYI from my snippets above, to remove the enumeration I added:
1 2
struct sequenced_tag {};
struct sorted_tag {};
...and replaced all occurances of the enumorators with the cooresponding tag, and changed nth_index to just index.
What I have works, it's just not as clean as I would like. I think type erasure is cool and all, but I'm not convinced the final source would be cleaner. For as simple as the data is, it might be worthwhile to create a single unified type.
Well, I'll work with it a little bit and maybe branch off and try to incorporate type erasure.
Here's a sample look-up, which is kind of nuts because of the shared pointers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
//...
virtual boost::shared_ptr<Base> find( const std::string & key ) const
{
// the container's find method uses the comparison functor less_by_name,
// which expects two boost::shared_ptr<Base> arguments. create a
// dummy object with the target name and pass it to find
boost::shared_ptr<Base> p( new Dummy( key ) );
typedef value_type::index<sorted_tag>::type::const_iterator const_iterator;
const_iterator i( contents.get<sorted_tag>().find( p ) );
return i != contents.get<sorted_tag>().end() ? *i : end(); // Base::end() returns a default-constructed shared_ptr<Base>
}
//...
There has got to be a better way (this is just a proof of concept). I'm questioning the efficiency gain since I am now creating an object for each look-up. The data file has about 300,000 resources...
Also note that find is returning shared pointers, not iterators. I think my design could benefit by actually implementing iterators and making the interface conform to the STL interfaces.
If the set of types to be stored is finite small, what about a boost::variant?
What are you trying to store in the container?
Come to think of it, type erasure is probably more efficient than boost::shared_ptr to begin win.
Each shared_ptr allocation requires two memory allocations. Each type erasure instance requires
only one. The difference is on a shared_ptr you can call functions directly. With type erasure
each function call on the underlying type requires two functions calls, one of them being a virtual
function. But my guess is that one memory allocations equals thousands of virtual function calls.
Note that each of the resources have three parts: a key, type, and value.
My current design is templated by type with an additional bool template parameter to flag aliases (because aliases are stored as a string, too). I'm not confident that this is the best way to model the data. My goal is to be able to read in the file, store it in a data structure, and allow the user to look-ups specific keys (like "en_US/Numbers/1") and change their names/values/composition.