boost::unordered_map / boost::multi_inde

boost::unordered_map / boost::multi_index_container

moorecm (1932)

I was writing an abstraction for an ICU resource bundle (UTF-8 text file containing composite data). I would like to preserve the order of the elements found in the file for diff purposes but I also would like to provide efficient look up for a given key. I was originally using a std::list, which worked fine however look ups were linear. Rather than implementing a cross reference with hashes, I modified it to use a boost::unordered_map.

The "unordered map" still appears to be sorting (or at least changing the order of) my data. Is that normal or am I inserting the elements out of order?

Last edited on

m4ster r0shi (2201)

Hmmm....

some guy said:

unordered_map doesn't maintain the order of insertion. Unordered in this case means that the observable order of elements (i.e. when you enumerate them) is unspecified and arbitrary. In fact, I would expect that the order of elements in an unordered_map can change during the lifetime of the map, due to rehashing when resizing the map (that's implementation dependent though)

However...

another guy said:

Read about Boost.Multiindex. It gives you an opportunity to create a container that has both access to data by key (like std::map) and sequenced acess to data (like std::list).

Both found here:

http://stackoverflow.com/questions/1605807/boostunordered-map-maintains-order-of-insertion

moorecm (1932)

Nice. That's a strange name for an arbitrarily-ordered container...

I'll check out the documentation on Boost.MultiIndex.

Last edited on

jsmith (5804)

+1 boost::multi_index_container.

Longest typedef you'll ever create, but really powerful.

moorecm (1932)

Ok, here's what I've got (adapted slightly from the tutorial):

//...
#include <string>
#include <boost/shared_ptr.hpp>
//...
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/identity.hpp>
//...
class Table : public Base {

    // Boost.MultiIndex is used in this context to provide an interface for
    // logarithmic look-up while preserving the sequence of insertion (desired
    // for diff purposes)
    typedef boost::multi_index::multi_index_container<
        boost::shared_ptr<Base>,
        boost::multi_index::indexed_by<                                                                    // index
            // sequenced (ordered of insertion)
            boost::multi_index::sequenced<>,                                                               // 0
            // sorted by boost::shared_ptr<Base> ?
            boost::multi_index::ordered_unique< boost::multi_index::identity< boost::shared_ptr<Base> > >  // 1
        >
    > value_type;
    // enumeration for value_type::nth_index
    enum value_type_indexes {
        SEQUENCED,
        SORTED
    };
    //...
protected:
    value_type contents;
    //...
};

Just for reference, I was trying to use BOOST_FOREACH for [sequenced] traversal but have found a few major oversights:

//#include <boost/foreach.hpp>
//...
    // note that BOOST_FOREACH did compile below, but was not properly extended to support
    // this custom container.  BOOST_FOREACH always iterated through the 0 indexed interface,
    // which introduced the subtle limitation that the table could not be traversed using the SORTED interface
    /*BOOST_FOREACH( const value_type::nth_index<SEQUENCED>::type::value_type & i, contents )
    {    
        // i is the contained element (shared_ptr<Base>)
    }*/
    
    // the following two examples traverse the table contents using either interface
    for( value_type::nth_index<SEQUENCED>::type::iterator i( contents.get<SEQUENCED>().begin() );
         i != contents.get<SEQUENCED>().end();
         ++i )
    {   
        // i is an iterator to the contained element (shared_ptr<Base>)
    }
    //...
    for( value_type::nth_index<SORTED>::type::iterator i( contents.get<SORTED>().begin() );
         i != contents.get<SORTED>().end();
         ++i )
    {   
        // i is an iterator to the contained element (shared_ptr<Base>)
    }

Here's an example insertion:

    boost::shared_ptr<Base> resource;
//...
    // note that you do not have to use a specific interface here
    //contents.get<SEQUENCED>().push_back( resource );
    contents.push_back( resource );

Now, if I understand this correctly, the container (contents) has two interfaces: sequenced and sorted by element (shared_ptr<Base>).

Is there a way to sort the second one by a string member in the Base class? For example, if the element type were just Base (not shared_ptr<Base>), I could provide a less than operator.

Perhaps there's a way to specify my own function object for sorting, similar to a standard associative container. Any help is appreciated; I'll keep reading the documentation for now.

Last edited on

moorecm (1932)

Here is the working example that solves the question above using a user-defined comparison functor:

//...
#include <string>
#include <boost/shared_ptr.hpp>
//...
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/identity.hpp>
//...
class Base {
    //...
protected:
    std::string name;
    //...
};
//...    
class Table : public Base {
    
    // comparison functor for sorting Tables by member name lexicographically
    template <typename T>
    struct less_by_name : public std::binary_function<T, T, bool>
    {   
        typename less_by_name<T>::result_type operator()( const typename less_by_name<T>::first_argument_type & lhs,
                                                          const typename less_by_name<T>::second_argument_type & rhs ) const {
            return lhs->name < rhs->name;
        }   
    };  
    
    // Boost.MultiIndex is used in this context to provide an interface for
    // logarithmic look-up while preserving the sequence of insertion (desired
    // for diff purposes)
    typedef boost::multi_index::multi_index_container<
        boost::shared_ptr<Base>,
        boost::multi_index::indexed_by<         // index
            // sequenced (ordered of insertion)
            boost::multi_index::sequenced<>,    // 0
            // sorted by member name
            boost::multi_index::ordered_unique< // 1
                boost::multi_index::identity< boost::shared_ptr<Base> >,
                Table::less_by_name< boost::shared_ptr<Base> >
            >
        >   
    > value_type;
 
    // enumeration for value_type::nth_index
    enum value_type_indexes
    {
        SEQUENCED,       
        SORTED
    };

    //...
protected:
    value_type contents;
    //...
};

Last edited on

moorecm (1932)

Now I just need to figure out how to do some look-ups!

jsmith (5804)

Sorry, I just now saw this. Here's some examples of what you are trying to do:

struct Person {
    std::string lastName;
    std::string firstName;
    uint8_t      age;

    friend std::ostream& operator<<( std::ostream& os, const Person& p )
       { return os << p.lastName << ", " << p.firstName << ", Age: " << p.age; }
};

struct record_order;
struct last_name_order;
struct age_order;

typedef boost::multi_index_container<
    Person,
    boost::multi_index::indexed_by<
        boost::multi_index::sequenced< boost::multi_index::tag< record_order > >,
        boost::multi_index::ordered_non_unique< boost::multi_index::tag< last_name_order >,
            boost::multi_index::member< Person, std::string, &Person::lastName > >,
        boost::multi_index::ordered_non_unique< boost::multi_index::tag< age_order >,
            boost::multi_index::member< Person, uint8_t, &Person::age > >
    >
> Database;


typedef Database::index< last_name_order>::type LastNameIndex;
typedef LastNameIndex::iterator                 LastNameIter;

Database db;
LastNameIter i = db.get<last_name_order>().find( std::string( "Smith" ) );
while( i != db.get<last_name_order>().end() && i->lastName == "Smith" )
{
    std::cout << "Found person: " << *i << std::endl;
    ++i;
}

EDIT: I've shown above a way to avoid the hard-coded constants for the indexes, using multi_index::tag<>.
All you do is create an empty struct; tag<> just needs some arbitrary type to differentiate on.

Last edited on

moorecm (1932)

Thanks! The tags make sense; I've seen something similar before.

The only things that seem to really obfuscate my code are the shared pointers. I'm using them for polymorphism, although they may not be entirely necessary. I don't think I can use boost::multi_index::member because of the extra level of indirection the pointers add.

Last edited on

jsmith (5804)

Consider then using type erasure to allow you to store actual instances in the container
rather than pointers.

I know you read my article on type erasure, but here's a link to it:
http://www.cplusplus.com/forum/articles/18756/

moorecm (1932)

Ok, just FYI from my snippets above, to remove the enumeration I added:

1
2

struct sequenced_tag {};
struct sorted_tag {};

...and replaced all occurances of the enumorators with the cooresponding tag, and changed nth_index to just index.

What I have works, it's just not as clean as I would like. I think type erasure is cool and all, but I'm not convinced the final source would be cleaner. For as simple as the data is, it might be worthwhile to create a single unified type.

Well, I'll work with it a little bit and maybe branch off and try to incorporate type erasure.

moorecm (1932)

Here's a sample look-up, which is kind of nuts because of the shared pointers:

//...
    virtual boost::shared_ptr<Base> find( const std::string & key ) const
    {       
        // the container's find method uses the comparison functor less_by_name,
        // which expects two boost::shared_ptr<Base> arguments.  create a
        // dummy object with the target name and pass it to find
        boost::shared_ptr<Base> p( new Dummy( key ) );
         
        typedef value_type::index<sorted_tag>::type::const_iterator const_iterator;

        const_iterator i( contents.get<sorted_tag>().find( p ) );
        return i != contents.get<sorted_tag>().end() ? *i : end(); // Base::end() returns a default-constructed shared_ptr<Base>
    }
//...

There has got to be a better way (this is just a proof of concept). I'm questioning the efficiency gain since I am now creating an object for each look-up. The data file has about 300,000 resources...

Also note that find is returning shared pointers, not iterators. I think my design could benefit by actually implementing iterators and making the interface conform to the STL interfaces.

Last edited on

jsmith (5804)

If the set of types to be stored is ~~finite~~ small, what about a boost::variant?

What are you trying to store in the container?

Come to think of it, type erasure is probably more efficient than boost::shared_ptr to begin win.
Each shared_ptr allocation requires two memory allocations. Each type erasure instance requires
only one. The difference is on a shared_ptr you can call functions directly. With type erasure
each function call on the underlying type requires two functions calls, one of them being a virtual
function. But my guess is that one memory allocations equals thousands of virtual function calls.

Last edited on

moorecm (1932)

Yes, Boost.Variant is an option. The data is relatively simple and could be modeled in a number of ways.

Here is a sample data file that exhibits each of the possible resource types (table, string, int, alias):


en_US : table {
    Numbers : table {
        1 : string { "one" }
        2 : string { "two" }
        3 : string { "three" }
    }
    SameStuff : alias { "en_US/Numbers" }
    ONE : int { 1 }
}

Note that each of the resources have three parts: a key, type, and value.

My current design is templated by type with an additional bool template parameter to flag aliases (because aliases are stored as a string, too). I'm not confident that this is the best way to model the data. My goal is to be able to read in the file, store it in a data structure, and allow the user to look-ups specific keys (like "en_US/Numbers/1") and change their names/values/composition.

Last edited on

Topic archived. No new replies allowed.

boost::unordered_map / boost::multi_index_container

C++

Forum