Hello Everyone. Could anyone please help me solve this problem. I am actually new to using STL. I have been given a task of searching a file of over 60000 records. The data extracted from the file is stored in struct and that struct is in turn stored in vectors i-e a vector of struct having over 60000 struct elements. I have to look for duplicates in that vector of structs and storing those duplicates in another container so that the original vector is left with only those elements which have no duplicates. For example:
OrgVector = 1 2 3 4 1 5 6 2 4 7 9 10 1
DupVectpr = 1 1 1 2 2 4 4
and the original vector is now left with
OrgVector = 3 5 6 7 9 10
the struct is something like this
struct MyPred
{
int x, y, z;
};
then I have to find the difference between x and z for each struct element stored in the DupVector and will push_back the vector again to the original vector (from the DupVector) where the difference between x and z of its struct element would be the minimum. I have done the same thing with simple ints and even string type of vectors but am unable to do the same thing for vector of structs. And is there any fast way of searching 60000 records for duplicates because simple find/find_if would be too slow since they perform linear search. I tried multimaps where it returned all values against a single key and it was a really very quick search i-e I stored same numbers in multimap both as its key and value and the equal_range returned all values against a single key. E-g considering the above vector
Values against key 1: 1 1 1
Values against key 2: 2 2
Values against key 4: 4 4
But again I could not do it with vectors of structs. Could anyone please help do this assignment. Any help along with code would be greatly appreciated because it is basically the syntax that I am stuck in
I would be tempted to use a std::multiset for your values and a std::set for your unique values.
In order to make this work with a struct you need to define how objects of your struct are to be compared:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
struct MyPred
{
int x, y, z;
// a comparison function that will compare the values of x, y and z.
booloperator<(const MyPred& p) const
{
// some clever math
}
};
std::multiset<MyPred> pred_set;
std::set<MyPred> unique_pred_set;
I want to compare struct values of one element of the vector with the rest of the vector elements like this
struct MyPred
{
int x, y, z;
// a comparison function that will compare the values of x, y and z.
bool operator<(const MyPred& p) const // why '<'? why not '==' ?
{
// some clever math
return (MyPred.x == p.x)
// but for all 3 member i-e x, y and z. So is it possible to do this comparison like:
if(MyPred.x == p.x && MyPred.y == p.y && MyPred.z == p.z)
{
return true;
}
else
return false;
}
};
More over since I am new to STL so I dont know much about it. Could you plzz write to me a small sample code especially the overloaded operator method in struct and the point from where I would call this method? I shall be grateful.
And the struct values are already stored in vectors. I have to find out the duplicates. Removing those duplicates, finding out the difference between x and z and then push_back that vector (with minimum x-z difference) to the original vector again. Is it possible to temporarily store those struct vector elements in some other STL containers and after achieving the desired result, storing the elements again in the original vector?
If you want to store a struct in an ordered STL container then you need to overload < and not == because the STL needs to be able to place the values in order. So it need to know what value is considered less than another value. You can not put a struct value into an ordered STL container without telling how to decide which value is lower than another value.
However if you avoid sorting the values or putting them into an ordered container then you could do this without overloading <. But finding unique values in unordered data is much more time consuming.
To overload the < operator you need to decide what makes one of your struct values less than another. That depends on what these numbers mean. A simple and fast method of making sure every struct value is unique compared with every other is to give each component value a different order of importance in the comparison. So if s.x < u.x it doesn't matter what y and z are because x is more important than them:
struct MyPred
{
int x, y, z;
booloperator<(const MyPred& p) const
{
// x overrides y and z
if(x < p.x) returntrue;
if(x > p.x) returnfalse;
// x must be equal
// y overrides z
if(y < p.y) returntrue;
if(y > p.y) returnfalse;
// x and y must be equal
// its down to z now
if(z < p.z) returntrue;
if(z > p.z) returnfalse;
// x and y and z must be equal
returnfalse;
}
};
You don't need to call the < function explicitly, the STL container will call it when it needs to. Actually that function is called every time you test if one struct value is less that another like this:
1 2 3 4 5 6
MyPred a, b;
if(a < b) // this calls a.operator<(b);
{
// do stuff
}
oh thank you so much. It was of great help. What if the struct contains string type of variables? How can we sort string type of struct variables? and will this method be fast enough to go through more than 60000 vector elements where each element is of type struct and struct contains 16 string type of variables?
Thank you again for make me understand through these bits of codes :)
The std::string class already has the < overloaded so you can compare them just like I compare the int types in my example. Remember the strings only need to be tested until one is found to be (lexically) less than another so they are not always as slow as you might imagine. Obviously the more complex the struct the more time it will take to compare two of its values. Sometimes only part of the struct is important when it comes to comparing them. You can select only those member variables that together form a unique value with respect to your data set.
So if your struct contains information taken from a database, for example, you could overload the < operator such that it only compares the member variable that contains the unique key of that record.
To find the duplicates efficiently you can try std::sort() on the data or try putting the data into a unique ordered container like std::set<MyPred>.
Thank you so much for clarifying your point with bits of codes. That's really helpful :). Let me try to explain the whole scenario here.
I am totally new to using STL.
I have a vector which contains struct type of elements and that struct contains 16 string type of members. There are over 60000 elements in that vector. The vector contains duplicate elements (of struct). I have to remove all the duplicates along with the actual element (having duplicates) and store them in another vector or any container. I copied the whole vector into a multimap. Key and values both contain the same vector e-g in case of simple integers:
k v
1 1
2 2
3 3
4 4
1 1
3 3
5 5
5 5
and I want it to return:
p k v
1 1 1 1 (there are two 1s, one at pos 1 and the other one at pos 5)
2 2 2
3 3 3 3 3 (there are three 3s, at pos 3, pos 6 and pos 9)
4 4 4
5 1 1 1
6 3 3 3 3
7 5 5 5
8 5 5 5
9 3 3 3 3
it works fine with simple ints or string type of vectors but it doesnt work with struct type of elements and instead returns:
k v
1 1
2 2
3 3
4 4
1 1
3 3
5 5
5 5
that is NO difference.
Afterwards I want this multimap to contain only the duplicate elements i-e
p k v
1 1 1 1 (there are two 1s, one at pos 1 and the other one at pos 5)
3 3 3 3 3 (there are three 3s, at pos 3, pos 6 and pos 9)
7 5 5 5
struct contains string type of variables e-g a, x, y, z and if string x, y, z (i-e excluding variable a) are all equal to each other then convert a and z to int (which I am sure there would be some function in C++ for this type of conversion) and fnd out the the difference between a and z of each such duplicate and then push_back the vector with minimum a-z difference to the original vector. could you please help me do this?
I can't see how your example using int is working to be honest. Your equal_range() is comparing two iterators rather than the values that they point to.
For example the iterator at first position in the multimap is having a value '1' which would now be the key value, it then returns all the values stored against the key '1'. Since both key and value stores the same vector iterators, the key '1' will return all 1's from the values.
k v
1 1
2 2
3 3
4 4
1 1
3 3
5 5
5 5
and the result
p k v
1 1 1 1 (there are two 1s, one at pos 1 and the other one at pos 5)
2 2 2
3 3 3 3 3 (there are three 3s, at pos 3, pos 6 and pos 9)
4 4 4
5 1 1 1
6 3 3 3 3
7 5 5 5
8 5 5 5
9 3 3 3 3
could you please suggest any other method as I am not at all an expert. I am new to it. Please also bear in mind that the file contains over 60000 records so the searching process should be really fast.
I think you would benefit from using a std::multiset<> rather than a std::multimap<> because it is designed to achieve what you are using your std::multimap<> for:
int main()
{
std::vector<int> vPred;
vPred.push_back(1);
vPred.push_back(2);
vPred.push_back(3);
vPred.push_back(4);
vPred.push_back(1);
vPred.push_back(3);
vPred.push_back(5);
vPred.push_back(5);
// The values need to be in order for unique_copy() to work
std::sort(vPred.begin(), vPred.end());
// A vector to store the unique values only
std::vector<int> uPred;
// Copy the unique values into the new vector
std::unique_copy(vPred.begin(), vPred.end(), std::back_inserter(uPred));
for(std::vector<int>::iterator i = uPred.begin(); i != uPred.end(); ++i)
{
std::cout << *i << '\n';
}
}
thank you very much for taking time. I have to use struct having string type of variables, not just simple ints or strings and this is where I get into trouble ;'( i-e working with struct type of STL containers, in this case vectors or mulitset/maps. Could you please help me with that. I have already tried to explain the actual scenario of what I want to achieve.
could you please show me how would you do the same thing with structs having string type of variables and how to write the over loaded method == and how do I call it from main?