MultiMap removing duplicates

Pages: 1234... 6
Here is my first example using a struct with strings:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50

struct MyPred
{
	std::string x;
	std::string y;

	MyPred(const std::string& x, const std::string& y): x(x), y(y) {}

	bool operator==(const MyPred& p) const
	{
		return x == p.x && y == p.y;
	}

	bool operator<(const MyPred& p) const
	{
		if(x < p.x) return true;
		if(x > p.x) return false;
		if(y < p.y) return true;
		if(y > p.y) return false;
		return false;
	}
};


int main()
{
	std::vector<MyPred> vPred;
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("b", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("b", "b"));

	// The values need to be in order for equal_range() to work
	std::sort(vPred.begin(), vPred.end());

	std::pair<std::vector<MyPred>::iterator, std::vector<MyPred>::iterator> ret;

	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); i = ret.second)
	{
		std::cout << "[" << i->x << ", " << i->y << "] => ";
		ret = std::equal_range(i, vPred.end(), *i);
		for(std::vector<MyPred>::iterator j = ret.first; j != ret.second; ++j)
		{
			std::cout << "[" << i->x << ", " << i->y << "]";
		}
		std::cout << '\n';
	}
}
[a, a] => [a, a][a, a]
[a, b] => [a, b][a, b]
[b, a] => [b, a]
[b, b] => [b, b]

And again you can do this to collect all the unique ones:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
struct MyPred
{
	std::string x;
	std::string y;

	MyPred(const std::string& x, const std::string& y): x(x), y(y) {}

	bool operator==(const MyPred& p) const
	{
		return x == p.x && y == p.y;
	}

	bool operator<(const MyPred& p) const
	{
		if(x < p.x) return true;
		if(x > p.x) return false;
		if(y < p.y) return true;
		if(y > p.y) return false;
		return false;
	}
};


int main()
{
	std::vector<MyPred> vPred;
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("b", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("b", "b"));

	// The values need to be in order for equal_range() to work
	std::sort(vPred.begin(), vPred.end());

	std::vector<MyPred> uPred;
	std::unique_copy(vPred.begin(), vPred.end(), std::back_inserter(uPred));

	for(std::vector<MyPred>::iterator i = uPred.begin(); i != uPred.end(); ++i)
	{
		std::cout << "[" << i->x << ", " << i->y << "]" << '\n';
	}
}
[a, a]
[a, b]
[b, a]
[b, b]
Here is a version that puts duplicate values uniquely into one vector and unique values into another vector:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78

#include <string>
#include <vector>
#include <iostream>
#include <algorithm>

struct MyPred
{
	std::string x;
	std::string y;

	MyPred(const std::string& x, const std::string& y): x(x), y(y) {}

	bool operator==(const MyPred& p) const
	{
		return x == p.x && y == p.y;
	}

	bool operator<(const MyPred& p) const
	{
		if(x < p.x) return true;
		if(x > p.x) return false;
		if(y < p.y) return true;
		if(y > p.y) return false;
		return false;
	}
};


int main()
{
	std::vector<MyPred> vPred;
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("b", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("b", "b"));

	// The values need to be in order for equal_range() to work
	std::sort(vPred.begin(), vPred.end());

	std::vector<MyPred> uPred; // values that were always unique
	std::vector<MyPred> dPred; // values that were duplicated

	std::pair<std::vector<MyPred>::iterator, std::vector<MyPred>::iterator> ret;

	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); i = ret.second)
	{
		ret = std::equal_range(i, vPred.end(), *i);
		if(ret.second - ret.first == 1)
		{
			uPred.push_back(*i);
		}
		else
		{
			dPred.push_back(*i);
		}
	}

	std::cout << "vPred: Sorted input\n";
	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); ++i)
	{
		std::cout << "[" << i->x << ", " << i->y << "]" << '\n';
	}

	std::cout << "dPred: Only the values that were duplicated\n";
	for(std::vector<MyPred>::iterator i = dPred.begin(); i != dPred.end(); ++i)
	{
		std::cout << "[" << i->x << ", " << i->y << "]" << '\n';
	}

	std::cout << "uPred: Only the values that were unique\n";
	for(std::vector<MyPred>::iterator i = uPred.begin(); i != uPred.end(); ++i)
	{
		std::cout << "[" << i->x << ", " << i->y << "]" << '\n';
	}
}
vPred: Sorted input
[a, a]
[a, a]
[a, b]
[a, b]
[b, a]
[b, b]
dPred: Only the values that were duplicated
[a, a]
[a, b]
uPred: Only the values that were unique
[b, a]
[b, b]
The first example you have written is exactly what I want. THANK A TON :D A MILLIONS OF TONSSSS.

How fast will it work if there are over 60000 vector elements??? And can I populate the vector like this:
1
2
3
4
MyPred pred;

pred.x = "a";
pred.y = "b";

and so on ...

then
 
vPred.push_back(pred);


By string I mean anything in double quotation marks e-g "a", "1", "a1", "z10b", "f:x". How can I sort this type of string?
You will just have to put it to the test to see how fast it works. I can't think how you might do it more efficiently. You can certainly populate your vector as you suggest. And anything in double quotes will work. The string comparison doesn't care if the string contains letters, numbers or symbols. They are all just a sequence of character codes from its perspective.
Last edited on
and please also help me how would you separate all duplicates (not uniquely) i-e if there are 4 times ("a","a") at any random position, the program should store 4 times ("a","a") while deleting it from the original vector.

Thanks in advance. I AM HAPPPYYY :D
With std::equal_range() if the difference between the second and first iterators is exactly 1 then the value was unique. So in the first example you could push the value onto a new vector instead of printing it out but only if ret.second - ret.first != 1.
1
2
3
4
5
6
7
8
9
		ret = std::equal_range(i, vPred.end(), *i);
		
		if(ret.second - ret.first != 1) // duplicates
		{
				for(std::vector<MyPred>::iterator j = ret.first; j != ret.second; ++j)
				{
					dPred.push_back(*i); //put each duplicate onto a new vector
				}
		}

I haven't tested that but I think it should work.
IT IS WORKINGGGGGGG :D

1
2
3
4
5
6
7
8
9
10
11
12
13

if(ret.second - ret.first != 1) // duplicates
{
		for(std::vector<MyPred>::iterator j = ret.first; j != ret.second; ++j)
		{
			dPred.push_back(*i); //put each duplicate onto a new vector
		}
}
else if(ret.second - ret.first == 1)
{
	uPred.push_back(*i);
}


Please dont go away. Let me try it with 60000 values :D
The three vectors are occupying spaces in memory where as I need only one vector in the end. The one having unique variables (having minimum a-z difference). Is there any solution? are pointers helpful in this case?
and like I said struct contains string type of variables e-g a, x, y, z and if string x, y, z (i-e excluding variable a) have duplicates then convert a and z to int and find out the the difference between a and z of each such duplicate and then push_back the vector with minimum a-z difference to the original vector. How do I compare only the x, y, z member variables for equality and not member 'a'???
Well you could rely on the fact that the duplicates will be equal to or fewer in number than your original vector and so store then back into the original vector overwriting the original data. The way the algorithm works allows you to do this because the values you are reading from are always higher up in the vector than the ones you will be writing:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53

int main()
{
	srand(0);
	std::vector<MyPred> vPred;
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("a", "a"));
	vPred.push_back(MyPred("b", "a"));
	vPred.push_back(MyPred("a", "b"));
	vPred.push_back(MyPred("b", "b"));

	std::vector<MyPred> uPred;

	// The values need to be in order for equal_range() to work
	std::sort(vPred.begin(), vPred.end());

	std::pair<std::vector<MyPred>::iterator, std::vector<MyPred>::iterator> ret;

	// Iterator to overwrite into original vector
	std::vector<MyPred>::iterator u = vPred.begin();

	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); i = ret.second)
	{
		ret = std::equal_range(i, vPred.end(), *i);
		if(ret.second - ret.first != 1)
		{
			for(std::vector<MyPred>::iterator j = ret.first; j != ret.second; ++j)
			{
				// overwrite original vector vPred with unique values
				*u++ = *i;
			}
		}
		else
		{
			uPred.push_back(*i);
		}
	}
	// Reset the size of original vector vPred
	vPred.resize(u - vPred.begin());

	std::cout << "vPred: Sorted input\n";
	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); ++i)
	{
		std::cout << "[" << i->x << ", " << i->y << "]" << '\n';
	}

	std::cout << "uPred: Only the values that were unique\n";
	for(std::vector<MyPred>::iterator i = uPred.begin(); i != uPred.end(); ++i)
	{
		std::cout << "[" << i->x << ", " << i->y << "]" << '\n';
	}
}

This will save you one vector.
heidiK wrote:
How do I compare only the x, y, z member variables for equality and not member 'a'???

Just change the operator functions in your struct definition to only compare the variables that you want to compare.
Also for memory considerations you can use pointers so that you can delete a vector when you are finished with it:
1
2
3
4
5
6
7
8
std::vector<MyPred>* vPred = new std::vector<MyPred>;

vPred->push_back(MyPred("a", "b"));

// ... stuff ...

delete vPred;
Actually a better method would be to put the algorithm into a function so that the working vectors go out of scope when the function exits.
(107) std::vector<MyPred> vPred;

c:\dup_vect3.cpp(107) : see reference to class template instantiation 'std::vector<_Ty>' being compiled

What is this error?
Just change the operator functions in your struct definition to only compare the variables that you want to compare.

Oh YESS :P thanks
I don't know. Maybe something you did in your struct definition.
This is the reason given on one of the websites. Do you understand what does it mean ;'(

1
2
3
4
5
 It is the line with std::vector<size_t> that triggers the warning.

std::vector<size_t> is expanded to std::vector<size_t, std::allocator<size_t>>.

std::allocator<size_t> should have the pointer and const_pointer typedefs to be size_t* and const size_t*, and not unsigned* and const unsigned* as it is in my example. 
Can you post your struct definition?
I didnt change anything ;'(

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
struct MyPred
{
	std::string x;
	std::string y;

	MyPred(const std::string& x, const std::string& y): x(x), y(y) {}

	bool operator==(const MyPred& p) const
	{
		return x == p.x && y == p.y;
	}

	bool operator<(const MyPred& p) const
	{
		if(x < p.x) return true;
		if(x > p.x) return false;
		if(y < p.y) return true;
		if(y > p.y) return false;
		return false;
	}
};
Pages: 1234... 6