MultiMap removing duplicates

The first and last string elements of each line are the start date/time and end date/time respectively. I have to find the duplicates in the original vector, where all string elements in a line are the same except the first element, i.e. the start date/time. I have to find the difference between the minutes of the start date/time and the minutes of the end date/time, and push_back the element from the duplicates to the original vector where the time difference is the minimum, e.g.

Start date/time of '1' element is 2006/06/01 16:34:43
End date/time of that element is 2006/06/01 16:55:51

Start date/time of duplicate element (of '1') is 2006/06/01 16:24:43
End date/time of that element is 2006/06/01 16:55:51 (same as '1')

I now have to find the difference between 34 (minutes of '1' element's start date/time) and 55 (minutes of '1' element's end date/time), which is equal to 21,

and the difference between 24 (minutes of duplicate (of '1') element's start date/time) and 55 (minutes of duplicate (of '1') element's end date/time), which is equal to 31.

Since 21 is less than 31, I will push_back the element with the time difference of 21 to the original vector and discard all its duplicates.

Likewise for the time difference between all the duplicates of element '1', as it can have many duplicates and not just one.

Could you help me do this? I hope I have explained it well. For the time being I have put a random int+string combination into the struct's first (start date/time) and last (end date/time) member variables instead of a proper date/time, to find the difference between the int parts of those variables.
Please don't go awayyyy. I have to submit it tomorrow. I have already spent weeks on it, without success. Please do it for me.
Could you please have a look at the code and tell me why it is giving me an error? And also what should I do next? Please guide.

int timeDiff(std::string startDateTime, std::string endDateTime)
{
	std::string strtYYYY, strtMM, strtDD, endYYYY, endMM, endDD;
	int strtY, strtM, strtD, endY, endM, endD;

	strtYYYY = startDateTime.substr (0, 4);
	cout << "String Start Year = " << strtYYYY << endl;
	strtY = atoi(strtYYYY.c_str()); 
	cout << "int Start Year = " << strtY << endl;

	strtMM = startDateTime.substr (5, 2);
	cout << "String Start Month = " << strtMM << endl;
	strtM = atoi(strtMM.c_str()); 
	cout << "int Start Month = " << strtM << endl;

	strtDD = startDateTime.substr (8, 2);
	cout << "String Start Day = " << strtDD << endl;
	strtD = atoi(strtDD.c_str()); 
	cout << "int Start Day = " << strtD << endl;

	endYYYY = endDateTime.substr (0, 4);
	cout << "String End Year = " << endYYYY << endl;
	endY = atoi(endYYYY.c_str()); 
	cout << "int end Year = " << endY << endl;

	endMM = endDateTime.substr (5, 2);
	cout << "String End Month = " << endMM << endl;
	endM = atoi(endMM.c_str()); 
	cout << "int End month = " << endM << endl;

	endDD = endDateTime.substr (8, 2);
	cout << "String End Day = " << endDD << endl;
	endD = atoi(endDD.c_str()); 
	cout << "int End Day = " << endD << endl;

	struct tm timeinfo;
	double dif;

	timeinfo.tm_sec;
	timeinfo.tm_min;
	timeinfo.tm_hour;
	timeinfo.tm_mday;
	timeinfo.tm_mon;
	timeinfo.tm_year;

	time_t time = mktime( &timeinfo );
	dif = difftime (time); // *** it is giving me error at this line. Which two arguments (time_t) can I pass to it according to the code I have written?

	return 1; // should actually be the time difference
}
I can't give this my full attention today. I have made some comments in the code:
// return a time_t rather than an int
time_t timeDiff(std::string startDateTime, std::string endDateTime)
{
	std::string strtYYYY, strtMM, strtDD, endYYYY, endMM, endDD;
	int strtY, strtM, strtD, endY, endM, endD;

	strtYYYY = startDateTime.substr (0, 4);
	cout << "String Start Year = " << strtYYYY << endl;
	strtY = atoi(strtYYYY.c_str()); 
	cout << "int Start Year = " << strtY << endl;

	strtMM = startDateTime.substr (5, 2);
	cout << "String Start Month = " << strtMM << endl;
	strtM = atoi(strtMM.c_str()); 
	cout << "int Start Month = " << strtM << endl;

	strtDD = startDateTime.substr (8, 2);
	cout << "String Start Day = " << strtDD << endl;
	strtD = atoi(strtDD.c_str()); 
	cout << "int Start Day = " << strtD << endl;

	endYYYY = endDateTime.substr (0, 4);
	cout << "String End Year = " << endYYYY << endl;
	endY = atoi(endYYYY.c_str()); 
	cout << "int end Year = " << endY << endl;

	endMM = endDateTime.substr (5, 2);
	cout << "String End Month = " << endMM << endl;
	endM = atoi(endMM.c_str()); 
	cout << "int End month = " << endM << endl;

	endDD = endDateTime.substr (8, 2);
	cout << "String End Day = " << endDD << endl;
	endD = atoi(endDD.c_str()); 
	cout << "int End Day = " << endD << endl;

	struct tm timeinfo;
	double dif;

	// You have not put anything in these variables. After meticulously extracting
	// the day, month and year you have not used them
	timeinfo.tm_sec = 0;
	timeinfo.tm_min = 0;
	timeinfo.tm_hour = 0;
	timeinfo.tm_mday = strtD;
	timeinfo.tm_mon = strtM - 1; // tm_mon counts from 0 (January), so subtract 1
	timeinfo.tm_year = strtY - 1900; // don't forget to subtract 1900 from this one
	timeinfo.tm_isdst = 0; // this must be set

	// this only gives you the start time
	time_t start_time = mktime( &timeinfo );

	// you need to do it again to get an end time
	// with your endD, endM and endY variables
	timeinfo.tm_sec = 0;
	timeinfo.tm_min = 0;
	timeinfo.tm_hour = 0;
	timeinfo.tm_mday = endD;
	timeinfo.tm_mon = endM - 1;
	timeinfo.tm_year = endY - 1900;
	timeinfo.tm_isdst = 0;
	time_t end_time = mktime( &timeinfo );

	// difftime takes two variables, not one. You can't calculate a difference between only one thing.
	// dif = difftime (time);

	// also, where time_t counts seconds (as it does on POSIX systems) you don't need difftime(),
	// just subtract your start_time from your end_time to get the difference in seconds

	return end_time - start_time;
}
I understand. Thanks anyway

The time difference it shows is 0, whereas it should have been 45 minutes

time_t timeDiff(std::string startDateTime, std::string endDateTime)
{
	std::string strtYYYY, strtMM, strtDD, endYYYY, endMM, endDD;
	std::string strtSec, strtMin, strtHr, endSec, endMin, endHr;
	int strtY = 0;
	int strtM = 0;
	int strtD = 0;
	int endY = 0;
	int endM = 0;
	int endD = 0;
	int strtSc = 0;
	int strtMn = 0;
	int strtHrs = 0;
	int endSc = 0;
	int endMm = 0;
	int endHrs = 0;

	/*strtYYYY = startDateTime.substr (0, 4);
	cout << "String Start Year = " << strtYYYY << endl;
	strtY = atoi(strtYYYY.c_str()); 
	cout << "int Start Year = " << strtY << endl;

	strtMM = startDateTime.substr (5, 2);
	cout << "String Start Month = " << strtMM << endl;
	strtM = atoi(strtMM.c_str()); 
	cout << "int Start Month = " << strtM << endl;

	strtDD = startDateTime.substr (8, 2);
	cout << "String Start Day = " << strtDD << endl;
	strtD = atoi(strtDD.c_str()); 
	cout << "int Start Day = " << strtD << endl;

	endYYYY = endDateTime.substr (0, 4);
	cout << "String End Year = " << endYYYY << endl;
	endY = atoi(endYYYY.c_str()); 
	cout << "int end Year = " << endY << endl;

	endMM = endDateTime.substr (5, 2);
	cout << "String End Month = " << endMM << endl;
	endM = atoi(endMM.c_str()); 
	cout << "int End month = " << endM << endl;

	endDD = endDateTime.substr (8, 2);
	cout << "String End Day = " << endDD << endl;
	endD = atoi(endDD.c_str()); 
	cout << "int End Day = " << endD << endl;*/
	
	strtHr = startDateTime.substr (11, 2);
	cout << "String Start Hour = " << strtHr << endl;
	strtHrs = atoi(strtHr.c_str()); 
	cout << "int Start Hour = " << strtHrs << endl;

	strtMin = startDateTime.substr (14, 2);
	cout << "String Start Minits = " << strtMin << endl;
	strtMn = atoi(strtMin.c_str()); 
	cout << "int Start Minits = " << strtMn << endl;

	strtSec = startDateTime.substr (17, 2);
	cout << "String Start Seconds = " << strtSec << endl;
	strtSc = atoi(strtSec.c_str()); 
	cout << "int Start Seconds = " << strtSc << endl;

	endHr = endDateTime.substr (11, 2);
	cout << "String End Hour = " << endHr << endl;
	endHrs = atoi(endHr.c_str()); 
	cout << "int end Hour = " << endHrs << endl;

	endMin = endDateTime.substr (14, 2);
	cout << "String End Minits = " << endMin << endl;
	endMm = atoi(endMin.c_str()); 
	cout << "int End Minits = " << endMm << endl;

	endSec = endDateTime.substr (17, 2);
	cout << "String End Seconds = " << endSec << endl;
	endSc = atoi(endSec.c_str()); 
	cout << "int End Seconds = " << endSc << endl;
		
	struct tm strtTimeinfo, endTimeinfo;
	double dif;
		
	strtTimeinfo.tm_sec = strtSc;
	strtTimeinfo.tm_min = strtMn;
	strtTimeinfo.tm_hour = strtHrs;
	strtTimeinfo.tm_mday = strtD;
	strtTimeinfo.tm_mon = strtM;
	strtTimeinfo.tm_year = strtY - 1900; // don't forget to subtract 1900 from this one
	strtTimeinfo.tm_isdst = 0; // this must be set

	endTimeinfo.tm_sec = endSc;
	endTimeinfo.tm_min = endMm;
	endTimeinfo.tm_hour = endHrs;
	endTimeinfo.tm_mday = endD;
	endTimeinfo.tm_mon = endM;
	endTimeinfo.tm_year = endY - 1900;
	endTimeinfo.tm_isdst = 0;

	time_t start_time = mktime( &strtTimeinfo );
	time_t end_time = mktime( &endTimeinfo );

	//dif = difftime (end_time, start_time);
	dif = end_time - start_time;

	cout << "Time differenec = " << dif << endl;
	cout << endl;

	return end_time - start_time;
}
Line 79 (double dif;): dif should be of type time_t
 
time_t dif;

Also you have not put valid information in all your fields. Your year is set to -1900 which is wrong. You can't do this without putting a legal date into your timeinfo struct. The year should be something like 2010 - 1900. But you have the year as 0 - 1900.
ALRIGHTTT :D Got it
int Start Year = 2006
int Start Month = 6
int Start Day = 3
int end Year = 2010
int End month = 3
int End Day = 1
int Start Hour = 17
int Start Minits = 11
int Start Seconds = 45
int end Hour = 19
int End Minits = 56
int End Seconds = 45

time difference = 118205100


What does 118205100 represent?

this is the output now
and please before you leave today, could you please guide me through the rest of the assignment. When/from where to call this method? What after it returns the difference? How do I store it against the respective element? How do I find out which one of the duplicates has the minimum time difference and how do I push it back to the original vector?

Many Thanks
The time difference is the number of seconds between the two dates.

How come you have been given such a complex assignment being that you are such a novice?
This isn't a university assignment. I recently started my job after graduation, just a month ago, and they have asked me to go through the code they are working on and correct what is incorrect. Nobody briefed me about anything at all. I found these duplicates in the output file. They have used the STL extensively, whereas I haven't done anything in STL before. There is NOOOO one to help. I don't know why everyone says "I DON'T KNOW", as if it were some kind of uni assignment and I might score more than they do. It is an application that everyone is working on. We are a SO CALLED team. I am of course a junior programmer, new to the professional world. I have been trying to do this task for 2 weeks but without any success. THANK YOU A LOT for helping me. I want to do it as soon as possible, before the boss notices and says something to me that I might not like ;'(. Please help me do this task. PLEASEEE
I didn't think this task was a university assignment because of the nature of the data you are dealing with. It sounds like real-world data. But it is normal in a work environment for new programmers to have some guidance. Although I have worked places where people have been deliberately less helpful than they should be for whatever reasons of insecurity they might have.

It sounds like you're having a bit of a baptism of fire. I think you are going to have to face the fact that you can't always deliver the software before the deadline. Be positive and mention how much progress you have made. Explain the difficulty in getting help from those who should be helping you. But do it tactfully. Don't come over like you're criticising anyone even if they thoroughly deserve it. Mention how 'busy' they have been or whatever...

Write a progress report. Even if they didn't ask for it. Break the task down into logical units and mark the parts you have achieved as [DONE]. Make a note of information you need or people you need help from to complete each of the remaining parts on the report. Print it up. Hand it to your boss. Tell him you need a few more days to complete.

The best thing you can do in my opinion is let your boss know that you are moving forward and that you are being well organised and keeping him informed of your progress. While bosses often don't like to be flooded with information they do like to be kept in the loop.


With your data. Am I right in that from the list of duplicates for a given end-time you are seeking to select only one of them, and you are selecting the one whose start time is closest to the end time?
Can you post some real data?
Thank you sooo much for your guidance. That's so very kind of you. I am happy and relieved :). I will write a progress report and will also let my boss know, so that he knows what I have been doing and that I have done something. You sound like my elder brother; otherwise everyone goes against me, even my younger sister :P.

Thank you so much again :) BUT please do help me :D, don't leave me alone in this baptism of fire :D
With your data. Am I right in that from the list of duplicates for a given end-time you are seeking to select only one of them, and you are selecting the one whose start time is closest to the end time?

That's exactly what I want. I want just one copy of each record, in case of duplicates selecting the one whose start time is closest to the end time. I will send you the real data tomorrow morning as soon as I turn my PC on :P, first thing in the morning.

aaaaaaaaaaaaaaaaaaaaaaa I am happy hahaha, jumping with joy.
I think this may do roughly what you need. I think ultimately we can put the filtered duplicates directly into the vector with all the originally unique values:
#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <ctime>
#include <iomanip>
#include <cstdlib>

bool get_time(const std::string& s, time_t& time)
{
	tm date;
	date.tm_isdst = 0;
	std::istringstream iss(s);
	std::string line;
	char c;
	if(!(iss >> date.tm_year >> c)) return false;
	if(!(iss >> date.tm_mon >> c)) return false;
	if(!(iss >> date.tm_mday >> c)) return false;
	if(!(iss >> date.tm_hour >> c)) return false;
	if(!(iss >> date.tm_min >> c)) return false;
	if(!(iss >> date.tm_sec)) return false;
	date.tm_year -= 1900; // tm_year counts years since 1900
	date.tm_mon -= 1; // tm_mon counts months from 0 (January)
	time = mktime(&date);

	return true;
}

bool time_diff(const std::string& a, const std::string& z, time_t& diff)
{
	time_t atime;
	time_t ztime;
	if(get_time(a, atime) && get_time(z, ztime))
	{
		diff = std::abs(ztime - atime);
		return true;
	}
	return false;
}

struct MyPred
{
	std::string a;
	std::string x;
	std::string y;
	std::string z;

	MyPred(const std::string& a, const std::string& x, const std::string& y, const std::string& z): a(a), x(x), y(y), z(z) {}

	bool operator==(const MyPred& p) const
	{
		return x == p.x && y == p.y && z == p.z; // a == p.a &&
	}

	bool operator<(const MyPred& p) const
	{
		//if(a < p.a) return true;
		//if(a > p.a) return false;
		if(x < p.x) return true;
		if(x > p.x) return false;
		if(y < p.y) return true;
		if(y > p.y) return false;
		if(z < p.z) return true;
		if(z > p.z) return false;
		return false;
	}
};


int main()
{
	std::vector<MyPred>* vPred = new std::vector<MyPred>;
	vPred->push_back(MyPred("2010/01/01 00:00:00", "a", "a", "2010/01/01 00:00:00"));
	vPred->push_back(MyPred("2010/01/02 00:00:00", "a", "b", "2010/01/02 00:00:00"));
	vPred->push_back(MyPred("2010/01/03 00:00:00", "b", "a", "2010/01/03 00:00:00"));
	vPred->push_back(MyPred("2010/01/04 00:08:00", "b", "b", "2010/01/04 00:10:00"));
	vPred->push_back(MyPred("2010/01/04 00:11:00", "b", "b", "2010/01/04 00:10:00"));
	vPred->push_back(MyPred("2010/01/04 00:14:00", "b", "b", "2010/01/04 00:10:00"));

	// The values need to be in order for equal_range() to work
	std::sort(vPred->begin(), vPred->end());

	std::vector<MyPred> uPred; // values that were always unique
	std::vector<MyPred> dPred; // values that were duplicated

	std::pair<std::vector<MyPred>::iterator, std::vector<MyPred>::iterator> ret;

	for(std::vector<MyPred>::iterator i = vPred->begin(); i != vPred->end(); i = ret.second)
	{
		ret = std::equal_range(i, vPred->end(), *i);

		if(ret.second - ret.first != 1) // duplicates
		{
			time_t diff; // general diff register
			time_t min_diff; // register smallest difference in time
			std::vector<MyPred>::iterator min_iter; // register corresponding iterator
			std::vector<MyPred>::iterator j; // range iterator

			// initialise min register to the first difference
			// in our range
			time_diff(ret.first->a, ret.first->z, min_diff);

			// iterate over the range of duplicates, finding
			// the smallest difference as we go and noting the
			// corresponding iterator
			for(min_iter = j = ret.first; j != ret.second; ++j)
			{
				// get difference
				time_diff(j->a, j->z, diff);

				// is it smaller than our current minimum?
				if(diff < min_diff)
				{
					min_diff = diff; // keep it as our new minimum
					min_iter = j; // remember the iterator to the smallest difference so far
				}
			}
			// push the value recorded to be smallest onto our vector
			dPred.push_back(*min_iter);
		}
		else if(ret.second - ret.first == 1)
		{
			uPred.push_back(*i);
		}
	}

	std::cout << "vPred: Sorted input\n";
	for(std::vector<MyPred>::iterator i = vPred->begin(); i != vPred->end(); ++i)
	{
		std::cout << "[" << i->a << ", " << i->x << ", " << i->y << ", " << i->z << "]" << '\n';
	}

	std::cout << "dPred: Only the values that were duplicated\n";
	for(std::vector<MyPred>::iterator i = dPred.begin(); i != dPred.end(); ++i)
	{
		std::cout << "[" << i->a << ", " << i->x << ", " << i->y << ", " << i->z << "]" << '\n';
	}

	std::cout << "uPred: Only the values that were unique\n";
	for(std::vector<MyPred>::iterator i = uPred.begin(); i != uPred.end(); ++i)
	{
		std::cout << "[" << i->a << ", " << i->x << ", " << i->y << ", " << i->z << "]" << '\n';
	}

	delete vPred;
}
vPred: Sorted input
[2010/01/01 00:00:00, a, a, 2010/01/01 00:00:00]
[2010/01/02 00:00:00, a, b, 2010/01/02 00:00:00]
[2010/01/03 00:00:00, b, a, 2010/01/03 00:00:00]
[2010/01/04 00:08:00, b, b, 2010/01/04 00:10:00]
[2010/01/04 00:11:00, b, b, 2010/01/04 00:10:00]
[2010/01/04 00:14:00, b, b, 2010/01/04 00:10:00]
dPred: Only the values that were duplicated
[2010/01/04 00:11:00, b, b, 2010/01/04 00:10:00]
uPred: Only the values that were unique
[2010/01/01 00:00:00, a, a, 2010/01/01 00:00:00]
[2010/01/02 00:00:00, a, b, 2010/01/02 00:00:00]
[2010/01/03 00:00:00, b, a, 2010/01/03 00:00:00]
YESS This is what I need. THANKKK YOUUUUU :D

I added another duplicate for "a, a" but it still looks for just "b, b":

 
vPred->push_back(MyPred("2010/01/01 00:10:00", "a", "a", "2010/01/01 00:15:00"));


		if(ret.second - ret.first != 1) // duplicates
		{
			time_t diff; // general diff register
			time_t min_diff; // register smallest difference in time
			std::vector<MyPred>::iterator min_iter; // register corresponding iterator
			std::vector<MyPred>::iterator j; // range iterator

			// initialise min register to the first difference
			// in our range
			time_diff(ret.first->a, ret.first->z, min_diff);

			// iterate over the range of duplicates, finding
			// the smallest difference as we go and noting the
			// corresponding iterator
			for(min_iter = j = ret.first; j != ret.second; ++j)
			{
				// get difference
				time_diff(j->a, j->z, diff);

				// is it smaller than our current minimum?
				if(diff < min_diff)
				{
					min_diff = diff; // keep it as our new minimum
					min_iter = j; // remember the iterator to the smallest difference so far
				}
			}
			// push the value recorded to be smallest onto our vector
			dPred.push_back(*min_iter);
That doesn't look like a duplicate to me. The end time is unique? I thought to be a duplicate everything had to be identical except the start time?
YYYYESSSSSSSSSSS YOU ARE ABSOLUTELY RIGHT. I am OUT OF MY MIND. Let me smash my head into the wall. I am so sorryyyy. I think my mind is asleeeppp now. You now know my assignment better than I do :D. Thank you so much. I will now adapt your code to the code that they have already been working on. I am sure it will work :) and will give you the good news :P. I will show it to my senior colleague first to let him know that YES I HAVE DONE IT :D (of course you have done it).

THANKS THANKS THANKSSS. I wish I could send you some gift :D
You need to study this code and thoroughly understand it. Let your superior know that you found the task difficult because you are still learning the standard libraries and their system. Tell them you sourced help from internet forums, that is simply being resourceful. You don't want them thinking you are more capable than you are but let them know that you are pro-actively learning what you need to know.

Also, don't forget the technique I used to overwrite the original vector. That will save memory and speed things up a little. But if you do apply that to your routine be careful how you do it because it's more dangerous. The main thing you might forget is resizing your original vector at the end because it will have shrunk. In order to do that your struct needs a default constructor (even though it won't be invoked).
