Optimising Merge Sort

For a bit of fun I coded my own implementation of the merge sort algorithm, after watching a lecture from ocw.mit.edu (really cool site!) which discussed it.

My code is below (minus a main which just populates a list randomly and calls mergesort on it). I tested it in comparison to insertsort (which I also coded) and it works much faster on my machine for large lists (N > 1000 or something), as expected.

My question: Does anyone have any suggestions for optimising it further?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#define SUCCESS 1

template <typename T>
int mergesort(T * list, const int numElements)
{
	// If our list has 0 or 1 elements, return (i.e. already sorted)
	if(numElements < 2)
		return SUCCESS;

	// We need to 'divide and conquer' our list
	// Calculate the number of elements we'll need in each of our two smaller lists
	int numInBottom = (int)ceil((double)numElements/2.0);	// Any way to do this better with ints?
	int numInTop = numElements - numInBottom;

	// Run merge sort on both
	mergesort<T>(list, numInBottom);	// Bottom list
	mergesort<T>(list + numInBottom, numInTop);		// Top list

	// Merge the results
	merge<T>(list, list+numInBottom, numInBottom, numInTop);

	return SUCCESS;
}

// Merges two sorted lists into one. Places the result at the start of the 'bottom' list.
// After returning successfully, the 'bottom' list has (numInBottom + numInTop) elements.
// Assumes parameters 'bottom' and 'top' are pre-sorted.
template <typename T>
int merge(T * bottom, T * top, const int numInBottom, const int numInTop)
{
	// Indexers for bottom and top lists
	int bIndex = 0, tIndex = 0;

	// Create a new collection to store our result temporarily
	T * result = new T[numInBottom+numInTop];

	while((bIndex < numInBottom) && (tIndex < numInTop))
	{
		// Find which of our current pair of elements should be put into our result next
		if(bottom[bIndex] < top[tIndex])
			result[(bIndex++)+tIndex] = bottom[bIndex];
		else
			result[bIndex+(tIndex++)] = top[tIndex];
	}

	// At this point one of the lists is empty.
	// Concatenate what's left of the other one onto the end of our result.
	if(bIndex < numInBottom)
		memcpy(result + (bIndex+tIndex), bottom + bIndex, sizeof(T)*(numInBottom - bIndex));
	else
		memcpy(result + (bIndex+tIndex), top + tIndex, sizeof(T)*(numInTop - tIndex));

	// Copy our result back to the original list
	memcpy(bottom, result, sizeof(T)*(numInTop+numInBottom));

	// Deallocate our dynamically-allocated memory
	delete [] result;

	return SUCCESS;
}
does adding a thread reduce the complexity from n log n to n/2 * log n ?
Hmmmm...I'm not sure. Anyone? Nice idea though, I might see if I can find something on that.

I remember reading on this site somewhere that my style on line 12 (converting to double then back to int) is to be avoided generally. If I have two ints a and b, what's the best way of getting ceil (a/b)? There's no overload for ceil that takes two ints... :(
No need for ceil here, or for any other bit magic. Simply dividing with 2 will do:
int numInBottom = numElements/2;
Most compilers will further optimize this with bitshifting int numInBottom = numElements>>1;.

EDIT: I'm not sure how safe it is to use memcpy. Memcpy doesn't call constructors, it only makes binary copies of the object. If the object has dynamic memory management, (or maybe inheritance?), this sort function might break it. I recommend using std::copy() instead, which will boil down to memcpy() anyway wherever possible.
Last edited on
Cant you just write (this equivalent to your ceil version, but R0mai is correct.)

 
((numElements%2 == 0) ? (numElements/2) : (numElements/2 + 1))


Sorry i do not have much time today, i will write some assumption later or tomorrow.

Maikel
Last edited on
http://www.cplusplus.com/reference/algorithm/merge/
This does exactly what your merge function does, only faster and safer.

I made this based on your original:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
template<typename T>
void mergesort(T * list, const int numElements) {
    if (numElements < 2)
        return;

    int numInBottom = numElements >> 1;
    int numInTop = numElements - numInBottom;

    mergesort<T> (list, numInBottom);
    mergesort<T> (list + numInBottom, numInTop);

    T * result = new T[numInBottom + numInTop];
    std::merge(list, list + numInBottom, list + numInBottom, list + numInBottom + numInTop, result);
    std::copy( result, result + numInBottom + numInTop, list );

    delete [] result;
}


It runs ~5% faster than your original.
I think the bottleneck might be in the dynamic memory allocation.

Further improvement :
Using iterators (as in STL algorithms) instead of pointers will make more general and easier to use.
Last edited on
All great info...thanks guys!
Topic archived. No new replies allowed.