Porting a C++ project from VS 6.0 to VS 2010 resulted in slower code

I ported a project from Visual C++ 6.0 to VS 2010 and found that a critical part of the code (the scripting engine) now runs about three times slower than before.
After some research I managed to extract the code fragment that seems to cause the slowdown. I minimized it as much as possible, so it will be easier to reproduce the problem.
The problem reproduces when assigning a complex class (Variant) that contains another class (String) and a union of several other fields of simple types.

Playing with the example I discovered more "magic":
1. If I comment out one of the unused (!) class members, the speed increases, and the code finally runs faster than the code compiled with VS 6.0.
2. The same is true if I remove the "union" wrapper.
3. The same is true even if I change the value of the field from 1 to 0.

I have no idea what the hell is going on.
I have checked all code generation and optimization switches, but without any success.

The code sample is below.
On my Intel 2.53 GHz CPU this test, compiled under VS 6.0, runs in 1.0 second.
Compiled under VS 2010 - 40 seconds.
Compiled under VS 2010 with the "magic" lines commented out - 0.3 seconds.

The problem reproduces with any optimization switch, but "Whole program optimization" (/GL) should be disabled. Otherwise this too-smart optimizer will figure out that our test actually does nothing, and the test will run in 0 seconds.


#include		<windows.h>
#include		<stdio.h>
#include		<stdlib.h>
#include		<string.h>	// memcpy

class String
{
public:
	char	*ptr;
	int		size;

	String() : ptr(NULL), size( 0 ) {};
	~String() {if ( ptr != NULL ) free( ptr );};
	String& operator=( const String& str2 );
};

String& String::operator=( const String& string2 )
{
	if ( string2.ptr != NULL )
	{
		// This part is never called in our test:
		ptr = (char *)realloc( ptr, string2.size + 1 );
		size = string2.size;
		memcpy( ptr, string2.ptr, size + 1 );
	}
	else if ( ptr != NULL )
	{
		// This part is never called in our test:
		free( ptr );
		ptr = NULL;
		size = 0;
	}

	return *this;
}


struct Date
{
	unsigned short			year;
	unsigned char			month;
	unsigned char			day;
	unsigned char			hour;
	unsigned char			minute;
	unsigned char			second;
	unsigned char			dayOfWeek;
};


class Variant
{
public:
	int				dataType;
	String			valStr; // If we comment out this line, the speed is OK!
	
	// if we drop the 'union' wrapper, the speed is OK!
	union
	{
		__int64		valInteger;

		// if we comment out any of these fields, unused in our test, the speed is OK!
		double		valReal;
		bool		valBool;
		Date		valDate;
		void		*valObject;
	};

	Variant() : dataType( 0 ) {};
};


void TestSpeed()
{
	__int64				index;
	Variant				tempVal, tempVal2;

	tempVal.dataType = 3;
	tempVal.valInteger = 1; // If we comment out this line, the speed is OK!

	for ( index = 0; index < 200000000; index++ )
	{
		tempVal2 = tempVal;
	}
}

int main(int argc, char* argv[])
{
	int			ticks;
	char		str[64];

	ticks = GetTickCount();

	TestSpeed();

	sprintf( str, "%.*f", 1, (double)( GetTickCount() - ticks ) / 1000 );
	
	MessageBoxA( NULL, str, "", 0 ); // ANSI variant, so it also builds with the Unicode character set

	return 0;
}

When you have performance problems, you should compare the generated assembler code (/Fa).
A random guess, what happens when you rename your String class?

The problem reproduces with any optimization switch, but "Whole program optimization" (/GL) should be disabled. Otherwise this too-smart optimizer will figure out that our test actually does nothing, and the test will run in 0 seconds.

That's usually not a good idea. That the test takes longer than 0.0 seconds shows that some very important optimizations are not performed without /GL, such as some inlining. Technically, there's nothing in here that would require whole program optimization, but I suppose VC++ has its own way of doing things.

FYI, this test takes 0.3 seconds on my machine using VS2008 at /O2 (without including windows.h).
The assembler code shows that both Variant::operator= and String::operator= are inlined, although it also shows that the non-inlined variant of Variant::operator= is suboptimal (copying each union member separately instead of copying the union block in one go). All the loop is doing is setting a register to 0 (ptr of an imaginary String instance) and calling free if it isn't 0 (haha), which is also suboptimal, of course.
The solution has been found.

The problem was in the default copy code (copy constructor and assignment operator) that the compiler generates for the "union" member.
VC++ 6.0 performs a byte-by-byte copy, which is logical, because union members cannot have constructors.
VC++ 2010, seeing that the union contains a floating-point member (the double valReal), adds an additional pair of copy instructions for this member (FLD, FSTP).

Since my example uses only the __int64 member of the union, the floating-point member contains a denormalized junk value, which makes the copy slow.
This explains why setting the __int64 member to zero speeds up execution.
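
To illustrate why 1 and 0 behave differently, here is a minimal sketch that reinterprets the bits of a 64-bit integer as a double and reports its floating-point class. The helper name classifyBitsAsDouble is hypothetical, and the sketch assumes IEEE 754 doubles and a C++11 compiler (std::fpclassify is not available in VS 2010 itself).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the original test): copy the raw bits of a
// 64-bit integer into a double and return its class as reported by
// std::fpclassify (FP_ZERO, FP_SUBNORMAL, FP_NORMAL, ...).
// Assumes an IEEE 754 double of the same size as the integer.
int classifyBitsAsDouble(std::int64_t bits)
{
    static_assert(sizeof(double) == sizeof(std::int64_t),
                  "this sketch expects a 64-bit double");
    double d;
    std::memcpy(&d, &bits, sizeof(d)); // same bytes, viewed as a double
    return std::fpclassify(d);
}
```

Under those assumptions, classifyBitsAsDouble(1) reports FP_SUBNORMAL (a denormal), while classifyBitsAsDouble(0) reports FP_ZERO, which matches the observation that changing the field from 1 to 0 removes the slowdown.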

I have found that the /fp:strict compiler option solves the problem. Writing my own copy constructor that copies the union byte by byte also solves the problem.
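
The byte-by-byte workaround could look roughly like the sketch below. It is not the exact code from my project: the class name VariantSketch is made up, the String member is left out for brevity, and the union is given a name (u) so that sizeof can be applied to it, which changes the access syntax to u.valInteger. Since the test loop uses assignment, the sketch defines operator= as well as the copy constructor.

```cpp
#include <cstring>

struct Date
{
    unsigned short year;
    unsigned char  month, day, hour, minute, second, dayOfWeek;
};

// Simplified, hypothetical stand-in for the Variant class from the test.
class VariantSketch
{
public:
    int dataType;

    union Payload
    {
        long long valInteger;
        double    valReal;
        bool      valBool;
        Date      valDate;
        void*     valObject;
    } u; // named (assumption) so that sizeof(u) can be used below

    VariantSketch() : dataType(0) { std::memset(&u, 0, sizeof(u)); }

    // Copy the union as raw bytes, so no floating-point load/store is needed.
    VariantSketch(const VariantSketch& other) : dataType(other.dataType)
    {
        std::memcpy(&u, &other.u, sizeof(u));
    }

    VariantSketch& operator=(const VariantSketch& other)
    {
        if (this != &other) // memcpy must not be given overlapping ranges
        {
            dataType = other.dataType;
            std::memcpy(&u, &other.u, sizeof(u));
        }
        return *this;
    }
};
```

With this in place, an assignment such as b = a copies the union block in one go regardless of which member was last written.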

This is similar to this problem:
http://connect.microsoft.com/VisualStudio/feedback/details/238546/incorrect-code-generation-for-union-copy-assignment-in-visual-studio-2005-c-compiler
but I am not exactly sure.
That's interesting. How does it know it needs to copy valReal? I would have thought that valDate would eventually lead to trouble, being a struct held by value in the union; but I guess not (yet).

Also, I thought the optimizer ought to work out that tempVal2 isn't being used and remove the loop. VS2008's optimizer is quite good at that sort of thing.