Is this performance gain possible?

I'm working on an application that does a lot of data processing. It's threaded, and the number of threads used in a given run can be set as a command-line argument.

Performance is measured in iterations per second, or how many loops the program completes every second.

With 128 threads, I get about 3.5 million ips, and processor usage stays between 95 and 99 percent.

Recently, I got the idea to increase the number of threads by a large margin. I tried using 51,200 threads, and the number of iterations per second was reported as 451 million. (Memory usage is also several hundred times greater with the increased thread count.) A run with this number of threads results in the process consuming a constant 99% of the CPU.

This raises the question: is this feasible, or is the program somehow miscalculating the number of iterations per second at the higher thread count?

I'm confused; the way I always thought of threading, eight threads working on the same task should theoretically yield the same performance as eight hundred. How could this kind of performance gain be possible?
800 threads would be slower than 8 because of all the overhead (unless you have 800 cores). The number is clearly wrong, probably because the clock you are using is per-thread: each thread gets a full second of its own time, so all that work is reported as happening in one second when it actually happens over many seconds of wall-clock time.
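A quick way to demonstrate the effect is to have every thread compute its own iterations per second from a per-thread CPU clock, sum those rates, and compare that against total iterations divided by wall-clock time. A minimal Linux sketch (the worker loop, thread count, and iteration counts are invented for illustration, and CLOCK_THREAD_CPUTIME_ID is POSIX-specific), not the OP's actual code:

#include <time.h>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// CPU time consumed by the calling thread only (POSIX; not available on Windows).
static double thread_cpu_seconds()
{
    timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main()
{
    const unsigned num_threads = 256;      // deliberately far more than the core count
    const long iters = 2000000;            // per-thread workload (made up)
    std::vector<double> per_thread_rate(num_threads, 0.0);
    std::vector<std::thread> pool;

    auto wall_start = std::chrono::steady_clock::now();

    for (unsigned t = 0; t < num_threads; ++t)
        pool.emplace_back([&per_thread_rate, iters, t] {
            volatile unsigned long sink = 0;
            double cpu0 = thread_cpu_seconds();
            for (long i = 0; i < iters; ++i)
                sink += 1;                  // stand-in for one unit of real work
            // Each thread reports its "own" rate, based on its own CPU time...
            per_thread_rate[t] = iters / (thread_cpu_seconds() - cpu0);
        });

    for (auto& th : pool)
        th.join();

    double wall = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - wall_start).count();

    double summed = 0.0;
    for (double r : per_thread_rate)
        summed += r;

    // The summed per-thread figure grows with the number of threads;
    // the wall-clock figure only grows with the number of cores doing real work.
    std::printf("summed per-thread ips: %.0f\n", summed);
    std::printf("wall-clock ips:        %.0f\n", (double)num_threads * iters / wall);
}

The summed figure scales with the thread count, which is exactly the kind of inflated number you are seeing.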

You will get maximal performance using as many threads as the number of cores on your system (or perhaps one less, leaving one for the OS).
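If the program is in C++11 or later (an assumption; you haven't said what language or threading API you use), the core count can be queried instead of hard-coding a thread count:

#include <cstdio>
#include <thread>

int main()
{
    // Number of hardware threads the implementation reports; may be 0 if unknown.
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0)
        workers = 1;                       // fall back to a single worker
    std::printf("launching %u worker threads\n", workers);
    // ... create 'workers' threads here instead of a hard-coded 128 or 51,200 ...
}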
I tried using 51,200 threads
For the love of all that's holy! I mean, seriously. 128 is already way too much, but 50k? You weren't actually hoping to run a supercomputer with a desktop computer*, were you?

The OS is usually idle or near-idle, and even when it isn't, its operations have a higher priority than the user's. For n cores, the maximum number of threads that can possibly give a performance gain is n (and not all problems can even be parallelized efficiently). Anything more is just a waste of memory.


*Actually, Nvidia now manufactures desktop supercomputers with 128-960 cores. The 128-core model is just a graphics card that you can plug into any motherboard (if I'm reading the Wikipedia article correctly).
Unless you are using blocking file I/O functions such as fread and fwrite: one thread can perform calculations on the CPU while another thread is waiting on the disk (because the hard disk is quite slow).
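A rough sketch of that overlap (the file name, buffer size, and busy loop are made up; the point is just that one thread blocks inside fread while another keeps computing):

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    std::atomic<bool> done{false};
    std::atomic<long long> bytes_read{0};

    // Reader thread: spends most of its time blocked inside fread().
    std::thread reader([&] {
        std::vector<char> buf(1 << 16);
        if (FILE* f = std::fopen("input.dat", "rb")) {   // hypothetical input file
            size_t n;
            while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
                bytes_read += (long long)n;
            std::fclose(f);
        }
        done = true;
    });

    // Meanwhile this thread keeps the CPU busy instead of idling.
    unsigned long long iterations = 0;
    volatile unsigned long long sink = 0;
    while (!done) {
        sink += 1;
        ++iterations;
    }

    reader.join();
    std::printf("read %lld bytes while completing %llu iterations\n",
                bytes_read.load(), iterations);
}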

Try using the multimedia timer (winmm.lib) if you need a higher precision.
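On Windows that would look roughly like this (a sketch only, timing a made-up busy loop; link against winmm.lib):

#include <windows.h>
#include <mmsystem.h>
#include <cstdio>
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);                    // request ~1 ms timer resolution

    DWORD start = timeGetTime();           // milliseconds since system start
    volatile unsigned long long sink = 0;
    for (unsigned long long i = 0; i < 100000000ULL; ++i)
        sink += 1;                         // stand-in for the real work loop
    DWORD elapsed_ms = timeGetTime() - start;

    timeEndPeriod(1);                      // restore the previous resolution
    std::printf("loop took %lu ms\n", (unsigned long)elapsed_ms);
}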
128 is already way too much, but 50k?

Still, it's an interesting experiment, though I'm amazed it ran at all.

OP: Note that you got only about 129 times the apparent throughput (451 million / 3.5 million), remembering that there is actually no gain, but you used 400 times the number of threads (51,200 / 128). That discrepancy is an indication of the overhead: 400 / 129 is roughly 3, so your program actually ran about 3 times slower, which is the kind of result you would expect.

Topic archived. No new replies allowed.