Time Precision in Gaming: GNU MPFR vs. GNU MP Bignum

Hello everyone,

I'd like to discuss an interesting comparison between the GNU MPFR (Multiple Precision Floating-Point Reliable) Library and the GNU MP Bignum Library, specifically in terms of their precision when using rdtsc as a source of time for gaming.

As we know, precision timing is paramount in gaming, especially in high-performance games where a few microseconds of latency can dramatically impact gameplay. Here, choosing the right library can significantly improve both precision and performance.

The key difference between the GNU MPFR and GNU MP Bignum libraries lies in their core functionalities: while GNU MP Bignum is built to handle large integer operations, GNU MPFR is designed for high-precision floating-point arithmetic. This distinction is crucial when considering precision for time tracking.

Firstly, the MPFR library supports exact rounding, which is not available in MP Bignum. Given that time measurement often involves fractional values, MPFR's ability to handle these via floating-point arithmetic and exact rounding can result in more accurate timing data.

Moreover, the MPFR library follows the IEEE 754-2008 standard for floating-point arithmetic, which assures well-defined semantics for every operation and every rounding mode. This rigor can lead to enhanced precision and consistency, vital for game timing where exactness is required.
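To illustrate what that guarantee looks like in practice, here is a minimal sketch (not timing code, just an example of the API) that divides two values at 128-bit precision with an explicit rounding mode:

#include <stdio.h>
#include <mpfr.h>

int main()
{
    mpfr_t num, den, quot;
    mpfr_inits2(128, num, den, quot, (mpfr_ptr) 0);   // 128-bit significands

    mpfr_set_ui(num, 1, MPFR_RNDN);
    mpfr_set_ui(den, 3, MPFR_RNDN);
    mpfr_div(quot, num, den, MPFR_RNDN);              // correctly rounded 1/3

    mpfr_printf("1/3 = %.40Rf\n", quot);              // print 40 fractional digits
    mpfr_clears(num, den, quot, (mpfr_ptr) 0);
    return 0;
}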

On the other hand, although MP Bignum is robust for large integer operations, it does not maintain these exact semantics for every operation and may be less precise when dealing with non-integer or fractional time units, which are common in game timing.

To put it succinctly, when using rdtsc as a source of time in gaming, the GNU MPFR library seems to offer higher precision than the GNU MP Bignum library due to its specific support for high-precision floating-point arithmetic and adherence to strict standards.

That being said, these libraries serve different purposes and both have their strengths. However, for precision time tracking in games, GNU MPFR appears to be a more suitable choice.

I welcome any thoughts, experiences, or insights on this subject. Looking forward to a fruitful discussion!

Best,
[haha01]
As rdtsc() works in whole numbers (the EDX and EAX registers are used as a 64-bit counter), why do you need floating-point arithmetic for timing?
Nor is a "big number" library required to do 64-bit integer arithmetic!

Anyway, there are a lot of problems with using RDTSC directly. It counts CPU cycles, not time in terms of (micro)seconds. This means that it will run slower or faster, when the CPU frequency changes. And, on all modern processors, the CPU frequency changes all the time! Furthermore, in general, RDTSC is not synchronized between different CPU (cores). So, if your thread gets moved from one CPU (core) to another, which can happen at any time, the RDTSC values that you see are likely to be discontinuous...

You probably want to look into clock_gettime() with CLOCK_MONOTONIC on Linux/Unix, or QueryPerformanceCounter() on Windows:
https://linux.die.net/man/3/clock_gettime
https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter

See also:
https://learn.microsoft.com/en-us/windows/win32/dxtecharts/game-timing-and-multicore-processors

Also be aware: Just because the clock value is returned in units of microseconds or even nanoseconds, this does not mean that the clock actually has a resolution (precision) of 1 microsecond or even 1 nanosecond. It's probably a lot "coarser".
@seeplus

The use of floating-point arithmetic in the timing code is essential for transforming raw CPU cycles, obtained from the RDTSC instruction, into a time value that is more universally understandable, like seconds or milliseconds.

When you fetch the current CPU cycle count via RDTSC, the returned value is essentially the number of clock cycles since the CPU was reset. However, this value is specific to the machine it's executed on and the number of clock cycles isn't inherently meaningful as a measurement of time.

By knowing the clock frequency of the CPU (how many cycles are executed per second), we can convert the raw cycle count into a duration. This is done by dividing the cycle count by the clock frequency, which gives us a time duration in seconds.

However, the division here necessitates floating-point arithmetic. This is because the clock frequency is typically on the order of billions of cycles per second (GHz), so the result of the division will often be a fraction whenever the cycle count is less than the clock frequency (i.e., the measured duration is less than a second).

Furthermore, the floating-point representation allows for high-precision timing, which is critical in performance measurement, benchmarking, or other scenarios where minute differences in execution time matter. The MPFR/GMP libraries are used for these floating-point operations to provide even higher precision than standard double-precision floating point can offer, which may be necessary for extremely fine-grained timing measurements.
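As a minimal sketch of that conversion (the cycle count and the 3 GHz frequency below are just made-up example numbers), it really is a single floating-point division:

#include <cstdint>
#include <cstdio>

int main()
{
    std::uint64_t cycles  = 4500000;          // delta between two RDTSC reads (example value)
    std::uint64_t freq_hz = 3000000000ULL;    // assumed TSC frequency: 3 GHz

    double seconds = static_cast<double>(cycles) / static_cast<double>(freq_hz);
    std::printf("elapsed: %.9f s (%.3f ms)\n", seconds, seconds * 1e3);
}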
@kigar64551

While it's true that using the RDTSC instruction can come with some complications due to variable CPU frequencies and lack of synchronization across different CPU cores, these potential issues are acknowledged and addressed in my approach.

Firstly, my timing measurements are consistently performed on a single, specific core to prevent discrepancies that might arise due to switching between different cores during execution. By "pinning" the measurement to the first core, I ensure that there are no discontinuities in the RDTSC values due to thread migration.
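For illustration, one common way to pin a thread to the first core on Windows looks roughly like the minimal sketch below (pin_to_first_core is just an example name, not necessarily the exact call used in my code):

#include <windows.h>

// Restrict the current thread to the first logical CPU, so that consecutive
// RDTSC reads always come from the same core's time-stamp counter.
void pin_to_first_core()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);   // bit 0 = first logical CPU
}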

Additionally, I am fully aware of the limitations imposed by dynamic CPU frequencies. However, this timing method has proven to be extremely effective in the context of gaming platforms, where the raw performance and high-precision timing that RDTSC provides can be of great benefit. The RDTSC instruction provides a high-resolution, low-overhead method of measuring time, which is ideal for the performance-critical context of gaming.

Of course, it's important to keep in mind that while this approach works well in my specific use case, it may not be universally applicable or recommended for all scenarios. The use of RDTSC requires a nuanced understanding of its limitations and potential pitfalls to be used effectively.

In conclusion, the RDTSC instruction is a powerful tool for timing when used with an understanding of its intricacies and careful consideration of its potential limitations.
If you insist that time is expressed in units of "seconds", then obviously you will need floating-point math in order to retain millisecond (or microsecond) information. But, just as well, you can decide to represent time as units of "milliseconds", "microseconds" or even "nanoseconds" – in which case using integers will provide sufficient resolution (precision) for all practical purposes. For example, Windows typically uses a 64-Bit integer counter of "100-nanosecond intervals" (since January 1, 1601) to represent time values:
https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-filetime

Similarly, Linux/Unix uses timespec with a "whole seconds" field plus a "nanoseconds" (rest) field, both of which are integer values:
https://linux.die.net/man/3/clock_gettime
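A minimal sketch of folding such a timespec into one 64-bit nanosecond count, using integer math only (monotonic_ns is just an example helper name):

#include <cstdint>
#include <ctime>

// Fold a CLOCK_MONOTONIC reading into a single 64-bit nanosecond count, integers only.
std::int64_t monotonic_ns()
{
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}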

Of course, you need to take care when converting from "CPU cycles" (or "timer ticks") to the desired time unit, but it can be done with integer math just fine! This is how MSVCRT computes the clock() value from the "high precision" timer value and its frequency:
// Scales a 64-bit counter from the QueryPerformanceCounter frequency to the
// clock() frequency defined by CLOCKS_PER_SEC.
static long long scale_count(long long timer_count)
{
    long long scaled_count = (timer_count / source_frequency) * CLOCKS_PER_SEC;

    // To minimize error introduced by scaling using integer division, separately
    // handle the remainder from the above division by multiplying the left-over
    // counter by the destination frequency, then dividing by the input frequency:
    timer_count %= source_frequency;
    scaled_count += (timer_count * CLOCKS_PER_SEC) / source_frequency;

    return scaled_count;
}

Here you could replace CLOCKS_PER_SEC with whatever you like, e.g. use 1000000, if you want time in "microsecond" units.
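A minimal sketch of the same idea applied directly to QueryPerformanceCounter values (qpc_to_microseconds is a hypothetical helper, not part of MSVCRT), using a destination frequency of 1,000,000 so the result is in microseconds:

#include <windows.h>
#include <cstdio>

// Same split-division trick as scale_count() above, but scaling to microseconds.
static long long qpc_to_microseconds(long long counter, long long freq)
{
    long long us = (counter / freq) * 1000000LL;    // whole-second part, scaled
    us += ((counter % freq) * 1000000LL) / freq;    // left-over ticks, scaled separately
    return us;
}

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    // ... code being timed ...
    QueryPerformanceCounter(&t1);
    std::printf("%lld us\n",
                qpc_to_microseconds(t1.QuadPart - t0.QuadPart, freq.QuadPart));
}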
Thank you, but rdtsc gives me happiness in gaming; the precision from MPFR is just brilliant.
This being a topic on Windows, the biggest issue I have with RDTSC is that you can't use it via inline assembly in 64-bit Visual Studio, because the x64 compiler does not allow inline assembly, forcing you to find some other way to deal with the issue. Nothing I do is worth all the aggravation of trying to use rdtsc these days. It may be the best way, and worth it for some users in some code, but the high-resolution clock provided now does all I need, with less hand-waving and fewer 'what if the next CPU does blah, will it still work' concerns.
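For what it's worth, both MSVC and GCC/Clang expose rdtsc as a compiler intrinsic, which is one way around the no-inline-assembly restriction in 64-bit builds; a minimal sketch:

#include <cstdint>
#include <cstdio>
#ifdef _MSC_VER
#include <intrin.h>      // MSVC: __rdtsc() intrinsic, usable in x64 builds
#else
#include <x86intrin.h>   // GCC/Clang: __rdtsc()
#endif

int main()
{
    std::uint64_t start = __rdtsc();
    // ... code being timed ...
    std::uint64_t cycles = __rdtsc() - start;
    std::printf("elapsed: %llu cycles\n", static_cast<unsigned long long>(cycles));
}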

If it works for you for what you need to do and is the best thing you can find for the job, have at it. But IMHO your code takes on fragility, portability, reusability, and maintenance risks, so whatever you are doing needs to be worth them.

rdtsc was just about the only way, and a true godsend, back in the single-core, 300 MHz Pentium era.
Here's a simple procedure written in MASM that retrieves the timestamp counter:

.code

; Returns the full 64-bit time-stamp counter in RAX.
getTSC	proc
rdtsc			; low 32 bits -> EAX, high 32 bits -> EDX
shl rdx, 32		; shift the high half into position
or rax, rdx		; combine both halves into a single 64-bit value
ret
getTSC	endp

end


It returns the full 64-bit rdtsc value and works amazingly well!
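For reference, a minimal sketch of how such a proc might be declared and called from C++, assuming the .asm file is assembled and linked into the project so the symbol is visible as getTSC:

#include <cstdint>
#include <cstdio>

// Declaration of the MASM procedure above; on x64 there is no name decoration,
// and the 64-bit result comes back in RAX per the calling convention.
extern "C" std::uint64_t getTSC();

int main()
{
    std::uint64_t start = getTSC();
    // ... code being timed ...
    std::printf("delta: %llu cycles\n",
                static_cast<unsigned long long>(getTSC() - start));
}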

I also mask interrupts directly at the local APIC (the interrupt controller), which means no interrupt fires while I'm reading the rdtsc instruction.
https://wiki.osdev.org/APIC

By raising the Task Priority Register (TPR) to its maximum of 0xFF, no maskable interrupts occur.

You can do it through CR8 as well and also disable hardware interrupts, but I found that disabling hardware interrupts isn't really necessary, because it adds latency:
auto currentIrql = __readcr8();   // remember the current IRQL (TPR, exposed via CR8)
__writecr8(HIGH_LEVEL);           // raise to the highest IRQL, masking interrupts
_disable();                       // clear IF: block hardware interrupts too
auto tsc = getTSC();              // read the time-stamp counter undisturbed
_enable();                        // set IF again
__writecr8(currentIrql);          // restore the original IRQL
// Note: __readcr8/__writecr8 only work in kernel mode; HIGH_LEVEL comes from the WDK headers.


Now rdtsc reads correctly without any interrupts :)
Are you using it for multiple seconds of data? If it's all sub-second, you only need the low 32 bits and can probably write it to do that a little faster.
I need full precision from rdtsc, and only the full 64 bits give me that.
Topic archived. No new replies allowed.