Random numbers

Pages: 123

Oh. Maybe I could explain my goal. I am not trying to find a best way to generate some random numbers. I really like this one given above :

https://cplusplus.com/forum/windows/284587/#msg1232737

I am thinking about the process - a way to understand the main principle.
I want to understand... searching/frisking/excavating - no more. Thank you for your sharing. I appreciate it ++

kigar64551 (853)

Still, using whatever() % N to map the output of whatever() into the 0 to N-1 range works. That's trivial.

But, if whatever() is not a "random" source, then whatever() % N will not be random either! (despite being inside the desired range)

So, if you want to use the whatever() % N approach, then at least pick a whatever() that is a "proper" random source.

(And the system clock certainly is not a "proper" random source)

Last edited on

Geckoo (362)

I agree Kigar64551 - system clock is not a good way to do the work. The next alternative is just another stupidity from me - and forgive me about that. I though that we could use a Random Access Memory address in order to generate a seed which we could "tortured" to generate a random value. I could be nailed to the door of the C++ church for this code, but it works... Just another sharing ++

#include <iostream>

class testRand
{
public:
    int randInRange(int min, int max)
    {
        return rollTheDice() % static_cast<unsigned long long>(max + 1 - min) + min;
    }

private:
    uintptr_t rollTheDice()
    {
        uintptr_t rnd = reinterpret_cast<uintptr_t>(&this->lol);

        rnd = ((rnd * rnd) / 100) % 10000;
        delete lol; // required ???

        return rnd;
    }

    const int* lol = new int; // as  a seed
};

int main()
{
    for (int i = 0; i < 10; ++i)
    {
        testRand* tr = new testRand();
        std::cout << "Random Value : " << tr->randInRange(0, 100) << std::endl;
        delete tr;
    }  

    return 0;
}

Random Value : 32
Random Value : 77
Random Value : 28
Random Value : 7
Random Value : 57
Random Value : 15
Random Value : 18
Random Value : 63
Random Value : 93
Random Value : 54

Last edited on

kigar64551 (853)

Allocating the buffer for lol once (per object instance), but delete'ing it in every invocation of rollTheDice() certainly is wrong ☹️

Furthermore, the initial content of a buffer allocated on the heap, via new operator, is "undefined", but certainly not random!

Here "undefined" means that the C runtime does not give you any guarantees what the initial content of the buffer will be. But, for example, initializing the buffer will all zero's would be perfectly allowed — although that would be quite the opposite of random!

If you need a proper "seed", then use the OS entropy source, i.e. /dev/urandom on Linux/BSD or RtlGenRandom() on Windows.

Last edited on

Peter87 (11264)

kigar64551 wrote:
Here "undefined" means that the C runtime does not give you any guarantees what the initial content of the buffer will be.

The standard calls it an "indeterminate value" and accessing such a value leads to undefined behaviour.

Not sure if Geckoo's code has changed, but right now it's not running into this particular problem. Instead it reads the address of lol. Not the address stored in lol but the actual address where lol itself is stored. This makes the use of new/delete within the testRand class totally unnecessary. Could just get rid of lol all together and use this instead of &lol. Wouldn't make a difference.

Still, not the best source of randomness.
This is the sort of output I get when I run the program:

Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28
Random Value : 28

The number is different each time I run the program but each run always seems to produce the same "random" number (probably because it reuses the same memory location for all testRand objects).

Last edited on

kigar64551 (853)

You are right. He is not using the content of the heap-allocated buffer. And he is not even using the address of the heap allocated buffer. He is using the address of the variable "lol" where the address of the buffer is stored - which makes the new totally pointless!

The code probably should have been:

1
2

const int* lol = new int;
uintptr_t rnd = reinterpret_cast<uintptr_t>(this->lol);

But even that would not be random at all, since memory addresses are not random.

First of all, memory addresses are usually "aligned" to a multiple of 4 or 8, which means that the "lowest" bits of the memory address will always be 0. But even the "higher" address bits are far from being random. Neither is the initial content of the heap-allocated buffer!

int main()
{
    for (size_t i = 0; i < 100; ++i)
    {
        int *p = new int;
        printf("%016X -> %08X\n", reinterpret_cast<uintptr_t>(p), *p);
        //delete p;
    }
}

$ ./memtest.exe
000000004A3424C0 -> 4A342830
000000004A3424E0 -> 4A343840
000000004A343840 -> 4A326410
000000004A343860 -> 4A326410
000000004A343880 -> 4A326410
000000004A3438A0 -> 4A326410
000000004A3438C0 -> 4A326410
000000004A3438E0 -> 4A326410
000000004A343900 -> 4A326410
000000004A343920 -> 4A326410
000000004A343940 -> 4A326410
000000004A343960 -> 4A326410
000000004A343980 -> 4A326410
000000004A3439A0 -> 4A326410
000000004A3439C0 -> 4A326410
000000004A3439E0 -> 4A326410
000000004A343A00 -> 4A326410

Last edited on

deleted account xyzzy (5768)

Learn C++ has a decent tutorial/history lesson on how to generate pseudo-random numbers:

https://www.learncpp.com/cpp-tutorial/introduction-to-random-number-generation/

The follow-up lesson on using the C++ stdlib Mersenne Twister implementations is a good read, it delves into the intricacies of seeding:

https://www.learncpp.com/cpp-tutorial/generating-random-numbers-using-mersenne-twister/

Geckoo (362)

Good. Finally we come back to the main source - the Mersenne Twister. I tried many alternatives and some of them I shared with you, but really... I think that actually there is no better solution. A few lines of code, an efficient readability and it rocks. Thank you for the links ++

Last edited on

jonnin (11497)

MT is a little slow, it does a lot of ops per value created, but its all in what you want. There isnt much better, as you said.

On random memory locations :) In school, in DOS, we did a snowflake fractal assignment and I colored it using an uninitialized set of RGB off the call stack. This gave the same color at the same recursive depth, but different colors all around, sort of at random. It was a very pretty exploit of how things worked back then. These days, such would probably just give all black or something as a lot of memory is zeroed by the OS or (???) and its not as reliable that you can get interesting values.

Peter87 (11264)

jonnin wrote:
MT is a little slow, it does a lot of ops per value created, but its all in what you want.

I have heard this many times but my own tests have not really showed it being the case. At least not when mt19937 uses 32-bit integers.

https://quick-bench.com/q/JaL4oF7nwySbxd7yoQH71Celfbc

Unfortunately the standard types of the random generators are defined to use std::uint_fast32_t which ends up being std::uint64_t on 64-bit Linux. This leads to mt19937 using twice as much memory (5 KB instead of 2.5 KB) and to run much slower.

In the test above I have therefore defined my own type 'compact_mt19937' that uses std::uint32_t instead.

using compact_mt19937 = std::mersenne_twister_engine<std::uint32_t, 32, 624, 397, 31,
                             0x9908b0df, 11,
                             0xffffffff, 7,
                             0x9d2c5680, 15,
                             0xefc60000, 18, 1812433253>;

As you can see this actually beats std::rand() and the much simpler std::minstd_rand. I think it was a mistake by the standard to use std::uint_fast32_t here and if you care about performance you should do the same as I did above.

Just to be fair I did the same thing with minstd_rand and created 'compact_minstd_rand' but that doesn't seem to have had much of an impact.

Note that I used GCC above. Rerunning the benchmark with Clang libc++ shows a much closer result between mt19937 and minstd_rand. Don't pay too much attention to small difference because it can be noise. If you check "Clear cached results" and press "Run benchmark" you might get a slightly different result next time.

The above was just a "microbenchmark" and might not be representative of how it will perform in a real setting. It's possible that the compiler is not able to optimize as good when it's not called in a tight loop and the bigger memory usage might lead to cache misses in some situations.

To test a somewhat more real situation I modified my own game "Brum Brum Rally". The function that I tested was Map::generate(). Normally it keeps generating random "maps" until it finds one that satisfies the user's preferences, and if it takes too long time it will just return the best one so far. I modified it so that it always runs 100 000 iterations and measured how long time it took to run. The game uses std::rand() but to test I swapped it out for the other random engines. I ran it three times for each engine and got the result below:

        std::rand(): 319ms 337ms 320ms
    compact_mt19937: 204ms 204ms 203ms
compact_minstd_rand: 189ms 190ms 189ms

So minstd_rand was a little faster in this situation but not by a lot.

The above used the default compiler settings for the project (-O2 -march=native ...) but I know some people prefer -O3 so just to test I reran the tests with -O3 and got the following results:

        std::rand(): 330ms 314ms 313ms
    compact_mt19937: 191ms 191ms 192ms
compact_minstd_rand: 183ms 183ms 182ms

Not much difference.

These numbers gives a hint about what kind of performance difference you can expect, and to me it seems like the superior quality of mt19937 is well worth the potential small loss of speed, but it's possible that it will be different in other programs that use random numbers in a different way.

Last edited on

Geckoo (362)

Just a few words about my previous useless code. Apparently the Random Acces Memory is not so random that it pretends?

#include <iostream>

class testRand
{
public:
    int randInRange(int min, int max)
    {
        return rollTheDice() % static_cast<unsigned long long>(max + 1 - min) + min;
    }

private:
    uintptr_t rollTheDice()
    {
        uintptr_t rnd = reinterpret_cast<uintptr_t>(&this->lol);

        rnd = ((rnd * rnd) / 100) % 10000;
        delete lol; // required ???

        return rnd;
    }

    const int* lol = new int; // as  a seed
};

int main()
{
    for (int i = 0; i < 10; ++i)
    {
        testRand* tr = new testRand();
        std::cout << "Random Value : " << tr->randInRange(0, 100) << std::endl;
        delete tr;
    }  

    return 0;
}


Random Value : 42
Random Value : 2
Random Value : 36
Random Value : 2
Random Value : 52
Random Value : 43
Random Value : 2
Random Value : 1
Random Value : 95
Random Value : 9


Random Value : 19
Random Value : 86
Random Value : 2
Random Value : 100
Random Value : 8
Random Value : 54
Random Value : 8
Random Value : 86
Random Value : 86
Random Value : 8

Last edited on

Peter87 (11264)

Geckoo wrote:
Apparently the Random Access Memory is not so random that it pretends?

"Random-access" refers to the fact that you can access things "at random" with basically no difference in speed.

Arrays and std::vectors provide random-access to their elements because you can access any element directly by index. Compare this to std::list which cannot provide random-access because it has to step through one element after another until it reaches the element you're looking for.

Last edited on

kigar64551 (853)

As said before, your code reinterpret_cast<uintptr_t>(&this->lol) takes the address of the variable "lol" (inside the testRand object), not the address of the dynamically allocated int which is stored in "lol". So, that approach is pretty pointless, to begin with :-)

But, even if you would look at reinterpret_cast<uintptr_t>(this->lol), i.e. the address of the int that was dynamically allocated on the heap (via new operator), then it still would not be random at all! As said before, memory addresses are far from being random!

As said before, memory addresses are usually aligned, i.e. they are restricted to exact multiples of 4 or 8 !!!

The term "random access" in random access memory means that the memory can be read and written, as opposed to "read-only" memory or "write-only" memory. It does not mean that the (initial) content of that memory, or addresses pointing to that memory, are random!

(BTW: in German "random access" translates to "wahlfreier Zugriff", which means sth. akin to "can be accessed anywhere at any time")

Last edited on

Geckoo (362)

So there is some language abuse. The name seems to explain that the memory sections are available randomly - and it's not true. The german translation is really better. In French we say "mémoire vive" (dynamic or living memory) and in Russian запоминающее устройство с произвольным доступом (memory device with arbitrary access). Thank you for your clever and interesting explanations ++

Last edited on

Peter87 (11264)

https://en.wikipedia.org/wiki/Random_access

Peter87 (11264)

I mean that's the original explanation for why it's called RAM. We of course now have many other types of memory that also allow "random-access" even though we don't call them RAM.

Last edited on

jonnin (11497)

There are lots of old things where the name stuck but is a bit odd. WORM disks, remember those? The original CD writers (before the rewriteable ink came out) were really CDWORM but got named CDROM instead. Anyway, random in RAM just means you can jump to any random valid location instantly. It made sense at the time.

As for the MT benchmark, I never thought to try 32 bit, that is a very interesting test. It may be wired to implementation somehow, but its really good info, thank you.

Peter87 (11264)

Geckoo wrote:
The name seems to explain that the memory sections are available randomly - and it's not true.

Do you realize that RAM is a piece of hardware?

Your whole program (the machine instructions, the stack, the heap, etc.) are normally stored in RAM while it is running, but modern computers use virtual memory so there is nothing preventing the OS from swapping out some of it to disk for a while (usually only happens when you run out of free RAM). This does not affect the addresses you see in your program because those are virtual addresses, not real physical addresses. It's even possible that two programs use the same virtual addresses but they are being mapped to different physical memory locations.

That being said, the mapping of virtual memory is done in big chunks so it's likely that two virtual addresses that are close to each other are also close in physical memory.

So what decides what virtual memory addresses the objects get stored at?

Well, you probably already know that local variables are being stored on "the stack" and if you know how a stack works (that you can only "push" and "pop" at the top) then you should know why the addresses of local variables are not very random. For example, local variables declared in the same function normally ends up right next to each other, and a local variable declared in a loop will normally have the same address on each iteration.

As for objects created with new it's not so obvious. It's not like "hey, RAM give me a piece of memory to store my int!". Instead it's more like a function call, to a function written by your compiler/standard library implementers, to handle all this complicated stuff of keeping track of allocated memory, which is in use and which is free, and to find an appropriate piece of memory to let you use. I don't know exactly how it works, it depends on how it's implemented, but as far as I know there is no attempt at making the addresses appear "random". If the program doesn't have enough memory it can request more from the operating system (which is what handles the interaction between the hardware and the software) but this is done in large chunks so the addresses you get will mostly be down to how the new and delete "functions" happen to be implemented.

Last edited on

Peter87 (11264)

jonnin wrote:
As for the MT benchmark, I never thought to try 32 bit, that is a very interesting test. It may be wired to implementation somehow, but its really good info, thank you.

Note that there is a 64-bit version of Mersenne Twister. In the standard library it's called std::mt19937_64.

std::mt19937 is the 32-bit version of Mersenne Twister, so obviously it's meant to be implemented using 32-bit integers. Using 64-bit integers to implement it is not optimal.

The reason why the standard didn't specify the random generator types to use the fixed-sized integers (std::uintN_T) is because those are optional and might not be supported by all hardware (imagine a computer that used 24-bit bytes, then there would be no way to implement a 32-bit integer).

That's why they decided to use the "fast" integer types (std::uint_fastN_T) instead. Those are guaranteed to be available everywhere.

But the problem with the "fast" integer types is that it's not always obvious to decide which type is the fastest. It depends on how you use them. And due to ABI compatibility they essentially need to be same size for the whole CPU family. That means that std::uint_fast32_t has to be the same size on all Linux computers running a x86-64 CPU even though there are many such CPUs, old and new, from many different vendors with possibly varying performance characteristics. It seems like most implementations just settled for making std::uint_fast32_t a typedef for long.

I think it would have been better if the standard defined the random generators to use the std::uint_leastN_t types instead, or leaving up to the implementation to decide.

But it's too late to change now. Those who care will just have to do what I did and look up the typedefs on this page
https://en.cppreference.com/w/cpp/numeric/random#Predefined_random_number_generators
and create their own with the integer types replaced.
Or they can use another library.

Last edited on

deleted account xyzzy (5768)

Peter87 wrote:
imagine a computer that used 24-bit bytes, then there would be no way to implement a 32-bit integer

No need to imagine a 24-bit computer.

http://vipclubmn.org/CP24bit.html

There were 30-bit computers, I used to maintain one type back in the mid 1980's. The CP-901.

http://vipclubmn.org/CP30bit.html

Pages: 123