Is there any biased random number generator in C++?

Nov 21, 2022 at 1:16am
Is there a biased random number generator in C++
that can skew the generated numbers towards one end?

For example, if I generate 100 random numbers between 1 and 1500, I want most of the numbers generated to be below 400.
Is it possible to do this in C++?
Nov 21, 2022 at 1:27am
Yes. Some of the more useful distributions generate floating-point values instead of integers, though, like the piecewise distributions over intervals.
It may be easier to do this manually, but you need to define your terms carefully. What does "most" mean? More than 50%, but does it really mean 75%? 90%? 53.25%?
Do you need to fit a specific curve? Something else?

If trying to fit the floating-point piecewise distribution isn't useful, this would do what you asked:
get a random value from, say, 1-100.
if it's more than 53, generate a value from 1-1500 // or 400-1500 if you prefer.
otherwise, generate a value from 1-400.
But it's crude, because your requirements are not firm enough to dig deeper...
Any additional info?
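
The two-stage idea above could be sketched roughly as follows. This is only an illustration: the function name biased_value and the percent_low parameter are made up here, and the 53% threshold is just the placeholder figure from the post.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper (not from the thread): returns a value in [1, 1500].
// About `percent_low` percent of the calls are forced into [1, 400];
// the remaining calls draw from the full range [1, 1500].
int biased_value(std::mt19937& rng, int percent_low = 53)
{
    assert(percent_low >= 0 && percent_low <= 100);

    std::uniform_int_distribution<int> pick(1, 100);
    std::uniform_int_distribution<int> low(1, 400);
    std::uniform_int_distribution<int> full(1, 1500);

    // Stage 1 chooses the bucket, stage 2 chooses the value inside it.
    return (pick(rng) <= percent_low) ? low(rng) : full(rng);
}
```

Because the full range also contains 1-400, the overall share of values at or below 400 comes out higher than 53% (roughly 65% with these numbers). Drawing the second bucket from 400-1500 instead would pin it at about 53%.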
Last edited on Nov 21, 2022 at 1:40am
Nov 21, 2022 at 1:47am
Let's say I need to generate 100 numbers with a minimum value of 0 and a maximum value of 1440. The mean of the 100 numbers is 150 and the standard deviation is 25.

Is there any way to generate such numbers that conform to the above rule?
Last edited on Nov 21, 2022 at 1:47am
Nov 21, 2022 at 2:22am
You should look at the various distributions available in <random>; there are quite a number of them.

https://en.cppreference.com/w/cpp/numeric/random

Specifically look at what std::normal_distribution offers, mean and standard deviation are selectable.

https://en.cppreference.com/w/cpp/numeric/random/normal_distribution

C++ offers a wide assortment of ways to generate pseudo-random numbers compared to what the C stdlib provides. I'd suggest you spend some time playing around with what's available.

Neither C nor C++ has random number generation capabilities that are cryptographically robust; the generators and distributions produce values by known and easily repeatable patterns.

For a bit of a refresher course on the basics of C++ random number generation:

https://www.learncpp.com/cpp-tutorial/introduction-to-random-number-generation/
https://www.learncpp.com/cpp-tutorial/generating-random-numbers-using-mersenne-twister/
Nov 21, 2022 at 10:26am
Worldtreeboy wrote:
Let's say I need to generate 100 numbers with a minimum value of 0 and a maximum value of 1440. The mean of the 100 numbers is 150 and the standard deviation is 25.
George P wrote:
Specifically look at what std::normal_distribution offers, mean and standard deviation are selectable.

When you mention "mean" and "standard deviation" it certainly sounds like you would want a normal distribution.

std::mt19937 rng{std::random_device()()};
std::normal_distribution<> dist(150, 25);
for (int i = 0; i < 100; ++i)
{
	int n = std::round(dist(rng));
	std::cout << n << "\n";
}

But the question is how do we limit it? The range of a normal distribution is unlimited.

If you just clamp the values ...
 
int n = std::clamp<int>(std::round(dist(rng)), 0, 1440);
... you'll get a disproportionate number of values that are equal to the max and min values (especially the min in this case).

Another approach is to keep generating a new number until you get one that is within the range ...
int n;
do
{
	n = std::round(dist(rng));
} while (n < 0 || n > 1440);
... but this skews the mean and standard deviation. In this case the mean would be slightly larger than 150. If only a small proportion of all values fall outside the range then it might not matter, depending on the application.

To avoid affecting the mean you could make sure to regenerate the numbers so that they stay on the same side of the mean...
int n = std::round(dist(rng));
if (n < 0)
{
	n = std::uniform_int_distribution<>(0, 150)(rng);
}
else if (n > 1440)
{
	n = std::uniform_int_distribution<>(150, 1440)(rng);
}
...but this will obviously affect the standard deviation. Note that the above is just one of many different ways to do it. Another way might affect the standard deviation differently.

I think you could also calculate how large a proportion falls outside the range at both ends and adjust the mean of the std::normal_distribution so that the do-while method gives you the intended mean, but I'm not too good at this kind of math so I'm not sure. Perhaps you could adjust both the mean and the standard deviation so that the do-while method gives you both as intended. Note, though, that if you plotted the probability density function, the peak of the graph might no longer be at the mean, because the graph is no longer symmetrical like a normal distribution is.
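
As a rough sketch of the calculation hinted at above: the do-while loop effectively samples a truncated normal distribution, and its mean has a known closed form, mu + sigma * (pdf(a) - pdf(b)) / (cdf(b) - cdf(a)), where a and b are the range limits expressed in standard deviations from mu. The helper names below are invented for this illustration, and the rounding to int is ignored.

```cpp
#include <cassert>
#include <cmath>

// Standard normal probability density function.
double normal_pdf(double z)
{
    const double inv_sqrt_2pi = 0.3989422804014327; // 1 / sqrt(2 * pi)
    return inv_sqrt_2pi * std::exp(-0.5 * z * z);
}

// Standard normal cumulative distribution function, via erfc.
double normal_cdf(double z)
{
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Hypothetical helper: mean of a normal(mu, sigma) restricted to [lo, hi],
// which is what the rejection (do-while) loop samples from before rounding.
double truncated_mean(double mu, double sigma, double lo, double hi)
{
    assert(sigma > 0.0 && lo < hi);
    const double a = (lo - mu) / sigma;
    const double b = (hi - mu) / sigma;
    return mu + sigma * (normal_pdf(a) - normal_pdf(b))
                      / (normal_cdf(b) - normal_cdf(a));
}
```

Calling truncated_mean(150, 25, 0, 1440) gives a value only very slightly above 150, which matches the remark that the rejection loop nudges the mean upward. To hit a target mean exactly you would have to solve for the mu that makes this function return the target, which is not attempted here.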

If you're just doing a game or something you can probably just approximate it, with a std::normal_distribution as I did above, or some other non-exact method, and then test it and adjust it until you are satisfied. Otherwise, if you need it to be mathematically correct you probably need a better understanding of the math than I have, to understand what you really want, and to be able to ensure that you get it right.
Last edited on Nov 21, 2022 at 11:42am
Nov 21, 2022 at 2:16pm
Worldtreeboy wrote:
Let's say I need to generate 100 numbers with a minimum value of 0 and a maximum value of 1440. The mean of the 100 numbers is 150 and the standard deviation is 25.


With a normal distribution - or indeed most distributions - almost every value will be within 3 sigma of the mean (even CERN only demands 5 sigma). With a mean of 150 and a standard deviation of 25, that implies almost all values lie between 75 and 225. You would get almost exactly this WITH NO CLAMPING at all! Moreover, the probability of values anywhere near 1440 would be vanishingly small.
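
To put a number on the 3-sigma remark: for a normal distribution, the fraction of values within k standard deviations of the mean is erf(k / sqrt(2)). A minimal sketch, with the function name fraction_within chosen here purely for illustration:

```cpp
#include <cassert>
#include <cmath>

// Probability that a normally distributed value lies within k standard
// deviations of its mean: P(|X - mu| <= k * sigma) = erf(k / sqrt(2)).
double fraction_within(double k)
{
    assert(k >= 0.0);
    return std::erf(k / std::sqrt(2.0));
}
```

With mean 150 and sigma 25, the interval 75 to 225 corresponds to k = 3, so fraction_within(3.0) is the share of samples expected in that interval (about 99.7%).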

Sure, I could rescale a truncated normal distribution for you to give the correct mean and standard deviation ... but with your intended standard deviation it would be to all intents and purposes the usual normal distribution.

Basically, YOUR STANDARD DEVIATION IS TOO SMALL FOR YOUR INTENDED RANGE.
Nov 21, 2022 at 3:28pm
Depends on definition of "most", but yes. I could not get as far as 75 or 225, let alone above 400. Not enough trials, obviously.
#include <iostream>
#include <iomanip>
#include <random>
#include <cmath>
#include <numeric>
#include <algorithm>

int main()
{
    std::random_device rd{};
    std::mt19937 gen{rd()};
    std::normal_distribution<> d{150,25};
    constexpr size_t N {100UL};
    size_t count {N};
    int data[N] {};
    while ( count ) {
        auto x = std::round(d(gen));
        if ( 0 <= x and x <= 1440 ) {
            --count;
            std::cout << std::setw(5) << x;
            data[count] = x;
            if ( count % 10 == 0 ) std::cout << '\n';
        }
        else std::cerr << "Outsider: " << x << '\n';
    }
    auto y = std::accumulate( data, data+N, 0 ) / static_cast<double>(N);
    auto z = std::minmax_element( data, data+N );
    std::cout << "Average: " << y << " min: " << *z.first << " max: " << *z.second << '\n';
}
Nov 22, 2022 at 2:23pm
keskiverto wrote:
Not enough trials, obviously.

I wrote the following program:

#include <iostream>
#include <random>
#include <limits>

int main()
{
	std::mt19937 rng{std::random_device()()};
	std::normal_distribution<> dist(150, 25);
	
	double min = std::numeric_limits<double>::max();
	double max = std::numeric_limits<double>::lowest();
	
	long long count = 0;
	
	while (true)
	{ 
		++count;
		
		double n = dist(rng);
		
		bool print = false;
		if (n < min) { min = n; print = true; }
		if (n > max) { max = n; print = true; }
		if (print)
		{
			std::cout << "\n";
			std::cout << count << "\n";
			std::cout << "Min: " << min << "\n";
			std::cout << "Max: " << max << "\n";
		}
	}
}

I let it run for an hour and a half and got the following:

21038615854
Min: -13.0103
Max: 323.414
Last edited on Nov 22, 2022 at 2:26pm
Nov 22, 2022 at 4:47pm
Someone should derive the expected number of iterations before a value at least n away from the mean is produced, as a function of n.
Nov 22, 2022 at 8:20pm
Ganado wrote:
Someone should derive the expected number of iterations before a value at least n away from the mean is produced, as a function of n.


For a normal distribution, the expected number of trials before the first value at a distance of at least n from the mean (on either side) is
1/erfc(n/sigma/sqrt(2))

Trouble is, the OP wants n=1440-150=1290 and sigma=25. So the expected number of trials is about
1/erfc(36.4867)
That isn't even computable in double precision in C++: erfc underflows to zero. It's about 64.67exp(1331.3) if you use the asymptotic expansion for erfc, and has about 580 digits. My calculator just gives "Math Error".
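
One way around the underflow is to work with the base-10 logarithm of the trial count rather than the count itself. The sketch below is an assumption-laden illustration, not a rigorous implementation: the function name is invented, and the fallback keeps only the leading term of the asymptotic expansion, erfc(x) ~ exp(-x*x) / (x * sqrt(pi)).

```cpp
#include <cassert>
#include <cmath>

// log10 of the expected number of draws, 1 / erfc(n / (sigma * sqrt(2))),
// before a normal sample lands at least n away from the mean.
double log10_expected_trials(double n, double sigma)
{
    assert(n > 0.0 && sigma > 0.0);
    const double pi = 3.14159265358979323846;
    const double x = n / (sigma * std::sqrt(2.0));
    const double p = std::erfc(x);

    // While erfc(x) is still a normal (non-zero, non-denormal) double,
    // use it directly.
    if (std::isnormal(p))
        return -std::log10(p);

    // Otherwise fall back to the leading asymptotic term, evaluated in
    // log space so that nothing overflows or underflows.
    return x * x / std::log(10.0) + std::log10(x * std::sqrt(pi));
}
```

For the OP's figures, log10_expected_trials(1290, 25) returns the order of magnitude of the trial count, which can be compared with the digit count quoted above.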
Last edited on Nov 22, 2022 at 8:51pm
Nov 22, 2022 at 9:33pm
Thanks :D That's hilariously large.
Nov 22, 2022 at 9:36pm
You need another approach. You need something that looks like a log-normal on the right but has a normal-like downward slope on the left. But as with the other thread, I think it would be an approximation to what he wants, not a 100% match.
Nov 22, 2022 at 9:46pm
"Let's say" was used. We (or someone) don't know the actual distribution that is desired.

Perhaps one of the piecewise distributions would be "close enough"?
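
For reference, a piecewise approach might look something like the sketch below, using std::piecewise_constant_distribution. The interval boundaries and the 3:1 weights are assumptions picked for illustration (they put about 75% of the results below 400), and piecewise_value is a made-up name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: returns an integer in [1, 1499], with the interval
// [1, 400) three times as likely overall as the interval [400, 1500).
int piecewise_value(std::mt19937& rng)
{
    // Interval boundaries and one relative weight per interval (assumed).
    static const std::array<double, 3> bounds{1.0, 400.0, 1500.0};
    static const std::array<double, 2> weights{3.0, 1.0};

    // Rebuilt on every call to keep the sketch short; real code would
    // probably construct the distribution once and reuse it.
    std::piecewise_constant_distribution<double> dist(
        bounds.begin(), bounds.end(), weights.begin());

    // The distribution yields doubles in [1, 1500); floor them to integers.
    return static_cast<int>(std::floor(dist(rng)));
}
```

Adding more boundaries and weights gives a finer staircase, and std::piecewise_linear_distribution would give a sloped density instead of flat steps.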
Nov 22, 2022 at 9:59pm
Yeah, what if you had a 95% chance of a normal around 130 or so, and a 5% chance of, say, a uniform over 200-1400 or whatever the cutoff was? Uniform seems OK for so few values over such a range, but it could be anything.
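
A mixture like that could be sketched as below. The 95/5 split, the centre of 130 and the 200-1400 tail come from the post; the standard deviation of 25 (borrowed from the OP's figures), the [0, 1440] rejection limits and the name mixture_value are assumptions added for this example.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical mixture: with probability `tail_prob` draw from a uniform
// tail on [200, 1400]; otherwise draw from a normal bulk around 130,
// regenerating until the rounded value is inside [0, 1440].
int mixture_value(std::mt19937& rng, double tail_prob = 0.05)
{
    assert(tail_prob >= 0.0 && tail_prob <= 1.0);

    std::bernoulli_distribution use_tail(tail_prob);
    if (use_tail(rng))
        return std::uniform_int_distribution<int>(200, 1400)(rng);

    std::normal_distribution<double> bulk(130.0, 25.0); // sigma is assumed
    int n;
    do
    {
        n = static_cast<int>(std::lround(bulk(rng)));
    } while (n < 0 || n > 1440);
    return n;
}
```

The resulting mean and standard deviation depend on all of these choices, so they would need to be measured and tuned rather than read off the parameters.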
Last edited on Nov 22, 2022 at 10:00pm
Nov 23, 2022 at 8:59am
See
https://en.wikipedia.org/wiki/Chebyshev%27s_inequality
particularly the bit about "sharpness of bounds".
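
As a small illustration of why the inequality matters here: Chebyshev's bound says that, for any distribution with a finite variance, at most 1/k^2 of the values can lie k or more standard deviations from the mean. The helper names below are invented for this sketch.

```cpp
#include <cassert>

// Chebyshev's inequality: P(|X - mean| >= k * sigma) <= 1 / k^2.
// The bound is only informative for k > 1, so it is capped at 1.
double chebyshev_bound(double k)
{
    assert(k > 0.0);
    const double bound = 1.0 / (k * k);
    return bound < 1.0 ? bound : 1.0;
}

// Upper bound on the fraction of values that can sit at least `distance`
// away from the mean, given only the standard deviation.
double max_fraction_beyond(double distance, double sigma)
{
    assert(distance > 0.0 && sigma > 0.0);
    return chebyshev_bound(distance / sigma);
}
```

With a mean of 150 and a standard deviation of 25, max_fraction_beyond(250, 25) is 0.01, so no distribution with those two moments can put more than 1% of its values at or above 400, and the bound for reaching 1440 is far smaller still.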