Let's look at a*x + b in isolation first. If you imagine a broken down into a sum of powers of two, a*x is then the sum of x bit-shifted left by a smattering of powers of two, such that each bit in x impacts other bit positions that are set in a, and some further bits when the summation produces carries at particular bits.
This leaves me somewhat dumbfounded. What does he mean by a "smattering of powers of two", and by breaking 'a' down into a sum of powers of two?
Could someone help me understand what he is trying to convey?
He seems to be saying that the multiplication can be seen as "smearing" x's bits out by shifting them by the amounts given by the positions of the set bits in a. When these various shifted copies of x are added together, they get smooshed (I believe that's the technical word) together, along with carries. This can be seen (with crossed fingers) as kind of randomizing the bits. Adding in b then helps to twiddle some of the lower bits, which are not as affected by the prior operation.
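To make the "shifted copies" idea concrete, here is a small Python sketch (the names and example values are mine, not from the original text) that rebuilds a*x from the set bits of a:

    # Multiplication as a sum of shifted copies of x, one per set bit of a.
    def mul_by_shifts(a, x):
        total = 0
        shift = 0
        while a:
            if a & 1:                  # this power of two is present in a...
                total += x << shift    # ...so add x shifted left by that amount
            a >>= 1
            shift += 1
        return total

    a, x = 0b1011, 0b1101              # arbitrary example values
    assert mul_by_shifts(a, x) == a * x
    print(bin(a * x))                  # x's bits end up "smeared" across many positions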
The operation x_next = (a * x_curr + b) % m is used for "linear congruential" pseudo-random number generators. Selecting good values for a, b and m is an art, though.
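A minimal sketch of such a generator in Python, assuming one commonly quoted constant set (often attributed to Numerical Recipes); the point here is the recurrence, not these particular values:

    # Minimal linear congruential generator: x_next = (a * x_curr + b) % m.
    # The constants are illustrative; choosing good a, b, m is the hard part.
    def lcg(seed, a=1664525, b=1013904223, m=2**32):
        x = seed
        while True:
            x = (a * x + b) % m
            yield x

    gen = lcg(seed=42)
    print([next(gen) for _ in range(5)])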
For example: hash(key) = (key % primeNumber) % arraySize
Why the need to use the MAD method instead of just the simple one above? How will the MAD method ensure that keys are uniformly distributed across the buckets?
If your keys themselves are uniformly distributed then you have a chance of doing pretty well with just return key % arraySize (no need for 'primeNumber').
I think the main idea is that real-life keys are often not uniformly distributed and some kind of reasonable mangling of the bits can do much better.
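For concreteness, the MAD (multiply-add-and-divide) compression from the question is usually stated as h(k) = ((a*k + b) mod p) mod N, with p a prime larger than the table size N and a, b picked at random (a not a multiple of p). A rough Python sketch, using an illustrative prime and made-up names:

    import random

    # MAD compression: h(k) = ((a * k + b) % p) % num_buckets,
    # with p a prime > num_buckets, a in [1, p-1], b in [0, p-1].
    def make_mad_hash(num_buckets, p=109_345_121):    # p is just an illustrative prime
        a = random.randrange(1, p)                    # a must not be 0 (mod p)
        b = random.randrange(0, p)
        return lambda key: ((a * key + b) % p) % num_buckets

    # The "simple one above" for comparison.
    def simple_hash(key, num_buckets, prime=109_345_121):
        return (key % prime) % num_buckets

    mad = make_mad_hash(num_buckets=10)
    keys = range(0, 100, 10)                          # non-uniform keys: all multiples of 10
    print([simple_hash(k, 10) for k in keys])         # every key lands in bucket 0
    print([mad(k) for k in keys])                     # the mangled keys typically spread out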
Typically you want to study how your hash function behaves on a fairly large sample of your data. If it is clustering (many collisions, or lots of data in some areas and none in others), use another function. If it's working, leave it alone. It's simply not easy to predict how it will do on the real data for your project without running some samples.
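To see what "running some samples" might look like in practice, here is a rough sketch (Python; the key pattern and constants are made up) that counts bucket occupancy for a clustered key set under plain key % arraySize and under a MAD-style hash like the one above:

    import random
    from collections import Counter

    ARRAY_SIZE = 32
    P = 109_345_121                                   # illustrative prime > ARRAY_SIZE
    A, B = random.randrange(1, P), random.randrange(0, P)

    def simple(key):
        return key % ARRAY_SIZE                       # plain key % arraySize

    def mad(key):
        return ((A * key + B) % P) % ARRAY_SIZE       # MAD-style mangling

    # A deliberately clustered sample: ids whose stride shares a factor
    # with the table size, so plain modulo piles them all into one bucket.
    sample = [1000 + 64 * i for i in range(500)]

    for name, h in (("simple", simple), ("MAD", mad)):
        counts = Counter(h(k) for k in sample)
        print(name, "buckets used:", len(counts), "max load:", max(counts.values()))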
In general it is difficult for most normal, real-world data to hash well using very simple algorithms. It can happen, but the norm is that you need something a little more complicated. The more records you have, and the more similar the keys (the part of the data that you hash) are to each other, the more exotic you have to get.