Let's look at a*x + b in isolation first. If you imagine a broken down into a sum of powers of two, a*x is then the sum of x bit-shifted left by a smattering of powers of two, such that each bit in x impacts other bit positions that are set in a, and some further bits when the summation produces carries at particular bits.
This leaves me somewhat dumbfounded. What does he mean by a "smattering of powers of two", and by breaking 'a' down into a sum of powers of two?
Could someone help me understand what he is trying to convey?
He seems to be saying that the multiplication can be seen as "smearing" x's bits out by shifting them by the amounts given by the positions of the set bits in a. When these various shifted copies of x are added together, they get smooshed (I believe that's the technical word) together, along with carries. This can be seen (with crossed fingers) as kind of randomizing the bits. Adding in b then helps to twiddle some of the lower bits, which are not as affected by the prior operation.
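To make the "shifted copies" idea concrete, here is a small Python sketch (the names and example values are mine, not from the original text) that rebuilds a*x from the set bits of a:

    # Multiplication as a sum of shifted copies of x, one per set bit of a.
    def mul_by_shifts(a, x):
        total = 0
        shift = 0
        while a:
            if a & 1:                  # this power of two is present in a...
                total += x << shift    # ...so add x shifted left by that amount
            a >>= 1
            shift += 1
        return total

    a, x = 0b1011, 0b1101              # arbitrary example values
    assert mul_by_shifts(a, x) == a * x
    print(bin(a * x))                  # x's bits end up "smeared" across many positions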
The operation x_next = (a * x_curr + b) % m is used for "linear congruential" pseudo-random number generators. Selecting good values for a, b and m is an art, though.
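A minimal sketch of such a generator in Python, assuming one commonly quoted constant set (often attributed to Numerical Recipes); the point here is the recurrence, not these particular values:

    # Minimal linear congruential generator: x_next = (a * x_curr + b) % m.
    # The constants are illustrative; choosing good a, b, m is the hard part.
    def lcg(seed, a=1664525, b=1013904223, m=2**32):
        x = seed
        while True:
            x = (a * x + b) % m
            yield x

    gen = lcg(seed=42)
    print([next(gen) for _ in range(5)])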
For example: hash(key) = (key % primeNumber) % arraySize
Why the need to use the MAD method instead of just the simple one above? How will the MAD method ensure that keys are uniformly distributed across the buckets?
If your keys themselves are uniformly distributed then you have a chance of doing pretty well with just return key % arraySize (no need for 'primeNumber').
I think the main idea is that real-life keys are often not uniformly distributed and some kind of reasonable mangling of the bits can do much better.
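For concreteness, the MAD (multiply-add-and-divide) compression from the question is usually stated as h(k) = ((a*k + b) mod p) mod N, with p a prime larger than the table size N and a, b picked at random (a not a multiple of p). A rough Python sketch, using an illustrative prime and made-up names:

    import random

    # MAD compression: h(k) = ((a * k + b) % p) % num_buckets,
    # with p a prime > num_buckets, a in [1, p-1], b in [0, p-1].
    def make_mad_hash(num_buckets, p=109_345_121):    # p is just an illustrative prime
        a = random.randrange(1, p)                    # a must not be 0 (mod p)
        b = random.randrange(0, p)
        return lambda key: ((a * key + b) % p) % num_buckets

    # The "simple one above" for comparison.
    def simple_hash(key, num_buckets, prime=109_345_121):
        return (key % prime) % num_buckets

    mad = make_mad_hash(num_buckets=10)
    keys = range(0, 100, 10)                          # non-uniform keys: all multiples of 10
    print([simple_hash(k, 10) for k in keys])         # every key lands in bucket 0
    print([mad(k) for k in keys])                     # the mangled keys typically spread out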
Typically you want to study how your hash function behaves on a fairly large sample of your data. If it is clustering (many collisions, or lots of data in some areas and none in others), use another function. If it's working, leave it alone. It's simply not easy to predict how it will do on the real data for your project without running some samples.
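To see what "running some samples" might look like in practice, here is a rough sketch (Python; the key pattern and constants are made up) that counts bucket occupancy for a clustered key set under plain key % arraySize and under a MAD-style hash like the one above:

    import random
    from collections import Counter

    ARRAY_SIZE = 32
    P = 109_345_121                                   # illustrative prime > ARRAY_SIZE
    A, B = random.randrange(1, P), random.randrange(0, P)

    def simple(key):
        return key % ARRAY_SIZE                       # plain key % arraySize

    def mad(key):
        return ((A * key + B) % P) % ARRAY_SIZE       # MAD-style mangling

    # A deliberately clustered sample: ids whose stride shares a factor
    # with the table size, so plain modulo piles them all into one bucket.
    sample = [1000 + 64 * i for i in range(500)]

    for name, h in (("simple", simple), ("MAD", mad)):
        counts = Counter(h(k) for k in sample)
        print(name, "buckets used:", len(counts), "max load:", max(counts.values()))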
In general it is difficult for most normal, real-world data to hash well using very simple algorithms. It can happen, but the norm is that you need something a little more complicated. The more records you have, and the more similar the keys (the part of the data that you hash) are to each other, the more exotic you have to get.