Hello,
I am working on a project where we need to compute the MD5 hash of a large block of memory (e.g. several GB of data in RAM) at regular intervals to make sure that the data hasn't changed. This is to protect against "random" memory errors, so MD5 should be fine as a "checksum" here; it is probably a little better than CRC64.
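To make the use case concrete, here is a minimal sketch of such a periodic check in C. It uses a 64-bit FNV-1a hash as a stand-in for MD5 so that it compiles on its own; `checksum64`, `guarded_block`, `guard_init` and `guard_verify` are hypothetical names chosen for illustration, not part of the actual project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum (64-bit FNV-1a), NOT MD5. In the real project this
 * would be the md5_init() / md5_update() / md5_final() sequence. */
static uint64_t checksum64(const uint8_t *data, size_t size)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    for (size_t i = 0; i < size; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;               /* FNV-1a prime */
    }
    return h;
}

/* Hypothetical bookkeeping for one protected memory block. */
typedef struct {
    const uint8_t *base;     /* start of the protected block */
    size_t         size;     /* size of the block in bytes */
    uint64_t       expected; /* checksum recorded while the data was known-good */
} guarded_block;

/* Record the reference checksum of the block. */
static void guard_init(guarded_block *g, const uint8_t *base, size_t size)
{
    assert(g != NULL && base != NULL);
    g->base = base;
    g->size = size;
    g->expected = checksum64(base, size);
}

/* Recompute the checksum; returns 1 if it still matches, 0 otherwise. */
static int guard_verify(const guarded_block *g)
{
    assert(g != NULL);
    return checksum64(g->base, g->size) == g->expected;
}
```

The idea would be to call `guard_init()` once after the block has been filled, and then `guard_verify()` from a timer or background thread. The cost of each verification is dominated by the hashing pass over the whole block, which is why its speed matters here.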
Anyway, the "naive" approach would be to just call md5_update() once, with the base address and the size of the whole memory block. However, I found this to be rather slow. After a lot of testing, I figured out that "loading" small chunks (e.g. 8 bytes) of the big memory block into a local variable via memcpy(), and then passing them to md5_update() chunk by chunk, is about 2.5 times faster!
Of course I could just be happy with the speed-up. But I would like to understand why the memcpy() version is faster here, even though it has the "overhead" of copying all the data as well as the "overhead" of many more small md5_update() invocations. I thought this might be related to memcpy() doing some "smart" prefetching. But passing small chunks directly from memory and calling _mm_prefetch() on each chunk before passing it to md5_update() did not make things faster at all. It appears that an explicit memcpy() is needed to get the speed-up.
"Naive" (slow) version:
static void test(const uint8_t *const data, const size_t size, uint8_t *const digest)
{
    md5_ctx ctx;
    md5_init(&ctx);
    md5_update(&ctx, data, size);  /* hash the whole block in a single call */
    md5_final(&ctx, digest);
}
"Fast" version with memcpy():
static void test2(const uint8_t *const data, const size_t size, uint8_t *const digest)
{
    const uint8_t *p;
    size_t n;
    uint64_t temp;
    md5_ctx ctx;
    const uint8_t *const end = data + size;
    md5_init(&ctx);
    for (p = data; p < end; p += n)
    {
        /* Clamp the last chunk so we never read past the end of the block. */
        n = (size_t)(end - p);
        if (n > sizeof(temp)) n = sizeof(temp);
        memcpy(&temp, p, n);  /* copy the chunk into a local variable */
        md5_update(&ctx, (const uint8_t*)&temp, n);
    }
    md5_final(&ctx, digest);
}
Explicit prefetching, not faster at all:
static void test3(const uint8_t *const data, const size_t size, uint8_t *const digest)
{
    const uint8_t *p;
    size_t n;
    md5_ctx ctx;
    const uint8_t *const end = data + size;
    md5_init(&ctx);
    for (p = data; p < end; p += n)
    {
        /* Clamp the last chunk so we never read past the end of the block. */
        n = (size_t)(end - p);
        if (n > sizeof(uint64_t)) n = sizeof(uint64_t);
        _mm_prefetch((const char*)p, _MM_HINT_T0);  /* prefetch the chunk that is hashed next */
        md5_update(&ctx, p, n);
    }
    md5_final(&ctx, digest);
}
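For anyone who wants to reproduce the comparison without downloading the project, a rough harness along the following lines should do. This is only a sketch: `digest_fn`, `time_best_of`, `standin_single` and `standin_chunked` are hypothetical names, and the two stand-in functions use a 64-bit FNV-1a hash instead of MD5 so that the block compiles on its own. They only mirror the structure of the one-shot and memcpy() variants above and are not expected to show the same timing behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Signature shared by the test functions in this post. */
typedef void (*digest_fn)(const uint8_t *data, size_t size, uint8_t *digest);

/* Stand-in for the one-shot variant: 64-bit FNV-1a (NOT MD5).
 * Writes 8 bytes to digest. */
static void standin_single(const uint8_t *data, size_t size, uint8_t *digest)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < size; ++i) { h ^= data[i]; h *= 0x100000001b3ULL; }
    memcpy(digest, &h, sizeof h);
}

/* Stand-in for the chunked variant: same hash, but every chunk is first
 * copied into an 8-byte local buffer with memcpy(). */
static void standin_chunked(const uint8_t *data, size_t size, uint8_t *digest)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    uint8_t temp[8];
    while (size > 0) {
        const size_t n = (size < sizeof temp) ? size : sizeof temp;
        memcpy(temp, data, n);
        for (size_t i = 0; i < n; ++i) { h ^= temp[i]; h *= 0x100000001b3ULL; }
        data += n;
        size -= n;
    }
    memcpy(digest, &h, sizeof h);
}

/* Runs fn `runs` times and returns the best time in seconds.
 * clock() is used for portability; its resolution is coarse, and whether
 * it reports wall-clock or CPU time depends on the platform. */
static double time_best_of(digest_fn fn, const uint8_t *data, size_t size,
                           uint8_t *digest, int runs)
{
    assert(fn != NULL && runs > 0);
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        const clock_t t0 = clock();
        fn(data, size, digest);
        const clock_t t1 = clock();
        const double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best) best = s;
    }
    return best;
}
```

With the real project, test(), test2() and test3() could be passed to `time_best_of()` directly, since they have the same signature. It is also worth comparing the resulting digests with memcmp() to confirm that all variants hash exactly the same bytes, in particular when the block size is not a multiple of the chunk size.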
The full example code is available here:
https://www.mediafire.com/file/4kc5oydui3kygtd/memcpy-test-v2.zip/file
I have posted this under "Windows Programming" because this is a Visual Studio 2019 project, currently running on Windows 11. But I have also tested it on Windows 8.1, with similar results.