Hello! In the code below, I'm accepting a buffer (only 1-byte aligned) and swapping bytes as if it were an array of 16-bit integers. (The full code has equivalent methods for the 32- and 64-bit types.)
```cpp
#include <cstddef>   // size_t
#include <cstdint>   // uint16_t, uintptr_t
#include <iostream>

#if defined(__GNUC__) || defined(__clang__)
#define BSWAP_INTRINSIC_2(x) __builtin_bswap16(x)
#elif defined(__linux__)
#include <byteswap.h>
#define BSWAP_INTRINSIC_2(x) bswap_16(x)
#elif defined(_MSC_VER)
#include <intrin.h>
#define BSWAP_INTRINSIC_2(x) _byteswap_ushort(x)
#else
#define BSWAP_INTRINSIC_2(x) (((x) << 8) | ((x) >> 8))
#endif

char* data; // some char buffer (filled in elsewhere)

// For debugging: a nonzero remainder means data is not 2-byte aligned.
int alignment = reinterpret_cast<uintptr_t>(data) % alignof(uint16_t);
std::cout << "aligned? " << alignment << "\n";

uint16_t* data16 = reinterpret_cast<uint16_t*>(data);
for (size_t i = 0; i < length_of_data16; i++) { // length_of_data16: element count, set elsewhere
    data16[i] = BSWAP_INTRINSIC_2(data16[i]);
}
```
Sometimes, the `data16` pointer is not 2-byte aligned. Nonetheless, all of the possible definitions of `BSWAP_INTRINSIC_2` work (i.e. the MSVC, g++, and clang intrinsics; byteswap.h's `bswap_16`; and the bit-shifting fallback). In the generated assembly I see the compilers emitting MOVDQU, so they clearly recognize that the pointer might not be aligned and handle it. I also don't see any instructions that are specific to 16-bit types.
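(For reproduction purposes, a deliberately misaligned pointer can be set up like this, a minimal sketch rather than my real data source; `kCount` and the offset of 1 are arbitrary placeholders:)

```cpp
#include <cstddef>
#include <vector>

// Build a pointer that is deliberately NOT 2-byte aligned:
// the vector's storage is suitably aligned, so data() + 1 is odd.
constexpr std::size_t kCount = 1000;   // placeholder element count
std::vector<char> raw(1 + 2 * kCount); // backing storage
char* data = raw.data() + 1;           // misaligned for uint16_t
```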
1. As I understand it, it's technically undefined behavior to cast a `char*` to a `uint16_t*` without verifying alignment. Is that true, or is it only undefined behavior once I actually use the result as a pointer to a 16-bit type after the cast?
2. Given that this code works in practice, how bad is it to actually use?
3. Is there a way to make this legal without using memcpy/memmove/etc. (the memory impact is too high for big buffers; see the sketch after this list) while still making use of the fast SSE instructions that are currently being emitted (e.g. PSHUFB)?
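For reference, this is the kind of copy-based workaround I mean in question 3 (a minimal sketch reusing the names from my snippet above; the extra aligned buffer is exactly the memory cost I can't afford for large inputs):

```cpp
#include <cstring>
#include <vector>

// Copy into properly aligned storage, swap there, copy back.
// Well-defined, but peak memory is roughly 2x the buffer size.
std::vector<uint16_t> tmp(length_of_data16);
std::memcpy(tmp.data(), data, length_of_data16 * sizeof(uint16_t));
for (size_t i = 0; i < length_of_data16; i++) {
    tmp[i] = BSWAP_INTRINSIC_2(tmp[i]);
}
std::memcpy(data, tmp.data(), length_of_data16 * sizeof(uint16_t));
```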
Thanks!