I think the SIMD shuffle functions do not perform a real shuffle in the int32_t case: the left and right halves (the two 128-bit lanes) are shuffled separately.
I want a real shuffle function, as follows.
Assume we have a __m256i and want to shuffle its 8 int32_t elements:
__m256i to_shuffle = _mm256_setr_epi32(17, 18, 20, 21, 25, 26, 29, 31);
const int imm8 = 0b10101100;
__m256i shuffled = _mm256_shuffle(to_shuffle, imm8);  // hypothetical intrinsic: the function I am looking for
I want shuffled = {17, 20, 25, 26, -, -, -, -}, where - stands for a value I do not care about. In other words, every int whose position has a set bit in imm8 should be packed, in order, at the front of shuffled. (In this case 17, 20, 25 and 26 sit at the positions where imm8, read from left to right, has a 1 bit.)
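To pin down the behaviour I am asking for, here is a minimal scalar sketch. The name `compress_epi32_ref` is made up for illustration, and the bit order is an assumption taken from the example above: the leftmost (most significant) bit of the mask selects element 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the desired "pack the selected elements" shuffle.
 * Hypothetical helper, not an Intel intrinsic.
 * Assumption: bit 7 of mask (the leftmost bit) selects src[0],
 * bit 0 (the rightmost bit) selects src[7].
 * Selected elements are written to the front of dst in their original order;
 * the remaining slots of dst are left untouched.
 * Returns the number of elements written. */
static size_t compress_epi32_ref(const int32_t src[8], uint8_t mask, int32_t dst[8])
{
    size_t n = 0;
    for (int i = 0; i < 8; ++i) {
        if (mask & (0x80u >> i)) {
            dst[n++] = src[i];
        }
    }
    return n;
}
```

With src = {17, 18, 20, 21, 25, 26, 29, 31} and mask = 0xAC (that is, 0b10101100), this writes 17, 20, 25, 26 to the first four slots of dst and returns 4. I am looking for a vectorized equivalent of this loop.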
Does Intel offer such a function? If not, how could it be implemented efficiently?