c - Interleave two 64-bit NEON vectors?
I'm working on a port of SSE2 code to NEON. The code performs the following:

    int64x2_t a, b, c, d;
    ... = interleave_high64(b, interleave_low64(c, d));

and I'd like to use the following in place of _mm_unpackhi_epi64 and _mm_unpacklo_epi64:

    static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b)
    {
        const int64x2x2_t result = vzip_s64(vget_low_s64(a), vget_low_s64(b));
        return vcombine_s64(result.val[0], result.val[1]);
    }

    static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b)
    {
        const int64x2x2_t result = vzip_s64(vget_high_s64(a), vget_high_s64(b));
        return vcombine_s64(result.val[0], result.val[1]);
    }

My first, immediate question is: why is vzip_s64 missing (though vzip_s32 and vzip_s16 are available)? Or, what should I use instead?
I'm guessing there's a bigger pattern at hand, and I might be able to use a vstr.2 interleaved store. So my second question is: what should I be doing instead of these 3 or 4 NEON intrinsics?
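
For reference, here is a minimal sketch of one way the two helpers might be written without vzip_s64. Each half of an int64x2_t holds a single 64-bit lane, so zipping two halves would not reorder anything (presumably why no 64-bit vzip exists), and vcombine_s64 on the two halves already gives the _mm_unpacklo_epi64 / _mm_unpackhi_epi64 lane order. The helper names are the ones from the question. The scalar stand-in types in the #else branch are an assumption added only so the sketch also compiles on non-ARM machines; they are not part of arm_neon.h.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: scalar stand-ins so the sketch builds off ARM.
   On a real NEON target the branch above is used instead. */
typedef struct { int64_t v[1]; } int64x1_t;
typedef struct { int64_t v[2]; } int64x2_t;
static inline int64x1_t vget_low_s64(int64x2_t a)  { int64x1_t r = {{a.v[0]}}; return r; }
static inline int64x1_t vget_high_s64(int64x2_t a) { int64x1_t r = {{a.v[1]}}; return r; }
static inline int64x2_t vcombine_s64(int64x1_t lo, int64x1_t hi)
{
    int64x2_t r = {{lo.v[0], hi.v[0]}};
    return r;
}
static inline int64x2_t vld1q_s64(const int64_t *p) { int64x2_t r = {{p[0], p[1]}}; return r; }
static inline void vst1q_s64(int64_t *p, int64x2_t a) { p[0] = a.v[0]; p[1] = a.v[1]; }
#endif

/* Same lane order as _mm_unpacklo_epi64(a, b): { a[0], b[0] } */
static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_low_s64(a), vget_low_s64(b));
}

/* Same lane order as _mm_unpackhi_epi64(a, b): { a[1], b[1] } */
static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_high_s64(a), vget_high_s64(b));
}
```

With a = {1, 2} and b = {3, 4}, interleave_low64(a, b) yields {1, 3} and interleave_high64(a, b) yields {2, 4}. Whether the compiler folds the vget/vcombine pairs into a single register move depends on the toolchain, so it is worth checking the generated assembly.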