c - Interleave two 64-bit NEON vectors?
I'm working on a port of SSE2 code to NEON. The SSE code performs the following:

    int64x2_t a, b, c, d;
    ... = interleave_high64(b, interleave_low64(c, d));
and the port uses the following in place of _mm_unpackhi_epi64 and _mm_unpacklo_epi64:

    static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b)
    {
        const int64x2x2_t result = vzip_s64(vget_low_s64(a), vget_low_s64(b));
        return vcombine_s64(result.val[0], result.val[1]);
    }

    static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b)
    {
        const int64x2x2_t result = vzip_s64(vget_high_s64(a), vget_high_s64(b));
        return vcombine_s64(result.val[0], result.val[1]);
    }

My first, immediate question is: why is vzip_s64 missing (though vzip_s32 and vzip_s16 are available)? Or maybe, what should I use in its stead?
I'm guessing there's a bigger pattern at hand, and I might be able to use a vstr.2 interleaved store. So my second question is: what should I be doing instead of 3 or 4 NEON intrinsics?
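
For comparison, here is a minimal sketch of one possible alternative, not a definitive answer. It assumes the goal is to match _mm_unpacklo_epi64(a, b) = { a[0], b[0] } and _mm_unpackhi_epi64(a, b) = { a[1], b[1] }. A 64-bit D register holds only one 64-bit lane, so picking the halves with vget_low_s64 / vget_high_s64 and joining them with vcombine_s64 appears to be enough, with no zip step. The ref_i64x2 type, the ref_* functions and interleave_self_check are hypothetical helpers added only so the behaviour can be checked on machines without NEON.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1
#endif

/* Hypothetical scalar model of a 128-bit vector with two 64-bit lanes. */
typedef struct { int64_t lane[2]; } ref_i64x2;

/* Models _mm_unpacklo_epi64(a, b): { a[0], b[0] }. */
static inline ref_i64x2 ref_interleave_low64(ref_i64x2 a, ref_i64x2 b)
{
    ref_i64x2 r = { { a.lane[0], b.lane[0] } };
    return r;
}

/* Models _mm_unpackhi_epi64(a, b): { a[1], b[1] }. */
static inline ref_i64x2 ref_interleave_high64(ref_i64x2 a, ref_i64x2 b)
{
    ref_i64x2 r = { { a.lane[1], b.lane[1] } };
    return r;
}

#ifdef HAVE_NEON_SKETCH
/* Low half of a becomes lane 0, low half of b becomes lane 1. */
static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_low_s64(a), vget_low_s64(b));
}

/* High half of a becomes lane 0, high half of b becomes lane 1. */
static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_high_s64(a), vget_high_s64(b));
}
#endif

/* Checks the scalar model, and the NEON versions when they are compiled in. */
static void interleave_self_check(void)
{
    ref_i64x2 a = { { 10, 11 } }, b = { { 20, 21 } };
    ref_i64x2 lo = ref_interleave_low64(a, b);
    ref_i64x2 hi = ref_interleave_high64(a, b);
    assert(lo.lane[0] == 10 && lo.lane[1] == 20);
    assert(hi.lane[0] == 11 && hi.lane[1] == 21);

#ifdef HAVE_NEON_SKETCH
    const int64_t av[2] = { 10, 11 }, bv[2] = { 20, 21 };
    int64_t out[2];
    int64x2_t na = vld1q_s64(av), nb = vld1q_s64(bv);

    vst1q_s64(out, interleave_low64(na, nb));
    assert(out[0] == lo.lane[0] && out[1] == lo.lane[1]);

    vst1q_s64(out, interleave_high64(na, nb));
    assert(out[0] == hi.lane[0] && out[1] == hi.lane[1]);
#endif
}
```

With these helpers the expression above stays interleave_high64(b, interleave_low64(c, d)). On AArch64 targets, vzip1q_s64 and vzip2q_s64 should give the same two results directly, though that is worth checking against the toolchain in use.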