c - Interleave two 64-bit NEON vectors?


I'm working on a port of SSE2 code to NEON. The SSE code performs the following:

int64x2_t a, b, c, d;
... = interleave_high64(b, interleave_low64(c, d));

And I perform the following in place of _mm_unpackhi_epi64 and _mm_unpacklo_epi64:

static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b)
{
    const int64x2x2_t result = vzip_s64(vget_low_s64(a), vget_low_s64(b));
    return vcombine_s64(result.val[0], result.val[1]);
}

static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b)
{
    const int64x2x2_t result = vzip_s64(vget_high_s64(a), vget_high_s64(b));
    return vcombine_s64(result.val[0], result.val[1]);
}

My first, immediate question is: why is vzip_s64 missing (though vzip_s32 and vzip_s16 are available)? Or maybe, what should I use instead?

I'm guessing there's a bigger pattern at hand, and I might be able to use a vstr.2 interleaved store. My second question is: what should I be doing instead of 3 or 4 NEON intrinsics?
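
A minimal sketch of one possible replacement, assuming the goal is only to reproduce the lane layout of _mm_unpacklo_epi64 and _mm_unpackhi_epi64. A 64-bit D register holds a single 64-bit lane, so there is nothing to zip within it; combining the two halves with vcombine_s64 gives the same layout directly. The array-based wrappers (interleave_low64_array, interleave_high64_array) and the scalar fallback are hypothetical helpers added so the sketch also compiles on non-ARM hosts; they are not part of any NEON API.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Same lane layout as _mm_unpacklo_epi64(a, b): { a[0], b[0] }. */
static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_low_s64(a), vget_low_s64(b));
}

/* Same lane layout as _mm_unpackhi_epi64(a, b): { a[1], b[1] }. */
static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_high_s64(a), vget_high_s64(b));
}
#endif

/* Hypothetical wrapper: loads two 2-element arrays, interleaves the low
   lanes, and stores the result. Falls back to scalar code without NEON. */
static inline void interleave_low64_array(const int64_t a[2],
                                          const int64_t b[2],
                                          int64_t out[2])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_s64(out, interleave_low64(vld1q_s64(a), vld1q_s64(b)));
#else
    out[0] = a[0];  /* scalar reference for the same layout */
    out[1] = b[0];
#endif
}

/* Hypothetical wrapper: same as above, but for the high lanes. */
static inline void interleave_high64_array(const int64_t a[2],
                                           const int64_t b[2],
                                           int64_t out[2])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_s64(out, interleave_high64(vld1q_s64(a), vld1q_s64(b)));
#else
    out[0] = a[1];  /* scalar reference for the same layout */
    out[1] = b[1];
#endif
}
```

On AArch64, the vzip1q_s64 and vzip2q_s64 intrinsics operate on full 128-bit vectors and may be an alternative to the vcombine form. Whether either version is faster on a given core is something to measure rather than assume.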

