c - Interleave two 64-bit NEON vectors?

- September 15, 2014

I'm working on a port of SSE2 code to NEON. The SSE code performs the following:

    int64x2_t a, b, c, d;
    ... = interleave_high64(b, interleave_low64(c, d));

and I use the following in place of _mm_unpackhi_epi64 and _mm_unpacklo_epi64:

    static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b) {
        const int64x2x2_t result = vzip_s64(vget_low_s64(a), vget_low_s64(b));
        return vcombine_s64(result.val[0], result.val[1]);
    }

    static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b) {
        const int64x2x2_t result = vzip_s64(vget_high_s64(a), vget_high_s64(b));
        return vcombine_s64(result.val[0], result.val[1]);
    }

My first, immediate question is: why is vzip_s64 missing (though vzip_s32 and vzip_s16 are available)? Or, what should I use instead?

I'm guessing there's a bigger pattern at hand, and I might be able to use a vstr.2 interleaved store. So my second question is: what should I be doing instead of 3 or 4 NEON intrinsics?
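
One possible direction, offered as a sketch rather than a definitive answer: a 64-bit D register holds only a single 64-bit lane, so there is nothing for a vzip_s64 to interleave, which is probably why it does not exist. Under that reading, the 64-bit "unpack" is just a choice of which half of each input to keep, and vcombine_s64 on the two selected halves should be enough. The scalar helpers unpacklo64_ref and unpackhi64_ref, and the function check_interleave, are hypothetical names introduced here only to pin down the lane order; the NEON part is compiled only when the compiler targets NEON.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for _mm_unpacklo_epi64(a, b): result is { a[0], b[0] }. */
static void unpacklo64_ref(const int64_t a[2], const int64_t b[2], int64_t out[2]) {
    out[0] = a[0];
    out[1] = b[0];
}

/* Scalar reference for _mm_unpackhi_epi64(a, b): result is { a[1], b[1] }. */
static void unpackhi64_ref(const int64_t a[2], const int64_t b[2], int64_t out[2]) {
    out[0] = a[1];
    out[1] = b[1];
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Keep the low 64-bit half of each input: { a[0], b[0] }. */
static inline int64x2_t interleave_low64(int64x2_t a, int64x2_t b) {
    return vcombine_s64(vget_low_s64(a), vget_low_s64(b));
}

/* Keep the high 64-bit half of each input: { a[1], b[1] }. */
static inline int64x2_t interleave_high64(int64x2_t a, int64x2_t b) {
    return vcombine_s64(vget_high_s64(a), vget_high_s64(b));
}
#endif

/* Checks the composite expression from the question against the scalar model. */
void check_interleave(void) {
    const int64_t b[2] = {10, 11}, c[2] = {20, 21}, d[2] = {30, 31};
    int64_t lo[2], hi[2];

    unpacklo64_ref(c, d, lo);  /* low halves of c and d */
    unpackhi64_ref(b, lo, hi); /* high halves of b and lo */
    assert(lo[0] == 20 && lo[1] == 30);
    assert(hi[0] == 11 && hi[1] == 30);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Same computation with the NEON helpers; only built on NEON targets. */
    int64x2_t r = interleave_high64(vld1q_s64(b),
                                    interleave_low64(vld1q_s64(c), vld1q_s64(d)));
    int64_t out[2];
    vst1q_s64(out, r);
    assert(out[0] == hi[0] && out[1] == hi[1]);
#endif
}
```

On AArch64 the vzip1q_s64 and vzip2q_s64 intrinsics should express the same two operations directly. Whether the compiler folds the get/combine pairs into a single register move depends on the toolchain, so it is worth inspecting the generated assembly before restructuring the surrounding code.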
