assembly - New AVX-instructions syntax

- August 15, 2013

I had C code written with Intel intrinsics. After compiling it first with the AVX flag and then with the SSSE3 flag, I got two quite different assembly outputs, e.g.:

AVX:

vpunpckhbw %xmm0, %xmm1, %xmm2

SSSE3:

movdqa %xmm0, %xmm2
punpckhbw %xmm1, %xmm2

It's clear that vpunpckhbw is just punpckhbw using the AVX three-operand syntax. But are the latency and throughput of the first instruction equivalent to the latency and throughput of the last two combined? Or does the answer depend on the architecture I'm using? It's an Intel Core i5-6500, by the way.

I tried to find the answer in Agner Fog's instruction tables but couldn't. The Intel specifications didn't help either (though it's possible I just missed the one I needed).

Is it better to use the new AVX syntax if possible?


I think the first question is asking whether folded instructions are better than a non-folded instruction pair. Folding takes a pair of read and modify instructions like this

vmovdqa %xmm0, %xmm2
vpunpckhbw %xmm2, %xmm1, %xmm1

and "folds" them into one combined instruction

vpunpckhbw  %xmm0, %xmm1, %xmm2

Since Ivy Bridge, a register-to-register move instruction can have zero latency and use zero execution ports (move elimination). However, the unfolded instruction pair still counts as two instructions in the front-end and can therefore affect overall throughput. The folded instruction counts as only one instruction in the front-end, which lowers pressure on the front-end without any side effects. This could increase overall throughput.

However, for memory-to-register moves, folding may have a side effect (there is some debate about this) even if it lowers pressure on the front-end. The reason is that, from the front-end's point of view, the out-of-order engine only sees the folded instruction (assuming this answer is correct). If for some reason it would be more optimal to reorder the memory read operation (since it does require execution ports and has latency) independently of the other operations in the folded instruction, the out-of-order engine won't be able to take advantage of this. I observed this for the first time here.

For this particular operation, the AVX syntax is better since it folds the register-to-register move. However, if you had a memory-to-register move, the folded AVX instruction could perform worse than the unfolded SSE instruction pair in some cases.

Note that, in general, it should still be better to use VEX-encoded instructions. I think most compilers, if not all, assume folding is better, and you have no way to control folding except with assembly (not intrinsics) or, in some cases, by telling the compiler not to compile for AVX.
