assembly - New AVX-instructions syntax -


I had C code written with Intel intrinsics. After I compiled it first with the AVX flag and then with the SSSE3 flag, I got two quite different assembly outputs. E.g.:

AVX:

vpunpckhbw  %xmm0, %xmm1, %xmm2  

SSSE3:

movdqa %xmm0, %xmm2
punpckhbw %xmm1, %xmm2
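The C source is not shown in the question, so the following is only a minimal sketch of the kind of code that could produce either sequence above, assuming the intrinsic involved was _mm_unpackhi_epi8. The function names unpack_high_bytes and unpack_high_bytes_u8 are hypothetical, and the exact registers (and whether a movdqa appears at all) depend on the compiler and the surrounding code.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_unpackhi_epi8, unaligned load/store */
#include <stdint.h>

/* Hypothetical helper (not from the original code): interleave the high
 * 8 bytes of a and b, giving a[8], b[8], a[9], b[9], ..., a[15], b[15].
 *
 * Inspect the generated assembly with, for example:
 *   gcc -O2 -mssse3 -S file.c   -> expect punpckhbw (two-operand, destructive)
 *   gcc -O2 -mavx   -S file.c   -> expect vpunpckhbw (three-operand, VEX)
 * The SSE build may need an extra movdqa when an input must stay live. */
__m128i unpack_high_bytes(__m128i a, __m128i b)
{
    return _mm_unpackhi_epi8(a, b);
}

/* Convenience wrapper so the result can be checked on plain byte arrays. */
void unpack_high_bytes_u8(const uint8_t a[16], const uint8_t b[16],
                          uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, unpack_high_bytes(va, vb));
}
```

The same source is used for both builds; only the -m flag changes which encoding the compiler picks.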

It's clear that vpunpckhbw is just punpckhbw using the AVX three-operand syntax. But are the latency and throughput of the first instruction equivalent to the latency and throughput of the last two combined? Or does the answer depend on the architecture I'm using? It's an Intel Core i5-6500, by the way.

I tried to search for an answer in Agner Fog's instruction tables but couldn't find one. The Intel specifications didn't help either (however, it's likely that I missed the one I needed).

Is it better to use the new AVX syntax if possible?

Is it better to use the new AVX syntax if possible?

I think the first question to ask is whether a folded instruction is better than a non-folded instruction pair. Folding takes a pair of read and modify instructions like this

vmovdqa %xmm0, %xmm2
vpunpckhbw %xmm2, %xmm1, %xmm1

and "folds" them into one combined instruction

vpunpckhbw  %xmm0, %xmm1, %xmm2 

Since Ivy Bridge, a register-to-register move instruction can have zero latency and can use zero execution ports. However, the unfolded instruction pair still counts as two instructions in the front-end and therefore can affect overall throughput. The folded instruction counts as only one instruction in the front-end, which lowers pressure on the front-end without any side effects. This could increase overall throughput.

However, for memory-to-register moves, folding may have a side effect (there is some debate about this) even if it lowers pressure on the front-end. The reason is that the out-of-order engine, from the front-end's point of view, sees only the folded instruction (assuming this answer is correct). If for some reason it would be more optimal to reorder the memory read operation independently of the other operations in the folded instruction (since the read does require execution ports and has latency), the out-of-order engine won't be able to take advantage of this. I observed this for the first time here.

For your particular operation the AVX syntax is better, since it folds the register-to-register move. However, if you had a memory-to-register move, the folded AVX instruction could perform worse than the unfolded SSE instruction pair in some cases.


Note that, in general, it should still be better to use VEX-encoded instructions. I think most compilers, if not all, assume folding is always better, so you have no way to control the folding except with assembly (not intrinsics) or, in some cases, by telling the compiler not to compile with AVX.
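To illustrate the point about intrinsics, here is a minimal sketch in which the load is written as a separate step in the source. The function names unpack_high_with_load and unpack_high_with_load_u8 are hypothetical. Nothing in the C code decides whether the load stays a separate instruction: when building with AVX, the compiler is free to fold it into a memory operand of vpunpckhbw, while with legacy SSE encoding an unaligned load like this one generally has to stay separate, because SSE memory operands must be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_unpackhi_epi8 */
#include <stdint.h>

/* Hypothetical helper: the load and the unpack are two separate steps in
 * the source, but the compiler chooses whether they end up as one
 * instruction with a memory operand or as a load followed by a
 * register-register unpack. Intrinsics give no direct control over this. */
__m128i unpack_high_with_load(__m128i a, const uint8_t *p)
{
    __m128i b = _mm_loadu_si128((const __m128i *)p);  /* unaligned load */
    return _mm_unpackhi_epi8(a, b);
}

/* Convenience wrapper so the result can be checked on plain byte arrays. */
void unpack_high_with_load_u8(const uint8_t a[16], const uint8_t *p,
                              uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)out, unpack_high_with_load(va, p));
}
```

Comparing the output of gcc -O2 -mavx -S with that of gcc -O2 -mssse3 -S for this function is one way to see which choice the compiler made.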

