Google Compiler Research C. Courbet, B. De Backer, O...

14
Confidential + Proprietary Confidential + Proprietary Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet, B. De Backer, O. Sykora Google Compiler Research

Transcript of Google Compiler Research C. Courbet, B. De Backer, O...

Page 1: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + ProprietaryConfidential + Proprietary

Measuring instruction latencies with llvmGuillaume ChateletC. Courbet, B. De Backer, O. Sykora

Google Compiler Research

Page 2: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Why?

● Scheduling needs latencies and μOp decomposition

○ This talk is about latency measurement only

● Vendors release some information

○ May be incomplete / not be in a machine readable format

● Updating LLVM td files

○ is tedious / requires careful guesswork and analysis.

● Consequences

○ scheduling information is incomplete for most X86 models

2

Page 3: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

How it works

∀ processor, ∀ instruction:

start_measure

.rept 10000

add rax, rax

.endr

end_measure

3

Page 4: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

How it works - actually subtler than this...

∀ processor, ∀ instruction:

start_measure

.rept 10000

andn eax, ebx, edx # processor can execute these in parallel

.endr

end_measure

● We need a way to make the execution sequential

4

Page 5: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

MCInst in LLVM

Explicit inputs(e.g. GR16)

Implicit input(e.g. EFLAGS)

Explicit output

Implicit output

5

Page 6: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Sequential execution: Create Dependency

Current instruction must use an output of previous instruction6

Page 7: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Implicit self cycle

Possible cycle: Possible instance:AAA

7

Page 8: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Implicit self cycle - through register aliasing

Possible cycle: Possible instance:AAA

8

Page 9: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Possible instance:AND32ri EAX, EAX, 1

Possible cycle:

Possible explicit self cycle

9

Page 10: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Possible instance:MMX_PMOVMSKBrr R10D, MM1MMX_MOVD64rr MM1, R10D

Possible cycle through another instruction

Possible cycle:

10

Page 11: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Possible instance:VCMPPSZ256rri K5, YMM31, YMM31, 1VFMSUBADD213PDZrk ZMM31, ZMM25, K5, ZMM29, ZMM9

Keep in mind:This process is fully automated

Possible cycle through another instruction

Possible cycle:

11

Page 12: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Results

> llvm-exegesis -opcode-name IMUL16rri8 -benchmark-mode latency---asm_template: name: latency IMUL16rri8cpu_name: sandybridgellvm_triple: x86_64-grtev4-linux-gnunum_repetitions: 10000measurements: - { key: latency, value: 4.0115, debug_string: '' }error: ''...

● Identified discrepancies between TD files and measurements

12

Page 13: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

What's next?

● Extend to memory operands

● Automate fixing of TD files

● Measure the effect of

○ immediate: ±0, 1, ~1, 28,16,32,64, ±∞, nan, denorm

○ register values: SUB EAX, EAX, EAX vs SUB EAX, EAX, EBX

● Make it work on other CPUs (ARM under way, Power?)

13

Page 14: Google Compiler Research C. Courbet, B. De Backer, O ...llvm.org/devmtg/2018-04/slides/Chatelet-Measuring...Measuring instruction latencies with llvm Guillaume Chatelet C. Courbet,

Confidential + Proprietary

Try It Out!

https://llvm.org/docs/CommandGuide/llvm-exegesis.html

14