Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20...

31
1 Διασωλήνωση - Pipelining Pedro Trancoso Appendix A (και διαφάνειες από Prof. David Culler, Berkeley) Βιομηχανία Αυτοκινήτων

Transcript of Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20...

Page 1: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

1

Διασωλήνωση - Pipelining

Pedro Trancoso Appendix A

(και διαφάνειες από Prof. David Culler, Berkeley)

Βιοµηχανία Αυτοκινήτων

Page 2: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

2

Βιοµηχανία Αυτοκινήτων

Βιοµηχανία Αυτοκινήτων

Page 3: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

3

Βιοµηχανία Αυτοκινήτων

Βιοµηχανία Αυτοκινήτων

1

Page 4: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

4

Βιοµηχανία Αυτοκινήτων

2 1

Βιοµηχανία Αυτοκινήτων

3 2 1

Page 5: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

5

Βιοµηχανία Αυτοκινήτων

4 3 2 1

Βιοµηχανία Αυτοκινήτων

5 4 3 2

Page 6: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

6

Ας πλύνουµε τα ρούχα! Σειριακή Μεθόδος

n  Sequential laundry takes 6 hours for 4 loads n  If they learned pipelining, how long would laundry take?

A

B

C

D

30 40 20 30 40 20 30 40 20 30 40 20

6 PM 7 8 9 10 11 Midnight

T a s k O r d e r

Time

Χρησιµοποιώντας Διασωλήνωση...

n  Pipelined laundry takes 3.5 hours for 4 loads

A

B

C

D

6 PM 7 8 9 10 11 Midnight

T a s k O r d e r

Time

30 40 40 40 40 20

Page 7: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

7

Μάθαµε ότι... n  Pipelining doesn’t help latency (χρόνος αναµονής) of single task, it helps throughput (ρυθµοαπόδοση) of entire workload

n  Pipeline rate limited by slowest pipeline stage

n  Multiple tasks operating simultaneously

n  Potential speedup = Number pipe stages

n  Unbalanced lengths of pipe stages reduces speedup

n  Time to “fill” pipeline and time to “drain” it reduces speedup

A

B

C

D

6 PM 7 8 9

T a s k O r d e r

Time

30 40 40 40 40 20

Διασωλήνωση Εντολών n  Execute billions of instructions, so throughput is

what matters n  What is desirable in the ISA (ΑΣΕ) for pipelining?

n  Variable length instructions vs. all instructions same length?

n  Memory operands part of any operation vs. memory operands only in loads or stores?

n  Register operand (τελεστέος) many places in instruction format vs. registers located in same place?

Page 8: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

8

Κύκλος Εκτέλεσης Instruction

Fetch

Instruction Decode

Operand Fetch

Execute

Result Store

Next Instruction

Obtain instruction from program storage

Determine required actions and instruction size

Locate and obtain operand data

Compute result value or status

Deposit results in storage for later use

Determine successor instruction

Στάδια Διασωλήνωσης του DLX (1) 1.  Instruction Fetch (IF) 2.  Instruction Decode / Register Fetch (ID) 3.  Execution / Effective Address (EX) 4.  Memory Access / Branch Completion

(MEM) 5.  Write-back (WB) (go to 1!)

Page 9: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

9

Στάδια Διασωλήνωσης του DLX (2) 1.  Instruction Fetch (IF)

IR ! Mem[PC] NPC ! PC+4

2.  Instruction Decode / Register Fetch (ID) A ! Regs[IR6..10] B ! Regs[IR11..15] Imm ! ((IR16)^16##IR16..31)

3.  Execution / Effective Address (EX) Mem ref: ALU output ! A+Imm Reg-reg (ALUop): ALUoutput ! A op B Reg-imm (ALU op): ALU output ! A op Imm Branch: ALU output ! NPC+Imm

cond ! (A op 0)

Στάδια Διασωλήνωσης του DLX (3) 4.  Memory Access / Branch Completion (MEM)

Mem access: LMD ! Mem[ALU output] Mem[ALU output] ! B

Branch: if (cond) PC ! ALU output else PC ! NPC

5.  Write-back (WB) Reg-reg ALU instr: Regs[IR16..20] ! ALU output Reg-imm ALU instr: Regs[IR11..15] ! ALU output Load instr: Regs[IR11..15] ! LMD

(go to 1!)

Page 10: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

10

Διασωλήνωση του DLX Cycle number 1

Instruction I IF

Instruction I+1

Instruction I+2

Instruction I+3

Instruction I+4

DLX Pipeline Cycle number 1 2

Instruction I IF ID

Instruction I+1 IF

Instruction I+2

Instruction I+3

Instruction I+4

Page 11: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

11

DLX Pipeline Cycle number 1 2 3

Instruction I IF ID EX

Instruction I+1 IF ID

Instruction I+2 IF

Instruction I+3

Instruction I+4

DLX Pipeline Cycle number 1 2 3 4

Instruction I IF ID EX MEM

Instruction I+1 IF ID EX

Instruction I+2 IF ID

Instruction I+3 IF

Instruction I+4

Page 12: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

12

DLX Pipeline Cycle number 1 2 3 4 5

Instruction I IF ID EX MEM WB

Instruction I+1 IF ID EX MEM

Instruction I+2 IF ID EX

Instruction I+3 IF ID

Instruction I+4 IF

DLX Pipeline Cycle number 1 2 3 4 5 6

Instruction I IF ID EX MEM WB

Instruction I+1 IF ID EX MEM WB

Instruction I+2 IF ID EX MEM

Instruction I+3 IF ID EX

Instruction I+4 IF ID

Page 13: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

13

DLX Pipeline Cycle number 1 2 3 4 5 6 7 8 9

Instruction I IF ID EX MEM WB

Instruction I+1 IF ID EX MEM WB

Instruction I+2 IF ID EX MEM WB

Instruction I+3 IF ID EX MEM WB

Instruction I+4 IF ID EX MEM WB

Παράδειγµα: MIPS

Op 31 26 0 15 16 20 21 25

Rs1 Rd immediate

Op 31 26 0 25

Op 31 26 0 15 16 20 21 25

Rs1 Rs2

target

Rd Opx

Register-Register 5 6 10 11

Register-Immediate

Op 31 26 0 15 16 20 21 25

Rs1 Rs2/Opx immediate

Branch

Jump / Call

Page 14: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

14

5 Βήµατα της Διόδου Δεδοµένων (Datapath) του MIPS

Memory Access

Write Back

Instruction Fetch

Instr. Decode Reg. Fetch

Execute Addr. Calc

L M D

ALU

MU

X

Mem

ory

Reg File

MU

X MU

X

Data

Mem

ory

MU

X

Sign Extend

4

Adder

Zero?

Next SEQ PC

Address

Next PC

WB Data

Inst

RD

RS1 RS2

Imm

Memory Access

Write Back

Instruction Fetch

Instr. Decode Reg. Fetch

Execute Addr. Calc

ALU

Mem

ory

Reg File

MU

X MU

X

M

emory

MU

X

Sign Extend

Zero?

IF/ID

ID/EX

MEM

/WB

EX/M

EM

4

Adder

Next SEQ PC Next SEQ PC

RD RD RD WB

Dat

a

Next PC

Address

RS1 RS2

Imm

MU

X

Datapath

Control Path

5 Βήµατα της Διόδου Δεδοµένων (Datapath) του MIPS

Page 15: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

15

Memory Access

Write Back

Instruction Fetch

Instr. Decode Reg. Fetch

Execute Addr. Calc

ALU

Mem

ory

Reg File

MU

X MU

X

Mem

ory

MU

X

Sign Extend

Zero?

IF/ID

ID/EX

MEM

/WB

EX/M

EM

4

Adder

Next SEQ PC Next SEQ PC

RD RD RD WB

Dat

a

Next PC

Address

RS1 RS2

Imm

MU

X

Datapath

Control Path

Inst

1

Inst

1

Inst

2

Inst

1

Inst

2

Inst

3

5 Βήµατα της Διόδου Δεδοµένων (Datapath) του MIPS

Εκτέλεση µε Διασωλήνωση

I n s t r. O r d e r

Time (clock cycles)

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5

Page 16: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

16

Όρια της Διασωλήνωσης

n  Hazards (Κίνδυνοι): circumstances that would cause incorrect execution if next instruction were launched n  Structural hazards (κίνδυνος δοµής): Attempting to

use the same hardware to do two different things at the same time

n  Data hazards (κίνδυνος δεδοµένων): Instruction depends on result of prior instruction still in the pipeline

n  Control hazards (κίνδυνος ελέγχου): Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).

Παράδειγµα: µια πόρτα µνήµης

I n s t r. O r d e r

Time (clock cycles)

Load

Instr 1

Instr 2

Instr 3

Instr 4

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5

DMem

Structural Hazard

Page 17: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

17

Λύσεις για κίνδυνους δοµής n  Defn: attempt to use same hardware for

two different things at the same time n  Solution 1: Wait

⇒  must detect the hazard ⇒  must have mechanism to stall

n  Solution 2: Throw more hardware at the problem

Εύρεση και λύση ενός κινδύνου δοµής

I n s t r. O r d e r

Time (clock cycles)

Load

Instr 1

Instr 2

Stall

Instr 3

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5

Reg ALU

DMem Ifetch Reg

Bubble Bubble Bubble Bubble Bubble

Page 18: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

18

Λύση των Κίνδυνων Δοµών στο σχεδιασµό

ALU

Instr Cache

Reg File

MU

X MU

X

Data

Cache

MU

X

Sign Extend

Zero?

IF/ID

ID/EX

MEM

/WB

EX/M

EM

4

Adder

Next SEQ PC Next SEQ PC

RD RD RD WB

Dat

a

Next PC

Address

RS1 RS2

Imm

MU

X

Datapath

Control Path

Σηµασία του ΑΣΕ στην λύση των Κινδύνων Δοµής n  Simple to determine the sequence of

resources used by an instruction n  opcode tells it all

n  Uniformity in the resource usage n  Compare MIPS to IA32? n  MIPS approach => all instructions flow

through same 5-stage pipeling

Page 19: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

19

I n s t r. O r d e r

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9 xor r10,r1,r11

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Κίνδυνοι Δεδοµένων Time (clock cycles)

IF ID/RF EX MEM WB

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

n  Read After Write (RAW) (Διάβασµα µετά από γράψιµο) InstrJ tries to read operand before InstrI writes it

Caused by a “Data Dependence” (εξάρτηση δεδοµένων). This hazard results from an actual need for communication.

Τρεις Είδους Κίνδυνοι Δεδοµένων

I: add r1,r2,r3 J: sub r4,r1,r3

Page 20: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

20

n  Write After Read (WAR) (Γράψιµο µετά από διάβασµα) InstrJ writes operand before InstrI reads it

n  Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

n  Can it happen in the MIPS 5 stage pipeline? n  Can’t happen in MIPS 5 stage pipeline because:

n  All instructions take 5 stages, and n  Reads are always in stage 2, and n  Writes are always in stage 5

I: sub r4,r1,r3 J: add r1,r2,r3 K: mul r6,r1,r7

Τρεις Είδους Κίνδυνοι Δεδοµένων

n  Write After Write (WAW) (Γράψιµο µετά από γράψιµο) InstrJ writes operand before InstrI writes it.

n  Called an “output dependence” (εξάρτηση εξόδου) by compiler writers. This also results from the reuse of name “r1”.

n  Can it happen in the MIPS 5 stage pipeline? n  Can’t happen in MIPS 5 stage pipeline because:

n  All instructions take 5 stages, and n  Writes are always in stage 5

n  Will see WAR and WAW in later more complicated pipes

I: sub r1,r4,r3 J: add r1,r2,r3 K: mul r6,r1,r7

Τρεις Είδους Κίνδυνοι Δεδοµένων

Page 21: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

21

Time (clock cycles)

Μεταβίβαση (Forwarding) για αποφυγή Κινδύνου Δεδοµένων

I n s t

r.

O r d e r

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Αλλαγές στο υλικό για Forwarding

MEM

/WR

ID/EX

EX/M

EM

Data Memory

ALU

mux

mux

Registers

NextPC

Immediate

mux

Page 22: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

22

Time (clock cycles)

I n s t r. O r d e r

lw r1, 0(r2)

sub r4,r1,r6

and r6,r1,r7

or r8,r1,r9

Κίνδυνοι Δεδοµένων και µε Forwarding

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Λήξη του κινδύνου της εντολής φόρτωσης

n  Adding hardware? ... not n  Detection? n  Compilation techniques?

n  What is the cost of load delays?

Page 23: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

23

Time (clock cycles)

or r8,r1,r9

I n s t r. O r d e r

lw r1, 0(r2)

sub r4,r1,r6

and r6,r1,r7

Reg ALU

DMem Ifetch Reg

Reg Ifetch ALU

DMem Reg Bubble

Ifetch ALU

DMem Reg Bubble Reg

Ifetch ALU

DMem Bubble Reg

How is this different from the instruction issue stall?

Λήξη του κινδύνου της εντολής φόρτωσης

Χρονοδροµολόγηση Λογισµικού για αποφυγή κίνδυνου δεδοµένων Software Scheduling to Avoid Load Hazards

Fast code: LW Rb,b LW Rc,c LW Re,e ADD Ra,Rb,Rc LW Rf,f SW a,Ra SUB Rd,Re,Rf SW d,Rd

Try producing fast code for a = b + c; d = e – f;

assuming a, b, c, d ,e, and f in memory. Slow code:

LW Rb,b LW Rc,c ADD Ra,Rb,Rc SW a,Ra LW Re,e LW Rf,f SUB Rd,Re,Rf SW d,Rd

STALL

STALL

Page 24: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

24

Σχέση µε τη ΕΣΑ n  What is exposed about this organizational

hazard in the instruction set? n  k cycle delay?

n  bad, CPI is not part of ISA n  k instruction slot delay

n  load should not be followed by use of the value in the next k instructions

n  Nothing, but code can reduce run-time delays n  MIPS did the transformation in the assembler

Κίνδυνος Ελέγχου από εντολές διακλάδωσης (Branches) => Three Stage Stall

10: beq r1,r3,36

14: and r2,r3,r5

18: or r6,r1,r7

22: add r8,r1,r9

36: xor r10,r1,r11

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Page 25: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

25

Παράδειγµα: Branch Stall Impact

n  If 30% branch, Stall 3 cycles significant (CPI=?) n  Two part solution:

n  Determine branch taken or not sooner, AND n  Compute taken branch address earlier

n  MIPS branch tests if register = 0 or ≠ 0 n  MIPS Solution:

n  Move Zero test to ID/RF stage n  Adder to calculate new PC in ID/RF stage n  1 clock cycle penalty for branch versus 3

Adder

IF/ID

Διασωλήνωση της διόδου δεδοµένων του MIPS

Memory Access

Write Back

Instruction Fetch

Instr. Decode Reg. Fetch

Execute Addr. Calc

ALU

Mem

ory

Reg File

MU

X

Data

Mem

ory

MU

X

Sign Extend

Zero?

MEM

/WB

EX/M

EM

4

Adder

Next SEQ PC

RD RD RD WB

Dat

a

•  Data stationary control –  local decode for each instruction phase / pipeline stage

Next PC

Address

RS1 RS2

Imm

MU

X

ID/EX

Page 26: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

26

Τέσσερις Επιλογές για Κίνδυνους Ελέγχου

#1: Stall until branch direction is clear #2: Predict Branch Not Taken

n  Execute successor instructions in sequence n  “Squash” instructions in pipeline if branch actually taken n  Advantage of late pipeline state update n  47% MIPS branches not taken on average (CPI=?) n  PC+4 already calculated, so use it to get next instruction

#3: Predict Branch Taken n  53% MIPS branches taken on average n  But haven’t calculated branch target address in MIPS

n  MIPS still incurs 1 cycle branch penalty n  Other machines: branch target known before outcome

#4: Delayed Branch n  Define branch to take place AFTER a following

instruction

branch instruction sequential successor1 sequential successor2 ........ sequential successorn

........ branch target if taken

n  1 slot delay allows proper decision and branch target address in 5 stage pipeline

n  MIPS uses this

Branch delay of length n

Τέσσερις Επιλογές για Κίνδυνους Ελέγχου

Page 27: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

27

Καθυστερηµένη Εντολή Διακλάδωσης (Delayed Branch)

n  Where to get instructions to fill branch delay slot? n  Before branch instruction n  From the target address: only valuable when branch taken n  From fall through: only valuable when branch not taken n  Canceling branches allow more slots to be filled

n  Compiler effectiveness for single branch delay slot: n  Fills about 60% of branch delay slots n  About 80% of instructions executed in branch delay slots

useful in computation n  About 50% (60% x 80%) of slots usefully filled

n  Delayed Branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar)

Καθυστερηµένη Εντολή Διακλάδωσης (Delayed Branch)

Page 28: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

28

Recall: Speed Up συνάρτηση για διασωλήνωση

pipelined

dunpipeline

TimeCycle TimeCycle

CPI stall Pipeline CPI Ideal

depth Pipeline CPI Ideal Speedup ×+×

=

pipelined

dunpipeline

TimeCycle TimeCycle

CPI stall Pipeline 1

depth Pipeline Speedup ×+

=

Instper cycles Stall Average CPI Ideal CPIpipelined +=

For simple RISC pipeline, CPI = 1:

Παράδειγµα: Αξιολόγηση της Διαφορετικής Υλοποίησης της Εντολής Διακλάδωσης

Assume: Conditional & Unconditional = 14%, 65% change PC Scheduling Branch CPI speedup v. scheme penalty stall

Stall pipeline 3 1.42 1.0 Predict taken 1 1.14 1.26 Predict not taken 1 1.09 1.29 Delayed branch 0.5 1.07 1.31

Pipeline speedup = Pipeline depth1 +Branch frequency×Branch penalty

Page 29: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

29

Άλλες Δυσκολίες... n  Χρόνος αναµονής της κρυφής µνήµης n  Διακοπές και Εξαιρέσεις (Interrupts and

Exceptions) n  ΑΣΕ (π.χ. ADD3 42(R1),56(R1)+,@(R1)) n  Λειτουργίες Πολύ-κύκλων (Multicycle

Operations)

Σχέση µεταξύ Κρυφής Μνήµης και Διασωλήνωσης

WB

Dat

a

Adder

IF/ID

ALU

Mem

ory

Reg File

MU

X

Data

Mem

ory

MU

X Sign Extend

Zero? MEM

/WB

EX/M

EM

4

Adder

Next SEQ PC

RD RD RD

Next PC

Address

RS1 RS2

Imm

MU

X

ID/EX

I-$ D-$

Memory

Page 30: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

30

Λειτουργίες Πολύ-κύκλων

Λειτουργίες Πολύ-κύκλων

Page 31: Διασωλήνωση - Pipelining · C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 Διασωλήνωση Εντολών! Execute billions of instructions, so throughput

31

Παράδειγµα: MIPS R4000

Στάδια (8): IF – IS – RF – EX – DF – DS – TC - WB

Άλλα Παραδείγµατα n  PowerPC G4 (4):

n  PowerPC G4e (7):

n  Pentium 4 (20):

FETCH DECODE / DISPATCH

EXECUTE COMPLETE / WRITE-BACK

FETCH FETCH DECODE / DISP

ISSUE EXE COMPL WRITE - BACK

TCNIP

TCNIP

TCF

TCF

DRIVE

A&R

A&R

A&R

Q SCHE

SCHE

SCHE

DISP

DISP

REG

REG

EXE

FLAGS

BRAN

DRIVE