Δυναμική Δρομολόγηση Εντολών (Dynamic Scheduling)
Embed Size (px)
description
Transcript of Δυναμική Δρομολόγηση Εντολών (Dynamic Scheduling)
-
(Dynamic Scheduling)
-
pipelinePipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control StallsIdeal pipeline CPI: pipelineStructural hazards: Data hazards: , pipelineControl hazards: (branches,jumps)
-
CPIPipeline CPI =
Ideal pipeline CPI +
Structural Stalls +
Data Hazard Stalls +
Control Stalls register renamingdelayed branches, branch schedulingloop unrollingstatic scheduling, software pipelining
-
J data dependent I: H J source operand I J data dependent , data dependent I ( ) (True Dependences) Read After Write (RAW) hazards pipeline HazardsI: add r1,r2,r3J: sub r4,r1,r3
-
hazard, hazard, , pipeline 1) hazards2) 3) Hazards
-
Name dependences: 2 (''name''), Anti-dependence: J r1 I Write After Read (WAR) hazards pipelineName Dependences, (1): Anti-dependences
-
Output dependence: J r1 I
Write After Write (WAW) hazards pipelineName Dependences, (2): Output dependences
-
ILP Data Hazards
: , , ,
HW/SW: ,
-
(1)dependence DIVD ADDD dependence SUBD. ADDD?
Dynamic Scheduling: (out-of-order execution) exceptionsDIVD F0,F2,F4ADDD F10,F0,F8SUBD F12,F8,F14
-
(2) compile time (.., ) compiler , pipeline
-
(3)in-order instruction issueout-of-order executionout-of-order completion
ID 5-stage pipeline 2 Issue: structural hazards (in order issue)Read Operands: operands data hazards ( stall or bypass- - ooo execution)
-
execution WAR WAW hazards
antidependence: (2) (3) SUBD WAR
utput dependence: (2) (4) MULD WAW1.DIVD F0,F2,F42.ADDD F6,F0,F83.SUBD F8,F10,F144.MULD F6,F10,F8
-
Scoreboarding1963 CDC6600 scoreboard WAR hazardsStall WB registers registers Read Operands WAW hazards (issue) Robert Tomasulo's algorithm1966 IBM360/91 WAR WAW hazards register renaming
-
Register RenamingDIV.D F0,F2,F4ADD.D F6,F0,F8S.D F6,0(R1)SUB.D F8,F10,F14MUL.D F6,F10,F8DIV.D F0,F2,F4ADD.D F6,F0,F8S.D F6,0(R1)SUB.D T,F10,F14MUL.D F6,F10,TDIV.D F0,F2,F4ADD.D S,F0,F8S.D S,0(R1)SUB.D T,F10,F14MUL.D F6,F10,TDIV.D F0,F2,F4ADD.D S,F0,F8S.D S,0(R1)SUB.D T,F10,F14MUL.D F6,F10,T
-
TomasuloReservation Stations (RS) operands Functional Units (FUs) source registers RS, input register renaming WAR, WAW hazards RS registers name dependences compiler FU RS, register file, Common Data Bus broadcast FUsLoad,Stores FUs RSs
-
MIPS floating point + load-store unit using Tomasulos algorithmLoad store buffers: hold the components of the effective address, hold the results of the completed loads, track loads that are waiting on the memoryReservation Stations: contain already issued instruction and its operands or the names of the reservation stations that will provide the operand values for this instruction
-
Tomasulo: MIPS FP-UnitFP addersAdd1Add2Add3FP multipliersMult1Mult2From MemFP RegistersReservation StationsCommon Data Bus (CDB)To MemFP OpQueueLoad BuffersStore BuffersLoad1Load2Load3Load4Load5Load6
-
TomasuloIssue: FP Op Queue RS (no structural hazard), (issue) , operands (rename registers) Execute: (EX) operands , . , CDB Write result: (WB) CDB . RS
-
(1)Reservation Station fieldsOp: Vj, Vk: source operands Qj, Qk: RS source operands , Q V operand Busy: RS
-
(2)Register Result statusQi : RS register.
Load,Store Buffer fields: effective address /Busy: buffer
-
(3)Common Data Bus data bus: data + destination (go to bus)CDB: data + source (come from bus)64 bits of data + 4 bits of Functional Unit source address source Q RS, V RSBroadcast: master, slaves
-
Tomasulo Example(load: 2 cycles, add: 2 cycles, mult: 10 cycles, divide 40 cycles)
-
Tomasulo Example Cycle 1
-
Tomasulo Example Cycle 2
-
Tomasulo Example Cycle 3 issue RS, source registers (renamed) V Q RS Load1 - ?
-
Tomasulo Example Cycle 4 Load2 - ?
-
Tomasulo Example Cycle 5 Add1, Mult1 (load: 1 cycle, add: 2 cycles, mult: 10 cycles, divide 40 cycles)
-
Tomasulo Example Cycle 6 ADDD issue name dependency F6
-
Tomasulo Example Cycle 7 o Add1 (SUBD) - ?
-
Tomasulo Example Cycle 8
-
Tomasulo Example Cycle 9
-
Tomasulo Example Cycle 10 Add2 (ADDD) - ?
-
Tomasulo Example Cycle 11 ADDD
-
Tomasulo Example Cycle 12
-
Tomasulo Example Cycle 13
-
Tomasulo Example Cycle 14
-
Tomasulo Example Cycle 15 Mult1 (MULTD) - ?
-
Tomasulo Example Cycle 16... DIVD (div: 40 cycles)
-
Tomasulo Example Cycle 55
-
Tomasulo Example Cycle 56 Mult2 (DIVD) - ?
-
Tomasulo Example Cycle 57: In-order issue, out-of-order execution out-of-order completion.
-
Tomasulo Loop ExampleLoop:LDF00R1 MULTDF4F0F2 SDF40R1 SUBIR1R1#8 BNEZR1Loop
mult: 4 cycles1st load: 8 cycles (L1 cache miss)2nd load: 4 cycles (hit) branch
-
Loop Example
-
Loop Example Cycle 1
-
Loop Example Cycle 2
-
Loop Example Cycle 3
-
Loop Example Cycle 4( SUBI - FP queue- dispatch)
-
Loop Example Cycle 5( BNEZ)
-
Loop Example Cycle 6 F0 load 80
-
Loop Example Cycle 7 register file 1 2
-
Loop Example Cycle 8
-
Loop Example Cycle 9 Load1 - ?( SUBI dispatch)
-
Loop Example Cycle 10 Load2 - ?( BNEZ dispatch)
-
Loop Example Cycle 11 load
-
Loop Example Cycle 12 issue mult?
-
Loop Example Cycle 13 issue store?
-
Loop Example Cycle 14 Mult1 - ?
-
Loop Example Cycle 15 Mult2 - ?
-
Loop Example Cycle 16 issue 3 MULTD
-
Loop Example Cycle 17 issue 3 SD
-
Loop Example Cycle 18 1 SD
-
Loop Example Cycle 19 2 SD
-
Loop Example Cycle 20 : In-order issue, out-of-order execution out-of-order completion
-
?Register renaming destination registers (dynamic loop unrolling).
Reservation stations issue integer control loop buffer registers stalls WAR hazards
-
hazards reservation stations 1 ( operand ), broadcast CDB register file, , register bus
stalls WAW WAR hazardsregister renaming reservation stations
-
Explicit Register Renaming(1) : registers register renaming;
: physical register file physical register ISA registers Translation Table ( ) physical registers
-
Explicit Register Renaming(2) pipeline 5-stage pipeline decode ISA register physical register.target : registers Register Map Table (RMT)source : X RMT physical register , .
-
(1)DIVR5,R4,R2ADDR7,R5,R1SUBR5,R3,R2LDR7,1000(R5)Instruction StreamRegister Map TableFree RegistersPR37,PR4,PR42,PR19,...DIVPR37,PR45,PR2
-
(2)DIVR5,R4,R2ADDR7,R5,R1SUBR5,R3,R2LDR7,1000(R5)Instruction StreamRegister Map TableFree RegistersPR4,PR42,PR19,...DIVPR37,PR45,PR2ADDPR4 ,PR37,PR23
-
(3)DIVR5,R4,R2ADDR7,R5,R1SUBR5,R3,R2LDR7,1000(R5)Instruction StreamRegister Map TableFree RegistersPR42,PR19,...DIVPR37,PR45,PR2ADDPR4 ,PR37,PR23SUBPR42,PR17,PR2
-
(4)DIVR5,R4,R2ADDR7,R5,R1SUBR5,R3,R2LDR7,1000(R5)Instruction StreamRegister Map TableFree RegistersPR19,...DIVPR37,PR45,PR2ADDPR4 ,PR37,PR23SUBPR42,PR17,PR2LDPR19,1000(PR42)
-
reservation stations renaming scheduling pipeline 5-stage pipeline register file WAR,WAW hazardsE ( Tomasulo) out-of-order completion explicit register renaming + Tomasulo
Resolve RAW memory conflict? (address in memory buffers)Integer unit executes in parallelWhat you might have thought1. 4 stages of instruction executino2.Status of FU: Normal things to keep track of (RAW & structura for busyl):Fi from instruction format of the mahine (Fi is dest)Add unit can Add or SubRj, Rk - status of registers (Yes means ready)Qj,Qk - If a no in Rj, Rk, means waiting for a FU to write result; Qj, Qk means wihch FU waiting for it3.Status of register result (WAW &WAR)s:which FU is going to write into registersScoreboard on 6600 = size of FU6.7, 6.8, 6.9, 6.12, 6.13, 6.16, 6.17FU latencies: Add 2, Mult 10, Div 40 clocksWhat you might have thought1. 4 stages of instruction executino2.Status of FU: Normal things to keep track of (RAW & structura for busyl):Fi from instruction format of the mahine (Fi is dest)Add unit can Add or SubRj, Rk - status of registers (Yes means ready)Qj,Qk - If a no in Rj, Rk, means waiting for a FU to write result; Qj, Qk means wihch FU waiting for it3.Status of register result (WAW &WAR)s:which FU is going to write into registersScoreboard on 6600 = size of FU6.7, 6.8, 6.9, 6.12, 6.13, 6.16, 6.17FU latencies: Add 2, Mult 10, Div 40 clocksWhat you might have thought1. 4 stages of instruction executino2.Status of FU: Normal things to keep track of (RAW & structura for busyl):Fi from instruction format of the mahine (Fi is dest)Add unit can Add or SubRj, Rk - status of registers (Yes means ready)Qj,Qk - If a no in Rj, Rk, means waiting for a FU to write result; Qj, Qk means wihch FU waiting for it3.Status of register result (WAW &WAR)s:which FU is going to write into registersScoreboard on 6600 = size of FU6.7, 6.8, 6.9, 6.12, 6.13, 6.16, 6.17FU latencies: Add 2, Mult 10, Div 40 clocks