Computer Architecture The P6 Micro-Architecture An Example of an Out-Of-Order Micro-processor

Click here to load reader

download Computer Architecture The P6 Micro-Architecture  An Example of an Out-Of-Order Micro-processor

of 53

  • date post

    23-Feb-2016
  • Category

    Documents

  • view

    92
  • download

    0

Embed Size (px)

description

Computer Architecture The P6 Micro-Architecture An Example of an Out-Of-Order Micro-processor. Lihu Rappoport and Adi Yoaz. The P6 Family. Features Out Of Order execution Register renaming Speculative Execution Multiple Branch prediction Super pipeline: 12 pipe stages. *off die. - PowerPoint PPT Presentation

Transcript of Computer Architecture The P6 Micro-Architecture An Example of an Out-Of-Order Micro-processor

MAMAS

Computer Architecture The P6 Micro-Architecture An Example of an Out-Of-Order Micro-processorLihu Rappoport and Adi Yoaz Computer Architecture 2011 P6 uArch #The P6 FamilyFeaturesOut Of Order executionRegister renamingSpeculative ExecutionMultiple Branch predictionSuper pipeline: 12 pipe stagesProcessorYearFreq (MHz)Bus (MHz)L2 cacheProcess ()Pentium Pro1995150~20060/66256/512K*0.5, 0.35Pentium II1997233~45066/100512K*0.35, 0.25Pentium III1999450~1400100/133256/512K0.25, 0.18, 0.13Pentium M2003900~2260400/5331M / 2M0.13, 90nmCoreTM20051660~23306672M65nmCoreTM 220061800~2930800/10662/4/8M65nm*off die Computer Architecture 2011 P6 uArch #In-Order Front EndBIU: Bus Interface UnitIFU: Instruction Fetch Unit (includes IC)BPU: Branch Prediction UnitID: Instruction DecoderMS: Micro-Instruction SequencerRAT: Register Alias TableOut-of-order CoreROB: Reorder BufferRRF: Real Register FileRS: Reservation StationsIEU: Integer Execution UnitFEU: Floating-point Execution Unit AGU: Address Generation UnitMIU: Memory Interface UnitDCU: Data Cache UnitMOB: Memory Order BufferL2: Level 2 cache In-Order RetireP6 ArchMSAGUMOBExternal BusIEUMIUFEUBPUBIUIFUIDRATRSL2DCUROB Computer Architecture 2011 P6 uArch #O1O3R1R2ExI1I2I3I4I5I6I7I8NextIPRegRenRSWrIcacheDecodeRS dispRetirement In-Order Front EndOut-of-order CoreIn-order Retirement1: Next IP2: ICache lookup3: ILD (instruction length decode)4: rotate5: ID16: ID27: RAT- rename sources, ALLOC-assign destinations8: ROB-read sources RS-schedule data-ready uops for dispatch9: RS-dispatch uops10:EX11:RetirementP6 Pipeline Computer Architecture 2011 P6 uArch #In-Order Front EndBPU Branch Prediction Unit predict next fetch addressIFU Instruction Fetch UnitiTLB translates virtual to physical address (access PMH on miss)IC supplies 16byte/cyc (access L2 cache on miss)ILD Induction Length Decode split bytes to instructionsIQ Instruction Queue buffer the instructionsID Instruction Decode decode instructions into uopsMS Micro-Sequencer provides uops for complex instructionsNext IPMuxBPU IDMS ILDIQIDQIFUBytesInstructionsuops Computer Architecture 2011 P6 uArch #Branch Prediction ImplementationUse local history to predict directionNeed to predict multiple branches Need to predict branches before previous branches are resolved Branch history updated first based on prediction, later based on actual execution (speculative history)Target address taken from BTBPrediction rate: ~92%~60 instructions between mispredictionsHigh prediction rate is very crucial for long pipelinesEspecially important for OOOE, speculative execution:On misprediction all instructions following the branch in the instruction window are flushedEffective size of the window is determined by prediction accuracyRSB used for Call/Return pairs Computer Architecture 2011 P6 uArch #Branch Prediction ClusteringNeed to provide predictions for the entire fetch line each cyclePredict the first taken branch in the line, following the fetch IP

Implemented by Splitting IP into offset within line, set, and tagIf the tag of more than one way matches the fetch IPThe offsets of the matching ways are ordered Ways with offset smaller than the fetch IP offset are discardedThe first branch that is predicted taken is chosen as the predicted branchJump into the fetch lineJump out of the linejmpjmpjmpjmpPredictnot takenPredicttakenPredicttakenPredicttaken Computer Architecture 2011 P6 uArch #The P6 BTB2-level, local histories, per-set counters4-way set associative: 512 entries in 128 setsIPTagHist1001Pred=msb of counter9015Way 0TargetWay 2Way 39432counters128setsPTV21132LRR2Per-SetBranch Type00- cond01- ret10- call11- uncondReturnStackBufferWay 1Prediction bit4ofstUp to 4 branches can have a tag match Computer Architecture 2011 P6 uArch #Determine where each IA instruction startsIn-Order Front End: DecoderBuffer Instructions Convert instructions Into uopsIf inst aligned with dec1/2/3 decodes into >1 uops, defer it to next cycle IQInstruction Length Decode16 Instruction bytes from IFU1 uopD1IDQD21 uopD04 uopsBuffers uopsSmooth decoders variable throughputD31 uop Computer Architecture 2011 P6 uArch #Micro Operations (Uops)Each CISC inst is broken into one or more RISC uopsSimplicity:Each uop is (relatively) simpleCanonical representation of src/dest (2 src, 1 dest)Increased ILPe.g., pop eax becomes esp1