Comparch CheatSheet

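Storage and I/O

Basics: MIPS has 32 registers. The Northbridge connects the fast components (CPU, RAM); the Southbridge connects the I/O devices.

I/O performance, 2 metrics:
- Throughput (bandwidth): data transferred per unit time (bytes/sec).
- Access latency (response time): time per access.

Magnetic disks:
- 1 to 4 platters, 2 surfaces each, rotating at 5400 to 15000 RPM.
- Each surface is divided into tracks (10K to 50K tracks per surface).
- Each track is divided into sectors (100 to 500 sectors per track) of 512 bytes (newer disks: 4 KB).
- Access time = seek time (move the head to the track) + rotational latency, also called rotational delay (wait for the sector to come under the head) + transfer time (read the bits of the block).

RAID (Redundant Array of Inexpensive Disks): several disks used as one, giving more read heads and redundancy.
- RAID 0 - Striping: blocks are spread over 2 or more disks. Fast, but with no redundancy: if one disk fails, the data is lost (not really a "redundant" RAID).
- RAID 1 - Mirroring: every block is written to 2 disks. Survives the failure of one disk, at 2x the disk cost.
- RAID 5 - Striped set with distributed parity: blocks are striped, with one parity block per stripe, distributed over the disks. Survives the failure of one disk.
- RAID 6 - Striped set with dual parity: like RAID 5 with a second parity block per stripe. Survives the failure of 2 disks.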
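The three disk access components listed above (seek time, rotational latency, transfer time) can be added up as in the sketch below. It assumes the usual textbook simplifications, which are not stated in these notes: average rotational latency is half a rotation, and controller overhead is ignored. The function name and parameters are illustrative.

```python
def disk_access_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Average time (ms) to read one block: seek + rotational latency + transfer."""
    # Assumption: on average the sector is half a rotation away.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Time to move the block's bytes at the sustained transfer rate.
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000.0
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical disk: 4 ms seek, 15000 RPM, 512-byte sector, 100 MB/s.
total = disk_access_ms(seek_ms=4.0, rpm=15000, block_bytes=512, transfer_mb_per_s=100)
print(f"{total:.5f} ms")
```

For small blocks the transfer term is negligible, so the mechanical delays (seek and rotation) dominate the access latency.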


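Buses

Advantages: versatility, low cost.
Disadvantages: a bus is a communication bottleneck. Its speed is limited by its length and by the number of devices, and it must serve devices with very different access times and data rates.
Control lines: carry requests and acknowledgements. Data lines: carry data and addresses.
Bus transaction: a request (address + command) followed by the data transfer.

- CPU-memory bus: short and fast, designed to maximize CPU-memory bandwidth.
- I/O bus: long, with many kinds of devices and a wide range of data rates. It attaches to the CPU-memory bus or to a backplane bus, and follows a standard (e.g., USB, Firewire).

Synchronous bus: the control lines include a clock. Fast, but every device must run at the same rate and the bus must be short (clock skew). Example: CPU-MEM bus.
Asynchronous bus: no clock; a handshaking protocol is used, so it can serve many kinds of devices (USB, Firewire = asynchronous).

I/O

The CPU addresses an I/O device in 2 ways:
a) Memory mapped I/O: e.g., addresses 0xFFFF0000 to 0xFFFF000F are the 4 32-bit command registers of an I/O device. Loads and stores to these addresses read and write the device registers.
b) Special Input/Output commands.

How does an I/O device notify the CPU?
a) Polling: the CPU periodically checks the device state.
b) Interrupt: the device raises an interrupt.

Who transfers the data?
a) Programmed I/O: the CPU performs the transfer (CPU busy).
b) Direct Memory Access: the CPU programs the DMA controller (bus master), which performs the transfer and notifies the CPU with an interrupt when it is done (CPU free).

Virtual memory

Manages 2 levels of the hierarchy: main memory (DRAM) and secondary storage (disk). Memory is divided into blocks:
- Fixed-size blocks: Pages (4k to 64k bytes).
- Variable-size blocks: Segments (up to 2^16 to 2^32 bytes).
A reference to a block that is not in main memory causes a page fault (address fault), and the block is brought in by the OS handler.
Virtual addresses (logical program addresses) are translated to physical addresses.
Because the miss (page fault) penalty is huge (around 100,000 cycles), blocks are large (4KB to 16KB), placement is fully associative, and writes are write-back.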

Page Faults

- Page fault: the data is not in memory and must be retrieved from disk.
- The miss penalty is huge, so pages should be fairly large (e.g., 4KB).
- Reducing page faults is important (LRU is worth the price).
- Faults can be handled in software instead of hardware.
- Write-through is too expensive, so write-back is used.

TLB: an on-chip, fully-associative cache of recent address translations. If the virtual address is found in the TLB (TLB hit), the physical address is obtained without accessing the page table.
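The translation path described above (TLB hit, TLB miss served by the page table, page fault) can be sketched as follows. This is a minimal model under stated assumptions: 4 KB pages, the TLB and page table represented as plain dictionaries from virtual page number to physical frame number, and a hypothetical `PageFault` exception standing in for the OS handler.

```python
PAGE_SIZE = 4096          # 4 KB pages, as in the notes
OFFSET_BITS = 12          # log2(PAGE_SIZE)


class PageFault(Exception):
    """Raised when the page is not in main memory (the OS handler would run)."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address to a physical address."""
    vpn = vaddr >> OFFSET_BITS            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)      # offset is not translated
    if vpn in tlb:                        # TLB hit: no page table access
        frame = tlb[vpn]
    elif vpn in page_table:               # TLB miss, page is in memory
        frame = page_table[vpn]
        tlb[vpn] = frame                  # cache the translation
    else:
        raise PageFault(hex(vpn))         # page must be fetched from disk
    return (frame << OFFSET_BITS) | offset


tlb = {}
page_table = {0x12: 0x7}
print(hex(translate(0x12ABC, tlb, page_table)))   # first access fills the TLB
print(hex(translate(0x12004, tlb, page_table)))   # second access hits in the TLB
```

A real TLB has a fixed number of entries and a replacement policy; the unbounded dictionary here only illustrates the lookup order.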

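RAM

- Memory latency: determines the cache miss penalty. It is described by two measures:
  - Access time: time from a request until the data arrives at the cache/CPU.
  - Cycle time: minimum time between successive requests (longer than the access time for DRAM).
- Memory bandwidth: rate at which data is delivered to the cache/CPU.

CACHE

Cache performance can be improved in three ways.

a) Reduce the Miss Rate:
- larger block size
- larger cache (more block frames, fewer conflict misses)
- higher cache associativity
- Pseudo-associative Caches
- Victim caches
- Hardware/Software prefetching
- Compiler-controlled prefetching
- Compiler optimizations

b) Reduce the Cache Miss Penalty:
- second-level Cache (L2)
- merging write buffers
- Early restart and critical word first
- Non-blocking caches
- priority to read misses over writes

c) Reduce the Cache Hit time:
- small and simple caches (a small cache fits on chip and avoids the hit time of off-chip caches)
- avoiding address translation during cache indexing
- Pipelining writes for fast write hits

Compiler Optimizations:
- Reordering procedures in memory to reduce conflict misses.
- Merging Arrays: improves spatial locality by using one array of compound elements instead of 2 separate arrays.
- Loop Interchange: change the nesting of loops so that data is accessed in the order it is stored in memory.
- Loop Fusion: combine 2 independent loops that use the same data.
- Blocking: improves temporal locality by working on blocks of the data repeatedly instead of walking whole rows or columns.

Early Restart and Critical Word First: do not wait for the whole block to be loaded into the cache before restarting the CPU.
- Early restart: as soon as the requested word of the block arrives, send it to the CPU and let the CPU continue.
- Critical Word First: request the missed word from memory first and send it to the CPU as soon as it arrives. The CPU continues while the rest of the block is filled.
These help mostly with large cache blocks. Because of spatial locality the CPU often wants the next word of the block right away, which limits the benefit of early restart.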
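The effect of loop interchange on the miss rate can be checked with a tiny cache model. The sketch below is an illustration, not a model of any specific machine: it assumes a direct-mapped cache (block index = block address mod number of blocks), a row-major array of 8-byte elements, and hypothetical sizes chosen so the array is much larger than the cache.

```python
def count_misses(addresses, num_blocks=64, block_bytes=32):
    """Count misses of a direct-mapped cache over a trace of byte addresses."""
    frames = [None] * num_blocks          # block number held by each frame
    misses = 0
    for addr in addresses:
        block = addr // block_bytes       # block address
        index = block % num_blocks        # direct-mapped placement
        if frames[index] != block:
            misses += 1
            frames[index] = block         # replace whatever was there
    return misses


def matrix_trace(rows, cols, row_major, elem_bytes=8):
    """Byte addresses touched when walking a row-major rows x cols array."""
    if row_major:
        order = ((i, j) for i in range(rows) for j in range(cols))
    else:
        order = ((i, j) for j in range(cols) for i in range(rows))
    return [(i * cols + j) * elem_bytes for i, j in order]


row_misses = count_misses(matrix_trace(128, 128, row_major=True))
col_misses = count_misses(matrix_trace(128, 128, row_major=False))
print(row_misses, col_misses)
```

With these parameters the row-order walk misses once per 32-byte block (4096 misses for 16384 accesses), while the column-order walk misses on every access (16384 misses), because consecutive accesses are 1024 bytes apart and keep evicting each other from the same two frames.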

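Memory hierarchy

Temporal Locality: an item that was referenced recently tends to be referenced again soon.
Spatial locality: items close to a recently referenced address tend to be referenced soon.

- hit: the block is found in the cache
- hit rate: hits / total accesses
- hit time: time to access the cache, including the time to decide hit or miss
- miss: the block is not in the cache
- miss rate: 1 - (hit rate)
- miss penalty: (time to replace a block in the cache) + (time to deliver the block to the CPU)
- access time: time to reach the lower level until the 1st word arrives
- transfer time: time to move the block to the cache

Block placement:
- Direct mapped: (block address) mod (number of blocks in cache)
- Set associative: (block address) mod (number of sets in cache)
- Fully associative: a block can go anywhere in the cache
For a fixed cache size, higher associativity => fewer index bits and more tag bits; a larger cache block => more block-offset bits.

Block replacement:
- Random: the block is chosen at random; simple hardware
- LRU (least recently used): the block that has been unused for the longest time; more complex hardware
- FIFO (first in - first out): the oldest block in the cache

write through vs write back

Write Through
Pros:
- read miss never results in writes to main memory
- easy to implement
- main memory always has the most current copy of the data (consistent)
Cons:
- write is slower
- every write needs a main memory access
- as a result uses more memory bandwidth

Write back
Pros:
- writes occur at the speed of the cache memory
- multiple writes within a block require only one write to main memory
- as a result uses less memory bandwidth
Cons:
- harder to implement
- main memory is not always consistent with cache
- reads that result in replacement may cause writes of dirty blocks to main memory

Write-allocate: on a write miss the block is loaded into the cache.
Write-no-allocate: on a write miss the block is modified in main memory and is not loaded into the cache.

read hit: the data is read from the cache.
read miss: the block is brought into the cache, then handled as a read hit.

Write-back & Write-allocate:
- write hit: write only to the cache and mark the block dirty. The block is written to main memory when it is replaced.
- write miss: bring the block into the cache, then handle as a write hit.

Write-through & write-no-allocate:
- write hit: write to the cache and to main memory.
- write miss: write to main memory only; the cache is not changed.

Write Back with No Write Allocate: on hits it writes to the cache, setting the dirty bit for the block, and main memory is not updated. On misses it updates the block in main memory without bringing that block to the cache. Subsequent writes to the same block, if the block originally caused a miss, will keep missing and result in very inefficient execution.

Write Through with Write Allocate: on hits it writes to the cache and main memory. On misses it updates the block in main memory and brings the block to the cache. Bringing the block to the cache on a miss does not make a lot of sense in this combination, because the next hit to this block will generate a write to main memory anyway (according to the Write Through policy).

Unified vs split caches:
miss rate (unified cache) < miss rate (instr + data cache)
time/access (instr + data cache) < time/access (unified cache)

Cache performance (C = clock cycle time):
CPUtime = Instruction count x CPI x C
CPIexecution = CPI with ideal memory
CPI = CPIexecution + Mem stalls/instruction
CPUtime = Instruction Count x (CPIexecution + Mem stalls/instruction) x C
Mem stalls/instruction = Mem accesses/instruction x Miss rate x Miss penalty
CPUtime = IC x (CPIexecution + Mem accesses/instruction x Miss rate x Miss penalty) x C
Misses/instruction = Mem accesses/instruction x Miss rate
CPUtime = IC x (CPIexecution + Misses/instruction x Miss penalty) x C

PIPELINING

Ideal speedup: time_pipelined = time_non_pipelined / number of pipe stages. The stages are separated by pipeline registers (buffers).

Pipeline Hazards:
- structural hazards
- control hazards
- data hazards:
  - RAW (Read-After-Write) (true-dependence)
  - WAR (Write-After-Read) (anti-dependence): cannot occur in the MIPS pipeline, because registers are written only in WB.
  - WAW (Write-After-Write) (output-dependence): cannot occur in the MIPS pipeline, because only WB writes and instructions reach WB in order.

Forwarding conditions:
1. EX hazard:
   if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
   if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
2. MEM hazard (applies only when the EX hazard condition does not hold, so that the most recent result is forwarded):
   if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
   if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

Hazard detection unit: works in the ID stage and stalls an instruction that uses the result of the load right before it.
   if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline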
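The forwarding and load-use conditions above translate almost directly into code. The sketch below is illustrative: the function and parameter names are made up for this example, and the pipeline-register fields are passed in as plain values. It gives the EX hazard priority over the MEM hazard so that the most recent result is forwarded.

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Return the ForwardA/ForwardB mux value for one ALU source register."""
    # EX hazard: the previous instruction writes the register we read.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return 0b10
    # MEM hazard: the instruction two slots ahead writes it (and no EX hazard).
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return 0b01
    return 0b00                           # no forwarding, use the register file


def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load now in EX."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# ADD r4, r2, r5 right after LW r2, 0(r3): the load-use pair must stall.
print(load_use_stall(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
# Both older instructions write r2: the EX/MEM value wins.
print(bin(forward_select(2, True, 2, True, 2)))
```

Calling `forward_select` once with `ID/EX.RegisterRs` and once with `ID/EX.RegisterRt` yields ForwardA and ForwardB respectively.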


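Integer representations (n bits)

Sign-magnitude: the MSB is the sign and the remaining bits are the magnitude. Range: -2^(n-1)+1 to 2^(n-1)-1. Zero has 2 representations (0000 and 1000).

One's complement: a negative number is formed by inverting every bit. Range: -2^(n-1)+1 to 2^(n-1)-1. Zero has 2 representations (0000 and 1111).

Two's complement: a negative number is formed by inverting every bit and adding 1. Range: -2^(n-1) to 2^(n-1)-1. Zero has 1 representation (0000).

BOOTH multiplication: multiplicand * Q (multiplier). Examine the current multiplier bit together with the previous bit:
- 01: add the multiplicand, then shift right 1 bit
- 10: subtract the multiplicand, then shift right 1 bit
- 00, 11: only shift right 1 bit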
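The Booth rules above can be traced with a small register-level sketch. It assumes n-bit two's complement operands, an n-bit accumulator A, and a multiplicand other than -2^(n-1) (that value would overflow the n-bit accumulator on subtraction). The names `a`, `q` and `q_prev` are labels chosen for this example.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit two's complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    m = multiplicand & mask               # n-bit pattern of the multiplicand
    a = 0                                 # accumulator A
    q = multiplier & mask                 # multiplier register Q
    q_prev = 0                            # the bit to the right of Q
    for _ in range(n):
        pair = (q & 1, q_prev)
        if pair == (1, 0):                # 10: subtract the multiplicand
            a = (a - m) & mask
        elif pair == (0, 1):              # 01: add the multiplicand
            a = (a + m) & mask
        # Arithmetic shift right of the combined register A:Q:q_prev.
        q_prev = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        sign = a >> (n - 1)
        a = (a >> 1) | (sign << (n - 1))
    product = (a << n) | q                # 2n-bit result in A:Q
    if product >> (2 * n - 1):            # reinterpret as a signed value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(3, -4, 4))
print(booth_multiply(-3, 5, 4))
```

Runs of 1s in the multiplier cost one subtraction at the start of the run and one addition at its end, which is why the 00 and 11 cases only shift.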

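Fixed point (n bits): n = 1 + n1 + n2
- MSB: sign
- n1 bits: integer part
- n2 bits: fractional part

Floating point: value = (-1)^sign x m x 2^e, where m = mantissa (significand) and e = exponent.
Example (binary): 101.011 = 101.011 x 2^0 = 1.01011 x 2^2 = 0.101011 x 2^3 = 101011 x 2^-3
Normal form: 1.xxxxxxxx x 2^e
More exponent bits give a larger range; more mantissa bits give more precision.
Floating point overflow: the number is too large for the exponent field. Floating point underflow: the number is too small for the exponent field.

IEEE 754 floating point standard:
- Single precision: 8 bits exponent, 23 bits fraction
- Double precision: 11 bits exponent, 52 bits fraction
- The leading 1 of the normal form is not stored (implicit), which gains 1 bit of precision.
- The exponent is stored with a bias: Min = 00000, Max = 111..11. Bias = 127 for single precision, 1023 for double precision.

N = (-1)^sign x (1 + fraction) x 2^(e - bias)

Special values:
- e = 0, fraction = 0: zero
- e = 0, fraction ≠ 0: denormalized number
- e = 255/2047, fraction = 0: infinity
- e = 255/2047, fraction ≠ 0: NaN (not a number)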
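The single precision layout can be inspected with Python's standard `struct` module. The sketch below splits a float into its sign, biased exponent and fraction fields, then rebuilds the value with the formula N = (-1)^sign x (1 + fraction) x 2^(e - bias). The helper names are illustrative, and the denormalized case uses the standard rule (no implicit 1, exponent fixed at -126), which the notes above do not spell out.

```python
import struct


def decode_single(x):
    """Return (sign, biased exponent, fraction field) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF        # 8 exponent bits
    fraction = bits & 0x7FFFFF            # 23 fraction bits
    return sign, exponent, fraction


def value_from_fields(sign, exponent, fraction):
    """Rebuild the value from the three fields (bias = 127)."""
    s = -1.0 if sign else 1.0
    if exponent == 255:                   # infinity or NaN
        return float("nan") if fraction else s * float("inf")
    if exponent == 0:                     # zero or denormalized: no implicit 1
        return s * (fraction / 2**23) * 2.0**-126
    return s * (1 + fraction / 2**23) * 2.0**(exponent - 127)


# 5.375 is 101.011 in binary, i.e. 1.01011 x 2^2.
fields = decode_single(5.375)
print(fields, value_from_fields(*fields))
```

For 5.375 the stored exponent is 2 + 127 = 129 and the fraction field holds the bits 01011 followed by zeros, matching the normal-form example in the notes.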


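Performance questions

CPU time is the product of three factors: Instructions/Program, Cycles/Instruction and Seconds/Cycle. When a design change is evaluated, each factor is examined separately:
- Instructions/Program: depends on the instruction set and on the compiler. It does not change when the instruction set stays the same.
- Cycles/Instruction: depends on the pipeline and on the hazards (stalls) that the hardware or the compiler cannot avoid.
- Seconds/Cycle: set by the critical path, which determines the cycle time.

Modified MIPS pipeline (the change involves the MEM and EX stages):
(i) Memory instructions use register indirect addressing, with no offset.
(ii) Stalls compared with the classic MIPS pipeline:
    LW r2, 0(r3)
    ADD r4, r2, r5
In the classic pipeline the ADD needs 1 stall. In the modified pipeline it needs 0.
(iii) Stalls compared with the classic MIPS pipeline:
    ADD r2, r3, r4
    LW r5, 0(r2)
In the modified pipeline the LW needs 1 stall. In the classic pipeline it needs 0.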
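The three performance factors above multiply into CPU time, and the memory-stall term from the cache section adds to the CPI. The sketch below is a worked example with made-up numbers (instruction count, miss rate, penalty and clock period are all hypothetical), just to show how the terms combine.

```python
def cpu_time(instr_count, cpi_execution, mem_accesses_per_instr,
             miss_rate, miss_penalty_cycles, cycle_time_s):
    """CPU time = IC x (CPIexecution + Mem stalls/instruction) x cycle time."""
    mem_stalls_per_instr = mem_accesses_per_instr * miss_rate * miss_penalty_cycles
    return instr_count * (cpi_execution + mem_stalls_per_instr) * cycle_time_s


# Hypothetical machine: 1e9 instructions, ideal CPI of 1.0, 1.5 memory
# accesses per instruction, 2% miss rate, 100-cycle penalty, 1 ns cycle.
ideal = cpu_time(1e9, 1.0, 1.5, 0.0, 100, 1e-9)
real = cpu_time(1e9, 1.0, 1.5, 0.02, 100, 1e-9)
print(f"ideal memory: {ideal:.2f} s, with cache misses: {real:.2f} s")
```

In this example the memory stalls add 3 cycles per instruction, so the effective CPI is about 4 and the program runs roughly 4 times slower than with an ideal memory.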
