ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the...

62
Διδάσκων: Χάρης Θεοχαρίδης, ΗΜΜΥ ([email protected]) [Προσαρμογή από Computer Architecture, Computer Organization and Design, Patterson & Hennessy, © 2005 και Superscalar Microprocessor Design, Johnson, © 1992 ] ΗΜΥ 312 -- ΑΡΧΙΤΕΚΤΟΝΙΚΗ ΗΛΕΚΤΡΟΝΙΚΩΝ ΥΠΟΛΟΓΙΣΤΩΝ ΔΙΑΛΕΞΕΙΣ 12-13: CPU Datapath Design – Intro to ALU

Transcript of ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the...

Page 1: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

Διδάσκων: Χάρης Θεοχαρίδης, ΗΜΜΥ([email protected])

[Προσαρμογή από Computer Architecture, Computer Organization and Design, Patterson & Hennessy, © 2005

και Superscalar Microprocessor Design, Johnson, © 1992 ]

ΗΜΥ 312 -- ΑΡΧΙΤΕΚΤΟΝΙΚΗ ΗΛΕΚΤΡΟΝΙΚΩΝ ΥΠΟΛΟΓΙΣΤΩΝ

ΔΔΙΙΑΑΛΛΕΕΞΞΕΕΙΙΣΣ 1122--1133:: CCPPUU DDaattaappaatthh DDeessiiggnn –– IInnttrroo ttoo AALLUU

Page 2: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.2

The Stored Program Computer

• 1943: ENIAC– Presper Eckert and John Mauchly -- first general electronic computer

(or was it John V. Atananasoff in 1939?)– Hard-wired program -- settings of dials and switches

• 1944: Beginnings of EDVAC– among other improvements, includes program stored in memory

• 1945: John von Neumann– wrote a report on the stored program concept,

known as the First Draft of a Report on EDVAC• The basic structure proposed in the draft became known

as the “von Neumann machine” (or model).– a memory, containing instructions and data– a processing unit, for performing arithmetic and logical operations– a control unit, for interpreting instructions

Page 3: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.3

CONTROL UNIT

PC IR

MEMORY

MAR MDR

PROCESSING UNIT

ALU REG FILE

Von Neumann Model

INPUT KeyboardMouseScannerDisk, etc.

OUTPUT MonitorPrinterLEDDisk, etc.

Page 4: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.4

Data Path Components• Global bus

– Set of wires that carry n-bit signals to many components– Inputs to bus are controlled by triangle structure called tri-state devices

» Place signal on bus when enabled» Only one (n-bit) signal should be enabled at a time» Control unit decides which signal “drives” the bus

– Any number of components can read bus» Register only captures bus data if write-enabled by the control unit

• Memory and I/O– Control signals and data registers for memory and I/O devices– Memory: LW, SW– Input (keyboard): Interrupt, DMA– Output (text display): Interrupt, DMA

Page 5: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.5

Data Path Components (cont.)• ALU/FPU

– Input: register file or sign-extended bits from IR (immediate field)– Output: bus; used by…

» Condition code registers» Register file» Memory and I/O registers

• Register File– Two read addresses, one write address– Input: n-bits from bus

» Result of ALU operation or memory (or I/O) read– Outputs: two n-bit

» Used by ALU, PC, memory address» Data for store instructions passes through ALU

Page 6: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.6

Instructions (ISA)

• Fundamental unit of work• Constituents

– Opcode: operation to be performed– Operands: data/locations to be used for operation

• Encoded as a sequence of bits (just like data!)– Sometimes have a fixed length (e.g., 16 or 32 bits)– Atomic: operation is either executed completely, or not at all

Page 7: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.7

Instruction Processing

DECODE instruction

EVALUATE ADDRESS

FETCH OPERANDS

EXECUTE operation

STORE result

FETCH instruction from mem.

Page 8: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.8

Instruction Processing: FETCH• Idea

– Put next instruction in IR & increment PC

• Steps– Load contents of PC into MAR– Increment PC– Send “read” signal to memory– Read contents of MDR, store in IR

EA

OP

EX

S

F

D

Page 9: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.9

Instruction Processing: DECODE

• Identify opcode– In LC-3, always first four bits of instruction– 4-to-16 decoder asserts control line corresponding

to desired opcode

• Identify operands from the remaining bits– Depends on opcode

e.g., for LDR, last six bits give offsete.g., for ADD, last three bits name source operand #2

EA

OP

EX

S

F

D

Page 10: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.10

Instruction Processing: EVALUATE ADDRESS

• Compute address– For loads and stores – For control-flow instructions

• Examples– Add offset to base register (as in LDR)– Add offset to PC (as in LD and BR)

EA

OP

EX

S

F

D

Page 11: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.11

Instruction Processing: FETCH OPERANDS

• Get source operands for operation

• Examples– Read data from register file (ADD)– Load data from memory (LDR) EA

OP

EX

S

F

D

Page 12: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.12

Instruction Processing: EXECUTE

• Actually performs operation

• Examples– Send operands to ALU and assert ADD signal– Do nothing (e.g., for loads and stores) EA

OP

EX

S

F

D

Page 13: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.13

Instruction Processing: STORE

• Write results to destination– Register or memory

• Examples– Result of ADD is placed in destination reg.– Result of load instruction placed in destination reg.– For store instruction, place data in memory

» Set MDR» Assert WRITE signal to memory

EA

OP

EX

S

F

D

Page 14: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.14

Datapath and Control Unit

Page 15: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.15

Tracking Control Signals - Cycle 1

LW

Page 16: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.16

Tracking Control Signals - Cycle 2

SW LW

Page 17: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.17

Tracking Control Signals - Cycle 3

ADD SW LW

001

1

Page 18: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.18

Tracking Control Signals - Cycle 4

SUB ADD SW LW

1

0

0

Page 19: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.19

1

1

ADD

Tracking Control Signals - Cycle 5

SUB SW LW

Page 20: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.20

Changing the Sequence of Instructions

• In the FETCH phase,we incremented the Program Counter by 1 (address)

• What if we don’t want to always execute the instructionthat follows this one?

– examples: loop, if-then, function call

• Need special instructions that change the contents of the PC.

• These are called jumps and branches.– jumps are unconditional -- they always change the PC– branches are conditional -- they change the PC only if

some condition is true (e.g., the contents of a register is zero)

Page 21: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.21

Instruction Processing Summary

• Instructions look just like data -- it’s all interpretation.

• Three basic kinds of instructions:– computational instructions (ADD, AND, …)– data movement instructions (LD, ST, …)– control instructions (JMP, BRnz, …)

• Six basic phases of instruction processing:

F ® D ® EA ® OP ® EX ® S– not all phases are needed by every instruction– phases may take variable number of machine cycles

Page 22: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.22

Driving Force: The Clock

• The clock is a signal that keeps the control unit moving– At each clock “tick,” control unit moves to the next

machine cycle -- may be next instruction ornext phase of current instruction.

• Clock generator circuit:– Based on crystal oscillator– Generates regular sequence of “0” and “1” logic levels– Clock cycle (or machine cycle) -- rising edge to rising edge

“1”

“0”

time®MachineCycle

Page 23: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.23

Instructions vs. Clock Cycles

• MIPS vs. MHz– MIPS = millions of instructions per second– MHz = millions of clock cycles per second

• These are not the same -- why?

Page 24: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.24

The Control Unit

• Program is stored in memory – as machine language instructions, in binary

• The task of the control unit is to execute programs by repeatedly:

– Fetch from memory the next instruction to be executed.– Decode it, that is, determine what is to be done.– Execute it by issuing the appropriate signals to the ALU, memory, and

I/O subsystems.– Continues until the HALT instruction

Page 25: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.25 (c) Yngvi Bjornsson

von Neumann

Architecture

Page 26: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.26

The Von Neumann Architecture

Memory

Processor (CPU)

Input-OutputControl Unit

ALUStore data and program

Execute program

Do arithmetic/logic operationsrequested by program

Communicate with"outside world", e.g. • Screen• Keyboard• Storage devices • ...

Bus

Page 27: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.27

The ALU Subsystem

• The ALU (Arithmetic/Logic Unit) performs– mathematical operations (+, -, x, /, …)– logic operations (=, <, >, and, or, not, ...)

• In today's computers integrated into the CPU• Consists of:

– Circuits to do the arithmetic/logic operations. – Registers (fast storage units) to store intermediate computational results.– Bus that connects the two.

Page 28: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.28 28

Structure of the ALU• Registers:

– Very fast local memory cells, that store operands of operations and intermediate results.

– CCR (condition code register), a special purpose register that stores the result of <, = , > operations

• ALU circuitry:– Contains an array of circuits to do

mathematical/logic operations.• Bus:

– Data path interconnecting the registers to the ALU circuitry. ALU circuitry

GT EQ LT

R0

R1

R2

Rn

Page 29: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.29

ALU and its importanceqArithmetic and Logic Unit (ALU)

● It is a functional box designed to perform basic arithmetic, logic, and shift operations on the data.

● Implementation of the basic operations such as logic, program control, and data transfer operations are easier than arithmetic and I/O operations. Therefore, in this section we concentrate on arithmetic operations.

Page 30: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.30

Why do we need to improve the ALU?qArithmetic and Logic Unit (ALU)

● In an attempt to improve the performance, this section will talk about the Arithmetic Logic Unit.

● In regard to our previous mentions about CPU time (T), we are looking at techniques to reduce p.

T = Ic * CPI * t = Ic * (p+m*k)* t

Instruction Count

ProcessorMemory Latency (+ Cache)

Clock

Page 31: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.31

Review: MIPS Arithmetic Instructions

R-type:

I-Type:

31 25 20 15 5 0

op Rs Rt Rd funct

op Rs Rt Immed 16

Type op funct

ADD 00 100000

ADDU 00 100001

SUB 00 100010

SUBU 00 100011

AND 00 100100

OR 00 100101

XOR 00 100110

NOR 00 100111

Type op funct

00 101000

00 101001

SLT 00 101010

SLTU 00 101011

00 101100

0 add

1 addu

2 sub

3 subu

4 and

5 or

6 xor

7 nor

a slt

b sltu

● expand immediates to 32 bits before ALU●10 operations so can encode in 4 bits

32

32

32

m (operation)

result

A

B

ALU

4

zeroovf

11

Page 32: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.32

Logic Operations

R-type (add, sub) instruction format:op rs rt rd shamt funct

6 bits 5 5 5 5 6 = 32opcode 1st src 2nd src dest shift amount func --> fields

For shift instructions (shift left and shift right), the 1st source register isunused.

Logic Operation Symbol MIPS instruction

shift left << sll $10, $16, 8

shift right >> srl $10, $16, 8

AND & and $3, $7, $8

OR | or $3, $7, $8

Page 33: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.33

ALUq Some ALU operations:

● arithmetic:

● logic:

● “comparison”:

q Big Picture:

What’s in there??How do we build it??

Page 34: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.34

PrinciplesqArithmetic and Logic Unit (ALU)

● For a simple machine, the ALU should at least be able to perform operations such as:

- add- increment- subtract- decrement • • •

qArithmetic and Logic Unit (ALU)●A simple ALU is basically an adder and some control circuits

augmented by special circuits to carry out the logic and shift operations.

Page 35: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.35

Serial/Parallel/Modular ALUsqArithmetic and Logic Unit (ALU)

●An ALU can be of three types:- Serial- Parallel- Functional (Modular)

● Similar to the definition of serial and parallel adders, one can define serial and parallel ALUs.

● In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse.

● In a parallel ALU, operation on all the bits of the operand(s) is initiated simultaneously.

- In simple terms a parallel ALU can be looked at as a cascade of identical units forming a one dimensional array of cells.

Page 36: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.36

Parallel ALUqArithmetic and Logic Unit (ALU)

● Parallel ALU

1C

B1 A1

F1

C2

F2

C3

B2 A2

C

B A

Fn

n

nn

Cn+1

Control Signals

Page 37: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.37

Parallel ALUqArithmetic and Logic Unit (ALU)

● Parallel ALU- ALU operation is determined by the control signals.- In a very simple form, the bit-pattern of the control signal is determined by

the operation code.- In a parallel ALU, one needs to determine the design of a unit and then

replicate it.

Page 38: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.38

Simple ALU Design ExampleqArithmetic and Logic Unit (ALU)

●A Simple Arithmetic Logic UnitA iS2

Xi Ci+1

FiYi

BiS1S0

Ci

M

Z i

-S2, S1, S0, and M are the control signals.-Ai, Bi, and Ci are the operand bits and carry-in, respectively.-Fi and Ci+1 are the result bit and carry-out, respectively.

Page 39: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.39

Simple ALU Design ExampleqArithmetic and Logic Unit (ALU)

●A Simple Arithmetic Logic UnitA iS2

Xi Ci+1

FiYi

BiS1S0

Ci

M

Z i

-S2, S1, S0, and M are the control signals.-Ai, Bi, and Ci are the operand bits and carry-in, respectively.-Fi and Ci+1 are the result bit and carry-out, respectively.

Full Adder

Page 40: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.40

Simple ALU Design ExampleqArithmetic and Logic Unit (ALU)

●A Simple Arithmetic Logic Unit- The function of each stage can be defined as: Fi º Xi Å Yi Å Zi

Ci+1 º XiYi +(XiÅYi)Zi = XiYi+XiZi+YiZi

- By appropriate setting of the control signals one can initiate a variety of the operations.

Page 41: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.41

Simple ALU Design ExampleqArithmetic and Logic Unit (ALU)

●A Simple Arithmetic Logic Unit — Example

M=0S2 =1S1 =0S0 =1C1 =x

Þ F ¬ A Å B

Page 42: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.42

Simple ALU Design ExampleqArithmetic and Logic Unit (ALU)

●A Simple Arithmetic Logic Unit — Example

M=1S2 =1S1 =1S0 =1C1 =0

Þ F ¬ A - 1

Page 43: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.43

Improvements?qArithmetic and Logic Unit (ALU)

●As discussed before a parallel ALU offers a higher speed relative to a serial ALU.

●How can one improve the performance (speed) of ALU further?● Is it possible to build (design) an ALU faster than a parallel ALU?

Page 44: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.44

Functional (Modular) ALUqArithmetic and Logic Unit (ALU)

● Functional (modular) ALU- ALU is a collection of independent units each tailored for a specific

operation. As a result, independent operations can be overlapped.- This approach allows an additional degree of concurrency relative to a

parallel ALU, since it allows several operations to be performed on data simultaneously.

- This speed improvement comes at the expense of extra overhead needed to detect data independent operations.

Page 45: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.45

Functional (Modular) ALUqArithmetic and Logic Unit (ALU)

● Functional (modular) ALU

Adder1 Adder2 Subtractor

Multiplier • • •

Page 46: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.46

How do we design an ALU?qArithmetic and Logic Unit (ALU)

● Is it possible to improve the performance of an ALU further?●Naturally, we can improve the performance (physical speed) by taking

advantage of the advances in technology.●How can we improve the logical speed of the ALU further?

qArithmetic and Logic Unit (ALU)● In a functional ALU, is it possible to devise algorithms which allow

one to improve the performance of the basic operations?● If this is a valid direction, then the question of how to design a fast

ALU will change to how to design a fast adder, a fast multiplier, ...?"

As a computer architect, how do you design an ALU?

Page 47: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.47

Review: A 32-bit Adder/Subtractor

1-bit FA S0

c0=carry_in

c1

1-bit FA S1

c2

1-bit FA S2

c3

c32=carry_out

1-bit FA S31

c31

. . .

q Built out of 32 full adders (FAs) A0

B0

A1

B1

A2

B2

A31

B31

add/subt

1 bit FA

A

BS

carry_in

carry_out

S = A xor B xor carry_in

carry_out = AÙÙB v AÙÙcarry_in v BÙÙcarry_in(majority function)

q Small but slow!

Page 48: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.48

Lets start with the addition…qFast Adder

●How to design an adder faster than a parallel adder?●What is the major bottle-neck in a parallel adder?● Is the carry generation and propagation the major bottleneck?● Is it possible to eliminate, moderate, or reduce the delay of carry

generation and propagation?

Page 49: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.49

How do we speed up Adders?qFast Adder

●Carry Lookahead: Generate and propagate carries ahead of time relative to a parallel adder.

- Scheme 1- Scheme 2

●Carry Select

Page 50: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.50

Recall from ECE 210/211/212/213●Basic Building Block — A 4-Bit Ripple-Carry Adder● Timing: Let Dt propagation delay of 1 gate

F1 = 2Dt C2 = 2Dt F2 = 4Dt C3 = 4Dt F3 = 6Dt C4 = 6Dt F4 = 8Dt C5 = 8Dt or, give n –bits, it requires 2n logic levels (gate delays)

FA FA FA FACarry-out

B A B A B A B A

C

Carry-in

FFFF4 32

4

44

3

33

2

22

1

1

11

C C CC5

Page 51: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.51

Carry Lookahead Adder (CLA)qFast Adder

●Carry Lookahead (Scheme 1) Ci+1=AiBi+(AiÅBi)Ci=AiBi+(Ai+Bi)Ci

Carry Generate term (Gi) Carry Propagate Term (Pi)

Page 52: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.52

Σχεδιασμός CLAq Από ένα FA, διαχωρίζουμε μεταξύ της παραγωγής (generation) του

κρατουμένου (όταν ένα νέο κρατούμενο παράγεται, Cout=1) και της μετάδοσης (propagation) του κρατουμένου (όταν ένα υπάρχον Cinμεταδίδεται στο Cout)

q Παραγωγή: Gi = AiBi: if 1, Ci+1=1q Μετάδοση: Pi = Ai Å Bi: εάν 1 τότε Ci+1 = Ci

Partial Full Adder (PFA)Full Adder (FA)

Bi

Bi Ai

Ci Ci

Ci+1

Si

Si Gi Pi

Ai

Page 53: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.53

Μπλοκ CLA

qΥλοποίηση:●C1 = G0+P0 C0

●C2 = G1+P1C1 = G1+P1(G0+P0C0) = G1+P1G0+P1P0C0

●C3 = G2 + P2C2 = G2+P2G1+P2P1G0+P2P1P0C0

●C4 = G3+P3G2+P3P2G1+P3P2P1G0 + P3P2P1P0 C0= G0-3 + P0-3C0

Οµάδα Παραγωγής Κρατουµένου

Οµάδα ΜετάδοσηςΚρατουµένου

Page 54: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.54

Λογική Παραγωγής/Μετάδοσης για 4-bit CLA

Όλα 2-επιπέδωνà Το Cout υπολογίζεται γρήγορα

Page 55: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.55

Carry LookaheadqFast Adder — Carry Lookahead (Scheme 1)

● Extended (CLA) 4-Bit Full Adder

Carry Lookahead (Scheme 1)

(F.A.) (F.A.) (F.A.) (F.A.)

C5 F4g4 P4

C4

AB 44 AB 11AB 22AB 33

C1C2C3

F1g1 P1F2

g2 P2F3

g3 P3

Page 56: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.56

Carry LookaheadqFast Adder — Carry Lookahead (Scheme 1)

● Extended (CLA) 4-Bit Full Adder — Timing ps and gs are generated in 1Dt Cs are generated after another 2Dt Fs are generated after another 2Dt à 5Dt (or logic levels) for 4-bits à for n-bits ??

Page 57: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.57

Carry LookaheadqFast Adder — Carry Lookahead (Scheme 1)

- Extended (CLA) 4-Bit Full Adders in Ripple

- What is the speedup w.r.t. a non-CLA (ripple) for n bits?

Extended4-Bit F.A.

Extended4-Bit F.A.

Extended4-Bit F.A.Carry-out

Fn-3 - Fn F5- F8 F1- F4

Bn-3 - Bn A n-3 - A n B5- B8 A 5- A 8 B1- B4 A1- A 4

C1

Page 58: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.58

Carry LookaheadqFast Adder — Carry Lookahead (Scheme 2)

● Timing CLA = 5Dt Cascades of CLAs overlap 1Dt operation

CLA 2

4-Bit F.A.

B1 -B4 A 1 - A 4

C1

F1 - F4

CLA 2 CLA2B29-32 A29-32

4-Bit F.A.

Carry-out

4-Bit F.A.

B5 - B8 A 5 - A8

F5 - F8F29-32

Page 59: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.59

Carry SelectqFast Adder

●Carry Select- Carry-in to a 4-bit full adder is either 0 or 1.- Duplicate each stage - e.g., 4-bit full adder.- Initiate each unit in a stage with carry-in of 0 and 1.- Use a multiplexer to select the correct answer.

Page 60: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.60

Carry SelectqFast Adder — Carry Select 8-bit

MUX

4-BitFull Adder

4-BitFull Adder

4-BitFull Adder

Carry-in

F5"-F8

"

F4 - F8

B5 -B8 A5- A8 B5 - B8 A5 - A8

1 0B1- B4 A1- A4

F1 - F4F5

' - F8'

12Dt10Dt

Page 61: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.61

ALU Questions - PracticeqQuestions

● Calculate the execution time of a 16-bit adder using carry lookaheadscheme 1.

● Formulate the execution time of an n-bit adder using carry lookaheadscheme 1 (n is a multiple of 4).

● Calculate the execution time of a 16-bit adder using carry lookaheadScheme 2.

● Formulate the execution time of an n-bit adder using carry lookaheadscheme 2 (n is a multiple of 4).

●Calculate the execution time of a 16-bit adder using carry select scheme.

● Formulate the execution time of an n-bit adder using carry select scheme.

● Is it possible to combine carry lookahead and carry select concepts to design a faster adder?

Page 62: ΔΙΑΛΕΞΕΙΣ12-13: CPU DatapathDesign –Intro to ALU · In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse. In a parallel ALU,

ΗΜΥ 312 Δ12-13 CPU Datapath Design – Intro to ALU.62

ΕΠΟΜΕΝΗ ΔΙΑΛΕΞΗ ΚΑΙ ΚΑΤ’ΟΙΚΟΝ ΜΕΛΕΤΗ

q ΕΠΟΜΕΝΗ ΕΝΟΤΗΤΑ –Αριθμητική Η/Υ και βελτιστοποίηση ALU(Computer Arithmetic and ALU Improvements)

q ΚΑΤ’ΟΙΚΟΝ ΜΕΛΕΤΗ●Κεφάλαια 5-6 – Patterson&Hennessy (από το βιβλίο του

ΗΜΥ212)●Παραρτήματα Β και Ι του βιβλίου σας