ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC...

57
ECE 5745 Complex Digital ASIC Design Course Overview Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece5745

Transcript of ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC...

Page 1: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

ECE 5745 Complex Digital ASIC DesignCourse Overview

Christopher Batten

School of Electrical and Computer EngineeringCornell University

http://www.csl.cornell.edu/courses/ece5745

Page 2: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. Why should students want to take this course?. How is the course structured?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 2 / 52

Page 3: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

The Computer Engineering Stack

Technology

Application

Gap too large to bridge in one step(but there are exceptions e.g., magnetic compass)

In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to

execute information processing applications efficientlyusing available manufacturing technologies

ECE 5745 Course Overview 3 / 52

Page 4: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

The Computer Engineering Stack

In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to

execute information processing applications efficientlyusing available manufacturing technologies

Circuits

Devices

Programming Language

Algorithm

Co

mp

ute

rA

rch

itectu

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

ECE 5745 Course Overview 3 / 52

Page 5: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

What is Computer Architecture?

In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to

execute information processing applications efficientlyusing available manufacturing technologies

Circuits

Devices

Programming Language

Algorithm

Co

mp

ute

rA

rch

itectu

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Operating System

Gate Level

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Application Requirements • Suggest how to improve architecture • Provide revenue to fund development

Technology Constraints • Restrict what can be done efficiently • New technologies make new arch possible

Architecture provides feedback to guideapplication and technology research directions

ECE 5745 Course Overview 4 / 52

Page 6: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Key Metrics in Computer Architecture

P

I$

D$

Network

Network

Network

P

I$

P

I$

P

I$

D$ D$ D$

I Primary Metrics. Execution time (cycles/task). Energy (Joules/task). Cycle time (ns/cycle). Area (µm2)

I Secondary Metrics. Performance (ns/task). Average power (Watts). Peak power (Watts). Cost ($). Design complexity. Reliability. Flexibility

Discuss qualitative first-order analysisfrom ECE 4750 on board

ECE 5745 Course Overview 5 / 52

Page 7: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Unanswered Questions from ECE 4750

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

I How can we quantitatively evaluatearea, cycle time, and energy?

I How do we actually implementprocessors, memories, andnetworks in a real chip?

I How should we implement/analyzeapplication-specific accelerators?

. Very loosely coupledmemory-mapped accelerators

. More tightly coupled co-processoraccelerators

. Specialized instructions andfunctional units

ECE 5745 Course Overview 6 / 52

Page 8: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ASIC: Application-Specific Integrated Circuit

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

Performance (Tasks per Second)

SimpleProc

Processor PowerConstraint

En

erg

y (

Jo

ule

s p

er

Ta

sk)

C

B

AAAA

Superscalarw/ Deeper

Pipelines

Out-of-Order Superscalar

Superpipelined

D

EE

MulticoreFF

Multicore E

AAAA

Specialized Accelerators

ECE 5745 Course Overview 7 / 52

Page 9: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ASIC: Application-Specific Integrated Circuit

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

Performance (Tasks per Second)

En

erg

y E

ffic

ien

cy (

Ta

sks p

er

Jo

ule

)

SimpleProcessor

Design PowerConstraint

High-PerformanceArchitectures

EmbeddedArchitectures

DesignPerformanceConstraint

Flexibility vs

. Spe

cializat

ion

CustomASIC

Less FlexibleAccelerator

More FlexibleAccelerator

ECE 5745 Course Overview 8 / 52

Page 10: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Goal for ECE 5745 is to answer these questions!

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

I How can we quantitatively evaluatearea, cycle time, and energy?

I How do we actually implementprocessors, memories, andnetworks in a real chip?

I How should we implement/analyzeapplication-specific accelerators?

. Very loosely coupledmemory-mapped accelerators

. More tightly coupled co-processoraccelerators

. Specialized instructions andfunctional units

ECE 5745 Course Overview 9 / 52

Page 11: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Full Custom Design vs. Standard-Cell Design

L01 –Introduction 22

6.884 –S

pring 20052 Feb 2005

Full Custom D

esignDesigner is free to do anything, anywhere

–though each design team

usually imposes som

e disciplineM

ost time consum

ing design style–

Reserved for very high performance or very high volum

e devices (Intel m

icroprocessors, RF power amps for cellphones)

Requires complete custom

ization of all layers of wafer

Piece of full-custom m

ultiplier array, 1.0Pm

2-metal

Full-custom layoutin 1.0µm w/ 2 metal

layers

I Full-Custom Design (ECE 4740)

. Designer is free to do anything, anywhere; thoughteam usually imposes some design discipline

. Most time consuming design style; reserved forvery high performance or very high volume chips(Intel microprocessors, RF power amps forcellphones)

I Standard-Cell Design (ECE 5745)

. Fixed library of “standard cells” and SRAMmemory generators

. Register-transfer-level description is automaticallymapped to this library of standard cells, then thesecells are placed and routed automatically

. Enables agile hardware design methodology

ECE 5745 Course Overview 10 / 52

Page 12: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Standard-Cell Design Methodology

L01 – Introduction 246.884 – Spring 2005 2 Feb 2005

Standard Cell ASICs• Also called Cell-Based ICs (CBICs)• Fixed library of cells plus memory generators• Cells can be synthesized from HDL, or entered in schematics• Cells placed and routed automatically• Requires complete set of custom masks for each design• Currently most popular hard-wired ASIC type (6.884 will use this)

Mem1 Mem

2

Cells arranged in rows

Generated memory arrays

L01 – Introduction 256.884 – Spring 2005 2 Feb 2005

Standard Cell DesignCells have standard height but vary in widthDesigned to connect power, ground, and wells by abutment

VDD Rail

GND Rail

Clock Rail

Cell I/O on M2Power

Rails in M1

Clock Rail (not typical)

NAND2 Flip-flop

Well Contact under Power Rail

Ripple carry adder with carrychain highlighted

ECE 5745 Course Overview 11 / 52

Page 13: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Standard-Cell Design Methodology

Design in HDL

HDLSimulator

Place&Route

Switching Activity

PowerAnalysis

Synthesis

Standard Cells

Gate-Level Model

Layout

Area (μm2)

Execution Time(cycles/task)

Cycle Time (ns)

Energy (J/task)

Area (μm2)

Cycle Time (ns)

Energy (J/task)

Energy (J/task)

ECE 5745 Course Overview 12 / 52

Page 14: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Example Standard-Cell Chip PlotMotivation Architectural Patterns Scale VT Core Maven VT Core Evaluation

Single-Lane Vector-Thread Unit w/ 256 Registers

MIT CSAIL Christopher Batten 32 / 42ECE 5745 Course Overview 13 / 52

Page 15: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

What is Complex Digital ASIC Design?

Circuits

Devices

Programming Language

Algorithm

Co

mp

ute

rA

rch

itectu

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

layout ready for fabrication

Complex digital ASIC design isthe process of

quantitatively exploring the

area, cycle time, execution time, andenergy trade-offs

of various

automated standard-cell CAD tools and then to transform the most

promising design to

using

application-specific accelerators(and general-purpose proc+mem+net)

ECE 5745 Course Overview 14 / 52

Page 16: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. Why should students want to take this course?. How is the course structured?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 15 / 52

Page 17: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Technology Scaling is Slowing

1940 1955 1970 1985 2000 2015 2030

Syste

m P

erf

orm

an

ce

VacuumTube

DiscreteTransistor

IntegratedBipolar

IntegratedCMOS

7nm, ~50B Transistors

New Technology?

TechnologyFallowPeriod?

New Technology?

Golden Ageof ChipDesign?

OR

ECE 5745 Course Overview 16 / 52

Page 18: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Example Application Domain: Image Recognition

Starfish

Dog

ECE 5745 Course Overview 17 / 52

Page 19: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Machine Learning: Training vs. Inference

forward"starfish"

Training

many images

Model

=? "dog"

labels

errorbackward

Inferenceforward

"dog"

fewimages

ECE 5745 Course Overview 18 / 52

Page 20: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ImageNet Large-Scale Visual Recognition ChallengeT

op

5 E

rro

r R

ate

28%26%

16%

12%

7%

'10

3.6% 2.3%3%

'11 '12 '13 '14 '15 '16 '17

HumanError Rate

Software: Deep Neural Network

Hardware: Graphics Processing Units0 0

14%

En

trie

s U

sin

g G

PU

s

74%

89%~100%

ECE 5745 Course Overview 19 / 52

Page 21: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ASIC Design for ML in the Cloud

NVIDIA DGX-1I Graphics processor

specialized just formachine learning

I Available as part of acomplete system withboth the software andhardware designed byNVIDIA

Intel NervanaI Custom ASIC specifically

designed to acceleratemachine learning

I Special hardware supportfor Winograd algorithms

I Collaboration betwenIntel and Facebook

I Maybe Facebook is alsobuilding their ownin-house chips?

Google TPU

I Custom ASIC specificallydesigned to accelerateGoogle’s TensorFlowC++ library

I Tightly integrated intoGoogle’s data centers

I 15–30× faster thancontemporary CPU andGPUs

ECE 5745 Course Overview 20 / 52

Page 22: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ASIC Design for ML at the Edge

Google Pixel

I Designed custom ASICcalled the Pixel VisualCore for imageprocessing

I PVC enables 5xperformanceimprovement inhigh-dynamic-rangeimaging

Facebook OculusI Starting to design ASIC

chips for Oculus VRheadsets

I Significant performancedemands under strictpower requirements

Amazon EchoI Developing AI ASICs so

Echo line can do moreon-board processing

I Reduces need forround-trip to cloud

I Co-design the algorithmsand the underlyinghardware

ECE 5745 Course Overview 21 / 52

Page 23: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Top-five software companies areall making chips

I Facebook: w/ Intel, in-house AI chips?I Amazon: Echo, Oculus, networking chipsI Microsoft: Hiring for AI chips?I Google: TPU, Pixel, convergence?I Apple: SoCs for phones, wireless chips

Chip startup ecosystem formachine learning is thriving!

I GraphcoreI NervanaI CerebrasI Wave ComputingI Horizon RoboticsI CambriconI DeePhiI EsperantoI SambaNovaI EyerissI TenstorrentI MythicI ThinkForceI GroqI Lightmatter

ECE 5745 Course Overview 22 / 52

Page 24: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

The field of complex digital ASIC designis experiencing a disruptive

sea change and has a critical choice:

1. A technological fallow period

2. A golden age of ASIC design

This course will help you appreciate and possiblycontribute to this golden age!

ECE 5745 Course Overview 23 / 52

Page 25: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Course Motivation: Comp Arch Research Perspective

Transistors(Thousands)

MIPSR2K

IntelP4

Data collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten

1975 1980 1985 1990 1995 2000 2005 2010 2015

100

101

102

103

104

105

106

107

DECAlpha 21264

TypicalPower (W)

Frequency(MHz)

SPECintPerformance

~9%/year

~15%/year

Numberof Cores

Intel 48-Core Prototype

AMD 4-Core OpteronParallelizationandSpecialization

ECE 5745 Course Overview 24 / 52

Page 26: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Cross-Layer Interaction is Critical

Circuits

Devices

Programming Language

Algorithm

Co

mp

ute

rA

rch

itectu

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

Need to quantitatively understand

area, cycle time, and energy

trade-offs to be able to create

new revolutionary architectures

that tackle the most

challenging physical design

issues

ECE 5745 Course Overview 25 / 52

Page 27: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Course Motivation: Circuits Research Perspective

Your Digital Circuit Here

Fighter Airplane

~100,000 parts

Your Analog Circuit Here

Intel Sandy Bridge E

2.27 Billion transistors

ECE 5745 Course Overview 26 / 52

Page 28: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Cross-Layer Interaction is Critical

Circuits

Devices

Programming Language

Algorithm

Co

mp

ute

rA

rch

itectu

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

Circuit-level researchers need to

appreciate the system-level

context for their subsystems;

cross-layer interaction can

generate some of the

most exciting research ideas

ECE 5745 Course Overview 27 / 52

Page 29: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. Why should students want to take this course?. How is the course structured?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 28 / 52

Page 30: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Course Structure

Part 1ASIC Design

Overview

Part 2Digital CMOS

Circuits

Part 3CAD Algorithms

P P

MM

PrereqComputer

Architecture

ECE 5745 Course Overview 29 / 52

Page 31: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Part 1: ASIC Design Overview

P P

MM

Topic 1Hardware

DescriptionLanguages

Topic 2CMOS Devices

Topic 3CMOS Circuits

Topic 4Full-Custom

DesignMethodology

Topic 5Automated

DesignMethodologies

Topic 7Clocking, Power Distribution,

Packaging, and I/O

Topic 8Testing and Verification

Topic 6Closing

theGap

ECE 5745 Course Overview 30 / 52

Page 32: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Part 2: Digital CMOS Circuits

Topic 1

0

Sequential

StateTopic

11

Interconnect

Topic 9

Combinational

Logic

ECE 5745 Course Overview 31 / 52

Page 33: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Part 3: CAD Algorithms

RTL to Logic

Synthesis

TechnologyIndependent

Synthesis

TechnologyDependentSynthesis

x = a'bc + a'bc'y = b'c' + ab' + ac

x = a'by = b'c' + ac

Placement

DetailedRouting

Global

Routing

Topic 12Synthesis Algorithms

Topic 13Physical Design Automation

ECE 5745 Course Overview 32 / 52

Page 34: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Five-Week Design Project

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

Performance (Tasks per Second)

En

erg

y E

ffic

ien

cy (

Ta

sks p

er

Jo

ule

)

SimpleProcessor

Design PowerConstraint

High-PerformanceArchitectures

EmbeddedArchitectures

DesignPerformanceConstraint

Flexibility vs

. Spe

cializat

ion

CustomASIC

Less FlexibleAccelerator

More FlexibleAccelerator

ECE 5745 Course Overview 33 / 52

Page 35: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design • Activity 1 • Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. Why should students want to take this course?. How is the course structured?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 34 / 52

Page 36: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design • Activity 1 • Case Study: Scalar vs. Vector Processors Activity 2

Fixed-Latency Iterative Multiplier Datapath

b_regb_mux_sel

32b

a_rega_mux_sel

32b

result_reg

result_mux_sel

result_en

32b

32b

32b

>>

<<

add_mux_sel

req_msg.areq_msg

32b resp_msg

req_msg.b

b_lsb

0

ECE 5745 Course Overview 35 / 52

Page 37: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. Why should students want to take this course?. How is the course structured?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 36 / 52

Page 38: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scalar Processors with Multithreading

Programmer'sLogicalView

μT0

Memory

μT1 μT2 μT3 μT4 μTi

ExecutionResources

ControlInstructions

MemoryInstructions

ArchitecturalRegisters

ECE 5745 Course Overview 37 / 52

Page 39: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scalar Processors with Multithreading

Programmer'sLogicalView

μT0

Memory

μT1 μT2 μT3 μT4 μTi

loop: load a, a_ptr load b, b_ptr add c, a, b store c, c_ptr a_ptr++ b_ptr++ c_ptr++ branch

Vector-Vector Add Masked Filter

loop: load a, a_ptr a_ptr++ branch a = 0 op0 op1 ... branch

ECE 5745 Course Overview 37 / 52

Page 40: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scalar Processors with Multithreading

Programmer'sLogicalView

μT0

Memory

μT1 μT2 μT3 μT4 μTi

TypicalCoreMicro-Architecture

En

erg

y

Performance

MIMDμT0

μT1

μT2

μT3

Instr Memory

Data Memory

Multi-threadedCores

ECE 5745 Course Overview 37 / 52

Page 41: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Vector-SIMD Processors

Programmer'sLogicalView

CT0

elm 0 elm 1 elm 2 elm 3

Memory

CTj

elm i

ArchitecturalVector Registerwith 4 Elements

Vector-SIMDArithmetic

Instructions

Vector-SIMDMemory

Instructions

ECE 5745 Course Overview 38 / 52

Page 42: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Vector-SIMD Processors

Programmer'sLogicalView

CT0

elm 0 elm 1 elm 2 elm 3

Memory

CTj

elm i

loop: vload A, a_ptr vload B, b_ptr vadd C, A, B vstore C, c_ptr a_ptr++ b_ptr++ c_ptr++ branch

loop: vload A, a_ptr vset F, A = 0 vop0 under flag F vop1 under flag F ... a_ptr++ branch

Vector-Vector Add Masked Filter

ECE 5745 Course Overview 38 / 52

Page 43: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Vector-SIMD Processors

Programmer'sLogicalView

CT0

elm 0 elm 1 elm 2 elm 3

Memory

CTj

elm i

TypicalCoreMicro-Architecture 0

2

1

3

Instruction Memory

CP VIU

VMU

Data Memory

Ve

cto

r L

an

es

En

erg

y

Performance

MIMD

Vector-SIMD

ECE 5745 Course Overview 38 / 52

Page 44: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Quantitative Area EvaluationMotivation Architectural Patterns Scale VT Core Maven VT Core Evaluation

Single-Lane Vector-Thread Unit w/ 256 Registers

MIT CSAIL Christopher Batten 32 / 42ECE 5745 Course Overview 39 / 52

Page 45: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Quantitative Area Evaluation

Quad-Corew/ Vertical

Multithreading

Vector-SIMD

(8 elm/lane)

1 t

hre

ad

2 t

hre

ad

s4

th

rea

ds

8 t

hre

ad

s

1 c

ore

w/

4 la

ne

s

4 c

ore

s w

/ 1

lan

e e

a

data cacheinstruction cache

control processor

integer functional units

floating-point functional unitsmemory unitsregister filecontrol logic

Single-core multi-lane designreduces area by 15%

Multi-core single-lane design

increases area by 20%

1.75

1.25

1.00

0.50

0.00

0.25

0.75

1.50

No

rma

lize

d A

rea

ECE 5745 Course Overview 40 / 52

Page 46: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Quantitative Performance and Energy Evaluation

0.8 1.0 1.2 1.4 1.6

Normalized Tasks / SecondNo

rma

lize

d E

ne

rgy / T

ask

1.6

1.4

1.2

1.0

0.8

0.6

0.4

25

20

15

10

5

0En

erg

y / T

ask (μ

J)

Multithreaded Multicore(increasing number ofthreads per core)

Quad-Corew/ Vertical

Multithreading

1 th

rea

d

2 th

rea

ds

4 th

rea

ds

8 th

rea

ds

Performance reduction with increasingthreads due to increased cycle time and

thread management overhead onfine-grain loops

Single-Lane Vector(increasing vlen)

Vector-SIMD

(4 core w/ 1 lane)

data cacheinst cache

control procint func unitsmemory unitsregister filecontrol logic

1 e

lm/la

ne

2 e

lm/la

ne

4 e

lm/la

ne

8 e

lm/la

ne

leakage

ECE 5745 Course Overview 41 / 52

Page 47: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Simple RISC Processor ASIC

SP Datapath SP Regfile

RAMSubbank

(2KB)

RAMSubbank

(2KB)

RAMSubbank

(2KB)

RAMSubbank

(2KB)

AH

IPC

on

tro

ller

SP Control

RAM Interface

VCO

Figure 2: STC1 chip plot and die photo. On the chip plot, the Exec Unit is colored blue, and theFetch Unit is colored green and purple.

HostPC

PLX9050

ATB0Controller

PowerSupplies

ProbePoints

PCI

SDRAM

STC1

Host ATB0 Daughter Card

Figure 3: Test System Block Diagram.The test system includes the ATB0test baseboard, a daughter card withthe STC1 chip, and a host computerto control the test system.

5

ECE 5745 Course Overview 42 / 52

Page 48: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Simple RISC Processor ASIC

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 41 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

750 ------------- . . . . . . . . . . . . . . . . . . . . . . . . .740 . . . . . . . . . . . . . . . . . . . . . . . . .730 . . . . . . . . . . . . . . . . . . . . . . . . .720 . . . . . . . . . . . . . . . . . . . . . . . . .710 . . . . . . . . . . . . . . . . . . . . . . * * .700 ------------- . . . . . . . . . . . . . . . . . . . . * * * * .690 . . . . . . . . . . . . . . . . . . . * * * * * .680 . . . . . . . . . . . . . . . . . * * * * * * . .670 . . . . . . . . . . . . . . . . * * * * * * * . .660 . . . . . . . . . . . . . . . . * * * * * * * . .650 ------------- . . . . . . . . . . . . . . * * * * * * * * * * .640 . . . . . . . . . . . . . . * * * * * * * * * * .630 . . . . . . . . . . . . . * * * * * * * * * * . .620 . . . . . . . . . . . . * * * * * * * * * * * * .610 . . . . . . . . . . . * . * * * * * * * * * * . .600 ------------- . . . . . . . . . . . * * * * * * * * * * * * . .590 . . . . . . . . . . * * * * * * * * * * * * * * .580 . . . . . . . . . . * * * * * * * * * * * * * . .570 . . . . . . . . . * * * * * * * * * * * * * * * *560 . . . . . . . . . * * * * * * * * * * * * * * * *550 ------------- . . . . . . . * * * * * * * * * * * * * * * * * .540 . . . . . . . * * * * * * * * * * * * * * * * . *530 . . . . . . . * * * * * * * * * * * * * * * * * *520 . . . . . . * * * * * * * * * * * * * * * * * * *510 . . . . . . * * * * * * * * * * * * * * * * * * *500 ------------- . . . . . * * * * * * * * * * * * * * * * * * * *490 . . . . * * * * * * * * * * * * * * * * * * * * *480 . . . . * * * * * * * * * * * * * * * * * * * * *470 . . . * * * * * * * * * * * * * * * * * * * * * *460 . . . * * * * * * * * * * * * * * * * * . * * * *450 ------------- . . . * * * * * * * * * * * * * * * * * * * * * *440 . . * * * * * * * * * * * * * * * * * * * * * * *430 . . * * * * * * * * * * * * * * * * * * * * * * *420 . . * * * * * * * * * * * * * * * * * * * * * * *410 . * * * * * * * * * * * * * * * * * * * * * * * *400 ------------- . * * * * * * * * * * * * * * * * * * * * * * * *390 . . . . . * * * * *380 . . . . . * * * * *370 . . . . . * * * * *360 . . . . . * * * * *350 --- . . . . * * * * * *340 . . . . * * * * * *330 . . . . * * * * * *320 . . . * * * * * * *310 . . . * * * * * * *300 --- . . . * * * * * * *290 . . . * * * * * * *280 . . * * * * * * * *270 . . * * * * * * * *260 . . * * * * * * * *250 --- . . * * * * * * * *240 . * * * * * * * * *230 . * * * * * * * * *220 . * * * * * * * * *210 . * * * * * * * * *200 --- * * * * * * * * * *190 * * * * * * * * * *180 * * * * * * * * * *

Figure 5: Shmoo plot for awide voltage range. The hori-zontal axis plots supply volt-age in mV, and the verticalaxis plots frequency in MHz.A * indicates correct opera-tion at that operating pointand a dot indicates failure.The shmoo plot shows com-bined results from two di�er-ent chips.

la t9, input_end

forever:

la t8, input_start

la t7, output

loop:

lw t0, 0(t8)

addu t8, 4

bgez t0, pos

subu t0, $0, t0

pos:

sw t0, 0(t7)

addu t7, 4

bne t8, t9, loop

b forever

Figure 6: Test program for measuringtypical power consumption. The innerloop calculates the absolute values ofintegers in an input array and storesthe results to an output array. The in-put array had 10 integers during test-ing, and the program repeatedly runsthis inner loop forever.

7

I RISC processor w/ 8 KB SRAMI TSMC 0.18 µm processI 1.7× 2.1 mmI 610K TransistorsI 450 MHz at 1.8 V

ECE 5745 Course Overview 43 / 52

Page 49: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scale Vector-Thread Processor ASIC

ECE 5745 Course Overview 44 / 52

Page 50: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scale Vector-Thread Processor ASIC

TSMC 0.18µm • 7.14 Million Transistors • 16.6 mm2 Core Area

CP

Vector Lane 0

Vector Lane 1

Vector Lane 2

Cache Control

Cache Tag CAMs

CacheDataRAM

Banks VMU

VIU

XBAR

Vector Lane 3

ECE 5745 Course Overview 45 / 52

Page 51: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scale Energy vs. Performance Results

Motivation Vector-Threading Scale VT Processor Maven VT Processor Future Research

Energy-Efficiency of the Scale VT Processor

MIT CSAIL Christopher Batten 24 / 39ECE 5745 Course Overview 46 / 52

Page 52: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

BRG Test Chip 1 (2016)

Taped-out Layout for BRGTC1

HostInterface

de

bu

g

RISCCore

SortAccel

Memory Arbitration Unit

SRAMBank(2KB)

SRAMBank(2KB)

SRAMBank(2KB)

SRAMBank(2KB)

diff clk (+)

diff clk (−)

singleended clk

rese

t

Ctrl

Reghost2chip

chip2host

LVDSRecv

clkdiv

clk tree

resettree

clk

ou

t

The testing platform enables running small

test programs on BRGTC1 to compare the

performance and energy of pure-software kernels

versus the HLS-generated sorting accelerator

2x2mm 1.3M transistors in IBM 130nm

RISC processor, 16KB SRAM

HLS-generated accelerators

Static Timing Analysis Freq. @ 246 MHz

Post-Silicon Evaluation Strategy

LV

DS

div

ide

d

clk

ou

t

LV

DS

clk

ou

t

ECE 5745 Course Overview 47 / 52

Page 53: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Celerity System-on-Chip Overview (2017)

Target Workload: High-Performance Embedded Computing

I 5× 5mm in TSMC 16 nm FFCI 385 million transistorsI 511 RISC-V cores

. 5 Linux-capable Rocket cores

. 496-core tiled manycore

. 10-core low-voltage arrayI 1 BNN acceleratorI 1 synthesizable PLLI 1 synthesizable LDO VregI 3 clock domainsI 672-pin flip chip BGA packageI 9-months from PDK access to

tape-out

ECE 5745 Course Overview 48 / 52

Page 54: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

BRG Test Chip 2 (2018)

Memory

Instruction Memory Arbiter

L1 Data $

(32KB)

LLFU Arbiter

Int Mul/Div

FPU

L1 Instruction $

(32KB)

Ho

st

Inte

rfa

ce

Syn

the

siz

ab

le P

LL

ArbiterData

Block Diagram4xRV32IMAF cores with “smart”

sharing L1$/LLFU,synthesizable PLL

Taped-out Layout for BRGTC22x2mm, 1.2M-trans, IBM 130nm

Static Timing Analysis Freq. @ 500MHz

ECE 5745 Course Overview 49 / 52

Page 55: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors • Activity 2 •

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. Why should students want to take this course?. How is the course structured?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 50 / 52

Page 56: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors • Activity 2 •

Brainstorming for Sorting Accelerator

I$

D$

Arbiter

XcelP

Performance (Tasks per Second)

En

erg

y E

ffic

ien

cy (

Ta

sks p

er

Jo

ule

)

SoftwareBaseline

Sort array of integers stored in memoryProc sends Xcel base address and sizeProc waits for Xcel to finish sorting

I Software Baseline. Insertion sort O(n2)

. 20% of dynamicinstructions are lw/sw

I Hardware Accelerator

. What is ideal speedupassuming we still useinsertion sort?

. What if we use adifferent algorithm?

. Sketch hardware impl ofa sorting accelerator

. Where will acceleratorbe in the energy/perfspace?

ECE 5745 Course Overview 51 / 52

Page 57: ECE 5745 Complex Digital ASIC Design Course Overview · Facebook Oculus I Starting to design ASIC chips for Oculus VR headsets I Significant performance demands under strict power

Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

μArch

Technology

Application

OS

Gates

Circuits

Take-Away Points

I Complex digital ASIC design is the process ofquantitatively exploring the area, cycle time,execution time, and energy trade-offs ofgeneral-purpose and application-specific designsusing automated standard-cell CAD tools and thento transform the most promising design to layoutready for fabrication

I Course can provide useful experience withcross-layer interaction for students interested inpursuing research in computer architecture orcircuits or students interested in pursuing a careerin in industry development of ASICs

ECE 5745 Course Overview 52 / 52