ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

119
ΗΜΥ 408 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Εαρινό Εξάμηνο 2020 ΔΙΑΛΕΞΕΙΣ 6 - 7: Design Flow ΧΑΡΗΣ ΘΕΟΧΑΡΙΔΗΣ ([email protected]) Some slides adopted from Digital Integrated Circuits, Rabbey et. al.

Transcript of ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

Page 1: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ 408

ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs

Εαρινό Εξάμηνο 2020

ΔΙΑΛΕΞΕΙΣ 6 - 7: Design Flow

ΧΑΡΗΣ ΘΕΟΧΑΡΙΔΗΣ

([email protected]) Some slides adopted from Digital Integrated Circuits, Rabbey et. al.

Page 2: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.2 © Θεοχαρίδης, ΗΜΥ, 2020

Design Process Steps (Review)

⚫ Definition of system requirements.

Example: ISA (instruction set architecture) for CPU.

Includes software and hardware interfaces including timing.

May also include cost, speed, reliability and maintainability specifications.

⚫ Definition of system architecture.

Example: high-level HDL (hardware description language) representation - this is not required in ECE 408 specifically but is done in the real world).

Useful for system validation and verification and as a basis for lower level design execution and validation or verification.

Page 3: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.3 © Θεοχαρίδης, ΗΜΥ, 2020

Design Process Steps (Review)

⚫ Refinement of system architecture

In manual design, descent in hierarchy, designing increasingly lower-level components

In synthesized design, transformation of high-level HDL to “synthesizable” register transfer level (RTL) HDL

⚫ Logic design or synthesis

In manual or synthesized design, development of logic design in terms of library components

Result is logic level schematic or netlist representation or combinations of both.

Both manual design or synthesis typically involve optimization of cost, area, or delay.

Page 4: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.4 © Θεοχαρίδης, ΗΜΥ, 2020

Design Process Steps (Review)

⚫ Implementation

Conversion of the logic design to physical implementation

Involves the processes of: Mapping of logic to physical elements,

Placing of resulting physical elements,

And routing of interconnections between the elements.

In case of SRAM-based FPGAs, represented by the programming bitstream which generates the physical implementation in the form of CLBs, IOBs and the interconnections between them

Page 5: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.5 © Θεοχαρίδης, ΗΜΥ, 2020

Design Process Steps (Review)

⚫ Validation (used at number of steps in the process)

At architecture level - functional simulation of HDL

At RTL level- functional simulation of RTL HDL

At logic design or synthesis - functional simulation of gate-level circuit - not usually done in ECE 408/664

At implementation - timing simulation of schematic, netlist or HDL with implemention based timing information (functional simulation can also be useful here)

At programmed FPGA level - in-circuit test of function and timing

Page 6: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.6 © Θεοχαρίδης, ΗΜΥ, 2020

Hardware design in general

Logic (RTL) design

Logic simulation

Logic debugging

RTL code

(Verilog)

Placement & routing

Timing simulation

Timing analysis

Netlist &

Gate delay

(SDF)GDSII

Semiconductor fabricationGDSII &

Test-vector

Logic synthesis

Gate-level simulation

Gate-level debugging

Netlist

(EDIF)

RTL code &

Target

library

Logic synthesis

Placement & routing

FPGA

bit-stream

RTL code &

FPGA

library

FPGA

BOARD

FPGA

Debugging

Logic synthesis

RTL compilation

Placement & routing

H/W Platform

General Hardware

Design Flow /

Methodologies.

Page 7: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.7 © Θεοχαρίδης, ΗΜΥ, 2020

Xilinx HDL/Core Design Flow

DESIGN ENTRY

CORE GENERATIONRTL HDL EDITING

RTL HDL-CORE

SIMULATION

SYNTHESIS

IMPLEMENTATION

TIMING

SIMULATION

FPGA PROGRAMMING

& IN-CIRCUIT TEST

Page 8: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.8 © Θεοχαρίδης, ΗΜΥ, 2020

Xilinx HDL/Core Design Flow - HDL Editing

Language Construct

Templates

HDL EDITOR

DESIGN WIZARD LANGUAGE ASSISTANTAccessed within

HDL Editor

RTL HDL Files

HDL Module

Frameworks

Page 9: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.9 © Θεοχαρίδης, ΗΜΥ, 2020

Xilinx HDL/core Design Flow – Core Generation

CORE GENERATOR

Select core and

specify input

parameters

HDL instantiation

module for

core_name

EDIF netlist for

core_name

Other core_name files

Page 10: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.10 © Θεοχαρίδης, ΗΜΥ, 2020

Xilinx HDL/core Design Flow - HDL Functional Simulation

Compile HDL Files

Waveforms

or List Files

Set Up and Map

work libraryRTL HDL Files

Test Inputs or

Force Files

HDL instantiation

module for

core_name

EDIF netlists for

core_names

Functional Simulate

Testbench HDL

Files

HDLSIMULATOR

Page 11: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.11 © Θεοχαρίδης, ΗΜΥ, 2020

All HDL Files

Gate/Primitive Netlist

Files (EDIF or XNF)

Xilinx HDL Design Flow - Synthesis

Select Top Level

Select Target Device

Edit XST Synthesis

Constraints

Synthesize

Synthesis/Implement-

ation Constraints

Synthesis Report

Files

EDIF netlists for

core_names

XST

Page 12: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.12 © Θεοχαρίδης, ΗΜΥ, 2020

Model Extraction

Xilinx HDL/core Design Flow - Implementation

Netlist

Translation

Map

Place &

Route

BIT File

Create

Bitstream

Timing Model Gen

Gate/Primitive Netlist

Files (XNF or EDN)

Standard Delay

Format File

HDL or EDIF for

Implemented Design

XILINX DESIGN

MANAGER

Page 13: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.13 © Θεοχαρίδης, ΗΜΥ, 2020

Xilinx HDL/core Design Flow- Timing Simulation

Test Inputs,

Force Files

MODELSIM

Compile HDL Files

Waveforms

or List Files

Set Up and Map

work Directory

Compiled HDL

HDL Simulate

Standard Delay Format FileHDL or EDIF for

Implemented Design

Testbench HDL Files

Page 14: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.14 © Θεοχαρίδης, ΗΜΥ, 2020

Xilinx HDL Design Flow - Programming and In-circuit Verification

Bit File

FPGA Board

iMPACT

I/O Port

Input Byte

Human Inputs

Outputs

Page 15: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.15© Θεοχαρίδης, ΗΜΥ, 2020

A Few Notes on Programming: Start up Sequence

° During an FPGA start-up, the device performs four operations:

1. The assertion of DONE signal. The failure of DONE to go High may indicate the unsuccessful loading of configuration data.

2. The release of the Global Three State (GTS) signal. This activates all the I/Os.

3. The release of the Global Set Reset (GSR) signal. This allows all flip-flops to change state.

4. The assertion of Global Write Enable (GWE) signal. This allows all RAMs and flip-flops to change state.

° By default, these operations are synchronized to the CCLK signal.

° The entire start-up sequence lasts eight cycles, called C0-C7, after which the loaded design is fully functional.

Page 16: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.16© Θεοχαρίδης, ΗΜΥ, 2020

Serial Load Configuration

° There are two serial configuration modes.

° Master Serial mode

• the FPGA controls the configuration process by driving CCLK as an output.

° Slave Serial mode

• the FPGA passively receives CCLK as an input from an external agent (e.g., a microprocessor, CPLD, or second FPGA in master mode) that is controlling the configuration process.

° In both modes, the FPGA is configured by loading one bit per CCLK cycle.

° The MSB of each configuration data byte is always written to the DIN pin first.

Page 17: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.17© Θεοχαρίδης, ΗΜΥ, 2020

ASIC Design Flow

°ASIC

• Application Specific Integrated Circuits

• Custom design, usually from scratch or from pre-built components

• Chip performs a particular function

• Typically NOT general purpose

°Front End ➔Back End

• Front End – Synthesis / Gate Level

• Back End – Layout / Mask Generation

Page 18: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.18© Θεοχαρίδης, ΗΜΥ, 2020

ASIC Design Flow – Typical flow

° ASIC Design Flow Steps

• Specifications

• Early Planning

• Architecture

• Design

• Synthesis

• Pre-Layout Static Timing Analysis

• Layout

• Post-Layout Static Timing Analysis

• Pads Placement

• Sent for Manufacturing

VERIFICATION

Page 19: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.19© Θεοχαρίδης, ΗΜΥ, 2020

Design Flow – Commercial (Example Tools)

Synopsys Design Compiler

Modelsim

Prime Time

Cadence Silicon Ensemble

Silicon Ensemble/Virtuoso

HDL Model

Verilog Gate Level / Netlist

Verilog Gate Level / Netlist

DEF File

DEF File

DEF File

VHDL / Verilog

Verilog Simulation

Static Timing Analysis

Standard Cell Placement and Routing

Post-Layout Static Timing Analysis

Pads Placement

Prime Time

RTL (Register Transfer Language)

Page 20: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.20© Θεοχαρίδης, ΗΜΥ, 2020

Other Commercial Tools

° RTL Verification with Specman e

° Gate-level simulation with ModelSim

° Logic Synthesis with Synopsys Design Compiler

° Static Timing Analysis with Synopsys PrimeTime

° Placement and Routing with Cadence Silicon Ensemble

° Running Silicon Ensemble in the GUI mode

° Clock Tree Generation with Cadence CTGen

° Integrating IP Block, DesignWare and Virage SRAM

° Power Estimation with Synopsys Power Compiler

° Code Revision Control with CVS

Page 21: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.21© Θεοχαρίδης, ΗΜΥ, 2020

° Must understand specifications first

° Start by looking it as black box

° e.g. Adder

• F(X,Y) = X+Y

• Takes two inputs, produces Sum of Inputs

Starting A Design

X

YF(X,Y)

Page 22: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.22© Θεοχαρίδης, ΗΜΥ, 2020

Starting A Design

° SPECS → Architecture

• Block Diagram

• Brainstorming (if collaborating)

• Feedback

• I/O Specs

• Architectural Decisions

- Frequency?

- Latency?

- Power/Performance?

- Reliability?

• Architectural Optimizations

• Finalizing the initial Design

Page 23: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.23© Θεοχαρίδης, ΗΜΥ, 2020

From “Architecture” to RTL

° Create Block Diagram of Design - with sub-blocks if necessary

° Create I/O Specs for each block

• e.g adder

- Sum generator

– Takes three inputs, produces one output

- Carry generator

– Takes three inputs, produces one output

- Interconnected?

• Place box in functional order

- i.e. can’t generate sum after carry-in arrives!!!

• Create pipeline flow

- i.e. IF →ID→IX→IC→WB

• Clocked signals/registers/latches

° Proceed then to code module by module

Page 24: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.24© Θεοχαρίδης, ΗΜΥ, 2020

Hierarchical Design

•Multiple modules

•Multiple instances

•Top-Level Design

•Contains all sub-modules

and connection information

•Sub-Modules can be

hierarchically built themselves

Page 25: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.25© Θεοχαρίδης, ΗΜΥ, 2020

HDL

° Hardware Description Language

• Verilog, VHDL, SystemC, etc.

° High Level of Design Abstraction

• ex:

- Input A, B

- Output C

- Architecture entity of adder is

C A + B

° Not going to talk in depth about HDL

• Refer to multiple online resources

- www.deeps.org

• Behavioral vs. Structural

• Code →Simulate or Code→ Synthesize (Compile) → Simulate

Page 26: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.26© Θεοχαρίδης, ΗΜΥ, 2020

HDL - Tools

° Code programming

• Just a text editor!

• Today, fancy text editors with syntax highlighting are available for free (emacs, nedit, etc.)

° Simulation

• Multiple free HDL Simulators for simple designs

• State-of-the-art Simulators available at CSE

- Modelsim

- NCVHDL

- NCVerilog

• Not necessary synthesized code

° Synthesis (Compilation)

• Neet a target library of “standard” cells (i.e. AND, XOR, ADDER, etc.)

• Synopsys Design Compiler

Page 27: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.27© Θεοχαρίδης, ΗΜΥ, 2020

HDL Simulation / Verification

° Upon coding each block / module, we can then simulate its functionality

° Use an HDL / RTL Simulator

• Event Driven

• Cycle Driven

° Simulator reads code and models code functionality based on clock cycles or events, e.g.

• CA+B @posedge clk

• CA+B after 10 ns

° Tools Available:

• Modelsim, NCVHDL, NCVerilog, etc.

Page 28: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.28© Θεοχαρίδης, ΗΜΥ, 2020

HDL Synthesis ➔ H/W

Timing Analysis

Routing

Placement

Synthesis

Page 29: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.29© Θεοχαρίδης, ΗΜΥ, 2020

Why learning about Logic Synthesis?

° Logic synthesis is the core of today's CAD flows for IC and system design

• course covers many algorithms that are used in a broad range of CAD tools

• basis for other optimization techniques, e.g. embedded software

• basis for functional verification techniques

° Most algorithms are computationally hard

• covered algorithms and flows are good example for approaching hard algorithmic problems

• course covers theory as well as implementation details

• demonstrates an engineering approaches based on theoretical solid but also practical solutions

- very few research areas can offer this combination

Page 30: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.30© Θεοχαρίδης, ΗΜΥ, 2020

Design of Integrated Systems

System Level

Register Transfer Level

Gate Level

Transistor Level

Layout Level

Mask Level

Page 31: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.31© Θεοχαρίδης, ΗΜΥ, 2020

System Level

° Abstract algorithmic description of high-level behavior

• e.g. C-Programming language

• abstract because it does not contain any implementation details for timing or data

• efficient to get a compact execution model as first design draft

• difficult to maintain throughout project because no link to implementation

Port*

compute_optimal_route_for_packet(Packet_t *packet,

Channel_t *channel)

{

static Queue_t *packet_queue;

packet_queue = add_packet(packet_queue, packet);

...

}

Page 32: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.32© Θεοχαρίδης, ΗΜΥ, 2020

RTL Level

° Cycle accurate model “close” to the hardware implementation

• bit-vector data types and operations as abstraction from bit-level implementation

• sequential constructs (e.g. if - then - else, while loops) to support modeling of complex control flow

module mark1;

reg [31:0] m[0:8192];

reg [12:0] pc;

reg [31:0] acc;

reg[15:0] ir;

always

begin

ir = m[pc];

if(ir[15:13] == 3b’000)

pc = m[ir[12:0]];

else if (ir[15:13] == 3’b010)

acc = -m[ir[12:0]];

...

end

endmodule

Page 33: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.33© Θεοχαρίδης, ΗΜΥ, 2020

Gate Level

° Model on finite-state machine level

• models function in Boolean logic using registers and gates

• various delay models for gates and wires

• in this lecture we will mostly deal with gate level

1ns

4ns3ns

5ns

Page 34: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.34© Θεοχαρίδης, ΗΜΥ, 2020

Transistor Level

° Model on CMOS transistor level

• depending on application function modeled as resistive switches

- used in functional equivalence checking

• or full differential equations for circuit simulation

- used in detailed timing analysis

Page 35: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.35© Θεοχαρίδης, ΗΜΥ, 2020

Layout Level

° Transistors and wires are laid out as polygons in different technology layers such as diffusion, poly-silicon, metal, etc.

Page 36: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.36© Θεοχαρίδης, ΗΜΥ, 2020

Design of Integrated Systems

Re

lative

Effo

rt

Project Time

System

RTL

Logic

- Design phases overlap to large degrees

- Parallel changes on multiple levels, multiple teams

- Tight scheduling constraints for product

Transistor

Page 37: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.37© Θεοχαρίδης, ΗΜΥ, 2020

Design Challenges

° Systems are becoming huge, design schedules are getting tighter

• > 100 Mio gates becoming common for ASICs

• > 0.4 Mio lines of C-code to describe system behavior

• > 5 Mio lines of RLT code

° Design teams are getting very large for big projects

• several hundred people

• differences in skills

• concurrent work on multiple levels

• management of design complexity and communication very difficult

° Design tools are becoming more complex but still inadequate

• typical designer has to run ~50 tools on each component

• tools have lots of bugs, interfaces do not line up etc.

Page 38: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.38© Θεοχαρίδης, ΗΜΥ, 2020

Design Challenges

° Decision about design point very difficult

• compromise between performance / costs / time-to-market

• decision has to be made 2-3 years before design finished

• design points are difficult to predict without actually doing the design

• scheduling of product cycles

° Functional verification

• simulation still main vehicle for functional verification but inadequate because of size of design space

• results in bugs in released hardware that is very expensive to recover from (different in software ;-)

Page 39: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.39© Θεοχαρίδης, ΗΜΥ, 2020

Design Challenges

° Fundamental tradeoffs between different modeling levels:

• modeling detail and team size to maintain model

- high-level models can be maintained by one or two people

- detailed models need to be partitioned which results in a significant communication overhead

• modeling accuracy versus modeling compactness

- compact models omit details and give only crude estimations for implementation

- detailed models are lengthy and difficult to adopt for major changes in design points

• simulation speed versus hardware performance

- high-level models can be simulated fast but cannot be implemented efficiently with automatic means

- low-level models can be made to have a fast implementation but cannot be simulated very fast

Page 40: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.40© Θεοχαρίδης, ΗΜΥ, 2020

General Design Approach

° How do engineers build a bridge?

° Divide and conquer !!!!• partition design problem into many sub-problems which are manageable

• define mathematical model for sub-problem and find an algorithmic solution

- beware of model limitations and check them !!!!!!!

• implement algorithm in individual design tools, define and implement general interfaces between the tools

• implement checking tools for boundary conditions

• concatenate design tools to general design flows which can be managed

• see what doesn’t work and start over

Page 41: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.41© Θεοχαρίδης, ΗΜΥ, 2020

Design Automation

° Design Automation is one of the most advanced areas in practical computer science

• many problems require sophisticated mathematical modeling

• many algorithms are computationally hard and require advanced and fine-tuned heuristics to work on realistic problem sizes

• boundary conditions need to be well declared and synchronized between different tools (patchwork to cover all wholes)

° Two common pitfalls in CAD research

• problem is looking for a solution:

- problem scope is too big, makes modeling difficult or algorithms don’t scale

- problem scope is too small, solutions are not good enough

• solution is looking for a problem:

- model was oversimplified because real problem was too complex with too many boundary conditions

Page 42: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.42© Θεοχαρίδης, ΗΜΥ, 2020

Key to Success

° Fine-tuned combination of Design Methodology and Tools

• addresses algorithmic complexity by requiring

- manual partitioning of the problem

- manual input of hints/suggestions

- manual iterations to drive tool application to best solution

• makes CAD systems and design flows very complex and difficult to manage

Problem space Tools applicable

Practical combination through design methodology

Page 43: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.43© Θεοχαρίδης, ΗΜΥ, 2020

What is Logic Synthesis?

D

X Y

Given: Finite-State Machine F(X,Y,Z, , ) where:

X: Input alphabet

Y: Output alphabet

Z: Set of internal states

: X x Z Z (next state function)

: X x Z Y (output function)

Target: Circuit C(G, W) where:

G: set of circuit components g {Boolean gates,

flip-flops, etc}

W: set of wires connecting G

Page 44: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.44© Θεοχαρίδης, ΗΜΥ, 2020

Objective Function for Synthesis

° Minimize area

• in terms of literal count, cell count, register count, etc.

° Minimize power

• in terms of switching activity in individual gates, deactivated circuit blocks, etc.

° Maximize performance

• in terms of maximal clock frequency of synchronous systems, throughput for asynchronous systems

° Any combination of the above

• combined with different weights

• formulated as a constraint problem

- “minimize area for a clock speed > 300MHz”

° More global objectives

• feedback from layout

- actual physical sizes, delays, placement and routing

Page 45: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.45© Θεοχαρίδης, ΗΜΥ, 2020

Constraints on Synthesis

° Given implementation style:

• two-level implementation (PLA, CAMs)

• multi-level logic

• FPGAs

° Given performance requirements

• minimal clock speed requirement

• minimal latency, throughput

° Given cell library

• set of cells in standard cell library

• fan-out constraints (maximum number of gates connected to another gate)

• cell generators

Page 46: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.46© Θεοχαρίδης, ΗΜΥ, 2020

Why learn HDL coding styles for FPGAs?

° HDLs contain many complex constructs that are difficult to understand at first.

° Methods and examples included in HDL manuals do not always apply to the design of FPGA devices.

° If you currently use HDLs to design ASICs, your established coding style may unnecessarily increase the number of gates or CLB levels in FPGA designs

° HDL synthesis tools implement logic based on the coding style of your design.

Page 47: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.47© Θεοχαρίδης, ΗΜΥ, 2020

Naming Convention - Restrictions

° The following FPGA resource names are reserved and should not be used to name nets or components.

• Components (Comps), Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs), Slices, basic elements (bels), clock buffers (BUFGs), tristate buffers (BUFTs), oscillators (OSC), CCLK, DP, GND, VCC, and RST

• CLB names such as AA, AB, SLICE_R1C2, SLICE_X1Y2, X1Y2, and R1C2

• Primitive names such as TD0, BSCAN, M0, M1, M2, or STARTUP

• Do not use pin names such as P1 and A4 for component names

• Do not use pad names such as PAD1 for component names

Page 48: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.48© Θεοχαρίδης, ΗΜΥ, 2020

Use optional labels on flow control constructs

° Make the code structure more obvious

° Can slow execution in some simulators

/* Changing Latch into a D-Register

* D_REGISTER.V */

module d_register (CLK, DATA, Q);

input CLK;

input DATA;

output Q;

reg Q;

always @ (posedge CLK)

begin: My_D_Reg

Q <= DATA;

end

endmodule

Page 49: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.49© Θεοχαρίδης, ΗΜΥ, 2020

Coding for Synthesis

° Omit the Wait for XX ns Statement

• XX specifies the number of nanoseconds that must pass before a condition is executed.

• VHDL: wait for XX ns;

• Verilog: #XX;

° Omit the ...After XX ns or Delay Statement

• VHDL

(Q <=0 after XX ns)

• Verilog

assign #XX Q=0;

• This statement is usually ignored by the synthesis tool. In this case, the functionality of the simulated design does not match the functionality of the synthesized design.

Page 50: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.50© Θεοχαρίδης, ΗΜΥ, 2020

Coding for Synthesis

° Omit Initial Values

• VHDL

signal sum : integer := 0;

• Verilog

initial sum = 1’b0;

° Order and Group Arithmetic Functions

• ADD = A1 + A2 + A3 + A4;

cascades three adders in series.

• ADD = (A1 + A2) + (A3 + A4);

two additions are evaluated in parallel and the results are combined with a third adder.

• RTL simulation results are the same for both statements,

• however, the second statement results in a faster circuit after synthesis (depending on the bit width of the input signals).

• When is second construct preferred ?

Page 51: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.51© Θεοχαρίδης, ΗΜΥ, 2020

Coding for Synthesis

For example, if the A4 signal reaches the adder later than the other signals, the first statement produces a faster implementation because the cascaded structure creates fewer logic levels for A4.

This structure allows A4 to catch up to the other signals. In this case, A1 is the fastest signal followed by A2 and A3; A4 is the slowest signal.

° Most synthesis tools can balance or restructure the arithmetic operator tree if timing constraints require it.

° However, Xilinx® recommends that you code your design for your selected structure.

Page 52: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.52© Θεοχαρίδης, ΗΜΥ, 2020

Comparing If Statement vs.Case Statement

° If statement generally produces priority-encoded logic

° Case statement generally creates balanced logic.

° Use the Case statement for complex decoding and use the If statement for speed critical paths.

° Make sure that all outputs are defined in all branches of an if statement.

• If not, it can create latches or long equations on the CE signal.

• Have default values for all outputs before the if statements.

° Limiting the number of input signals into an if statement can reduce the number of logic levels.

° If there are a large number of input signals, see if some of them can be pre-decoded and registered before the if statement.

° Avoid bringing the dataflow into a complex if statement.

° Only control signals should be generated in complex if-else statements.

Page 53: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.55© Θεοχαρίδης, ΗΜΥ, 2020

Implementation

Page 54: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.56© Θεοχαρίδης, ΗΜΥ, 2020

Case vs IF

° Case implementation requires only one Virtex™ slice while the If construct requires two slices in some synthesis tools.

° In this case, design the multiplexer using the Case construct because fewer resources are used and the delay path is shorter.

Page 55: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.57© Θεοχαρίδης, ΗΜΥ, 2020

From IF construct

Page 56: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.58© Θεοχαρίδης, ΗΜΥ, 2020

From Case construct

Page 57: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.59© Θεοχαρίδης, ΗΜΥ, 2020

Implementing Latches and Registers

° Synthesizers infer latches from incomplete conditional expressions, such as an If statement without an Else clause.

° This can be problematic for FPGA designs because not all FPGA devices have latches available in the CLBs.

° In addition, you may think that a register is created, and the synthesis tool actually created a latch.

° For these devices, synthesizers infer a dedicated latch from incomplete conditional expressions.

Page 58: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.60© Θεοχαρίδης, ΗΜΥ, 2020

D Latch

module d_latch (GATE, DATA, Q);

input GATE;

input DATA;

output Q;

reg Q;

always @ (GATE or DATA)

begin

if (GATE == 1'b1)

Q <= DATA;

end // End Latch

endmodule

Page 59: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.61© Θεοχαρίδης, ΗΜΥ, 2020

D register

module d_register (CLK, DATA, Q);

input CLK;

input DATA;

output Q;

reg Q;

always @ (posedge CLK)

begin: My_D_Reg

Q <= DATA;

end

endmodule

Page 60: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.62© Θεοχαρίδης, ΗΜΥ, 2020

How to handle latches?

° With some synthesis tools you can determine the number of latches that are implemented in your design.

° You should convert all If statements without corresponding Else statements and without a clock edge to registers.

° Use the recommended register coding styles in the synthesis tool documentation to complete this conversion.

Page 61: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.63© Θεοχαρίδης, ΗΜΥ, 2020

Resource Sharing

° Resource sharing is an optimization technique that uses a single functional block (such as an adder or comparator) to implement several operators in the HDL code.

° Use resource sharing to improve design performance by reducing the gate count and the routing congestion.

° If you do not use resource sharing, each HDL operation is built with separate circuitry.

° However, you may want to disable resource sharing for speed critical paths in your design.

Page 62: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.64© Θεοχαρίδης, ΗΜΥ, 2020

Resource Sharing

module res_sharing (A1, B1, C1, D1, COND_1, Z1);

input COND_1;

input [7:0] A1, B1, C1, D1;

output [7:0] Z1;

reg [7:0] Z1;

always @(A1 or B1 or C1 or D1 or COND_1)

begin

if (COND_1)

Z1 <= A1 + B1;

else

Z1 <= C1 + D1;

end

endmodule

Page 63: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.65© Θεοχαρίδης, ΗΜΥ, 2020

With and Without Resource Sharing

Page 64: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.66© Θεοχαρίδης, ΗΜΥ, 2020

Resource Sharing

° The following operators can be shared either with instances of the same operator or with an operator on the same line.

• *

• + –

• > >= < <=

° For example, a + operator can be shared with instances of other + operators or with – operators.

° A * operator can be shared only with other * operators.

Page 65: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.67© Θεοχαρίδης, ΗΜΥ, 2020

Resource Sharing

° You can implement arithmetic functions (+, –, magnitude comparators) with gates or with your synthesis tool’s module library.

° The library functions use modules that take advantage of the carry logic in CLBs/slices.

° Resource sharing of the module library automatically occurs in most synthesis tools if the arithmetic functions are in the same process.

° Resource sharing adds additional logic levels to multiplex the inputs to implement more than one function.

• Do not use it for arithmetic functions that are part of your design’s time critical path.

Page 66: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.68© Θεοχαρίδης, ΗΜΥ, 2020

Using Preset Pin or Clear Pin

° Xilinx® FPGA devices consist of CLBs that contain function generators and flip-flops. Spartan-II™, Spartan-3™ , Virtex™, Virtex-E™, Virtex-II™, Virtex-II Pro™ and Virtex-II Pro X™ registers can be configured to have either or both preset and clear pins.

Page 67: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.69© Θεοχαρίδης, ΗΜΥ, 2020

FlipFlop

module ff_example( RESET, SET, CLOCK, ENABLE;D_IN;A_Q_OUT; B_Q_OUT; C_Q_OUT; D_Q_OUT; E_Q_OUT);

input RESET;

input SET;

input CLOCK;

input ENABLE;

input [7:0] D_IN;

output [7:0] A_Q_OUT;

output [7:0] B_Q_OUT;

output [7:0] C_Q_OUT;

output [7:0] D_Q_OUT;

output [7:0] E_Q_OUT;

// D flip-flop

always @(posedge CLOCK)

begin

A_Q_OUT <= D_IN;

end // End FF

Page 68: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.70© Θεοχαρίδης, ΗΜΥ, 2020

Asynchronous Reset

always @(posedge CLOCK || posedge RESET)

begin

if (RESET == 1'b1)

B_Q_OUT <= “00000000”;

else if (CLOCK == 1'b1)

B_Q_OUT <= D_IN;

end

Page 69: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.71© Θεοχαρίδης, ΗΜΥ, 2020

Asynchronous Set

always @(posedge CLOCK || posedge SET)

begin

if (SET == 1'b1)

C_Q_OUT <= “11111111”;

else if (CLOCK == 1'b1)

C_Q_OUT <= D_IN;

end

Page 70: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.72© Θεοχαρίδης, ΗΜΥ, 2020

What is this ?

always @(posedge CLOCK || posedge RESET)

begin

if (RESET == 1'b1)

D_Q_OUT <= “00000000”;

else if (CLOCK == 1'b1)

begin

if (ENABLE == 1'b1)

D_Q_OUT <= D_IN;

end

end

Page 71: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.73© Θεοχαρίδης, ΗΜΥ, 2020

Answer

° Flip-flop with asynchronous reset and clock enable

Page 72: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.74© Θεοχαρίδης, ΗΜΥ, 2020

Flip-flop with asynchronous reset; asynchronous set andclock enable

always @(posedge CLOCK || posedge RESET || posedge SET)

begin

if (RESET == 1'b1)

E_Q_OUT <= "00000000";

else if (SET == 1'b1)

E_Q_OUT <= "11111111";

else if (CLOCK == 1'b1)

begin

if (ENABLE == 1'b1)

E_Q_OUT <= D_IN;

end

end

Page 73: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.75© Θεοχαρίδης, ΗΜΥ, 2020

Using Clock Enable Pin Instead of Gated Clocks

° Use the CLB clock enable pin instead of gated clocks in your designs. Gated clocks can introduce glitches, increased clock delay, clock skew, and other undesirable effects

Page 74: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.76© Θεοχαρίδης, ΗΜΥ, 2020

Gated Clock

module gate_clock(IN1, IN2, DATA, CLK,LOAD,OUT1);

input IN1;

input IN2;

input DATA;

input CLK;

input LOAD;

output OUT1;

reg OUT1;

wire GATECLK;

assign GATECLK = (IN1 & IN2 & CLK);

always @(posedge GATECLK)

begin

if (LOAD == 1'b1)

OUT1 <= DATA;

end

endmodule

Page 75: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.77© Θεοχαρίδης, ΗΜΥ, 2020

Gated Clock

BAD IDEA !!

Page 76: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.78© Θεοχαρίδης, ΗΜΥ, 2020

Clock Enable

module clock_enable (IN1, IN2, DATA, CLK, LOAD, DOUT);

input IN1, IN2, DATA;

input CLK, LOAD;

output DOUT;

wire ENABLE;

reg DOUT;

assign ENABLE = IN1 & IN2 & LOAD;

always @(posedge CLK)

begin

if (ENABLE)

DOUT <= DATA;

end

endmodule

Page 77: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.79© Θεοχαρίδης, ΗΜΥ, 2020

Clock Enable

Page 78: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.80© Θεοχαρίδης, ΗΜΥ, 2020

PLACEMENT AND ROUTING

Post-Synthesis Implementation

Page 79: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.81© Θεοχαρίδης, ΗΜΥ, 2020

Placement and routing

Two critical phases of layout design:

– placement of components on the chip;

– routing of wires between components.

Placement and routing interact, but separating layout design

into phases helps us understand the problem and find good

solutions.

Page 80: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.82 © Θεοχαρίδης, ΗΜΥ, 2020

Placement metrics

° Quality metrics for layout:

• Area

• Delay

• Energy consumption

° Ideally placement and routing would be performed together

• Both problems are NP-hard

• For practical considerations placement and routing must be performed separately

° Design time may be important for FPGAs

Page 81: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.83 © Θεοχαρίδης, ΗΜΥ, 2020

Wire length as a quality metric

bad placement good placement

Page 82: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.84 © Θεοχαρίδης, ΗΜΥ, 2020

Wire length measures

Estimate wire length by distance between components.

Possible distance measures:

– Euclidean distance (sqrt(x2 + y2));

– Manhattan distance (x + y).

Multi-point nets must be broken up into trees for good estimates.

Euclidean

Manhattan

Page 83: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.85 © Θεοχαρίδης, ΗΜΥ, 2020

Placement techniques

Can construct an initial solution, improve an existing

solution.

Pairwise interchange is a simple improvement metric:

– Interchange a pair, keep the swap if it helps wire length.

– Heuristic determines which two components to swap.

Page 84: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.86 © Θεοχαρίδης, ΗΜΥ, 2020

Before Placement: Clustering

° Need to group BLEs into groups

° Goals:

• Minimize number of clusters

• Minimize inter-cluster wiring

• Minimize critical path (timing-driven)

° How do we do this

• Take advantage of cluster architecture

Page 85: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.87 © Θεοχαρίδης, ΗΜΥ, 2020

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

4

4

6

6

5

5

7

4

netlist with delay for

each gate

Timing Analysis

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

4

4

6

6

5

5

7

4

arrival times

0

0

0

1

3

1

7

9

7

7

13

15

14

18

22

18

Source: David Pan

Page 86: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.88 © Θεοχαρίδης, ΗΜΥ, 2020

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

4

4

6

6

5

5

7

4

arrival time/required time

0/4

0/0

0/8

1/5

3/3

1/9

7/9

9/9

7/15

7/13

13/15

15/15

14/18

18/22

22/22

18/22

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

4

4

6

6

5

5

7

4

slack = required time -

arrival time

4

0

8

4

0

8

2

0

8

6

2

0

4

4

0

4

Timing Analysis

Page 87: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.89 © Θεοχαρίδης, ΗΜΥ, 2020

Example with interconnect delay

5 5 5

4 4 4

2

FF

FF

3 2 1 1

2 1 3 2

1

22

19

Page 88: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.90 © Θεοχαρίδης, ΗΜΥ, 2020

Placement

• Placement has a set of competing goals.

• Can’t optimize locally and globally simultaneously.

• Use heuristic approaches to evaluate quality.

C D F

AB

E1 2

LUT1 LUT2ABCD E

Page 89: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.91 © Θεοχαρίδης, ΗΜΥ, 2020

Placement Algorithms

• Constructive methods: begin from netlist and generate an initial placement.

- Partitioning methods: mincut and Kernighan-Lin methods

- Clustering

• Iterative improvement

- Begin with random or constructive placement.

- Iterate to improve it.

- Hill-climbing

Page 90: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.92 © Θεοχαρίδης, ΗΜΥ, 2020

Timing-driven Placement

• Take both wire length and critical path into account

• Problem

- Critical path changes as I move blocks

- How do I balance the two objectives

• How do we go about modeling routing delay during placement?

Page 91: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.93 © Θεοχαρίδης, ΗΜΥ, 2020

Determining Criticality

• Same basic approach as used for clustering criticality

• For each (i, j) connection from source i and sink j

- Determine arrival times (pre-order BFS)

- Determine required arrival times (post-order BFS)

- Determine slack -> required_arrival_time –arrival_time

- Criticality(i, j) = [1- slack(i, j)]/ (Max slack)

What is the purpose of the criticality exponent?

Page 92: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.94 © Θεοχαρίδης, ΗΜΥ, 2020

Balancing Wiring and Timing Cost

• Need to determine relative changes in timing and wiring based on moves

• Idea: Use relative changes from previous calculation

- Both values less than 1

- Helps balance effect based on scaling parameter

This still doesn’t help address changes in delay

Page 93: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.95 © Θεοχαρίδης, ΗΜΥ, 2020

Routing

• Problem

▪ Given a placement, and a fixed number of metal layers, find a valid pattern of horizontal and vertical wires that connect the terminals of the nets

▪ Levels of abstraction:

o Global routing

o Detailed routing

• Objectives

▪ Cost components:

o Area (channel width) – min congestion in prev levels helped

o Wire delays – timing minimization in previous levels

o Number of layers (less layers → less expensive)

o Additional cost components: number of bends, vias

Page 94: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.96 © Θεοχαρίδης, ΗΜΥ, 2020

Metal layer 1

Via

Routing Anatomy

Topview

3Dview

Metal layer 2

Metal layer 3

Sym

bolic

Layout

Note: Colors usedin this slide are notstandard

Page 95: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.97 © Θεοχαρίδης, ΗΜΥ, 2020

Global vs. Detailed Routing

• Global routing▪ Input: detailed placement, with exact terminal

locations

▪ Determine “channel” (routing region) for each net

▪ Objective: minimize area (congestion), and timing (approximate)

• Detailed routing▪ Input: channels and approximate routing from

the global routing phase

▪ Determine the exact route and layers for each net

▪ Objective: valid routing, minimize area (congestion), meet timing constraints

▪ Additional objectives: min via, powerFigs. [©Sherwani]

Page 96: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.98 © Θεοχαρίδης, ΗΜΥ, 2020

Channel graph

LE LE

LE LE

channel channel

channel

channel

channel channel

channel

channelchannel

channel

channel channelswitch

box

switch

box

switch

box

switch

box

switch

box

switch

box

switch

box

switch

box

switch

box

Page 97: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.99 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA Programmable Switch Elements

• Used in connecting:

▪ The I/O of functional units to the wires

▪ A horizontal wire to a vertical wire

▪ Two wire segments to form a longer wire segment

Page 98: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.100 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA Routing Channels Architecture

• Note: fixed channel widths (tracks)

• Should “predict” all possible connectivity requirements when designing the FPGA chip

• Channel -> track -> segment

• Segment length?

▪ Long: carry the signal longer,less “concatenation” switches, but might waste track

▪ Short: local connections, slow for longer connections

channeltrack

segment

Page 99: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.101 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA Switch Boxes

• Ideally, provide switches for all possible connections

• Trade-off:

▪ Too many switches:

o Large area

o Complex to program

▪ Too few switches:

o Cannot route signals

Xilinx 4000

One possiblesolution

Page 100: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.102 © Θεοχαρίδης, ΗΜΥ, 2020

18Kb

BRAM

CAM

Multiplier

BLVDS

Backplane

PCI-X

DDR

DDR

DDR

CAM

QDR

SRAM

DDR

SDRAMDistri

RAM

LVDS

Shift

Registers

DCM

FIFO

PCI

SONET / SDH

Virtex II Architecture

Page 101: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.103 © Θεοχαρίδης, ΗΜΥ, 2020

Virtex II Clock Distribution

Page 102: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.104 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA Routing

• Routing resources pre-fabricated▪ 100% routability using existing channels

▪ If fail to route all nets, redo placement

• FPGA architectural issues▪ Careful balance between number of logic blocks and routing resources (100% logic area

utilization?)

▪ Designing flexible switchboxes and channels(conflicts with high clock speeds)

• FPGA routing algorithms▪ Graph search algorithms

o Convert the wire segments to graph nodes, and switch elements to edges

▪ Bin packing heuristics (nets as objects, tracks as bins)

▪ Combination of maze routing and graph search algorithms

Page 103: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.105 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA issues

Often want a fast answer. May be willing to accept lower

quality result for less place/route time.

May be interested in knowing wirability without needing

the final configuration.

Fast placement: constructive placement, iterative

improvement through simulated annealing.

Page 104: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.106 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA routing

Finding a route into given interconnection network.

Global routing assigns to channels.

Local routing selects the programming points used to make

the connections.

Page 105: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

FPGA-Based System Design: Chapter 4 Copyright © 2004 Prentice Hall PTR

ΗΜΥ408 Δ06-7 Design Flow.107 © Θεοχαρίδης, ΗΜΥ, 2020

FPGA routing techniques

Nair: route based on congestion, not distance. Route in two

passes:

– Estimate congestion.

– Final routing.

Triptych: more gradual penalty for congestion.

Page 106: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.108 © Θεοχαρίδης, ΗΜΥ, 2020

Routing Connections

Based on the switch and wire parasitic, interconnect routes can be

modeled as RC networks.

S S

Other issues:Power

Routability

Page 107: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.109 © Θεοχαρίδης, ΗΜΥ, 2020

Timing-Driven Routing

• Add delay cost component to routing.

• Represent delay along path as RC chain. Buffering important here.

• Note that timing driven routing selects most distant point for first route.

- Sets upper bound on delay.

• Need for combined breadth-first congestion and timing-driven route.

Page 108: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.110 © Θεοχαρίδης, ΗΜΥ, 2020

Timing-Driven Routing

• Difficult to estimate remaining timing along a path

• Difficult to balance costs for each critical net

• Some routers attempt to “look-ahead” to anticipate congested or time-critical areas

• Optimal approaches have generally failed.

Page 109: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.111 © Θεοχαρίδης, ΗΜΥ, 2020

Combined Placement and Routing

• Used depth-first route to select initial connections

• Swap blocks and rip up attached nets

• Bias nets that span the bulk of device onto long-line resources.

• Took 16X longer than place and route

- 8% to 15% improvement.

Page 110: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.112 © Θεοχαρίδης, ΗΜΥ, 2020

Optimizing your FPGA design

° Pinout and Area Constraints Editor (PACE)

° Implementation (Mapping, Placing, Routing)

• Constraints Editor

• Text Editor (HDL source)

• Floorplanner -- Placement

• FPGA Editor – Routing

° Timing Constraints

• Xilinx Constraints Editor

Page 111: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.113 © Θεοχαρίδης, ΗΜΥ, 2020

VHDL based synthesis

Page 112: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.114 © Θεοχαρίδης, ΗΜΥ, 2020

VHDL code

architecture RTL1 of RESOURCE is

begin

seq : process (RSTn, CLOCK)

begin

if (RSTn = '0') then

DOUT <= (others => '0');

elsif (CLOCK'event and CLOCK = '1') then

case SEL is

when "00" => DOUT <= unsigned(A) - 1;

when "01" => DOUT <= unsigned(B) - 1;

when "10" => DOUT <= unsigned(C) - 1;

when others => DOUT <= unsigned(D) - 1;

end case;

end if;

end process;

end RTL1;

Page 113: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.115 © Θεοχαρίδης, ΗΜΥ, 2020

Synthesized

schematic

for RTL1 of

resource

delay 57 ns

area 65

number of

flip-flops 16

Page 114: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.116 © Θεοχαρίδης, ΗΜΥ, 2020

4-bit Shift Register

Page 115: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.117 © Θεοχαρίδης, ΗΜΥ, 2020

4-bit Shift Register

Page 116: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.118 © Θεοχαρίδης, ΗΜΥ, 2020

HDL: Design Verification

HDL

Synthesis

Implementation

Download

HDL

Implement your

design using

VHDL or Verilog

Functional

Simulation

Timing

Simulation

In-Circuit

Verification

Behavioral

Simulation

Page 117: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.119 © Θεοχαρίδης, ΗΜΥ, 2020

Behavioral

Simulation

Synthesis: Design Verification

HDL

Synthesis

Implementation

Download

HDL

Synthesize the

design to create

an FPGA netlist

Functional

Simulation

Timing

Simulation

In-Circuit

Verification

Page 118: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.120 © Θεοχαρίδης, ΗΜΥ, 2020

Implementation: Design Verification

Behavioral

SimulationHDL

Synthesis

Implementation

Download

HDL

Translate,

place and

route and

generate a

bitstream to

download in

the FPGA

Functional

Simulation

Timing

Simulation

In-Circuit

Verification

Page 119: ΗΜΥ 408/664 ΨΗΦΙΑΚΟΣ ΣΧΕΔΙΑΣΜΟΣ ΜΕ FPGAs Χειμερινό …

ΗΜΥ408 Δ06-7 Design Flow.121 © Θεοχαρίδης, ΗΜΥ, 2020

HDL: Summary

° Full VHDL/Verilog (RTL code)

• Advantages:

- Portability

- Complete control of the design implementation and tradeoffs

- Easier to debug and understand a code that you own

• Disadvantages:

- Can be time consuming

- Don’t always have control over the Synthesis tool

- Need to be familiar with algorithm and how to write it