23.09.2009
Dr. John Jones, Princeton University
CMS, CERN
FERMI, ELETTRA
The Matrix Card and its Applications
Dr. John Jones ([email protected]) 223.09.2009
Progression in Physics Hardware Over ~20 Years
1995: VME
2003: VME / ATCA
2008+: ATCA / μTCA
Size, Density, Connectivity, Speed
The Matrix Processor - Schematic
[Schematic: Xilinx Virtex 5 FPGA (LX110T); Mindspeed 72x72 cross-point switch; SNAP12, POP4, SNAP12; 2 x 2Gb DDR2; optical I/O (16/16); 3U μTCA I/O (20/20); NXP 2366 μC.]
The Matrix Processor – Top Photo
MTP Optics
FPGA
Mindspeed Switch
Ethernet
The Matrix Processor – Bottom Photo
DDR2 SDRAM
NXP Microcontroller
TCA Connector
Switched Topology Processing
The switch technology makes it topologically agnostic:
This is a critical difference compared to previous systems
Allows the system to be used to solve calculations in various geometries
1-to-N data duplication is easy to implement w/ latency ~100ps
Also allows real-time redundancy, dynamic switching, etc…
Geometries: Linear (e.g. batch); 2D / projective 2D (e.g. CMS); 3D (e.g. lattice sim.); 4D…
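The 1-to-N duplication point can be illustrated with a toy model of a cross-point switch, in which every output port independently selects one input port. This is only a sketch: the class `CrosspointSwitch` and its methods are hypothetical names and do not correspond to the control interface of the Mindspeed device. Only the 72x72 size is taken from the slides.

```python
class CrosspointSwitch:
    """Toy model of an N x N cross-point switch (hypothetical API)."""

    def __init__(self, ports=72):
        self.ports = ports
        # route[out] = input port driving that output, or None if unconnected
        self.route = [None] * ports

    def connect(self, src, outputs):
        """Drive every port in `outputs` from input `src` (1-to-N fan-out)."""
        for out in outputs:
            if not (0 <= src < self.ports and 0 <= out < self.ports):
                raise ValueError("port out of range")
            self.route[out] = src

    def forward(self, inputs):
        """Return the value seen on each output for one set of input values."""
        return [None if src is None else inputs[src] for src in self.route]


switch = CrosspointSwitch()
switch.connect(src=5, outputs=[0, 1, 2])   # duplicate input 5 onto three outputs
switch.connect(src=7, outputs=[3])         # plain point-to-point link
frame = [f"in{i}" for i in range(72)]
out = switch.forward(frame)
```

Because duplication is just a routing-table entry, changing the processing geometry in this model means rewriting `route`, not rewiring anything.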
Example 1 – CMS Trigger Upgrade
2-phase upgrade of trigger system:
Phase 1 (2011 to 2016):
Replacement of older components, HCAL FE & associated trigger hardware
Calorimeter trigger upgrade
Phase 2 (Some time after...):
Installation of upgraded tracker including TP generation
Integration of tracking information into enhanced trigger system
Current CMS Trigger Architecture
[Figure: latency diagram of the current trigger chain, from Collision (t=0) to Detector readout (t=131).]
Legend: External link latency (BX); Link speed (Gb/s); Internal processing latency (with internal connections); Internal processing latency (without internal connections)
Blocks: TTC (2* / 2*); ECAL FE (4.5 / 4.5); HCAL FE (? / ?); ECAL TCC (17 / 17*); HCAL HTR (? / ?); RCT (20 / 18*); GCT (25 / 9); GT (11 / 6)
Other values shown: 1.5; 5.5; 0.08; 3.0; 4*; 1.25*; 19*; 0.5; 0.8; ?; 1.6; 10.04; 19*; 51.5 (37); 79.5; 56.5
Current CMS Trigger Architecture
Processing subdivided into eta-phi regions / link (e.g. calorimeter trigger)
2 scaling problems with this approach:
Difficult to add new input sources (i.e. improved HCAL, tracking)
Data reduction layer doesn’t scale efficiently & balance boundary data sharing
[Figure: CAL TPG, RCT, GCT, GT chain over the η-φ plane, with significant data reduction; labels 400, 20, 1.]
Current-Revised CMS Trigger Architecture
Revisit calorimeter TPG principle:
[Figure: Current: CAL TPGs feeding the RCT. Revised (time-multiplexed serialisation): CAL TPGs feeding the ROG. Labels: ηφ=1, t=1; ηφ=3, t=3.]
Data Serialisation in TPG
TPG multiplexes data into BX-serialised streams:
[Figure: η-φ regions at t0, t1, t2 mapped to BX-serialised streams (t0, t1, t2) along φ.]
Initial cost: lost time due to multiplexing
Later gain: Compact, redundant, time-multiplexed system up to GT
Overall latency DECREASES!
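One way to read the serialisation scheme is as a round-robin assignment: all regions from a given bunch crossing go to a single processing node, and successive crossings go to successive nodes. The sketch below illustrates that reading only. The function name `time_multiplex`, the node count and the data layout are assumptions and are not taken from the slides.

```python
def time_multiplex(events, n_nodes):
    """Assign each bunch crossing's full set of regions to one node, round-robin.

    events[bx] is the list of per-region data words for bunch crossing `bx`.
    Returns streams[node] = list of (bx, words) handled by that node.
    """
    streams = [[] for _ in range(n_nodes)]
    for bx, words in enumerate(events):
        streams[bx % n_nodes].append((bx, words))
    return streams


# Three regions per crossing, six crossings, three processing nodes.
events = [[f"r{r}@t{bx}" for r in range(3)] for bx in range(6)]
streams = time_multiplex(events, n_nodes=3)
```

In this picture each node sees the whole detector for its crossings, so no boundary data has to be exchanged between nodes. The cost is the time spent serialising the data onto the stream.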
Current CMS Trigger Architecture
Processing subdivided into eta-phi regions / link (e.g. calorimeter trigger)
[Figure: CAL TPG OR MUON TPG feeding ROG, then GT; labels 400, 20, 1.]
More compact, faster, lower latency, topological
Region / card increases
Eliminates GCT / RCT boundary
Space for additional future data
Inter-card data sharing decreases
Doing the Numbers (Based on Current CT)
Post-TPG link speed ~3.75Gb/s, i.e. ~8b * 9.375 / BX / fibre
16 x serialisation in TPG => ~75 towers (ECAL+HCAL) / BX / fibre
Eliminate phi-boundary (one fibre absorbs entire eta segment!)
Calorimeter dimensions 88 (eta) x 72 (phi) trigger towers
e.g. 1 matrix card = 16 (eta) x 72 (phi)
16 input channels => all inputs for jet trigger + overlap in current CMS
10 matrix cards for full-phi-granularity, coarse (4 tower) eta processing (x16 copies)
16 matrix cards for full-tower-granularity processing (x16 copies)
2 fibers => output for results (electrons, photons & jets)
32 input fibres into GT card
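The link-budget figures above can be reproduced with a short calculation. The slides give the 3.75Gb/s link speed, the 16 x serialisation and the 9.375 and ~75 results. The 25 ns bunch-crossing period, the 8b/10b line coding and the one 8-bit word each for ECAL and HCAL per tower are assumptions made here so that the numbers line up.

```python
from fractions import Fraction

LINK_GBPS = Fraction(375, 100)      # post-TPG link speed from the slide
BX_NS = 25                          # assumed bunch-crossing period (ns)
PAYLOAD_FRACTION = Fraction(8, 10)  # assumed 8b/10b line coding
SERIALISATION = 16                  # 16 x serialisation in the TPG
WORDS_PER_TOWER = 2                 # assumed: one 8-bit word each for ECAL and HCAL

line_bits_per_bx = LINK_GBPS * BX_NS                    # Gb/s * ns = bits
words_per_bx = line_bits_per_bx * PAYLOAD_FRACTION / 8  # 8-bit words / BX / fibre
towers_per_fibre = words_per_bx * SERIALISATION / WORDS_PER_TOWER

print(float(words_per_bx))      # 8-bit words per BX per fibre
print(float(towers_per_fibre))  # towers per fibre after 16 x serialisation
```

Under these assumptions a fibre carries slightly more than the 72 towers in phi, which is consistent with one fibre absorbing a complete eta segment.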
Processing Topology – New and Old
[Figures: η-φ maps of processing regions and fibers for the new and old schemes.]
New scheme: 3x3 jet tower finder (full phi resolution)
4x4 calorimeter towers / jet tower
3.75Gb/s links
Jet tower grid: 22x18 (88x72)
Data sharing – input fiber ratio: 160/88 = 1.82
Real input fiber count: 16x88 = ~1408
Old scheme: NN sharing
6.5Gb/s links
Data sharing – input fiber ratio: ~21888/680 = 32.19
Real input fiber count: 16x72x88/144 = ~680
Factor of two from link speed – need 6.5Gb/s to use old scheme
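As an illustration of why full phi coverage on one card helps, the sketch below sums 3x3 windows over a 22 (eta) x 18 (phi) grid of jet towers, with phi treated as periodic. It is a generic sliding-window sum written for this note, not the trigger algorithm from the slides. The function name `jet_sums_3x3` and the handling of the eta edges are assumptions.

```python
def jet_sums_3x3(towers):
    """Sum each 3x3 window of jet towers; phi wraps around, eta does not.

    towers[eta][phi] holds jet-tower energies. Windows centred on the eta
    edges simply omit the missing row.
    """
    n_eta, n_phi = len(towers), len(towers[0])
    sums = [[0] * n_phi for _ in range(n_eta)]
    for eta in range(n_eta):
        for phi in range(n_phi):
            total = 0
            for d_eta in (-1, 0, 1):
                e = eta + d_eta
                if not 0 <= e < n_eta:
                    continue
                for d_phi in (-1, 0, 1):
                    total += towers[e][(phi + d_phi) % n_phi]
            sums[eta][phi] = total
    return sums


grid = [[0] * 18 for _ in range(22)]   # 22 (eta) x 18 (phi) jet towers
grid[10][0] = 5                        # deposit at the phi seam
grid[10][17] = 7                       # neighbour across the wrap-around
sums = jet_sums_3x3(grid)
```

With the whole phi ring available locally, the wrap-around is a modulo operation rather than an exchange of data between cards. Only the eta edges would still need neighbour data.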
The Modular Trigger Crate – 3.5Gb/s, Partial Granularity
Can have a fully-redundant crate (spare fibres from TPG)
Redundant power & communications
Improvements in link speed = reduction in crate size or latency
Complete system test can be achieved with a small setup (e.g. debug)
[Figure: crate layout. Slots: PWR2, CMS AUX/DAQ, PWR1, MCH2, 10 x MATRIX, MINI-T, MCH1. Signals: CLK, DATA. Per-card labels: 12 8 8 8 8 8 8 8 8 12. Additional label: 20.]
Example 2 – FERMI, Trieste
4th generation Free Electron Laser (FEL)
http://www.elettra.trieste.it/FERMI
Linear accelerator, VUV-XRAY (10-100nm)
Extremely challenging (3GHz) RF control system
Tolerance: 0.1% amplitude, 0.1 degree phase
Precision (~20fs accuracy over 24 hours / 200m distance) RF timing system
Control / diagnostic system for RF cavities will use matrix card and LLRF board
Control system accuracy: 50ps clock resolution, synchronised at 16 stations
This will be achieved without a dedicated timing interface
Timing System Principle
A standard optical fiber has very similar path lengths (~ps) in each direction.
Any change in path length in fiber of a TX/RX pair is matched by the other.
If you have a timing reference at each end of a serial link with guaranteed constant phase relationship between them, you can measure the loop time and use it to measure the propagation delay from the master board (matrix) to the slave (LLRF), and therefore compensate for the delay.
Such guaranteed phase can be achieved by either:
1) A shared reference clock at both ends of a link.
2) An extremely accurate OCXO that can be used to track the recovered serial clock at the receiving end.
Given the available components in a Xilinx FPGA, the loop time can be measured consistently to an accuracy of ~50ps (Xilinx DCM limited).
(N.B. With a few tricks, you can possibly do better)
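The principle above reduces to simple arithmetic: if both fibre directions have the same length, the one-way delay is half of the measured loop time. The sketch below shows that step and rounds the result to the ~50ps resolution quoted on the slide. The function names, the `fixed_turnaround_ps` parameter and the example loop time are hypothetical and are not taken from the actual firmware or software.

```python
DCM_STEP_PS = 50  # measurement resolution quoted on the slide (ps)


def one_way_delay_ps(loop_time_ps, fixed_turnaround_ps=0):
    """Estimate master-to-slave delay from a measured loop time.

    Assumes the two fibre directions have equal length, so the one-way
    delay is half of the loop time once any known, constant turnaround
    delay at the slave (hypothetical parameter) has been subtracted.
    """
    return (loop_time_ps - fixed_turnaround_ps) / 2


def quantise_ps(value_ps, step_ps=DCM_STEP_PS):
    """Round a delay to the nearest step of the stated resolution."""
    return step_ps * round(value_ps / step_ps)


loop_ps = 2_000_470          # example measured loop time (made-up value)
delay_ps = one_way_delay_ps(loop_ps, fixed_turnaround_ps=30)
correction_ps = quantise_ps(delay_ps)
```

Because a change in fibre length affects both directions equally, repeating the measurement tracks drift in the one-way delay without needing a separate timing interface.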
Round-Trip Phase Compensation, Version I
[Diagram: round-trip link between the MATRIX and LLRF boards, each with a GTP TX / GTP RX pair. Blocks: CLK BRIDGE, RF CLK, DCM, TX CLK, PIPELINE DELAY, CTRL, LLRF CLK, LLRF DATA. Delay labels: δc, δR+δFRPCBi, δT+δNTPCBi, δT+δFTPCBi, δR+δNRPCBi, δCB1, δCB2, δFL, δFCB, δFDCM.]
NOTE: Control logic not shown
Round-Trip Phase Compensation, Version II
[Diagram: round-trip link between the MATRIX and LLRF boards, each with a GTP TX / GTP RX pair. Blocks: OCXO, DPLL, RX REC CLK, CMP, CLK BRIDGE, RF CLK, DCM, TX CLK, PIPELINE DELAY, CTRL, LLRF CLK, LLRF DATA. Delay labels: δc, δR+δFRPCBi, δT+δNTPCBi, δT+δFTPCBi, δR+δNRPCBi, δCB1, δCB2, δFL, δFCB, δFDCM.]
NOTE: Control logic not shown
Timing System Details
Version I has been implemented, mostly finished (calibration in software at the moment)
Caveat: 1 serial UI of timing variability seen on one channel in matrix card
This needs further study, hard to reproduce and doesn’t occur on all channels
Possible to correct for this effect using a loopback technique
Xilinx datasheet implies this is an artifact of the way V5 MGTs work…
…but they don’t tell you the details…
Backup: Use LVDS @ 1Gb/s, which has completely deterministic behaviour
Conclusions
The Matrix Card is an extremely flexible device with many applications
A large part of this flexibility comes from evolution in FPGA technology
The addition of the cross-point switch provides significant extra flexibility