The World Wide Distributed Computing Architecture of the LHC Datagrid


Transcript of The World Wide Distributed Computing Architecture of the LHC Datagrid

Page 1: The World Wide Distributed Computing Architecture of the LHC Datagrid

Distributed Data Management for LHC

Dirk Duellmann CERN, Geneva

Accelerating Science and Innovation


Page 2: The World Wide Distributed Computing Architecture of the LHC Datagrid

July 4th 2012: "The Status of the Higgs Search", J. Incandela for the CMS Collaboration.

H → γγ candidate event.

S/B-weighted mass distribution: the sum of the mass distributions for each event class, weighted by S/B, where B is the integral of the background model over a constant signal-fraction interval.

ATLAS: Status of SM Higgs searches, 4/7/2012. Evolution of the excess with time (energy-scale systematics not included).

Page 3: The World Wide Distributed Computing Architecture of the LHC Datagrid


Founded in 1954: “Science for Peace”

Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom

Candidate for Accession: Romania

Associate Members in the Pre-Stage to Membership: Israel, Serbia

Applicant States: Cyprus, Slovenia, Turkey

Observers to Council: India, Japan, the Russian Federation, the United States of America, Turkey, the European Commission and UNESCO

~2300 staff, ~1050 other paid personnel, ~11000 users; budget (2012) ~1000 MCHF

CERN: 20 member states

Page 4: The World Wide Distributed Computing Architecture of the LHC Datagrid


Global Science: 11000 scientists

Page 5: The World Wide Distributed Computing Architecture of the LHC Datagrid


Page 6: The World Wide Distributed Computing Architecture of the LHC Datagrid


Stars and planets account for only a small percentage of the universe!

Page 7: The World Wide Distributed Computing Architecture of the LHC Datagrid

CERN / May 2011

Page 8: The World Wide Distributed Computing Architecture of the LHC Datagrid

The Large Hadron Collider: the largest superconducting installation in the world

• 27 kilometre circle
• proton collisions at 7+7 TeV
• 10,000 magnets
• 8000 km of superconducting cable
• 120 t of liquid helium

Page 9: The World Wide Distributed Computing Architecture of the LHC Datagrid


Precision! The 27 km long ring is sensitive to <1 mm changes:

• tides
• stray currents
• rainfall

Page 10: The World Wide Distributed Computing Architecture of the LHC Datagrid


The ATLAS Cavern

• 140,000 m³ of rock removed
• 53,000 m³ of concrete
• 6,000 tons of steel reinforcement
• 55 metres long
• 30 metres wide
• 53 metres high (a 10-storey building)

Page 11: The World Wide Distributed Computing Architecture of the LHC Datagrid


A collision at the LHC


Page 12: The World Wide Distributed Computing Architecture of the LHC Datagrid
Page 13: The World Wide Distributed Computing Architecture of the LHC Datagrid


The Data Acquisition for one Experiment

Page 14: The World Wide Distributed Computing Architecture of the LHC Datagrid

Tier 0 at CERN: Acquisition, First Reconstruction, Storage & Distribution


Data rates in 2011: 400-500 MB/sec and 4-6 GB/sec; 1.25 GB/sec for heavy-ion running.

Page 15: The World Wide Distributed Computing Architecture of the LHC Datagrid


The LHC Computing Challenge

• Signal/noise: 10^-13 (10^-9 offline)
• Data volume: high rate × large number of channels × 4 experiments → ~15 PB of new data each year (→ ~30 PB in 2012); a back-of-the-envelope check follows below
• Compute power: event complexity × number of events × thousands of users → 200k CPUs and 45 PB of disk storage (→ 300k CPUs and 170 PB)
• Worldwide analysis & funding: computing is funded locally in major regions & countries; efficient analysis everywhere → GRID technology
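As a rough cross-check of the data-volume figure, the sketch below multiplies an assumed per-experiment trigger rate, raw event size and yearly live time; the three input numbers are illustrative assumptions, not values taken from the slide.

```python
# Back-of-the-envelope check of the "~15 PB of new data each year" figure.
# The trigger rate, event size and live time are illustrative assumptions.
trigger_rate_hz = 300      # recorded events per second, per experiment
event_size_mb   = 1.5      # raw event size in MB
live_seconds    = 1.0e7    # roughly one LHC running year
n_experiments   = 4

pb_per_year = (trigger_rate_hz * event_size_mb * live_seconds
               * n_experiments) / 1.0e9          # MB -> PB
print(f"~{pb_per_year:.0f} PB of new raw data per year")   # prints "~18 PB ..."
```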

Page 16: The World Wide Distributed Computing Architecture of the LHC Datagrid

CERN Computer Centre

The CERN computer centre:
• Built in the 1970s on the CERN site
• ~3000 m² (in three machine rooms)
• 3.5 MW for equipment

A recent extension:
• Located at Wigner (Budapest, Hungary)
• ~1000 m²
• 2.7 MW for equipment
• Connected to CERN with 2×100 Gb links

Page 17: The World Wide Distributed Computing Architecture of the LHC Datagrid

World Wide Grid – what and why?

• A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
• Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
• The resources are distributed, for funding and sociological reasons
• Our task was to make use of the resources available to us, no matter where they are located

Tier-0 (CERN): data recording, initial data reconstruction, data distribution

Tier-1 (11 centres): permanent storage, re-processing, analysis

Tier-2 (~130 centres): simulation, end-user analysis
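To make the division of labour concrete, here is a toy encoding of the tier roles listed above; it is purely illustrative and not part of any WLCG middleware.

```python
# Toy encoding of the WLCG tier roles listed above (illustrative only).
TIER_MODEL = {
    "Tier-0": {"sites": 1,   "roles": {"data recording",
                                       "initial data reconstruction",
                                       "data distribution"}},
    "Tier-1": {"sites": 11,  "roles": {"permanent storage",
                                       "re-processing",
                                       "analysis"}},
    "Tier-2": {"sites": 130, "roles": {"simulation",
                                       "end-user analysis"}},
}

def tiers_for(role: str) -> list[str]:
    """Return the tiers that take on a given kind of work."""
    return [tier for tier, info in TIER_MODEL.items() if role in info["roles"]]

print(tiers_for("re-processing"))      # ['Tier-1']
print(tiers_for("end-user analysis"))  # ['Tier-2']
```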

Page 18: The World Wide Distributed Computing Architecture of the LHC Datagrid

• The grid really works
• All sites, large and small, can contribute, and their contributions are needed!


CPU – around the Tiers

[Pie chart: CPU delivered, January 2011, by site: CERN, BNL, CNAF, KIT, NL-LHC/Tier-1, RAL, FNAL, CC-IN2P3, ASGC, PIC, NDGF, TRIUMF, plus the combined Tier-2s.]

[Pie chart: Tier-2 CPU delivered by country, January 2011: USA, UK, France, Germany, Italy, Russian Federation, Spain, Canada, Poland, Switzerland, Slovenia, Czech Republic, China, Portugal, Japan, Sweden, Israel, Romania, Belgium, Austria, Hungary, Taipei, Australia, Republic of Korea, Norway, Turkey, Ukraine, Finland, India, Pakistan, Estonia, Brazil, Greece.]

Page 19: The World Wide Distributed Computing Architecture of the LHC Datagrid


Evolution of capacity: CERN & WLCG

[Charts: WLCG CPU growth and WLCG disk growth, 2008-2013, split into CERN, Tier-1 and Tier-2 contributions; CERN computing capacity, 2005-2013.]

2013/14: modest increases to process "parked data". From 2015: budget limited? Experiments will push trigger rates, while flat budgets give ~20%/year capacity growth (a quick compounding check follows below).

Chart annotations: "What we thought was needed at LHC start" vs. "What we actually used at LHC start!"
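The snippet below simply compounds the "~20% per year" figure under a flat budget; the growth rate is the only number taken from the slide, and the year range is illustrative.

```python
# Capacity growth under a flat budget, assuming ~20%/year better
# price/performance (rate from the slide; years are illustrative).
capacity = 1.0                       # normalised capacity in 2013
for year in range(2013, 2019):
    print(f"{year}: {capacity:.2f}x")
    capacity *= 1.20                 # ~20% more capacity per year
# By 2018 the same budget buys roughly 2.5x the 2013 capacity.
```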

Page 20: The World Wide Distributed Computing Architecture of the LHC Datagrid

LHC Networking

• Relies on OPN, GEANT, US-LHCNet
• NRENs & other national & international providers

Page 21: The World Wide Distributed Computing Architecture of the LHC Datagrid


Evolution of computing models

Hierarchy → Mesh

Page 22: The World Wide Distributed Computing Architecture of the LHC Datagrid

Physics Storage @ CERN: CASTOR and EOS

Storage systems developed at CERN. CASTOR and EOS use the same commodity disk servers:

• RAID-1 for CASTOR: 2 copies in the mirror
• JBOD with RAIN for EOS: replicas spread over different disk servers, with tunable redundancy
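To illustrate the difference between RAID-1 mirroring inside one box and EOS-style replication across boxes, here is a minimal placement sketch; the server names, replica counts and random placement policy are made-up simplifications, not the actual EOS scheduler.

```python
import random

# Made-up pool of disk servers; the real EOS scheduler also considers
# free space, load and placement groups.
DISK_SERVERS = [f"diskserver{i:02d}.cern.ch" for i in range(1, 11)]

def place_replicas(path: str, n_replicas: int = 2) -> list[str]:
    """Spread replicas over *different* disk servers, so losing a whole
    server never loses every copy (unlike a RAID-1 mirror, where both
    copies live inside the same box)."""
    if n_replicas > len(DISK_SERVERS):
        raise ValueError("more replicas requested than servers available")
    return random.sample(DISK_SERVERS, n_replicas)

# "Tunable redundancy": the replica count is just a per-file parameter.
print(place_replicas("/eos/demo/run0001.root", n_replicas=2))
print(place_replicas("/eos/demo/precious.root", n_replicas=3))
```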

Page 23: The World Wide Distributed Computing Architecture of the LHC Datagrid


CASTOR - Physics Data Archive


Data:
• ~90 PB of data on tape; 250 M files
• Up to 4.5 PB of new data per month
• Over 10 GB/s (read + write) peaks

Infrastructure:
• ~52,000 tapes (1 TB, 4 TB, 5 TB)
• 9 robotic libraries (IBM and Oracle)
• 80 production + 30 legacy tape drives

Page 24: The World Wide Distributed Computing Architecture of the LHC Datagrid


EOS Usage at CERN Today

[Headline figures from the slide: 44.8 PB; 32.1 PB; 136 (279) million files; 20.7k.]

Page 25: The World Wide Distributed Computing Architecture of the LHC Datagrid

Availability and Performance

[Charts: archival & data distribution vs. user analysis; usage peaks during pp running in 2012 and pA running in 2013.]

Page 26: The World Wide Distributed Computing Architecture of the LHC Datagrid

CERN openlab in a nutshell

• A science-industry partnership to drive R&D and innovation, with over a decade of success
• Evaluate state-of-the-art technologies in a challenging environment and improve them
• Test in a research environment today what will be used in many business sectors tomorrow
• Train the next generation of engineers/employees
• Disseminate results and reach out to new audiences

Page 27: The World Wide Distributed Computing Architecture of the LHC Datagrid



Ongoing R&D: e.g. Cloud Storage

• CERN openlab joint project since January 2012: testing scaling and TCO gains with prototype applications
• Huawei S3 storage appliance (0.8 PB)
• Logical replication
• Fail-in-place
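As a sketch of the kind of prototype test such a project might run against an S3-style appliance, the snippet below writes and reads back a batch of objects with the boto3 client; the endpoint, credentials, bucket and object names are invented, and the slide does not say which client the actual prototypes used.

```python
import boto3

# Invented endpoint and credentials for an S3-compatible appliance.
s3 = boto3.client(
    "s3",
    endpoint_url="http://s3-appliance.example.cern.ch",
    aws_access_key_id="TEST_KEY",
    aws_secret_access_key="TEST_SECRET",
)

bucket = "scaling-test"
s3.create_bucket(Bucket=bucket)

# Write and read back 1 MB objects to exercise throughput and latency.
payload = b"x" * (1024 * 1024)
for i in range(100):
    key = f"prototype/object-{i:06d}"
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    assert body == payload
```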

Page 28: The World Wide Distributed Computing Architecture of the LHC Datagrid

Thanks for your attention!

More at http://cern.ch

Accelerating Science and Innovation