Offline Discussion


Transcript of Offline Discussion

Page 1: Offline Discussion

Offline Discussion

M. Moulson, 22 October 2004

• Datarec status
• Reprocessing plans
• MC status
• MC development plans
• Linux
• Operational issues
• Priorities
• AFS/disk space

Page 2: Offline Discussion

Datarec DBV-20 (Run > 31690)

DC geometry updated

Global shift: Δy = −550 μm, Δz = −1080 μm
Implemented in datarec for Run > 28000
Thickness of DC wall not changed (75 μm)

Modifications to DC timing calibrations
Independence from EmC timing calibrations

Modifications to event classification (EvCl)
New KSTAG algorithm (KS tagged by vertex in DC)
Bunch spacing by run number in T0_FIND step 1 for ksl

2.715 ns for 2004 data (also for MC, some 2000 runs)

Boost values
Runs not reconstructed without BMOM v.3 in HepDB
px values from BMOM(3) now used in all EvCl routines

Page 3: Offline Discussion


Datarec operations

Runs 28479 (29 Apr) to 32380 (21 Oct, 00:00)
413 pb⁻¹ to disk with tag OK
394 pb⁻¹ with tag = 100 (no problems)
388 pb⁻¹ with full calibrations
371 pb⁻¹ reconstructed (96%)
247 pb⁻¹ DSTs (except KK)

fsun03-fsun10 decommissioned 11 Oct
Necessary for installation of new tape library
datarec submission moved from fsun03 to fibm35
DST submission moved from fsun04 to fibm36

150 keV offset in √s discovered!

Page 4: Offline Discussion

150 keV offset in √s

Discovered while investigating ~100 keV discrepancies between physmon and datarec

+150 keV adjustment to fit value of √s not implemented:
• in physmon
• in datarec
• when final BVLAB √s values written to HepDB

Plan of action:
1. New Bhabha histogram for physmon fit, taken from data
2. Sync datarec fit with physmon
3. Fix BVLAB fit before final 2004 values computed
4. Update 2001-2002 values in DB records

histogram_history and HepDB BMOM 2001-2002 currently from BVLAB scan; need to add 150 keV

Update of HepDB technically difficult, need a solution

Page 5: Offline Discussion


Reprocessing plans

Issues of compatibility with MC:
• DC geometry, T0_FIND modifications by run number
• DC timing modifications do not impact MC chain
• Additions to event classification would require new MCDSTs only

In principle possible to use run number range to fix px values for backwards compatibility

Use batch queues?
Main advantage: increased stability

Page 6: Offline Discussion


Further datarec modifications

Modification of inner DC wall thickness (75 μm)
Implement by run number

Cut DC hits with drift times > 2.5 μs
Suggested by P. de Simone in May to reduce fraction of split tracks
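
A minimal sketch of how a run-number-gated hit cut could look (illustrative Python; the actual datarec code is Fortran, and the run boundary below is a placeholder):

```python
# Illustrative sketch only: a drift-time cut enabled by run-number range.
# The 2.5 us threshold is from this slide; the run boundary and all names
# are placeholders, not the actual datarec (Fortran) implementation.
DRIFT_TIME_CUT_US = 2.5
CUT_ENABLED_FROM_RUN = 28479   # placeholder: start of the run range discussed here

def keep_dc_hit(run_number: int, drift_time_us: float) -> bool:
    """Keep a DC hit unless the cut is active for this run and the drift time is too long."""
    if run_number < CUT_ENABLED_FROM_RUN:
        return True
    return drift_time_us <= DRIFT_TIME_CUT_US
```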

Others?

Page 7: Offline Discussion

MC production status

Program              Events (10⁶)   LSF    Time (B80 days)   Size (TB)
ee                   36             6      120               0.8
e+e (ISR only)       36             6      120               0.8
rad                  114            5      480               1.7
ee ee                38             0.15   220               0.6
all                  252            0.2    1100              6.9
all (21 pb⁻¹ scan)   29             1      130               0.7
KSKL                 411            1      2100              11.0
KK                   611            1      2620              18.0
Total                1527           –      6890              40.5
KSKL rare            62             20*    320 (est.)        1.7 (est.)

Page 8: Offline Discussion


Generation of rare KSKL events


Peak cross section: 7.5 nb
Approx 2× sum of BRs for rare KL channels

In each event, either KS or KL decays to rare mode

Random selection

Scale factor of 20 applies to KL

For KS, scale factor is ~100
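
A toy sketch of the selection scheme just described; the channel names and the 50/50 choice below are illustrative placeholders, not the actual KLOE generator configuration:

```python
# Toy sketch of rare-KSKL generation: in each event exactly one of the two
# kaons is forced into a rare mode, the choice being random.  The rare
# branching ratios are enhanced by the scale factors quoted above, so the
# sample corresponds to ~20x (KL) or ~100x (KS) the data luminosity in
# those channels.  Channel lists are placeholders.
import random

RARE_KS_MODES = ["KS -> rare mode A", "KS -> rare mode B"]   # placeholders
RARE_KL_MODES = ["KL -> rare mode C", "KL -> rare mode D"]   # placeholders
KS_SCALE = 100   # ~100x enhancement for rare KS modes (from this slide)
KL_SCALE = 20    # 20x enhancement for rare KL modes (from this slide)

def generate_event():
    """Return (rare decay chosen, which kaon took it, luminosity scale factor)."""
    if random.random() < 0.5:                    # illustrative 50/50 split
        return random.choice(RARE_KS_MODES), "KS", KS_SCALE
    return random.choice(RARE_KL_MODES), "KL", KL_SCALE
```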

Page 9: Offline Discussion

MC development plans

Beam pipe geometry for 2004 data (Bloise)

LSB insertion code (Moulson)

Fix generator (Nguyen, Bini)

Improve MC-data consistency on tracking resolution (Spadaro, others)

• MC has better core resolution and smaller tails than data in the Emiss − pmiss distribution in background for the KS → πeν analysis

• Improving agreement would greatly help for precision studies involving signal fits, spectra, etc.

• Need to systematically look at other topologies/variables

• Need more people involved

Page 10: Offline Discussion


Linux software for KLOE analysis

P. Valente had completed an earlier port based on free software

VAST F90-to-C preprocessor
Clunky to build and maintain

M. Matsyuk has completed a KLOE port based on the Intel Fortran compiler for Linux

Individual, non-commercial license is free
libkcp code compiles with zero difficulty

Reconsider issues related to maintenance of KLOE software for Linux

Page 11: Offline Discussion

Linux usage in KLOE analysis

Most users currently processing YBOS DSTs into Ntuples on farm machines and transferring Ntuples to PCs

• AFS does not handle random-access data well, i.e., writing CWNs as analysis output

• Multiple jobs on a single farm node stress AFS cache
• Farm CPU (somewhat) limited
• AFS disk space perennially at a premium

KLOE software needs are minimal for most analysis jobs
• YBOS to Ntuple: no DC reconstruction, etc.

Analysis jobs on user PCs accessing DSTs via KID and writing Ntuples locally should be quite fast

Continuing interest on part of remote users

Page 12: Offline Discussion

KLOE software on Linux: Issues

1. Linux machines at LNF for hosting/compilation

3 of 4 Linux machines in Computer Center are down, including klinux (mounts /kloe/soft, used by P. Valente for VAST build)

2. KLOE code distribution
User PCs do not mount /kloe/soft
Move /kloe/soft to network-accessible storage?
Use CVS for distribution?

Elegant solution but user must periodically update…

3. Individual users must install Intel compiler

4. KID
Has been built for Linux in the past

5. Priority/manpower

Page 13: Offline Discussion


Operational issues

Offline expert training
1-2 day training course for all experts
General update

PC backup system
Commercial tape backup system available to users to back up individual PCs

Page 14: Offline Discussion


Priorities and deadlines

In order of priority, for discussion:

1. Complete MC production: KSKL rare

2. Reprocessing

3. MC diagnostic work

4. Other MC development work for 2004

5. Linux

Deadlines?

Page 15: Offline Discussion


Disk resources

Current recalled areas

Production 0.7 TB

User recalls 2.1 TB

DST cache: 12.9 TB

(10.2 TB added in April)

2001 – 2002

Total DSTs: 7.4 TB

Total MCDSTs: 7.0 TB

2004

DST volume scales with L
3.2 TB added to AFS cell

Not yet assigned to analysis groups

2.0 TB available but not yet installed
Reserved for testing new network-accessible storage solutions

Page 16: Offline Discussion

Limitations of AFS

Initial problems with random-access files blocking AFS on farm machines resolved

Nevertheless, AFS has some intrinsic limitations:

Volume sizes at most 100 GB
• Already pushed to the limit – max spec is 8 GB!

Cache must be much larger than AFS-directed data volume for all jobs on farm machine

• Problem characteristic of random-access files (CWNs)
• Current cache sizes 3.5 GB on each farm machine

More than sufficient for a single job
Possible problems with 4 big jobs/machine

• Enlarging cache sizes requires purchase of more local disk for farm machines

Page 17: Offline Discussion

Network storage: Future solutions

Possible alternatives to AFS

1. NFS v.4
• Kerberos authentication: use klog as with AFS
• Size of data transfers smaller; expect fewer problems with random-access files

2. Storage Area Network (SAN) filesystem
• Currently under consideration as a Grid solution
• Works only with Fibre Channel (FC) interfaces
• FC – SCSI/IP interface implemented in hardware/software

Availability expected in 2005

Migration away from AFS probable within ~6 months
2 TB allocated to tests of new network storage solutions
Current AFS system will remain as interim solution

Page 18: Offline Discussion

Current AFS allocations

Volume   Space (GB)   Working group
cpwrk    195          Neutral K
kaon     170          Neutral K
kwrk     200          Charged K
phidec   400          Radiative
ecl      149
mc       90
recwrk   30
trg      100
trk      90

Working group totals: Neutral K 365 GB, Charged K 200 GB, Radiative 400 GB

Page 19: Offline Discussion


A fair proposal?

Each of the 3 physics WGs gets 1400 GB total

Total disk space (incl. already installed) divided equally

• Physics WGs similar in size and diversity of analyses
• WGs can make intelligent use of space

e.g.: some degree of Ntuple sharing already present
• Substantial increases for everyone anyway
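
One way the arithmetic behind the 1400 GB figure could work out, assuming the 3.2 TB just added to the AFS cell is pooled with the space already allocated to the three physics WGs (my assumption; the slide does not spell this out):

```python
# Hedged arithmetic sketch for the proposal above (assumption: the new
# 3.2 TB is pooled with the existing physics-WG allocations and split 3 ways).
current_wg_gb = {"Neutral K": 195 + 170, "Charged K": 200, "Radiative": 400}
new_afs_gb = 3200                                    # added to AFS cell in 2004

total_gb = sum(current_wg_gb.values()) + new_afs_gb  # 965 + 3200 = 4165 GB
per_wg_gb = total_gb / 3                             # ~1390 GB, i.e. ~1400 GB each
print(f"~{per_wg_gb:.0f} GB per working group")
```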

Page 20: Offline Discussion


Additional information

Page 21: Offline Discussion


Offline CPU/disk resources for 2003

Available hardware:

23 IBM B80 servers: 92 CPU’s

10 Sun E450 servers: 18 B80 CPU-equivalents

6.5 TB NFS-mounted recall disk cache

Easy to reallocate between production and analysis

Allocation of resources in 2003:

64 to 76 CPU’s on IBM B80 servers for production

800 GB of disk cache for I/O staging

Remainder of resources open to users for analysis

Page 22: Offline Discussion


Analysis environment for 2003

Production of histograms/Ntuples on analysis farm:

4 to 7 IBM B80 servers + 2 Sun E450 servers

DST’s latent on 5.7 TB recall disk cache

Output to 2.3 TB AFS cell accessed by user PC’s

Analysis example:

440M KSKL events, 1.4 TB DST’s

6 days elapsed for 6 simultaneous batch processes

Output on order of 10-100 GB

Final-stage analysis on user PC/Linux systems

Page 23: Offline Discussion

CPU power requirements for 2004

[Charts vs. year (2001, 2002, 2004): input rate (kHz); average luminosity (10³⁰ cm⁻²s⁻¹); and B80 CPUs needed to follow acquisition, broken into recon, DST and MC, with the 76-CPU offline farm level indicated.]

Page 24: Offline Discussion


CPU/disk upgrades for 2004

Additional servers for offline farm:

10 IBM p630 servers: 10×4 POWER4+ 1.45 GHz

Adds more than 80 B80 CPU equivalents to offline farm

Additional 20 TB disk space

To be added to DST cache and AFS cell

More resources already allocated to users

8 IBM B80 servers now available for analysis

Can maintain this allocation during 2004 data taking

Ordered, expected to be on-line by January

Page 25: Offline Discussion


Installed tape storage capacity

IBM 3494 tape library:
• 12 Magstar 3590 drives, 14 MB/s read/write
• 60 GB/cartridge (upgraded from 40 GB this year)
• 5200 cartridges (5400 slots)
• Dual active accessors
• Managed by Tivoli Storage Manager

Maximum capacity: 312 TB (5200 cartridges)

Currently in use: 185 TB
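
A quick check of the capacity figures above (straightforward arithmetic):

```python
# Quick arithmetic check of the installed capacity quoted above.
cartridges = 5200
gb_per_cartridge = 60
capacity_tb = cartridges * gb_per_cartridge / 1000   # = 312 TB maximum capacity
free_tb = capacity_tb - 185                           # 185 TB currently in use
print(f"{capacity_tb:.0f} TB max, {free_tb:.0f} TB still free")
```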

Page 26: Offline Discussion

Tape storage requirements for 2004

[Charts: stored volume by type (GB/pb⁻¹) for raw, recon, DST and MC, comparing 2002 with the 2004 estimate including streaming modifications; and tape library usage (TB) today and after +780, +1210 and +2000 pb⁻¹, broken into free, raw, recon, DST and MC.]

Page 27: Offline Discussion


Tape storage for 2004

Additional IBM 3494 tape library:
• 6 Magstar 3592 drives: 300 GB/cartridge, 40 MB/s
• Initially 1000 cartridges (300 TB)
• Slots for 3600 cartridges (1080 TB)
• Remotely accessed via FC/SAN interface
• Definitive solution for KLOE storage needs

Call for tender (bando di gara) submitted to the Gazzetta Ufficiale
Reasonably expect 6 months to delivery

Current space sufficient for a few months of new data

Page 28: Offline Discussion

Machine background filter for 2004

Background filter (FILFO) last tuned on 1999-2000 data

5% inefficiency for φ events, varies with background level
Mainly traceable to cut to eliminate degraded Bhabhas
Removal of this cut: reduces inefficiency to 1%

Increases stream volume 5-10%
Increases CPU time 10-15%

New downscale policy for bias-study sample:
Fraction of events not subject to veto, written to streams

Need to produce bias-study sample for 2001-2002 data
To be implemented as reprocessing of a data subset with new downscale policy
Will allow additional studies on FILFO efficiency and cuts
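
A sketch of the downscale idea described above; the 1-in-N fraction and the names are placeholders, not the actual policy:

```python
# Sketch of a bias-study downscale: a small fixed fraction of events is
# written to the streams regardless of the FILFO veto, flagged so the
# filter inefficiency can be measured later.  The 1:100 fraction and all
# names are placeholders.
DOWNSCALE = 100   # keep 1 event in 100 independently of the FILFO decision

def stream_decision(event_number: int, filfo_vetoed: bool):
    """Return (write_event, is_bias_sample)."""
    is_bias_sample = (event_number % DOWNSCALE) == 0
    if is_bias_sample:
        return True, True            # kept even if FILFO would have vetoed it
    return (not filfo_vetoed), False
```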

Page 29: Offline Discussion

Other offline modifications for 2004

Modifications to physics streaming:

Bhabha stream: keep only a subset of radiative events
Reduces Bhabha stream volume by a factor of 4
Reduces overall stream volume by >40%

KSKL stream: clean up choice of tags to retain

Reduces KSKL stream volume by 35%

KK stream: new tag using dE/dx
Fully incorporate dE/dx code into reconstruction
Eliminate older tags, will reduce stream volume

Random trigger as source of MC background for 2004
20 Hz of random triggers synched with beam crossing allows background simulation for L up to 2×10³² cm⁻²s⁻¹

Page 30: Offline Discussion

KLOE computing resources

Tape library (324 TB): IBM 3494, 5400 60-GB slots, 2 robots, TSM; 12 Magstar E1A drives, 14 MB/s each
Managed disk space (6.5 TB): 0.8 TB SSA, offline staging; 2.2 TB SSA + 3.5 TB FC, latent disk cache
Offline farm: 19 IBM B80 (4×POWER3 375), 8 Sun E450 (4×UltraSPARC-II 400)
AFS cell: 2 IBM H70 (4×RS64-III 340), 1.7 TB SSA + 0.5 TB FC disk
Online farm: 7 IBM H50 (4×PPC604e 332), 1.4 TB SSA disk
Analysis farm: 4 IBM B80 (4×POWER3 375), 2 Sun E450 (4×UltraSPARC-II 400)
File servers: 2 IBM H80 (6×RS64-III 500)
DB2 server: IBM F50 (4×PPC604e 166)

[Diagram: systems interconnected through a Cisco Catalyst 6000 switch, with NFS and AFS mounts over 100 Mbps and 1 Gbps links.]

Page 31: Offline Discussion

2004 CPU estimate: details

Extrapolated from 2002 data with some MC input

2002:
L = 36 μb⁻¹/s
T3 = 1560 Hz: 345 Hz φ + Bhabha, 680 Hz unvetoed CR, 535 Hz bkg

2004:
L = 100 μb⁻¹/s (assumed)
T3 = 2175 Hz: 960 Hz φ + Bhabha, 680 Hz unvetoed CR, 535 Hz bkg (assumed constant)

From MC: σ(φ) = 3.1 μb (assumed)
φ + Bhabha trigger: σ = 9.6 μb; φ + Bhabha FILFO: σ = 8.9 μb
CPU(φ + Bhabha) = 61 ms avg.

CPU time calculation:
4.25 ms to process any event
+ 13.6 ms for 60% of bkg evts
+ 61 ms for 93% of φ + Bhabha evts

2002: 19.6 ms/evt overall – OK
2004: 31.3 ms/evt overall (10%)
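
A worked check of the per-event averages quoted above, using only the rates and per-class costs on this slide (the CPU-count line assumes the quoted times are B80-CPU milliseconds and covers reconstruction only):

```python
# Reproduce the average CPU/event figures above from the trigger composition
# and the per-class processing costs quoted on this slide.
def avg_cpu_ms(phi_bhabha_hz, cosmic_hz, bkg_hz):
    total_hz = phi_bhabha_hz + cosmic_hz + bkg_hz
    avg = 4.25                                      # ms, every event
    avg += 13.6 * 0.60 * bkg_hz / total_hz          # 60% of bkg events
    avg += 61.0 * 0.93 * phi_bhabha_hz / total_hz   # 93% of phi + Bhabha events
    return avg, total_hz

for year, rates in (("2002", (345, 680, 535)), ("2004", (960, 680, 535))):
    ms, hz = avg_cpu_ms(*rates)
    cpus = hz * ms / 1000.0   # B80 CPUs needed to keep up with the T3 rate (recon only)
    print(f"{year}: {ms:.1f} ms/evt, ~{cpus:.0f} B80 CPUs")   # -> 19.6 and 31.3 ms/evt
```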

Page 32: Offline Discussion

2004 tape space estimate: details

2001: 274 GB/pb⁻¹
2002: 118 GB/pb⁻¹

Highly dependent on luminosity

2004: Estimate a priori

Raw: assume 2175 Hz @ 2.6 kB/evt
Raw event size assumed same for all events (has varied very little with background over KLOE history)

Assume: L = 100 μb⁻¹/s

1 pb⁻¹ = 10⁴ s:
25.0 GB for 9.6M physics evts
31.7 GB for 12.2M bkg evts

(1215 Hz of bkg for 10⁴ s)

56.7 GB/pb⁻¹ total
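
The same numbers follow directly from the assumed rates and event size (a quick check):

```python
# Quick check of the raw-volume estimate above.
EVENT_SIZE_KB = 2.6        # assumed raw event size
SECONDS_PER_PB = 1.0e4     # 1 pb^-1 at L = 100 ub^-1/s

physics_hz = 960           # phi + Bhabha rate (2004 assumption)
bkg_hz = 680 + 535         # unvetoed cosmic rays + machine background = 1215 Hz

physics_gb = physics_hz * SECONDS_PER_PB * EVENT_SIZE_KB / 1e6   # ~25.0 GB
bkg_gb     = bkg_hz     * SECONDS_PER_PB * EVENT_SIZE_KB / 1e6   # ~31.6 GB
print(f"{physics_gb:.1f} + {bkg_gb:.1f} = {physics_gb + bkg_gb:.1f} GB/pb^-1 raw")
```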

Recon: include effects of streaming changes:

Stream      2001-2002 (GB/pb⁻¹)   2004 (GB/pb⁻¹)
KK          11.6                  11.6
KSKL        19.7                  12.8
            3.3                   3.3
radiative   6.4                   6.4
Bhabha      56.0                  14.0
other       0.8                   0.8
Total       98                    49

MC: assumes 1.7M evt/pb⁻¹ produced, all (1:5) and KSKL (1:1)