

ALICE Data Challenge Experience
Stefano Bagnasco, I.N.F.N. Torino


Contents

  • ALICE Data Challenge 2004

  • Production infrastructure: AliEn + LCG (+ INFNGRID)

  • Interfacing strategy

  • Software installation

  • Round 1 experience

  • Lessons learned











Available resources

  • Several AliEn “native” sites (some rather large)

    • CERN, CNAF, Catania, Cyfronet, FZK, JINR, LBL, Lyon, OSC, Prague, Torino
  • LCG-2 core sites

    • CERN, CNAF, FZK, NIKHEF, RAL, Taiwan (more than 1000 CPUs)
  • GRID.IT sites

    • LNL.INFN, PS.INFN and several smaller ones (about 400 CPUs not including CNAF)
  • Strategy: interface AliEn to LCG, use AliEn as a front-end for production, manage LCG resources through a “gateway”
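The gateway idea can be illustrated with a toy model. This is a minimal sketch, not the actual AliEn or LCG code: the class `GatewaySite`, its `pull` method, and the assumption that the gateway pulls jobs from a central queue whenever the backing grid has free slots are all hypothetical. Only the site name `Alice::CERN::LCG` comes from the slides.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GatewaySite:
    """Hypothetical gateway: appears as one site to the front-end,
    but forwards the jobs it receives to a whole external grid."""
    name: str
    free_slots: int  # slots the backing grid is assumed to offer right now
    forwarded: list = field(default_factory=list)

    def pull(self, queue: deque) -> int:
        """Take jobs from the central queue while free slots remain."""
        taken = 0
        while queue and self.free_slots > 0:
            self.forwarded.append(queue.popleft())
            self.free_slots -= 1
            taken += 1
        return taken


# Five jobs waiting in the (hypothetical) central queue.
queue = deque(f"job-{i}" for i in range(5))
gateway = GatewaySite("Alice::CERN::LCG", free_slots=3)
taken = gateway.pull(queue)
```

In this sketch the front-end never needs to know how many sites sit behind the gateway; it only sees one more site asking for work.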











ALICE Data Challenge

  • Phase 1 first round (central events) ended April 5

    • 23912 events generated (current model is 1 event per job)
    • Average job running time 10 to 15 hours
    • Up to 1450 jobs running simultaneously, 627 on average (403 on AliEn resources; 207 on LCG, with a maximum of 767)
    • 157.8 MSI-2K hours of CPU (84.9 MSI-2K hours on AliEn native resources, 72.9 MSI-2K hours on LCG resources)
    • 15.6 TB of data generated in 789096 files
  • No reliable SE or Replica Management available on LCG-2

    • Produced files were always moved to CERN CASTOR upon completion (CASTOR problems stopped data production on several occasions)
    • Produced files registered in the AliEn Data Catalogue only
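A few derived figures follow directly from the round 1 totals above. The sketch below only does arithmetic on the quoted numbers; the variable names are illustrative, and the conversion assumes decimal units (1 TB = 10^6 MB), which the slides do not specify.

```python
# Round 1 totals as quoted on the slide.
events = 23912
files = 789096
data_tb = 15.6
cpu_total = 157.8  # MSI-2K hours
cpu_alien = 84.9   # MSI-2K hours on AliEn native resources
cpu_lcg = 72.9     # MSI-2K hours on LCG resources

# Files written per event (one event per job): exactly 33.
files_per_event = files / events

# Average sizes, assuming decimal units (1 TB = 1e6 MB).
avg_file_mb = data_tb * 1e6 / files
data_per_event_mb = data_tb * 1e6 / events

# Fraction of the CPU time delivered through LCG.
lcg_share = cpu_lcg / cpu_total

print(f"files per event:   {files_per_event:.1f}")
print(f"average file size: {avg_file_mb:.1f} MB")
print(f"data per event:    {data_per_event_mb:.0f} MB")
print(f"LCG share of CPU:  {lcg_share:.1%}")
```

With one event per job, each job therefore registered 33 files, which helps explain why storage and catalogue reliability mattered so much in this round.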


PDC 2004 Statistics

  • Up to 1800 CPUs simultaneously under AliEn control

    • 1400 running jobs + 400 saving
    • About half “native AliEn”, half LCG-2+GRID.IT


PDC 2004 Statistics

  • Statistics after round 1 (ended April 4): job distribution

    • Alice::CERN::LCG is the interface to LCG-2
    • Alice::Torino::LCG is the interface to GRID.IT


PDC 2004 Statistics

  • Statistics after round 1: failed jobs

    • Alice::CERN::LCG is the interface to LCG-2
    • Alice::Torino::LCG is the interface to GRID.IT


PDC 2004 Monitoring

  • LCG jobs seen through AliEn MonaLisa monitoring

    • Ramp-up slope shows no major performance degradation


PDC 2004 Monitoring

  • GRID.IT job distribution seen through MonaLisa monitoring

    • [FreeCPUs || WaitingJobs] ranking fills large sites first
    • Default ranking (EstimatedResponseTime) not working on many sites
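Why a ranking based on free CPUs fills large sites first can be seen in a toy model. This is a sketch under assumptions: the slides give only the shorthand `[FreeCPUs || WaitingJobs]`, so the reading used here (rank by free CPUs when there are any, otherwise penalise by the number of waiting jobs) is a guess, and the `Site` class and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_cpus: int
    waiting_jobs: int


def rank(site: Site) -> int:
    """Assumed reading of the ranking: higher is better.
    Sites with free CPUs rank by their free CPU count;
    full sites rank below them, ordered by shortest waiting queue."""
    if site.free_cpus > 0:
        return site.free_cpus
    return -site.waiting_jobs


# Hypothetical snapshot of three sites.
sites = [
    Site("small", free_cpus=10, waiting_jobs=0),
    Site("busy", free_cpus=0, waiting_jobs=50),
    Site("large", free_cpus=200, waiting_jobs=0),
]

# Best-ranked site first.
order = [s.name for s in sorted(sites, key=rank, reverse=True)]
print(order)
```

Under this reading the large site keeps winning until its free CPU count drops below that of the smaller sites, which matches the behaviour noted on the slide. The ranking also relies only on quantities that are simple for a site to publish, unlike an estimated response time.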


Conclusions

  • First major data production transparently carried out on two very different middleware stacks (AliEn + LCG)

  • Performance of Workload Management middleware greatly improved since last year: no problems from there.

  • Site configuration is the major culprit for failed jobs: the software installation infrastructure is still rudimentary, and there are problems with published information (e.g. ERT ranking)

  • Phase 2 will require local storage management.

  • Looking forward to testing the storage interface on the LCG EIS testbed

  • AliEn provides the functionality for distributed analysis, but it is still awkward to exploit on LCG-controlled sites



