March-15-2017

Slides: https://indico.fnal.gov/conferenceDisplay.py?confId=13970

Meeting Notes


Present

Margaret Votava - FNAL Project Sponsor/Scientific Computing Services Associate Head
Parag Mhashilkar - Project Lead
Hyunwoo Kim - Project Member
Marco Mambelli - Project Member
Dennis Box - Project Member
Dave Mason - USCMS Tier1 Facility Manager
James Letts - CMS L2 Manager
Antonio Perez-Calero - CMS L2 Manager
Tony Tiradani - HEP Cloud Technical Lead
Steve Timm - HEP Cloud Technical Advisor
Joe Boyd - FIFE Support
Stu Fuess - Scientific Computing Facilities Associate Head/Scientific Computing Division Deputy Head
Burt Holzman - SCD Assistant Head/Facilities Coordinator


Communication

  • Marco Mambelli is the new Technical Lead of the GlideinWMS project
  • We have a new opening for a developer position. Job posting will be out soon.

Project Management

  • The next stakeholders meeting is to be scheduled in the June 2017 timeframe.

Technical

  • Burt mentioned that CMS would need to use its allocations at NERSC in the May-July time frame. The current version of GlideinWMS does not fully support a single pilot grabbing multiple resources at an LCF. There are accounting issues that need to be handled correctly. Adding this support is required for CMS to use NERSC efficiently.
    • The GlideinWMS team is aware of this request. It is documented in #15176 and will be prioritized accordingly.
  • Dave mentioned that adopting Singularity will be one of the priorities for USCMS in the coming months, and he is glad that the GlideinWMS team already considers it a priority.
  • Antonio wanted to understand the status of tickets related to accounting.
    • Logging activations and claims per glidein (#11755) will be released in v3.2.19
    • Aggregating and advertising pilot accounting information in the glidein job's classad was released in v3.2.17 (#13277). However, it needs to be enabled in the factory configuration.
  • Dave and Antonio mentioned that while running at high scale, CMS found that various thresholds and limits applied on the frontend side were too high to be triggered. CMS found that the limits do not mean what they had previously understood.
    • Parag: This was changed in v3_2_13 last year to make accounting work correctly with multicore glideins. In the past, a single glidein resulted in a single-core slot. These thresholds and limits exist to protect the HTCondor collector. With multicore glideins, the semantics had to be revised. Since the number of slots (i.e., classads) is what impacts the collector, it makes more sense to count slots rather than the number of glideins. A mixture of single-core and multicore glideins makes this control tricky to achieve, because the outcome depends on the ratio of single-core to multicore glideins.
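
As a rough illustration of why a slot-based limit behaves differently from a glidein-based one, the sketch below estimates how many glideins fit under a fixed slot limit for different mixes. It is not GlideinWMS code: the function names, the 9000-slot limit, and the 8-core glidein size are hypothetical, and it assumes the simplified worst case where a single-core glidein advertises one slot and a multicore glidein advertises one slot per core.

```python
import math


def slots_per_glidein(multicore_fraction: float, cores_per_multicore: int) -> float:
    """Average number of slots advertised per glidein for a given mix.

    Assumption (not taken from GlideinWMS): a single-core glidein counts as
    1 slot and a multicore glidein counts as one slot per core.
    """
    if not 0.0 <= multicore_fraction <= 1.0:
        raise ValueError("multicore_fraction must be between 0 and 1")
    if cores_per_multicore < 1:
        raise ValueError("cores_per_multicore must be at least 1")
    single_core_share = (1.0 - multicore_fraction) * 1
    multicore_share = multicore_fraction * cores_per_multicore
    return single_core_share + multicore_share


def glideins_allowed(slot_limit: int, multicore_fraction: float,
                     cores_per_multicore: int) -> int:
    """Number of glideins that fit under a slot-based limit for a given mix."""
    average = slots_per_glidein(multicore_fraction, cores_per_multicore)
    return math.floor(slot_limit / average)


if __name__ == "__main__":
    # Hypothetical limit of 9000 slots with 8-core multicore glideins.
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, glideins_allowed(9000, fraction, 8))
```

Under these assumptions, the same slot limit admits far fewer glideins as the multicore share grows, which is why a limit interpreted as a glidein count can appear too high to ever trigger.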