January-07-2016

Slides: https://indico.fnal.gov/conferenceDisplay.py?confId=10389

Meeting Notes

Present

Margaret Votava - FNAL Project Sponsor/Scientific Computing Services Associate Head
Parag Mhashilkar - Project Lead
Hyunwoo Kim - Project Member
Marco Mambelli - Project Member
Ken Bloom - USCMS Software and Computing Head
Dave Mason - CMS Tier1 Lead
James Letts - CMS L2 Manager
Antonio Perez-Calero - CMS L2 Manager
Burt Holzman - Assistant Head/Facilities Coordinator
Tony Tiradani - HEPCloud Technical Lead
Gabriele Garzoglio - HEPCloud Project Manager
Joe Boyd - FIFE Support
Mike Kirby - FIFE Support
Tanya Levshina - FIFE Support
Stu Fuess - Scientific Computing Facilities Associate Head
Chander Sehgal - OSG Production Support
Jeff Dost - OSG Factory Operations

Communication

  • Over the next few weeks, the GlideinWMS homepage will be migrated to http://glideinwms.fnal.gov. At present, both the old and new homepages are active and in sync. Once the migration is complete, an announcement will be made to the GlideinWMS community.

Project Management

  • The next stakeholders meeting will be scheduled for April 2016.

Technical

  • GlideinWMS v3.2.12 has been delayed due to an increase in scope and a few critical issues uncovered during the testing phase. It is in the final stages of testing, and the release is expected during the week of January 11, 2016.
  • GlideinWMS v3.3 is expected to be released after v3.2.12
  • Support for GPUs as a resource will be available in v3.2.12. Several users in OSG are interested in this feature. Once the new version is deployed in OSG, OSG operations will announce the availability of this feature to OSG users.
  • Dave Mason expects that the v3.2.12 fixes to the accounting of multicore glideins could uncover other issues or assumptions in operations and usage.
  • FIFE is bringing up a cluster at Fermilab with a new configuration for HTCondor and GlideinWMS.
  • While providing a patch for GlideinWMS, Burt Holzman noticed that some of the terminology used in the GlideinWMS configuration and codebase is confusing. It is not always clear whether the configuration or code refers to cores, glideins, or slots. This is a legacy of the past, when one glidein corresponded to a single core that ran a single job; with multicore glideins, this is no longer true. The terminology needs to be streamlined. As a first step, the current terminology should be documented in the manual. In addition, the information in the monitoring pages may have bugs or be incorrectly labelled.
  • James Letts pointed out that CMS has observed issues with the information shown in the GlideinWMS monitoring pages. For example, the number of running jobs in the monitoring pages is always lower than the actual number of running jobs, while the information logged in the frontend logs is correct. It is not clear whether this is a result of the accounting issues related to multicore glideins. CMS will continue to monitor the situation after deploying GlideinWMS v3.2.12 to see whether the latest bug fixes resolve the monitoring issue.
  • Based on the current schedule, CMS is likely to deploy the GlideinWMS v3.2.12 frontend within a couple of months.