July-27-2016

Slides: https://indico.fnal.gov/conferenceDisplay.py?confId=12627

Meeting Notes

Present

Margaret Votava - FNAL Project Sponsor/Scientific Computing Services Associate Head
Burt Holzman - SCD Assistant Head/Facilities Coordinator
Stu Fuess - Scientific Computing Facilities Associate Head
Parag Mhashilkar - Project Lead
Marco Mambelli - Project Member
Marco Mascherone - Project Member
Hyunwoo Kim - Project Member
Dave Mason - CMS Tier1 Lead
James Letts - CMS L2 Manager
Antonio Perez-Calero - CMS L2 Manager
Gabriele Garzoglio - HEPCloud Project Manager
Tony Tiradani - HEPCloud Technical Lead
Steve Timm - HEPCloud Technical Advisor
Joe Boyd - FIFE Support
Tanya Levshina - FIFE Support
Bo Jayatilaka - OSG Production Support

Communication

  • Project home page was successfully moved to the new URL: http://glideinwms.fnal.gov
  • Burt pointed out that September will mark 10 years since the first version of GlideinWMS was released, on September 19, 2006. Since its inception, the GlideinWMS product has continuously evolved to meet the needs of its stakeholders and to keep up with technological advancements.

Project Management

  • Senior Management will be discussing the effort for the upcoming year. The effort is expected to be driven by the requirements and timescale of the HEPCloud project.
  • The next stakeholders meeting is to be scheduled in the September/October 2016 timeframe.

Technical

  • In recent releases of GlideinWMS, more monitoring information and statistics have been made available through various classads in the system. Publishing of additional information is also planned for the upcoming release. CMS, FIFE and OSG have been using the information in these classads to build their monitoring visualizations and dashboards. This VO-specific monitoring provides users and operators with more detailed and tailored information. Based on this and on input from the stakeholders, the milestone for unified monitoring in GlideinWMS is now much lower in priority and may not be necessary.
  • Margaret wanted to understand whether the CMS and OSG stakeholders are looking into options beyond GlideinWMS to meet their provisioning needs.
    • CMS:
      • As per Dave, there are no plans from CMS to replace GlideinWMS. This is consistent with what Parag has been hearing from Eric Vaandering. However, there is always a possibility that things will change in the future.
      • As per Bo, CMS will be paying close attention to the community white paper that is expected next year. CMS will consider the recommendations in the white paper and decide on a future course of action.
  • Dave Mason is interested in understanding the feature request HEPCloud made to propagate information from the job to the Glidein.
    • This feature is being actively worked on in the context of the BOSCO ticket and will be available in v3.2.16 or a near-future release.
  • James Letts brought up the issue of giving VOs control over pilot pressure at individual sites. Some sites have many sub-sites, and some sites are represented multiple times to account for an HA setup. This introduces a bias: in the current architecture, some sites get too many glideins while others get too few. As we move into the multi-core glidein era, this has become a bigger issue. CMS has asked the GlideinWMS project to provide a possible solution that addresses it.
    • Parag: This needs to be treated carefully so that the factory remains responsible for and in control of the sites, while the frontend makes bulk requests for glideins at a group of sites. We do not want an architecture that blurs the distinction between factory and frontend, with the frontend performing tasks that the factory is responsible for.
    • Burt: This is a fine-tuning issue. It could also be a policy issue, which development cannot address.
    • Antonio: This was acceptable with single-core glideins but no longer is with multi-core glideins. The VO knows the sites in more detail from pledges or past experience, and the VO's intervention may be required to address this issue.