March-07-2018

Slides: https://indico.fnal.gov/event/16575/

Meeting Notes


Present

Margaret Votava - FNAL Project Sponsor/Scientific Computing Services Associate Head
Parag Mhashilkar - Project Lead
Marco Mambelli - Technical Lead
Dennis Box - Project Member
Lorena Lobato - Project Member
Stu Fuess - HEPCloud Sponsor Proxy/Scientific Computing Facilities Associate Head
Burt Holzman - HEPCloud Sponsor Proxy
Steve Timm - HEPCloud Technical Advisor
Tony Tiradani - HEPCloud Technical Lead
Antonio Perez-Calero - CMS L2 Manager
James Letts - CMS L2 Manager
Eric Vaandering
Brian Bockelman
Tanya Levshina - Scientific Distributed Computing Solutions Department Head
Dave Dykstra
Jeff Dost - CMS/OSG Factory Operations
Marco Mascheroni - Project Member/OSG Factory Operations


Communication

  • Brian wants a fixed bi-monthly stakeholders meeting, aligned with the release schedule.

Support

  • CMS plans to upgrade to the latest version of GlideinWMS later this month.

Project Management

  • Migration to GitHub
    • Parag: We have the official repository in Redmine and a mirror on GitHub. We already follow the pull request model for external contributors.
    • Brian: Add a GlideinWMS organization on GitHub (instead of having Burt host the project).
  • James: Would like to see factory and frontend releases decoupled. It's becoming a priority.
    • Parag: We are working on reducing the release cycle to 2 months.

Roadmap

  • Brian is concerned that the roadmap is missing a concrete slide covering the period between now and next April.
    • Parag: We are focusing on the next release, v3.2.22. Information is available in the v3.2.22 release info.

Technical

  • Discussion on Singularity
    • Brian: Adoption is 87% for CMS CPU sites. On the OSG side there is interest.
    • During the December stakeholders meeting, we discussed that multiple implementations of this feature existed: several from VOs and one from GlideinWMS itself. This was an opportunity for the GlideinWMS project to approach FIFE to see if they could use the GlideinWMS feature (or at least understand why they couldn't) instead of doing yet another implementation. Brian was unhappy that no such discussion has taken place between the GlideinWMS project and FIFE since then, and said the project should engage stakeholders better.
    • GlideinWMS does not drive or influence the FIFE schedule. FIFE will work with the GlideinWMS team as needed.
    • Tanya: FIFE is submitting with Singularity on sites that have Singularity enabled.
    • Frontends need to be updated to GlideinWMS v3.2.22 to support Singularity.
    • For OSG and CMS, the Singularity scripts are evolving more quickly than the GlideinWMS releases. We should discuss whether the release is the best place for them. Parag will schedule a meeting to find a path forward.
    • We need factory ops to upgrade to GlideinWMS v3.2.21. The SDSC factory was upgraded to v3.2.21 yesterday. v3.2.21 is not in the OSG production repo yet. Until OSG pushes it to production, the IU OSG factory will not update to it.
  • Brian: Scalability with respect to HL-LHC, which is 6 years from now. The project is planning to remove the Frontend before HL-LHC. It sounds strange to make major investments in the frontend and then replace it with the Decision Engine (DE) soon. Given the size of the team, is that practical?
    • Parag: We are not throwing away the frontend. Improvements in the frontend along with the lessons learned will be moved to DE as DE modules. We can use DE frame
  • Jeff: Disabling the check for the pilot proxy's lifetime is an important feature. What is the progress?
    • Mambelli: It is planned for the next release, v3.2.22. Redmine issue #17102
  • Antonio: CMS has been testing the "auto" feature for GLIDEIN_CPUS. We found that if no single-core jobs are in the queue, multicore jobs will never get provisioned. Do we have an estimated CPU count for first-level matchmaking?
    • Mambelli: Redmine issue #16161
    • Parag: This is expected, as "auto" does not help the frontend in many cases.
  • Monitoring
    • Jeff: We are OK with the current Factory monitoring released in v3.2.21. Monitoring for the metasites case is a work in progress.
    • In response to "retiring monitoring pages in GlideinWMS", CMS wants to keep the glidein entry page for config monitoring.
      • Parag: We will survey stakeholders before retiring pages and will take into account the functionality currently used and required by the stakeholders.
  • Release stability
    • Jeff: At the previous meeting we mentioned changes getting into production releases and breaking things. We have since discussed this in the meeting at FNAL and talked about plans to avoid it in the future.

Post Meeting Notes

  • Recurring bi-monthly stakeholders meeting scheduled.
  • Meeting to discuss future support for Singularity scheduled for Tuesday, March 13, 11am to noon.