September-12-2018

Slides: https://indico.fnal.gov/event/17318/


Meeting Notes


Present

Margaret Votava - FNAL Project Sponsor/Scientific Computing Services Associate Head
Parag Mhashilkar - Project Lead
Marco Mambelli - Technical Lead
Dennis Box - Project Member
Lorena Lobato - Project Member
Marco Mascheroni - Project Member/OSG Factory Operations
Burt Holzman - HEPCloud Sponsor Proxy
Tony Tiradani - HEPCloud Technical Lead
Antonio Perez-Calero Yzquierdo - CMS
Dave Mason - USCMS T1 facility manager
Steve Timm - HEPCloud Technical Advisor
Ken Herner - FIFE
Joe Boyd - FIFE
Brian Lin - OSG


Communication

  • GlideinWMS v3.4 to supersede v3.2.22 in OSG 3.4
  • GlideinWMS v3.5 will not support Globus GRAM gt2/5
  • Steve: Is the timeline for use and integration of the acquisition engine realistic?
    • Parag: We are dependent on HTCondor to deliver this feature so the timeline will be influenced by their priorities.
    • Burt: There is some work started already by Brian Bockelman and Jaime. This is outside the scope and control of the GlideinWMS team.

Support


Project Management


Roadmap

  • Brian Lin: There was interest in moving the issues and the main code repo to GitHub. Is there any progress?
    • Marco: GitHub is a full-fledged mirror, but we have not made progress in making it the primary project space. The main worry is the loss of information (ticket discussions, interactions) that is currently captured in Redmine. All our existing tools use Redmine.
    • We get only a few external contributions, and contributors can continue to send merge requests through GitHub.
    • [Action item] Need to understand the priority and work required for migration.

Technical

  • Given the upcoming GlideinWMS v3.4.1, Brian Lin wanted to confirm that it is OK to move GlideinWMS v3.4 to OSG production.
    • Yes, GlideinWMS 3.4 and 3.4.1 should go in OSG 3.4 production series
  • Tony: How is the use of Singularity negotiated with the NEVER/OPTIONAL/etc. settings?
    • [Action Item]: Marco will send the docs in the notes. See below for his comments.
The documentation for the upcoming 3.4.1 is available online:
http://GlideinWMS.fnal.gov/doc.v3_4_1/

Specifically, the sections about Singularity are:
- Factory configuration:
http://GlideinWMS.fnal.gov/doc.v3_4_1/factory/configuration.html#singularity
- Frontend configuration:
http://GlideinWMS.fnal.gov/doc.v3_4_1/frontend/configuration.html#attr_singularity
- A summary of the important variables:
http://GlideinWMS.fnal.gov/doc.v3_4_1/factory/custom_vars.html#singularity_vars
This is from the Factory documentation:

An entry can control the use of Singularity by setting GLIDEIN_SINGULARITY_REQUIRE to NEVER (Singularity is not supported), OPTIONAL or PREFERRED (capable of Singularity but it is not enforced), REQUIRED (jobs must run with Singularity) or REQUIRED_GWMS (jobs must run with Singularity and use the GWMS wrapper scripts). This last option is the only one that really enforces Singularity, but is not compatible with VOs that currently self-manage Singularity with custom scripts, like OSG and CMS. The attribute can be set in the general or entry <attrs> section of the Factory configuration: <attr name="GLIDEIN_SINGULARITY_REQUIRE" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="REQUIRED"/>.
The value of GLIDEIN_SINGULARITY_REQUIRE is used by the Frontend to provision resources and by the Glidein to negotiate the use of Singularity with the jobs. An entry where Singularity is OPTIONAL or PREFERRED allows jobs to run without Singularity if the job prefers that or if Singularity fails. An entry requiring Singularity (REQUIRED, REQUIRED_GWMS) does not allow jobs to run without Singularity, and the Glidein will fail if Singularity fails (e.g. the singularity binary or the image is not found).
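
As a minimal sketch (not part of GlideinWMS), the attribute line quoted above can be parsed and its value checked against the values listed in the documentation excerpt. Only the attribute name and the five values come from the text; the helper name read_singularity_require and the validation step are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Values listed in the Factory documentation excerpt above.
ALLOWED = {"NEVER", "OPTIONAL", "PREFERRED", "REQUIRED", "REQUIRED_GWMS"}

# The <attr> element exactly as quoted in the notes.
ATTR_XML = (
    '<attr name="GLIDEIN_SINGULARITY_REQUIRE" const="True" '
    'glidein_publish="True" job_publish="True" parameter="True" '
    'publish="True" type="string" value="REQUIRED"/>'
)


def read_singularity_require(attr_xml):
    """Hypothetical helper: return the validated value of the attribute."""
    attr = ET.fromstring(attr_xml)
    if attr.get("name") != "GLIDEIN_SINGULARITY_REQUIRE":
        raise ValueError("unexpected attribute: %r" % attr.get("name"))
    value = attr.get("value")
    if value not in ALLOWED:
        raise ValueError("unsupported value: %r" % value)
    return value


print(read_singularity_require(ATTR_XML))
```

A check of this kind would catch a mistyped value in an entry before the configuration is deployed.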

NOTE: For compatibility with previous versions and to ease the migration to the use of GWMS scripts, GLIDEIN_SINGULARITY_REQUIRE=REQUIRED works only if Singularity is managed via GWMS. GLIDEIN_Singularity_Use=GWMS_DISABLE in the Frontend configuration (default) allows VOs to manage Singularity independently from GWMS. GLIDEIN_SINGULARITY_REQUIRE=REQUIRED_GWMS will not accept jobs where GLIDEIN_Singularity_Use=GWMS_DISABLE. It is a stronger enforcement, but it does not allow VOs to manage Singularity on their own. Jobs with GWMS_DISABLE will not trigger Glideins on entries with REQUIRED_GWMS, and the Glidein will fail at setup if this combination somehow happens.
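
The rules in the excerpt above can be summarized in a short sketch. This is an illustration of the text only, not the GlideinWMS implementation: the function names are hypothetical, the value names are spelled as they appear in these notes, and only the combinations the excerpt describes are modeled.

```python
# Entry-side values of GLIDEIN_SINGULARITY_REQUIRE, as listed in the notes.
ENTRY_VALUES = {"NEVER", "OPTIONAL", "PREFERRED", "REQUIRED", "REQUIRED_GWMS"}


def _check(entry_require):
    if entry_require not in ENTRY_VALUES:
        raise ValueError("unknown entry value: %r" % entry_require)


def entry_accepts(entry_require, frontend_use):
    """Hypothetical: may a job with this Frontend setting trigger a Glidein here?

    The only combination the notes explicitly reject is an entry with
    REQUIRED_GWMS and a Frontend using GWMS_DISABLE (self-managed Singularity).
    """
    _check(entry_require)
    return not (entry_require == "REQUIRED_GWMS" and frontend_use == "GWMS_DISABLE")


def may_run_without_singularity(entry_require):
    """Hypothetical: can a job on this entry end up running without Singularity?"""
    _check(entry_require)
    # NEVER: Singularity is not supported; OPTIONAL/PREFERRED: not enforced.
    return entry_require in {"NEVER", "OPTIONAL", "PREFERRED"}


def glidein_fails_if_singularity_fails(entry_require):
    """Hypothetical: does a Singularity failure make the Glidein fail?"""
    _check(entry_require)
    return entry_require in {"REQUIRED", "REQUIRED_GWMS"}


for value in sorted(ENTRY_VALUES):
    print(value,
          entry_accepts(value, "GWMS_DISABLE"),
          may_run_without_singularity(value),
          glidein_fails_if_singularity_fails(value))
```

The sketch keeps REQUIRED and REQUIRED_GWMS distinct because, per the note, only REQUIRED_GWMS rejects VOs that manage Singularity with their own scripts.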

In v3.4 things work differently but you can still force the use of Singularity if VOs use the GWMS Singularity wrapper.
This is done by setting GLIDEIN_SINGULARITY_REQUIRE=True in the entry configuration in the Factory.
Most VOs (CMS, OSG, ...) use their own wrapper and in 3.4 there is no REQUIRED_GWMS, so setting GLIDEIN_SINGULARITY_REQUIRE=True will not gain you much.
  • Antonio: What is the relationship of condor annex to GlideinWMS?
    • Parag: We evaluated condor annex a couple of years ago. It can be considered a prototype of the acquisition engine. It works well to create annexes to condor clusters but lacks key functionality that GlideinWMS would need in order to use it.
  • Margaret: How much unit test coverage do you want to achieve?
    • Dennis: Commonly it is 70-80%. We want to get to around 60%, with critical functionality covered.
  • Antonio: For factory operations we have a view on how we want to utilize resources, and this requires a three-way interaction: factory, site, and factory & site. We want to use CRIC to improve the interaction, with a stub that is automatically generated and tested, but this will not remove the activity from the operator.
    • Mascheroni: The tools automate tasks to minimize the interaction. There will be a commissioning phase when moving to production. Example: the maximum the site allows is 48 hrs, but the VO wants to use at most 24 hrs.
    • Marco: Use a patching mechanism, or work with the site admin to fix the configuration if required.
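
The 48 hrs versus 24 hrs example can be sketched as an operator patch applied on top of an automatically generated stub. This is only an illustration of the idea discussed; the function patch_entry, the dictionary layout, and the key max_walltime_hours are hypothetical and do not correspond to CRIC or GlideinWMS code.

```python
def patch_entry(stub, patch):
    """Hypothetical: return a copy of the generated stub with overrides applied."""
    entry = dict(stub)
    entry.update(patch)
    # A VO may ask for less than the site allows, never more.
    if entry["max_walltime_hours"] > stub["max_walltime_hours"]:
        raise ValueError("patch exceeds the limit published by the site")
    return entry


# Stub as it might be generated from site information (illustrative values).
site_stub = {"name": "example_entry", "max_walltime_hours": 48}
# Operator patch expressing the VO preference from the example.
vo_patch = {"max_walltime_hours": 24}

print(patch_entry(site_stub, vo_patch)["max_walltime_hours"])
```

Keeping the patch separate from the stub means the stub can be regenerated without losing the operator's override.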
  • Antonio: Do you need to work with CMS for Frontend scale tests?
    • Mascheroni: We can work with Diego to replicate the tests and conditions.