
FIFEandContainers190228

WH8XO, 2/28/2019, 2pm
Remote attendance:
- https://fnal.zoom.us/j/8190868101
- tel: +1 646 558 8656, Meeting ID: 819 086 8101

Convocation

Dear all,
we have identified, as one of the FIFE milestones this year, the ability to build and execute experiment-specific containers on HEPCloud. We discussed organizing a working group and inviting representatives from the experiments to participate in it.
As discussed at the FIFE workshop, the goal is a working group that will understand the experiments' requirements for running "proprietary" containers on HEPCloud. We should address:
- Initial software build cluster
- Integration with CI
- Repository (local repo available)
- Registry
- Policy (lightweight process for users, but that causes delays for security vetting, etc.)
- GPU (libraries, drivers, etc.)
- Specific OS

This is an internal meeting (FIFE, CD, and people with experience) to discuss how to proceed before bringing the experiments into the discussion, e.g. how to collect requirements, what our initial vision is, and what we are going to allow.

MINUTES IN SHORT:

FIFE discussed the use of containers; this should be added to its roadmap.
Much more is possible on Fermigrid and OSG, but to start we'd like to encourage experiments:
- to use an existing image if it works for them (this allows uniform environments, images are provided by OSG and some experiments).
- or to prepare a few images and use them on Fermigrid and OSG (experiments will be able to build images on ECF hosts and will be allowed a handful of images on OSG's CVMFS).
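For the first option, a job can request one of the standard images OSG publishes on CVMFS. A hedged sketch of an HTCondor submit-file fragment (the `+SingularityImage` attribute is the usual OSG convention, and the path below is the standard OSG Singularity repository on CVMFS, but confirm both with your site):

```
# Hypothetical submit-file fragment: run the job inside the OSG-provided
# EL7 image already unpacked on CVMFS (adjust the image tag as needed).
universe    = vanilla
executable  = myjob.sh
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest"
queue
```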

We will have a meeting on 3/28, 4 weeks from today, where all experiments are invited and some will start presenting their use cases and requirements.
Marco or Tony will give a brief overview of Singularity on OSG and Fermigrid.
Tony will present the ECF infrastructure available to build Docker images.
After hearing from the experiments we'll start outlining the steps for getting them started with their container needs.

Once I get input on the people to invite, I will send the Exchange invitation for the next meeting.

Below you will find the longer notes I summarized above.

LONGER NOTES:

FIFE discussed the use of containers and this should be added in its roadmap.
Following an email discussion, it was decided to hold an internal meeting to see what can be provided to the experiments (by ECF/SCD) and what should be recommended.
A meeting with FIFE experiments will follow.

CMS and ATLAS use containers regularly:
1. to have a uniform environment and protect themselves against what sites are or are not providing. You know what you get on a FNAL worker node, not on a generic grid node.
2. to get all the packages they need in a container:
- CMS: the whole distribution in one container
- ATLAS: campaign-specific containers
This covers production and official analysis.
- official CMS analysis uses CRAB; everything is included in the CMS release (including multiple library versions)

Another important reason to use Singularity is to provide isolation and traceability:
- Singularity provides isolation
- the HTCondor classad advertisement maintains traceability
- OSG maintains standard SL6 and SL7 images

Fermigrid has Singularity installed and allows expanded (unpacked) images from CVMFS.
There are security concerns about allowing arbitrary images.

GlideinWMS can run jobs in Singularity images, using the singularity binary provided by the site or the one distributed by OSG:
- by default it restricts jobs to images on CVMFS
- but a VO can lift this restriction and allow any image supported by Singularity

The Wilson cluster also supports Singularity containers:
- it allows arbitrary containers on the bare-metal hardware
- it encourages users to build their own containers (on their own machines, since building an image requires root)
- maintaining experiment images centrally would be difficult and unmaintainable long term
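Building your own container typically means writing a definition file and building it with root on your own machine. A minimal hedged sketch (the base image and packages are illustrative assumptions, not an experiment's actual recipe):

```
# myimage.def -- hypothetical Singularity definition file,
# bootstrapping from a Docker base image
Bootstrap: docker
From: centos:7

%post
    # install whatever the experiment software needs (illustrative)
    yum -y install gcc make
```

It would then be built with something like `sudo singularity build myimage.sif myimage.def` (Singularity 3.x syntax).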

Each experiment could prepare its Docker container for production:
- it is converted to a Singularity image and distributed on CVMFS (by OSG)
- OSG allows a limited number of images (3) per experiment
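The Docker-to-Singularity conversion itself is a one-liner; a hedged sketch with a hypothetical image name (OSG performs the equivalent step on its side before publishing to CVMFS):

```
# Pull a (hypothetical) production image from a Docker registry and
# convert it into a Singularity image file (Singularity 3.x syntax).
singularity build myexp-prod.sif docker://myexperiment/production:v1
```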

Questions:
What is the policy to manage the lifetime/lifecycle?
What if there are individual analyzers that want to bring their own containers?

Singularity can run in privileged mode (setuid binary) or unprivileged mode.

For Singularity to be fully unprivileged, either:
- the kernel must allow regular users to create user namespaces
- or it must run on expanded (unpacked) images
OSG is moving towards the deployment and use of unprivileged Singularity.
This will also bring other advantages, like supporting condor_ssh_to_job.
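Whether the first condition holds can be checked on a worker node; a hedged sketch (the exact sysctl knobs vary by distribution and kernel version):

```
# > 0 means unprivileged users may create user namespaces (kernel >= 4.9)
cat /proc/sys/user/max_user_namespaces
# Debian/Ubuntu kernels add an extra switch; absent elsewhere
cat /proc/sys/kernel/unprivileged_userns_clone 2>/dev/null
```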

Other considerations:
- Allowing every analyzer to bring their own image could cause a lot of file movement, and there have been severe IO problems on both CMS and Fermigrid lately.
- Images are not the only way to provide the desired software: Spack, a package manager, could fill in the missing gaps.

Fermilab (ECF SSI) would provide build machines that could be used to build experiment images, fully automated and tested:
- use Jenkins for CI/CD
- push to a Docker registry
- end up with an approved Docker container
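That flow could be expressed, for illustration, as a declarative Jenkins pipeline; everything below (registry host, image name, test script) is a hypothetical sketch, not the actual ECF SSI configuration:

```
// Jenkinsfile -- hypothetical sketch of the build/test/push flow
pipeline {
    agent any
    stages {
        stage('Build') {
            steps { sh 'docker build -t registry.example.org/myexp/prod:${BUILD_NUMBER} .' }
        }
        stage('Test') {
            steps { sh 'docker run --rm registry.example.org/myexp/prod:${BUILD_NUMBER} ./run-tests.sh' }
        }
        stage('Push') {
            steps { sh 'docker push registry.example.org/myexp/prod:${BUILD_NUMBER}' }
        }
    }
}
```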

Missing steps:
- convert to a Singularity image
- distribute to OSG CVMFS
- send to NERSC and convert there
The OSG steps are easily covered: by opening a ticket and referring to the Docker image, OSG will take care of everything.
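For the NERSC step, the usual route is Shifter, which pulls a Docker image and converts it on site; a hedged sketch with a hypothetical image name:

```
# On a NERSC login node: pull and convert the Docker image with Shifter
shifterimg -v pull docker:myexperiment/production:v1
```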

They are ready for experiments to start using this infrastructure.
Only a few people per experiment, e.g. the librarians, would need access.

We will have a meeting on 3/28, 4 weeks from today (HEPCloud go-live is in 2 weeks, the OSG all-hands in 3).
All experiments are invited and some will start presenting their use-cases and requirements.

NOVA, uboone and DES could be in the first group, presenting a container plan.
Marco will get input on the people to invite and will send the exchange invitation for the next meeting.

Marco or Tony will give a brief overview of Singularity on OSG and Fermigrid.
Tony will present the ECF infrastructure available to build Docker images.