Support unprivileged singularity and update the singularity scripts
CMS, FIFE, OSG
Add support to the script that invokes singularity so that it first tries unprivileged singularity before falling back to a privileged version in /usr/bin. The current path to the tool is /cvmfs/oasis.opensciencegrid.org/mis/singularity/el-x86_64/bin/singularity
This only works on el7.4 kernels, and only if the site has enabled it with a boot parameter and a sysctl option, so you might want to limit looking for this tool to kernels 3.10.0-693 and greater. A pilot would need the el6 version if it is running under docker on an el7.4 host.
It's probably a good idea to also allow overriding that default path with a factory variable.
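A minimal Python sketch of the two ideas above: gating the unprivileged attempt on a kernel release of at least 3.10.0-693, and letting a factory variable override the default path. It assumes the override arrives as an environment variable named SINGULARITY_BIN holding a directory (the name the later commit uses); the helper names are hypothetical, and the kernel gate is questioned later in the thread.

```python
import os
import platform
import re
from typing import Mapping, Optional

# Default directory from the ticket description; SINGULARITY_BIN (assumed to be
# a directory, as in the later commit) overrides it.
DEFAULT_SINGULARITY_BIN = "/cvmfs/oasis.opensciencegrid.org/mis/singularity/el-x86_64/bin"

# First EL7.4 kernel release mentioned in the ticket: 3.10.0-693.
MIN_KERNEL = (3, 10, 0, 693)


def kernel_key(release: str) -> tuple:
    """Turn e.g. '3.10.0-693.5.2.el7.x86_64' into (3, 10, 0, 693)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def kernel_may_allow_unprivileged(release: Optional[str] = None) -> bool:
    """Heuristic only: True if the running (or given) kernel is 3.10.0-693 or newer."""
    release = platform.release() if release is None else release
    return kernel_key(release) >= MIN_KERNEL


def unprivileged_singularity_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Path of the unprivileged singularity, honoring a SINGULARITY_BIN override."""
    env = os.environ if env is None else env
    bin_dir = env.get("SINGULARITY_BIN") or DEFAULT_SINGULARITY_BIN
    return os.path.join(bin_dir, "singularity")
```

The version check compares (major, minor, patch, build) tuples, so an el7 kernel such as 3.10.0-514 is rejected while any 4.x kernel passes.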
#5 Updated by Dennis Box about 2 years ago
Brian commented on this issue; I assumed his comment would automatically make it into the ticket, but it didn't.
Kernel versions are incredibly misleading: historically, they have been fairly unsuccessful indicators of features (e.g., we still see UPS/UPD screwing this up periodically).
I would instead recommend simply always testing unprivileged execution. There is extremely little cost in doing so.
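Following that recommendation, a sketch of an always-run probe: execute a trivial command through the candidate binary and treat any failure, missing binary, or timeout as "not usable". The image argument, the assumption that /bin/true exists inside the image, and the 120-second timeout are illustrative choices, not part of the original scripts.

```python
import subprocess


def singularity_works(singularity: str, image: str, timeout: float = 120) -> bool:
    """Return True if `singularity exec` can run /bin/true inside `image`."""
    cmd = [singularity, "exec", "--contain", image, "/bin/true"]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing or not executable, or the container never finished starting.
        return False
    return result.returncode == 0
```

Because the probe costs one short container start, it can run on every glidein instead of guessing from the kernel version.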
#6 Updated by Marco Mambelli about 2 years ago
- Subject changed from Support unprivileged singularity to Support unprivileged singularity and update the singularity scripts
I changed the scope of the ticket.
Brian mentioned that the scripts have changed a lot since Hyunwoo worked on the singularity ticket.
Mats provided the updated OSG scripts.
Please compare what they do and decide whether it should be added to the GWMS functionality.
For the GPU support, how does this compare with the GlideinWMS GPU support?
I did share my scripts and concerns with HyunWoo, so some of this might be a repeat. The scripts are available at:
My main concern is that the Singularity execution is in the critical path: if you have any issues, we will quickly see a lot of failed user jobs. As a protection, the OSG VO runs a bunch of tests in a periodic validation script (first one above), and only advertises and executes with Singularity if everything looks perfect.
Some of the checks could be considered VO specific. For example, we only advertise Singularity support if we can access /cvmfs/singularity.opensciencegrid.org/, which is our main repository of images. Even though you technically could run Singularity without this repo, we (OSG VO) do not want to support that.
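A sketch of the "advertise only if everything looks perfect" policy: each check is a named callable, any failure or exception vetoes advertising, and the image-repository check mirrors the OSG VO requirement above. The check set and helper names are hypothetical; this is not the contents of the OSG validation script.

```python
import os
from typing import Callable, Dict, List

# Main OSG image repository named in the comment above.
IMAGE_REPO = "/cvmfs/singularity.opensciencegrid.org/"

Check = Callable[[], bool]


def repo_accessible(path: str = IMAGE_REPO) -> bool:
    """True if the image repository can be listed and is not empty."""
    try:
        return len(os.listdir(path)) > 0
    except OSError:
        return False


def failed_checks(checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of those that failed or raised."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(name)
    return failed


def should_advertise(checks: Dict[str, Check]) -> bool:
    """Advertise Singularity support only if every check passes."""
    return not failed_checks(checks)
```

A VO would pass its own dictionary, e.g. {"image_repo": repo_accessible, ...}, and log the names returned by failed_checks when it declines to advertise.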
Another thing which is tricky, but may not be necessary for all VOs, is how we re-exec the validation script so that part of it runs inside the selected default image. For us, this approach allows us to have a set of default images (currently el6 and el7), use a weighted random selection of which default image to use on each node, and then test and advertise the capabilities of that image.
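The weighted random selection of a default image can be sketched as follows. The weights are hypothetical (the comment names the el6 and el7 images but not their weights), and mapping the chosen name to an actual image path is left out.

```python
import random
from typing import Mapping, Optional

# Hypothetical weights; only the image names come from the comment above.
DEFAULT_IMAGE_WEIGHTS = {"el6": 0.3, "el7": 0.7}


def pick_default_image(weights: Mapping[str, float],
                       rng: Optional[random.Random] = None) -> str:
    """Weighted random choice of the default image for this node."""
    rng = random.Random() if rng is None else rng
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Passing a seeded random.Random makes the choice reproducible in tests; in production each node would draw independently.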
Supporting GPUs is another issue, which we have spent a fair amount of time on. However, we might want to ignore this for now, as it seems like the --nv flag in the newest versions should solve most of these issues.
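If the --nv flag does cover most GPU cases, the wrapper change can be as small as adding that flag when the slot has GPUs. A sketch of the command construction; how want_gpu is determined (for example from the slot's GPU count) is an assumption left to the caller.

```python
from typing import List, Sequence


def build_exec_argv(singularity: str, image: str, command: Sequence[str],
                    want_gpu: bool = False, contain: bool = True) -> List[str]:
    """Build a `singularity exec` command line for the payload."""
    argv = [singularity, "exec"]
    if contain:
        argv.append("--contain")
    if want_gpu:
        # --nv binds the host's NVIDIA driver libraries and devices into the container.
        argv.append("--nv")
    argv.append(image)
    argv.extend(command)
    return argv
```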
Note that I have not talked about the actual job wrapper (2nd script above). That part is pretty easy, but in our case it is closely tied to the information from the advertise script. Similar to the advertise script, we use a re-exec approach here to decide in the middle of the script whether we should use Singularity or not, and then execute the remainder of the script inside the container. This allows us to load modules and environment in the same way regardless of whether Singularity is used.
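A sketch of the re-exec pattern: the wrapper runs on the host until it decides whether to use Singularity, then replaces itself with a copy of itself running inside the container, and a marker variable stops the inner copy from re-executing again. It assumes Singularity passes the host environment through (its default unless --cleanenv is used) and that the script path is visible inside the container; the marker name GWMS_SINGULARITY_REEXEC is hypothetical.

```python
import os
import sys
from typing import List, Mapping, Optional, Sequence

# Hypothetical marker telling the inner copy it is already in the container.
REEXEC_MARKER = "GWMS_SINGULARITY_REEXEC"


def reexec_argv(env: Mapping[str, str], singularity: Optional[str],
                image: Optional[str], script: str,
                args: Sequence[str]) -> Optional[List[str]]:
    """Return the argv to re-exec under Singularity, or None to continue here."""
    if env.get(REEXEC_MARKER) == "1":
        return None  # already inside the container: run the rest of the script
    if not singularity or not image:
        return None  # decided not to use Singularity: run on the host
    return [singularity, "exec", "--contain", image, script, *args]


def maybe_reexec(singularity: Optional[str], image: Optional[str]) -> None:
    """Call at the decision point; does not return if it re-execs.

    If it returns, the caller continues with the rest of the wrapper, either
    on the host or already inside the container, so module and environment
    setup is shared by both paths.
    """
    argv = reexec_argv(os.environ, singularity, image,
                       os.path.abspath(sys.argv[0]), sys.argv[1:])
    if argv is not None:
        os.environ[REEXEC_MARKER] = "1"  # inherited by the re-executed copy
        os.execv(argv[0], argv)
```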
#7 Updated by Dennis Box about 2 years ago
Instructions on enabling unprivileged singularity are here: https://opensciencegrid.github.io/docs/worker-node/install-singularity/
#8 Updated by Dennis Box about 2 years ago
- Status changed from New to Feedback
- Assignee changed from Dennis Box to Marco Mambelli
Code changes in origin branch v3/17639.
Commit notes: Tests the singularity that lives in SINGULARITY_BIN, as specified in glideinwms.xml. If execution is successful, passes this singularity and the appropriate env variables on to user_job_wrapper.sh. If SINGULARITY_BIN is unsuccessful, for example if it's an unprivileged singularity running on an architecture that doesn't support this, falls back to the singularity in /usr/bin. If that one is successful, passes all of this along with the appropriate env and condor settings to the user job wrapper.
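The fallback in these commit notes amounts to probing candidates in order and keeping the first that works. A Python sketch of that selection only; the actual change is in the GWMS scripts on branch v3/17639 and is not reproduced here, and the probe command and timeout are assumptions.

```python
import os
import subprocess
from typing import Iterable, List, Optional


def first_working_singularity(candidates: Iterable[str], image: str,
                              timeout: float = 120) -> Optional[str]:
    """Return the first candidate that can run /bin/true in `image`, else None."""
    for path in candidates:
        try:
            rc = subprocess.run(
                [path, "exec", "--contain", image, "/bin/true"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            ).returncode
        except (OSError, subprocess.TimeoutExpired):
            continue  # missing, not executable, or hung: try the next one
        if rc == 0:
            return path
    return None


def commit_candidates(singularity_bin: str) -> List[str]:
    """Order described in the commit: SINGULARITY_BIN first, then /usr/bin."""
    return [os.path.join(singularity_bin, "singularity"), "/usr/bin/singularity"]
```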
The osg-flock changes I incorporated into our glideinwms scripts were 1) to test that singularity properly inherits environment variables from the parent environment and 2) to keep track of the original filename of $GWMS_SINGULARITY_IMAGE using a new env variable, $GWMS_SINGULARITY_IMAGE_HUMAN.
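A sketch of both changes: the environment check sets a probe variable in the parent, runs env inside the container, and looks for it in the output; the second helper records the original GWMS_SINGULARITY_IMAGE value before anything rewrites it. The probe variable name GWMS_ENV_PROBE is hypothetical, and this illustrates the idea rather than the code on the branch.

```python
import os
import subprocess
from typing import MutableMapping

# Hypothetical variable used only to test environment inheritance.
PROBE_VAR = "GWMS_ENV_PROBE"


def env_line_present(output: str, name: str, value: str) -> bool:
    """True if `name=value` appears as a whole line of `env` output."""
    return "%s=%s" % (name, value) in output.splitlines()


def env_is_inherited(singularity: str, image: str, timeout: float = 120) -> bool:
    """Check that a variable set in the parent is visible inside the container."""
    value = "probe-%d" % os.getpid()
    env = dict(os.environ)
    env[PROBE_VAR] = value
    try:
        result = subprocess.run(
            [singularity, "exec", image, "env"],
            env=env, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and env_line_present(result.stdout, PROBE_VAR, value)


def remember_image_name(env: MutableMapping[str, str]) -> None:
    """Keep the image name as originally given in GWMS_SINGULARITY_IMAGE_HUMAN."""
    if "GWMS_SINGULARITY_IMAGE" in env and "GWMS_SINGULARITY_IMAGE_HUMAN" not in env:
        env["GWMS_SINGULARITY_IMAGE_HUMAN"] = env["GWMS_SINGULARITY_IMAGE"]
```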
To test the new feature, set SINGULARITY_BIN in the entry for a CE that supports unprivileged mode; I have such a CE set up on fermicloud127. Here is an XML fragment that demonstrates this:
<entry name="TEST_SITE_4" auth_method="grid_proxy" enabled="True" gatekeeper="fermicloud127.fnal.gov fermicloud127.fnal.gov:9619" gridtype="condor" rsl="(queue=default)(jobtype=single)" schedd_name="fermicloud173.fnal.gov" trust_domain="OSG" verbosity="fast" work_dir="OSG">
   <config>
      <max_jobs>
         <default_per_frontend glideins="5000" held="50" idle="100"/>
         <per_entry glideins="10000" held="1000" idle="2000"/>
      </max_jobs>
      <release max_per_cycle="20" sleep="0.2"/>
      <remove max_per_cycle="5" sleep="0.2"/>
      <restrictions require_glidein_glexec_use="False" require_voms_proxy="False"/>
      <submit cluster_size="10" max_per_cycle="100" sleep="0.2" slots_layout="partitionable">
      </submit>
   </config>
   <attrs>
      <attr name="CONDOR_ARCH" const="False" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="x86_64"/>
      <attr name="CONDOR_OS" const="False" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="rhel7"/>
      <attr name="CONDOR_VERSION" const="False" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="8.6.5"/>
      <attr name="GLIDEIN_SINGULARITY_REQUIRE" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="False"/>
      <attr name="SINGULARITY_BIN" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="/cvmfs/oasis.opensciencegrid.org/mis/singularity/el7-x86_64/bin"/>
      <attr name="GLIDEIN_CPUS" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="int" value="2"/>
      <attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="Test_Site_4"/>
      <attr name="GLIDEIN_Supported_VOs" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="OSG"/>
      <attr name="USE_CCB" const="True" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="string" value="True"/>
   </attrs>
</entry>
#9 Updated by Dave Dykstra about 2 years ago
It now occurs to me, despite what I put in the initial ticket description, that it would be better to first try singularity in /usr/bin (really, in the PATH) and try the unprivileged singularity second. The use case is a system administrator who wants some special configuration to take priority. In particular, I'm thinking about system administrators wanting to bind in some extra paths. That is not currently supported when the --contain option is used (and grid jobs do use it), but a singularity developer considers that to be a bug, so that may change in the future. Additionally, it's slightly lower overhead to try something from local disk than from cvmfs, and it is more likely to succeed if present. In fact, it should probably be a fatal error if it is present in the PATH and fails.
For that matter, if you try unprivileged singularity first and it fails, can you tell whether it failed because singularity didn't work, or is that indistinguishable from a payload failure? It would be bad for a job to run for a long time under unprivileged singularity before failing, and then be tried again under regular singularity. The algorithm should probably be to use the singularity from $PATH if present, otherwise use the singularity from SINGULARITY_BIN, and not do anything different based on success or failure.
I don't know if this needs a discussion or if you'd like to go ahead and just do it that way. I think it's probably the only way that makes sense.
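A sketch of the algorithm proposed above: prefer a singularity found on $PATH, otherwise use the one under SINGULARITY_BIN, and never switch binaries based on whether a run succeeded. The function name and the name parameter (there only to make it testable) are hypothetical.

```python
import os
import shutil
from typing import Optional


def choose_singularity(singularity_bin: Optional[str],
                       path_env: Optional[str] = None,
                       name: str = "singularity") -> Optional[str]:
    """Pick the singularity to use, with no success-based fallback.

    path_env defaults to the process PATH. If the returned binary later
    fails, the caller should treat that as fatal rather than retry with
    the other candidate.
    """
    found = shutil.which(name, path=path_env)
    if found:
        return found
    if singularity_bin:
        candidate = os.path.join(singularity_bin, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Because the choice depends only on what is installed, a payload that fails after a long run is never silently re-run under a different singularity.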