Feature #23011

Add always --contain to the Singularity invocation and update wrapper adding improvement in the OSG one

Added by Marco Mambelli over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: High
Category: -
Target version:
Start date: 07/29/2019
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

After an email discussion w/ Dave, Edgar and Mats, it seems that removing --contain for GPU jobs is a legacy workaround: it was needed w/ Singularity 2.4 or older and is no longer needed for 2.6 or 3.3.
The --nv option is sufficient.
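
A minimal sketch of the option handling described above, not the code in the attached scripts. The helper name build_exec_opts and its single argument (1 when the job requested GPUs) are assumptions made for illustration; only the --contain and --nv flags come from the ticket.

```shell
# Hypothetical helper: print the options to pass to "singularity exec".
# $1 = 1 if the job requested GPUs, anything else otherwise (assumed convention).
build_exec_opts() {
    # --contain is always added, for GPU and non-GPU jobs alike
    opts="--contain"
    if [ "$1" = "1" ]; then
        # --nv exposes the NVIDIA devices and libraries to the container
        opts="$opts --nv"
    fi
    echo "$opts"
}
```

A wrapper could then expand the result unquoted in the invocation, e.g. singularity exec $(build_exec_opts 1) followed by the image and the job command.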

As part of this ticket, the changes in the OSG wrapper should be checked too.

singularity_setup.sh (13.1 KB) singularity_setup.sh Marco Mambelli, 08/02/2019 02:27 PM
singularity_lib.sh (49.9 KB) singularity_lib.sh Marco Mambelli, 08/02/2019 02:27 PM
default_singularity_wrapper.sh (20.9 KB) default_singularity_wrapper.sh Marco Mambelli, 08/02/2019 09:20 PM

History

#1 Updated by Marco Mambelli over 1 year ago

  • Subject changed from Add always --contain to the Singularity invocation to Add always --contain to the Singularity invocation and update wrapper adding improvement in the OSG one

Changes in v34/23011
Wrapper updated using OSG wrapper #394b564
https://github.com/opensciencegrid/osg-flock/blob/master/job-wrappers/user-job-wrapper.sh

Email discussions w/ Mats Rynge and Dave Dykstra
  • --contain should always be used; if there are problems w/ GPUs, they should be fixed in Singularity
  • Should LMOD_BETA turn on the use of module?
  • problem with home dirs management (--home) in some Singularity versions (see discussion below)

Singularity --home discussion:

On 8/2/19 4:03 AM, Dave Dykstra wrote:
Not only does it have to have an older 3.x version it also has to have
set that singularity.conf option which I don't think was a good idea in
the first place.  I don't think that CMS has made this change.  I think
it should be good enough to require that those sites that really want
"mount home = no" to upgrade to 3.2.1-1.1.   Mats, do you agree that
that is reasonable?

Yes, I think most places where we encountered this problem are pretty aggressive and have moved on already. My current version distribution is (first column is the count):

  7269 2.6.0-dist
   868 2.6.1-dist
   224 3.2.0-1.el7
     8 3.2.1-1.1.el7
   383 3.2.1-1.1.osg34.el6
   210 3.2.1-1.1.osg34.el7
   173 3.2.1-1.el7
   602 3.2.1-1.osg34.el6
  3658 3.2.1-1.osg34.el7

After the change, we did have a few pieces of software which were really upset with $HOME not being set or being set to a non-writeable home directory.

On Thu, Aug 01, 2019 at 09:18:40PM -0500, Marco Mambelli wrote:
Thanks Dave,
so adding something like:

--home \"$PWD\":/srv --bind $PWD:/srv  --pwd /srv 

would use --home where it works (giving a warning and ignoring the bind
mount) and do the bind mount in the versions that ignore --home.

And no known problems for --pwd

So this should work w/ all versions, am I correct?

It wouldn't quite be the same because it wouldn't set $HOME.

Or should I go the no-home route until we are sure that there are no
older 3.x around?

Not only does it have to have an older 3.x version it also has to have
set that singularity.conf option which I don't think was a good idea in
the first place.  I don't think that CMS has made this change.  I think
it should be good enough to require that those sites that really want
"mount home = no" to upgrade to 3.2.1-1.1.   Mats, do you agree that
that is reasonable?

Dave

Thanks,
Marco

On Aug 1, 2019, at 9:00 PM, Dave Dykstra <dwd@fnal.gov> wrote:

Hi Marco,

There was the issue in versions 3.x through 3.2.1-1 where --home was
being ignored on sites that set "mount home = no" in singularity.conf.
This was fixed in 3.2.1-1.1.

Dave

On Thu, Aug 01, 2019 at 08:17:25PM -0500, Marco Mambelli wrote:
Mats,
I'm updating the GWMS singularity job wrapper with the improvements that you did on the OSG one.

There are some changes in the use of home and initial directories, and I don't know whether they are there because of older versions of Singularity or whether I should implement them.

Dave, 
I'd like also your input on this if possible.

Previously the options for singularity exec included:
                                 --home $PWD:/srv \
                                 --pwd /srv \

Now they have instead:
                                 --bind $PWD:/srv \
                                 --no-home  \
and then home and pwd are set manually inside the singularity execution:
  cd /srv
  export HOME=/srv

Is the result equivalent?
Is the second better because the options in Singularity were unreliable?

Thanks,
Marco

PS I also use --ipc --pid --contain
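
The two styles compared in this thread can be sketched side by side. This is only an illustration: the helper name srv_mount_opts and its "home" / "no-home" style argument are hypothetical, and the sketch assumes a working directory path without spaces. The flags themselves are the ones quoted in the emails.

```shell
# Hypothetical helper: print the singularity exec options for one of the
# two styles discussed in the thread.
# $1 = style ("home" or "no-home"), $2 = host working directory (default: $PWD)
srv_mount_opts() {
    workdir=${2:-$PWD}
    case "$1" in
        # Older style: Singularity mounts the directory as home and sets $HOME
        home)    echo "--home $workdir:/srv --pwd /srv" ;;
        # Newer OSG style: plain bind mount; the job must then run
        # "cd /srv" and "export HOME=/srv" itself inside the container
        no-home) echo "--bind $workdir:/srv --no-home" ;;
        *)       echo "unknown style: $1" >&2; return 1 ;;
    esac
}
```

As noted by Dave, the bind-mount form does not set $HOME by itself, which is why the no-home style pairs it with the manual cd and export inside the container.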

Module discussion:

They are all for debugging modules, and I don't think you need any of those. We don't have the LMOD_BETA so I will remove it from my script as well.

On 8/1/19 6:33 PM, Marco Mambelli wrote:
Hi Mats,
there are some variables related to the use of modules in Singularity:
LMOD_BETA
InitializeModulesEnv
and I also had a MODULE_USE, and I don't remember where it comes from.
Is LMOD_BETA also a switch that turns on the use of module, or just a type selector that is ignored if module is not used?
I.e., should LMOD_BETA=1 imply InitializeModulesEnv=1?
In the GWMS wrapper I had MODULE_USE, LMOD_BETA and
[[ "x$LMOD_BETA" = "x1" ]] && MODULE_USE=1
So LMOD_BETA=1 was sufficient to enable the use of module.
The older OSG wrapper had a similar behavior because it had only LMOD_BETA.
Also, I'm always using module in the glidein as one of the possible ways to get the singularity binary.
Should I avoid that if InitializeModulesEnv is not 1?

#2 Updated by Marco Mambelli over 1 year ago

Adding patch for GlideinWMS 3.4.5 as attached files.
This version is backward compatible w/ 3.4.x scripts.

Factory:
  • backup and replace singularity_lib.sh and singularity_setup.sh in /var/lib/gwms-factory/web-base
  • run upgrade: /bin/systemctl stop gwms-factory; /usr/sbin/gwms-factory upgrade && /bin/systemctl start gwms-factory && echo "all OK"
Frontend:
  • backup and replace default_singularity_wrapper.sh in /var/lib/gwms-frontend/web-base/frontend
  • run upgrade: /bin/systemctl stop gwms-frontend; /usr/sbin/gwms-frontend upgrade && /bin/systemctl start gwms-frontend && echo "all OK" || echo "FAILED"
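
The "backup and replace" step above could be scripted roughly as follows. This is a sketch under assumptions: the helper name backup_and_replace and the .bak suffix are hypothetical, and the ticket does not prescribe how the backup should be made.

```shell
# Hypothetical helper: back up an installed script, then replace it.
# $1 = path of the new script, $2 = directory holding the installed copy
backup_and_replace() {
    new=$1
    target="$2/$(basename "$1")"
    if [ -f "$target" ]; then
        # Keep the previous version next to the new one (suffix is an assumption)
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$new" "$target"
}

# Example for the Factory (run before the upgrade command):
#   backup_and_replace singularity_lib.sh /var/lib/gwms-factory/web-base
#   backup_and_replace singularity_setup.sh /var/lib/gwms-factory/web-base
```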

Patch notification sent:

Hi all,
here is what I think will be the 3.4.6 version of the GlideinWMS Singularity scripts.
These can be used as a patch in 3.4.5 (and 3.4.2) Frontends and Factories.
It is an update of what I sent yesterday.

Files are attached to the ticket 23011.
Just replace the existing scripts using the instructions in the ticket (last comment):
https://cdcvs.fnal.gov/redmine/issues/23011

This fixes tickets 23011, 22998, 22962 

Both Factory and Frontend scripts are also compatible with the un-patched singularity scripts (the patch I sent yesterday required both frontend and factory to be patched)
Today's Frontend script includes some other fixes that were added to the OSG wrapper in the last months (e.g. better GPU and $HOME support)

The factory patch is critical to use GlideinWMS Singularity support.
The frontend patch is recommended.

I'm waiting on 2 critical tickets for 3.4.6 (#22999 and #22779); we should release mid next week. Then it will go to OSG testing.

Thank you,
Marco

#3 Updated by Marco Mambelli over 1 year ago

  • Assignee changed from Marco Mambelli to Dennis Box
  • Status changed from New to Feedback

#4 Updated by Marco Mambelli over 1 year ago

  • File deleted (default_singularity_wrapper.sh)

#6 Updated by Dennis Box over 1 year ago

  • Assignee changed from Dennis Box to Marco Mambelli

#7 Updated by Marco Mambelli over 1 year ago

  • Status changed from Feedback to Resolved

#8 Updated by Marco Mambelli over 1 year ago

  • Status changed from Resolved to Closed
