Feature #13807

Support Singularity (future replacement for glexec)

Added by Parag Mhashilkar over 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
High
Assignee:
Category:
-
Target version:
Start date:
09/08/2016
Due date:
% Done:

0%

Estimated time:
Stakeholders:

CMS, OSG

Duration:

Description

Hi all,

To keep everyone up-to-date:

I just posted an initial branch for singularity support in HTCondor:

https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5828

When the support is desired, we'd need to set:

SINGULARITY_JOB = true
SINGULARITY_IMAGE_EXPR = "/cvmfs/cernvm-prod.cern.ch/cvm3"
MOUNT_UNDER_SCRATCH = /tmp, /var/tmp

Parag: What would it take to integrate this into glideinWMS (assuming the feature doesn't change too much during the review + release process)?

Todd: can you speculate who on the HTCondor FW team should review this work?

Brian

History

#1 Updated by Parag Mhashilkar over 2 years ago

On Sep 1, 2016, at 3:54 PM, Parag A Mhashilkar wrote:

We may have a few more things to add, but here is the initial list:

1) HTCondor with singularity support, and how to set it in the condor_config for the glidein

Here's the HTCondor ticket:

https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5828

This includes the proposed HTCondor configuration knobs.

2) A site with singularity deployed and well tested, and with someone familiar with the product to help us with the validation part, i.e. the equivalent of glexec_setup.sh

It's probably best to wait for this - the git branch referenced above relies on some non-released features (whose command line arguments are not yet settled). I expect things to be released in September.

Once the release occurs, we'll deploy across Nebraska: this should help bootstrap the use on OSG.

3) Given that glideinwms needs to support both glexec and singularity for some time, this will require development on the glideinwms side

Hopefully not too much?

My plan is, if Singularity is present, to automatically disable glexec. HTCondor will automatically advertise if Singularity is functional - I assume that the VO can provide the HTCondor config settings if they want to REQUIRE singularity, NEVER use it, or only use it if PRESENT. Would this avoid having any singularity-specific code in glexec?

Brian

#2 Updated by Parag Mhashilkar over 2 years ago

Unless there is a reason not to, my plan is to mirror what we do in case of glexec (same as you describe)
1. Factory advertises availability of singularity for an entry.
<attr name="SINGULARITY_BIN" …/>
<attr name="SINGULARITY_JOB" …/>
2. Frontend decides how to use it: REQUIRED | OPTIONAL | NEVER
<attr name="GLIDEIN_Singularity_Use" …/>

#3 Updated by Parag Mhashilkar over 2 years ago

  • Stakeholders updated (diff)

#4 Updated by Parag Mhashilkar over 2 years ago

  • Stakeholders updated (diff)

#5 Updated by Parag Mhashilkar over 2 years ago

  • Target version changed from v3_2_16 to v3_2_17

#6 Updated by Parag Mhashilkar over 2 years ago

  • Target version changed from v3_2_17 to v3_2_18

#7 Updated by Marco Mambelli about 2 years ago

  • Target version changed from v3_2_18 to v3_2_19

#9 Updated by Parag Mhashilkar about 2 years ago

  • Priority changed from Normal to High

#10 Updated by Parag Mhashilkar almost 2 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli

#11 Updated by Marco Mambelli almost 2 years ago

  • Assignee changed from Marco Mambelli to HyunWoo Kim

#12 Updated by Marco Mambelli almost 2 years ago

  • Target version changed from v3_2_19 to v3_2_20

#13 Updated by HyunWoo Kim almost 2 years ago

  • Status changed from New to Assigned

Today I finally managed to integrate Brian's scripts into my test glideinwms setup,
in which I have been adding support for singularity.
I removed some parts of his scripts that I determined to be irrelevant to my development,
but most of the features in his scripts were tested successfully today.

So, at this point, my next steps are:
- to determine how we will support the use of users' own singularity images
(the default image will of course be one of the standard images in the singularity cvmfs repo)
- to run some real-world jobs

#15 Updated by HyunWoo Kim almost 2 years ago

  • Status changed from Assigned to Feedback

I think it's time to have this ticket reviewed by Marco Mambelli.

#16 Updated by HyunWoo Kim almost 2 years ago

Today, I modified the code further to make GLIDEIN_Singularity_Use=OPTIONAL more sensible in the Frontend configuration.
Now, in the OPTIONAL case, if the matched entry does not have SINGULARITY_BIN set, the singularity setup script simply sets HAS_SINGULARITY to False,
and the wrapper script does nothing special and just executes the user job directly (outside of singularity).
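
The OPTIONAL fallback described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual default_singularity_wrapper.sh: build_job_command is a hypothetical helper name, and the sketch only prints the command line it would run, choosing between a "singularity exec" invocation and the bare job depending on HAS_SINGULARITY.

```shell
# Hypothetical sketch, not the real wrapper: choose the command line for the
# user job based on the HAS_SINGULARITY value advertised by the setup script.
build_job_command() {
    has_singularity="$1"   # "True" or "False"
    singularity_path="$2"  # e.g. the value of GWMS_SINGULARITY_PATH
    image="$3"             # e.g. the value of GWMS_SINGULARITY_IMAGE_DEFAULT
    shift 3                # the remaining arguments are the user job and its arguments
    if [ "$has_singularity" = "True" ]; then
        # Singularity is usable: run the job inside the container
        echo "$singularity_path exec $image $*"
    else
        # Singularity is not usable: run the job directly, outside singularity
        echo "$*"
    fi
}

build_job_command "True" /usr/bin/singularity /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo:el6 ./job.sh 10
build_job_command "False" "" "" ./job.sh 10
```

A real wrapper would exec the resulting command instead of printing it; printing keeps the sketch safe to run anywhere.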

I also updated both Frontend and Factory instructions with how to configure them to use or support Singularity.

#17 Updated by Marco Mambelli over 1 year ago

  • Status changed from Feedback to Assigned

#18 Updated by HyunWoo Kim over 1 year ago

I found two potential bugs and fixed them today.

1. So far, default_singularity_setup.sh (in Factory) had to be put explicitly into an Entry in glideinWMS.xml.
But if GLIDEIN_Glexec_Use=OPTIONAL and default_singularity_wrapper.sh is found in frontend.xml,
and if glideinWMS.xml does not have default_singularity_setup.sh in an Entry and this Entry does not support singularity,
the default_singularity_wrapper.sh might fail to run the user job successfully.
This is only a possibility, but I decided to make default_singularity_setup.sh a default script in Factory, just like glexec_setup.sh.
If an Entry admin wants to customize singularity_setup.sh, they have to modify this file instead of using a new script.

2. If a Group in frontend.xml is missing the GLIDEIN_Singularity_Use attribute completely,
the current code in default_singularity_setup.sh will set use_singularity (ordinarily read from GLIDEIN_Singularity_Use) to OPTIONAL,
and in this case, if the Group in frontend.xml happens to have default_singularity_wrapper.sh (by mistake),
the user job will run inside singularity when this Group is matched to an Entry with SINGULARITY_BIN set.
This might not be what the Group intended.
So, I am modifying default_singularity_setup.sh to set use_singularity to NEVER in this case.

With these 2 new changes, I tested the following combinations of Entry configurations and Group configurations regarding singularity.

1. If an Entry wants all glideins to run inside singularity, the Entry must
- set GLIDEIN_SINGULARITY_REQUIRE=True
AND
- set SINGULARITY_BIN to a real path name

A. when GLIDEIN_Glexec_Use=REQUIRED
- Only SINGULARITY_BIN=pathname is required in query_expr AND SINGULARITY_BIN=pathname is required in match_expr
result> experiment success

B. when GLIDEIN_Glexec_Use=OPTIONAL
- default_singularity_wrapper.sh will be executed, i.e. user job will run inside singularity
result> experiment success

C. when GLIDEIN_Glexec_Use=NEVER
- cvWParamDict.py will NOT match this Group with the current Entry.
- singularity_setup.sh will exit 1
result> experiment success, unmatched..as expected..

2. If an Entry allows glideins to run inside singularity when they ask for it BUT does not enforce the use of singularity,
the Entry must
- set GLIDEIN_SINGULARITY_REQUIRE=False
AND
- set SINGULARITY_BIN to a real path

A. when GLIDEIN_Glexec_Use=REQUIRED
-  SINGULARITY_BIN=pathname is required in query_expr AND SINGULARITY_BIN=pathname is required in match_expr
expectation> user job will run inside singularity
result> experiment success, as expected, user job ran inside singularity

B. when GLIDEIN_Glexec_Use=OPTIONAL
- default_singularity_wrapper.sh will be executed, i.e. user job will run inside singularity
result> experiment success as expected, the user job ran inside singularity..

C. when GLIDEIN_Glexec_Use=NEVER
- cvWParamDict.py will match this Group with the current Entry
- singularity_setup.sh will exit 0 if default_singularity_wrapper.sh is missing in frontend.xml
result> experiment success, sleep job ran outside singularity

If I want to handle the case where default_singularity_wrapper.sh is mistakenly used,
  singularity_setup.sh will have to set advertise HAS_SINGULARITY "False" "C"
expectation> user job will run outside singularity
result> experiment success, as expected, the user job ran outside singularity

3. If Entry does not have singularity installed OR wants no glideins to run inside singularity even if Entry has singularity installed
the Entry
- should NOT set GLIDEIN_SINGULARITY_REQUIRE to True
AND
- should NOT set SINGULARITY_BIN to a real path

A. when GLIDEIN_Glexec_Use=REQUIRED
- SINGULARITY_BIN=pathname is required in query_expr AND SINGULARITY_BIN=pathname is required in match_expr
- singularity_setup.sh is not available.
- expectation> will not be matched..
result> experiment success, as expected, unmatched...

B. when GLIDEIN_Glexec_Use=OPTIONAL
- HK problem found> cvWParamDict.py does NOT do anything because this is glexec syntax 
   which is based on the assumption that glexec_setup.sh is always running.
<decision to make>
1. if, as is the case currently, default_singularity_setup.sh is optional (this means the Entry operator can use their own version of singularity_setup.sh),
   GLIDEIN_Glexec_Use=OPTIONAL should NOT be allowed.
2. if GLIDEIN_Glexec_Use=OPTIONAL should be allowed (some Groups will prefer OPTIONAL), default_singularity_setup.sh should be used by default
   and this means the Entry operator can NOT use their own version of singularity_setup.sh.
</decision to make>

- So, for now, I will modify the code such that default_singularity_setup.sh is used by default always everywhere..
  (this means the Entry operator can NOT use their own version of singularity_setup.sh),
  then, cvWParamDict.py does NOT do anything BUT, default_singularity_setup.sh will set  advertise HAS_SINGULARITY "False" "C" 
expectation> will be matched BUT, default_singularity_setup.sh will make sure the user job runs outside singularity
result> experiment success, as expected, the user job ran outside singularity

C. when GLIDEIN_Glexec_Use=NEVER
- cvWParamDict.py will match this Group with the current Entry (as GLIDEIN_SINGULARITY_REQUIRE = False),
- singularity_setup.sh conducts a redundant check to see if GLIDEIN_SINGULARITY_REQUIRE = True; if so, it will exit 1.
- singularity_setup.sh will set advertise HAS_SINGULARITY "False" "C" and exit 0
- and the user job will run outside singularity.
result> experiment success, the user job ran outside singularity in both cases
   + <attr name="GLIDEIN_SINGULARITY_REQUIRE" ..... value="False"/>
   + and when this line is completely missing
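
The nine Entry/Group combinations tested above can be summarized in a small lookup function. This is a sketch for reading the results, not code from glideinwms: expected_outcome is a hypothetical helper, its three arguments stand for the Entry's GLIDEIN_SINGULARITY_REQUIRE value, the Entry's SINGULARITY_BIN value (empty when unset) and the Group's REQUIRED | OPTIONAL | NEVER setting, and case 1A is assumed to mean that the job ran inside singularity. The combination of GLIDEIN_SINGULARITY_REQUIRE=True with no SINGULARITY_BIN was not tested above, so the sketch reports it as such.

```shell
# Hypothetical summary of the test matrix above; not part of glideinwms.
# Usage: expected_outcome <entry_require: True|False> <entry_bin: path or ""> <group_use>
expected_outcome() {
    entry_require="$1"
    entry_bin="$2"
    group_use="$3"
    if [ -z "$entry_bin" ]; then
        if [ "$entry_require" = "True" ]; then
            echo "not-tested"          # combination not covered by the tests above
            return
        fi
        case "$group_use" in           # Entry case 3: no usable singularity
            REQUIRED)       echo "unmatched" ;;
            OPTIONAL|NEVER) echo "outside" ;;
            *)              echo "unknown" ;;
        esac
    else
        case "$group_use" in           # Entry cases 1 and 2: SINGULARITY_BIN is set
            REQUIRED|OPTIONAL) echo "inside" ;;
            NEVER)
                if [ "$entry_require" = "True" ]; then
                    echo "unmatched"   # case 1C: the Entry enforces singularity
                else
                    echo "outside"     # case 2C: the Entry only permits singularity
                fi ;;
            *) echo "unknown" ;;
        esac
    fi
}

expected_outcome True /usr/bin NEVER
expected_outcome False "" OPTIONAL
```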

Extra Test:
What if GLIDEIN_Singularity_Use is missing from frontend.xml???

To simulate this situation, I removed the following code from frontend.xml:
  <groups>
    <group name="main" enabled="True">
         <attrs>
            <attr name="GLIDEIN_Singularity_Use" glidein_publish="True" job_publish="True" parameter="False" type="string" value="REQUIRED"/>
         </attrs>
         <files>
            <file absfname="/var/lib/gwms-frontend/web-base/frontend/default_singularity_wrapper.sh" wrapper="True" comment="comment"/>
         </files>

It turns out that default_singularity_setup.sh has the following code:
    use_singularity=`grep '^GLIDEIN_Singularity_Use ' $glidein_config | awk '{print $2}'`
    if [ -z "$use_singularity" ]; then
        echo "`date` GLIDEIN_Singularity_Use not configured. Defaulting it to OPTIONAL" 
        use_singularity="OPTIONAL" 
    fi
In other words, use_singularity will be set to OPTIONAL in this situation.

expectation> with the current code, use_singularity will be set to OPTIONAL in default_singularity_setup.sh and the user job will run inside singularity,
but this might NOT be what the Group wants,
i.e. I might have to modify default_singularity_setup.sh so that use_singularity is set to NEVER when
the Group does not use   <attr name="GLIDEIN_Singularity_Use" ... value="REQUIRED"/>

result> as expected, the user job ran outside singularity because default_singularity_wrapper.sh was missing, even though use_singularity was set to OPTIONAL
result> if default_singularity_wrapper.sh had been present, the user job would have run inside singularity

The following is evidence that default_singularity_setup.sh set use_singularity to OPTIONAL:
[root@fermicloud025 ~]# grep -i singularity /tmp/glide_pCNjBU/glidein_config
SINGULARITY_BIN /usr/bin
HAS_SINGULARITY True
GWMS_SINGULARITY_PATH /usr/bin/singularity
GWMS_SINGULARITY_IMAGE_DEFAULT /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo:el6

but the job actually ran outside singularity simply because default_sing_wrapper.sh was missing in frontend.xml

So, this again justifies the new change to default_singularity_setup.sh
    use_singularity=`grep '^GLIDEIN_Singularity_Use ' $glidein_config | awk '{print $2}'`
    if [ -z "$use_singularity" ]; then
        use_singularity="NEVER" 
    fi

That is, a Group that omits <attr name="GLIDEIN_Singularity_Use" ... value="REQUIRED"/> presumably does not intend to use singularity,
but if <file absfname="/var/lib/gwms-frontend/web-base/frontend/default_singularity_wrapper.sh" wrapper="True" comment="comment"/>
is still there, the user job would run inside singularity, because with the old code default_singularity_setup.sh sets use_singularity to OPTIONAL.

result> with the new code, as expected, the user job ran outside singularity
result> even if default_singularity_wrapper.sh had been present, the user job would have run outside singularity
The following is evidence that default_singularity_setup.sh set use_singularity to NEVER:
[root@fermicloud025 ~]# grep -i singularity  /tmp/glide_t9AElH/glidein_config
SINGULARITY_BIN /usr/bin
HAS_SINGULARITY False
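
The new default can be tried out in isolation against a throwaway glidein_config. The sketch below wraps the same grep/awk lookup in a hypothetical helper, read_use_singularity, which is not a function from default_singularity_setup.sh; the temporary file stands in for a real glidein_config.

```shell
# Hypothetical helper wrapping the lookup shown above; it defaults to NEVER
# when GLIDEIN_Singularity_Use is absent from the given config file.
read_use_singularity() {
    config_file="$1"
    value=$(grep '^GLIDEIN_Singularity_Use ' "$config_file" | awk '{print $2}')
    if [ -z "$value" ]; then
        value="NEVER"
    fi
    echo "$value"
}

# Throwaway file standing in for glidein_config
tmp_config=$(mktemp)
printf 'SINGULARITY_BIN /usr/bin\n' > "$tmp_config"
read_use_singularity "$tmp_config"    # attribute missing, so the default applies

printf 'GLIDEIN_Singularity_Use REQUIRED\n' >> "$tmp_config"
read_use_singularity "$tmp_config"    # attribute present, so its value is used
rm -f "$tmp_config"
```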

#19 Updated by HyunWoo Kim over 1 year ago

  • Status changed from Assigned to Feedback
  • Assignee changed from HyunWoo Kim to Marco Mambelli

Now, singularity_setup.sh fully follows the custom script guideline,
so that when it checks for the existence of the singularity binary and the default singularity image in cvmfs
and either check fails, it reports the failure to glidein_startup.sh properly.
I also did some code clean-up.
Now, I am passing this to Marco Mambelli for feedback.
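
The two existence checks mentioned above might look roughly like the sketch below. It makes several assumptions: check_singularity_prereqs is a hypothetical helper name, and a plain non-zero return with a message on stderr stands in for whatever reporting mechanism the custom script guideline actually prescribes, which is not reproduced here.

```shell
# Hypothetical sketch of the two checks; the real singularity_setup.sh reports
# failures to glidein_startup.sh through the custom script mechanism instead.
check_singularity_prereqs() {
    singularity_path="$1"   # full path to the singularity binary
    image_path="$2"         # path to the default image, e.g. under /cvmfs
    if [ ! -f "$singularity_path" ] || [ ! -x "$singularity_path" ]; then
        echo "singularity binary not found or not executable: $singularity_path" >&2
        return 1
    fi
    if [ ! -e "$image_path" ]; then
        echo "default singularity image not found: $image_path" >&2
        return 1
    fi
    return 0
}

if check_singularity_prereqs /usr/bin/singularity /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo:el6; then
    echo "singularity prerequisites found"
else
    echo "singularity prerequisites missing"
fi
```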

I also modified glideinwms.spec so that singularity_setup.sh will be included in the factory rpm.
I tested this in the jenkins build master..

#20 Updated by Marco Mambelli over 1 year ago

  • Status changed from Feedback to Assigned
  • Assignee changed from Marco Mambelli to HyunWoo Kim

Comments sent via email, needs changes

#21 Updated by HyunWoo Kim over 1 year ago

  • Status changed from Assigned to Resolved

Implemented comments from Marco Mambelli, tested them
and then merged my branch into branch_v3_2.

#22 Updated by Parag Mhashilkar over 1 year ago

  • Status changed from Resolved to Assigned

#23 Updated by HyunWoo Kim over 1 year ago

  • Status changed from Assigned to Feedback
  • Assignee changed from HyunWoo Kim to Marco Mambelli

Please review again.
The changes are based on email exchanges with Mats Rynge and Brian Bockelman.

#24 Updated by Marco Mambelli over 1 year ago

  • Assignee changed from Marco Mambelli to HyunWoo Kim

#25 Updated by Marco Mambelli over 1 year ago

  • Status changed from Feedback to Resolved

latest v3/13807_2 has been merged

#26 Updated by HyunWoo Kim over 1 year ago

  • Status changed from Resolved to Closed

