Bug #24381

condor_chirp (old, non-Python version) not working for Singularity jobs

Added by Marco Mambelli 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Immediate
Category: Glidein
Target version:
Start date: 05/06/2020
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

As noted by Kenyi (CMS operator), condor_chirp is not working correctly for Singularity jobs that use the GWMS wrapper.
condor_chirp is used extensively by CMS.
This needs to be fixed and a hotfix for the production version 3.6.2 should be provided.

Hi Marco and Marco,

I recently noticed that condor_chirp no longer seems to be available inside the Singularity container. You can find an example from a worker running at UCSD below [1].

WMAgent uses condor_chirp to set some classads related to the cmsRun exitcode, exception messages, etc. Gridpacks also use condor_chirp in CMS Connect (which is how I found out about this issue).

The old wrapper used to bring those from the pilot here:
https://gitlab.cern.ch/CMSSI/CMSglideinWMSValidation/blob/master/singularity_wrapper.sh#L161-167

Is the new GlideinWMS implementation doing anything with respect to condor_chirp?

[1]
Operating System:
CentOS release 6.10 (Final)
hostname:
sdsc-29.t2.ucsd.edu
Site:
GLIDEIN_CMSSite=T2_US_UCSD
which condor_chirp
which: no condor_chirp in (/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.49/el6-x86_64/usr/bin:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.49/el6-x86_64/usr/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin)
ls -l /srv
total 32
-rw-r--r--. 1 cuser5 cuser5     5 May  5 17:21 _condor_scratch_dir.txt
-rw-r--r--. 1 cuser5 cuser5   379 May  5 17:21 _condor_stderr
-rw-r--r--. 1 cuser5 cuser5   710 May  5 17:21 _condor_stdout
-rwxr-xr-x. 1 cuser5 cuser5  1236 May  5 17:21 condor_exec.exe
-rw-r--r--. 1 cuser5 cuser5    12 May  5 17:21 hellov1.txt
-rw-------. 1 cuser5 cuser5 11170 May  5 17:21 x509up_u24239

Best regards,
Kenyi
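
The check in the report above can be scripted from inside a job. The following is a minimal sketch and is not taken from the ticket: `chirp_available` and `publish_exit_code` are hypothetical helper names, and `HypotheticalExitCode` is a placeholder attribute name, not one that WMAgent is known to use. Only `condor_chirp set_job_attr` itself is a standard condor_chirp subcommand.

```shell
#!/bin/bash
# Print the path of condor_chirp and return 0 if it is on PATH, return non-zero otherwise.
chirp_available() {
    command -v condor_chirp 2>/dev/null
}

# Publish a job exit code as a job ClassAd attribute, or warn if chirp is missing.
# HypotheticalExitCode is a placeholder attribute name used only for illustration.
publish_exit_code() {
    local code="$1"
    if chirp_available >/dev/null; then
        condor_chirp set_job_attr HypotheticalExitCode "$code"
    else
        echo "condor_chirp not found in PATH; attribute not set" >&2
        return 1
    fi
}
```

On an affected worker such as the one in [1], `publish_exit_code` would take the warning branch, because `which condor_chirp` finds nothing inside the container.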

Looking at the code, the condor_chirp files are not copied into the Singularity container.
The fix should be applied before the merge of [#24294] and [#24295] but after [#24244], 24963469a7286530f97cf4c2a84567e5fdd926bf.
A hotfix for 3.6.2 should also be prepared (the download files can be attached to this ticket).
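
As an illustration of the kind of change involved, and not the content of the attached patch, a wrapper could copy condor_chirp from the pilot's HTCondor directory into the job directory that is made visible inside the container. The function name `stage_condor_chirp` and both directory arguments are assumptions made for this sketch, as is the `libexec/condor_chirp` plus `lib/` layout of the pilot's HTCondor directory.

```shell
#!/bin/bash
# Copy condor_chirp (and the HTCondor lib directory, if present) from the
# pilot's HTCondor directory into a directory visible inside the container.
# Both paths are supplied by the caller; nothing here is GlideinWMS API.
stage_condor_chirp() {
    local condor_dir="$1"   # pilot-side HTCondor directory (assumed layout)
    local job_dir="$2"      # job directory that the container sees, e.g. as /srv
    local chirp="$condor_dir/libexec/condor_chirp"

    if [ ! -x "$chirp" ]; then
        echo "condor_chirp not found under $condor_dir" >&2
        return 1
    fi
    mkdir -p "$job_dir/condor/libexec" || return 1
    cp "$chirp" "$job_dir/condor/libexec/" || return 1
    if [ -d "$condor_dir/lib" ]; then
        cp -r "$condor_dir/lib" "$job_dir/condor/" || return 1
    fi
    # Print the host-side directory that now holds condor_chirp.
    echo "$job_dir/condor/libexec"
}
```

The printed path is the host-side location; inside the container the same directory would appear under the mount point of the job directory, and that is the path a wrapper would need to add to PATH.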

default_singularity_wrapper.gwms24381_20200506.patch (1.22 KB), Marco Mambelli, 05/06/2020 06:55 PM
default_singularity_wrapper.sh (23 KB), Marco Mambelli, 05/06/2020 06:55 PM
default_singularity_wrapper.sh (24.1 KB), CMS version of the singularity wrapper, Marco Mambelli, 05/07/2020 09:53 AM

History

#1 Updated by Marco Mambelli 2 months ago

Fix in v36/24381
The file changed is default_singularity_wrapper.sh. There is also a change in creation/web_base/singularity_lib.sh, but it only touches a comment and should be ignored for the hotfix.

Attached are the drop-in replacement (default_singularity_wrapper.sh) and the patch file (default_singularity_wrapper.gwms24381_20200506.patch).

To apply the hotfix:
  • replace /var/lib/gwms-frontend/web-base/frontend/default_singularity_wrapper.sh with the provided default_singularity_wrapper.sh file
  • run the upgrade command:
    • stop the Frontend
    • run /usr/sbin/gwms-frontend upgrade
    • start the Frontend
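
The steps above can be sketched as a small script. This is an assumption-laden outline rather than an official procedure: the `systemctl` calls assume a systemd host (on EL6 the equivalent would be `service gwms-frontend stop` and `service gwms-frontend start`), the backup copy is an extra precaution that the instructions do not mention, and `WRAPPER_SRC` is wherever the attachment was downloaded. With `DRY_RUN=1`, the default here, the script only prints the commands.

```shell
#!/bin/bash
# Path from the ticket; override WRAPPER_DST if the wrapper lives elsewhere.
WRAPPER_DST="${WRAPPER_DST:-/var/lib/gwms-frontend/web-base/frontend/default_singularity_wrapper.sh}"
# Location of the downloaded attachment (assumed).
WRAPPER_SRC="${WRAPPER_SRC:-./default_singularity_wrapper.sh}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_hotfix() {
    run cp -p "$WRAPPER_DST" "$WRAPPER_DST.pre24381" || return 1  # backup, not in the ticket
    run cp "$WRAPPER_SRC" "$WRAPPER_DST" || return 1
    run systemctl stop gwms-frontend || return 1                  # assumption: systemd host
    run /usr/sbin/gwms-frontend upgrade || return 1
    run systemctl start gwms-frontend
}

apply_hotfix
```

Once the printed commands look right for the host, running the script as root with `DRY_RUN=0` would execute them.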

If you copied the Singularity wrapper to a custom location and invoke that copy through the file section of the configuration, update that file as well and then run the upgrade command.
Please contact GlideinWMS support for help or clarification.

#2 Updated by Marco Mambelli 2 months ago

Here is a version of the Singularity wrapper that also sources the osg_wn software, as required by CMS images.

The patching instructions are the same.

#3 Updated by Marco Mascheroni about 1 month ago

  • Assignee changed from Marco Mascheroni to Marco Mambelli
  • Status changed from Feedback to Accepted

#4 Updated by Marco Mambelli about 1 month ago

  • Status changed from Accepted to Resolved

