Support #17662

singularity jobs each need a separate linux session to support restricted-access cvmfs

Added by Dave Dykstra almost 2 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Start date: 09/06/2017
Due date:
% Done: 0%
Estimated time:
Stakeholders: CMS
Duration:

Description

There's a new and so far relatively little-used feature in cvmfs that limits access to cvmfs repositories to processes with an acceptable voms proxy. The /cvmfs/ligo.osgstorage.org repository has been using this for a while, and starting next Tuesday there should be a /cvmfs/cms.osgstorage.org repository. The latter will require a CMS voms proxy even to access the metadata of file and directory names (ligo uses it only to limit access to the data in the files). The way the protection works in CVMFS, every process within the same linux session shares the voms proxy permissions. As a result, when using singularity to separate jobs running under the same pilot, we need to make sure that each job is in a separate session, which is created with the setsid() system call. I will suggest some possible implementations in a followup comment.
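To illustrate the session boundary described above, here is a minimal sketch that compares the session ID of a shell with that of a command started through setsid(1). It assumes Linux with /proc mounted and reads the session ID from the sixth field of /proc/self/stat. The helper name current_sid is made up for this illustration and is not part of cvmfs, condor, or singularity.

```shell
#!/bin/bash
# Print the session ID of the calling shell.
# Fields of /proc/self/stat: pid (comm) state ppid pgrp session ...
# Assumes Linux and a process name without spaces.
current_sid() {
    read -r _pid _comm _state _ppid _pgrp sid _rest < /proc/self/stat
    echo "$sid"
}

parent_sid=$(current_sid)

# The same lookup, run in a shell started through setsid(1).
child_sid=$(setsid sh -c 'read -r _pid _comm _state _ppid _pgrp sid _rest < /proc/self/stat; echo "$sid"')

echo "parent session: $parent_sid"
echo "child session:  $child_sid"
```

The two IDs should differ, which is the property cvmfs relies on to keep proxy permissions from being shared between jobs.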

History

#1 Updated by Dave Dykstra almost 2 years ago

It's conceivable that condor could take care of this in its startd or starter process, which as I understand it are the condor processes above singularity in the process tree. Brian Bockelman confirmed that condor does not currently invoke setsid() anywhere, so it would need to change to support this. In addition, in my experiments using the setsid command to invoke singularity, the parentage changed: the process running under setsid became a child of the init process (PID 1), which is not very desirable. On the other hand, I confirmed that this does not happen automatically when using the setsid() system call; it happens only if a parent process exits and leaves a child an orphan, so this could probably be implemented as a condor feature.
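The point about parentage can be checked with a small sketch like the one below. It assumes Linux with /proc mounted and the util-linux setsid(1), which forks only when it is itself a process group leader. A background job in a non-interactive script is not a group leader, so here setsid(1) calls setsid() directly and the child keeps the script as its parent.

```shell
#!/bin/bash
# Start a child through setsid(1) in the background and have it report its own
# parent PID, session ID, and PID (fields 4, 6, and 1 of /proc/self/stat).
# Assumes Linux and a process name without spaces.
out=$(mktemp)
setsid sh -c 'read -r pid _comm _state ppid _pgrp sid _rest < /proc/self/stat; echo "$ppid $sid $pid"' > "$out" &
child=$!
wait "$child"

read -r child_ppid child_sid child_pid < "$out"
rm -f "$out"

echo "script PID:     $$"
echo "child's parent: $child_ppid"
echo "child PID:      $child_pid"
echo "child session:  $child_sid"
```

When run as a standalone script, the child's parent should be the script itself rather than PID 1, while the child's session ID equals its own PID, meaning it leads a new session.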

If we want to do it without a change to condor, I also found that invoking the setsid command inside of a singularity container does not lead to an orphan process; there the process tree continues to show the user job processes under singularity when setsid is used. The only problem with using setsid alone under singularity is that signal connections get lost; a signal to the session that singularity is in does not automatically get sent to the child session. So one possibility to handle this is to use a process that traps and forwards signals in addition to invoking setsid. For example the following script seems to work in my testing:

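#!/bin/bash
# Run a command in a new session and forward signals to it.

SID=""
sighandler()
{
    # Forward signal $1 to the process group of the child session leader.
    if [ -n "$SID" ]; then
        kill -"$1" -"$SID"
    fi
}

# Trap signals 1 through 15; SIGKILL (9) cannot be trapped, so skip it.
SIG=1
while [ "$SIG" -le 15 ]; do
    [ "$SIG" -ne 9 ] && trap "sighandler $SIG" "$SIG"
    SIG=$((SIG + 1))
done

setsid "$@" &
SID=$!
# wait returns early whenever a trapped signal arrives, so keep waiting
# until the child has really exited.
while kill -0 "$SID" 2>/dev/null; do
    wait "$SID"
done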

#2 Updated by Dave Dykstra almost 2 years ago

A third alternative is to add an option to singularity to do setsid and forward signals.

#3 Updated by Marco Mambelli almost 2 years ago

In short, you can already have new sessions, started by condor, by setting GLIDEIN_Use_PGroups to true.

In more detail:
Condor already provides this through USE_PROCESS_GROUPS, which invokes setsid:
A boolean value that defaults to True. When False, HTCondor daemons on Unix machines will not create new sessions or process groups. HTCondor uses processes groups to help it track the descendants of processes it creates. This can cause problems when HTCondor is run under another job execution system.
http://research.cs.wisc.edu/htcondor/manual/v8.6/3_5Configuration_Macros.html#25253

This option is true by default in condor, but glideinwms sets it to false by default because of the concerns mentioned. I still have to investigate what those concerns are and will update.
It can be changed to true by setting GLIDEIN_Use_PGroups; see http://glideinwms.fnal.gov/doc.prd/factory/custom_vars.html

Another question: after reading about condor's DISCARD_SESSION_KEYRING_ON_STARTUP (see below), I was wondering whether a new session is needed or whether different keyrings are sufficient.

Thanks, Marco

DISCARD_SESSION_KEYRING_ON_STARTUP
A boolean value that defaults to True. When True, the condor_master daemon will replace the kernel session keyring it was invoked with with a new keyring named htcondor. Various Linux system services, such as OpenAFS and eCryptFS, use the kernel session keyring to hold passwords and authentication tokens. By replacing the keyring on start up, the condor_master ensures these keys cannot be unintentionally obtained by user jobs.

#4 Updated by Brian Bockelman almost 2 years ago

Process groups are not sessions (multiple processes are in a process group; multiple process groups are in a session); CVMFS really needs new sessions.

As far as I can tell, the referenced HTCondor knob affects process groups.

#5 Updated by Marco Mambelli almost 2 years ago

The condor team said that they call "setsid" when this knob is on.
Dave D said that setsid is all that is needed to create a new session.

#6 Updated by Marco Mambelli almost 2 years ago

Here is a follow-up from Todd (condor team) about potential problems:
"""Imagine HTCondor is running under another batch system, and this batch system relies on process sessions created by calls to setsid() to kill the job. For example, imagine PBS is implemented to create a new process session when starting a job, and that when it wants to kill a job it sends signal SIGKILL (9) to all processes in the session. If HTCondor is configured to start a new session for the job, the result is PBS would send a signal 9 to all the HTCondor daemons but not to the HTCondor job itself, resulting in the job being "leaked".
Note I have no idea if PBS behaves this way or not, just using it as potential example."""

#7 Updated by Parag Mhashilkar over 1 year ago

  • Assignee set to Marco Mambelli
  • Target version set to v3_2_22

Any updates on this issue? Do we need to set this parameter, or does HTCondor do it?

#8 Updated by Dave Dykstra over 1 year ago

I looked in the current master branch of condor, and I see USE_PROCESS_GROUPS invoking setsid(). Brian, would it be an acceptable solution for CMS to set the GlideinWMS option GLIDEIN_Use_PGroups=true?

The parameter DISCARD_SESSION_KEYRING_ON_STARTUP invokes syscall(__NR_keyctl, KEYCTL_JOIN_SESSION_KEYRING, "htcondor"). That only clears the htcondor session keyring, which has nothing to do with the cvmfs authorizations. It's unclear to me if cvmfs uses kernel session keyrings at all.

#9 Updated by Marco Mambelli over 1 year ago

  • Target version changed from v3_2_22 to v3_2_23

#10 Updated by Marco Mambelli about 1 year ago

  • Target version changed from v3_2_23 to v3_4_0

#11 Updated by Dave Dykstra about 1 year ago

Today I verified (with Marco's help) that GLIDEIN_Use_PGroups=true does what we want. First without the setting I ran two jobs in a row, one with "use_x509userproxy = True" in the condor submit file and one without, and both could access /cvmfs/ligo.osgstorage.org/test_access/access_dev. With GLIDEIN_Use_PGroups=true, the second job got Permission denied.
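A job payload for this kind of check could look like the following sketch. The check_access helper is hypothetical, since the comment above does not say how access was actually tested. It reads data rather than only testing for existence, because for the ligo repository only the file data is protected, not the metadata.

```shell
#!/bin/bash
# Return success if the contents of a path can actually be read.
# Directories are listed and files have their first byte read, because a
# metadata-only check (such as test -e) may succeed even when data access is denied.
check_access() {
    if [ -d "$1" ]; then
        ls -- "$1" > /dev/null 2>&1
    else
        head -c 1 -- "$1" > /dev/null 2>&1
    fi
}

# Path taken from the test described above; override it by passing an argument.
target=${1:-/cvmfs/ligo.osgstorage.org/test_access/access_dev}

if check_access "$target"; then
    echo "readable:     $target"
else
    echo "not readable: $target"
fi
```

Running this once with and once without "use_x509userproxy = True" in the submit file would reproduce the comparison, assuming both jobs land on the same pilot.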

#12 Updated by Marco Mambelli about 1 year ago

Add to the condor variable defaults something like:
GLIDEIN_Use_PGroups C HAS_SINGULARITY=?=True USE_PROCESS_GROUPS N N -
This would set the default value to HAS_SINGULARITY=?=True in the schedd config. VOs could still override the attribute but would not need to know about it, and things would work as long as they use HAS_SINGULARITY to decide whether to use singularity.

I still have to verify that HAS_SINGULARITY=?=True can go in the configuration, or whether =?= works only in classads and some other expression needs to be used.

#13 Updated by Marco Mambelli about 1 year ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Lorena Lobato Pardavila

Changes in v34/17662

#14 Updated by Marco Mambelli about 1 year ago

  • Assignee changed from Lorena Lobato Pardavila to Marco Mascheroni

#15 Updated by Marco Mambelli about 1 year ago

  • Status changed from Feedback to Work in progress
  • Assignee changed from Marco Mascheroni to Marco Mambelli

#16 Updated by Marco Mambelli about 1 year ago

  • Status changed from Work in progress to Feedback
  • Assignee changed from Marco Mambelli to Marco Mascheroni

#17 Updated by Marco Mascheroni about 1 year ago

  • Assignee changed from Marco Mascheroni to Marco Mambelli

Looks good to me

#18 Updated by Marco Mambelli about 1 year ago

Follow-up ticket [#20031]

#19 Updated by Marco Mambelli about 1 year ago

  • Status changed from Feedback to Resolved

#20 Updated by Marco Mambelli about 1 year ago

  • Status changed from Resolved to Closed

