- Table of contents
- Jobsub Environment Variables
- Jobsub Generated Classad Attributes
- Jobsub User Job Execution Flow
Jobsub Environment Variables
Variable Name | Type | Definition | Example / Remarks | |
---|---|---|---|---|
CONDOR_DIR_INPUT | string | directory where -f <file> input ends up | $_CONDOR_JOB_IWD/no_xfer/0/TRANSFERRED_INPUT_FILES | |
CONDOR_EXEC | string | DO NOT USE leftover code from jobsub_tools/gpsn01 | WILL BE REMOVED SOON | |
CONDOR_TMP | string | DO NOT USE leftover code from jobsub_tools/gpsn01 | WILL BE REMOVED SOON | |
EXPERIMENT | string | VO or experiment that submitted job, needed by IFDH | nova | |
DAGMANJOBID | integer | cluster id of DAG controller | ||
GRID_USER | string | uid of person or script that submitted the job | dbox | |
HAS_USAGE_MODEL | string | set by --resource-provides=usage_model to steer job | DEDICATED | |
IFDH_BASE_URI | string | used by ifdh | http://samweb.fnal.gov:8480/sam/nova/api | |
IFOS_installed | string | set by --OS, steers to SL5,SL6,etc | SL6 | |
JOBSUBJOBID | string | unique ID of job, $CLUSTER.$PROCESS@submitting_schedd | 3450190.0@fifebatch2.fnal.gov | |
JOBSUBJOBSECTION | string | sequential number for jobs in a DAG created using --maxConcurrent | ||
JOBSUBPARENTJOBID | string | $DAGMANJOBID@submitting_schedd | ||
JOBSUB_EXE_SCRIPT | string | job that user submitted | /printenv.sh | |
JOBSUB_MAX_JOBLOG_SIZE | string | if set, max size (bytes) of stdout/stderr from job | see feature #7072 | |
JOBSUB_MAX_JOBLOG_HEAD_SIZE | string | if set, how many bytes from the beginning of the job's stdout/stderr to keep when the output is truncated | see feature #7072 | |
JOBSUB_MAX_JOBLOG_TAIL_SIZE | string | if set, how many bytes from the end of the job's stdout/stderr to keep when the output is truncated | see feature #7072 | |
JSB_TMP | string | a tmp dir used by jobsub for utility scripts | $_CONDOR_JOB_IWD/jsb_tmp | |
LOGNAME | string | uid that job runs under on grid node | novagli | |
OSG_APP | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | /grid/app | |
OSG_DATA | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | /grid/data | |
OSG_SITE_READ | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | dcap://fndca1.fnal.gov:24525//pnfs/fnal.gov/usr/fermigrid/volatile | |
OSG_SITE_WRITE | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | srm://fndca1.fnal.gov:8443/pnfs/fnal.gov/usr/fermigrid/volatile | |
OSG_SQUID_LOCATION | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | squid.fnal.gov:3128 | |
OSG_STORAGE_ELEMENT | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | True | |
OSG_WN_TMP | string | see https://twiki.grid.iu.edu/bin/view/Documentation/Release3/EnvironmentVariables | /local/stage1/disk7/dir_22327 | |
SAM_DATASET | string | |||
SAM_GROUP | string | |||
SAM_PROJECT | string | |||
SAM_PROJECT_NAME | string | |||
SAM_STATION | string | |||
SAM_USER | string | |||
TEMP | string | synonym for $_CONDOR_SCRATCH_DIR | ||
TMP | string | synonym for $_CONDOR_SCRATCH_DIR | ||
TMPDIR | string | synonym for $_CONDOR_SCRATCH_DIR | ||
WN_OS | string | same as IFOS_installed | SL6 | |
X509_CERT_DIR | string | /etc/grid-security/certificates/ | ||
X509_USER_CERT | string | |||
X509_USER_KEY | string | |||
X509_USER_PROXY | string | |||
_CONDOR_JOB_IWD | string | initial working directory that job lands in | ||
_CONDOR_SCRATCH_DIR | string | 'scratch' or 'tmp' directory, files placed here will not be transferred back | $_CONDOR_JOB_IWD/no_xfer | |
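As an illustration of how a user job might consume these variables, the following is a minimal sketch rather than anything jobsub itself provides. The function name `describe_job` and the fallback values (`/tmp`, `unknown`) are assumptions made for the example, so that the script also runs outside the grid where the variables are unset.

```shell
#!/bin/sh
# Sketch of a user job preamble. Fallback values are illustrative assumptions,
# not jobsub defaults.

# Scratch area: prefer the HTCondor scratch dir, then TMPDIR, then /tmp.
scratch_dir="${_CONDOR_SCRATCH_DIR:-${TMPDIR:-/tmp}}"

# Directory where files passed with -f are delivered; falls back to scratch.
input_dir="${CONDOR_DIR_INPUT:-$scratch_dir}"

# describe_job (hypothetical helper) prints a one-line summary of the job identity.
describe_job() {
    printf 'job=%s experiment=%s user=%s os=%s\n' \
        "${JOBSUBJOBID:-unknown}" \
        "${EXPERIMENT:-unknown}" \
        "${GRID_USER:-unknown}" \
        "${IFOS_installed:-unknown}"
}

describe_job
echo "scratch=$scratch_dir input=$input_dir"
```

Using `${VAR:-default}` keeps the script usable for local testing, at the cost of hiding a missing variable; a job that must run only under jobsub could fail early instead.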
Jobsub Generated Classad Attributes
Variable Name | Type | Definition | Example / Remarks |
---|---|---|---|
+AccountingGroup | string | "group_nova.dbox" | |
+GeneratedBy | string | "jobsub_tools v1_3_12 -f Linux+2 -z /fnal/ups/db fifebatch2.fnal.gov" | |
+JobsubClientDN | string | "/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=Dennis D. Box/CN=UID:dbox" | |
+JobsubClientIpAddress | string | "131.225.154.204" | |
+JobsubServerVersion | string | "1.1.4.1" | |
+JobsubClientVersion | string | "1.1.5-rc1" | |
+JobsubClientKerberosPrincipal | string | "dbox@FNAL.GOV" | |
+Jobsub_Group | string | "nova" | |
Jobsub User Job Execution Flow
Preliminary
- The user job will land in $_CONDOR_JOB_IWD on the grid worker node it executes on.
- It will be known to the HTCondor schedd that submitted it as $CLUSTER.$PROCESS.
- It will be known to the jobsub server as $JOBSUBJOBID.
- If the job is part of a DAG, $DAGMANJOBID will reference the $JOBSUBJOBID of the parent DAG.
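The relationship between these identifiers can be sketched by splitting a job ID into its parts. This is only an illustration: the helper names `jobsub_cluster`, `jobsub_process`, and `jobsub_schedd` are hypothetical, and the default ID is the example value from the environment variable table.

```shell
#!/bin/sh
# Split a jobsub job ID of the form CLUSTER.PROCESS@schedd using POSIX
# parameter expansion. The helper names are hypothetical.

# Default to the example ID from the variable table when not running under jobsub.
job_id="${JOBSUBJOBID:-3450190.0@fifebatch2.fnal.gov}"

# Everything after the first '@' is the submitting schedd.
jobsub_schedd() { printf '%s\n' "${1#*@}"; }

# Everything before the '@', up to the first '.', is the cluster ID.
jobsub_cluster() { id_part="${1%%@*}"; printf '%s\n' "${id_part%%.*}"; }

# Everything before the '@', after the first '.', is the process ID.
jobsub_process() { id_part="${1%%@*}"; printf '%s\n' "${id_part#*.}"; }

echo "cluster=$(jobsub_cluster "$job_id")" \
     "process=$(jobsub_process "$job_id")" \
     "schedd=$(jobsub_schedd "$job_id")"
```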
Execution
Jobsub will wrap the job in a 'wrapper script' that will do the following:
- handle job input
- an ifdh wrapper script will be installed at $JSB_TMP/ifdh.sh. This is done so jobsub can set up and use ifdh without trashing any ifdh environment that the user's job expects.
- $IFDH_BASE_URI will be defined and exported for your job. This variable must be defined for ifdh and samweb to work properly.
- if --tar_file_name was specified, the tar file will land in $INPUT_TAR_FILE on the worker node via the default HTCondor file transfer mechanism, where it will be untarred.
- any file for which -f was specified will end up in $CONDOR_DIR_INPUT via ifdh
- if --dataset_definition was specified, its value will be exported as $SAM_DATASET
- the following variables will then be automatically defined and exported to the user job. They can be overridden with the jobsub_submit -e option:
- $SAM_PROJECT
- $SAM_STATION
- $SAM_USER (default is user id that ran jobsub_submit)
- $SAM_GROUP (default is input value of --group )
- prior to running user jobs, a DAG will be generated. It will start a SAM project with the following ifdh commands so that the user jobs can pull files from the SAM project:
- ${JSB_TMP}/ifdh.sh describeDefinition $SAM_DATASET
- ${JSB_TMP}/ifdh.sh startProject $SAM_PROJECT $SAM_STATION $SAM_DATASET $SAM_USER $SAM_GROUP
- user's job execution
- wrapper script will change directories to $_CONDOR_SCRATCH_DIR ($_CONDOR_JOB_IWD/no_xfer). This is legacy behavior from gpsn01 which probably should be changed, but the change needs to be made carefully so as not to break existing scripts. This directory can also be referenced by $TEMP, $TMP, $TMPDIR, and $OSG_WN_TMP per past user requests. Redefining OSG variables is now considered bad practice.
- #7768 jobsub_tools should not redefine OSG_WN_TMP
- $JOBSUB_EXE_SCRIPT (the user job) will be executed and its return status saved; this status will be the exit status of the wrapper script
- $GRID_USER will be the username ($USER) on the machine where jobsub_submit was invoked
- $EXPERIMENT will be whatever was specified with -G during jobsub_submit. NB this environment variable is used by ifdh.
- if a SAM project has been started via --dataset_definition, the project URL will be $IFDH_BASE_URI/projects/$SAM_STATION/$SAM_PROJECT
- this URL can also be found with the command ifdh findProject $SAM_PROJECT
- example ifdh dumpProject $IFDH_BASE_URI/projects/$SAM_STATION/$SAM_PROJECT will give you lots of information about your project
- handle job output
- any directories for which -d was specified will be copied out via ifdh
- the size of stdout and stderr will be checked against $JOBSUB_MAX_JOBLOG_SIZE. If the output is too large, it will be truncated using the algorithm described here: jobsub_max_joblog_head_size
- if a SAM project was defined, it will be closed with the ifdh endProject command
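The execution and output steps above can be sketched roughly as follows. This is not the actual jobsub wrapper: the function names are hypothetical, the station and project names in the demo are made up, and the truncation shown (keep a fixed number of leading and trailing bytes) is only an assumption about what the head and tail size settings imply, since the real algorithm is documented elsewhere.

```shell
#!/bin/sh
# Hypothetical sketch of wrapper behavior; not the real jobsub wrapper script.

# sam_project_url: build the project URL from a base URI, station, and project,
# following the pattern $IFDH_BASE_URI/projects/$SAM_STATION/$SAM_PROJECT.
sam_project_url() {
    printf '%s/projects/%s/%s\n' "$1" "$2" "$3"
}

# run_user_job: change to the scratch directory, run the given command, and
# return the command's exit status unchanged.
run_user_job() {
    scratch="$1"; shift
    ( cd "$scratch" && "$@" )
}

# truncate_log: print FILE as-is if it is at most MAX bytes; otherwise print
# the first HEAD bytes, a marker line, and the last TAIL bytes.
# ASSUMPTION: this head/tail scheme is illustrative only.
truncate_log() {
    file="$1"; max="$2"; head_bytes="$3"; tail_bytes="$4"
    size=$(( $(wc -c < "$file") ))   # arithmetic expansion strips padding
    if [ "$size" -le "$max" ]; then
        cat "$file"
    else
        head -c "$head_bytes" "$file"
        printf '\n[...truncated...]\n'
        tail -c "$tail_bytes" "$file"
    fi
}

# Demo with made-up station and project names.
sam_project_url "http://samweb.fnal.gov:8480/sam/nova/api" "nova" "my_project"
```

A real wrapper would call something like `run_user_job "$_CONDOR_SCRATCH_DIR" "$JOBSUB_EXE_SCRIPT"`, save `$?` immediately afterwards, perform the output handling, and finally `exit` with the saved status.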