jobsub_submit

Options Syntax

The syntax of the 'new' client has deliberately been kept close to that of the 'old' jobsub client, but some changes were necessary. Recall that the 'old' client had a syntax like this:

jobsub (jobsub options) user_script (user_script_args)

The syntax for submitting a job with the 'new' client is:

jobsub_submit (jobsub client options) (jobsub server options) file://<path_to_user_script> (user_script_args)

The user script is supplied as a URI and must be preceded by file://
The path to the user script may be either a relative or a full path
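
For example, a complete submission command might look like this (an illustrative invocation; the script name and argument are placeholders):

$ jobsub_submit -G nova --resource-provides=usage_model=DEDICATED file://$HOME/my_job.sh arg1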

Client Options

To see the jobsub client options, type jobsub_submit -h:


$ jobsub_submit -h

Usage: jobsub_submit [Client Options] [Server Options] file://user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and
                        accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --json-config=<JSON submit config file>
                        a JSON dict file of jobsub_submit options and values
    --tarball-exclusion-file=<tarball exclusion regex file>
                        A file of python regex's to exclude from  tarballs.
                        Use with --tar_file_name tardir:// Default exclusion
                        list includes .git and .svn directories, pdf, eps, log
                        files, and human readable images.
    --debug             Print debug messages including server contacted, http
                        response, response time
    --jobid-output-only
                        Return only jobsub jobid in response to a successful
                        submission
    -h, --help          Show this help message and exit

REQUIRED arguments are (--group AND  file://[your_grid_job_here]). Please
direct questions, comments, or problems to the service desk

$ 

json-config file syntax

  • a json-config file is a json dictionary of key:value pairs
    • example { "key1": "val1", "key2":"val2" }
    • keys are double quoted strings
    • values may be:
      • double quoted strings
      • in some instances, lists of double quoted strings, e.g. [ "val3", "val4", "val5" ]
  • keys are one of 3 types:
    • (1) jobsub_submit input flags
      • examples with double-quoted strings as corresponding values
        • "--group":"nova",
        • "--debug":"True",
      • From the online help for jobsub_submit: (jobsub_submit -G nova -h)
      • "You can have as many instances of -c, -d, -e, -f, and -l as you need."
      • for this type of input flag, use a list of double quoted strings, for example:
        • "-l": [ "+foo=\\\"bar\\\"", "+baz=\\\"boing\\\"" ] ,
    • (2) comments
    • (2) comments
      • comments are key:value pairs where the first character of the key is '#'. They do not make their way into the jobsub_submit submission
        • "# this is a comment " : "",
        • "# this is another comment": "",
    • (3) the path to the user submission. The value corresponding to this key type is the argument or arguments for the user job.
      • example 1 with a single arg for user_job.sh
        • "file://path/to/my/user_job.sh": "arg1",
      • example 2 with a list of args for user_job.sh
        • "file://path/to/my/user_job.sh": [ "arg1", "arg2", "arg3", "arg4" ],

An example input file

$ cat jobsub_submit.json
{
  "# example --json-config input file": "",
  "# used as input for jobsub_submit ": "",
  "# every line in this file is a dict key:value pair": "",
  "# dict keys are jobsub_submit flags ":"",
  "# dict values get passed on to jobsub_submit as the input value of that flag":"",
  "# An exception is for dict keys that begin with '#' ":"",
  "# these are ignored by jobsub_submit and can used as comments. ":"",
  "# Some jobsub_submit flags such as --help or --debug do not expect a value parameter":"",
  "# to be passed to them.  To use these in a json file give them a value":"",
  "# of True  for example":"",
  "# ":"",
  "--debug": "True",
  "#":"",
  "# One way to turn off such flags ": "",
  "# is do the following": "",
  "--version": "False",
  "--help": "False",
  "--jobid-output-only": "False",
  "# One can simply omit or comment out these flags as well": "",
  "#                                                  ": "",    
  "#                                                  ": "",    
  "# all other flags expect values which will be passed on ": "",
  "# to the jobsub server via jobsub_submit": "",
  "# values can be strings or lists":"",
  "# if a flag is used only once in jobsub_submit its value ":"",
  "# will be a string":"",
  "# examples":"",
  "--group": "nova",
  "--jobsub-server": "fermicloud042.fnal.gov",
  "-N": "10",
  "#":"         ",
  "#":"         ",
  "#":"         ",
  "#":"         ",
  "#":"flags such as  -c, -e, --environment, -l, and --lines ",
  "#":"can be repeated as many times as desired. To do this use lists  ",
  "#":"",
  "#":"an example of jobsub_submit --environment FOO=BAR --environment BAZ=BOING",
  "--environment": [ "FOO=BAR", "BAZ=BOING" ],
  "#":"",
  "# an example of":"jobsub_submit -c False=!=True -c True=!=False -c False=?=False ",
  "-c": [ "False=!=True",
          "True=!=False",
          "False=?=False" ],
  "# more examples":"",
  "-l": [ "+foo=\\\"bar\\\"", "+baz=\\\"boing\\\"" ] ,
  "--lines": [ "+foop=\\\"barp\\\"", "+bazp=\\\"boingp\\\"" ] ,
  "# lastly the job executable can be passed in this way": "",
  "# if it was not specified at the command line": "",
  "file:///home/dbox/nova_sleep.sh": "30",
  "#file:///home/dbox/nova_sleep.sh": [ "param1_for_nova_sleep.sh", 
                                       "param2",
                                       "param3", "etc" ],
  "# " : "" 
}
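
Once written, the file can be passed to jobsub_submit directly (an illustrative invocation using the example file above):

$ jobsub_submit --json-config jobsub_submit.json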

tarball-exclusion-file syntax

  • The tarball exclusion file is a newline delimited file of python regular expressions or comments.
  • A comment is any line that begins with a '#' in column 1
  • Any other line is a python regular expression

Here is an example exclusion file, showing the files that jobsub_client excludes by default:

# example of a jobsub tarball exclusion rule file
# the following rules are also the default if
# --tarball-exclusion-file is not specified
#
# exclude any file in a .git/  or .svn/ directory
\.git/
\.svn/
#exclude .core files
\.core$
# exclude emacs backups
\~.*$
# exclude pdfs and eps files
\.pdf$
\.eps$
# NO PICTURES OF CATS
\.png$
\.PNG$
\.gif$
\.GIF$
\.jpg$
\.jpeg$
\.JPG$
\.JPEG$
# no .log .out or .err files
\.log$
\.err$
\.out$
# no tarfiles or zipfiles
\.tar$
\.tgz$
\.zip$
\.gz$
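
For example, to apply a custom exclusion file while shipping a directory as a tarball (an illustrative invocation; the exclusion file, directory, and script paths are placeholders):

$ jobsub_submit -G nova --tarball-exclusion-file=$HOME/my_exclusions.txt --tar_file_name tardir://$HOME/my_analysis_dir file://$HOME/my_job.sh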

Server Options

To see the jobsub server options (aka the 'old' jobsub options), add the group information as well. Available options for the 'old' jobsub
varied by experiment. Here is an example for the nova experiment:

$ jobsub_submit -G nova --help
Usage: jobsub_submit [Client Options] [Server Options] file://user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and
                        accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --tarball-exclusion-file=<tarball exclusion regex file>
                        A file of python regex's to exclude from  tarballs.
                        Use with --tar_file_name tardir:// Default exclusion
                        list includes .git and .svn directories, pdf, eps, log
                        files, and human readable images.
    --debug             Print debug messages including server contacted, http
                        response, response time
    --jobid-output-only
                        Return only jobsub jobid in response to a successful
                        submission
    -h, --help          Show this help message and exit

REQUIRED arguments are (--group AND  file://[your_grid_job_here]). Please
direct questions, comments, or problems to the service desk

  Server Options:
  --version             show program's version number and exit
    -h, --help            show this help message and exit

    Generic Options:
      --timeout=NUMBER[UNITS]
                          kill user job if still running after NUMBER[UNITS] of
                          time. UNITS may be `s' for seconds (the default), `m'
                          for minutes, `h' for hours or `d' for days.
      --expected-lifetime='short'|'medium'|'long'|NUMBER[UNITS]
                          Expected lifetime of the job.  Used to match against
                          resources advertising that they have
                          REMAINING_LIFETIME seconds left.  The shorter your
                          EXPECTED_LIFETIME is, the more resources (aka slots,
                          cpus) your job can potentially match against and the
                          quicker it should start.  If your job runs longer than
                          EXPECTED_LIFETIME it *may* be killed by the batch
                          system.  If your specified  EXPECTED_LIFETIME is too
                          long, your job may take a long time to match against a
                          resource with a sufficiently long REMAINING_LIFETIME.
                          Valid inputs for this parameter are 'short', 'medium',
                          'long', or NUMBER[UNITS] of time.  IF [UNITS] is
                          omitted, value is NUMBER  seconds. Allowed values for
                          UNITS are 's', 'm', 'h', 'd' representing seconds,
                          minutes, etc. The values for 'short','medium',and
                          'long' are configurable by Grid Operations, they
                          currently are '3h' , '8h' , and '85200s' but this may
                          change in the future. Default value of
                          EXPECTED_LIFETIME is currently '8h' .
      --maxConcurrent=MAXCONCURRENT
                            max number of jobs running concurrently at given
                          time. Use in  conjunction with -N option to protect a
                          shared resource.  Example: jobsub -N 1000
                          -maxConcurrent 20 will  only run 20 jobs at a time
                          until all 1000 have completed.  This is implemented by
                          running the jobs in a DAG. Normally when  jobs are run
                          with the -N option, they all have the same $CLUSTER
                          number and differing, sequential $PROCESS numbers, and
                          many submission  scripts take advantage of this.  When
                          jobs are run with this  option in a DAG each job has a
                          different $CLUSTER number and a  $PROCESS number of 0,
                          which may break scripts that rely on the  normal -N
                          numbering scheme for $CLUSTER and $PROCESS. Groups of
                          jobs run with this option will have the same
                          $JOBSUBPARENTJOBID,  each individual job will have a
                          unique and sequential  $JOBSUBJOBSECTION.  Scripts may
                          need modification to take this into account
      --disk=NUMBER[UNITS]
                          Request worker nodes have at least NUMBER[UNITS] of
                          disk space.    If UNITS is not specified default is
                          'KB' (a typo in earlier versions  said that default
                          was 'MB', this was wrong).  Allowed values for  UNITS
                          are 'KB','MB','GB', and 'TB'
      --memory=NUMBER[UNITS]
                            Request worker nodes have at least NUMBER[UNITS]  of
                          memory.  If UNITS is not specified default is 'MB'.
                          Allowed values for  UNITS are 'KB','MB','GB', and 'TB'
      --cpu=NUMBER          request worker nodes have at least NUMBER cpus
      --drain               mark this job to be allowed to be drained or killed
                          during  downtimes
      --schedd=SCHEDD     name of alternate schedd to submit to
      --OS=OS               specify OS version of worker node. Example  --OS=SL5
                          Comma separated list  '--OS=SL4,SL5,SL6' works as well
                          . Default is any available OS
      --show-parsing        print out how command line was parsed into argv
                          list and exit.  Useful for seeing how quotes in
                          options are parsed
      --generate-email-summary
                            generate and mail a summary report of
                          completed/failed/removed  jobs in a DAG
      --email-to=NOTIFY_USER
                            email address to send job reports/summaries to
                          (default is $USER@fnal.gov)
      -G ACCOUNTINGGROUP, --group=ACCOUNTINGGROUP
                            Group/Experiment for priorities and accounting
      --subgroup=SUBGROUP
                            Subgroup for priorities and accounting. See
                          https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/
                          Jobsub_submit#Groups-Subgroups-Quotas-Priorities  for
                          more documentation on using --subgroup to set job
                          quotas and priorities
      -v, --verbose         dump internal state of program (useful for
                          debugging)
      --resource-provides=RESOURCE_PROVIDES
                          request specific  resources by changing condor jdf
                          file.  For example: --resource-provides=CVMFS=OSG
                          will add +CVMFS="OSG" to the job classad  attributes
                          and '&&(CVMFS=="OSG")' to the  job requirements
      -Q, --mail_never    never send mail about job results
      -q, --mail_on_error
                          send mail only when job fails due to error (default)
      -M, --mail_always   always  mail when job completes or fails
      -g, --grid          run job on the  FNAL GP  grid. Other flags can modify
                          target sites to include other  areas of the Open
                          Science Grid
      --nowrapfile        DISABLED:  formerly was 'do not generate shell
                          wrapper', disabled per request  from fermigrid
                          operations.  The wrapfiles used to not  work off
                          site, now they do.
      -c APPEND_REQUIREMENTS, --append_condor_requirements=APPEND_REQUIREMENTS
                          append condor requirements
      --overwrite_condor_requirements=OVERWRITEREQUIREMENTS
                          overwrite default  condor requirements with supplied
                          requirements
      --override=OVERRIDE
                          override some  other value: --override 'requirements'
                          'gack==TRUE' would produce  the same condor command
                          file as --overwrite_condor_requirements  'gack==TRUE'
                          if you want to use this option, test it first with -n
                          to see what you get as output
      -C                  execute on  grid from directory you are currently in
                          NOTE: WILL STOP WORKING SOON WHEN BLUEARC  UNMOUNTED
                          FROM WORKER NODES
      -e ENV_VAR, --environment=ENV_VAR
                          -e  ADDED_ENVIRONMENT exports this variable with its
                          local  value to worker node environment. For example
                          export FOO="BAR";  jobsub -e FOO <more stuff>
                          guarantees that the value of $FOO on  the worker node
                          is "BAR" .  Alternate format which does not require
                          setting the env var first is the -e VAR=VAL, idiom
                          which  sets the value of $VAR to 'VAL' in the worker
                          environment. The  -e  option can be used as many times
                          in one jobsub_submit  invocation as desired
      --site=COMMA,SEP,LIST,OF,SITES
                          submit jobs to these sites
      --blacklist=COMMA,SEP,LIST,OF,SITES
                          ensure that jobs do not land at these sites
      -n, --no_submit     generate  condor_command file but do not submit
      -N NUM              submit N copies  of this job. Each job will  have
                          access to the environment variable  $PROCESS that
                          provides the job number (0 to  NUM-1), equivalent to
                          the number following the decimal point in  the job ID
                          (the '2' in 134567.2).

    File Options:
      -l "line", --lines="line" 
                          [Expert option]  Add  "line" to the Condor  submission
                          (.cmd) file, typically as a classad attribute.   See
                          the HTCondor documentation  for more.
      -L JOBLOGFILE, --log_file=JOBLOGFILE
                          Log file to hold log output from job.
      --compress_log      Compress --log_file output.
      --no_log_buffer     write log file  directly to disk. DOES NOT WORK WITH
                          PNFS, WILL NOT WORK WITH  BLUEARC WHEN IT IS UNMOUNTED
                          FROM WORKER NODES VERY SOON.  Default is to copy it
                          back after job is completed.  This option is useful
                          for debugging but can be VERY DANGEROUS as  joblogfile
                          typically is sent to bluearc.  Using this option
                          incorrectly can cause all grid submission systems at
                          FNAL to  become overwhelmed resulting in angry admins
                          hunting you down, so  USE SPARINGLY.
      --use_gftp          use grid-ftp to transfer file back
      --tar_file_name=
                                dropbox://PATH/TO/TAR_FILE
                                tardir://PATH/TO/DIRECTORY

                          specify TAR_FILE  or DIRECTORY to be  transferred to
                          worker node. TAR_FILE will be copied   to an area
                          specified in the jobsub server configuration,
                          transferred to the job and unpacked there.  TAR_FILE
                          will   be accessible to the user job on the worker
                          node via the   environment variable $INPUT_TAR_FILE.
                          The unpacked   contents will be in the same directory
                          as   $INPUT_TAR_FILE.
      -f INPUT_FILE       at runtime, INPUT_FILE will be copied to directory
                          $CONDOR_DIR_INPUT on the execution node.  Example :-f
                          /grid/data/minerva/my/input/file.xxx  will be copied
                          to $CONDOR_DIR_INPUT/file.xxx  Specify as many -f
                          INPUT_FILE_1 -f INPUT_FILE_2  args as you need.  To
                          copy file at submission time  instead of run time, use
                          -f dropbox://INPUT_FILE to  copy the file.
      -d OUTPUT_DIR_ARRAY
                            -d<tag> <dir>  Writable directory $CONDOR_DIR_<tag>
                          will  exist on the execution node.  After job
                          completion,  its contents will be moved to <dir>
                          automatically  Specify as many <tag>/<dir> pairs as
                          you need.

    SAM Options:
      --dataset_definition=DATASET_DEFINITION
                            SAM dataset definition used in a Directed Acyclic
                          Graph (DAG)
      --project_name=PROJECT_NAME
                          optional project name for SAM DAG

    Nova Specific Options:
      -i RELDIR           release_directory for Nova Software
      -t TESTRELDIR       release_directory for test Nova Software
      -r REL              release_version for  Nova Software

          NOTES
          You can have as many instances of -c, -d, -e, -f, -l and -y as you need.

          The -d directory mapping works on non-Grid nodes, too.

          export IFDH_VERSION=some_version then use -e IFDH_VERSION to use
          the (some_version) release of ifdh for copying files in and out
          with -d and -f flags instead of the current ifdh version in KITS

          More documentation is available at
          https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client
          address questions or problems to the service desk

Almost all of the original jobsub arguments are supported; a few make no sense in the new architecture and are 'no ops', but they are retained to ease porting of user submission scripts.

Absolutely Needed Options: --group and --resource-provides

Accounting group option --group

This is needed for authorization in the Fermilab VOMS.

The URL https://fifebatch.fnal.gov:8443/jobsub/acctgroups/ shows which groups are currently supported.
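
One way to check the current list from the command line (an illustrative command; the server may require a valid grid certificate, and -k, which skips certificate verification, is shown for convenience only):

$ curl -k https://fifebatch.fnal.gov:8443/jobsub/acctgroups/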

URL output on 5/31/18:


        admx
        annie
        argoneut
        captmnv
        cdf
        cdms
        chips
        coupp
        darkside
        des
        dune
        dzero
        fermilab
        genie
        gm2
        icarus
        lar1
        lar1nd
        lariat
        lsst
        marsaccel
        marsgm2
        marslbne
        marsmu2e
        minerva
        miniboone
        minos
        mu2e
        next
        numix
        noble
        nova
        patriot
        sbnd
        seaquest
        test
        uboone

Steering jobs with --resource-provides

  • This information is specific to SERVER https://fifebatch.fnal.gov:8443 as of June 2014
    • all slots on fifebatch are grid slots; there are no local batch slots. The -g flag is on by default
    • steering to quota slots, opportunistic slots, offsite, fermigrid and amazon is supported using --resource-provides.
    • The --opportunistic flag has no effect on fifebatch1
DESIRED COMPUTING RESOURCE              COMMAND LINE OPTION
Amazon                                  --resource-provides=usage_model=PAID_CLOUD
Fermigrid (Dedicated/Slots with Quota)  --resource-provides=usage_model=DEDICATED
Fermigrid (Opportunistic Slots)         --resource-provides=usage_model=OPPORTUNISTIC
Fermicloud                              --resource-provides=usage_model=FERMICLOUD
OSG                                     --resource-provides=usage_model=OFFSITE

Example: Submitting a job to quota'd worker nodes on Fermigrid

$ jobsub_submit -G nova --resource-provides=usage_model=DEDICATED file://$HOME/nova.sh 1
Server response code: 200
Response OUTPUT:
/fife/local/scratch/uploads/nova/dbox/2014-06-09_144727.804063_7006

/fife/local/scratch/uploads/nova/dbox/2014-06-09_144727.804063_7006/nova.sh_20140609_144728_12037_0_1.cmd

submitting....

Submitting job(s).

1 job(s) submitted to cluster 269.

JobsubJobId of first job: 269.0@fifebatch2.fnal.gov

Use job id 269.0@fifebatch2.fnal.gov to retrieve output

Remote Submission Processing Time: 0.379333019257 sec

Advanced Use Cases

The following table lists command line options that can be used to specify additional job constraints:

USE CASE                                                                OPTION
Add custom integer attribute to job's classad (e.g. my_int_var = 9)     --lines=+my_int_var=9
Add custom string attribute to job's classad (e.g. my_str_var = "foo")  --lines=+my_str_var="foo"
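
Both options can appear on an ordinary submission command line (an illustrative invocation; the quoting of the string attribute may need adjusting for your shell):

$ jobsub_submit -G nova --lines='+my_int_var=9' --lines='+my_str_var="foo"' file://$HOME/my_job.sh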

Submitting a Tar Ball

For a more detailed discussion, please see Tardir_and_dropbox_URIs

NOTE: there are many methods of sending input files to your job, such as using the -f input_file option, using wget to pull your files from a squid cache, or using condor's transfer_input_files mechanism to upload a file to the worker node from the jobsub server, which is described below. If the server is not configured correctly for the transfer_input_files method, you may overload it with network traffic, causing submission problems for other users. Consult with the server operations staff prior to using this option heavily.

The jobsub_client input option --tar_file_name dropbox://(my_tar_file_name) will upload (my_tar_file_name) to the dropbox configured on the jobsub server (usually an experiment's dCache scratch or resilient area), and then transfer the tarball to the resulting condor job. When the job starts on the worker node, the environment variable $INPUT_TAR_FILE will point to (my_tar_file_name). The tarball will be automatically unpacked in the same directory that the user's job starts in, and its contents can be accessed or run through the relative directory paths that were copied into the tarball.
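
A minimal sketch of such a submission (the tarball and script names are placeholders; inside the job, the unpacked contents are reachable relative to the directory containing $INPUT_TAR_FILE):

$ tar cf my_analysis.tar -C $HOME/my_analysis_dir .
$ jobsub_submit -G nova --resource-provides=usage_model=DEDICATED --tar_file_name dropbox://$PWD/my_analysis.tar file://$HOME/run_analysis.sh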

User Job Execution Flow and Jobsub Environment Variables

jobsub_environment_variables

Groups, Subgroups, Quotas, Priorities

  • When a jobsub job is submitted, a condor classad attribute +AccountingGroup = "group_(experiment).(username)" is generated; e.g., if user dbox in group nova submits a job, +AccountingGroup = "group_nova.dbox" is used for assigning this job to group nova's quotas and priorities.
  • The --subgroup flag is used to modify this attribute, which affects job priority; see the sketch after this list.
    • In the above example, if user dbox in group nova added --subgroup=high_prio to the submission, the resulting classad attribute would be +AccountingGroup="group_nova.high_prio.dbox" .
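
An illustrative submission using --subgroup (the subgroup name must be one configured for your experiment; high_prio is just the name from the example above):

$ jobsub_submit -G nova --subgroup=high_prio --resource-provides=usage_model=DEDICATED file://$HOME/my_job.sh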