jobsub_submit
Options Syntax
The syntax of the 'new' client has deliberately been kept close to that of the 'old' jobsub client, but some changes were necessary. Recall that the 'old' client had a syntax like this:
jobsub (jobsub options) user_script (user_script_args)
The syntax for sending a job with the 'new' client is
jobsub_submit (jobsub client options) (jobsub server options) file://<path_to_user_script> (user_script_args)
The user script is supplied in the form of a URI and must be preceded by file://
The path to the user script may be either a relative path or a full path
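For illustration, here is a minimal Python sketch of how a file:// URI with either a relative or a full path could be resolved to an absolute script path. This is not the client's actual code; the helper name is hypothetical.

```python
import os

def resolve_user_script(uri):
    """Hypothetical helper: strip the file:// scheme and absolutize the path."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError("user script must be given as a file:// URI")
    path = uri[len(prefix):]
    # A relative path is resolved against the current working directory.
    return os.path.abspath(path)
```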
Client Options
To see the jobsub_client_options, type jobsub_submit -h:
$ jobsub_submit -h
Usage: jobsub_submit [Client Options] [Server Options] file://user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and
                        accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --json-config=<JSON submit config file>
                        a JSON dict file of jobsub_submit options and values
    --tarball-exclusion-file=<tarball exclusion regex file>
                        A file of python regex's to exclude from tarballs.
                        Use with --tar_file_name tardir://
                        Default exclusion list includes .git and .svn
                        directories, pdf, eps, log files, and human readable
                        images.
    --debug             Print debug messages including server contacted,
                        http response, response time
    --jobid-output-only Return only jobsub jobid in response to a successful
                        submission
    -h, --help          Show this help message and exit

REQUIRED arguments are (--group AND file://[your_grid_job_here]).
Please direct questions, comments, or problems to the service desk
$
json-config file syntax
- a json-config file is a json dictionary of key:value pairs
- example { "key1": "val1", "key2":"val2" }
- keys are double quoted strings
- values may be:
- double quoted strings
- in some instances lists of double quoted strings i.e. [ "val3", "val4", "val5" ]
- keys are one of 3 types:
- (1) jobsub_submit input flags
- examples with double-quoted strings as corresponding values
- "--group":"nova",
- "--debug":"True",
- From the online help for jobsub_submit: (jobsub_submit -G nova -h)
- "You can have as many instances of -c, -d, -e, -f, and -l as you need."
- for this type of input flag use a list of double quoted strings example:
- "-l": [ "+foo=\\\"bar\\\"", "+baz=\\\"boing\\\"" ] ,
- (2) comments
- comments are key:value pairs where the first character of the key is '#' . They do not make their way into the jobsub_submit submission
- "# this is a comment " : "",
- "# this is another comment" : "",
- (3) the path to the user submission. The value corresponding to this key type is the argument or arguments for the user job.
- example 1 with single arg for user_job.sh
- "file://path/to/my/user_job.sh": "arg1",
- example2 with list of args for user_job.sh
- "file://path/to/my/user_job.sh": [ "arg1", "arg2", "arg3", "arg4" ],
An example input file
$ cat jobsub_submit.json
{
  "# example --json-config input file": "",
  "# used as input for jobsub_submit ": "",
  "# every line in this file is a dict key:value pair": "",
  "# dict keys are jobsub_submit flags ": "",
  "# dict values get passed on to jobsub_submit as the input value of that flag": "",
  "# An exception is for dict keys that begin with '#' ": "",
  "# these are ignored by jobsub_submit and can used as comments. ": "",
  "# Some jobsub_submit flags such as --help or --debug do not expect a value parameter": "",
  "# to be passed to them. To use these in a json file give them a value": "",
  "# of True for example": "",
  "# ": "",
  "--debug": "True",
  "#": "",
  "# One way to turn off such flags ": "",
  "# is do the following": "",
  "--version": "False",
  "--help": "False",
  "--jobid-output-only": "False",
  "# One can simply omit or comment out these flags as well": "",
  "# ": "",
  "# all other flags expect values which will be passed on ": "",
  "# to the jobsub server via jobsub_submit": "",
  "# values can be strings or lists": "",
  "# if a flag is used only once in jobsub_submit its value ": "",
  "# will be a string": "",
  "# examples": "",
  "--group": "nova",
  "--jobsub-server": "fermicloud042.fnal.gov",
  "-N": "10",
  "#": " ",
  "#": "flags such as -c, -e, --environment, -l, and --lines ",
  "#": "can be repeated as many times as desired. To do this use lists ",
  "#": "",
  "#": "an example of jobsub_submit --environment FOO=BAR --environment BAZ=BOING",
  "--environment": [ "FOO=BAR", "BAZ=BOING" ],
  "#": "",
  "# an example of": "jobsub_submit -c False=!=True -c True=!=False -c False=?=False ",
  "-c": [ "False=!=True", "True=!=False", "False=?=False" ],
  "# more examples": "",
  "-l": [ "+foo=\\\"bar\\\"", "+baz=\\\"boing\\\"" ],
  "--lines": [ "+foop=\\\"barp\\\"", "+bazp=\\\"boingp\\\"" ],
  "# lastly the job executable can be passed in this way": "",
  "# if it was not specified at the command line": "",
  "file:///home/dbox/nova_sleep.sh": "30",
  "#file:///home/dbox/nova_sleep.sh": [ "param1_for_nova_sleep.sh", "param2", "param3", "etc" ],
  "# ": ""
}
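As an illustration of how such a file maps onto a command line, here is a Python sketch of the expansion rules described above. This is not the client's actual code and the function name is invented. It also shows that the triple-backslash escaping used with -l decodes to a literal backslash-quote pair in the flag value.

```python
import json

def expand_json_config(text):
    """Sketch: expand a json-config dict into a jobsub_submit argument list.
    Keys starting with '#' are comments and dropped; a value of "False"
    omits the flag; "True" emits the flag bare; a list repeats the flag."""
    args, script_args = [], []
    for key, value in json.loads(text).items():
        if key.startswith("#"):
            continue                          # comment, ignored
        values = value if isinstance(value, list) else [value]
        if key.startswith("file://"):
            script_args = [key] + values      # user script and its args go last
        elif values == ["True"]:
            args.append(key)                  # flag that takes no value parameter
        elif values == ["False"]:
            continue                          # flag explicitly turned off
        else:
            for v in values:
                args.extend([key, v])         # repeatable flags come from lists
    return args + script_args

config_text = json.dumps({
    "# a comment, ignored": "",
    "--group": "nova",
    "--debug": "True",
    "--help": "False",
    "--environment": ["FOO=BAR", "BAZ=BOING"],
    "file://job.sh": "30",
})

# The \\\" escaping from the example file decodes to a literal \" :
decoded = json.loads(r'{"-l": ["+foo=\\\"bar\\\""]}')["-l"][0]
```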
tarball-exclusion-file syntax
- The tarball exclusion file is a newline delimited file of python regular expressions or comments.
- A comment is any line that begins with a '#' in column 1
- Any other line is a python regular expression
Here is an example exclusion file, showing the files that jobsub_client excludes by default:
# example of a jobsub tarball exclusion rule file
# the following rules are also the default if
# --tarball-exclusion-file is not specified
#
# exclude any file in a .git/ or .svn/ directory
\.git/
\.svn/
# exclude .core files
\.core$
# exclude emacs backups
\~.*$
# exclude pdfs and eps files
\.pdf$
\.eps$
# NO PICTURES OF CATS
\.png$
\.PNG$
\.gif$
\.GIF$
\.jpg$
\.jpeg$
\.JPG$
\.JPEG$
# no .log .out or .err files
\.log$
\.err$
\.out$
# no tarfiles or zipfiles
\.tar$
\.tgz$
\.zip$
\.gz$
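The effect of these rules can be sketched in Python. Note this is illustrative only; the client's real matching code may differ, for example in whether it uses re.search or re.match.

```python
import re

# A few of the default-style exclusion patterns from the example file above.
EXCLUDES = [r"\.git/", r"\.svn/", r"\.core$", r"\~.*$", r"\.pdf$", r"\.log$"]

def is_excluded(path, patterns=EXCLUDES):
    """Sketch: a path is left out of the tarball if any pattern matches it."""
    return any(re.search(p, path) for p in patterns)
```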
Server Options
To see the jobsub_server options (aka the 'old' jobsub options), add the group information as well. Available options for the 'old' jobsub
varied by experiment. Here is an example for the nova experiment:
$ jobsub_submit -G nova --help
Usage: jobsub_submit [Client Options] [Server Options] file://user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and
                        accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --tarball-exclusion-file=<tarball exclusion regex file>
                        A file of python regex's to exclude from tarballs.
                        Use with --tar_file_name tardir://
                        Default exclusion list includes .git and .svn
                        directories, pdf, eps, log files, and human readable
                        images.
    --debug             Print debug messages including server contacted,
                        http response, response time
    --jobid-output-only Return only jobsub jobid in response to a successful
                        submission
    -h, --help          Show this help message and exit

REQUIRED arguments are (--group AND file://[your_grid_job_here]).
Please direct questions, comments, or problems to the service desk

Server Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

  Generic Options:
    --timeout=NUMBER[UNITS]
                        kill user job if still running after NUMBER[UNITS]
                        of time. UNITS may be `s' for seconds (the default),
                        `m' for minutes, `h' for hours or `d' for days.
    --expected-lifetime='short'|'medium'|'long'|NUMBER[UNITS]
                        Expected lifetime of the job. Used to match against
                        resources advertising that they have
                        REMAINING_LIFETIME seconds left. The shorter your
                        EXPECTED_LIFETIME is, the more resources (aka slots,
                        cpus) your job can potentially match against and the
                        quicker it should start. If your job runs longer
                        than EXPECTED_LIFETIME it *may* be killed by the
                        batch system. If your specified EXPECTED_LIFETIME is
                        too long your job may take a long time to match
                        against a resource with a sufficiently long
                        REMAINING_LIFETIME. Valid inputs for this parameter
                        are 'short', 'medium', 'long', or NUMBER[UNITS] of
                        time. If [UNITS] is omitted, value is NUMBER
                        seconds. Allowed values for UNITS are 's', 'm', 'h',
                        'd' representing seconds, minutes, etc. The values
                        for 'short', 'medium', and 'long' are configurable
                        by Grid Operations; they currently are '3h', '8h',
                        and '85200s' but this may change in the future.
                        Default value of EXPECTED_LIFETIME is currently
                        '8h'.
    --maxConcurrent=MAXCONCURRENT
                        max number of jobs running concurrently at given
                        time. Use in conjunction with -N option to protect a
                        shared resource. Example: jobsub -N 1000
                        -maxConcurrent 20 will only run 20 jobs at a time
                        until all 1000 have completed. This is implemented
                        by running the jobs in a DAG. Normally when jobs are
                        run with the -N option, they all have the same
                        $CLUSTER number and differing, sequential $PROCESS
                        numbers, and many submission scripts take advantage
                        of this. When jobs are run with this option in a DAG
                        each job has a different $CLUSTER number and a
                        $PROCESS number of 0, which may break scripts that
                        rely on the normal -N numbering scheme for $CLUSTER
                        and $PROCESS. Groups of jobs run with this option
                        will have the same $JOBSUBPARENTJOBID; each
                        individual job will have a unique and sequential
                        $JOBSUBJOBSECTION. Scripts may need modification to
                        take this into account.
    --disk=NUMBER[UNITS]
                        Request worker nodes have at least NUMBER[UNITS] of
                        disk space. If UNITS is not specified default is
                        'KB' (a typo in earlier versions said that default
                        was 'MB', this was wrong). Allowed values for UNITS
                        are 'KB','MB','GB', and 'TB'
    --memory=NUMBER[UNITS]
                        Request worker nodes have at least NUMBER[UNITS] of
                        memory. If UNITS is not specified default is 'MB'.
                        Allowed values for UNITS are 'KB','MB','GB', and
                        'TB'
    --cpu=NUMBER        request worker nodes have at least NUMBER cpus
    --drain             mark this job to be allowed to be drained or killed
                        during downtimes
    --schedd=SCHEDD     name of alternate schedd to submit to
    --OS=OS             specify OS version of worker node. Example --OS=SL5.
                        Comma separated list '--OS=SL4,SL5,SL6' works as
                        well. Default is any available OS.
    --show-parsing      print out how command line was parsed into argv list
                        and exit. Useful for seeing how quotes in options
                        are parsed
    --generate-email-summary
                        generate and mail a summary report of
                        completed/failed/removed jobs in a DAG
    --email-to=NOTIFY_USER
                        email address to send job reports/summaries to
                        (default is $USER@fnal.gov)
    -G ACCOUNTINGGROUP, --group=ACCOUNTINGGROUP
                        Group/Experiment for priorities and accounting
    --subgroup=SUBGROUP Subgroup for priorities and accounting. See
                        https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/
                        Jobsub_submit#Groups-Subgroups-Quotas-Priorities for
                        more documentation on using --subgroup to set job
                        quotas and priorities
    -v, --verbose       dump internal state of program (useful for
                        debugging)
    --resource-provides=RESOURCE_PROVIDES
                        request specific resources by changing condor jdf
                        file. For example:
                        --resource-provides=CVMFS=OSG will add +CVMFS="OSG"
                        to the job classad attributes and '&&(CVMFS=="OSG")'
                        to the job requirements
    -Q, --mail_never    never send mail about job results
    -q, --mail_on_error send mail only when job fails due to error (default)
    -M, --mail_always   always mail when job completes or fails
    -g, --grid          run job on the FNAL GP grid. Other flags can modify
                        target sites to include other areas of the Open
                        Science Grid
    --nowrapfile        DISABLED: formerly was 'do not generate shell
                        wrapper'; disabled per request from fermigrid
                        operations. The wrapfiles used to not work off site,
                        now they do.
    -c APPEND_REQUIREMENTS, --append_condor_requirements=APPEND_REQUIREMENTS
                        append condor requirements
    --overwrite_condor_requirements=OVERWRITEREQUIREMENTS
                        overwrite default condor requirements with supplied
                        requirements
    --override=OVERRIDE override some other value: --override 'requirements'
                        'gack==TRUE' would produce the same condor command
                        file as --overwrite_condor_requirements
                        'gack==TRUE'. If you want to use this option, test
                        it first with -n to see what you get as output.
    -C                  execute on grid from directory you are currently in.
                        NOTE: WILL STOP WORKING SOON WHEN BLUEARC UNMOUNTED
                        FROM WORKER NODES
    -e ENV_VAR, --environment=ENV_VAR
                        -e ADDED_ENVIRONMENT exports this variable with its
                        local value to worker node environment. For example
                        export FOO="BAR"; jobsub -e FOO <more stuff>
                        guarantees that the value of $FOO on the worker node
                        is "BAR". Alternate format which does not require
                        setting the env var first is the -e VAR=VAL idiom,
                        which sets the value of $VAR to 'VAL' in the worker
                        environment. The -e option can be used as many times
                        in one jobsub_submit invocation as desired.
    --site=COMMA,SEP,LIST,OF,SITES
                        submit jobs to these sites
    --blacklist=COMMA,SEP,LIST,OF,SITES
                        ensure that jobs do not land at these sites
    -n, --no_submit     generate condor_command file but do not submit
    -N NUM              submit N copies of this job. Each job will have
                        access to the environment variable $PROCESS that
                        provides the job number (0 to NUM-1), equivalent to
                        the number following the decimal point in the job ID
                        (the '2' in 134567.2).

  File Options:
    -l "line", --lines="line"
                        [Expert option] Add "line" to the Condor submission
                        (.cmd) file, typically as a classad attribute. See
                        the HTCondor documentation for more.
    -L JOBLOGFILE, --log_file=JOBLOGFILE
                        Log file to hold log output from job.
    --compress_log      Compress --log_file output.
    --no_log_buffer     write log file directly to disk. DOES NOT WORK WITH
                        PNFS, WILL NOT WORK WITH BLUEARC WHEN IT IS
                        UNMOUNTED FROM WORKER NODES VERY SOON. Default is to
                        copy it back after job is completed. This option is
                        useful for debugging but can be VERY DANGEROUS as
                        joblogfile typically is sent to bluearc. Using this
                        option incorrectly can cause all grid submission
                        systems at FNAL to become overwhelmed resulting in
                        angry admins hunting you down, so USE SPARINGLY.
    --use_gftp          use grid-ftp to transfer file back
    --tar_file_name=dropbox://PATH/TO/TAR_FILE
                        or tardir://PATH/TO/DIRECTORY
                        specify TAR_FILE or DIRECTORY to be transferred to
                        worker node. TAR_FILE will be copied to an area
                        specified in the jobsub server configuration,
                        transferred to the job and unpacked there. TAR_FILE
                        will be accessible to the user job on the worker
                        node via the environment variable $INPUT_TAR_FILE.
                        The unpacked contents will be in the same directory
                        as $INPUT_TAR_FILE.
    -f INPUT_FILE       at runtime, INPUT_FILE will be copied to directory
                        $CONDOR_DIR_INPUT on the execution node. Example:
                        -f /grid/data/minerva/my/input/file.xxx will be
                        copied to $CONDOR_DIR_INPUT/file.xxx. Specify as
                        many -f INPUT_FILE_1 -f INPUT_FILE_2 args as you
                        need. To copy file at submission time instead of run
                        time, use -f dropbox://INPUT_FILE to copy the file.
    -d OUTPUT_DIR_ARRAY -d <tag> <dir>
                        Writable directory $CONDOR_DIR_<tag> will exist on
                        the execution node. After job completion, its
                        contents will be moved to <dir> automatically.
                        Specify as many <tag>/<dir> pairs as you need.

  SAM Options:
    --dataset_definition=DATASET_DEFINITION
                        SAM dataset definition used in a Directed Acyclic
                        Graph (DAG)
    --project_name=PROJECT_NAME
                        optional project name for SAM DAG

  Nova Specific Options:
    -i RELDIR           release_directory for Nova Software
    -t TESTRELDIR       release_directory for test Nova Software
    -r REL              release_version for Nova Software

NOTES
You can have as many instances of -c, -d, -e, -f, -l and -y as you need.
The -d directory mapping works on non-Grid nodes, too.
export IFDH_VERSION=some_version then use -e IFDH_VERSION to use the
(some_version) release of ifdh for copying files in and out with -d and -f
flags instead of the current ifdh version in KITS
More documentation is available at
https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client
address questions or problems to the service desk
Almost all of the original jobsub args are supported. Some make no sense in the new architecture and are 'no ops', but they are retained to aid in porting user submission scripts.
Absolutely Needed Options: --group and --resource-provides
Accounting group option --group
This is needed for authorization in the Fermilab VOMS.
The URL https://fifebatch.fnal.gov:8443/jobsub/acctgroups/ shows which groups are currently supported
URL output on 5/31/18:
admx annie argoneut captmnv cdf cdms chips coupp darkside des dune dzero fermilab genie gm2 icarus lar1 lar1nd lariat lsst marsaccel marsgm2 marslbne marsmu2e minerva miniboone minos mu2e next numix noble nova patriot sbnd seaquest test uboone
Steering jobs with --resource-provides
- This information is specific to SERVER https://fifebatch.fnal.gov:8443 as of June 2014
- all slots on fifebatch are grid slots; there are no local batch slots. The -g flag is on by default
- steering to quota slots, opportunistic slots, offsite, fermigrid and amazon is supported using --resource-provides.
- The --opportunistic flag has no effect on fifebatch1
| DESIRED COMPUTING RESOURCE | COMMAND LINE OPTION |
| --- | --- |
| Amazon | --resource-provides=usage_model=PAID_CLOUD |
| Fermigrid (Dedicated/Slots with Quota) | --resource-provides=usage_model=DEDICATED |
| Fermigrid (Opportunistic Slots) | --resource-provides=usage_model=OPPORTUNISTIC |
| Fermicloud | --resource-provides=usage_model=FERMICLOUD |
| OSG | --resource-provides=usage_model=OFFSITE |
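The server help describes the generic behavior of --resource-provides: a KEY=VALUE argument becomes a +KEY classad attribute plus an extra requirements clause. A Python sketch of that mapping (the real server-side handling may differ):

```python
def resource_provides_classad(option_value):
    """Sketch: turn a --resource-provides KEY=VALUE argument into the
    classad attribute and requirements clause described in the server help.
    Illustrative only, not the server's actual code."""
    key, value = option_value.split("=", 1)
    attribute = '+%s="%s"' % (key, value)
    requirement = '&&(%s=="%s")' % (key, value)
    return attribute, requirement
```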
Example: Submitting a job to quota'd worker nodes on Fermigrid
$ jobsub_submit -G nova --resource-provides=usage_model=DEDICATED file://$HOME/nova.sh 1
Server response code: 200
Response OUTPUT:
/fife/local/scratch/uploads/nova/dbox/2014-06-09_144727.804063_7006
/fife/local/scratch/uploads/nova/dbox/2014-06-09_144727.804063_7006/nova.sh_20140609_144728_12037_0_1.cmd
submitting....
Submitting job(s).
1 job(s) submitted to cluster 269.
JobsubJobId of first job: 269.0@fifebatch2.fnal.gov
Use job id 269.0@fifebatch2.fnal.gov to retrieve output
Remote Submission Processing Time: 0.379333019257 sec
Advanced Use Cases
The following table lists command line options that can be used to specify additional job constraints:
| USE CASE | OPTION |
| --- | --- |
| Add custom integer attribute in job's classad (e.g. my_int_var = 9) | --lines=+my_int_var=9 |
| Add custom string attribute in job's classad (e.g. my_str_var = "foo") | --lines=+my_str_var="foo" |
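To make the integer/string distinction in the table concrete, here is a small Python sketch that splits a '+name=value' --lines option into a classad (name, value) pair, treating quoted values as strings and bare numerics as integers. This mirrors the examples above and is illustrative only, not jobsub or HTCondor code.

```python
def classad_value(line):
    """Sketch: parse a '+name=value' --lines option into (name, value).
    A double-quoted value is a classad string; a bare numeric is an integer."""
    name, raw = line.lstrip("+").split("=", 1)
    if raw.startswith('"') and raw.endswith('"'):
        return name, raw.strip('"')
    return name, int(raw)
```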
Submitting a Tar Ball
For a more detailed discussion, please see Tardir_and_dropbox_URIs
NOTE: there are many methods of sending input files to your job, such as using the -f input_file option, using wget to pull your files from a squid cache, or using condor's transfer_input_files mechanism to upload a file to the worker node from the jobsub_server, which is described below. If the server is not configured correctly for the transfer_input_files method, you may overload it with network traffic, causing submission problems for other users. Consult with the server operations staff prior to using this option heavily.
The jobsub_client input option --tar_file_name dropbox://(my_tar_file_name) will upload (my_tar_file_name) to the dropbox configured on the jobsub server (usually an experiment's dCache scratch or resilient area), and then transfer the tarball to the resulting condor job. When the job starts on the worker node, the environment variable $INPUT_TAR_FILE will point to (my_tar_file_name). The tarball will be automatically unwound in the same directory that the user's job starts in, and its contents can be accessed or run through the relative directory paths that were copied into the tarball.
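From inside the user job, the unpacked contents can therefore be located relative to $INPUT_TAR_FILE. A minimal Python sketch, assuming jobsub has set the variable as described above:

```python
import os

def tarball_contents_dir(env=os.environ):
    """Sketch: the tarball is unpacked into the directory holding
    $INPUT_TAR_FILE, so its contents sit next to the tar file itself."""
    tar_file = env["INPUT_TAR_FILE"]
    return os.path.dirname(tar_file)
```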
Using the Rapid Code Distribution Service via CVMFS in jobsub 1.3 and later
The procedure for this is the same as for submitting a tarball, but currently you will also have to specify the extra option:
--use-cvmfs-dropbox
before the file:// URI. More details and a full writeup of the feature can be found here.
User Job Execution Flow and Jobsub Environment Variables
Groups, Subgroups, Quotas, Priorities
- When a jobsub job is submitted, a condor classad attribute +AccountingGroup = "group_(experiment).(username)" is generated; i.e., if user dbox in group nova submits a job, +AccountingGroup = "group_nova.dbox" is used for assigning this job to group nova's quotas and priorities.
- The --subgroup flag is used to modify this attribute, which affects job priority.
- In the above example, if user dbox in group nova added --subgroup=high_prio to his submission, the resulting classad attribute would be +AccountingGroup="group_nova.high_prio.dbox" .
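The construction described above can be sketched as follows. This is an illustration of the documented string format, not the server's actual implementation:

```python
def accounting_group(experiment, username, subgroup=None):
    """Sketch: build the +AccountingGroup value described above.
    With no subgroup: group_(experiment).(username); with --subgroup,
    the subgroup is inserted between experiment and username."""
    if subgroup:
        return "group_%s.%s.%s" % (experiment, subgroup, username)
    return "group_%s.%s" % (experiment, username)
```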