Submitting jobs via Jobsub Client¶
This document describes submitting jobs to both FermiGrid and the Open Science Grid using Jobsub Client. The older jobsub_tools product has been retired and is no longer supported for submission. Note: before proceeding, you may want to read how to get started on the General Physics Computing Facility or general information about GPCF.
For any operational help, or if the services are down, please open a service desk ticket.
If you find bugs or have questions about using jobsub, please send an email to email@example.com
The jobsub_tools package for job submission is no longer supported. Users should transition to the Jobsub Client-Server model described below. Users will be unable to submit jobs using the jobsub_tools package after March 1, 2015.
Jobsub tools had several limitations when managing production at large scale. Because several FIFE experiments share submission infrastructure, memory on the submission machine became a limiting factor as the number of experiments, and thus the number of production jobs, increased; overall bandwidth was also limited by having only a single server. One can circumvent this bottleneck by deploying several submission hosts, but then the user becomes responsible for identifying the submission machine with the least load and the fewest running jobs. The jobsub client-server architecture addresses these drawbacks by hiding the submission infrastructure behind the Jobsub server. Users interact with the system through jobsub_client commands that are similar to the tools provided by jobsub_tools.
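To illustrate the client-server split described above, the sketch below builds a submission command in which the client is pointed at a named server via --jobsub-server. The server URL and the job script name are hypothetical placeholders, not values from this document; the command string is only echoed, not executed.

```shell
# Hypothetical server URL and job script; the client forwards the
# submission to whichever Jobsub server is named, so server-side load
# balancing is invisible to the user.
SERVER="https://jobsubserver.example.gov:8443"
CMD="jobsub_submit -G nova --jobsub-server=${SERVER} file://myjob.sh"
echo "$CMD"
```

In normal use the --jobsub-server flag can be omitted and the client's default server is used.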
Using Jobsub Client¶
For a detailed list of commands and their usage, refer to the Jobsub project wiki: https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client
General command-line help
[dbox@novagpvm01 client]$ jobsub_submit -h
Usage: jobsub_submit [Client Options] [Server Options] user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --debug             Print debug messages to stdout
    -h, --help          Show this help message and exit

questions or comments may be sent to firstname.lastname@example.org
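A minimal submission, per the usage line above, needs only a group and a user script. The sketch below assembles such a command; the script name myjob.sh is a hypothetical placeholder, and the command is echoed rather than run.

```shell
# Minimal sketch of a basic submission: -G selects the
# Group/Experiment for priorities and accounting, and the user script
# (hypothetical here) is the last argument.
GROUP="nova"
CMD="jobsub_submit -G ${GROUP} file://myjob.sh"
echo "$CMD"
```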
Experiment-specific command-line help
jobsub_submit --group nova --help   (substitute nova with your experiment)

Usage: jobsub_submit [Client Options] [Server Options] user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --debug             Print debug messages to stdout
    -h, --help          Show this help message and exit

questions or comments may be sent to email@example.com

  Server Options:
    --version           show program's version number and exit
    -h, --help          show this help message and exit

  Generic Options:
    --maxConcurrent=MAXCONCURRENT
                        max number of jobs running concurrently at given time.
                        Use in conjunction with the -N option to protect a
                        shared resource. Example: jobsub -N 1000
                        -maxConcurrent 20 will only run 20 jobs at a time
                        until all 1000 have completed. This is implemented by
                        running the jobs in a DAG
    --disk=DISK         request worker nodes have at least this many MB of
                        disk space
    --memory=MEMORY     request worker nodes have at least this many MB of
                        memory
    --cpu=CPU           request worker nodes have at least this many cpus
    --drain             mark this job to be allowed to be drained or killed
                        during downtimes
    --OS=OS             specify OS version of worker node. Example: --OS=SL5.
                        A comma-separated list '--OS=SL4,SL5,SL6' works as
                        well. Default is any available OS
    --generate-email-summary
                        generate and mail a summary report of
                        completed/failed/removed jobs in a DAG
    --email-to=NOTIFY_USER
                        email address to send job reports/summaries to
                        (default is $USER@fnal.gov)
    -G ACCOUNTINGGROUP, --group=ACCOUNTINGGROUP
                        Group/Experiment/Subgroup for priorities and
                        accounting
    -v, --verbose       dump internal state of program (useful for debugging)
    --resource-provides=RESOURCE_PROVIDES
                        request specific resources by changing the condor jdf
                        file. For example: --resource-provides=CVMFS=OSG will
                        add +CVMFS="OSG" to the job classad attributes and
                        '&&(CVMFS=="OSG")' to the job requirements
    -M, --mail_always   send mail when job completes or fails
    -q, --mail_on_error send mail only when job fails due to error (default)
    -Q, --mail_never    never send mail (default is to send mail on error)
    -T, --test_queue    Submit as a test job. Job will run with highest
                        possible priority, but you can only have one such job
                        in the queue at a time.
    -g, --grid          run job on the FNAL GP grid. Other flags can modify
                        target sites to include other areas of the Open
                        Science Grid
    --nowrapfile        DISABLED: formerly was 'do not generate shell wrapper
                        for fermigrid operations (default is to generate a
                        wrapfile)'. This flag now does nothing. The wrapfiles
                        work off site and protect file systems from user error
    -c APPEND_REQUIREMENTS, --append_condor_requirements=APPEND_REQUIREMENTS
                        append condor requirements
    --overwrite_condor_requirements=OVERWRITEREQUIREMENTS
                        overwrite default condor requirements with supplied
                        requirements
    --override=OVERRIDE override some other value: --override 'requirements'
                        'gack==TRUE' would produce the same condor command
                        file as --overwrite_condor_requirements 'gack==TRUE'.
                        If you want to use this option, test it first with -n
                        to see what you get as output
    -C                  execute on grid from directory you are currently in
    -e ADDED_ENVIRONMENT, --environment=ADDED_ENVIRONMENT
                        exports this variable and its local value to the
                        worker node environment. For example:
                        export FOO="BAR"; jobsub -e FOO <more stuff>
                        guarantees that the value of $FOO on the worker node
                        is "BAR". Can use this option as many times as desired
    --submit_host=SUBMIT_HOST
                        submit to different host
    --site=SITE         submit jobs to this site
    -n, --no_submit     generate condor_command file but do not submit
    --opportunistic     submit opportunistically to Fermigrid GP Grid and CDF
                        Grid. This option will allow you to potentially get
                        more slots than your Fermigrid quota, but these slots
                        are subject to preemption
    -N QUEUECOUNT       submit N copies of this job. Each job will have
                        access to the environment variable $PROCESS that
                        provides the job number (0 to <num>-1), equivalent to
                        the decimal point in the job ID (the '2' in 134567.2)
    -x X509_USER_PROXY, --X509_USER_PROXY=X509_USER_PROXY
                        location of X509_USER_PROXY (expert mode)

  File Options:
    -l LINES, --lines=LINES
                        [Expert option] Add the line <line> to the Condor
                        submission (.cmd) file. See Condor help for more.
    -L JOBLOGFILE, --log_file=JOBLOGFILE
                        Log file to hold log output from job.
    --no_log_buffer     write log file directly to disk. Default is to copy
                        it back after job is completed. This option is useful
                        for debugging but can be VERY DANGEROUS as joblogfile
                        typically is sent to bluearc. Using this option
                        incorrectly can cause all grid submission systems at
                        FNAL to become overwhelmed resulting in angry admins
                        hunting you down, so USE SPARINGLY.
    --use_gftp          use grid-ftp to transfer file back
    --tar_file_name=TAR_FILE_NAME
                        name of tarball to transfer to worker node. Will be
                        added to the transfer_input_files list, and visible
                        to the user job as $INPUT_TAR_FILE. Does not work on
                        submit host gpsn01; use the -f option to transfer a
                        tar file to gpsn01
    -f INPUT_DIR_ARRAY  -f <file> input file <file> will be copied to
                        directory $CONDOR_DIR_INPUT on the execution node.
                        Example: -f /grid/data/minerva/my/input/file.xxx will
                        be copied to $CONDOR_DIR_INPUT/file.xxx. Specify as
                        many -f file1 -f file2 args as you need.
    -d OUTPUT_DIR_ARRAY -d <tag> <dir> Writable directory $CONDOR_DIR_<tag>
                        will exist on the execution node. After job
                        completion, its contents will be moved to <dir>
                        automatically. Specify as many <tag>/<dir> pairs as
                        you need.

  SAM Options:
    --dataset_definition=DATASET_DEFINITION
                        SAM dataset definition used in a Directed Acyclic
                        Graph (DAG)
    --project_name=PROJECT_NAME
                        optional project name for SAM DAG

  Nova Specific Options:
    --SMU               steer jobs to HPC.SMU grid site
    -i RELDIR           release_directory for Nova Software
    -t TESTRELDIR       release_directory for test Nova Software
    -r REL              release_version for Nova Software

NOTES
You can have as many instances of -c, -d, -e, -f, -l and -y as you need.
The -d directory mapping works on non-Grid nodes, too.

export IFDH_VERSION=some_version, then use -e IFDH_VERSION to use the
(some_version) release of ifdh for copying files in and out with the -d and
-f flags instead of the current ifdh version in KITS.

More documentation is available at
https://cdcvs.fnal.gov/redmine/projects/ifront/wiki/UsingJobSub
and on this machine at $JOBSUB_TOOLS_DIR/docs/

address questions to the mailing list firstname.lastname@example.org
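The server options above are commonly combined in one submission. The sketch below assembles such a command string: resource requests via --memory/--disk, an input file transferred with -f, an output directory mapped back with -d, a local variable exported to the worker with -e, and 10 job copies with -N. All paths and the job script name are hypothetical placeholders, and the command is only echoed, not executed.

```shell
# Hypothetical paths and script; builds (but does not run) a submission
# that requests 2000 MB memory and 10000 MB disk, stages input.dat to
# $CONDOR_DIR_INPUT, copies $CONDOR_DIR_OUT back after the job, and
# exports MYVAR to the worker environment.
export MYVAR="BAR"
CMD="jobsub_submit -G nova --memory=2000 --disk=10000 \
-f /grid/data/myexp/input/input.dat \
-d OUT /grid/data/myexp/output \
-e MYVAR \
-N 10 file://myjob.sh"
echo "$CMD"
```

Each of the 10 jobs can then use $PROCESS (0 through 9) to pick its share of the work.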