- Table of contents
- Frequently Asked Questions
- What is a fast way to get started?
- jobsub_q, jobsub_rm, jobsub_hold, jobsub_release all have a --constraint flag. How does it work?
- jobsub_q, jobsub_rm, jobsub_hold, jobsub_release all have a --jobid flag, but the input to the flag seems to be different for some of these commands. How do they work?
- jobsub_submit --help --jobsub_server (some server) returns a helpfile from the default server. Why?
- What is the difference between jobsub_submit -f /path/to/file and jobsub_submit -f dropbox:///path/to/file ?
- Give an example of using the -f and -d flags for jobsub_submit.
- Give an example of using the --tar_file_name dropbox:// and tardir:// flags for jobsub_submit
- I want my jobs to start quickly and finish with a minimum of fuss. What should I do to make this happen?
- How can I monitor my jobs' progress?
Frequently Asked Questions
What is a fast way to get started?
- The following assumes you have an FNAL Kerberized Linux account. Log into a bash shell on a Linux machine that has /cvmfs mounted, or BlueArc (/grid/fermiapp) NFS-mounted.
- from the bash shell command line:
- $ source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups.sh #(if cvmfs not available, try sourcing /grid/fermiapp/products/common/etc/setups.sh)
- $ setup jobsub_client
- $ jobsub #will produce the following output:
jobsub has been replaced with a suite of client tools listed here:
jobsub_submit -- submit a job
jobsub_submit_dag -- submit a dag description of jobs
jobsub_status -- list OSG sites that are available to submit jobs to
jobsub_q -- check status of jobs in queue (like condor_q)
jobsub_hold -- hold jobs in queue
jobsub_release -- release held jobs
jobsub_rm -- remove jobs from queue
jobsub_fetchlog -- retrieve job log files from jobsub_server
jobsub_history -- check status of finished jobs (like conndor_history)
all of these tools respond to the --help flag by listing their input options along with explanations
for more information see
https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client
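The CVMFS-first, BlueArc-fallback choice in the setup step can be scripted. A minimal POSIX sh sketch, assuming only the two setups.sh locations named in this FAQ; the function name pick_setups is hypothetical:

```shell
# Hypothetical helper: print the first candidate setups.sh that exists.
# With no arguments it tries the CVMFS copy, then the BlueArc copy.
# The body runs in a subshell, so 'exit' only ends the function.
pick_setups() (
    if [ "$#" -eq 0 ]; then
        set -- /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups.sh \
               /grid/fermiapp/products/common/etc/setups.sh
    fi
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            exit 0
        fi
    done
    echo "no setups.sh found" >&2
    exit 1
)
```

Usage, in an interactive bash shell:
  setups=$(pick_setups) && . "$setups" && setup jobsub_client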
jobsub_q, jobsub_rm, jobsub_hold, jobsub_release all have a --constraint flag. How does it work?
- The --constraint flag accepts a single-quoted HTCondor constraint expression that filters jobs in the queue by their ClassAd attributes. Extensive documentation on constraints and ClassAd attributes is available online, for example here
- A good way to see all the ClassAd attributes of a single job is the command 'jobsub_q --group my_experiment --jobid my_jobsub_jobid --long'
- In practice, the following example covers about 80% of the use cases encountered. The ClassAd attribute 'JobStatus' has the following values/meanings:
- 1 idle
- 2 running
- 3 removed
- 4 completed
- 5 held
- User 'dbox', a member of the 'nova' experiment, wants to remove all his held jobs.
- First he checks that his constraint is correct with jobsub_q:
$ jobsub_q -G nova --constraint '(JobStatus=?=5)&&(Owner=?="dbox")' --jobsub-server https://fermicloud119.fnal.gov:8443
JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
508.0@fermicloud119.fnal.gov dbox 06/10 17:45 0+00:10:27 H 0 732.4 nova_noisy.sh_20160610_174528_1573442_0_1_wrap.sh
517.0@fermicloud119.fnal.gov dbox 06/10 17:46 0+00:10:08 H 0 0.3 condor_dagman
533.0@fermicloud119.fnal.gov dbox 06/10 17:46 0+00:09:31 H 0 0.3 condor_dagman
539.0@fermicloud119.fnal.gov dbox 06/10 17:47 0+00:09:18 H 0 0.3 condor_dagman
540.0@fermicloud119.fnal.gov dbox 06/10 17:47 0+00:09:20 H 0 41.5 testNoSAM.sh_20160610_174651_1605431_1_1_wrap.sh
542.0@fermicloud119.fnal.gov dbox 06/10 17:47 0+00:09:15 H 0 41.5 testNoSAM.sh_20160610_174651_1605431_2_1_wrap.sh
544.0@fermicloud119.fnal.gov dbox 06/10 17:47 0+00:09:18 H 0 0.3 condor_dagman
545.0@fermicloud119.fnal.gov dbox 06/10 17:47 0+00:09:06 H 0 97.7 testNoSAM.sh_20160610_174651_1605431_3_1_wrap.sh
593.0@fermicloud119.fnal.gov dbox 06/10 17:47 0+00:08:17 H 0 46.4 nova_maxConcurrent.sh_20160610_174701_1612564_5_1_wrap.sh
610.0@fermicloud119.fnal.gov dbox 06/10 17:48 0+00:07:23 H 0 73.2 nova_maxConcurrent.sh_20160610_174700_1611514_6_1_wrap.sh
611.0@fermicloud119.fnal.gov dbox 06/10 17:48 0+00:07:23 H 0 73.2 nova_maxConcurrent.sh_20160610_174700_1611514_7_1_wrap.sh
613.0@fermicloud119.fnal.gov dbox 06/10 17:48 0+00:07:22 H 0 34.2 nova_maxConcurrent.sh_20160610_174700_1611514_8_1_wrap.sh
614.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:20 H 0 73.2 nova_maxConcurrent.sh_20160610_174701_1612564_6_1_wrap.sh
615.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:20 H 0 73.2 nova_maxConcurrent.sh_20160610_174701_1612564_7_1_wrap.sh
616.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:19 H 0 73.2 nova_maxConcurrent.sh_20160610_174700_1611514_9_1_wrap.sh
617.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:17 H 0 97.7 jobF.sh_20160610_174613_1582395_0_1_wrap.sh
618.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:17 H 0 73.2 jobF.sh_20160610_174606_1579725_0_1_wrap.sh
619.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:16 H 0 34.2 nova_maxConcurrent.sh_20160610_174701_1612564_8_1_wrap.sh
620.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:13 H 0 24.4 nova_maxConcurrent.sh_20160610_174700_1611514_10_1_wrap.sh
621.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:07:11 H 0 0.0 jobG.sh_20160610_174611_1581646_0_1_wrap.sh
622.0@fermicloud119.fnal.gov dbox 06/10 17:49 0+00:00:00 H 0 0.0 jobG.sh_20160610_174609_1580731_0_1_wrap.sh
21 jobs; 0 completed, 0 removed, 0 idle, 0 running, 21 held, 0 suspended
- Satisfied that the constraint is correct, he removes them with jobsub_rm. He could have released them with jobsub_release with the same arguments.
$ jobsub_rm -G nova --constraint '(JobStatus=?=5)&&(Owner=?="dbox")' --jobsub-server https://fermicloud119.fnal.gov:8443
removing jobs with constraint=(JobStatus=?=5)&&(Owner=?="dbox")
21 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Performed REMOVE on 21 jobs matching your request
jobsub_q, jobsub_rm, jobsub_hold, jobsub_release all have a --jobid flag, but the input to the flag seems to be different for some of these commands. How do they work?
- A jobsub jobid is of the form 'cluster_num.proc_num@hostname', for example '531304.0@fermicloud042.fnal.gov'
- all four commands accept a --jobid of this form as an argument
- all four commands accept a wildcard jobid of the form 'cluster_num.@hostname' (the process number omitted), for example '531304.@fermicloud042.fnal.gov'
- in this case, the action is performed on all jobs in the cluster
- jobsub_hold, jobsub_release, and jobsub_rm accept a comma separated list of jobids, for example 531304.1@fermicloud042.fnal.gov,531304.2@fermicloud042.fnal.gov,531304.3@fermicloud042.fnal.gov
- in these cases, the commands perform the action on each jobid in the list
- jobsub_q does not support a comma-separated list of jobids.
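Going from a single jobid to its cluster-wide form is plain string manipulation. A minimal POSIX sh sketch using parameter expansion, assuming jobids always have the 'cluster_num.proc_num@hostname' shape described above; all function names are hypothetical:

```shell
# Hypothetical helpers for jobids of the form cluster_num.proc_num@hostname.
jobid_cluster() ( j=${1%%@*}; printf '%s\n' "${j%%.*}" )   # 531304
jobid_proc()    ( j=${1%%@*}; printf '%s\n' "${j#*.}" )    # 0
jobid_host()    ( printf '%s\n' "${1#*@}" )                # fermicloud042.fnal.gov

# Cluster-wide (wildcard) form: cluster_num.@hostname
jobid_cluster_form() (
    printf '%s.@%s\n' "$(jobid_cluster "$1")" "$(jobid_host "$1")"
)
```

Usage: jobsub_rm -G nova --jobid "$(jobid_cluster_form 531304.0@fermicloud042.fnal.gov)" acts on every job in cluster 531304.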
Example: submit 20 jobs
[dbox@fermicloud042 ~]$ jobsub_submit -G nova -N 20 --jobsub-server https://$HOSTNAME:8443 file://sleep.sh
/fife/local/scratch/uploads/nova/dbox/2017-01-17_163340.878854_635
/fife/local/scratch/uploads/nova/dbox/2017-01-17_163340.878854_635/sleep.sh_20170117_163353_1595566_0_1_.cmd
submitting....
Submitting job(s)....................
20 job(s) submitted to cluster 531304.
JobsubJobId of first job: 531304.0@fermicloud042.fnal.gov
Use job id 531304.0@fermicloud042.fnal.gov to retrieve output
Hold and Release a job using a single jobid
[dbox@fermicloud042 ~]$ jobsub_hold -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 531304.1@fermicloud042.fnal.gov
Holding job with jobid=531304.1@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
[dbox@fermicloud042 ~]$ jobsub_release -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 531304.1@fermicloud042.fnal.gov
Releasing job with jobid=531304.1@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Query a single jobid with jobsub_q, showing 'over-generous' matching: the query for 531304.1 also returns 531304.10 through 531304.19
[dbox@fermicloud042 ~]$ jobsub_q -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 531304.1@fermicloud042.fnal.gov
JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
531304.1@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:06:55 I 0 26.9 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.10@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.11@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.12@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.13@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.14@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.15@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.16@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.17@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.18@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
531304.19@fermicloud042.fnal.gov dbox 01/17 16:33 0+00:00:00 I 0 0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh
11 jobs; 0 completed, 0 removed, 11 idle, 0 running, 0 held, 0 suspended
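If only the exact job is wanted, jobsub_q's output can be filtered afterwards. A minimal sketch, assuming the output layout shown above (one job per line, JOBSUBJOBID in the first column); the function name exact_job_row is hypothetical. Because every jobid contains '@', awk compares the values as strings, so 531304.1 does not match 531304.10.

```shell
# Hypothetical filter: read jobsub_q output on stdin and keep only the row
# whose first column (JOBSUBJOBID) equals the requested jobid exactly.
# The header and summary lines are dropped as well.
exact_job_row() {
    awk -v id="$1" '$1 == id'
}
```

Usage: jobsub_q -G nova --jobid 531304.1@fermicloud042.fnal.gov | exact_job_row 531304.1@fermicloud042.fnal.gov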
Query, Hold, Release, and Remove all jobs in a cluster using wildcard form of jobid
[dbox@fermicloud042 ~]$ jobsub_q -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 532191.@fermicloud042.fnal.gov
JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
532191.0@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.1@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.2@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.3@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.4@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.5@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.6@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.7@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.8@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.9@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.10@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.11@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.12@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.13@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.14@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.15@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.16@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.17@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.18@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
532191.19@fermicloud042.fnal.gov dbox 01/17 17:08 0+00:00:00 I 0 0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh
20 jobs; 0 completed, 0 removed, 20 idle, 0 running, 0 held, 0 suspended
[dbox@fermicloud042 ~]$ jobsub_hold -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 532191.@fermicloud042.fnal.gov
Holding job with jobid=532191.@fermicloud042.fnal.gov
20 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
[dbox@fermicloud042 ~]$ jobsub_release -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 532191.@fermicloud042.fnal.gov
Releasing job with jobid=532191.@fermicloud042.fnal.gov
20 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
[dbox@fermicloud042 ~]$ jobsub_rm -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 532191.@fermicloud042.fnal.gov
Removing job with jobid=532191.@fermicloud042.fnal.gov
20 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Hold, Release, and Remove a comma separated list of jobs
[dbox@fermicloud042 ~]$ jobsub_hold -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 531304.12@fermicloud042.fnal.gov,531304.13@fermicloud042.fnal.gov,531304.14@fermicloud042.fnal.gov
Holding job with jobid=531304.12@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Holding job with jobid=531304.13@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Holding job with jobid=531304.14@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
[dbox@fermicloud042 ~]$ jobsub_release -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 531304.12@fermicloud042.fnal.gov,531304.13@fermicloud042.fnal.gov,531304.14@fermicloud042.fnal.gov
Releasing job with jobid=531304.12@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Releasing job with jobid=531304.13@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Releasing job with jobid=531304.14@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
[dbox@fermicloud042 ~]$ jobsub_rm -G nova --jobsub-server https://$HOSTNAME:8443 --jobid 531304.12@fermicloud042.fnal.gov,531304.13@fermicloud042.fnal.gov,531304.14@fermicloud042.fnal.gov
Removing job with jobid=531304.12@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Removing job with jobid=531304.13@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
Removing job with jobid=531304.14@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
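Typing long comma-separated lists by hand is error-prone. A minimal POSIX sh sketch that builds the list for a contiguous range of process numbers in one cluster; the function name make_jobid_list is hypothetical. The result is only useful for jobsub_hold, jobsub_release, and jobsub_rm, since jobsub_q does not accept a list.

```shell
# Hypothetical helper: print a comma-separated jobid list for procs
# first..last of one cluster on one host.
make_jobid_list() (
    cluster=$1 host=$2 proc=$3 last=$4
    list=
    while [ "$proc" -le "$last" ]; do
        # ${list:+$list,} adds a comma only after the first entry
        list="${list:+$list,}$cluster.$proc@$host"
        proc=$((proc + 1))
    done
    printf '%s\n' "$list"
)
```

Usage: jobsub_hold -G nova --jobid "$(make_jobid_list 531304 fermicloud042.fnal.gov 12 14)" holds the same three jobs as the example above.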
jobsub_submit --help --jobsub_server (some server) returns a helpfile from the default server. Why?
- There are a couple of features of jobsub_submit and jobsub_submit_dag that can be confusing:
- If --jobsub-server (not --jobsub_server) is omitted, the default https://fifebatch.fnal.gov:8443 is used.
- Any options the client does not recognize are passed to the server to see whether it understands them. So the misspelled --jobsub_server was passed to the default server as an option and ignored there: the server also found the --help option and responded by outputting the help from the old jobsub UPS product.
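Because the client silently forwards unrecognized options, this misspelling does not produce an error. A minimal POSIX sh sketch of a pre-flight check a wrapper script could run; the function name check_server_flag is hypothetical, and it catches only this one misspelling:

```shell
# Hypothetical pre-flight check: reject the underscore spelling
# --jobsub_server, which the client would forward to the server.
check_server_flag() (
    for arg in "$@"; do
        case "$arg" in
            --jobsub_server|--jobsub_server=*)
                echo "use --jobsub-server (hyphen), not --jobsub_server" >&2
                exit 1 ;;
        esac
    done
    exit 0
)
```

Usage in a wrapper: check_server_flag "$@" && jobsub_submit "$@"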
What is the difference between jobsub_submit -f /path/to/file and jobsub_submit -f dropbox:///path/to/file ?
- Jobsub servers have a file storage 'dropbox' area for storing user or data files and transferring them to the worker node from an area visible to the client (i.e. NFS-mounted on the client).
- Using jobsub_submit -f dropbox:// or --tar-file-name dropbox:// transfers the file from the client to the server's dropbox using ssh at submit time, then to the worker node using whatever method condor is configured to use at job execution time. The 'dropbox' specification works for locally mounted files, and for both PNFS and BlueArc volumes that are visible to the client via an NFS mount. The file is checksummed before transfer to the server; if it already exists in the dropbox, this step is skipped, but the second transfer, to the worker node, still takes place.
- Using jobsub_submit -f /path/to/file (without dropbox://) transfers the file from its original directory to the worker node using ifdh at job execution time.
- SUMMARY:
- Use -f /path/to/file if /path/to/file is visible on the worker node
- examples: /pnfs/path/to/file, /grid/fermiapp/path/to/file, /cvmfs/path/to/file
- Do not use -f /path/to/file if your file is on a laptop/desktop that does not run any service ifdh can use to transfer the file from the source directory to the worker node
- Using -f dropbox:// for a PNFS or BlueArc location incurs an unnecessary second file transfer and generally should be avoided.
- Use -f dropbox:///path/to/file if /path/to/file is not visible on the worker node, or if your workflow submits /path/to/file with one job, then changes it, then submits it again with a different job.
- examples: your home directory on your laptop, /tmp
- Using -f dropbox:// is REQUIRED for a path that is not visible via ifdh on a worker node
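The summary rule can be encoded as a small helper that picks the right -f argument for a path. A minimal POSIX sh sketch, assuming that paths under /pnfs, /grid/fermiapp, and /cvmfs (the examples above) are visible on worker nodes and everything else is not; the function name f_arg is hypothetical, and the prefix list should be adjusted for your experiment. Relative paths are made absolute first, so that a file under /pnfs is still recognized.

```shell
# Hypothetical helper: print the -f argument for a path, using dropbox://
# only when the path is not one of the worker-visible prefixes.
f_arg() (
    case "$1" in
        /*) path=$1 ;;
        *)  path=${PWD%/}/$1 ;;   # make relative paths absolute first
    esac
    case "$path" in
        /pnfs/*|/grid/fermiapp/*|/cvmfs/*) printf '%s\n' "$path" ;;
        *) printf 'dropbox://%s\n' "$path" ;;
    esac
)
```

Usage: jobsub_submit -G nova -f "$(f_arg /tmp/baz.txt)" ... passes dropbox:///tmp/baz.txt, while a /pnfs path is passed unchanged.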
Give an example of using the -f and -d flags for jobsub_submit.
jobsub_submit -G ${GROUP} -f dropbox://baz.txt \
    -f ${PNFS_DIR}/foo.txt \
    -f ${PNFS_DIR}/bar.txt \
    -d A ${PNFS_DIR}/A \
    -d B ${PNFS_DIR}/B \
    -d C ${PNFS_DIR}/C \
    -d D ${PNFS_DIR}/D \
    --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC $JOBSUB_SERVER \
    --debug \
    file://transfer_test.sh A B C D
- This submission example does the following:
- Prior to user job execution, the -f options copy 'baz.txt' from the local file system, and 'foo.txt' and 'bar.txt' from the specified PNFS directory, into a local directory in the grid worker node's environment, accessible via the bash environment variable $CONDOR_DIR_INPUT. The -d options create 4 directories in the worker node's environment, accessible via $CONDOR_DIR_A, $CONDOR_DIR_B, $CONDOR_DIR_C, and $CONDOR_DIR_D.
- Executes the user job. In this example the user job is 'transfer_test.sh' with arguments 'A B C D'. The assumption here is that the user job puts data into $CONDOR_DIR_A and the other directories.
- After the user job completes, creates the ${PNFS_DIR}/A through ${PNFS_DIR}/D directories if they do not already exist, and copies the contents of $CONDOR_DIR_A to ${PNFS_DIR}/A, $CONDOR_DIR_B to ${PNFS_DIR}/B, etc.
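The page does not show transfer_test.sh itself. A minimal POSIX sh sketch of what its body might do, assuming (hypothetically) that each argument names a -d label and that writing one file per label is enough to exercise the copy-back; the function name write_outputs and the result_<label>.txt file names are illustrative only. Since POSIX sh has no indirect expansion, eval is used, after validating the label.

```shell
# Hypothetical body for transfer_test.sh: for each -d label given as an
# argument (A B C D), write a file into $CONDOR_DIR_<label>; jobsub then
# copies that directory back to ${PNFS_DIR}/<label> after the job ends.
write_outputs() (
    for label in "$@"; do
        case "$label" in
            ''|*[!A-Za-z0-9_]*) echo "bad label: $label" >&2; exit 1 ;;
        esac
        # Indirect lookup of CONDOR_DIR_<label>; safe because the label was
        # restricted to letters, digits, and underscores above.
        eval "outdir=\${CONDOR_DIR_$label:-}"
        if [ -z "$outdir" ]; then
            echo "CONDOR_DIR_$label is not set" >&2
            exit 1
        fi
        echo "output for $label" > "$outdir/result_$label.txt" || exit 1
    done
)
# In transfer_test.sh itself, the last line would be: write_outputs "$@"
```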
Give an example of using the --tar_file_name dropbox:// and tardir:// flags for jobsub_submit
For a more detailed discussion, please see Tardir_and_dropbox_URIs.
If you use the dropbox:// URI with the --tar_file_name flag, jobsub_client will use IFDH to transfer the tarball specified in the URI to the jobsub dropbox. This dropbox location is set per experiment and is usually within your experiment's dCache scratch or resilient area. To find out where the dropbox is, or to request changes to it, please open a ServiceNow ticket to the "Batch Job Management (jobsub condorsubmit) - Standard" service under Scientific Computing Services --> Distributed Computing.
When your job starts, jobsub will transfer the tarball into the working directory of the job and untar it there.
So in this example, we have a pre-existing tarball at /nashome/s/sbhat/testdir.tar. By specifying --tar_file_name=dropbox:///nashome/s/sbhat/testdir.tar, jobsub_client will transfer testdir.tar to the jobsub dropbox, and then from there to the user job. In the user job, testdir.tar will automatically be unwound in the directory the job starts in, without the need for the user to unwind that tarball. testdir.tar will also be accessible via the environment variable $INPUT_TAR_FILE in the user job.
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis --tar_file_name=dropbox:///nashome/s/sbhat/testdir.tar file:///grid/fermiapp/common/tools/probe
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_151430.968258_4980
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_151430.968258_4980/probe_20180215_151431_872601_0_1_.cmd
submitting....
Submitting job(s).
1 job(s) submitted to cluster 3817624.
JobsubJobId of first job: 3817624.0@jobsub01.fnal.gov
Use job id 3817624.0@jobsub01.fnal.gov to retrieve output
When you use the tardir:// URI with the --tar_file_name flag, jobsub_client will create a bzipped tar file of the directory you specify in the URI and place it in the current directory. It will then perform the same actions on that tarball as if you had used the dropbox:// URI.
In the following example, we have a directory /nashome/s/sbhat/testdir that we want to tar up and send to the job. Using --tar_file_name=tardir:///nashome/s/sbhat/testdir tells jobsub_client to create a tarball from the directory testdir (which will be called testdir.tar), and send that to the job (via the jobsub dropbox). Once the job starts up, testdir.tar will be transferred from the jobsub dropbox and unwound automatically into the directory the user job starts in. Like before, the tarball testdir.tar will be accessible via the environment variable $INPUT_TAR_FILE in the user job.
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis --tar_file_name=tardir:///nashome/s/sbhat/testdir file:///grid/fermiapp/common/tools/probe
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_150742.299034_9584
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_150742.299034_9584/probe_20180215_150742_2466467_0_1_.cmd
submitting....
Submitting job(s).
1 job(s) submitted to cluster 4122376.
JobsubJobId of first job: 4122376.0@jobsub02.fnal.gov
Use job id 4122376.0@jobsub02.fnal.gov to retrieve output
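With either URI, the user job can find the delivered tarball through $INPUT_TAR_FILE. A minimal POSIX sh sketch of a sanity check a job script might run before relying on the unpacked files; the function name check_input_tar is hypothetical, and listing the archive with tar -tf assumes a tar that detects compression when reading (GNU tar and bsdtar do).

```shell
# Hypothetical start of a user job script: confirm the tarball delivered by
# --tar_file_name is present, then list its members as a sanity check.
# (jobsub has already unpacked it in the job's starting directory.)
check_input_tar() (
    if [ -z "${INPUT_TAR_FILE:-}" ]; then
        echo "INPUT_TAR_FILE is not set; was --tar_file_name used?" >&2
        exit 1
    fi
    if [ ! -f "$INPUT_TAR_FILE" ]; then
        echo "tarball $INPUT_TAR_FILE not found" >&2
        exit 1
    fi
    tar -tf "$INPUT_TAR_FILE"
)
```

Usage, near the top of the job script: check_input_tar > /dev/null || exit 1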
I want my jobs to start quickly and finish with a minimum of fuss. What should I do to make this happen?
- See Ken Herner's presentation Job Management Best Practices for an excellent overview.
- See Mike Kirby's presentation OSG Site Selection for additional information
How can I monitor my jobs' progress?
- jobsub_q and jobsub_history give basic information
- jobsub_q --better-analyze may help you understand why a job won't start
- See Kevin Reitzke's Fifemon Tutorial for lots of useful information