Frequently Asked Questions

What is a fast way to get started?

  • The following assumes you have an FNAL Kerberized Linux account. Log into a bash shell on a Linux machine that has /cvmfs or BlueArc (/grid/fermiapp) NFS mounted.
  • from the bash shell command line:
    1. $ source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups.sh #(if cvmfs not available, try sourcing /grid/fermiapp/products/common/etc/setups.sh)
    2. $ setup jobsub_client
    3. $ jobsub #will produce the following output:
      
      jobsub has been replaced with a suite of client tools listed here:
      
      jobsub_submit     -- submit a job    
      jobsub_submit_dag -- submit a dag description of jobs
      
      jobsub_status     -- list OSG sites that are available to submit jobs to  
      
      jobsub_q          -- check status of jobs in queue (like condor_q)    
      
      jobsub_hold       -- hold jobs in queue   
      jobsub_release    -- release held jobs   
      jobsub_rm         -- remove jobs from queue  
      
      jobsub_fetchlog   -- retrieve job log files from jobsub_server
      jobsub_history    -- check status of finished jobs (like condor_history) 
      
      all of these tools respond to the --help flag by listing their
      input options along with explanations
      
      for more information see 
      

https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client
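A minimal first submission might look like the following sketch (the experiment name 'nova' and the script path are placeholders; substitute your own experiment with -G and your own executable):

$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC file:///path/to/myjob.sh
$ jobsub_q -G nova     # check the status of your submitted job(s)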

jobsub_q, jobsub_rm, jobsub_hold, jobsub_release all have a --constraint flag. How does it work?

  • The --constraint flag accepts a single quoted HTCondor constraint expression to filter jobs in the job queue by their ClassAd attributes. Extensive documentation on constraints and ClassAd attributes is available in the HTCondor manual.
  • A good way to see all the ClassAd attributes for a single job is the command: jobsub_q --group my_experiment --jobid my_jobsub_jobid --long
  • In practice, the following example covers about 80% of the use cases encountered. The ClassAd attribute 'JobStatus' has the following values and meanings:
    • 1 idle
    • 2 running
    • 3 removed
    • 4 completed
    • 5 held
  • User 'dbox', a member of the 'nova' experiment, wants to remove all his held jobs.
    • First he checks that his constraint is correct with jobsub_q:
      $ jobsub_q -G nova --constraint '(JobStatus=?=5)&&(Owner=?="dbox")' --jobsub-server https://fermicloud119.fnal.gov:8443
      JOBSUBJOBID                           OWNER SUBMITTED     RUN_TIME   ST PRI SIZE CMD
      508.0@fermicloud119.fnal.gov          dbox            06/10 17:45   0+00:10:27 H   0  732.4 nova_noisy.sh_20160610_174528_1573442_0_1_wrap.sh
      517.0@fermicloud119.fnal.gov          dbox            06/10 17:46   0+00:10:08 H   0   0.3 condor_dagman
      533.0@fermicloud119.fnal.gov          dbox            06/10 17:46   0+00:09:31 H   0   0.3 condor_dagman
      539.0@fermicloud119.fnal.gov          dbox            06/10 17:47   0+00:09:18 H   0   0.3 condor_dagman
      540.0@fermicloud119.fnal.gov          dbox            06/10 17:47   0+00:09:20 H   0  41.5 testNoSAM.sh_20160610_174651_1605431_1_1_wrap.sh
      542.0@fermicloud119.fnal.gov          dbox            06/10 17:47   0+00:09:15 H   0  41.5 testNoSAM.sh_20160610_174651_1605431_2_1_wrap.sh
      544.0@fermicloud119.fnal.gov          dbox            06/10 17:47   0+00:09:18 H   0   0.3 condor_dagman
      545.0@fermicloud119.fnal.gov          dbox            06/10 17:47   0+00:09:06 H   0  97.7 testNoSAM.sh_20160610_174651_1605431_3_1_wrap.sh
      593.0@fermicloud119.fnal.gov          dbox            06/10 17:47   0+00:08:17 H   0  46.4 nova_maxConcurrent.sh_20160610_174701_1612564_5_1_wrap.sh
      610.0@fermicloud119.fnal.gov          dbox            06/10 17:48   0+00:07:23 H   0  73.2 nova_maxConcurrent.sh_20160610_174700_1611514_6_1_wrap.sh
      611.0@fermicloud119.fnal.gov          dbox            06/10 17:48   0+00:07:23 H   0  73.2 nova_maxConcurrent.sh_20160610_174700_1611514_7_1_wrap.sh
      613.0@fermicloud119.fnal.gov          dbox            06/10 17:48   0+00:07:22 H   0  34.2 nova_maxConcurrent.sh_20160610_174700_1611514_8_1_wrap.sh
      614.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:20 H   0  73.2 nova_maxConcurrent.sh_20160610_174701_1612564_6_1_wrap.sh
      615.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:20 H   0  73.2 nova_maxConcurrent.sh_20160610_174701_1612564_7_1_wrap.sh
      616.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:19 H   0  73.2 nova_maxConcurrent.sh_20160610_174700_1611514_9_1_wrap.sh
      617.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:17 H   0  97.7 jobF.sh_20160610_174613_1582395_0_1_wrap.sh
      618.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:17 H   0  73.2 jobF.sh_20160610_174606_1579725_0_1_wrap.sh
      619.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:16 H   0  34.2 nova_maxConcurrent.sh_20160610_174701_1612564_8_1_wrap.sh
      620.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:13 H   0  24.4 nova_maxConcurrent.sh_20160610_174700_1611514_10_1_wrap.sh
      621.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:07:11 H   0   0.0 jobG.sh_20160610_174611_1581646_0_1_wrap.sh
      622.0@fermicloud119.fnal.gov          dbox            06/10 17:49   0+00:00:00 H   0   0.0 jobG.sh_20160610_174609_1580731_0_1_wrap.sh
      
      21 jobs; 0 completed, 0 removed, 0 idle, 0 running, 21 held, 0 suspended
      
    • Satisfied that the constraint is correct, he removes them with jobsub_rm. He could have released them with jobsub_release with the same arguments.
      $ jobsub_rm -G nova --constraint '(JobStatus=?=5)&&(Owner=?="dbox")'  --jobsub-server https://fermicloud119.fnal.gov:8443
      removing jobs with constraint=(JobStatus=?=5)&&(Owner=?="dbox")
      21 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied
      Performed REMOVE on 21 jobs matching your request
      $ 
      

jobsub_q, jobsub_rm, jobsub_hold, jobsub_release all have a --jobid flag, but the input to the flag seems to be different for some of these commands. How do they work?

Example: submit 20 jobs

[dbox@fermicloud042 ~]$ jobsub_submit -G nova -N 20 --jobsub-server https://$HOSTNAME:8443 file://sleep.sh
/fife/local/scratch/uploads/nova/dbox/2017-01-17_163340.878854_635

/fife/local/scratch/uploads/nova/dbox/2017-01-17_163340.878854_635/sleep.sh_20170117_163353_1595566_0_1_.cmd

submitting....

Submitting job(s)....................

20 job(s) submitted to cluster 531304.

JobsubJobId of first job: 531304.0@fermicloud042.fnal.gov

Use job id 531304.0@fermicloud042.fnal.gov to retrieve output

Hold and Release a job using a single jobid


[dbox@fermicloud042 ~]$ jobsub_hold -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  531304.1@fermicloud042.fnal.gov
Holding job with jobid=531304.1@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

[dbox@fermicloud042 ~]$ jobsub_release -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  531304.1@fermicloud042.fnal.gov
Releasing job with jobid=531304.1@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Query a single jobid, showing 'over generous' matching (note that asking for 531304.1 also matches 531304.10 through 531304.19)


[dbox@fermicloud042 ~]$ jobsub_q -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  531304.1@fermicloud042.fnal.gov
JOBSUBJOBID                           OWNER           SUBMITTED     RUN_TIME   ST PRI SIZE CMD
531304.1@fermicloud042.fnal.gov       dbox            01/17 16:33   0+00:06:55 I   0  26.9 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.10@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.11@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.12@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.13@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.14@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.15@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.16@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.17@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.18@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 
531304.19@fermicloud042.fnal.gov      dbox            01/17 16:33   0+00:00:00 I   0   0.0 sleep.sh_20170117_163353_1595566_0_1_wrap.sh 

11 jobs; 0 completed, 0 removed, 11 idle, 0 running, 0 held, 0 suspended
[dbox@fermicloud042 ~]$ 

Query, Hold, Release, and Remove all jobs in a cluster using wildcard form of jobid


[dbox@fermicloud042 ~]$ jobsub_q -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  532191.@fermicloud042.fnal.gov
JOBSUBJOBID                           OWNER           SUBMITTED     RUN_TIME   ST PRI SIZE CMD
532191.0@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.1@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.2@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.3@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.4@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.5@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.6@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.7@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.8@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.9@fermicloud042.fnal.gov       dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.10@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.11@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.12@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.13@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.14@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.15@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.16@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.17@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.18@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 
532191.19@fermicloud042.fnal.gov      dbox            01/17 17:08   0+00:00:00 I   0   0.0 sleep.sh_20170117_170859_2691419_0_1_wrap.sh 

20 jobs; 0 completed, 0 removed, 20 idle, 0 running, 0 held, 0 suspended
[dbox@fermicloud042 ~]$ jobsub_hold -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  532191.@fermicloud042.fnal.gov
Holding job with jobid=532191.@fermicloud042.fnal.gov
20 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

[dbox@fermicloud042 ~]$ jobsub_release -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  532191.@fermicloud042.fnal.gov
Releasing job with jobid=532191.@fermicloud042.fnal.gov
20 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

[dbox@fermicloud042 ~]$ jobsub_rm -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  532191.@fermicloud042.fnal.gov
Removing job with jobid=532191.@fermicloud042.fnal.gov
20 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

[dbox@fermicloud042 ~]$ 

Hold, Release, and Remove a comma-separated list of jobs

[dbox@fermicloud042 ~]$ jobsub_hold -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  531304.12@fermicloud042.fnal.gov,531304.13@fermicloud042.fnal.gov,531304.14@fermicloud042.fnal.gov
Holding job with jobid=531304.12@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Holding job with jobid=531304.13@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Holding job with jobid=531304.14@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

[dbox@fermicloud042 ~]$ jobsub_release -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  531304.12@fermicloud042.fnal.gov,531304.13@fermicloud042.fnal.gov,531304.14@fermicloud042.fnal.gov
Releasing job with jobid=531304.12@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Releasing job with jobid=531304.13@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Releasing job with jobid=531304.14@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

[dbox@fermicloud042 ~]$ jobsub_rm -G nova --jobsub-server https://$HOSTNAME:8443 --jobid  531304.12@fermicloud042.fnal.gov,531304.13@fermicloud042.fnal.gov,531304.14@fermicloud042.fnal.gov
Removing job with jobid=531304.12@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Removing job with jobid=531304.13@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

Removing job with jobid=531304.14@fermicloud042.fnal.gov
1 Succeeded, 0 Failed, 0 Not Found, 0 Bad Status, 0 Already Done, 0 Permission Denied

jobsub_submit --help --jobsub_server (some server) returns a helpfile from the default server. Why?

  • There are a couple of features of jobsub_submit and jobsub_submit_dag that can be confusing:
  1. If --jobsub-server (not --jobsub_server) is omitted, the default https://fifebatch.fnal.gov:8443 is used.
  2. Any options the client does not recognize are passed along to the server in case it understands them. Here --jobsub_server was passed to the (default) server as an option and ignored; the server saw the --help option and responded by outputting the help text from the old jobsub UPS product.
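For illustration (the server URL is a placeholder):

$ jobsub_submit --help --jobsub_server https://someserver.fnal.gov:8443    # --jobsub_server is unrecognized and forwarded; help comes from the default server
$ jobsub_submit --jobsub-server https://someserver.fnal.gov:8443 --help    # --jobsub-server (with a dash) selects the server explicitly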

What is the difference between jobsub_submit -f /path/to/file and jobsub_submit -f dropbox:///path/to/file ?

  • Jobsub servers have a 'dropbox' storage area used to stage user or data files from an area visible to the client (i.e. local or NFS mounted) for transfer to the worker node.
    • Using jobsub_submit -f dropbox:// or --tar_file_name dropbox:// transfers the file from the client to the server's dropbox using ssh at submit time, then to the worker node at job execution time using whatever method condor is configured to use. The dropbox:// specification works for locally mounted files and for pnfs and BlueArc volumes that are visible to the client via an NFS mount. The file is checksummed prior to transfer to the server; if it already exists on the server, that first transfer is skipped, but the second transfer to the worker node still takes place.
    • Using jobsub_submit -f /path/to/file (no dropbox://) transfers the file from the original directory to the worker node using ifdh at job execution time.
  • SUMMARY:
    • Use -f /path/to/file if /path/to/file is visible on the worker node
      • examples: /pnfs/path/to/file , /grid/fermiapp/path/to/file, /cvmfs/path/to/file
      • Do not use -f /path/to/file if the file is on a laptop or desktop that does not run any service ifdh can use to transfer the file from the source directory to the worker node.
      • Using -f dropbox:// for a pnfs or BlueArc location incurs an unnecessary second file transfer and should generally be avoided.
    • Use -f dropbox://path/to/file if /path/to/file is not visible on the worker node, or if you have some workflow where you want to submit /path/to/file with a job, then change it, then submit it again with a different job.
      • examples: your home directory on your laptop, /tmp
      • Using -f dropbox:// for a path that is not visible via ifdh on a worker node is REQUIRED
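As an illustration of the two forms (all paths and the experiment name are placeholders):

# File already visible on the worker node (pnfs, BlueArc, cvmfs): plain -f; ifdh copies it at job execution time.
$ jobsub_submit -G nova -f /pnfs/nova/scratch/users/dbox/params.txt file:///path/to/myjob.sh

# File only visible on the submit host (home area, /tmp): dropbox://; the file is copied to the server's dropbox at submit time.
$ jobsub_submit -G nova -f dropbox:///home/dbox/params.txt file:///path/to/myjob.sh

In either case the file appears on the worker node in the directory pointed to by $CONDOR_DIR_INPUT.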

Give an example of using the -f and -d flags for jobsub_submit.

jobsub_submit -G ${GROUP} -f dropbox://baz.txt \
     -f ${PNFS_DIR}/foo.txt \
     -f ${PNFS_DIR}/bar.txt \
     -d A ${PNFS_DIR}/A \
     -d B ${PNFS_DIR}/B \
     -d C ${PNFS_DIR}/C \
     -d D ${PNFS_DIR}/D \
     --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC $JOBSUB_SERVER \
     --debug \
     file://transfer_test.sh A B C D
  • This submission example does the following:
    1. Prior to user job execution, the -f options copy 'baz.txt' from the local file system and 'foo.txt' and 'bar.txt' from the specified pnfs directory into a directory in the grid worker node's environment, accessible via the bash environment variable $CONDOR_DIR_INPUT. The -d options create 4 directories in the worker node's environment, accessible via $CONDOR_DIR_A, $CONDOR_DIR_B, $CONDOR_DIR_C, and $CONDOR_DIR_D.
    2. Executes the user job. In this example the user job is 'transfer_test.sh' with arguments 'A B C D'. The assumption here is that the user job will put data into $CONDOR_DIR_A and the other directories (a sketch of such a script follows this list).
    3. After the user job completes, creates the ${PNFS_DIR}/A through ${PNFS_DIR}/D directories if they do not already exist, and copies the contents of $CONDOR_DIR_A to ${PNFS_DIR}/A, of $CONDOR_DIR_B to ${PNFS_DIR}/B, and so on.
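The contents of transfer_test.sh are not shown in this example; a minimal, hypothetical sketch of a script that fits the pattern above might look like this:

#!/bin/bash
# Hypothetical transfer_test.sh, called as: transfer_test.sh A B C D
# Files delivered with -f are in $CONDOR_DIR_INPUT; anything written to
# $CONDOR_DIR_A .. $CONDOR_DIR_D is copied back to the matching -d destination after the job ends.
for tag in "$@"; do
    outdir_var="CONDOR_DIR_${tag}"
    outdir="${!outdir_var}"                          # e.g. $CONDOR_DIR_A when tag is A
    for f in "$CONDOR_DIR_INPUT"/*; do
        cp "$f" "${outdir}/$(basename "$f").${tag}"  # stand-in for real work
    done
done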

Give an example of using the --tar_file_name dropbox:// and tardir:// flags for jobsub_submit

For a more detailed discussion, please see Tardir_and_dropbox_URIs.

If you use the dropbox:// URI with the --tar_file_name flag, jobsub_client will use IFDH to transfer the tarball specified in the URI to the jobsub dropbox. This dropbox location is set per experiment and is usually within your experiment's dCache scratch or resilient area. To find out where the dropbox is, or to request changes to it, please open a ServiceNow ticket to the "Batch Job Management (jobsub condorsubmit) - Standard" service under Scientific Computing Services --> Distributed Computing.

When your job starts, jobsub will transfer the tarball into the working directory of the job and untar it there.

So in this example, we have a pre-existing tarball at /nashome/s/sbhat/testdir.tar. By specifying --tar_file_name=dropbox:///nashome/s/sbhat/testdir.tar, jobsub_client will transfer testdir.tar to the jobsub dropbox, and then from there to the user job. In the user job, testdir.tar will automatically be unwound in the directory the job starts in, without the need for the user to unwind that tarball. testdir.tar will also be accessible via the environment variable $INPUT_TAR_FILE in the user job.

-bash-4.1$ jobsub_submit -G nova  --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis  --tar_file_name=dropbox:///nashome/s/sbhat/testdir.tar file:///grid/fermiapp/common/tools/probe
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_151430.968258_4980

/fife/local/scratch/uploads/nova/sbhat/2018-02-15_151430.968258_4980/probe_20180215_151431_872601_0_1_.cmd

submitting....

Submitting job(s).

1 job(s) submitted to cluster 3817624.

JobsubJobId of first job: 3817624.0@jobsub01.fnal.gov

Use job id 3817624.0@jobsub01.fnal.gov to retrieve output

When you use the tardir:// URI with the --tar_file_name flag, jobsub_client will create a bzipped tar file of the directory you specify in the URI and place it in the current directory. It will then perform the same actions on that tarball as if you had used the dropbox:// URI.

In the following example, we have a directory /nashome/s/sbhat/testdir that we want to tar up and send to the job. Using --tar_file_name=tardir:///nashome/s/sbhat/testdir tells jobsub_client to create a tarball from the directory testdir (which will be called testdir.tar), and send that to the job (via the jobsub dropbox). Once the job starts up, testdir.tar will be transferred from the jobsub dropbox and unwound automatically into the directory the user job starts in. Like before, the tarball testdir.tar will be accessible via the environment variable $INPUT_TAR_FILE in the user job.

-bash-4.1$ jobsub_submit -G nova  --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis  --tar_file_name=tardir:///nashome/s/sbhat/testdir file:///grid/fermiapp/common/tools/probe
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_150742.299034_9584

/fife/local/scratch/uploads/nova/sbhat/2018-02-15_150742.299034_9584/probe_20180215_150742_2466467_0_1_.cmd

submitting....

Submitting job(s).

1 job(s) submitted to cluster 4122376.

JobsubJobId of first job: 4122376.0@jobsub02.fnal.gov

Use job id 4122376.0@jobsub02.fnal.gov to retrieve output
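Whichever URI is used, the job script can rely on the tarball having been unpacked into its starting directory. A minimal, hypothetical sketch of such a job script (its contents are illustrative and not taken from the examples above):

#!/bin/bash
# The tarball contents were unpacked automatically into the job's starting directory.
echo "tarball delivered as: $INPUT_TAR_FILE"
ls -l .                      # shows the unpacked contents of testdir.tar
# run something that was packed inside the tarball (hypothetical file name):
# ./run_analysis.sh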

I want my jobs to start quickly and finish with a minimum of fuss. What should I do to make this happen?

How can I monitor my jobs' progress?

  • jobsub_q and jobsub_history give basic information
  • jobsub_q --better-analyze may help in understanding why a job won't start
  • See Kevin Retzke's Fifemon Tutorial for lots of useful information
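For example, to ask the scheduler why a particular job has not started (the jobid is a placeholder; --jobid and --better-analyze are combined here on the assumption that jobsub_q accepts them together):

$ jobsub_q -G nova --jobid 531304.0@fermicloud042.fnal.gov --better-analyze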