tardir and dropbox URIs
jobsub_client provides a few ways to automatically transfer files and directories from locally-mounted storage to your jobs. These can be used in a number of different ways depending on the desired outcome.
This article covers the tardir:// and dropbox:// URIs. Both instruct the jobsub client to upload the files or directories (as described below) to a directory specified in the jobsub configuration file (usually in pnfs resilient or scratch space), and from there to the job when it starts up. jobsub cleans up these files after 30 days if they are no longer in use (for more details, see Jobsub Dropbox Cleanup Tool), so users do not have to remove them after their jobs have finished.
One reason these features were developed was to let users easily and transparently take advantage of the dCache resilient pools, which feature 20x file replication. This high replication factor makes them ideal for user code that many jobs need to access. If you want to set up dCache resilient space and have jobsub automatically upload files there, please open a ServiceNow ticket to Distributed Computing Support.
The --tar_file_name option, used with the tardir:// or dropbox:// URIs, transfers a directory (by tarring it up) or an existing tar file, respectively, at submit time to an area specified in the jobsub server configuration. When the job starts running, the tarfile or the tarred-up directory is transferred to the job and unpacked. The tarfile can be accessed through the environment variable $INPUT_TAR_FILE, or found in the initial working directory of the job (available as $_CONDOR_JOB_IWD). The unpacked contents of the tarball are also located under $_CONDOR_JOB_IWD. Here are a few examples of using each:
Using --tar_file_name dropbox://<file>
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis --tar_file_name=dropbox:///nashome/s/sbhat/testdir.tar file:///grid/fermiapp/common/tools/probe
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_151430.968258_4980
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_151430.968258_4980/probe_20180215_151431_872601_0_1_.cmd
submitting....
Submitting job(s).
1 job(s) submitted to cluster 3817624.
JobsubJobId of first job: firstname.lastname@example.org
Use job id email@example.com to retrieve output
In this case, testdir.tar will be uploaded from /nashome/s/sbhat/testdir.tar to the job and unpacked there. If I want to access testdir.tar in my job, it's at $INPUT_TAR_FILE. If I want to access the contents that were unpacked by jobsub, I can either look at the directory $INPUT_TAR_FILE is in, or directly access it by looking through $_CONDOR_JOB_IWD.
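Inside the job, the two locations described above can be checked with a few lines of shell. The following is a minimal sketch, not part of jobsub itself: the function name show_tar_inputs is hypothetical, and it assumes only that $INPUT_TAR_FILE and $_CONDOR_JOB_IWD are set in the job environment as described. The fallback to the current directory exists only so the sketch can be tried outside a job.

```shell
#!/bin/bash
# Hypothetical helper for a job script: report what --tar_file_name delivered.
# Assumes INPUT_TAR_FILE and _CONDOR_JOB_IWD are set by the job environment.
show_tar_inputs() {
    # Fall back to the current directory when run outside a job (assumption).
    local iwd="${_CONDOR_JOB_IWD:-$PWD}"
    local tarball="${INPUT_TAR_FILE:-}"

    if [ -z "$tarball" ] || [ ! -f "$tarball" ]; then
        echo "no tarball found via INPUT_TAR_FILE" >&2
        return 1
    fi

    # List the members of the transferred tarball without unpacking it again.
    echo "tarball: $tarball"
    tar -tf "$tarball"

    # List the initial working directory, where the unpacked contents live.
    echo "initial working directory: $iwd"
    ls -1 "$iwd"
}
```

Calling show_tar_inputs near the top of a job script is one way to confirm that the tarball arrived before the real work starts.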
Here's an example using --tar_file_name tardir://<file>:
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis --tar_file_name=tardir:///nashome/s/sbhat/testdir file:///grid/fermiapp/common/tools/probe
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_150742.299034_9584
/fife/local/scratch/uploads/nova/sbhat/2018-02-15_150742.299034_9584/probe_20180215_150742_2466467_0_1_.cmd
submitting....
Submitting job(s).
1 job(s) submitted to cluster 4122376.
JobsubJobId of first job: firstname.lastname@example.org
Use job id email@example.com to retrieve output
In this example, jobsub_client will tar up /nashome/s/sbhat/testdir and create testdir.tar in my working directory. It will then transfer that tarball to the job. The behavior after that in the job is the same as dropbox:// : the contents of testdir.tar will be unpacked in $_CONDOR_JOB_IWD, and testdir.tar will itself be available in my job as $INPUT_TAR_FILE.
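Since the choice between the two URIs depends only on whether the local path is a directory or an existing file, a submit-side wrapper could pick the scheme automatically. The sketch below is an illustration only: to_jobsub_uri is a hypothetical helper, not something jobsub_client provides, and it does nothing beyond building the URI string from an absolute path.

```shell
#!/bin/bash
# Hypothetical helper (not part of jobsub_client): build the URI for a local path.
# Directories map to tardir://, regular files to dropbox://, as described above.
to_jobsub_uri() {
    local path="$1"
    local abs

    if [ -d "$path" ]; then
        # Resolve the directory to an absolute path.
        abs="$(cd "$path" && pwd)" || return 1
        echo "tardir://$abs"
    elif [ -f "$path" ]; then
        # Resolve the parent directory, then re-attach the file name.
        abs="$(cd "$(dirname "$path")" && pwd)/$(basename "$path")" || return 1
        echo "dropbox://$abs"
    else
        echo "not a file or directory: $path" >&2
        return 1
    fi
}
```

It could then be used as --tar_file_name="$(to_jobsub_uri /nashome/s/sbhat/testdir)", which for a directory at that path expands to the same tardir:// URI as in the example above.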
If I wanted to exclude certain files from being transferred to my job, the command to use would be:
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis --tar_file_name=tardir:///nashome/s/sbhat/testdir --tarball-exclusion-file /path/to/exclude_these.txt file:///grid/fermiapp/common/tools/probe
where /path/to/exclude_these.txt follows the syntax described here.
There are some cases when it makes sense to use the -f option instead of --tar_file_name:
- You don't want your tarfile unpacked automatically
- You want to transfer multiple tarballs into your job
- You want to keep your transferred-in files separate from your initial working directory
In these cases, using the -f option with the dropbox:// or tardir:// URIs will transfer your files to the job, but into a separate directory that can be accessed using the environment variable $CONDOR_DIR_INPUT. As mentioned above, any tarballs transferred in will not be untarred automatically in this case.
Below are a few examples of how to do each:
Simple -f dropbox:// or tardir://
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis -f dropbox:///nashome/s/sbhat/myfile file:///grid/fermiapp/common/tools/probe
In this example, myfile will be accessible in the job as $CONDOR_DIR_INPUT/myfile. If myfile were a tarball, it would not be unpacked: the job must do that manually.
-bash-4.1$ jobsub_submit -G nova --resource-provides=usage_model=OPPORTUNISTIC --role=Analysis -f tardir:///nashome/s/sbhat/testdir file:///grid/fermiapp/common/tools/probe
In this example, testdir.tar will be created in the working directory from /nashome/s/sbhat/testdir, and testdir.tar will be accessible in the job as $CONDOR_DIR_INPUT/testdir.tar. Again, this tarball will not be unpacked: the job must do that manually.
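Because tarballs transferred with -f are not unpacked, the job script has to do it. A minimal sketch of that step follows; the function name unpack_input_tar and its optional destination argument are hypothetical, and the only thing taken from jobsub is that the file is found under $CONDOR_DIR_INPUT.

```shell
#!/bin/bash
# Hypothetical helper for a job script: unpack a tarball delivered with -f.
# Usage: unpack_input_tar <tarball name> [destination directory]
unpack_input_tar() {
    local name="$1"
    local dest="${2:-$PWD}"   # default: unpack into the current directory

    if [ -z "${CONDOR_DIR_INPUT:-}" ]; then
        echo "CONDOR_DIR_INPUT is not set" >&2
        return 1
    fi

    local src="$CONDOR_DIR_INPUT/$name"
    if [ ! -f "$src" ]; then
        echo "missing input tarball: $src" >&2
        return 1
    fi

    # Create the destination if needed, then extract into it.
    mkdir -p "$dest" && tar -xf "$src" -C "$dest"
}
```

For the tardir:// example above, the job would call unpack_input_tar testdir.tar, optionally followed by a directory to keep the unpacked files apart from the working directory.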
The -f option is most useful when a job needs several inputs. You can transfer multiple files or directories by repeating the -f flag:
jobsub_submit -G uboone -f tardir:///home/sbhat/test_dir -f dropbox:///home/sbhat/myfile -f dropbox:///home/sbhat/test_tar_2.tar file:///home/sbhat/test_untar.sh
In this case, a new tarfile test_dir.tar will be created from /home/sbhat/test_dir and uploaded, along with /home/sbhat/myfile and /home/sbhat/test_tar_2.tar. All three files (test_dir.tar, myfile, and test_tar_2.tar) will be available in the job under $CONDOR_DIR_INPUT.
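When a job depends on several transferred files, it can help to confirm that all of them are present before doing any work. The following sketch shows one way to do that; require_inputs is a hypothetical function name, and the check relies only on the files being placed under $CONDOR_DIR_INPUT as described.

```shell
#!/bin/bash
# Hypothetical helper for a job script: verify that every expected -f input arrived.
# Usage: require_inputs <name> [<name> ...]
require_inputs() {
    local missing=0
    local name

    if [ -z "${CONDOR_DIR_INPUT:-}" ]; then
        echo "CONDOR_DIR_INPUT is not set" >&2
        return 1
    fi

    # Report every missing file rather than stopping at the first one.
    for name in "$@"; do
        if [ ! -e "$CONDOR_DIR_INPUT/$name" ]; then
            echo "missing input: $CONDOR_DIR_INPUT/$name" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

For the submission above, the job script could start with require_inputs test_dir.tar myfile test_tar_2.tar || exit 1.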