Submitting jobs via Jobsub Client

This document describes how to submit jobs to both FermiGrid and the Open Science Grid using the Jobsub Client. The older jobsub_tools product has been retired and is no longer supported for submission. Note: you may want to read how to get started on the General Physics Computing Facility (GPCF), or the general information about GPCF, before proceeding.

Getting Help

For any operational help, or if the services are down, please open a Service Desk ticket.

If you find bugs or have questions about using jobsub, please send an email to jobsub-support@fnal.gov.

Jobsub Tools

The jobsub_tools package for job submission is no longer supported. Users should transition to the Jobsub Client-Server model described below. Users will be unable to submit jobs using the jobsub_tools package after March 1, 2015.

Jobsub Client-Server

Jobsub tools had several limitations when managing production at a large scale. Because several FIFE experiments share the submission infrastructure, memory on the submission machine became a limiting factor as the number of experiments, and therefore the number of production jobs, increased; overall bandwidth was also limited by having only a single server. Deploying several submission hosts can circumvent the bottleneck, but it leaves the user responsible for identifying the submission machine with the least load and the fewest running jobs. The jobsub client-server architecture addresses these drawbacks by hiding the submission infrastructure behind the Jobsub server. Users interact with the system through the jobsub_client commands, which are similar to the tools provided by jobsub_tools; a typical session is sketched below.
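
In practice, all interaction with the system goes through a handful of jobsub_client commands. The short session below is only a sketch based on a typical jobsub_client installation: the group name (nova), script name, job id, and UPS setup line are placeholders, and depending on your jobsub_client version the user script may need to be passed as a file:// URI.

setup jobsub_client                        # assumes jobsub_client is installed as a UPS product
jobsub_submit -G nova file://my_job.sh     # submit my_job.sh under the nova group
jobsub_q -G nova --user $USER              # list your jobs known to the server
jobsub_fetchlog -G nova --jobid <jobid>    # retrieve the output/log tarball for a finished job
jobsub_rm -G nova --jobid <jobid>          # remove a job from the queue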

Using Jobsub Client

For a detailed list of commands and their use, refer to the Jobsub project wiki: https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client

General command line help

[dbox@novagpvm01 client]$ jobsub_submit -h
Usage: jobsub_submit [Client Options] [Server Options] user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and
                        accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --debug             Print debug messages to stdout
    -h, --help          Show this help message and exit

questions or comments may be sent to jobsub-support@fnal.gov

Experiment specific command line help

jobsub_submit --group nova --help   (substitute nova with your experiment)
Usage: jobsub_submit [Client Options] [Server Options] user_script [user_script_args]

Provide --group and --jobsub-server to see full help

Options:
  --version             show program's version number and exit

  Client Options:
    -G <Group/Experiment/Subgroup>, --group=<Group/Experiment/Subgroup>
                        Group/Experiment/Subgroup for priorities and
                        accounting
    --role=<VOMS Role>  VOMS Role for priorities and accounting
    --dag               submit and run a dagNabbit input file
    --jobsub-server=<JobSub Server>
                        Alternate location of JobSub server to use
    --dropbox-server=<Dropbox Server>
                        Alternate location of Dropbox server to use
    --debug             Print debug messages to stdout
    -h, --help          Show this help message and exit

questions or comments may be sent to jobsub-support@fnal.gov

  Server Options:
    --version             show program's version number and exit
    -h, --help            show this help message and exit

   Generic Options:
      --maxConcurrent=MAXCONCURRENT
                          max number of jobs running concurrently at given time.
                          Use in conjunction with -N option to protect a shared
                          resource.  Example: jobsub -N 1000 -maxConcurrent 20
                          will only run 20 jobs at a time until all 1000 have
                          completed.  This is implemented by running the jobs in
                          a DAG
      --disk=DISK         request worker nodes have at least this many MB of
                          disk space
      --memory=MEMORY     request worker nodes have at least this many MB of
                          memory
      --cpu=CPU           request worker nodes have at least this many cpus
      --drain             mark this job to be allowed to be drained or killed
                          during downtimes
      --OS=OS             specify OS version of worker node. Example --OS=SL5
                          Comma separated list '--OS=SL4,SL5,SL6' works as well.
                          Default is any available OS
      --generate-email-summary
                          generate and mail a summary report of
                          completed/failed/removed jobs in a DAG
      --email-to=NOTIFY_USER
                          email address to send job reports/summaries to
                          (default is $USER@fnal.gov)
      -G ACCOUNTINGGROUP, --group=ACCOUNTINGGROUP
                          Group/Experiment/Subgroup for priorities and
                          accounting
      -v, --verbose       dump internal state of program (useful for debugging)
      --resource-provides=RESOURCE_PROVIDES
                          request specific resources by changing condor jdf
                          file.  For example: --resource-provides=CVMFS=OSG will
                          add +CVMFS="OSG" to the job classad attributes and
                          '&&(CVMFS=="OSG")' to the job requirements
      -M, --mail_always   send mail when job completes or fails
      -q, --mail_on_error
                          send mail only when job fails due to error (default)
      -Q, --mail_never    never send mail (default is to send mail on error)
      -T, --test_queue    Submit as a test job.  Job will run with highest
                          possible priority, but you can only have one such
                          job in the queue at a time.
      -g, --grid          run job on the FNAL GP  grid. Other flags can modify
                          target sites to include other areas of the Open
                          Science Grid
      --nowrapfile        DISABLED: formerly was 'do not generate shell wrapper
                          for fermigrid operations. (default is to generate a
                          wrapfile)' This flag now does nothing. The wrapfiles
                          work off site and protect file systems from user error
      -c APPEND_REQUIREMENTS, --append_condor_requirements=APPEND_REQUIREMENTS
                          append condor requirements
      --overwrite_condor_requirements=OVERWRITEREQUIREMENTS
                          overwrite default condor requirements with supplied
                          requirements
      --override=OVERRIDE
                          override some other value: --override 'requirements'
                          'gack==TRUE' would produce the same condor command
                          file as --overwrite_condor_requirements 'gack==TRUE'
                          if you want to use this option, test it first with -n
                          to see what you get as output
      -C                  execute on grid from directory you are currently in
      -e ADDED_ENVIRONMENT, --environment=ADDED_ENVIRONMENT
                          -e ADDED_ENVIRONMENT exports this variable and its
                          local value to worker node environment. For example
                          export FOO="BAR"; jobsub -e FOO <more stuff>
                          guarantees that the value of $FOO on the worker node
                          is "BAR" .  Can use this option as many times as
                          desired
      --submit_host=SUBMIT_HOST
                          submit to different host
      --site=SITE         submit jobs to this site
      -n, --no_submit     generate condor_command file but do not submit
      --opportunistic     submit opportunistically to Fermigrid GP Grid and CDF
                          Grid.  This option will allow you to potentially get
                          more slots than your Fermigrid quota, but these slots
                          are subject to preemption
      -N QUEUECOUNT       submit N copies of this job. Each job will
                          have access to the environment variable
                          $PROCESS that provides the job number (0 to
                          <num>-1), equivalent to the part after the decimal
                          point in the job ID (the '2' in 134567.2).
      -x X509_USER_PROXY, --X509_USER_PROXY=X509_USER_PROXY
                          location of X509_USER_PROXY (expert mode)

  File Options:
      -l LINES, --lines=LINES
                          [Expert option]  Add the line <line> to the
                          Condor submission (.cmd) file.  See Condor
                          help for more.
      -L JOBLOGFILE, --log_file=JOBLOGFILE
                          Log file to hold log output from job.
      --no_log_buffer     write log file directly to disk. Default is to copy it
                          back after job is completed.  This option is useful
                          for debugging but can be VERY DANGEROUS as joblogfile
                          typically is sent to bluearc.  Using this option
                          incorrectly can cause all grid submission systems at
                          FNAL to become overwhelmed resulting in angry admins
                          hunting you down, so USE SPARINGLY.
      --use_gftp          use grid-ftp to transfer file back
      --tar_file_name=TAR_FILE_NAME
                          name of tarball to transfer to worker node. Will be
                          added to the transfer_input_files list, and visible to
                          the user job as $INPUT_TAR_FILE.  Does not work on
                          submit host gpsn01, use the -f option to transfer a
                          tar file to gpsn01
      -f INPUT_DIR_ARRAY  -f <file>  input file <file> will be copied to
                          directory $CONDOR_DIR_INPUT on the execution node.
                          Example: -f /grid/data/minerva/my/input/file.xxx
                          will be copied to $CONDOR_DIR_INPUT/file.xxx
                          Specify as many -f file1 -f file2 args as you need.
      -d OUTPUT_DIR_ARRAY
                          -d<tag> <dir>  Writable directory $CONDOR_DIR_<tag>
                          will exist on the execution node.  After job
                          completion, its contents will be moved to <dir>
                          automatically.  Specify as many <tag>/<dir> pairs
                          as you need.

    SAM Options:
      --dataset_definition=DATASET_DEFINITION
                          SAM dataset definition used in a Directed Acyclic
                          Graph (DAG)
      --project_name=PROJECT_NAME
                          optional project name for SAM DAG

    Nova Specific Options:
      --SMU               steer jobs to HPC.SMU grid site
      -i RELDIR           release_directory for Nova Software
      -t TESTRELDIR       release_directory for test Nova Software
      -r REL              release_version for  Nova Software

        NOTES
        You can have as many instances of -c, -d, -e, -f, -l and -y as you need.

        The -d directory mapping works on non-Grid nodes, too.

        export IFDH_VERSION=some_version then use -e IFDH_VERSION to use 
          the (some_version) release of ifdh for copying files in and out 
          with -d and -f flags instead of the current ifdh version in KITS

        More documentation is available at 
        https://cdcvs.fnal.gov/redmine/projects/ifront/wiki/UsingJobSub
        and on this machine at 
        $JOBSUB_TOOLS_DIR/docs/
          address questions to the mailing list jobsub-support@fnal.gov
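
Putting the options together

The server options above can be combined in a single submission. The example below is only a sketch: the group, script name, input file, output directory, and tarball name are placeholders, and depending on your jobsub_client version the user script may need to be given as a file:// URI.

# export a variable locally so -e can forward it to the worker node
export MYENV="some_value"

jobsub_submit -G nova \
    -N 1000 --maxConcurrent=20 \
    --memory=2000 --disk=10000 --OS=SL6 \
    --resource-provides=CVMFS=OSG \
    -e MYENV \
    -f /grid/data/nova/my/input/file.xxx \
    -d OUTPUT /nova/data/users/$USER/job_output \
    --tar_file_name=my_code.tar.gz \
    file://my_job.sh arg1 arg2

Each of the 1000 jobs (at most 20 running at a time) sees its own $PROCESS value, finds file.xxx in $CONDOR_DIR_INPUT and the tarball via $INPUT_TAR_FILE, has $MYENV set to "some_value", and anything it writes to $CONDOR_DIR_OUTPUT is copied back to the -d directory after the job completes.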