Bug #3648

Error should be handled more gracefully

Added by Joe Boyd over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version:
Start date: 04/02/2013
Due date:
% Done: 0%
Estimated time:
Duration:

Description

jobsub should fail more gracefully when it is run on an unsupported machine. Instead of saying it "will probably not work", we should just bail out and perhaps print a message saying where it should be run.
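
As a rough illustration of the "bail with a message" behaviour requested here, a setup script could test the host name before doing anything else. This is only a sketch: the function name `jobsub_check_host` is hypothetical, and the `gpvm*` pattern is assumed from the machine names mentioned later in this ticket. The real jobsub_tools setup logic is not shown in this issue and may differ.

```shell
# Hypothetical sketch: refuse to continue on hosts that are not known
# submission nodes. The function name and the gpvm* pattern are assumptions.
jobsub_check_host() {
    # Use the first argument if given, otherwise the short host name.
    host="${1:-$(hostname -s)}"
    case "$host" in
        gpvm*)
            return 0
            ;;
        *)
            echo "jobsub_tools: $host is not a supported submission host;" >&2
            echo "please log in to one of the gpvm* machines instead." >&2
            return 1
            ;;
    esac
}

# In a sourced setup script one might then write:
#   jobsub_check_host || return 1
```

Because setup scripts are sourced, the sketch uses `return` rather than `exit`, so that a failed check does not close the user's login shell.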


Caller: Margaret Votava
Assignment Group: REX-GRID-Support
Assigned To: Joe Boyd
Summary:

hi,

I logged into gpsn01 and tried to set up jobsub, but got the following warning:

gpsn01> source /grid/fermiapp/products/common/etc/setups.sh
gpsn01> setup jobsub_tools
tried to set up with gid=gpcf. this product will probably not work
dirname: missing operand
Try `dirname --help' for more information.
dirname: missing operand
Try `dirname --help' for more information.

Looks like we need an additional check in the setup script.

Margaret

History

#1 Updated by Dennis Box over 6 years ago

  • Target version set to v1.2

In defense of the existing code and error messages, 'probably will not work' is accurate.
Submission will indeed work from any machine as long as X509_USER_PROXY, CONDOR_TMP,
and CONDOR_EXEC are set correctly. If they are not set, submission will not work, and it's probable
that someone who ignores the login message on gpsn01 (which says not to use it, but to use
one of the gpvm* machines instead) has not looked into the situation enough to know
how to set these :)
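
The condition described above (submission works only if the three variables are set) could be checked explicitly. This is a minimal sketch, not code from jobsub_tools: the helper name `jobsub_check_env` and the wording of its message are hypothetical, and it only tests that the variables are non-empty, not that they are set correctly.

```shell
# Hypothetical helper: report which of the required variables are unset or
# empty. It does not verify that the values themselves are valid.
jobsub_check_env() {
    missing=""
    for var in X509_USER_PROXY CONDOR_TMP CONDOR_EXEC; do
        # Indirectly read the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "jobsub: submission will not work; unset:$missing" >&2
        return 1
    fi
    return 0
}
```

A caller could run `jobsub_check_env || return 1` from the setup script, which turns "will probably not work" into a definite statement about what is missing.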

I believe Margaret was trying to look at the online help, which does work.

[dbox@gpsn01 ~]$ . /grid/fermiapp/products/common/etc/setups.sh
[dbox@gpsn01 ~]$ setup jobsub_tools
tried to set up with gid=gpcf. this product will probably not work
dirname: missing operand
Try `dirname --help' for more information.
dirname: missing operand
Try `dirname --help' for more information.
[dbox@gpsn01 ~]$ jobsub -h
usage: jobsub [options] your_script [your_script_args]
submit your_script to local batch or to the OSG grid

options:
--version show program's version number and exit
-h, --help show this help message and exit

Generic Options:
-G ACCOUNTINGGROUP, --group=ACCOUNTINGGROUP
Group/Experiment/Subgroup for priorities and
accounting
-v, --verbose dump internal state of program (useful for debugging)
-M, --mail_always send mail when job completes or fails
-q, --mail_on_error
send mail only when job fails due to error (default)
-Q, --mail_never never send mail (default is to send mail on error)
-T, --test_queue Submit as a test job. Job will run with highest
possible priority, but you can only have one such
job in the queue at a time.
-g, --grid run job on the FNAL GP grid. Other flags can modify
target sites to include other areas of the Open
Science Grid
--nowrapfile do not generate shell wrapper for fermigrid
operations. (default is to generate a wrapfile)
-c REQUIREMENTS, --condor_requirements=REQUIREMENTS
append condor requirements
-C execute on grid from directory you are currently in
-e ADDED_ENVIRONMENT, --environment=ADDED_ENVIRONMENT
-e ADDED_ENVIRONMENT exports this variable and its
local value to worker node environment. For example
export FOO="BAR"; jobsub -e FOO <more stuff>
guarantees that the value of $FOO on the worker node
is "BAR" . Can use this option as many times as
desired
--submit_host=SUBMIT_HOST
submit to different host
-p use parrot to run on afs (only makes sense with -a
flag)
--pOff turn parrot off explicitly (this is the default)
-n, --no_submit generate condor_command file but do not submit
--opportunistic submit opportunistically to Fermigrid GP Grid and CDF
Grid. This option will allow you to potentially get
more slots than your Fermigrid quota, but these slots
are subject to preemption
-N QUEUECOUNT submit N copies of this job. Each job will
have access to the environment variable
$PROCESS that provides the job number (0 to
<num>-1), equivalent to the decimal point in
the job ID (the '2' in 134567.2).
-x X509USERPROXY, --X509_USER_PROXY=X509USERPROXY
location of X509_USER_PROXY (expert mode)
File Options:
-l LINES, --lines=LINES
[Expert option] Add the line <line> to the
Condor submission (.cmd) file. See Condor
help for more.
-a, --needafs run on afs using parrot (this is discouraged)
-L JOBLOGFILE, --log_file=JOBLOGFILE
Log file to hold log output from job.
--no_log_buffer write log file directly to disk. Default is to copy it
back after job is completed. This option is useful
for debugging but can be VERY DANGEROUS as joblogfile
typically is sent to bluearc. Using this option
incorrectly can cause all grid submission systems at
FNAL to become overwhelmed resulting in angry admins
hunting you down, so USE SPARINGLY.
--use_gftp use grid-ftp to transfer file back
-f INPUT_DIR_ARRAY -f <file> input file <file> will be copied to
directory
$CONDOR_DIR_INPUT on the execution node.
Example :-f /grid/data/minerva/my/input/file.xxx
will be copied to $CONDOR_DIR_INPUT/file.xxx
Specify as many -f file1 -f file2 args as you need.
-d OUTPUT_DIR_ARRAY
-d<tag> <dir> Writable directory $CONDOR_DIR_<tag>
will
exist on the execution node. After job completion,
its contents will be moved to <dir> automatically
Specify as many <tag>/<dir> pairs as you need.
SAM Options:
--dataset_definition=DATASET_DEFINITION
SAM dataset definition used in a Directed Acyclic
Graph (DAG)
--project_name=PROJECT_NAME
optional project name for SAM DAG
NOTES
You can have as many instances of -c, -d, -e, -f, -l and -y as you need.
The -d directory mapping works on non-Grid nodes, too.
export IFDH_VERSION=some_version then use -e IFDH_VERSION to use 
the (some_version) release of ifdh for copying files in and out
with -d and -f flags instead of the current ifdh version in KITS
More documentation is available at 
https://cdcvs.fnal.gov/redmine/projects/ifront/wiki/UsingJobSub
and on this machine at
$JOBSUB_TOOLS_DIR/docs/

0
[dbox@gpsn01 ~]$

Anyway, I have fixed this in jobsub_tools v1_2_rc2.
The new error message (I can't get rid of the dirname easily):

[dbox@gpsn01 ~]$ setup jobsub_tools
tried to set up with gid=gpcf. this product will probably not work
for more information:
https://cdcvs.fnal.gov/redmine/projects/ifront/wiki/UsingJobSub#The-probably-will-not-work-error-how-to-make-it-work
dirname: missing operand
Try `dirname --help' for more information.
dirname: missing operand
Try `dirname --help' for more information.
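
The `dirname: missing operand` lines are what `dirname` prints when it is called with no argument, which typically happens when an unquoted variable expands to nothing. Which variable is empty in the jobsub_tools setup is not shown in this ticket, so the following is only a sketch of one way to guard such a call; the wrapper name `safe_dirname` is hypothetical.

```shell
# Hypothetical wrapper: call dirname only when the argument is non-empty,
# so an empty variable produces a quiet failure instead of a usage error.
safe_dirname() {
    if [ -n "${1:-}" ]; then
        dirname "$1"
    else
        return 1
    fi
}

# Example use with a possibly empty variable (SOME_PATH is a placeholder):
#   dir=$(safe_dirname "$SOME_PATH") || dir=""
```

Quoting the argument matters here: with `dirname $SOME_PATH` an empty value disappears entirely and `dirname` sees no operand, whereas the wrapper receives an empty string and can test for it.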

#2 Updated by Dennis Box over 6 years ago

  • Status changed from New to Closed

