Necessary Maintenance #11783
jobsub_submit wiki documentation needs updating
The information on the job time request parameters for jobsub_submit now seems to be out of date, given
the recent announcement about job length parameters, and it is not entirely clear how to specify times.
The documentation is here:
and the confusing/out-of-date bits are these:
kill user job if still running after NUMBER[UNITS] of
time. UNITS may be `s' for seconds (the default), `m'
for minutes, `h' for hours or `d' for days.
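Read literally, the help text above describes a NUMBER[UNITS] value. A minimal Python sketch of how such a value could map to seconds is shown here as an illustration only. The function name timeout_to_seconds is hypothetical, and this is not jobsub's own parser.

```python
import re

# Multipliers for the unit letters named in the help text.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeout_to_seconds(value: str) -> int:
    """Convert NUMBER[UNITS] (e.g. '90', '30m', '2h', '1d') to seconds.

    Hypothetical helper for illustration; jobsub may parse differently.
    """
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError(f"not a NUMBER[UNITS] value: {value!r}")
    number, unit = match.groups()
    # No unit letter means seconds, the documented default.
    return int(number) * UNIT_SECONDS[unit or "s"]
```

Under this reading, a value of 2h corresponds to 7200 seconds and a bare 90 to 90 seconds.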
Expected lifetime of the job. Used to match against
resources advertising that they have
REMAINING_LIFETIME seconds left. The shorter your
EXPECTED_LIFETIME is, the more resources (aka slots,
cpus) your job can potentially match against and the
quicker it should start. If your job runs longer than
EXPECTED_LIFETIME it may be killed by the batch
system. If your specified EXPECTED_LIFETIME is too
long your job may take a long time to match against a
resource with a sufficiently long REMAINING_LIFETIME.
Valid inputs for this parameter are 'short', 'medium',
'long', or an integer which represents
EXPECTED_LIFETIME in seconds. The values for
'short', 'medium', and 'long' are configurable by Grid
Operations; they currently are '6 hours', '12 hours',
and '24 hours' but this may change in the future.
Default value of EXPECTED_LIFETIME is currently 24
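To make the accepted inputs concrete, here is a minimal Python sketch of how a preset name or an integer number of seconds could be resolved. The names WIKI_PRESETS and expected_lifetime_seconds are hypothetical. The preset values are the ones quoted in the wiki text above, which are configurable and may no longer match the server, so treat them as placeholders.

```python
# Preset values as quoted in the wiki text above; they are configurable
# by Grid Operations and may be out of date.
WIKI_PRESETS = {"short": 6 * 3600, "medium": 12 * 3600, "long": 24 * 3600}


def expected_lifetime_seconds(value, presets=WIKI_PRESETS):
    """Resolve 'short'/'medium'/'long' or an integer number of seconds.

    Hypothetical helper for illustration; not part of jobsub.
    """
    if value in presets:
        return presets[value]
    seconds = int(value)  # raises ValueError for anything else
    if seconds <= 0:
        raise ValueError("expected lifetime must be positive")
    return seconds
```

Passing a different presets mapping shows how the same request changes meaning when the server-side values change, which is the source of the confusion in this ticket.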
Several questions arise:
1) The --timeout block says a job is killed if it runs over its time, while the announcement says such jobs are held.
2) The default expected_lifetime is now 8 hours and not 24.
3) How long is "long" now? The recent e-mail from the Service Desk does not say: it mentions 48 hours, but gives only the short and medium times. The wiki above says a long job is given 24 hours.
Recent e-mail from the Service Desk:
WHAT ARE WE DOING?
We are making some changes to the enforcement of job run time requests and to the expected job run time parameters in jobsub.
1. We will begin enforcing run time limits on jobs submitted to fifebatch as follows:
• Jobs that request an expected run time of 48 hours or less will be held when their run time reaches 48 hours.
• Jobs that request longer than 48 hours will be held once they reach their run time request.
• PLEASE NOTE: This does NOT guarantee that your job will be allowed to run for 48 hours; the job slot may expire much earlier than 48 hours or a job could be evicted for another reason, such as preemption.
2. The "short" and "medium" jobsub lifetime presets will be changed from six and 12 hours to three and eight hours, respectively.
3. The jobsub default lifetime will be changed from 24 hours to eight hours, which is the same as the "medium" lifetime preset. Current measurements show that approximately 97% of all submitted jobs complete within eight hours.
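The enforcement rule in item 1 amounts to holding a job at the larger of its requested run time and 48 hours. A minimal Python sketch of that reading follows. The names HOLD_FLOOR_SECONDS and hold_after_seconds are hypothetical, and as the note in item 1 says, a job may still be evicted earlier for other reasons.

```python
# 48-hour floor taken from the announcement text.
HOLD_FLOOR_SECONDS = 48 * 3600


def hold_after_seconds(requested_seconds: int) -> int:
    """Run time after which a job would be held, per the announcement.

    Requests of 48 hours or less are held at 48 hours; longer requests
    are held at the requested run time. Illustration only.
    """
    return max(requested_seconds, HOLD_FLOOR_SECONDS)
```

For example, a job using the new eight-hour default would be held at 48 hours, while a job requesting 72 hours would be held at 72 hours.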
WHEN WILL THIS OCCUR?
Thursday, Feb. 25; noon Central Time
WHAT IS THE IMPACT TO YOU?
• Jobs that do not exceed their run time requests will not be affected.
• When submitting jobs using one of the jobsub lifetime presets, please be aware that the "short" and "medium" presets have changed to three and eight hours, respectively.
• If you submit jobs to fifebatch without specifying an expected job lifetime, your jobs may be matched to slots that have as few as eight hours of run time available.
WHAT DO YOU NEED TO DO?
Make sure your run time requests accurately reflect how long you expect the jobs to take.
• Users who legitimately need to run jobs for longer than 48 hours must specify an appropriate expected job lifetime using the expected-lifetime option in jobsub_submit.
• Users who need to run jobs for longer than eight hours, the new default job lifetime, must specify an appropriate expected job lifetime using the expected-lifetime option in jobsub_submit.
• Users who expect their jobs to be significantly shorter than eight hours should specify an appropriate expected job lifetime so resources can be used effectively.
#3 Updated by Dennis Box about 5 years ago
- Status changed from New to Resolved
fixed in rc3 just deployed to fifebatch-dev
- the --timeout option does in fact kill your job using the TIMEOUT shell command
- the short, medium, long, and default values for --expected-lifetime are now read from the jobsub server configuration file and shown in the output of the --help command.