Feature #12534

jobsub connections should query RecentDaemonCoreDutyCycle

Added by Joe Boyd over 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: High
Assignee:
Category: -
Target version:
Start date: 05/05/2016
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

The schedd occasionally gets very busy, and a large job submission is one of the things that can push it there. We'd like jobsub to try to throttle submissions a bit.

We're seeing users submit jobs in loops. We currently limit a single cluster to 10k procs, but when users submit jobs in a loop the schedd can still get overloaded.

jobsub could query the schedd's RecentDaemonCoreDutyCycle attribute when it looks over the schedds to decide which one to submit a job to. If the schedd it is about to submit to has a high duty cycle, the jobsub server should pause for a while to give it time to recover. If a user is submitting jobs in a loop, especially with a large number of procs, this should slow the submission down and give the schedd a chance to keep up.
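A minimal sketch of the selection step described above, assuming each schedd's RecentDaemonCoreDutyCycle is available as a fraction between 0.0 and 1.0. The helper names `duty_cycles_from_ads` and `pick_schedd` are hypothetical and not part of jobsub. The ads could come from something like `htcondor.Collector().query(htcondor.AdTypes.Schedd, projection=["Name", "RecentDaemonCoreDutyCycle"])` in the HTCondor Python bindings, but the sketch only assumes dict-like ads. The schedd name `fifebatch2` is made up for illustration.

```python
from typing import Dict, Iterable, Mapping, Optional


def duty_cycles_from_ads(ads: Iterable[Mapping]) -> Dict[str, float]:
    """Hypothetical helper: map schedd Name -> RecentDaemonCoreDutyCycle.

    Ads that do not report the attribute are skipped rather than treated
    as idle, so an unknown schedd is never chosen by default.
    """
    return {
        ad["Name"]: float(ad["RecentDaemonCoreDutyCycle"])
        for ad in ads
        if "RecentDaemonCoreDutyCycle" in ad
    }


def pick_schedd(duty_cycles: Dict[str, float], threshold: float = 0.85) -> Optional[str]:
    """Hypothetical helper: return the least-busy schedd below threshold.

    Assumes duty cycles are fractions in [0.0, 1.0]. Returns None when every
    schedd is at or above the threshold, signalling the caller to pause.
    """
    eligible = {name: dc for name, dc in duty_cycles.items() if dc < threshold}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


if __name__ == "__main__":
    # Illustrative readings only; not real measurements.
    print(pick_schedd({"fifebatch1": 0.98, "fifebatch2": 0.40}))
```

Preferring the least-busy eligible schedd, rather than the first one under the threshold, spreads load when several schedds are available.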

History

#1 Updated by Dennis Box over 3 years ago

  • Target version set to v1.2.4

#2 Updated by Dennis Box over 3 years ago

  • Target version changed from v1.2.4 to v1.2.3

#3 Updated by Dennis Box over 3 years ago

Hi Joe,
I am assuming the real answer to this is 'make it configurable', but do you have a feel for how high a RecentDaemonCoreDutyCycle should trigger a throttle, and how long a pause should happen?

Cheers
Dennis

#4 Updated by Joe Boyd over 3 years ago

When we run into problems, it's usually hitting 100% busy. If there is a schedd that isn't busy, I think you can just submit there. Otherwise, maybe wait 60 seconds and try again, printing a message for the user.

Looking at the recent schedd numbers, we haven't been seeing this as much lately.

https://fifemon-pp.fnal.gov/dashboard/db/schedd-statistics?panelId=15&fullscreen&from=now-30d&to=now&var-schedd=fifebatch1

Maybe if the schedd is over 85% busy, don't submit more jobs to it until it comes down? 90% might also be acceptable.
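The wait-and-retry behaviour suggested above (submit to a schedd under the threshold if there is one, otherwise wait 60 seconds, tell the user, and try again) could be sketched as follows. This is a hypothetical illustration, not jobsub's actual implementation. It assumes duty cycles are fractions in [0.0, 1.0], uses the 85% threshold from the comment, and adds a `max_attempts` cap that the discussion does not specify. `query`, `sleep`, and `notify` are injected so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_schedd(
    query: Callable[[], Dict[str, float]],
    threshold: float = 0.85,
    pause_seconds: float = 60.0,
    max_attempts: int = 10,  # hypothetical cap; not specified in the discussion
    sleep: Callable[[float], None] = time.sleep,
    notify: Callable[[str], None] = print,
) -> Optional[str]:
    """Return a schedd below threshold, pausing between polls; None if none frees up."""
    for attempt in range(1, max_attempts + 1):
        duty_cycles = query()  # schedd name -> RecentDaemonCoreDutyCycle
        not_busy = {name: dc for name, dc in duty_cycles.items() if dc < threshold}
        if not_busy:
            # Prefer the least-busy schedd that is under the threshold.
            return min(not_busy, key=not_busy.get)
        if attempt < max_attempts:
            notify(
                f"All schedds are at or above {threshold:.0%} busy; "
                f"waiting {pause_seconds:.0f}s before retrying "
                f"(attempt {attempt} of {max_attempts})."
            )
            sleep(pause_seconds)
    return None
```

Injecting `sleep` and `notify` keeps the throttle policy separate from how the jobsub server actually waits and reports to the user. Returning None lets the caller decide whether to fail the submission or keep waiting.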

#5 Updated by Tanya Levshina about 3 years ago

Dennis,

It would be great to have this in the next version. Please see INC000000786287.

#6 Updated by Parag Mhashilkar about 3 years ago

  • Priority changed from Normal to High

#7 Updated by Dennis Box almost 3 years ago

  • Status changed from New to Resolved

#8 Updated by Dennis Box almost 3 years ago

  • Status changed from Resolved to Closed
