Feature #2962

Add held job count to factoryStatusNow page

Added by Burt Holzman about 7 years ago. Updated over 6 years ago.

Status: New
Priority: Normal
Assignee: Parag Mhashilkar
Category: -
Target version:
Start date: 09/14/2012
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Request from Jeff Dost:

Hello glideinWMS,

A few weeks back it came to our attention that our factories were not filling up queues fast enough on some of the really large sites (specifically for ATLAS). This was easily solved by tweaking our submit limits in the factory (cluster_size, max_per_cycle, sleep).

However, we didn't notice until the site complained. The question is how we can quickly detect this in our daily routine. One way to spot it is on our factoryStatusNow page:

http://glidein-1.t2.ucsd.edu:8319/osg_gfactory/factoryStatusNow.html

If we click troubleshoot and then click the Idle Diff column so it sorts low to high, the sites we aren't submitting to fast enough will likely show a large negative "Idle Diff". However, this isn't the only reason for a large negative "Idle Diff". It can also happen if a site is experiencing problems and a large number of glideins are going Held.

I propose a simple addition that lets us quickly differentiate "negative idle diff because we aren't submitting fast enough" from "negative idle diff due to held jobs".

Simply add the "Held" column to the factoryStatusNow troubleshoot view. That way, if we sort by large negative Idle Diff, we can focus on the sites that have few or no "Held" glideins as the likely candidates for sites we aren't submitting to fast enough.

Thanks,
Jeff Dost
OSG Glidein Factory Operations
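
The triage described in the request can be sketched roughly as follows. This is only an illustration of the sorting and filtering logic, not code from glideinWMS: the `EntryStatus` record, the `triage` function, and both thresholds are hypothetical names and values chosen for the example, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class EntryStatus:
    """One row of a status table (hypothetical structure for this sketch)."""
    name: str
    idle_diff: int  # value shown in the "Idle Diff" column
    held: int       # value the proposed "Held" column would show


def triage(entries, diff_threshold=-50, held_threshold=5):
    """Split entries with a large negative Idle Diff into two groups.

    Both thresholds are assumed values for illustration only.
    Returns (slow_submit, held_problem), each ordered from the most
    negative Idle Diff upwards.
    """
    slow_submit, held_problem = [], []
    # Sort low to high, as when clicking the Idle Diff column header.
    for entry in sorted(entries, key=lambda e: e.idle_diff):
        if entry.idle_diff > diff_threshold:
            continue  # Idle Diff is not negative enough to investigate
        if entry.held <= held_threshold:
            # Few or no held glideins: submission is probably too slow.
            slow_submit.append(entry.name)
        else:
            # Many held glideins: the site itself likely has problems.
            held_problem.append(entry.name)
    return slow_submit, held_problem


# Made-up sample data.
sample = [
    EntryStatus("site_a", idle_diff=-200, held=0),
    EntryStatus("site_b", idle_diff=-150, held=120),
    EntryStatus("site_c", idle_diff=-10, held=0),
    EntryStatus("site_d", idle_diff=-300, held=2),
]
slow, held = triage(sample)
print("possibly under-submitted:", slow)
print("possibly held-job problems:", held)
```

With a "Held" column on the page, an operator would do the same thing by eye: sort by Idle Diff and skip the rows where Held is large.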

History

#1 Updated by Burt Holzman almost 7 years ago

  • Assignee changed from Douglas Strain to Parag Mhashilkar

#2 Updated by Parag Mhashilkar over 6 years ago

  • Target version changed from v2_7_x to v3_x

