Feature #17825

Adding cores counting statistics to Factory monitoring

Added by Marco Mambelli about 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
High
Category:
-
Target version:
Start date:
10/04/2017
Due date:
% Done:

0%

Estimated time:
Stakeholders:

Factory Ops

Duration:

Description

This is a continuation of [#14559].
Jeff Dost sent a request suggesting the counters that would be more useful to Factory operators:

As an update,

I re-ran tests and checked the client ads the frontend publishes to the factory collector, and as Parag said, in fact the core counts are already sent over, and look correct.

Please see the attached ads. I did 3 runs again: partitionable, fixed, and single core.

UCSDSleep_gw2_part_fact_ad was a re-run of a partitionable pilot to which I sent one 4-core user job and two single-core user jobs. Based on this, I think the factory monitoring is mapping the values in the following way:
GlideinMonitorGlideinsTotal -> Registered at collector
GlideinMonitorGlideinsRunning -> Claimed
GlideinMonitorGlideinsIdle -> Unmatched

I think that instead the monitoring should show the equivalent Core versions:
GlideinMonitorGlideinsTotalCores
GlideinMonitorGlideinsRunningCores
GlideinMonitorGlideinsIdleCores

And that should be enough. If this change is made across the board, single-core pilots and fixed-slot pilots will still show the correct values, since in those cases the Cores and non-Cores ads happen to be identical.

If this is easy enough, do you think it can make it into 3.2.20 along with the rundiff update I suggested in the previous email?
Thanks,
Jeff
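
To illustrate the difference between the glidein counters and their Cores equivalents described in the email above, here is a minimal Python sketch. It is not the GlideinWMS implementation: the `Slot` record, the function names, and the 8-core pilot size are assumptions made for illustration only (the email does not state how many cores the pilot had). The job layout (one 4-core job and two single-core jobs) follows the email.

```python
from typing import Dict, List, TypedDict


class Slot(TypedDict):
    """Hypothetical slot record; not an actual GlideinWMS or HTCondor structure."""
    cores: int     # cores assigned to this slot
    claimed: bool  # True if a user job is running in the slot


def count_glideins(slots: List[Slot]) -> Dict[str, int]:
    """Count slots one by one, ignoring how many cores each slot holds."""
    total = len(slots)
    running = sum(1 for s in slots if s["claimed"])
    return {"Total": total, "Running": running, "Idle": total - running}


def count_cores(slots: List[Slot]) -> Dict[str, int]:
    """Count cores instead of slots, weighting each slot by its core count."""
    total = sum(s["cores"] for s in slots)
    running = sum(s["cores"] for s in slots if s["claimed"])
    return {"TotalCores": total, "RunningCores": running, "IdleCores": total - running}


# Assumed 8-core partitionable pilot running one 4-core job and two
# single-core jobs; the remaining 2 cores are left unclaimed.
partitionable: List[Slot] = [
    {"cores": 4, "claimed": True},
    {"cores": 1, "claimed": True},
    {"cores": 1, "claimed": True},
    {"cores": 2, "claimed": False},
]

# Two single-core pilots, one busy and one idle.
single_core: List[Slot] = [
    {"cores": 1, "claimed": True},
    {"cores": 1, "claimed": False},
]

glidein_counts = count_glideins(partitionable)
core_counts = count_cores(partitionable)
```

Under these assumptions the per-slot view of the partitionable pilot reports 3 running and 1 idle, while the per-core view reports 6 running and 2 idle cores. For the single-core pilots the two views give the same numbers, which is why switching the monitoring to the Cores counters would not change what is shown for them.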

History

#1 Updated by Marco Mambelli about 2 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Marco Mambelli to Dennis Box

Changes are in v3/17825
./creation/web_base/factoryEntryStatusNow.html has not been changed; it is waiting for feedback from Factory Ops and will be handled in a separate ticket.

#2 Updated by Dennis Box about 2 years ago

  • Assignee changed from Dennis Box to Marco Mambelli

OK to merge. I want to test some more, but that can be done with the release candidate.

#3 Updated by Marco Mambelli about 2 years ago

  • Status changed from Feedback to Resolved

#4 Updated by Marco Mambelli almost 2 years ago

  • Status changed from Resolved to Closed

#5 Updated by Parag Mhashilkar almost 2 years ago

  • Stakeholders updated (diff)

#6 Updated by Marco Mambelli over 1 year ago

  • Stakeholders updated (diff)

