Support #23104

Double-check monitoring attributes from the Frontend activity log

Added by Lorena Lobato Pardavila 3 months ago.

Status: New
Priority: Normal
Category: -
Target version: -
Start date: 08/12/2019
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

During a discussion with Marco about blackhole detection tests with 5 jobs, we observed the following in the Frontend activity log:

[2019-08-12 17:44:17,776] INFO: Iteration at Mon Aug 12 17:44:17 2019
[2019-08-12 17:44:17,778] INFO: Querying schedd, entry, and glidein status using child processes.
[2019-08-12 17:44:18,188] INFO: All children terminated
[2019-08-12 17:44:18,193] INFO: Jobs found total 5 idle 5 (good 5, old(10min 5, 60min 0),  grid 0, voms 0) running 0
[2019-08-12 17:44:18,197] INFO: Group glideins found total 0 limit 100000 curb 90000; of these idle 0 limit 1000 curb 200 running 0
[2019-08-12 17:44:18,198] INFO: Frontend glideins found total 0 limit 100000 curb 90000; of these idle 0 limit 1000 curb 200
[2019-08-12 17:44:18,198] INFO: Overall slots found total 0 limit 100000 curb 90000; of these idle 0 limit 1000 curb 200
[2019-08-12 17:44:18,199] INFO: Updating usermap
[2019-08-12 17:44:18,200] INFO: Match
[2019-08-12 17:44:18,214] INFO: Active forks = 3, Forks to finish = 9
[2019-08-12 17:44:18,249] INFO: Active forks = 3, Forks to finish = 6
[2019-08-12 17:44:18,279] INFO: Active forks = 3, Forks to finish = 3
[2019-08-12 17:44:18,306] INFO: Active forks = 0, Forks to finish = 0
[2019-08-12 17:44:18,307] INFO: All children terminated - took 0.106522798538 seconds
[2019-08-12 17:44:18,307] INFO: Total matching idle 5 (old 10min 5 60min 0) running 0 limit 10000
[2019-08-12 17:44:18,311] INFO:             Jobs in schedd queues                 |           Slots         |       Cores       | Glidein Req | Factory/Entry Information
[2019-08-12 17:44:18,312] INFO: Idle (match  eff   old  uniq )  Run ( here  max ) | Total  Idle   Run  Fail | Total  Idle   Run | Idle MaxRun | State Factory
[2019-08-12 17:44:18,315] INFO:     3(    5     3     3     0)     0(    0 10000) |     0     0     0     0 |     0     0     0 |     2     4 | Up   ITB_FC_CE2@gfactory_instance@gfactory_service@fermicloud018.fnal.gov
[2019-08-12 17:44:18,320] INFO:     3(    5     3     3     0)     0(    0 10000) |     0     0     0     0 |     0     0     0 |     2     4 | Up   ITB_FC_HTC_SIN_CE2@gfactory_instance@gfactory_service@fermicloud018.fnal.gov
[2019-08-12 17:44:18,322] INFO:             Jobs in schedd queues                 |           Slots         |       Cores       | Glidein Req | Factory/Entry Information
[2019-08-12 17:44:18,326] INFO: Idle (match  eff   old  uniq )  Run ( here  max ) | Total  Idle   Run  Fail | Total  Idle   Run | Idle MaxRun | State Factory
[2019-08-12 17:44:18,327] INFO:     6(   10     6     6     0)     0(    0 20000) |     0     0     0     0 |     0     0     0 |     4     8 | Up   Sum of useful factories
[2019-08-12 17:44:18,328] INFO:     0(    0     0     0     0)     0(    0     0) |     0     0     0     0 |     0     0     0 |     0     0 | Down Sum of down factories
[2019-08-12 17:44:18,328] INFO:     0(    0     0     0     0)     0(    0     0) |     0     0     0     0 |     0     0     0 |     0     0 | Down Unmatched

With 5 jobs and two entries, each entry reports, for example, 3 idle jobs (5 matched). This is confusing once you look at the totals: the "Sum of useful factories" row shows 6 idle and 10 matched, although only 5 jobs are in the queue.
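To make the arithmetic concrete, here is a minimal Python sketch that re-adds the two per-entry rows from the log above, column by column, and compares the result with the "Sum of useful factories" row. The column names in COLUMNS are my own labels for the table headers, not identifiers from the glideinWMS code. The final check is only a guess that happens to fit these numbers (5 jobs split across 2 entries and rounded up gives 3); the log does not confirm that this is how the Frontend computes the value.

```python
import math
import re

# The two per-entry rows from the log excerpt, with the timestamp and the
# entry name trimmed so that only the table numbers remain.
ROWS = [
    "3(    5     3     3     0)     0(    0 10000) |     0     0     0     0 |     0     0     0 |     2     4 | Up",
    "3(    5     3     3     0)     0(    0 10000) |     0     0     0     0 |     0     0     0 |     2     4 | Up",
]

# Hypothetical labels for the 17 numeric columns, following the header order.
COLUMNS = [
    "idle", "match", "eff", "old", "uniq", "run", "here", "max",
    "slots_total", "slots_idle", "slots_run", "slots_fail",
    "cores_total", "cores_idle", "cores_run",
    "req_idle", "req_maxrun",
]


def parse_row(row):
    """Return a dict mapping each column label to the integer in that column."""
    numbers = [int(n) for n in re.findall(r"\d+", row)]
    return dict(zip(COLUMNS, numbers))


parsed = [parse_row(row) for row in ROWS]

# Column-wise sum over the entries, as the summary row appears to do.
totals = {col: sum(entry[col] for entry in parsed) for col in COLUMNS}

print("idle per entry:", [entry["idle"] for entry in parsed])
print("summed idle:", totals["idle"], "summed match:", totals["match"])

# Assumption, not confirmed by the log: the per-entry idle value could be the
# 5 queued jobs divided across the 2 matching entries and rounded up.
jobs_in_queue = 5
matching_entries = len(parsed)
guessed_idle_per_entry = math.ceil(jobs_in_queue / matching_entries)
print("guessed idle per entry:", guessed_idle_per_entry)
```

If the summary row is a plain sum over entries, then jobs that match both entries are counted once per entry, which would explain why the totals (6 idle, 10 matched) exceed the 5 jobs actually queued.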

TO DO: Double-check the Frontend monitoring attributes and how their values are distributed between the two entries. It would also be good to add a description of these attributes to the glideinWMS documentation for newcomers, since I cannot find any reference to eff, old, and uniq: what they are for and how they are used.


