Project

General

Profile

Bug #2333

Annoying debug messages in the factory logs when entries inactive

Added by Igor Sfiligoi almost 9 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Factory
Target version:
Start date:
01/23/2012
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

I just noticed that the factory log directory has huge debug logs.

Looking into them, I see messages like these:
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[HCCHTPC_US_Clemson-Palmetto][total][periods][7200][ExitedHeld]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[NEBIO_US_Harvard_SBGrid_Abitibi][total][periods][7200][ExitedHeld]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[CMS_T2_DE_DESY_cr][total][periods][7200][EnteredRunning]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[CMS_T2_PL_Warsaw_ce3][total][periods][7200][EnteredRunning]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[CMS_T2_UK_SGrid_RALPP_hep206_grid][total][periods][7200][EnteredRunning]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[CMS_T2_UK_SGrid_RALPP_hep207_Grid1000][total][periods][7200][EnteredRunning]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[CMS_T2_UK_SGrid_RALPP_hep207_Grid2000][total][periods][7200][EnteredRunning]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[HCCHTPC_US_Clemson-Palmetto][total][periods][7200][EnteredRunning]
[2011-12-20T05:30:55-07:00 8497] aggregate_data, KeyError stats[NEBIO_US_Harvard_SBGrid_Abitibi][total][periods][7200][EnteredRunning]

My best guess is that it is due to inactivity of certain entries (which do not report a period if not used), and that the factory is just not intelligent enough to handle that silently.
I think this is just creating huge debug logs, and is harmless else... but someone should verify that.

I see this both at UCSD and CERN (we still use 2_5_2_1).


Subtasks

Bug #2415: Reduce the number of debug log messages when an entry becomes inactiveClosedKrista Larson

History

#1 Updated by John Weigand almost 9 years ago

Igor,

From what I am seeing in the code, these messages are something you considered some time ago. Note the comments on each. From factory/glideFactoryMonitorAggregator.py ...

if client == 'total':
try:
aggregate_output[client][res][data_set] += float(stats[entry][client]['periods'][res][data_set])
except KeyError: # well, some may be just missing.. can happen
glideFactoryLib.log_files.logDebug("aggregate_data, KeyError stats[%s][%s][%s][%s][%s]"%(entry,client,'periods',res,data_set))
else:
if stats[entry]['frontends'].has_key(client): # not all the entries have all the frontends
try:
aggregate_output[client][res][data_set] += float(stats[entry]['frontends'][client]['periods'][res][data_set])
except KeyError: # well, some may be just missing.. can happen
glideFactoryLib.log_files.logDebug("aggregate_data, KeyError stats[%s][....." )

Your thoughts on the risk of not generating this message?

There are some others that are just as annoying.

John Weigand

#2 Updated by Igor Sfiligoi almost 9 years ago

The problem is the quantity...
if it happened a couple of times a day, I would not complain.

But yesterday alone I got 1.8M of them on the UCSD factory!!!
(and there is nothing else in the debug log)

So this is not anymore "it sometimes happen".

As said in my original post, my gut feeling is they are not real problems.
But given how often it happens, I would like someone to actually think it through, and fix it in the appropriate way.
(which may be just not printing out anything, if it is indeed just a nuisance)

#3 Updated by Krista Larson almost 9 years ago

  • Assignee set to Krista Larson

#4 Updated by Krista Larson almost 9 years ago

Does someone know under what conditions these messages show up? I know I've seen these before but I'm having a hard time triggering them now.

#5 Updated by Igor Sfiligoi almost 9 years ago

I think they are due to the frontend(s) only asking glideins for a fraction of the supported entries.

#6 Updated by Krista Larson almost 9 years ago

Created child task to reduce the number of messages.

I'm having trouble reproducing the error to debug if this is a real issue or not. This will at least help the log size while not eliminating a possible useful message.

#7 Updated by Parag Mhashilkar over 8 years ago

  • Target version changed from v2_5_5 to v2_5_6

#8 Updated by Parag Mhashilkar over 8 years ago

  • Target version changed from v2_5_6 to v2_5_7

#9 Updated by Burt Holzman over 8 years ago

  • Target version changed from v2_5_7 to v2_7_x

#10 Updated by Burt Holzman over 8 years ago

  • Status changed from New to Closed

We seem to have a partial fix for this, and haven't been able to really reproduce it since. We can open a new ticket if the behavior recurs.

#11 Updated by Parag Mhashilkar over 8 years ago

  • Target version changed from v2_7_x to v2_6

Also available in: Atom PDF