Bug #20861

Error message related to entry in the Factory logs

Added by Lorena Lobato Pardavila almost 2 years ago. Updated 3 months ago.

Status: Work in progress
Priority: High
Category: -
Target version:
Start date: 09/17/2018
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

During RC1 for v3.4.1, an "Error writing stats for entry" message was observed in the Factory logs (in this example, /var/log/gwms-factory/server/entry_ITB_FC_CE2/ITB_FC_CE2.err.log):

[2018-09-17 13:55:39,751] DEBUG: glideFactoryEntry:1037: Checking security credentials for client fermicloud364-fnal-gov_OSG_gWMSFrontend.main
[2018-09-17 13:55:40,021] WARNING: glideFactoryEntryGroup:533: Error writing stats for entry 'ITB_FC_CE2'
[2018-09-17 13:55:40,021] ERROR: glideFactoryEntryGroup:534: Error writing stats for entry 'ITB_FC_CE2':
Traceback (most recent call last):
  File "/usr/sbin/glideFactoryEntryGroup.py", line 530, in iterate
    entry.writeStats()
  File "/usr/lib/python2.7/site-packages/glideinwms/factory/glideFactoryEntry.py", line 716, in writeStats
    self.gflFactoryConfig.log_stats.write_job_info(scheddName=self.scheddName, collectorName=self.gfiFactoryConfig.factory_collector)
  File "/usr/lib/python2.7/site-packages/glideinwms/factory/glideFactoryMonitoring.py", line 1246, in write_job_info
    'condor_duration': jobstats['condor_duration'],
KeyError: 'condor_duration'
[2018-09-17 13:56:39,648] DEBUG: glideFactoryEntry:1037: Checking security credentials for client fermicloud364-fnal-gov_OSG_gWMSFrontend.main

It does not seem to affect Factory operation, but it must be investigated for v3.5.

History

#1 Updated by Marco Mambelli over 1 year ago

  • Assignee set to Lorena Lobato Pardavila

#2 Updated by Lorena Lobato Pardavila over 1 year ago

  • Status changed from New to Feedback
  • Assignee changed from Lorena Lobato Pardavila to Marco Mascheroni

Python raises a KeyError when a dict is indexed with a key that is not in the dictionary (in this case, condor_duration and stats in the dictionary of all jobs that have entered the "Completed" state). Replaced lookups of the form a = adict[key] with the dictionary's get() method, which returns None (or a supplied default) instead of raising when the key is missing.
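
A minimal sketch of the difference between the two lookup styles. The jobstats dictionary here is hypothetical and only mimics a job record that lacks condor_duration; the real structure in glideFactoryMonitoring.py may differ.

```python
# Hypothetical stats for one completed job; 'condor_duration' is missing,
# as it was for the entry in the traceback above.
jobstats = {"duration": 120}

# Indexing with a missing key raises KeyError.
try:
    value = jobstats["condor_duration"]
except KeyError as err:
    missing_key = err.args[0]

# get() returns None (or the supplied default) instead of raising.
value_none = jobstats.get("condor_duration")
value_default = jobstats.get("condor_duration", "unknown")

print(missing_key, value_none, value_default)
```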

Changes in v34/20861.

#3 Updated by Marco Mascheroni about 1 year ago

  • Target version changed from v3_5 to v3_5_1

#4 Updated by Marco Mascheroni 12 months ago

  • Target version changed from v3_5_1 to v3_6_1

#5 Updated by Marco Mascheroni 10 months ago

  • Target version changed from v3_6_1 to v3_6_2

#6 Updated by Marco Mambelli 8 months ago

  • Assignee changed from Marco Mascheroni to Dennis Box

#7 Updated by Dennis Box 7 months ago

  • Assignee changed from Dennis Box to Marco Mambelli

#8 Updated by Marco Mambelli 7 months ago

  • Priority changed from Normal to High
  • Assignee changed from Marco Mambelli to Marco Mascheroni
  • Status changed from Feedback to Work in progress

The branch seems to be v35/20861 (not v34/20861)

I see some problems in the code, some of them also raised by Dennis. I'm reassigning this to Marco Mascheroni, who worked on it with Lorena.
Marco, if you are not aware of the ticket, this has to become a new ticket assigned to someone. It requires rework, more than a review.

Questions about the current changes

  • factory/glideFactoryLogParser.py
  • An assignment became a get? (lines 445:447)
    if condor_duration is not None:
        out['condor_duration']=condor_duration
        out['stats']=slot_stats
    -->
    if condor_duration is not None:
        out.get('condor_duration', condor_duration)
        out.get('stats', slot_stats)
  • Then check whether out['stats'] should always be assigned, or whether it makes sense to assign it only if condor_duration is not None (as it is now).
  • factory/glideFactoryMonitoring.py
    enle['condor_duration'] and enle_stats['condor_duration'] have been replaced with get().
    - Should condor_duration instead always be assigned in glideFactoryLogParser (whenever condor started)?
    - Will the None value cause problems? Should a default be used instead (I see below 'activation_claims': jobstats.get('activations_claims', 'unknown'), ...)?
    - Should other variables also use get()? Why (in both the yes and no cases)?
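
To illustrate the first question: dict.get() only reads, so replacing an assignment with get() leaves the dictionary unchanged. A minimal sketch, with made-up values standing in for whatever glideFactoryLogParser.py actually computes:

```python
# Made-up values; in the real parser these come from the job log.
condor_duration = 42
slot_stats = {"Total": {"jobsnr": 1}}

# Original form: the keys are stored in the dictionary.
out_assigned = {}
if condor_duration is not None:
    out_assigned["condor_duration"] = condor_duration
    out_assigned["stats"] = slot_stats

# Changed form: get() returns the default and the result is discarded,
# so nothing is ever stored.
out_get = {}
if condor_duration is not None:
    out_get.get("condor_duration", condor_duration)
    out_get.get("stats", slot_stats)

# A reader that must tolerate a missing key can supply a default,
# in the style of the 'unknown' default quoted above.
duration_for_report = out_get.get("condor_duration", "unknown")

print(out_assigned)
print(out_get)
print(duration_for_report)
```

Under these assumptions, the assignment would stay in the parser and get() with a default would be used only where the value is read. Whether None or a string default is appropriate depends on how the monitoring code consumes the value.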

Marco Mambelli

#9 Updated by Marco Mascheroni 7 months ago

  • Target version changed from v3_6_2 to v3_6_3

#10 Updated by Marco Mascheroni 3 months ago

  • Target version changed from v3_6_3 to v3_6_4

