Bug #11487
AWS Probe crashing while many VMs running
Start date: 01/25/2016
Due date:
% Done: 0%
Estimated time:
Description
11:11:43 CST Gratia: processing instance i-6dfa2caa
11:11:43 CST Gratia: Creating a Record 2016-01-25T17:11:43Z
11:11:43 CST Gratia: Creating a UsageRecord 2016-01-25T17:11:43Z
11:11:43 CST Gratia: the tags are
11:11:43 CST Gratia: Name: glidein_startup.sh
11:11:43 CST Gratia: getting instance data
11:11:43 CST Gratia: getting spot price
11:11:44 CST Gratia: ERROR: Error getting data for instance i-6dfa2caa from ec2
11:11:44 CST Gratia: 'NoneType' object has no attribute 'get'
11:11:44 CST Error in Gratia probe: 'NoneType' object has no attribute 'get'
Appears to be an uncaught exception when the AWS spot price query fails.
History
#1 Updated by Kevin Retzke about 5 years ago
As a quick fix, and to help with debugging, I modified the probe on fermicloud370 to catch exceptions within the instance loop and print a stack trace. Now only the records for failing instances are skipped, rather than the entire run terminating.
[root@fermicloud370 ~]# diff -uw aws-gratia-probe /usr/share/gratia/awsvm/
--- aws-gratia-probe 2016-01-06 13:14:43.000000000 -0600
+++ /usr/share/gratia/awsvm/aws-gratia-probe 2016-01-25 18:37:32.000000000 -0600
@@ -2,7 +2,7 @@
 import gratia.common.Gratia as Gratia
 import gratia.common.GratiaCore as GratiaCore
 import gratia.common.GratiaWrapper as GratiaWrapper
-from gratia.common.Gratia import DebugPrint, Error
+from gratia.common.Gratia import DebugPrint, Error, DebugPrintTraceback
 import boto3;
 from boto3.session import Session
 from pprint import pprint;
@@ -108,6 +108,7 @@
 owneracct=reservation['OwnerId']
 instances=reservation['Instances']
 for instance in instances:
+ try:
 DebugPrint(4,"processing instance %s"% instance['InstanceId'])
 r = Gratia.UsageRecord()
 # set the defaults
@@ -244,9 +245,11 @@
 r.ResourceType("AWSVM")
 r.CpuDuration(0,'system')
 r.AdditionalInfo("Version","1.0")
-
+ except Exception as e:
+ DebugPrint(1,"ERROR: uncaught exception while processing instance, not sending record")
+ DebugPrintTraceback()
+ else:
 DebugPrint(4,"sending record")
-
 Gratia.Send(r)
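The control flow of the patch above (try/except/else around each loop iteration) can be sketched in isolation. This is a minimal illustration, not the probe's actual code: `process_instances`, `build_record`, `send`, and `log` are hypothetical names standing in for the loop body, `Gratia.Send`, and `DebugPrint`.

```python
import traceback


def process_instances(instances, build_record, send, log=print):
    """Build and send one record per instance, skipping instances that fail.

    build_record and send are hypothetical callables standing in for the
    probe's record-building code and Gratia.Send.
    Returns a (sent, failed) tuple of counts.
    """
    sent = 0
    failed = 0
    for instance in instances:
        try:
            record = build_record(instance)
        except Exception:
            # One bad instance must not terminate the whole run:
            # log the stack trace and move on to the next instance.
            log("ERROR: uncaught exception while processing instance, not sending record")
            log(traceback.format_exc())
            failed += 1
        else:
            # Only reached when build_record raised nothing.
            send(record)
            sent += 1
    return sent, failed
```

Keeping the send in the `else` branch means a record is only sent when it was built completely, and an exception raised by the send itself is not swallowed by the handler.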
#2 Updated by Kevin Retzke about 5 years ago
- Status changed from New to Assigned
- Assignee set to Kevin Retzke
#3 Updated by Kevin Retzke about 5 years ago
There appears to be a bug in boto3 causing this exception:
09:13:37 CST Gratia: processing instance i-d8eb381f
09:13:37 CST Gratia: Creating a Record 2016-01-26T15:13:37Z
09:13:37 CST Gratia: Creating a UsageRecord 2016-01-26T15:13:37Z
09:13:37 CST Gratia: no tags
09:13:37 CST Gratia: getting instance data
09:13:37 CST Gratia: getting spot price
09:13:38 CST Gratia: ERROR: Error getting data for instance i-d8eb381f from ec2
09:13:38 CST Gratia: 'NoneType' object has no attribute 'get'
09:13:38 CST Gratia: ERROR: uncaught exception while processing instance, not sending record
09:13:38 CST Gratia: In traceback print (0)
09:13:38 CST Gratia: In traceback print (1)
09:13:38 CST Gratia: Traceback (most recent call last):
  File "/usr/share/gratia/awsvm/aws-gratia-probe", line 209, in process_session
    market_price=ec2_util.spot_price_at_termination(instance['InstanceId'])
  File "/usr/lib/python2.6/site-packages/gratia/awsvm/ec2_util.py", line 130, in spot_price_at_termination
    match = re.match(r"Service initiated \((.*)\)",instance.state_transition_reason)
  File "/usr/lib/python2.6/site-packages/boto3/resources/factory.py", line 214, in property_loader
    return self.meta.data.get(name)
AttributeError: 'NoneType' object has no attribute 'get'
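One way to narrow the handling to the failing call would be to guard the attribute access itself. The sketch below is an assumption about how that could look, not the code in ec2_util.py: `termination_time_from_reason` and the two stand-in classes are hypothetical, the reason string is only an illustrative value, and only the regular expression is taken from the traceback. The broken stand-in reproduces the `None.get` failure without needing boto3.

```python
import re

# Pattern copied from the traceback above.
_SERVICE_INITIATED = re.compile(r"Service initiated \((.*)\)")


def termination_time_from_reason(instance):
    """Return the text inside "Service initiated (...)", or None.

    Hypothetical helper: treats a failing attribute lookup the same as a
    missing reason instead of letting the AttributeError propagate.
    """
    try:
        reason = instance.state_transition_reason
    except AttributeError:
        # Raised when the resource's underlying data is None, as in the
        # traceback; also raised if the attribute does not exist at all.
        return None
    if not reason:
        return None
    match = _SERVICE_INITIATED.match(reason)
    return match.group(1) if match else None


class BrokenInstance(object):
    """Stand-in whose property fails the way the traceback shows."""

    _data = None

    @property
    def state_transition_reason(self):
        return self._data.get("StateTransitionReason")


class GoodInstance(object):
    """Stand-in with an illustrative reason string."""

    state_transition_reason = "Service initiated (2016-01-26 15:00:00 GMT)"
```

Catching `AttributeError` this broadly also hides a genuinely missing attribute, so the per-instance handler with a stack trace is still worth keeping as a backstop.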
The quick fix is working: fewer than 10 records (out of ~1500) per run are lost to this error.