Bug #8134

MECAR periodically hard crashes

Added by Kevin Martin almost 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: High
Assignee:
Start date: 03/20/2015
Due date:
% Done: 0%
Estimated time: 16.00 h
Spent time:
Duration:

Description

MECAR periodically hard crashes. This happens about once every 7 to 10 days. After it happens, the VME chassis must be power-cycled to get MECAR back up and running.

After a power cycle MECAR will then run fine for another 7 to 10 days.

History

#1 Updated by Kevin Martin almost 5 years ago

Connected a serial cable from MECAR's serial console to the MI 60N Transient Recorder PC (it's in the next rack over).

I can now use a serial terminal program (minicom) on transrec-mi-60n to monitor MECAR's console on an ongoing basis.

My idea is that I might be able to learn one of two things:
1) MECAR spits out some error message just before it dies that could point me to the cause of the crashes OR
2) MECAR isn't even crashing. This could just be a network problem (like a bad switch port).
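To make option 1 easier to check after the fact, the console output can be kept as a timestamped log, so the last message before a crash can be matched against the crash time. A minimal sketch in C, assuming the serial port on transrec-mi-60n has already been configured (baud rate and so on, for example with stty) and that this filter reads the port in place of minicom, since two programs reading one port would split the data. `format_timestamp` and `stamp_stream` are hypothetical names, not part of minicom or any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: write "YYYY-MM-DD HH:MM:SS" (UTC) for `when` into `out`.
 * Returns the number of characters written, or 0 on failure. */
size_t format_timestamp(char *out, size_t outlen, time_t when)
{
    const struct tm *tmv = gmtime(&when);
    if (tmv == NULL)
        return 0;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S", tmv);
}

/* Hypothetical filter: copy `in` to `out`, prefixing each line with the time
 * it arrived and flushing after every complete line so the log on disk is
 * current right up to the moment the console goes silent.
 * Returns the number of complete lines copied. */
unsigned long stamp_stream(FILE *in, FILE *out)
{
    char buf[512];
    char stamp[32];
    int at_line_start = 1;
    unsigned long lines = 0;

    assert(in != NULL && out != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (at_line_start) {
            if (format_timestamp(stamp, sizeof stamp, time(NULL)) == 0)
                strcpy(stamp, "unknown-time");
            fprintf(out, "[%s] ", stamp);
        }
        fputs(buf, out);
        /* Lines longer than buf arrive in pieces; only stamp the first piece. */
        size_t len = strlen(buf);
        at_line_start = (len > 0 && buf[len - 1] == '\n');
        if (at_line_start) {
            lines++;
            fflush(out);
        }
    }
    fflush(out);
    return lines;
}
```

Wrapped in a `main()` that calls `stamp_stream(stdin, stdout)`, it could be run as `./stamp < /dev/ttyS0 >> mecar-console.log`, where `/dev/ttyS0` is a placeholder for whichever port the cable is actually on. UTC is used so the stamps stay unambiguous across daylight-saving changes. If the log keeps growing right up to a crash, that also helps separate option 1 from option 2.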

#2 Updated by Kevin Martin almost 5 years ago

  • Status changed from New to Assigned

We swapped the operational MECAR chassis out for the known-good one from the uPit. This is to test whether the crashing problem is caused by a marginally failing chassis power supply.

#3 Updated by Kevin Martin almost 5 years ago

The crashing is still occurring even after swapping the VME chassis. So, today I am swapping additional hardware. Three VME cards are being swapped (from the backup MECAR chassis in the uPit):

1) SSM card
2) PS Link Transmitter Card
3) PS Link Receiver Card

#4 Updated by Kevin Martin over 4 years ago

No MECAR crashes occurred for 2 months, so I started thinking the problem was fixed (without knowing what the actual problem had been). Then, on Thursday, May 21 at 13:29, MECAR crashed yet again. It then crashed a second time while it was being rebooted.

#5 Updated by Kevin Martin over 4 years ago

For a while I have noticed that the PMCUCD driver has been spitting out quite a few messages. While digging through the PMCUCD driver's code to find out what these messages mean, I noticed that one of the messages I saw on MECAR's console just as it crashed came from a printf() call in an ISR in the pmctrig_init.c module. printf() calls are forbidden in ISRs and can cause hard crashes just like the ones that have been occurring in MECAR.
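The usual way to keep diagnostics without calling printf() from interrupt context is to have the ISR record a small fixed-size event and let an ordinary task do the formatting and I/O later. The sketch below shows that pattern with a lock-free single-producer/single-consumer ring buffer, assuming C11 atomics, one ISR as the only producer, and one task as the only consumer. It is a generic illustration, not the PMCUCD driver's actual fix; `isr_log_event`, `drain_isr_log`, `struct isr_event` and `LOG_SLOTS` are hypothetical names. On VxWorks targets, `logMsg()` is documented as callable from interrupt level and serves the same purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: the ISR never calls printf(); it only stores a small
 * event in a ring buffer that a normal task drains and prints later. */
#define LOG_SLOTS 64u
static_assert((LOG_SLOTS & (LOG_SLOTS - 1u)) == 0, "LOG_SLOTS must be a power of two");

struct isr_event {
    uint32_t code;  /* hypothetical message identifier */
    uint32_t value; /* e.g. a register value captured in the ISR */
};

static struct isr_event ring[LOG_SLOTS];
static atomic_uint head;    /* advanced only by the ISR (producer) */
static atomic_uint tail;    /* advanced only by the task (consumer) */
static atomic_uint dropped; /* events lost because the ring was full */

/* Interrupt context: bounded time, no locks, no I/O. */
void isr_log_event(uint32_t code, uint32_t value)
{
    unsigned h = atomic_load_explicit(&head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t == LOG_SLOTS) { /* full: drop the event rather than block */
        atomic_fetch_add_explicit(&dropped, 1u, memory_order_relaxed);
        return;
    }
    ring[h % LOG_SLOTS] = (struct isr_event){ code, value };
    atomic_store_explicit(&head, h + 1u, memory_order_release);
}

/* Task context, where printf()-style I/O is allowed.
 * Returns the number of events printed. */
unsigned drain_isr_log(FILE *out)
{
    unsigned n = 0;
    unsigned t = atomic_load_explicit(&tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&head, memory_order_acquire);
    while (t != h) {
        struct isr_event ev = ring[t % LOG_SLOTS];
        fprintf(out, "isr event %u: value=0x%08x\n", (unsigned)ev.code, (unsigned)ev.value);
        t++;
        n++;
    }
    atomic_store_explicit(&tail, t, memory_order_release);
    return n;
}
```

A periodic task (or the driver's existing non-interrupt code path) would call `drain_isr_log(stdout)`. Dropping events when the ring is full keeps the ISR's run time bounded; the `dropped` counter makes any loss visible instead of silent.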

I informed Charlie Briegel (the PMCUCD driver's keeper). He has fixed the code, and the modified code was put into MECAR on Tuesday, May 26, 2015 at 10:00.

Now we just wait and see if this was the sole cause of the MECAR crashes...or not.

#6 Updated by Kevin Martin over 4 years ago

  • Status changed from Assigned to Resolved

No crashes have occurred since Charlie B. fixed his code, so I am considering this problem fixed.

#7 Updated by Kevin Martin over 4 years ago

  • Status changed from Resolved to Closed
