MECAR periodically hard crashes
MECAR periodically hard crashes, about once every 7 to 10 days. After it happens, a power cycle of the VME chassis is needed to get MECAR back up and running.
After a power cycle MECAR will then run fine for another 7 to 10 days.
#1 Updated by Kevin Martin almost 5 years ago
Connected a serial cable from MECAR's serial console to the MI 60N Transient Recorder PC (it's in the next rack over).
I can now use a serial terminal program (minicom) on transrec-mi-60n to monitor MECAR's console on an ongoing basis.
My idea is that I might be able to learn one of two things:
1) MECAR spits out some error message just before it dies that could point me to the cause of the crashes OR
2) MECAR isn't even crashing. This could just be a network problem (like a bad switch port).
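To tell the two cases apart after the fact, it helps if every captured console line carries a timestamp that can be compared with the time the crash was noticed. The following is a minimal sketch of such a filter, not something that was actually run on transrec-mi-60n. The function names stamp_line and stamp_stream are hypothetical, and it assumes the console output is available to the program as a line-oriented stream (for example a capture of the serial session).

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format one console line as "YYYY-MM-DD HH:MM:SS <line>" (UTC).
 * Returns the number of characters written, or -1 if the result does
 * not fit in the buffer or the time cannot be converted.
 * Hypothetical helper; not part of any MECAR or PMCUCD code. */
int stamp_line(char *out, size_t out_size, time_t when, const char *line)
{
    char stamp[32];
    struct tm *utc = gmtime(&when);   /* not thread-safe; fine for one reader */
    int n;

    if (utc == NULL)
        return -1;
    if (strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", utc) == 0)
        return -1;
    n = snprintf(out, out_size, "%s %s", stamp, line);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}

/* Copy lines from `in` to `out`, prefixing each with the current time.
 * Lines longer than the buffer are split, and each piece gets its own
 * timestamp. Output is flushed per line so the last message before a
 * crash is not lost in a stdio buffer. */
void stamp_stream(FILE *in, FILE *out)
{
    char line[1024];
    char stamped[1100];

    while (fgets(line, sizeof line, in) != NULL) {
        if (stamp_line(stamped, sizeof stamped, time(NULL), line) >= 0) {
            fputs(stamped, out);
            fflush(out);
        }
    }
}
```

Calling stamp_stream(stdin, stdout) from a small main() and redirecting the output to a file would give a log in which the last line before a gap shows when MECAR stopped talking. If the console keeps producing lines while the node is unreachable, that points to the network rather than a crash.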
#3 Updated by Kevin Martin almost 5 years ago
The crashing is still occurring even after swapping the VME chassis. So, today I am swapping additional hardware. Three VME cards are being swapped (from the backup MECAR chassis in the uPit):
1) SSM card
2) PS Link Transmitter Card
3) PS Link Receiver Card
#4 Updated by Kevin Martin over 4 years ago
No MECAR crashes occurred for 2 months. Because of this I started thinking the problem was fixed (without knowing what the actual problem had been). Then on Thursday, May 21 at 13:29, MECAR crashed yet again. It then crashed a second time when it was being rebooted.
#5 Updated by Kevin Martin over 4 years ago
I have noticed for a while that the PMCUCD driver has been spitting out quite a few messages lately. While digging through the PMCUCD driver's code to find out what these messages mean, I noticed that one of the messages I saw on MECAR's console just as it crashed comes from a printf() call in an ISR in the pmctrig_init.c module. printf() calls are forbidden in ISRs and can cause hard crashes just like the ones that have been occurring in MECAR.
I informed Charlie Briegel (the PMCUCD driver's keeper). He has fixed the code, and the modified code was put into MECAR on Tuesday, May 26, 2015 at 10:00.
Now we just wait and see if this was the sole cause of the MECAR crashes...or not.
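For reference, the usual way to keep a diagnostic without calling printf() from an ISR is to have the ISR record the event in a non-blocking buffer and let ordinary task-level code do the printing later. The sketch below illustrates that pattern in portable C11. It is not Charlie's actual fix and not PMCUCD code: the names isr_log_put and isr_log_drain, the 16-slot size, and the message text are all hypothetical, and it assumes one ISR producer, one task consumer, and a toolchain with <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define ISR_LOG_SLOTS 16u            /* must be a power of two */

static uint32_t    isr_log_codes[ISR_LOG_SLOTS];
static atomic_uint isr_log_head;     /* advanced only by the ISR  */
static atomic_uint isr_log_tail;     /* advanced only by the task */

/* ISR side: record an event code. Never blocks and never calls printf().
 * Returns 0 on success, or -1 if the buffer is full (event is dropped). */
int isr_log_put(uint32_t code)
{
    unsigned head = atomic_load_explicit(&isr_log_head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&isr_log_tail, memory_order_acquire);

    if (head - tail >= ISR_LOG_SLOTS)
        return -1;
    isr_log_codes[head % ISR_LOG_SLOTS] = code;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&isr_log_head, head + 1, memory_order_release);
    return 0;
}

/* Task side: print and remove everything recorded so far.
 * Returns the number of events printed. Safe to call printf-family
 * functions here because this runs in normal task context. */
int isr_log_drain(FILE *out)
{
    int printed = 0;
    unsigned tail = atomic_load_explicit(&isr_log_tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&isr_log_head, memory_order_acquire);

    while (tail != head) {
        fprintf(out, "isr event 0x%08lx\n",
                (unsigned long)isr_log_codes[tail % ISR_LOG_SLOTS]);
        tail++;
        printed++;
        /* Release the slot back to the ISR after it has been read. */
        atomic_store_explicit(&isr_log_tail, tail, memory_order_release);
    }
    return printed;
}
```

The ISR would call isr_log_put() where the printf() used to be, and a low-priority task would call isr_log_drain(stdout) periodically. Dropping events when the buffer is full is a deliberate choice here: losing a diagnostic is preferable to stalling or crashing inside the interrupt handler.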