BST1 and IPMCNT are not responding

Added by Richard Neswold 10 months ago. Updated 9 months ago.

Both BST1 and IPMCNT front-ends have been having problems. They are fine for a while after reboot but then become unresponsive. They are both scraper front-ends and use the same software. They're both MVME-162-based systems.


#1 Updated by Richard Neswold 10 months ago

The last change Briegel made to them was an upgrade to MOOC 4.6. Unfortunately, he used -latest symbolic links in their start-up scripts. I cleaned up the scripts by specifying the versions I believe are compatible with each other. After a reboot, BST1 seemed to last longer but still became unresponsive.

I've never modified the code, so it's at least 4 years old. Both front-ends are the same hardware and their start-up scripts load the same software, which makes me think it's a software version issue: rebooting them loaded a combination of versions that wasn't quite reliable (which is the reason we shouldn't ever use the symbolic links!)
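
A quick way to audit other start-up scripts for the same problem is to search them for -latest references. This is only a rough sketch: the script syntax, the `#` comment marker, and the module file names in the sample are hypothetical, not taken from the real BST1 or IPMCNT scripts.

```python
import re

# Any whitespace-delimited token containing "-latest" is treated as a
# reference to a symbolic link (an assumption about how the links are named).
LATEST_REF = re.compile(r"\S*-latest\S*")


def find_latest_refs(script_text):
    """Return (line_number, token) pairs for every "-latest" reference."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        # Assumption: lines starting with '#' are comments and can be skipped.
        if line.lstrip().startswith("#"):
            continue
        for token in LATEST_REF.findall(line):
            hits.append((lineno, token))
    return hits


# Hypothetical start-up script; the file names are made up for illustration.
sample_script = """\
# load modules
ld < mooc-latest.out
ld < scraper-1.2.out
"""

for lineno, token in find_latest_refs(sample_script):
    print(f"line {lineno}: {token}")
```

Any hit is a module whose version can change silently on the next reboot, so each one should be replaced with an explicit version.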

#2 Updated by Richard Neswold 10 months ago

To rule out a hardware issue, I added shellScriptAbort() after the modules are loaded but before anything is started. BST1 should simply be an idle VxWorks machine. If the Ethernet interface still shuts down, I'll start swapping hardware.

#3 Updated by Richard Neswold 10 months ago

I remembered this: #19486

I set both front-ends' version devices to "document". I'll restart both front-ends to see if this helps.

#4 Updated by Richard Neswold 10 months ago

BST1 was locked up again today, so that didn't work.

I'm still not convinced it's a hardware problem:

  • One machine is in MI and the other is in Booster (rules out a power surge in one area, or any other environmental issue that would cause both to fail).
  • They are identical systems (MVME-162, both are scraper front-ends, and both are running the exact same code).
  • They both were rebooted recently and are now having the same problem.

#5 Updated by Richard Neswold 9 months ago

(Focusing on BST1; whatever fix I find, I'll apply to IPMCNT.)

I let BST1 run overnight with all the code loaded but nothing started. It was still responding the next morning. This rules out hardware issues. Unfortunately, because Briegel chose to use the -latest symbolic link, I have no way to tell what versions of software he originally released. I'm looking through the drivers for buffer overruns and other unsafe coding practices.
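
As a first pass over the driver sources, something like the following can flag calls to unbounded C string functions. It is a sketch under assumptions: the list of functions is a generic one, not specific to these drivers, and the sample source is invented. It only finds candidates for manual review and will miss overruns caused by bad index arithmetic or wrong buffer sizes.

```python
import re

# Unbounded C string functions that commonly lead to buffer overruns.
# This is a generic list, not one derived from the actual drivers.
UNSAFE_CALLS = ("strcpy", "strcat", "sprintf", "vsprintf", "gets")
CALL_RE = re.compile(r"\b(" + "|".join(UNSAFE_CALLS) + r")\s*\(")


def find_unsafe_calls(source_text):
    """Return (line_number, function_name) for each suspicious call."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Invented C fragment for illustration only.
sample_source = """\
char name[8];
strcpy(name, input);
snprintf(buf, sizeof buf, "%d", n);
"""

for lineno, func in find_unsafe_calls(sample_source):
    print(f"line {lineno}: {func}")
```

The word boundary in the pattern keeps bounded variants such as snprintf and fgets from being reported.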

