DCM stops reporting to Run Control
This seems to have started on or around 3/15/11. The main symptom is that the DCM stops responding to Run Control heartbeat messages and may stop sending Ganglia data, but still sends out normal hit data (i.e., the event display continues to look good). So far, whenever this has happened, it has been impossible to log into the DCM.
(As of the creation of this issue, there is no console line available. Coming soon.)
#2 Updated by Leon Mualem over 8 years ago
This seems to have happened again yesterday. I don't know whether it was also impossible to log in this time. Here's the logbook entry.
5504, Tony Mann (mann), 04/16/2011 08:12:12 General
Karen K. phoned; she is at the detector and is going to be doing some scintillator filling this morning. She says that the previous experience is that the DAQ crashes after she has had the pumps on for a while.
In an apparently independent happening, dcm-3-02-01 and dcm-302-02 have started skipping heartbeats.
Leon Mualem (mualem) 04/16/2011 08:39:02
CPU load is not high on any of the DCMs at this time, so I don’t know why the heartbeats (echoes) are not being received.
#4 Updated by Andrew Norman over 8 years ago
This morning (Sunday Apr. 17th) the machine at NDOS which has the console ports appears to no longer allow logins on the public side of the network.
I'm wondering if what we are seeing is a real problem with the network that manifests itself in both the DCMs and now also this machine (e.g., the switch flipping out, or a network scanner running that causes problems).