Andrew Norman, 03/19/2013 01:33 PM

Dcm problems March2013

DCM "hard" failures.

DCMs that can never be contacted.

| DCM | First reported by | Date first reported | Occurrences | Comments | Resolution |

Fragile DCMs

DCMs that can be contacted for some amount of time before they seize up.


  • Please use the full DCM name in the table below, to aid searching.
  • The "Occurrences" count is the number of confirmed occurrences of the problem. The true number may well be greater...
| DCM (location) | DCM (S/N) | First reported by | Date first reported | ECL Entries | Occurrences | Comments | Resolution |
| dcm-2-01-03 | dcm-1143 | Peter | 3/8/13 | 800 802 808 | 2 | Andrew noted it was using its S/N name on 3/8/13 | |
| dcm-2-01-08 | dcm-1039 (?) | Peter | 3/8/13 | 800 801 | 1 | Seems to have recovered. | |
| dcm-2-03-01 | dcm1220 | Peter | 3/8/13 | 800 819 | Highly repeatable | Got it to crash with repeated ssh "ps aux : grep ps" | |
| dcm-2-03-02 | dcm-1225 | Peter | 3/8/13 | 800 802 819 | 2 | Hung after starting ssh daemon | |
| dcm-2-03-03 | dcm-1227 | Peter | 3/8/13 | 800 802 | Highly repeatable | Doesn't seem to finish reboot (Andrew) | |
| dcm-2-03-06 | dcm-1223 | ? | ? | | | Not included in partition since 2/20/13, but no comment why in ECL. | |
| dcm-2-03-09 | dcm-1222 | Peter | 2/18/12 | | | Problems were general network/DDS issues. Not clear there is a problem with this DCM. | |
| dcm-2-04-03 | dcm-1151 | | | | | | |
| dcm-2-04-06 | dcm-1085 | | | | | | |
| dcm-2-04-09 | dcm-1224 | | | | | | |
| dcm-2-04-10 | dcm-1135 | Peter | 3/8/13 | 810 830 | Highly repeatable | Hung during ps/grep command, crashed during reserve resources. Got it to crash twice with repeated ssh "ps aux : grep ps" | |
| dcm-2-04-11 | dcm-1075 | Peter | 2/15/13 | 494 819 | many | | |
| dcm-2-04-12 | dcm-1096 | | | | | | |
| set to fnal | dcm-1095 | Peter | near Feb.15 | | | back@fnal(15Mar)-(ssh-ps-aux+memtester+top) at least 1 occurrence of hang with no console output | |
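
Several comments above mention crashing a DCM with repeated ssh "ps aux : grep ps". A minimal sketch of such a loop follows. It assumes the ":" in the table stands in for a shell pipe and that key-based ssh login to the DCM works without a password prompt. The function name `stress_dcm`, the attempt count, and the timeout are hypothetical choices for illustration, not the procedure actually used.

```python
import subprocess


def stress_dcm(host, attempts=100, timeout_s=10.0, run=subprocess.run):
    """Run `ps aux | grep ps` on `host` over ssh up to `attempts` times.

    Returns the 1-based attempt at which the node stopped answering
    (ssh timed out or exited non-zero), or None if every attempt succeeded.
    `run` is injectable so the loop can be exercised without real hardware.
    """
    # BatchMode=yes makes ssh fail instead of waiting on a password prompt.
    # The remote command string is an assumption (":" read as a pipe).
    cmd = ["ssh", "-o", "BatchMode=yes", host, "ps aux | grep ps"]
    for attempt in range(1, attempts + 1):
        try:
            result = run(cmd, capture_output=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # No reply within the timeout: treat as a hang.
            return attempt
        if result.returncode != 0:
            # ssh itself failed (e.g. connection refused or dropped).
            return attempt
    return None
```

Calling `stress_dcm("dcm-2-03-01")` would report the attempt number at which the node first failed to answer, which could be recorded alongside the "Occurrences" column.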

DCMs from the first batch of 50

Rick K. was wondering if the 50 DCMs from the first batch (S/Ns 1006-1055) performed any better
than the rest.

As a start, here is where they live:
| DCM (location) | DCM (S/N) | Comments |
| dcm-2-01-10 | dcm1032 | Significant usage starting 3/14/13. The CPU was pegged at times, but (A) that is not really a symptom of the flaky DCMs, and (B) it looks consistent with the high data rates on this DCM. |
| dcm-2-01-07 | dcm1038 | |
| dcm-2-01-08 | dcm1039 | |
| dcm-2-01-09 | dcm1041 | |
| dcm-2-01-12 | dcm1043 | |
| dcm-2-01-11 | dcm1044 | |
| dcm-2-04-05 | dcm1051 | |
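
To help answer Rick's question, the S/N names in the tables on this page can be checked against the first-batch range (S/Ns 1006-1055). The following is only a sketch: the helper names `serial_number` and `is_first_batch` are hypothetical, and it assumes S/N names look like "dcm-1039" or "dcm1039", as written on this page.

```python
import re

# First batch of 50 DCMs: S/Ns 1006 through 1055 inclusive (from the text above).
FIRST_BATCH = range(1006, 1056)


def serial_number(name):
    """Extract the numeric S/N from a name such as 'dcm-1039' or 'dcm1039'."""
    match = re.fullmatch(r"dcm-?(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a DCM S/N name: {name!r}")
    return int(match.group(1))


def is_first_batch(name):
    """Return True if the DCM's S/N falls within the first batch of 50."""
    return serial_number(name) in FIRST_BATCH
```

For example, `is_first_batch("dcm-1039")` is true while `is_first_batch("dcm-1143")` is false, so applying it to the S/N column of the fragile-DCM table shows how many of the problem units come from the first batch.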

Test Procedures

The following pages document the different types of testing that were done:

CPU Burning Tests
Network Data Copy Tests