Dcm problems March2013 » History » Version 35

Ron Rechenmacher, 03/15/2013 02:23 PM

h1. Dcm problems March2013

h2. DCM "hard" failures.

DCMs that can never be contacted.

|DCM| First reported by| Date first reported| Occurrences| Comments | Resolution |

h2. Fragile DCMs

DCMs that can be contacted for some amount of time before they seize.

Notes:

* Please use the full DCM name in the table below, to aid searching.
* The "Occurrences" count is the number of confirmed occurrences of the problem. The true number may well be greater.

|DCM (location) | DCM (S/N) | First reported by| Date first reported| ECL Entries | Occurrences| Comments | Resolution |
| dcm-2-01-03 | dcm-1143 | Peter | 3/8/13 | "800":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=800 "802":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=802 "808":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=808 | 2 | Andrew noted it was using its S/N name on 3/8/13 | |
| dcm-2-01-08 | dcm-1039 (?) | Peter | 3/8/13 | "800":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=800 "801":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=801 | 1 | Seems to have recovered. | |
| dcm-2-03-01 | dcm1220 | Peter | 3/8/13 | "800":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=800 "819":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=819 | 4 | Got it to crash with repeated ssh "ps aux : grep ps" | |
| dcm-2-03-02 | dcm-1225 | Peter | 3/8/13 | "800":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=800 "802":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=802 "819":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=819 | 2 | Hung after starting ssh daemon | |
| dcm-2-03-03 | dcm-1227 | Peter | 3/8/13 | "800":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=800 "802":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=802 | 3 | Doesn't seem to finish reboot (Andrew) | |
| dcm-2-03-06 | dcm-1223 | ? | ? | | | Not included in partition since 2/20/13; the ECL has no comment explaining why. | |
| dcm-2-03-09 | dcm-1222 | Peter | 2/18/12 | | | Problems were general network/DDS issues. Not clear there is a problem with this DCM. | |
| dcm-2-04-03 | dcm-1151 | | | | | | |
| dcm-2-04-06 | dcm-1085 | | | | | | |
| dcm-2-04-09 | dcm-1224 | | | | | | |
| dcm-2-04-10 | dcm-1135 | Peter | 3/8/13 | "810":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=810 "830":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=830 | 4 | Hung during ps/grep command, crashed during reserve resources. Got it to crash twice with repeated ssh "ps aux : grep ps" | |
| dcm-2-04-11 | dcm-1075 | Peter | 2/15/13 | "494":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=494 "819":http://dbweb0.fnal.gov/ECL/novashriver/E/show?e=819 | many | | |
| dcm-2-04-12 | dcm-1096 | | | | | | |
| set to fnal | dcm-1095 | Peter | near Feb.15 | | | back@fnal(15Mar)-(ssh-ps-aux+memtester+top) at least 1 occurrence of hang with no console output | |

h2. DCMs from the first batch of 50

Rick K. was wondering if the 50 DCMs from the first batch (S/Ns 1006-1055) performed any better than the rest.

As a start, here is where they live:
| DCM (location) | DCM (S/N) | Comments |
|dcm-2-01-10 | dcm1032 | Significant usage starting 3/14/13. Pegged CPU at times, which (A) is not really a symptom of the flaky DCMs and (B) appears consistent with the high data rates on this DCM. |
|dcm-2-01-07 | dcm1038| |
|dcm-2-01-08 | dcm1039| |
|dcm-2-01-09 | dcm1041| |
|dcm-2-01-12 | dcm1043| |
|dcm-2-01-11 | dcm1044| |
|dcm-2-04-05 | dcm1051| |
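
As a rough aid for Rick's question, the S/Ns in the "Fragile DCMs" table can be checked against the first-batch range (1006-1055). The following is only a sketch: the names @FIRST_BATCH@, @FRAGILE_SNS@, @serial_number@ and @in_first_batch@ are made up for illustration, and the S/N list is copied by hand from the table on this page, so it needs updating whenever the table changes.

```python
import re

# First-batch S/N range as stated on this page: 1006-1055 inclusive.
FIRST_BATCH = range(1006, 1056)

# S/Ns copied by hand from the "Fragile DCMs" table (assumption: the table
# is current). Both spellings, "dcm-1143" and "dcm1220", appear on the page.
# Note that dcm-1039 is listed there with a "(?)".
FRAGILE_SNS = [
    "dcm-1143", "dcm-1039", "dcm1220", "dcm-1225", "dcm-1227",
    "dcm-1223", "dcm-1222", "dcm-1151", "dcm-1085", "dcm-1224",
    "dcm-1135", "dcm-1075", "dcm-1096", "dcm-1095",
]


def serial_number(name):
    """Extract the numeric S/N from a name such as 'dcm-1143' or 'dcm1220'."""
    match = re.fullmatch(r"dcm-?(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a DCM S/N name: {name!r}")
    return int(match.group(1))


def in_first_batch(name):
    """True if the S/N falls in the first batch of 50."""
    return serial_number(name) in FIRST_BATCH


# Fragile DCMs whose S/N belongs to the first batch.
first_batch_fragile = [n for n in FRAGILE_SNS if in_first_batch(n)]
print(first_batch_fragile)
```

With the table as it stands, the only overlap is dcm-1039 (dcm-2-01-08), whose S/N is marked "(?)" and which "seems to have recovered", so this alone does not settle whether the first batch behaves better.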