Bug #10883

Dealing with cmsstor412 and cmsstor415 losing disk issue

Added by Chih-Hao Huang about 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Start date:
11/07/2015
Due date:
11/23/2015
% Done:

100%

Estimated time:
1.00 h
Spent time:
component:
base
First Occurred:
Occurs In:
Stakeholders:
Co-Assignees:
Duration: 17

Description

This is a placeholder to record what has been done.

cmsstor412 and cmsstor415 lost access to their disks, and their pools were automatically set to disabled.

History

#1 Updated by Chih-Hao Huang about 4 years ago

  • % Done changed from 0 to 90

[root@cmsstor412 log]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 943755732 2815772 892993208 1% /
tmpfs 32975340 0 32975340 0% /dev/shm
/dev/sda1 999320 41880 905012 5% /boot
/dev/sdb 75409749120 20303869748 55105879372 27% /storage/data1
/dev/sdc 75409749120 24160253620 51249495500 33% /storage/data2
[root@cmsstor412 log]# cd /storage/data1
[root@cmsstor412 data1]# ls
ls: cannot open directory .: Input/output error
[root@cmsstor412 data1]# cd /storage/data2
[root@cmsstor412 data2]# ls
ls: cannot open directory .: Input/output error
[root@cmsstor412 data2]# cd /var/log
[root@cmsstor412 log]# grep -i error messages
Nov 2 16:14:03 cmsstor412 mcelog: ERROR: AMD Processor family 21: mcelog does not support this processor. Please use the edac_mce_amd module instead.#012: Success
Nov 2 16:14:04 cmsstor412 xinetd7245: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/eklogin] [line=15]
Nov 2 16:14:04 cmsstor412 xinetd7245: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/ftp] [line=15]
Nov 2 16:14:04 cmsstor412 xinetd7245: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/klogin] [line=15]
Nov 2 16:14:04 cmsstor412 xinetd7245: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/kshell] [line=15]
Nov 2 16:14:04 cmsstor412 xinetd7245: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/telnet] [line=15]
Nov 6 18:25:48 cmsstor412 kernel: XFS (sdb): metadata I/O error: block 0x86c1be98 ("xfs_trans_read_buf") error 5 buf count 4096
Nov 6 18:25:48 cmsstor412 kernel: XFS (sdb): metadata I/O error: block 0x86c1be98 ("xfs_trans_read_buf") error 5 buf count 4096
Nov 6 18:25:48 cmsstor412 kernel: end_request: I/O error, dev sdb, sector 10920090864
Nov 6 18:25:48 cmsstor412 kernel: XFS (sdb): metadata I/O error: block 0x11800677ef ("xlog_iodone") error 5 buf count 3584
Nov 6 18:25:48 cmsstor412 kernel: XFS (sdb): Log I/O Error Detected. Shutting down filesystem
Nov 6 18:25:48 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:25:48 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:26:06 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:26:29 cmsstor412 kernel: qla2xxx [0000:04:00.0]-00af:6: Performing ISP error recovery - ha=ffff88041882a000.
Nov 6 18:26:36 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery
Nov 6 18:26:41 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127633312
Nov 6 18:26:41 cmsstor412 kernel: XFS (sdc): metadata I/O error: block 0x100000030 ("xfs_buf_iodone_callbacks") error 5 buf count 8192
Nov 6 18:26:58 cmsstor412 kernel: XFS (sdc): metadata I/O error: block 0x100000030 ("xfs_buf_iodone_callbacks") error 5 buf count 8192
Nov 6 18:26:58 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127634336
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127632288
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127631264
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127630240
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127629216
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127628192
Nov 6 18:27:48 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127627168
Nov 6 18:27:48 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127626144
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127625120
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127624096
Nov 6 18:27:48 cmsstor412 kernel: end_request: I/O error, dev sdc, sector 53127623072
Nov 6 18:27:48 cmsstor412 kernel: XFS (sdc): metadata I/O error: block 0x118006d0b2 ("xlog_iodone") error 5 buf count 5632
Nov 6 18:27:48 cmsstor412 kernel: XFS (sdc): metadata I/O error: block 0x118006d0bd ("xlog_iodone") error 5 buf count 1024
Nov 6 18:27:48 cmsstor412 kernel: XFS (sdc): metadata I/O error: block 0x118006d0bf ("xlog_iodone") error 5 buf count 1536
Nov 6 18:27:48 cmsstor412 kernel: XFS (sdc): metadata I/O error: block 0x118006d0c2 ("xlog_iodone") error 5 buf count 4096
Nov 6 18:27:49 cmsstor412 kernel: XFS (sdb): metadata I/O error: block 0x11800677e9 ("xlog_iodone") error 5 buf count 3072
Nov 6 18:27:49 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:27:49 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:28:18 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:28:18 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:28:48 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:28:48 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:29:18 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:29:18 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:29:48 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:29:48 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 6 18:30:18 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 6 18:30:18 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
........
Nov 7 09:39:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:39:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:39:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:40:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:40:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:40:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:40:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:41:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:41:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:41:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:41:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:42:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:42:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:42:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:42:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:43:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:43:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:43:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:43:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:44:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:44:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:44:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:44:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:45:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:45:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:45:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:45:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:46:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:46:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:46:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:46:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:47:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:47:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:47:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:47:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:48:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:48:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:48:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:48:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:49:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:49:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:49:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:49:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:50:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:50:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:50:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:50:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:51:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:51:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:51:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:51:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:52:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:52:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:52:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:52:55 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:53:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:53:25 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:53:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:53:56 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:54:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:54:26 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:54:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:54:56 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:55:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:55:26 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:55:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:55:56 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:56:25 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:56:26 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
Nov 7 09:56:55 cmsstor412 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Nov 7 09:56:56 cmsstor412 kernel: XFS (sdc): xfs_log_force: error 5 returned.
[root@cmsstor412 log]# service stop puppet
stop: unrecognized service
[root@cmsstor412 log]# dcache stop
Stopping gridftp-cmsstor412Domain 0 1 done
Stopping w-cmsstor412-disk-disk2Domain 0 1 2 3 done
Stopping w-cmsstor412-disk-disk1Domain 0 1 2 done
[root@cmsstor412 log]# umount /storage/data1 /storage/data2
[root@cmsstor412 log]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 943755732 2811936 892997044 1% /
tmpfs 32975340 0 32975340 0% /dev/shm
/dev/sda1 999320 41880 905012 5% /boot
[root@cmsstor412 log]# mount -a
[root@cmsstor412 log]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 943755732 2811936 892997044 1% /
tmpfs 32975340 0 32975340 0% /dev/shm
/dev/sda1 999320 41880 905012 5% /boot
/dev/sdd 75409749120 20303869748 55105879372 27% /storage/data1
/dev/sde 75409749120 24147703740 51262045380 33% /storage/data2
[root@cmsstor412 log]# cd /storage/data1
[root@cmsstor412 data1]# ls -l
total 0
drwxr-xr-x 4 root root 71 Nov 2 16:16 write-pool
[root@cmsstor412 data1]# cd /storage/data2
[root@cmsstor412 data2]# ls -l
total 0
drwxr-xr-x 4 root root 71 Nov 2 16:16 write-pool
[root@cmsstor412 ~]# dcache start
Starting w-cmsstor412-disk-disk1Domain done
Starting w-cmsstor412-disk-disk2Domain done
Starting gridftp-cmsstor412Domain done
[root@cmsstor412 ~]# service puppet start
Starting puppet agent:
[root@cmsstor412 ~]# dcache status
DOMAIN STATUS PID USER
w-cmsstor412-disk-disk1Domain running 31328 root
w-cmsstor412-disk-disk2Domain running 31395 root
gridftp-cmsstor412Domain running 31462 root
[root@cmsstor412 ~]# service puppet status
puppet (pid 7535) is running...
[root@cmsstor412 ~]#
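
The steps in the transcript above can be collected into a small script. This is only a sketch under assumptions: the `run` and `remount_pools` helpers and the `DRY_RUN` variable are hypothetical names, not part of dCache or of this ticket, and puppet handling is left out. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: replay the remount sequence from the transcript.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: remount_pools MOUNTPOINT...
remount_pools() {
    run dcache stop || return 1      # stop the pool and door domains first
    run umount "$@" || return 1      # drop the shut-down XFS mounts
    run mount -a || return 1         # remount everything listed in /etc/fstab
    for mp in "$@"; do
        run ls -l "$mp" || return 1  # confirm the directory is readable again
    done
    run dcache start || return 1
    run dcache status
}
```

For the case recorded here the call would be `remount_pools /storage/data1 /storage/data2`. Setting `DRY_RUN=0` would execute the commands for real, which should only be done after reviewing the printed sequence.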

#2 Updated by Chih-Hao Huang about 4 years ago

cmsstor415 had exactly the same kind of error and was fixed in the same way.
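
To compare the symptom across nodes, the XFS error lines in /var/log/messages can be counted per device. The function below is a rough sketch, not something that was run for this ticket: `xfs_error_summary` is a hypothetical name, and it assumes the syslog line format shown in update #1, where `error 5` is the kernel's EIO (I/O error) code.

```shell
#!/bin/sh
# Hypothetical helper: count XFS "error 5" (EIO) kernel messages per device.
# Reads syslog text on stdin and prints "<device> <count>" lines, sorted.
xfs_error_summary() {
    awk '
        /kernel: XFS \(/ && /error 5/ {
            dev = $0
            sub(/.*XFS \(/, "", dev)   # drop everything up to "XFS ("
            sub(/\).*/, "", dev)       # drop the closing ")" and the rest
            count[dev]++
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}
```

A typical invocation would be `xfs_error_summary < /var/log/messages`. Lines such as `end_request: I/O error, dev sdb` are deliberately not counted, since they come from the block layer rather than from XFS.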

#3 Updated by Gerard Bernabeu Altayo about 4 years ago

  • Status changed from Assigned to Resolved

Incidents should be reported in SNOW. I did so in INC000000624068, so I am closing this.

#4 Updated by Chih-Hao Huang about 4 years ago

  • Due date set to 11/23/2015
  • % Done changed from 90 to 100

The qla2xxx firmware was updated to 8.01.02 on these nodes, along with the other nodes in the range cmsstor411 - cmsstor434.
See Task #10967.


