
Bug #8957

Follow the enhanced procedure for adding new pools on cmsstor112 and add it to dCache disk Test

Added by Gerard Bernabeu Altayo over 4 years ago. Updated over 4 years ago.

Status: Accepted
Priority: Normal
Start date: 06/12/2015
Due date: 06/16/2015
% Done: 0%
Estimated time: 4.00 h
Spent time:
Component: base
First Occurred:
Occurs In:
Stakeholders:
Co-Assignees:
Duration: 5

Description

cmsstor112 is going to be a PERMANENT pool in the dCache disk test/itb instance.


Related issues

Follows CMS dCache - Task #8956: ECF-CIS tests the enhanced acceptance test on cmsstor112 and see that it works fine. (Resolved; 05/27/2015 to 06/01/2015)

Follows CMS dCache - Task #8928: Enhance procedure for adding new pools (Resolved; 05/27/2015 to 06/11/2015)

History

#1 Updated by Gerard Bernabeu Altayo over 4 years ago

  • Follows Task #8956: ECF-CIS tests the enhanced acceptance test on cmsstor112 and see that it works fine. added

#2 Updated by Gerard Bernabeu Altayo over 4 years ago

  • Follows Task #8928: Enhance procedure for adding new pools added

#4 Updated by Natalia Ratnikova over 4 years ago

  • Status changed from New to Accepted

Informed Paul (primary) that the node is going to be repurposed and that its alarms can be ignored.

The node is in the unconfigured state in the ENC, but it needs to be reinstalled because it still has some tape-related configuration:

cmsstor112.fnal.gov - unconfigured/production (SLF 6.6)
4-core Core Opteron 265 (H8DSP-8); 7.63 GB RAM, 16.00 GB swap
WARNING: based on your node, cmsstor112.fnal.gov, ENSTORE_CONFIG_HOST has been set to conf-stken.fnal.gov
WARNING: If this is not correct; either reset ENSTORE_CONFIG_HOST by hand, set ENSTORE_USER_DEFINED_CONFIG_HOST by hand before running setup or use a qualifier in your setup command!

The node still belongs to the dcache_pool group in Zabbix.

Followed the procedure steps outlined in https://cmsweb.fnal.gov/bin/view/Storage/PoolAdd:

1) Stop puppet:
[root@cmsstor112 ~]# service puppet status
puppet (pid 25852) is running...
[root@cmsstor112 ~]# service puppet stop
Stopping puppet agent: [ OK ]

2) Empty the disks:
[root@cmsstor112 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 75758688 10640520 61263116 15% /
tmpfs 3997988 0 3997988 0% /dev/shm
/dev/sda1 999320 94196 852696 10% /boot
/dev/sde 11720232448 97004 11720135444 1% /storage/data3
/dev/sdc 11720232448 723614844 10996617604 7% /storage/data1
11720232448 95828 11720136620 1% /storage/data2

Removed all data from the disks; however, removing the .rocks-release files failed.
Someone had set the immutable attribute on these files, so I unset it and removed them:

[root@cmsstor112 data3]# lsattr a .rocks-release
----i--------- .rocks-release
[root@cmsstor112 data3]# man chattr
[root@cmsstor112 data3]# man lsattr
[root@cmsstor112 data3]# man lsattr
[root@cmsstor112 data3]# man chattr
[root@cmsstor112 data3]# man chattr
[root@cmsstor112 data3]#
[root@cmsstor112 data3]# chattr -i .rocks-release
[root@cmsstor112 data3]# rm .rocks-release
rm: remove regular empty file `.rocks-release'? y
[root@cmsstor112 data3]# find .
.
[root@cmsstor112 data3]# cd ../
[root@cmsstor112 storage]# chattr -i data.rocks-release
data1/ data2/ data3/
[root@cmsstor112 storage]# chattr -i data*/.rocks-release
[root@cmsstor112 storage]# rm -f data*/.rocks-release
[root@cmsstor112 storage]# find
.
./data1
./data3
./data2
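The immutable-attribute workaround above can be wrapped in small helpers so that every data directory is handled the same way and then checked for emptiness. This is a minimal sketch, assuming bash, the /storage/data* mount points seen on this node, and that .rocks-release is the only protected file; the function names has_immutable_flag, dir_is_empty and clear_rocks_release are hypothetical and not part of the PoolAdd procedure.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the PoolAdd procedure.

# Return 0 if an lsattr output line ("<flags> <path>") carries the immutable flag 'i'.
has_immutable_flag() {
  local flags=${1%% *}          # first whitespace-separated field = flag string
  [[ $flags == *i* ]]
}

# Return 0 if the directory exists and has no entries at all (dotfiles included).
dir_is_empty() {
  [[ -d $1 && -z $(find "$1" -mindepth 1 -print -quit) ]]
}

# Clear the immutable bit (if set) and delete .rocks-release in each data dir.
# Requires root; assumes the /storage/data1..3 layout seen on cmsstor112.
clear_rocks_release() {
  local f
  for f in /storage/data*/.rocks-release; do
    [[ -e $f ]] || continue
    if has_immutable_flag "$(lsattr -d "$f")"; then
      chattr -i "$f"
    fi
    rm -f "$f"
  done
}
```

Usage (as root): run clear_rocks_release, then `for d in /storage/data*; do dir_is_empty "$d" || echo "$d is not empty"; done` to confirm nothing is left behind.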

3) Unmount, relabel, and remount the disks:
[root@cmsstor112 ~]# umount /dev/sdc;
[root@cmsstor112 ~]# umount /dev/sdd;
[root@cmsstor112 ~]# xfs_admin -L dcache-disk1 /dev/sdb;
xfs_admin: /dev/sdb is not a valid XFS filesystem (unexpected SB magic number 0xfab80010)
[root@cmsstor112 ~]# xfs_admin -L dcache-disk2 /dev/sdc;
writing all SBs
new label = "dcache-disk2"
[root@cmsstor112 ~]# xfs_admin -L dcache-disk3 /dev/sdd;
writing all SBs
new label = "dcache-disk3"
[root@cmsstor112 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 75758688 10640604 61263032 15% /
tmpfs 3997988 0 3997988 0% /dev/shm
/dev/sda1 999320 94196 852696 10% /boot
/dev/sde 11720232448 33824 11720198624 1% /storage/data3
[root@cmsstor112 ~]# mount -a
mount: special device LABEL=dcache-tape1 does not exist
mount: special device LABEL=dcache-tape2 does not exist

[root@cmsstor112 ~]# fdisk -l

Disk /dev/sdb: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f3c35

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1 1 2433 19543041 82 Linux swap / Solaris
/dev/sdb2 2434 4522 16777216 82 Linux swap / Solaris

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003172a

Device Boot      Start         End      Blocks   Id  System
/dev/sda1 * 1 131 1048576 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 131 9730 77101056 83 Linux

Disk /dev/sdc: 12001.7 GB, 12001652244480 bytes
255 heads, 63 sectors/track, 1459117 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 402653184 bytes
Disk identifier: 0x00000000

Disk /dev/sdd: 12001.7 GB, 12001652244480 bytes
255 heads, 63 sectors/track, 1459117 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 402653184 bytes
Disk identifier: 0x00000000

Disk /dev/sde: 12001.7 GB, 12001652244480 bytes
255 heads, 63 sectors/track, 1459117 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 402653184 bytes
Disk identifier: 0x00000000

[root@cmsstor112 ~]# cat /etc/fstab
# HEADER: This file was autogenerated at Thu Mar 05 20:13:05 -0600 2015
# HEADER: by puppet. While it can still be managed manually, it
# HEADER: is definitely not recommended.
#
# /etc/fstab
# Created by anaconda on Wed May 21 20:46:21 2014
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=bfe71ea2-4d32-4fd6-a2be-6018b3d2ae1a / ext4 defaults 1 1
UUID=3d95f9e3-79d4-4ad7-9f91-6b568770ed9b /boot ext4 defaults 1 2
UUID=23da0f9b-1d66-4a40-9da8-88fec3cfeb6a swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
LABEL=dcache-tape3 /storage/data3 xfs nobarrier,inode64,defaults 1 1
LABEL=dcache-tape1 /storage/data1 xfs nobarrier,inode64,defaults 1 1
LABEL=dcache-tape2 /storage/data2 xfs nobarrier,inode64,defaults 1 1
[root@cmsstor112 ~]#

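This node appears to have a non-standard disk layout: /etc/fstab still mounts the data filesystems by the tape labels dcache-tape1..3, and the 12 TB data disks are /dev/sdc, /dev/sdd and /dev/sde, while /dev/sdb is an 80 GB disk holding only swap partitions.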
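The mount -a errors in this log come from fstab entries that still reference labels no disk carries any more. A minimal sketch of a check for that, assuming bash and awk: it lists the LABEL= entries of an fstab read on stdin and reports every mount point whose label is missing from a supplied list of existing labels. The function names fstab_labels and missing_fstab_labels are hypothetical; on a real host the list of existing labels would have to be collected separately, for example with xfs_admin -l as the loop in this log does.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Print "label mountpoint" for every LABEL= entry read from stdin (fstab format).
# Comment lines never match because their first field starts with '#'.
fstab_labels() {
  awk '$1 ~ /^LABEL=/ { sub(/^LABEL=/, "", $1); print $1, $2 }'
}

# Read an fstab on stdin; print each mount point whose label is not in the
# space-separated list of existing labels passed as $1.
missing_fstab_labels() {
  local present=" $1 " label mnt
  fstab_labels | while read -r label mnt; do
    [[ $present == *" $label "* ]] || echo "$mnt (LABEL=$label)"
  done
}
```

For example, `missing_fstab_labels "dcache-disk1 dcache-disk2 dcache-disk3" < /etc/fstab` run against the fstab shown above would report all three /storage/data* mount points, since every one of them still uses a dcache-tape label.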

[root@cmsstor112 ~]# for d in `fdisk -l | tr " " "\n" | grep /dev/ | grep : | tr -d :`; do echo list labels for $d: ; xfs_admin -l $d; done
list labels for /dev/sdb:
xfs_admin: /dev/sdb is not a valid XFS filesystem (unexpected SB magic number 0xfab80010)
list labels for /dev/sda:
xfs_admin: /dev/sda is not a valid XFS filesystem (unexpected SB magic number 0xeb489010)
list labels for /dev/sdc:
label = "dcache-disk2"
list labels for /dev/sdd:
label = "dcache-disk3"
list labels for /dev/sde:
label = "dcache-tape3"
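The one-liner above finds disks by picking every token of the fdisk -l output that contains both "/dev/" and ":". A slightly more explicit variant, sketched here under the assumption of bash and awk, takes the device path only from the "Disk /dev/...:" header lines. The names fdisk_disks and list_xfs_labels are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Read `fdisk -l` output on stdin; print one device path per "Disk /dev/...:" header.
fdisk_disks() {
  awk '$1 == "Disk" && $2 ~ /^\/dev\// { sub(/:$/, "", $2); print $2 }'
}

# Print the XFS label (or xfs_admin's error) for every disk fdisk reports. Requires root.
list_xfs_labels() {
  local d
  for d in $(fdisk -l 2>/dev/null | fdisk_disks); do
    echo "list labels for $d:"
    xfs_admin -l "$d"
  done
}
```

Lines such as "Disk identifier: 0x000f3c35" and the partition rows are skipped, because their second field does not start with /dev/.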

Redid the whole relabeling step manually:

[root@cmsstor112 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 75758688 10640664 61262972 15% /
tmpfs 3997988 0 3997988 0% /dev/shm
/dev/sda1 999320 94196 852696 10% /boot
/dev/sde 11720232448 33824 11720198624 1% /storage/data3
[root@cmsstor112 ~]# umount /storage/data3
[root@cmsstor112 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 75758688 10640676 61262960 15% /
tmpfs 3997988 0 3997988 0% /dev/shm
/dev/sda1 999320 94196 852696 10% /boot
[root@cmsstor112 ~]# xfs_admin -L dcache-disk1 /dev/sdc
writing all SBs
new label = "dcache-disk1"
[root@cmsstor112 ~]# xfs_admin -L dcache-disk2 /dev/sdd
writing all SBs
new label = "dcache-disk2"
[root@cmsstor112 ~]# xfs_admin -L dcache-disk3 /dev/sde
writing all SBs
new label = "dcache-disk3"
[root@cmsstor112 ~]# for d in `fdisk -l | tr " " "\n" | grep /dev/ | grep : | tr -d :`; do echo list labels for $d: ; xfs_admin -l $d; done
list labels for /dev/sdb:
xfs_admin: /dev/sdb is not a valid XFS filesystem (unexpected SB magic number 0xfab80010)
list labels for /dev/sda:
xfs_admin: /dev/sda is not a valid XFS filesystem (unexpected SB magic number 0xeb489010)
list labels for /dev/sdc:
label = "dcache-disk1"
list labels for /dev/sdd:
label = "dcache-disk2"
list labels for /dev/sde:
label = "dcache-disk3"
[root@cmsstor112 ~]#
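Because the first relabeling pass targeted the wrong devices (/dev/sdb is the swap disk on this node), it may help to compare the labels against an explicit expected mapping before remounting. A minimal sketch, assuming bash 4 (for associative arrays) and the mapping sdc to dcache-disk1, sdd to dcache-disk2, sde to dcache-disk3 taken from this log; xfs_label_from_output and check_labels are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the expected mapping are assumptions from this log.

# Extract the label from `xfs_admin -l` output such as: label = "dcache-disk1"
# Prints nothing if the line is an error message instead.
xfs_label_from_output() {
  sed -n 's/^label = "\(.*\)"$/\1/p'
}

# Compare each device's XFS label with the expected one; return 1 on any mismatch.
# Requires root and xfs_admin.
check_labels() {
  local -A expected=([/dev/sdc]=dcache-disk1 [/dev/sdd]=dcache-disk2 [/dev/sde]=dcache-disk3)
  local dev actual rc=0
  for dev in "${!expected[@]}"; do
    actual=$(xfs_admin -l "$dev" 2>/dev/null | xfs_label_from_output)
    if [[ $actual != "${expected[$dev]}" ]]; then
      echo "$dev: expected ${expected[$dev]}, found '$actual'"
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_labels before mount -a would have flagged the first pass, where /dev/sdc carried dcache-disk2 and /dev/sde still carried dcache-tape3.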

4) Nothing to be done: the labels already correspond to those in
./hieradata/dcachepool.yaml

5)
[cmsdev33 12:39] git show
commit 94dd5268646272217e2f762afde31b1ee0007c95
Author: Natalia Ratnikova <>
Date: Mon Aug 3 16:32:32 2015 -0500

Adding cmsstor112 to dcache disk testbed (redmine #8957)

diff --git a/hosts/cmsstor112.fnal.gov.yaml b/hosts/cmsstor112.fnal.gov.yaml
index 7decb15..1e4f76e 100644
--- a/hosts/cmsstor112.fnal.gov.yaml
+++ b/hosts/cmsstor112.fnal.gov.yaml
@@ -1,8 +1,4 @@
classes:
- role::unconfigured:
role::dcache::pool::disk_itb:
ipmi::disabled:
-

parameters:
checkmk_extra:
- - unmonitored
[cmsdev33 12:39] git push

6)

[root@cmsadmin1 ~]# cms-shoot cmsstor112
removing host from rocks on cmsrocks51, if necessary
cmsstor24.fnal.gov: no host cmsstor112 to remove
Connection to cmsrocks51 closed.
removing host from rocks on cmsrocks52, if necessary
cmssrv26.fnal.gov: no host cmsstor112 to remove
Connection to cmsrocks52 closed.
stopping puppet on cmsstor112, if applicable
telling host to netboot on next boot
cmsstor112: netboot -> True
set 1 hosts to boot
1 system(s) updated
telling cmspuppetca to remove host's cert, if present
telling cmspuppetca to update autosign information
when you're ready to start, run:
cmspower-powerit --action cycle --comment 'reinstalling' cmsstor112
don't forget to disable zabbix monitoring if applicable
[root@cmsadmin1 ~]#

Set the "not monitored" status in Zabbix for cmsstor112.

The power cycle did not start the reinstall: the node tried to reboot and entered maintenance mode because of the disk label mismatch (it was still looking for disks with the tape-instance labels).

Retried a few times, with the same result.

Powering the node off until Tim or someone else in ECF can look into this.

[root@cmsconsole ~]# cmspower-powerit --action off -c "Reshoot does not initiate reinstall. Power off for now. NR." cmsstor112
=== cmsstor112 ===
connecting to APC APCCMS1060-2, outlet 6
Outlet state: OFF
[root@cmsconsole ~]# cmspower-powerit --action status cmsstor112
=== cmsstor112 ===
connecting to APC APCCMS1060-2, outlet 6
Outlet state: OFF

#5 Updated by Natalia Ratnikova over 4 years ago

The node has been added successfully.
However, the downtime reboot revealed a problem with the mounts in the implementation of the roles for the dCache itb servers, and until that is fixed, the testbed is unusable.

So, although almost all of the work here is done, I am putting this ticket on hold until I can test it properly.


