Bug #10764

Investigation on odd configuration in some of cmsstor411 - cmsstor420

Added by Chih-Hao Huang about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Start date: 11/03/2015
Due date: 11/10/2015
% Done: 100%
Estimated time: 8.00 h
Spent time:
component: base
First Occurred:
Occurs In:
Stakeholders:
Co-Assignees:
Duration: 8

Description

This is a placeholder to record the investigation of strange configurations on some of the pools among cmsstor411 - cmsstor419.

cmsstor411 - cmsstor420 are new pool nodes recently added to the dCache disk instance.
Some of the pools, such as those on cmsstor419, show a maximum of 104 movers, whereas the number should be 3000, as it is for the pools on cmsstor420.
The setup files are exactly the same.

Screen Shot 2015-11-03 at 11.08.41 PM.png (382 KB), Pool Request Queues, Chih-Hao Huang, 11/03/2015 11:21 PM
Screen Shot 2015-11-03 at 11.42.25 PM.png (382 KB), Chih-Hao Huang, 11/03/2015 11:43 PM

History

#2 Updated by Chih-Hao Huang about 4 years ago

  • % Done changed from 0 to 10

Pool info does show differences between w-cmsstor420-disk-disk1 and w-cmsstor419-disk-disk1.

11:20pm cmsadmin1.fnal.gov:~> dcacheadmin-disk
Warning: Remote host denied authentication agent forwarding.

dCache Admin (VII) (user=admin)

[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor420-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk1) admin > info -a
--- csm (Checksum module) ---
Version : $Id$
Checksum type : ADLER32
Checkum calculation on : write enforceCRC

--- flush (Controller for centralising flushing) ---
Flushing Interval /seconds : 60
Maximum classes flushing : 1000
Minimum flush delay on error : 60
Remote controlled (hold until) : Locally Controlled

--- storagehandler (HSM integration module) ---
Version : $Id$
Restore Timeout : 172800
Store Timeout : 172800
Remove Timeout : 14400
Job Queues
to store 0(100)/0
from store 0(300)/0
delete (1)/

--- jtm (Job timeout manager) ---
Job Timeout Manager
regular (lastAccess=0;total=0)
p2p (lastAccess=0;total=0)
default (lastAccess=0;total=0)
WAN (lastAccess=0;total=0)
io (lastAccess=0;total=0)
io-0 (lastAccess=0;total=0)
io-1 (lastAccess=0;total=0)

--- pool (Main pool component) ---
Base directory : /storage/data1/write-pool/
Version : 2.2.29(2015-08-18_11-05_litvinse) (Sub=4)
Gap : 4294967296
Report remove : on
Pool Mode : enabled
Clean prec. files : on
Hsm Load Suppr. : on
Ping Heartbeat : 30 seconds
ReplicationMgr : Disabled
LargeFileStore : Precious
DuplicateRequests : Ignored
P2P File Mode : Cached
Mover Queue (regular) 2(1000)/0
Mover Queue (p2p) 0(50)/0
Mover Queue (default) 0(1000)/0
Mover Queue (WAN) 0(1000)/0

--- queue (HSM flush queue manager) ---
Version : $Id$
Classes : 0
Requests : 0

--- migration (Replica migration module client) ---

--- migration-server (Replica migration module backend) ---

--- pp (Pool to pool transfer manager) ---
Interface : cmsstor420.fnal.gov/131.225.205.122
Max Active : 50
Pnfs Timeout : 300 seconds

--- rep (Repository manager) ---
State : OPEN
Check Repository : true
Diskspace usage :
Total : 77215194343830
Used : 2448175983379 [0.031705886]
Free : 74767018360451
Precious : 0 [0.0]
Removable: 0 [0.0]
File system
Size: 77219583098880
Free: 74774237880320 [0.9683326]
Limits for maximum disk space
File system : 77222413863699
Statically configured: Infinity
Runtime configured : Infinity

[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor419-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > info -a
--- csm (Checksum module) ---
Version : $Id$
Checksum type : ADLER32
Checkum calculation on : write enforceCRC

--- flush (Controller for centralising flushing) ---
Flushing Interval /seconds : 60
Maximum classes flushing : 1000
Minimum flush delay on error : 60
Remote controlled (hold until) : Locally Controlled

--- storagehandler (HSM integration module) ---
Version : $Id$
Restore Timeout : 14400
Store Timeout : 14400
Remove Timeout : 14400
Job Queues
to store 0(0)/0
from store 0(0)/0
delete (1)/

--- jtm (Job timeout manager) ---
Job Timeout Manager
regular (lastAccess=0;total=0)
p2p (lastAccess=0;total=0)
default (lastAccess=0;total=0)
WAN (lastAccess=0;total=0)
io (lastAccess=0;total=0)

--- pool (Main pool component) ---
Base directory : /storage/data1/write-pool/
Version : 2.2.29(2015-08-18_11-05_litvinse) (Sub=4)
Gap : 4294967296
Report remove : on
Pool Mode : enabled
Clean prec. files : on
Hsm Load Suppr. : off
Ping Heartbeat : 30 seconds
ReplicationMgr : Disabled
LargeFileStore : Precious
DuplicateRequests : None
P2P File Mode : Cached
Mover Queue (regular) 0(100)/0
Mover Queue (p2p) 0(10)/0
Mover Queue (default) 0(2)/0
Mover Queue (WAN) 0(2)/0

--- queue (HSM flush queue manager) ---
Version : $Id$
Classes : 0
Requests : 0

--- migration (Replica migration module client) ---

--- migration-server (Replica migration module backend) ---

--- pp (Pool to pool transfer manager) ---
Interface : cmsstor419.fnal.gov/131.225.205.120
Max Active : 10
Pnfs Timeout : 300 seconds

--- rep (Repository manager) ---
State : OPEN
Check Repository : true
Diskspace usage :
Total : 77201506889058
Used : 2826789551705 [0.03661573]
Free : 74374717337353
Precious : 0 [0.0]
Removable: 0 [0.0]
File system
Size: 77219583098880
Free: 74395356332032 [0.96342605]
Limits for maximum disk space
File system : 77222145883737
Statically configured: Infinity
Runtime configured : Infinity

[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin >

The main pool component sections differ in their mover queue limits.
Though it might not be relevant, why do the HSM (storagehandler) sections differ as well?
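
The 104 and 3000 figures from the description can be reproduced from the mover queue lines above. The following is a minimal sketch, assuming the reported maximum is the sum of the limits in parentheses for every queue except p2p. That is an inference from the numbers matching, not something the pool output states, and `sum_mover_limits` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sum the configured mover limits (the value in parentheses) from
# "Mover Queue" lines read on stdin, skipping the p2p queue.
# ASSUMPTION: the "max movers" figure excludes p2p; inferred from the numbers.
sum_mover_limits() {
  awk '/^Mover Queue/ && $3 != "(p2p)" {
         # Field 4 looks like "0(100)/0"; a[2] is the limit in parentheses.
         split($4, a, "[()]")
         total += a[2]
       }
       END { print total + 0 }'
}

# Mover queue lines copied from the info -a output of the two pools.
bad_pool="Mover Queue (regular) 0(100)/0
Mover Queue (p2p) 0(10)/0
Mover Queue (default) 0(2)/0
Mover Queue (WAN) 0(2)/0"

good_pool="Mover Queue (regular) 2(1000)/0
Mover Queue (p2p) 0(50)/0
Mover Queue (default) 0(1000)/0
Mover Queue (WAN) 0(1000)/0"

printf 'w-cmsstor419-disk-disk1: %s\n' "$(printf '%s\n' "$bad_pool" | sum_mover_limits)"
printf 'w-cmsstor420-disk-disk1: %s\n' "$(printf '%s\n' "$good_pool" | sum_mover_limits)"
```

With the lines above this gives 104 for the cmsstor419 pool (100 + 2 + 2) and 3000 for the cmsstor420 pool (1000 + 1000 + 1000).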

#3 Updated by Chih-Hao Huang about 4 years ago

Running reload -yes does fix it (see the screenshot, too).

[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > info -a
--- csm (Checksum module) ---
Version : $Id$
Checksum type : ADLER32
Checkum calculation on : write enforceCRC

--- flush (Controller for centralising flushing) ---
Flushing Interval /seconds : 60
Maximum classes flushing : 1000
Minimum flush delay on error : 60
Remote controlled (hold until) : Locally Controlled

--- storagehandler (HSM integration module) ---
Version : $Id$
Restore Timeout : 14400
Store Timeout : 14400
Remove Timeout : 14400
Job Queues
to store 0(0)/0
from store 0(0)/0
delete (1)/

--- jtm (Job timeout manager) ---
Job Timeout Manager
regular (lastAccess=0;total=0)
p2p (lastAccess=0;total=0)
default (lastAccess=0;total=0)
WAN (lastAccess=0;total=0)
io (lastAccess=0;total=0)

--- pool (Main pool component) ---
Base directory : /storage/data1/write-pool/
Version : 2.2.29(2015-08-18_11-05_litvinse) (Sub=4)
Gap : 4294967296
Report remove : on
Pool Mode : enabled
Clean prec. files : on
Hsm Load Suppr. : off
Ping Heartbeat : 30 seconds
ReplicationMgr : Disabled
LargeFileStore : Precious
DuplicateRequests : None
P2P File Mode : Cached
Mover Queue (regular) 2(100)/0
Mover Queue (p2p) 0(10)/0
Mover Queue (default) 0(2)/0
Mover Queue (WAN) 0(2)/0

--- queue (HSM flush queue manager) ---
Version : $Id$
Classes : 0
Requests : 0

--- migration (Replica migration module client) ---

--- migration-server (Replica migration module backend) ---

--- pp (Pool to pool transfer manager) ---
Interface : cmsstor419.fnal.gov/131.225.205.120
Max Active : 10
Pnfs Timeout : 300 seconds

--- rep (Repository manager) ---
State : OPEN
Check Repository : true
Diskspace usage :
Total : 77201506889058
Used : 2970107812777 [0.038472146]
Free : 74231399076281
Precious : 0 [0.0]
Removable: 0 [0.0]
File system
Size: 77219583098880
Free: 74248880939008 [0.9615291]
Limits for maximum disk space
File system : 77218988751785
Statically configured: Infinity
Runtime configured : Infinity

[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > info -a
--- csm (Checksum module) ---
Version : $Id$
Checksum type : ADLER32
Checkum calculation on : write enforceCRC

--- flush (Controller for centralising flushing) ---
Flushing Interval /seconds : 60
Maximum classes flushing : 1000
Minimum flush delay on error : 60
Remote controlled (hold until) : Locally Controlled

--- storagehandler (HSM integration module) ---
Version : $Id$
Restore Timeout : 172800
Store Timeout : 172800
Remove Timeout : 14400
Job Queues
to store 0(100)/0
from store 0(300)/0
delete (1)/

--- jtm (Job timeout manager) ---
Job Timeout Manager
regular (lastAccess=0;total=0)
p2p (lastAccess=0;total=0)
default (lastAccess=0;total=0)
WAN (lastAccess=0;total=0)
io (lastAccess=0;total=0)
io-0 (lastAccess=0;total=0)
io-1 (lastAccess=0;total=0)

--- pool (Main pool component) ---
Base directory : /storage/data1/write-pool/
Version : 2.2.29(2015-08-18_11-05_litvinse) (Sub=4)
Gap : 4294967296
Report remove : on
Pool Mode : enabled
Clean prec. files : on
Hsm Load Suppr. : on
Ping Heartbeat : 30 seconds
ReplicationMgr : Disabled
LargeFileStore : Precious
DuplicateRequests : Ignored
P2P File Mode : Cached
Mover Queue (regular) 2(1000)/0
Mover Queue (p2p) 0(50)/0
Mover Queue (default) 0(1000)/0
Mover Queue (WAN) 0(1000)/0

--- queue (HSM flush queue manager) ---
Version : $Id$
Classes : 0
Requests : 0

--- migration (Replica migration module client) ---

--- migration-server (Replica migration module backend) ---

--- pp (Pool to pool transfer manager) ---
Interface : cmsstor419.fnal.gov/131.225.205.120
Max Active : 50
Pnfs Timeout : 300 seconds

--- rep (Repository manager) ---
State : OPEN
Check Repository : true
Diskspace usage :
Total : 77201506889058
Used : 2970107812777 [0.038472146]
Free : 74231399076281
Precious : 0 [0.0]
Removable: 0 [0.0]
File system
Size: 77219583098880
Free: 74248880939008 [0.9615291]
Limits for maximum disk space
File system : 77218988751785
Statically configured: Infinity
Runtime configured : Infinity

[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin >
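
To see which pools still carry the small limits, a captured admin session could be scanned for pools whose regular mover queue limit is not the expected 1000. This is only a sketch, assuming the prompt and `Mover Queue` line formats shown in the transcripts above. `find_misconfigured_pools` is a hypothetical helper, and the sample session is abbreviated to the lines the filter looks at.

```shell
#!/usr/bin/env bash
# Print "<pool> <limit>" for every pool whose regular mover queue limit
# differs from the expected value. Reads an admin-shell transcript on stdin.
find_misconfigured_pools() {
  local expected="$1"
  awk -v expected="$expected" '
    # Prompt lines look like: [host] (pool-name) admin > command
    /^\[.*\] \(.*\) admin >/ {
      pool = $2
      gsub(/[()]/, "", pool)
    }
    # Field 4 looks like "0(100)/0"; a[2] is the configured limit.
    /^Mover Queue \(regular\)/ {
      split($4, a, "[()]")
      if (a[2] != expected) print pool, a[2]
    }'
}

# Abbreviated session: prompts and regular queue lines from the output above.
session="[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk1) admin > info -a
Mover Queue (regular) 2(1000)/0
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > info -a
Mover Queue (regular) 0(100)/0"

printf '%s\n' "$session" | find_misconfigured_pools 1000
```

For the abbreviated session this reports only w-cmsstor419-disk-disk1 with its limit of 100.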

#4 Updated by Chih-Hao Huang about 4 years ago

  • % Done changed from 20 to 30

Indeed, dCache was started before the setup file was in place.

[root@cmsstor419 ~]# egrep "setup|qla2xx|running" /var/log/puppet/puppet.log
2015-11-02T16:07:09.058882-06:00 cmsstor419 puppet-agent14564: (/Stage[main]/Rsyslog::Service/Service[rsyslog]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:08:04.503181-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:08:04.503874-06:00 cmsstor419 puppet-agent19253: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:08:04.505962-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) change from notrun to 0 failed: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:08:05.127409-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Service[rpcbind]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:08:55.800025-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/P_ipmi/Service[ipmi]/ensure) change from stopped to running failed: Could not start Service[ipmi]: Execution of '/sbin/service ipmi start' returned 1: Starting ipmi drivers: [FAILED]
2015-11-02T16:08:56.929071-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Firewall::Linux::Redhat/Service[iptables]/ensure) change from stopped to running failed: Could not start Service[iptables]: Execution of '/sbin/service iptables start' returned 1: iptables: Applying firewall rules: FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:08:59.818836-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is already running"
2015-11-02T16:08:59.819515-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is not running"
2015-11-02T16:08:59.819696-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is running"
2015-11-02T16:08:59.819878-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is not running"
2015-11-02T16:08:59.820688-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) - echo 'Swatch is already running.'
2015-11-02T16:08:59.821084-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) - echo 'No swatch running.'
2015-11-02T16:09:14.247813-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Swatch::Service/Service[swatch]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:09:14.326525-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Mount[/storage/data1]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.328238-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.330157-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/Mount[/storage/data2]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.331828-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.333693-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache/Service[dcache-server]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.335419-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.336073-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:09:14.338221-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:09:14.338921-06:00 cmsstor419 puppet-agent19253: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:10:15.470813-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:10:15.471488-06:00 cmsstor419 puppet-agent20263: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:10:15.473477-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) change from notrun to 0 failed: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:11:44.623849-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/P_ipmi/Service[ipmi]/ensure) change from stopped to running failed: Could not start Service[ipmi]: Execution of '/sbin/service ipmi start' returned 1: Starting ipmi drivers: [FAILED]
2015-11-02T16:11:45.965252-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Firewall::Linux::Redhat/Service[iptables]/ensure) change from stopped to running failed: Could not start Service[iptables]: Execution of '/sbin/service iptables start' returned 1: iptables: Applying firewall rules: FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:11:48.505765-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Mount[/storage/data1]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.507411-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.509249-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/Mount[/storage/data2]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.510839-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.512575-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache/Service[dcache-server]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.514168-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.514749-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:11:48.516814-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:48.517390-06:00 cmsstor419 puppet-agent20263: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:17:01.848996-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/P_ipmi/Service[ipmi]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:02.173435-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/P_ipmi/Service[ipmievd]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:21.708256-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/Dcache/Service[dcache-server]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:32.423789-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]/ensure) defined content as '{md5}9951019c983daffad37d80b5dda3da35'
2015-11-02T16:17:33.155720-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]/ensure) defined content as '{md5}9951019c983daffad37d80b5dda3da35'
[root@cmsstor419 ~]#

By comparison, on cmsstor420 dCache was started at the same time as the setup file was put in place.

[root@cmsstor420 dcache]# grep qla2xxx /var/log/puppet/puppet.log
2015-11-02T16:09:19.928479-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:09:19.929017-06:00 cmsstor420 puppet-agent19469: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:09:19.930280-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) change from notrun to 0 failed: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:10:19.166989-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Mount[/storage/data1]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.168745-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.170673-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/Mount[/storage/data2]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.172382-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.174335-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache/Service[dcache-server]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.175987-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.178749-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:11:13.919620-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:11:13.920173-06:00 cmsstor420 puppet-agent20477: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:11:13.921460-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) change from notrun to 0 failed: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:12:02.586176-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Mount[/storage/data1]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.587884-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.589820-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/Mount[/storage/data2]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.591489-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.593376-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache/Service[dcache-server]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.595011-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.597783-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
[root@cmsstor420 dcache]# egrep "setup|qla2xx|running" /var/log/puppet/puppet.log
2015-11-02T16:08:39.571608-06:00 cmsstor420 puppet-agent14566: (/Stage[main]/Rsyslog::Service/Service[rsyslog]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:09:19.928479-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:09:19.929017-06:00 cmsstor420 puppet-agent19469: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:09:19.930280-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) change from notrun to 0 failed: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:09:23.000184-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Service[rpcbind]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:09:59.176068-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/P_ipmi/Service[ipmi]/ensure) change from stopped to running failed: Could not start Service[ipmi]: Execution of '/sbin/service ipmi start' returned 1: Starting ipmi drivers: [FAILED]
2015-11-02T16:10:00.607262-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Firewall::Linux::Redhat/Service[iptables]/ensure) change from stopped to running failed: Could not start Service[iptables]: Execution of '/sbin/service iptables start' returned 1: iptables: Applying firewall rules: FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:10:03.290290-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is already running"
2015-11-02T16:10:03.290796-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is not running"
2015-11-02T16:10:03.290925-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is running"
2015-11-02T16:10:03.291053-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) + echo "swatch is not running"
2015-11-02T16:10:03.291738-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) - echo 'Swatch is already running.'
2015-11-02T16:10:03.292031-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Config/File[/etc/rc.d/init.d/swatch]/content) - echo 'No swatch running.'
2015-11-02T16:10:18.974273-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Swatch::Service/Service[swatch]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:10:19.166989-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Mount[/storage/data1]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.168745-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.170673-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/Mount[/storage/data2]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.172382-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.174335-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache/Service[dcache-server]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.175987-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.176503-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:10:19.178749-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:10:19.179289-06:00 cmsstor420 puppet-agent19469: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:11:13.919620-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:11:13.920173-06:00 cmsstor420 puppet-agent20477: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:11:13.921460-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Kernel::Module[qla2xxx]/Exec[/sbin/modprobe qla2xxx]/returns) change from notrun to 0 failed: /sbin/modprobe qla2xxx returned 1 instead of one of [0]
2015-11-02T16:11:58.996432-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/P_ipmi/Service[ipmi]/ensure) change from stopped to running failed: Could not start Service[ipmi]: Execution of '/sbin/service ipmi start' returned 1: Starting ipmi drivers: [FAILED]
2015-11-02T16:12:00.162631-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Firewall::Linux::Redhat/Service[iptables]/ensure) change from stopped to running failed: Could not start Service[iptables]: Execution of '/sbin/service iptables start' returned 1: iptables: Applying firewall rules: FATAL: Could not load /lib/modules/2.6.32-504.el6.x86_64/modules.dep: No such file or directory
2015-11-02T16:12:02.586176-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/Mount[/storage/data1]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.587884-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.589820-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/Mount[/storage/data2]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.591489-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.593376-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache/Service[dcache-server]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.595011-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.595452-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:12:02.597783-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Dependency Exec[/sbin/modprobe qla2xxx] has failures: true
2015-11-02T16:12:02.598252-06:00 cmsstor420 puppet-agent20477: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]) Skipping because of failed dependencies
2015-11-02T16:17:05.729219-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/P_ipmi/Service[ipmi]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:06.646245-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/P_ipmi/Service[ipmievd]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:51.559168-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/Dcache/Service[dcache-server]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:51.742095-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]/ensure) defined content as '{md5}9951019c983daffad37d80b5dda3da35'
2015-11-02T16:17:51.970882-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk1]/File[/storage/data1/write-pool/setup]/ensure) defined content as '{md5}9951019c983daffad37d80b5dda3da35'
[root@cmsstor420 dcache]#
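
The difference between the two hosts can be quantified from the timestamps of the dcache-server start and the first setup file write. This is a minimal sketch, assuming the puppet log format shown above and that both events fall on the same day. `start_to_setup_gap` is a hypothetical helper, and the logged time is when puppet reported the service as running, not necessarily when the pool read its setup file.

```shell
#!/usr/bin/env bash
# Print the number of seconds between puppet starting dcache-server and
# puppet writing the first pool setup file. Reads puppet.log lines on stdin.
start_to_setup_gap() {
  awk '
    # Convert the HH:MM:SS.ffffff part of an ISO 8601 timestamp to seconds.
    function secs(ts,   p) {
      split(substr(ts, 12, 15), p, ":")
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    /Service\[dcache-server\]\/ensure\)/ && !have_start {
      start = secs($1); have_start = 1
    }
    /write-pool\/setup\]\/ensure\) defined content/ && !have_setup {
      setup = secs($1); have_setup = 1
    }
    END {
      if (!have_start || !have_setup) { print "incomplete"; exit }
      printf "%.3f\n", setup - start
    }'
}

# Relevant lines copied from the puppet logs of the two hosts.
log419="2015-11-02T16:17:21.708256-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/Dcache/Service[dcache-server]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:32.423789-06:00 cmsstor419 puppet-agent7542: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]/ensure) defined content as '{md5}9951019c983daffad37d80b5dda3da35'"

log420="2015-11-02T16:17:51.559168-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/Dcache/Service[dcache-server]/ensure) ensure changed 'stopped' to 'running'
2015-11-02T16:17:51.742095-06:00 cmsstor420 puppet-agent7546: (/Stage[main]/Dcache::Pool/Dcache::Mount[disk2]/File[/storage/data2/write-pool/setup]/ensure) defined content as '{md5}9951019c983daffad37d80b5dda3da35'"

printf 'cmsstor419 gap: %s s\n' "$(printf '%s\n' "$log419" | start_to_setup_gap)"
printf 'cmsstor420 gap: %s s\n' "$(printf '%s\n' "$log420" | start_to_setup_gap)"
```

On cmsstor419 the setup file arrived roughly 10.7 seconds after the service start was logged, while on cmsstor420 the gap was under 0.2 seconds. That is consistent with the cmsstor419 pools having come up without their setup file.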

#5 Updated by Chih-Hao Huang about 4 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 30 to 100

Although the good pools do not need it, all of the new pools were reloaded for convenience.
Now it is all fixed.

bash-4.1$ for i in `cat newpools`; do for j in 1 2; do printf "cd w-$i-disk-disk$j\nreload -yes\n..\n"; done; done | dcacheadmin-disk
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Remote host denied authentication agent forwarding.

dCache Admin (VII) (user=admin)

[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor411-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor411-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor411-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor411-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor411-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor411-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor412-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor412-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor412-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor412-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor412-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor412-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor413-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor413-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor413-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor413-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor413-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor413-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor414-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor414-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor414-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor414-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor414-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor414-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor415-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor415-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor415-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor415-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor415-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor415-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor416-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor416-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor416-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor416-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor416-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor416-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor417-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor417-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor417-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor417-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor417-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor417-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor418-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor418-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor418-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor418-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor418-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor418-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor419-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor419-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor419-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor420-disk-disk1
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk1) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk1) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > cd w-cmsstor420-disk-disk2
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk2) admin > reload -yes
[cmsdcacheadmindisk.fnal.gov] (w-cmsstor420-disk-disk2) admin > ..
[cmsdcacheadmindisk.fnal.gov] (local) admin > bash-4.1$
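
For reference, the one-liner above can be written a bit more defensively. This is a minimal sketch, not what was actually run: the function name gen_reload_cmds is hypothetical, and it assumes that newpools holds one node name per line and that every node has exactly two pools (disk1 and disk2), as in the session above. It passes the node name to printf as an argument rather than embedding it in the format string, and it skips blank lines in the list.

```shell
#!/bin/bash
# Hypothetical helper (not part of dCache): print the admin-shell commands
# that reload the setup of both pools on every node listed in a file.
# Assumption: the file contains one node name per line, e.g. cmsstor411.
gen_reload_cmds() {
    local list="$1" node disk
    while IFS= read -r node; do
        # Skip empty lines so no malformed "cd w--disk-diskN" is emitted.
        [ -n "$node" ] || continue
        for disk in 1 2; do
            # Node and disk number are printf arguments, not part of the format.
            printf 'cd w-%s-disk-disk%s\nreload -yes\n..\n' "$node" "$disk"
        done
    done < "$list"
}
```

It would be used the same way as the original loop: gen_reload_cmds newpools | dcacheadmin-disk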

#6 Updated by Chih-Hao Huang about 4 years ago

A fix, huangch_fix_dcache_setup_file_dependency, has been committed to itb.


