Bug #9230

Update EOS client from 0.3.95 to 0.3.118

Added by Gerard Bernabeu Altayo over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Start date: 06/17/2015
Due date:
% Done: 0%
Estimated time:
Duration:

Description

The EOS FUSE client 0.3.95 is not very stable on the WN and LPC nodes: sometimes it hangs, and at other times it has caching issues.

I want to track the update and its testing in this ticket.

History

#1 Updated by Gerard Bernabeu Altayo over 4 years ago

I updated cmslpc37, where I had to restart EOS anyway (the update restarts the daemon).

[root@cmslpc37 ~]# yum update --exclude=kernel*
Loaded plugins: priorities, security
Setting up Update Process
7743 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package eos-client.x86_64 0:0.3.95-aquamarine.slc6 will be updated
---> Package eos-client.x86_64 0:0.3.118-aquamarine.slc6 will be an update
---> Package eos-fuse.x86_64 0:0.3.95-aquamarine.slc6 will be updated
---> Package eos-fuse.x86_64 0:0.3.118-aquamarine.slc6 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

=====================================================================================================================================================================================
 Package        Arch      Version                    Repository    Size
=====================================================================================================================================================================================
Updating:
 eos-client     x86_64    0.3.118-aquamarine.slc6    eos_client    613 k
 eos-fuse       x86_64    0.3.118-aquamarine.slc6    eos_client    416 k

Transaction Summary
=====================================================================================================================================================================================
Upgrade 2 Package(s)

Total download size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): eos-client-0.3.118-aquamarine.slc6.x86_64.rpm | 613 kB 00:01
(2/2): eos-fuse-0.3.118-aquamarine.slc6.x86_64.rpm | 416 kB 00:01
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 270 kB/s | 1.0 MB 00:03
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : eos-client-0.3.118-aquamarine.slc6.x86_64 1/4
Updating : eos-fuse-0.3.118-aquamarine.slc6.x86_64 2/4
Starting conditional EOS services
Cleanup : eos-fuse-0.3.95-aquamarine.slc6.x86_64 3/4
Cleanup : eos-client-0.3.95-aquamarine.slc6.x86_64 4/4
Verifying : eos-fuse-0.3.118-aquamarine.slc6.x86_64 1/4
Verifying : eos-client-0.3.118-aquamarine.slc6.x86_64 2/4
Verifying : eos-client-0.3.95-aquamarine.slc6.x86_64 3/4
Verifying : eos-fuse-0.3.95-aquamarine.slc6.x86_64 4/4

Updated:
eos-client.x86_64 0:0.3.118-aquamarine.slc6 eos-fuse.x86_64 0:0.3.118-aquamarine.slc6

Complete!
[root@cmslpc37 ~]#

To let the server get the new RPMs I changed the repo (the change has since been rolled back):

Notice: /Stage[main]/Rpmrepos::Eos_client/Yumrepo[eos_client]/baseurl: baseurl changed 'http://eos.cern.ch/rpms/eos-aquamarine/slc-6-x86_64/' to 'https://cms-install.fnal.gov/repo_mirror/eos_client/'

Clearly the client was restarted:

root 32639 0.0 0.0 740900 5332 ? Ssl 12:04 0:00 /usr/sbin/eosd /eos/uscms -obig_writes,max_readahead=131072,max_write=4194304,fsname=eosstoreuser,allow_other rl=ro
[root@cmslpc37 ~]# tail /var/log/eos/fuse/fuse.log
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173670400
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173801472
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173932544
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174063616
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174194688
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174325760
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174456832
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174587904
150617 12:04:21 time=1434560661.734730 func=xrd_init level=NOTE tid=00007f303f0fb780 source=xrdposix:3057 cache=true size=300000000 cache-write=1 exec=0
[eosfs_ll_read]: inode=27817607407075328 size=4096 off=0
[root@cmslpc37 ~]#

The issues we had on this node are also gone, but that has more to do with the restart than with anything else.

If it stays stable for a couple of days I'd like to update our local client repo.

#2 Updated by Gerard Bernabeu Altayo over 4 years ago

Updating only the client does not yet allow creating links:

[gerard1@cmslpc37 gerard1]$ ln -s gba.22653 a
ln: creating symbolic link `a': Function not implemented

This is a good thing. It also left an entry in the fuse log:

[root@cmslpc37 uscms]# tail /var/log/eos/fuse/fuse.log
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173801472
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173932544
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174063616
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174194688
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174325760
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174456832
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174587904
150617 12:04:21 time=1434560661.734730 func=xrd_init level=NOTE tid=00007f303f0fb780 source=xrdposix:3057 cache=true size=300000000 cache-write=1 exec=0
[eosfs_ll_read]: inode=27817607407075328 size=4096 off=0
[eosfs_ll_readdir]: failed for inode=5
[root@cmslpc37 uscms]#

#3 Updated by Gerard Bernabeu Altayo over 4 years ago

I also updated cmseos38.fnal.gov

#4 Updated by Gerard Bernabeu Altayo over 4 years ago

I've also updated one SL5 machine: cmslpc33.

Since this is an SL5 node, I also had to create the yum repo:

[root@cmslpc33 ~]# cat /etc/yum.repos.d/eos-client.repo
[eos-client-sl5]
name=EOS client
baseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/
enabled=1
gpgcheck=0

yum update --exclude=kernel*

I've also fixed the script that checks EOS so that it supports SL5 as well (there is no timeout binary there). The latest version now lives on cmslpc33.
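
The check script itself is not shown in this ticket. As a rough sketch of how such a check can work on a node without the coreutils `timeout` binary (as on SL5), the helper below uses `timeout` when it exists and otherwise falls back to a background watchdog. The name `run_with_timeout` is an assumption for illustration, not the actual script:

```shell
#!/bin/sh
# Hypothetical helper, not the real check script from this ticket.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
    secs=$1
    shift
    if command -v timeout >/dev/null 2>&1; then
        # coreutils timeout is available: just use it.
        timeout "$secs" "$@"
        return $?
    fi
    # Fallback: run the command in the background and start a
    # watchdog that sends SIGTERM once the time limit expires.
    "$@" &
    cmd_pid=$!
    ( sleep "$secs"; kill -TERM "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command ended (or was killed): stop the watchdog.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null
    return "$status"
}
```

A check could then look like `run_with_timeout 10 ls /eos/uscms >/dev/null || echo "EOS mount check failed"`. With `timeout` a timed-out command returns 124; in the fallback branch the status is that of a command killed by SIGTERM, so callers should only test for zero versus non-zero.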

#5 Updated by Gerard Bernabeu Altayo over 4 years ago

I have also updated the configuration on 'cmslpc38' to leave it very close to the defaults. Restarting the daemon prints a good list of the settings that are actually in effect:

[root@cmslpc38 ~]# /etc/init.d/eosd restart
Stopping eosd:
[ OK ]

Starting eosd: [ OK ]
EOS_FUSE_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 300000000
EOS_FUSE_CACHE_WRITE : 1
EOS_FUSE_BIGWRITES : 0
EOS_FUSE_EXEC : 0
EOS_FUSE_LOCK_ENVIRONMENT : 0
EOS_FUSE_NO_MT : 0
EOS_FUSE_RDAHEAD : 0
EOS_FUSE_RDAHEAD_WINDOW : 131072

Vs the current setup:

[root@cmslpc26 ~]# service eosd restart
Stopping eosd:
[ OK ]

Starting eosd: [ OK ]
EOS_FUSE_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 300000000
EOS_FUSE_CACHE_READ : 0
EOS_FUSE_CACHE_WRITE : 1
EOS_FUSE_BIGWRITES : 1
EOS_FUSE_EXEC : 0
EOS_FUSE_LOCK_ENVIRONMENT : 0
EOS_FUSE_NO_MT : 0

This change is not in Puppet yet; I want to evaluate how it works first.
I also opened a ticket with the EOS developers about the EOS client stability issues:

https://its.cern.ch/jira/browse/EOS-1178?filter=-2
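
To see at a glance which settings differ between two nodes, the `EOS_FUSE_*` lines printed on restart can be saved to files and compared. This is only a sketch: the helper names are hypothetical, and it assumes the restart output of each node has been saved to a local file first.

```shell
#!/bin/sh
# Hypothetical helpers; the input files are saved copies of the
# "service eosd restart" output from each node.

# Print the "EOS_FUSE_NAME : value" lines of a saved restart output
# as sorted NAME=value pairs.
fuse_settings() {
    grep '^EOS_FUSE_' "$1" | sed 's/ *: */=/' | sort
}

# Show the settings that differ between two saved outputs.
# Returns 0 if they are identical and 1 if they differ (like diff).
diff_fuse_settings() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    fuse_settings "$1" >"$a"
    fuse_settings "$2" >"$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

For the two listings in this comment, the differences are EOS_FUSE_BIGWRITES (0 on cmslpc38, 1 on cmslpc26), EOS_FUSE_CACHE_READ (only printed on cmslpc26), and EOS_FUSE_RDAHEAD and EOS_FUSE_RDAHEAD_WINDOW (only printed on cmslpc38).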

#6 Updated by Gerard Bernabeu Altayo over 4 years ago

Updating client version on SL5 nodes:

[root@cmslpc35 ~]# echo -e '[eos-client-sl5]\nname=EOS client\nbaseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/\nenabled=0\ngpgcheck=0' > /etc/yum.repos.d/eos-client.repo; yum update -y --exclude=kernel* --enablerepo=eos-client-sl5

I tried to do it in batch, without much success:

[root@cmslpc35 ~]# for i in `seq 31 35` 42; do rsh cmslpc$i uname -a; done
Linux cmslpc31.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc32.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc33.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc34.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc35.fnal.gov 2.6.18-398.el5 #1 SMP Tue Sep 16 01:03:15 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc42.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@cmslpc35 ~]# for i in `seq 31 35` 42; do rsh cmslpc$i echo -e '[eos-client-sl5]\nname=EOS client\nbaseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/\nenabled=0\ngpgcheck=0' > /etc/yum.repos.d/eos-client.repo; done
[root@cmslpc35 ~]# for i in `seq 31 35` 42; do rsh cmslpc$i yum update -y --exclude=kernel* --enablerepo=eos-client-sl5; done
Loaded plugins: kernel-module, priorities

Error getting repository data for eos-client-sl5, repository not found
Loaded plugins: kernel-module, priorities

Error getting repository data for eos-client-sl5, repository not found
Loaded plugins: kernel-module, priorities
http://cmsstor24.fnal.gov/install/rocks-dist/lan/x86_64/repodata/repomd.xml: [Errno 4] IOError: <urlopen error (111, 'Connection refused')>
Trying other mirror.
http://cmsstor24.fnal.gov/install/rocks-dist/lan/i386/repodata/repomd.xml: [Errno 4] IOError: <urlopen error (111, 'Connection refused')>
Trying other mirror.
Excluding Packages in global exclude list
Finished
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package eos-client.x86_64 0:0.3.121-aquamarine.slc5 set to be updated
---> Package eos-fuse.x86_64 0:0.3.121-aquamarine.slc5 set to be updated
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin

Dependencies Resolved

================================================================================
 Package        Arch      Version                    Repository        Size
================================================================================
Updating:
 eos-client     x86_64    0.3.121-aquamarine.slc5    eos-client-sl5    767 k
 eos-fuse       x86_64    0.3.121-aquamarine.slc5    eos-client-sl5    500 k

Transaction Summary
================================================================================
Install 0 Package(s)
Update 2 Package(s)
Remove 0 Package(s)

Total download size: 1.2 M
Downloading Packages:
--------------------------------------------------------------------------------
Total 380 kB/s | 1.2 MB 00:03
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : eos-client 1/4
Updating : eos-fuse 2/4
Starting conditional EOS services
Cleanup : eos-client 3/4
Cleanup : eos-fuse 4/4

Updated:
eos-client.x86_64 0:0.3.121-aquamarine.slc5
eos-fuse.x86_64 0:0.3.121-aquamarine.slc5

Complete!
Loaded plugins: kernel-module, priorities

Error getting repository data for eos-client-sl5, repository not found
Loaded plugins: kernel-module, priorities
Config Error: File contains no section headers.
file: file://///etc/yum.repos.d/eos-client.repo, line: 1
'[eos-client-sl5]nname=EOS clientnbaseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/nenabled=0ngpgcheck=0\n'
Loaded plugins: kernel-module, priorities

Error getting repository data for eos-client-sl5, repository not found
[root@cmslpc35 ~]#
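
Two things probably went wrong in the batch attempt above. The `>` redirection is evaluated by the local shell, so the file was written on cmslpc35 instead of on each target. The single quotes were also consumed locally, so the remote shell saw unquoted `\n` sequences and reduced them to a plain `n` (hence the `nname=` in the Config Error). The one node that did update was presumably the one that already had a working repo file. A minimal sketch of one way to avoid both problems is to generate the file locally and stream it to `cat` on the remote side. The function names are hypothetical, and the sketch assumes `rsh` forwards standard input to the remote command:

```shell
#!/bin/sh
# Hypothetical helpers; repo contents copied from the one-liner above.

# Print the repo definition, one setting per line.
repo_body() {
    printf '%s\n' \
        '[eos-client-sl5]' \
        'name=EOS client' \
        'baseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/' \
        'enabled=0' \
        'gpgcheck=0'
}

# Write the repo file on a remote host. The redirection is part of the
# single-quoted remote command, so it is evaluated on the remote side.
# REMOTE_SH can be overridden (e.g. REMOTE_SH=ssh); rsh is the default.
push_repo() {
    repo_body | ${REMOTE_SH:-rsh} "$1" 'cat > /etc/yum.repos.d/eos-client.repo'
}

# Example loop (commented out so that sourcing this file changes nothing):
# for i in $(seq 31 35) 42; do
#     push_repo "cmslpc$i" &&
#         rsh "cmslpc$i" "yum update -y --exclude='kernel*' --enablerepo=eos-client-sl5"
# done
```

Using `printf` with one argument per line avoids relying on `echo -e` escape handling, which is what got mangled on the way to the remote shell.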

I will put this on hold and test the new version that supports links; that will also require a client update, so I'd better do it only once.

#7 Updated by Gerard Bernabeu Altayo over 4 years ago

The client stalls happen whenever the servers are unstable. Newer versions of the eos-client (0.3.125) include a decent number of client bugfixes. As soon as the directory creation bug is solved we HAVE to update.

I just emailed CERN (Andreas) to ask which version contains the fixes for the reported bugs he has been working on.

#8 Updated by Gerard Bernabeu Altayo about 4 years ago

Testing with cmslpc28, which is not available to users:

[root@cmslpc28 ~]# tail -2 /var/log/puppet/puppet.log
2015-07-28T13:42:03.493913-05:00 cmslpc28 puppet-agent23185: Skipping run of Puppet configuration client; administratively disabled (Reason: 'Gerard playing with the EOS mountpoint to test eos-test');
2015-07-28T13:42:03.493940-05:00 cmslpc28 puppet-agent23185: Use 'puppet agent --enable' to re-enable.
[root@cmslpc28 ~]#

I've edited /etc/yum.repos.d/eos_client.repo on cmslpc28 (with vim) to point to the eos-itb repo, then:

Dependencies Resolved

=====================================================================================================================================================================================
 Package            Arch      Version                    Repository      Size
=====================================================================================================================================================================================
Installing:
 kernel             x86_64    2.6.32-504.30.3.el6        slf-security    29 M
 kernel-devel       x86_64    2.6.32-504.30.3.el6        slf-security    9.4 M
Updating:
 eos-client         x86_64    0.3.125-aquamarine.slc6    eos_client      622 k
 eos-fuse           x86_64    0.3.125-aquamarine.slc6    eos_client      432 k
 kernel-firmware    noarch    2.6.32-504.30.3.el6        slf-security    14 M
 kernel-headers     x86_64    2.6.32-504.30.3.el6        slf-security    3.4 M

Transaction Summary
=====================================================================================================================================================================================
Install 2 Package(s)
Upgrade 4 Package(s)

#9 Updated by Gerard Bernabeu Altayo about 4 years ago

  • Status changed from New to Resolved

All clients have been updated to 0.3.125.
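
One way to confirm that a node really runs at least the target version is to compare the installed package version numerically, since a plain string comparison would rank 0.3.95 above 0.3.125. The sketch below is illustrative only: `version_ge` is a hypothetical helper, and it assumes versions made of dot-separated integers (the release suffix, such as `aquamarine.slc6`, is not part of the RPM version field).

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2, comparing
# each dot-separated field as a number (missing fields count as 0).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example use on a node (commented out; requires the package to be installed):
# installed=$(rpm -q --qf '%{VERSION}\n' eos-fuse)
# version_ge "$installed" 0.3.125 || echo "eos-fuse $installed is older than 0.3.125"
```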


