Bug #9230
Update EOS client from 0.3.95 to 0.3.118
Description
EOS FUSE client 0.3.95 is not very stable on WN and LPC nodes: sometimes it hangs, and at other times it has caching issues.
I want to track the update (testing) in this ticket.
History
#1 Updated by Gerard Bernabeu Altayo almost 6 years ago
I updated cmslpc37, where I had to restart EOS anyway (the update restarts the daemon).
[root@cmslpc37 ~]# yum update --exclude=kernel*
Loaded plugins: priorities, security
Setting up Update Process
7743 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package eos-client.x86_64 0:0.3.95-aquamarine.slc6 will be updated
---> Package eos-client.x86_64 0:0.3.118-aquamarine.slc6 will be an update
---> Package eos-fuse.x86_64 0:0.3.95-aquamarine.slc6 will be updated
---> Package eos-fuse.x86_64 0:0.3.118-aquamarine.slc6 will be an update
--> Finished Dependency Resolution
Dependencies Resolved
=====================================================================================================================================================================================
Package Arch Version Repository Size
=====================================================================================================================================================================================
Updating:
eos-client x86_64 0.3.118-aquamarine.slc6 eos_client 613 k
eos-fuse x86_64 0.3.118-aquamarine.slc6 eos_client 416 k
Transaction Summary
=====================================================================================================================================================================================
Upgrade 2 Package(s)
Total download size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): eos-client-0.3.118-aquamarine.slc6.x86_64.rpm | 613 kB 00:01
(2/2): eos-fuse-0.3.118-aquamarine.slc6.x86_64.rpm | 416 kB 00:01
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 270 kB/s | 1.0 MB 00:03
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : eos-client-0.3.118-aquamarine.slc6.x86_64 1/4
Updating : eos-fuse-0.3.118-aquamarine.slc6.x86_64 2/4
Starting conditional EOS services
Cleanup : eos-fuse-0.3.95-aquamarine.slc6.x86_64 3/4
Cleanup : eos-client-0.3.95-aquamarine.slc6.x86_64 4/4
Verifying : eos-fuse-0.3.118-aquamarine.slc6.x86_64 1/4
Verifying : eos-client-0.3.118-aquamarine.slc6.x86_64 2/4
Verifying : eos-client-0.3.95-aquamarine.slc6.x86_64 3/4
Verifying : eos-fuse-0.3.95-aquamarine.slc6.x86_64 4/4
Updated:
eos-client.x86_64 0:0.3.118-aquamarine.slc6 eos-fuse.x86_64 0:0.3.118-aquamarine.slc6
Complete!
[root@cmslpc37 ~]#
For the server to get the new RPMs I changed the repo; that change has since been rolled back:
Notice: /Stage[main]/Rpmrepos::Eos_client/Yumrepo[eos_client]/baseurl: baseurl changed 'http://eos.cern.ch/rpms/eos-aquamarine/slc-6-x86_64/' to 'https://cms-install.fnal.gov/repo_mirror/eos_client/'
Clearly the client was restarted:
root 32639 0.0 0.0 740900 5332 ? Ssl 12:04 0:00 /usr/sbin/eosd /eos/uscms -obig_writes,max_readahead=131072,max_write=4194304,fsname=eosstoreuser,allow_other rl=ro
[root@cmslpc37 ~]# tail /var/log/eos/fuse/fuse.log
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173670400
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173801472
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173932544
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174063616
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174194688
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174325760
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174456832
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174587904
150617 12:04:21 time=1434560661.734730 func=xrd_init level=NOTE tid=00007f303f0fb780 source=xrdposix:3057 cache=true size=300000000 cache-write=1 exec=0
[eosfs_ll_read]: inode=27817607407075328 size=4096 off=0
[root@cmslpc37 ~]#
The issues we had on this node are also gone, but that has more to do with the restart than anything else.
If it's stable for a couple of days I'd like to update our local client repo.
#2 Updated by Gerard Bernabeu Altayo almost 6 years ago
With only the client updated, creating symbolic links is still not allowed:
[gerard1@cmslpc37 gerard1]$ ln -s gba.22653 a
ln: creating symbolic link `a': Function not implemented
This is a good thing. The attempt also left an entry in the FUSE log:
[root@cmslpc37 uscms]# tail /var/log/eos/fuse/fuse.log
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173801472
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=173932544
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174063616
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174194688
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174325760
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174456832
[eosfs_ll_read]: inode=15527538893783040 size=131072 off=174587904
150617 12:04:21 time=1434560661.734730 func=xrd_init level=NOTE tid=00007f303f0fb780 source=xrdposix:3057 cache=true size=300000000 cache-write=1 exec=0
[eosfs_ll_read]: inode=27817607407075328 size=4096 off=0
[eosfs_ll_readdir]: failed for inode=5
[root@cmslpc37 uscms]#
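The manual `ln -s` check above can be repeated on other nodes with a small probe. This is only a sketch: the function name `supports_symlinks` and the probe file name are made up for illustration, and it assumes the caller can write to the directory being tested (an unwritable directory is reported the same way as a missing symlink implementation).

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a symbolic link can be created in DIR, 1 otherwise.
supports_symlinks() {
    dir=$1
    link="$dir/.symlink_probe.$$"        # probe name is arbitrary; $$ keeps it unique per shell
    if ln -s probe_target "$link" 2>/dev/null; then
        rm -f "$link"                    # clean up the dangling probe link
        return 0
    fi
    return 1
}

# Example (run inside a user directory on the EOS mount):
#   supports_symlinks . && echo "symlinks work" || echo "symlinks not supported"
```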
#3 Updated by Gerard Bernabeu Altayo almost 6 years ago
I also updated cmseos38.fnal.gov
#4 Updated by Gerard Bernabeu Altayo almost 6 years ago
I've also updated one SL5 machine: cmslpc33.
Since this is an SL5 node, I had to create the yum repo as well:
[root@cmslpc33 ~]# cat /etc/yum.repos.d/eos-client.repo
[eos-client-sl5]
name=EOS client
baseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/
enabled=1
gpgcheck=0
yum update --exclude=kernel*
I've also fixed the script that checks EOS so that it supports SL5 too (there is no timeout binary there). The latest version now lives on cmslpc33.
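The check script itself is not shown in this ticket. As a rough illustration of one way to cope with a missing `timeout` binary, the sketch below uses `timeout` when it exists and otherwise falls back to a background watchdog. The function name `run_with_timeout` is hypothetical, and the fallback is a generic shell technique, not necessarily what the real script does.

```shell
#!/bin/sh
# Hypothetical helper: run_with_timeout SECONDS CMD [ARGS...]
# Returns the command's exit status, or a non-zero status if it was killed.
run_with_timeout() {
    limit=$1
    shift
    if command -v timeout >/dev/null 2>&1; then
        # coreutils timeout is available (not the case on SL5 per this ticket)
        timeout "$limit" "$@" || return $?
        return 0
    fi
    # Fallback: start the command in the background and arm a watchdog
    "$@" &
    cmd_pid=$!
    ( sleep "$limit"; kill -TERM "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # Disarm the watchdog; its sleep may linger until the limit expires
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$status"
}

# Example (mount point taken from the eosd command line above):
#   run_with_timeout 10 ls /eos/uscms >/dev/null || echo "EOS mount not responding"
```

A process stuck in uninterruptible I/O may not react to SIGTERM, so this should be treated as a best-effort guard rather than a guarantee.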
#5 Updated by Gerard Bernabeu Altayo almost 6 years ago
I have also changed the configuration on cmslpc38 to leave it very close to the defaults. Restarting the daemon prints a useful list of the settings it actually applies:
[root@cmslpc38 ~]# /etc/init.d/eosd restart
Stopping eosd:
[ OK ]
Starting eosd: [ OK ]
EOS_FUSE_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 300000000
EOS_FUSE_CACHE_WRITE : 1
EOS_FUSE_BIGWRITES : 0
EOS_FUSE_EXEC : 0
EOS_FUSE_LOCK_ENVIRONMENT : 0
EOS_FUSE_NO_MT : 0
EOS_FUSE_RDAHEAD : 0
EOS_FUSE_RDAHEAD_WINDOW : 131072
Compare that with the current setup:
[root@cmslpc26 ~]# service eosd restart
Stopping eosd:
[ OK ]
Starting eosd: [ OK ]
EOS_FUSE_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 300000000
EOS_FUSE_CACHE_READ : 0
EOS_FUSE_CACHE_WRITE : 1
EOS_FUSE_BIGWRITES : 1
EOS_FUSE_EXEC : 0
EOS_FUSE_LOCK_ENVIRONMENT : 0
EOS_FUSE_NO_MT : 0
This change is not in puppet (yet); I want to evaluate how it works first.
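Reading the two startup banners side by side, the visible differences are EOS_FUSE_BIGWRITES (0 on cmslpc38, 1 on cmslpc26), the EOS_FUSE_RDAHEAD and EOS_FUSE_RDAHEAD_WINDOW lines that only cmslpc38 prints, and EOS_FUSE_CACHE_READ, which only cmslpc26 prints. To make that comparison mechanical, a filter like the following could be used. It is a sketch: the function name `fuse_settings` and the banner file names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an eosd startup banner on stdin and print the
# "EOS_FUSE_X : value" lines as sorted "EOS_FUSE_X=value" lines.
fuse_settings() {
    sed -n 's/^[[:space:]]*\(EOS_FUSE_[A-Z_]*\)[[:space:]]*:[[:space:]]*\(.*\)$/\1=\2/p' | sort
}

# Example, assuming each banner was saved to a file beforehand:
#   fuse_settings < cmslpc26.banner > current.cfg
#   fuse_settings < cmslpc38.banner > near_default.cfg
#   diff current.cfg near_default.cfg
```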
I also opened a ticket to the EOS devs about EOS client stability issues:
#6 Updated by Gerard Bernabeu Altayo almost 6 years ago
Updating client version on SL5 nodes:
[root@cmslpc35 ~]# echo -e '[eos-client-sl5]\nname=EOS client\nbaseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/\nenabled=0\ngpgcheck=0' > /etc/yum.repos.d/eos-client.repo; yum update -y --exclude=kernel* --enablerepo=eos-client-sl5
I tried to do it in batch without much success:
[root@cmslpc35 ~]# for i in `seq 31 35` 42; do rsh cmslpc$i uname -a; done
Linux cmslpc31.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc32.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc33.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc34.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc35.fnal.gov 2.6.18-398.el5 #1 SMP Tue Sep 16 01:03:15 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux cmslpc42.fnal.gov 2.6.18-400.1.1.el5 #1 SMP Wed Dec 17 14:22:42 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@cmslpc35 ~]# for i in `seq 31 35` 42; do rsh cmslpc$i echo -e '[eos-client-sl5]\nname=EOS client\nbaseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/\nenabled=0\ngpgcheck=0' > /etc/yum.repos.d/eos-client.repo; done
[root@cmslpc35 ~]# for i in `seq 31 35` 42; do rsh cmslpc$i yum update -y --exclude=kernel* --enablerepo=eos-client-sl5; done
Loaded plugins: kernel-module, priorities
Error getting repository data for eos-client-sl5, repository not found
Loaded plugins: kernel-module, priorities
Error getting repository data for eos-client-sl5, repository not found
Loaded plugins: kernel-module, priorities
http://cmsstor24.fnal.gov/install/rocks-dist/lan/x86_64/repodata/repomd.xml: [Errno 4] IOError: <urlopen error (111, 'Connection refused')>
Trying other mirror.
http://cmsstor24.fnal.gov/install/rocks-dist/lan/i386/repodata/repomd.xml: [Errno 4] IOError: <urlopen error (111, 'Connection refused')>
Trying other mirror.
Excluding Packages in global exclude list
Finished
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package eos-client.x86_64 0:0.3.121-aquamarine.slc5 set to be updated
---> Package eos-fuse.x86_64 0:0.3.121-aquamarine.slc5 set to be updated
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Updating:
eos-client x86_64 0.3.121-aquamarine.slc5 eos-client-sl5 767 k
eos-fuse x86_64 0.3.121-aquamarine.slc5 eos-client-sl5 500 k
Transaction Summary
================================================================================
Install 0 Package(s)
Update 2 Package(s)
Remove 0 Package(s)
Total download size: 1.2 M
Downloading Packages:
--------------------------------------------------------------------------------
Total 380 kB/s | 1.2 MB 00:03
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : eos-client 1/4
Updating : eos-fuse 2/4
Starting conditional EOS services
Cleanup : eos-client 3/4
Cleanup : eos-fuse 4/4
Updated:
eos-client.x86_64 0:0.3.121-aquamarine.slc5
eos-fuse.x86_64 0:0.3.121-aquamarine.slc5
Complete!
Loaded plugins: kernel-module, priorities
Error getting repository data for eos-client-sl5, repository not found
Loaded plugins: kernel-module, priorities
Config Error: File contains no section headers.
file: file://///etc/yum.repos.d/eos-client.repo, line: 1
'[eos-client-sl5]nname=EOS clientnbaseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/nenabled=0ngpgcheck=0\n'
Loaded plugins: kernel-module, priorities
Error getting repository data for eos-client-sl5, repository not found
[root@cmslpc35 ~]#
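A likely reading of the failed batch run, based on ordinary shell quoting rules: in the loop that writes the repo file, the `>` redirection is handled by the local shell on cmslpc35, so the file is only ever written locally, and the remote shell strips the unquoted backslashes so that `\n` arrives as a plain `n` (which matches the "File contains no section headers" error). The other nodes never received a repo file, hence "repository not found", while cmslpc33 already had one from comment #4. One way around both problems is to generate the file locally and stream it over stdin, with the remote command quoted as a single argument. The helper names below are hypothetical.

```shell
#!/bin/sh
# Print the repo definition; the newlines are produced locally by printf,
# so no backslash escapes have to survive a second shell.
repo_file_body() {
    printf '%s\n' \
        '[eos-client-sl5]' \
        'name=EOS client' \
        'baseurl=http://eos.cern.ch/rpms/eos-aquamarine/slc-5-x86_64/' \
        'enabled=0' \
        'gpgcheck=0'
}

# Hypothetical helper: write the repo file on HOST. The whole remote command
# is quoted, so the redirection runs on the remote side.
push_repo() {
    host=$1
    repo_file_body | rsh "$host" 'cat > /etc/yum.repos.d/eos-client.repo'
}

# Example, same node list as above:
#   for i in `seq 31 35` 42; do push_repo "cmslpc$i"; done
```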
I will put this on hold and test the new version that supports links. That will also require a client update, so I had better do it only once.
#7 Updated by Gerard Bernabeu Altayo over 5 years ago
The client stalls happen whenever the servers are unstable. Newer versions of eos-client (0.3.125) include a good number of client bug fixes. As soon as the directory creation bug is solved we HAVE to update.
I just emailed CERN (Andreas) to ask which version contains the fixes for the reported bugs he has been working on.
#8 Updated by Gerard Bernabeu Altayo over 5 years ago
Testing with cmslpc28, which is not available to users:
[root@cmslpc28 ~]# tail -2 /var/log/puppet/puppet.log
2015-07-28T13:42:03.493913-05:00 cmslpc28 puppet-agent23185: Skipping run of Puppet configuration client; administratively disabled (Reason: 'Gerard playing with the EOS mountpoint to test eos-test');
2015-07-28T13:42:03.493940-05:00 cmslpc28 puppet-agent23185: Use 'puppet agent --enable' to re-enable.
[root@cmslpc28 ~]#
I've edited /etc/yum.repos.d/eos_client.repo on cmslpc28 (with vim) to point to the eos-itb repo, then:
Dependencies Resolved
=====================================================================================================================================================================================
Package Arch Version Repository Size
=====================================================================================================================================================================================
Installing:
kernel x86_64 2.6.32-504.30.3.el6 slf-security 29 M
kernel-devel x86_64 2.6.32-504.30.3.el6 slf-security 9.4 M
Updating:
eos-client x86_64 0.3.125-aquamarine.slc6 eos_client 622 k
eos-fuse x86_64 0.3.125-aquamarine.slc6 eos_client 432 k
kernel-firmware noarch 2.6.32-504.30.3.el6 slf-security 14 M
kernel-headers x86_64 2.6.32-504.30.3.el6 slf-security 3.4 M
Transaction Summary
=====================================================================================================================================================================================
Install 2 Package(s)
Upgrade 4 Package(s)
#9 Updated by Gerard Bernabeu Altayo over 5 years ago
- Status changed from New to Resolved
All clients have been updated to 0.3.125.
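To confirm a statement like this on a given node, the installed version can be compared numerically against the target. Plain string comparison is not enough here, because "0.3.95" sorts after "0.3.118" as text. The function name `version_ge` is hypothetical, and the sketch assumes purely numeric, dot-separated versions such as those in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: version_ge A B
# Returns 0 if dotted version A is greater than or equal to B, comparing each field as a number.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0    # missing fields count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                              # all fields equal
    }'
}

# Example on a client node:
#   version_ge "$(rpm -q --qf '%{VERSION}' eos-client)" 0.3.125 || echo "eos-client needs updating"
```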