Task #9235

Test EOS version that supports links (0.3.119)

Added by Gerard Bernabeu Altayo about 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Start date:
06/29/2015
Due date:
06/29/2015
% Done:

0%

Estimated time:
Duration: 1

Description

We have the new version ready, and even a tester!

Hi Lisa and Gerard,

you can install the version with symbolic link support from the aquamarine testing repository.

http://eos.cern.ch/rpms/eos-aquamarine-testing/slc-6-x86_64/

It is EOS 0.3.119. I have added a few generic tests via the shell and FUSE. However, I didn't have time to run exhaustive tests, so please try it and give feedback. It is 100% backward and forward compatible. If a client (FUSE) or the server (MGM) does not have the relevant support, it will just show some dummy files without attached replicas. You can easily go back and forth between the old and new versions.

The FUSE client now also has a performance-boosting parameter (which is off by default):

# Enable FUSE read-ahead (default off)
#export EOS_FUSE_RDAHEAD=0
# Configure FUSE read-ahead window (default 128k)
#export EOS_FUSE_RDAHEAD_WINDOW=131072

With this you can reach line speed when streaming files via FUSE. It currently has only minimal intelligence; we will implement the kernel read-ahead algorithm soon.

Cheers Andreas.
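
A minimal sketch of the read-ahead settings described in the mail above, assuming that a value of 1 enables read-ahead and that the window is given in bytes (128k = 131072). The helper name kib_to_bytes is hypothetical and only illustrates the arithmetic.

```shell
# Hypothetical helper: convert a size in KiB to bytes.
kib_to_bytes() {
  echo $(( $1 * 1024 ))
}

# Assumption: 1 switches read-ahead on (the mail only shows the default, 0).
export EOS_FUSE_RDAHEAD=1
# 128 KiB window, expressed in bytes.
export EOS_FUSE_RDAHEAD_WINDOW="$(kib_to_bytes 128)"

echo "EOS_FUSE_RDAHEAD_WINDOW=$EOS_FUSE_RDAHEAD_WINDOW"
# prints EOS_FUSE_RDAHEAD_WINDOW=131072
```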

Hi Lisa,

is it supported to use /eos like a normal filesystem for interactive
job stuff yet? I tried

[tucker@cmslpc26 /eos/uscms/store/user/tucker/huh %] scram project CMSSW CMSSW_5_3_28

and it took quite a long time (while nearly instantaneous in the home
area or /tmp), then printed

cp: cannot create symbolic link `/eos/uscms/store/user/tucker/huh/CMSSW_5_3_28/config/toolbox/slc6_amd64_gcc472/tools/selected/py-pygithub.xml': Function not implemented

and hung some more before I killed it.

Thanks,
Jordan
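
The failure Jordan reports can be reproduced without running scram. The following is a rough probe, not an official EOS tool: it tries to create a symbolic link inside a given directory and reports whether the filesystem accepts it. The function name and the probe file names are made up for illustration.

```shell
# Return 0 if symlinks can be created in the directory, 1 if not,
# 2 if the directory is not writable at all.
check_symlink_support() {
  dir="$1"
  target="$dir/.symlink_probe_target.$$"
  link="$dir/.symlink_probe_link.$$"
  touch "$target" 2>/dev/null || return 2
  if ln -s "$target" "$link" 2>/dev/null && [ -L "$link" ]; then
    rc=0
  else
    rc=1
  fi
  rm -f "$link" "$target"
  return "$rc"
}

# Example against a local scratch directory; point it at a FUSE mount to test EOS.
if check_symlink_support "${TMPDIR:-/tmp}"; then
  echo "symlinks supported"
else
  echo "symlinks NOT supported"
fi
```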


Related issues

Follows EOS - Task #9306: Make EOS test instance functional (Resolved, 06/26/2015)

History

#1 Updated by Gerard Bernabeu Altayo about 5 years ago

Old EOS version was not working right on the test machines, so I will fix that before I test the upgrade...

Starting deployment on EOS test. First of all I'm updating cmssrv151 and cmssrv153 to get the latest security updates (yumautoupdate is broken because of the xrootd 3.3.6 that we have to keep for EOS):

yum update --exclude=xrootd*

Rebooting both nodes as of 11:40.

#2 Updated by Gerard Bernabeu Altayo about 5 years ago

I managed to bring the system up, there are a few issues with it though (some resolved already):

0. puppet could not run due to lack of cmseos certificate. Fixed by adding cmssrv152 cert as cmseos cert in the secret repository. Also opened https://cdcvs.fnal.gov/redmine/issues/9290 to create a real service IP (cmseos-test.fnal.gov)

1. MGM would not boot on cmssrv153. Fixed by service eos master mgm; service eos master mq; service eos restart

2. There is no FST (there was supposed to be one on 153, but not really. cmssrv151 is trying to start one but has no config file). I want a real FST for this.
- Lisa said there is one server available for this, will try to figure out which one and if so use it. If not I'll reshoot cmsstor112 for this.

3. The namespace is an (old) copy of the production one, but since the data is not there, there will be lots of missing bits and pieces. I need to erase it and start from scratch so that I can test properly.

#3 Updated by Gerard Bernabeu Altayo about 5 years ago

  • Follows Task #9306: Make EOS test instance functional added

#4 Updated by Gerard Bernabeu Altayo almost 5 years ago

All the issues that were preventing testing have been solved in other tickets. I did some testing locally and the functionality works fine. There are 2 main issues:

1. ',' is not supported in filenames (and it is in regular POSIX). This prevents compiling the kernel; it may or may not affect CMS use cases.

2. Performance is really low, ~10x slower than compiling on a local FS. This is due to the lack of FUSE caching in the current kernels. EOS developers are implementing the caching in the FUSE layer.
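
Both issues can be checked with a quick probe. This is only a sketch under assumptions: the function names are invented here, the timing has one-second resolution, and writing many small files is a stand-in for a compile workload, not a real benchmark.

```shell
# Return 0 if a file name containing ',' can be created in the directory.
comma_name_ok() {
  dir="$1"
  probe="$dir/probe,comma.$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  return 1
}

# Create N small files in a scratch subdirectory and print elapsed whole seconds.
time_small_files() {
  dir="$1"
  n="${2:-200}"
  work="$dir/fuse_bench.$$"
  mkdir -p "$work" || return 1
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "x" > "$work/f$i"
    i=$((i + 1))
  done
  end=$(date +%s)
  rm -rf "$work"
  echo $((end - start))
}

# Compare a local directory with the FUSE mount by running both functions on each.
comma_name_ok "${TMPDIR:-/tmp}" && echo "comma in file names: OK"
echo "seconds for 200 small files: $(time_small_files "${TMPDIR:-/tmp}" 200)"
```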

I am updating the client in cmslpc41 to let Jesus Orduna test:

[root@cmslpc41 sysconfig]# yum update --disablerepo=* --enablerepo=eos-cern
Loaded plugins: priorities, security
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package eos-client.x86_64 0:0.3.95-aquamarine.slc6 will be updated
---> Package eos-client.x86_64 0:0.3.125-aquamarine.slc6 will be an update
---> Package eos-fuse.x86_64 0:0.3.95-aquamarine.slc6 will be updated
---> Package eos-fuse.x86_64 0:0.3.125-aquamarine.slc6 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch      Version                    Repository    Size
================================================================================
Updating:
 eos-client     x86_64    0.3.125-aquamarine.slc6    eos-cern      621 k
 eos-fuse       x86_64    0.3.125-aquamarine.slc6    eos-cern      424 k

Transaction Summary
================================================================================
Upgrade       2 Package(s)

Total download size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): eos-client-0.3.125-aquamarine.slc6.x86_64.rpm | 621 kB 00:02
(2/2): eos-fuse-0.3.125-aquamarine.slc6.x86_64.rpm | 424 kB 00:01
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 245 kB/s | 1.0 MB 00:04
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : eos-client-0.3.125-aquamarine.slc6.x86_64 1/4
Updating : eos-fuse-0.3.125-aquamarine.slc6.x86_64 2/4
Starting conditional EOS services
Cleanup : eos-fuse-0.3.95-aquamarine.slc6.x86_64 3/4
Cleanup : eos-client-0.3.95-aquamarine.slc6.x86_64 4/4
Verifying : eos-client-0.3.125-aquamarine.slc6.x86_64 1/4
Verifying : eos-fuse-0.3.125-aquamarine.slc6.x86_64 2/4
Verifying : eos-fuse-0.3.95-aquamarine.slc6.x86_64 3/4
Verifying : eos-client-0.3.95-aquamarine.slc6.x86_64 4/4

Updated:
eos-client.x86_64 0:0.3.125-aquamarine.slc6 eos-fuse.x86_64 0:0.3.125-aquamarine.slc6

Complete!
[root@cmslpc41 sysconfig]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 40G 9.7G 28G 26% /
tmpfs 16G 27M 16G 1% /dev/shm
/dev/sda1 1008M 114M 844M 12% /boot
/dev/sda5 857G 21G 793G 3% /storage/local/data1
cmsblue2.fnal.gov:/uscms_data/d3
49T 46T 2.5T 95% /uscms_data/d3
cmsblue2.fnal.gov:/uscms_data/d1
2.0G 75M 2.0G 4% /uscms_data/d1
cmsblue2.fnal.gov:/uscmst1b_scratch/lpc1
70T 32T 39T 46% /uscmst1b_scratch/lpc1
cms-nas-0.fnal.gov:/uscms
2.5T 1.9T 697G 73% /uscms
cms-nfs-uscms:/uscms/data1
50T 34M 50T 1% /uscms_data/d4
cms-nfs-uscms:/uscms/data2
50T 34M 50T 1% /uscms_data/d5
cms-nfs-uscms:/uscms/scratch1
50T 6.7T 43T 14% /uscms_data/scratch1
cmsblue1.fnal.gov:/uscms_data/d2
66T 52T 14T 80% /uscms_data/d2
cvmfs2 20G 14G 6.0G 70% /cvmfs/cms.cern.ch
AFS 2.0T 0 2.0T 0% /afs
eosstoreuser 6.9P 2.7P 4.2P 39% /eos/uscms

Now I need to setup the mount properly:

[root@cmslpc41 sysconfig]# service eosd restart
Stopping eosd:
[ OK ]
Stopping eosd:
[FAILED]

Starting eosd: [ OK ]
EOS_FUSE_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 300000000
EOS_FUSE_CACHE_WRITE : 1
EOS_FUSE_BIGWRITES : 1
EOS_FUSE_EXEC : 0
EOS_FUSE_LOCK_ENVIRONMENT : 0
EOS_FUSE_NO_MT : 0
EOS_FUSE_RDAHEAD : 1
EOS_FUSE_RDAHEAD_WINDOW : 131072

Starting eosd: [ OK ]
EOS_FUSE_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 300000000
EOS_FUSE_CACHE_WRITE : 1
EOS_FUSE_BIGWRITES : 1
EOS_FUSE_EXEC : 0
EOS_FUSE_LOCK_ENVIRONMENT : 0
EOS_FUSE_NO_MT : 0
EOS_FUSE_RDAHEAD : 1
EOS_FUSE_RDAHEAD_WINDOW : 131072
[root@cmslpc41 sysconfig]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 40G 9.7G 28G 26% /
tmpfs 16G 27M 16G 1% /dev/shm
/dev/sda1 1008M 114M 844M 12% /boot
/dev/sda5 857G 21G 793G 3% /storage/local/data1
cmsblue2.fnal.gov:/uscms_data/d3
49T 46T 2.5T 95% /uscms_data/d3
cmsblue2.fnal.gov:/uscms_data/d1
2.0G 75M 2.0G 4% /uscms_data/d1
cmsblue2.fnal.gov:/uscmst1b_scratch/lpc1
70T 32T 39T 46% /uscmst1b_scratch/lpc1
cms-nas-0.fnal.gov:/uscms
2.5T 1.9T 697G 73% /uscms
cms-nfs-uscms:/uscms/data1
50T 34M 50T 1% /uscms_data/d4
cms-nfs-uscms:/uscms/data2
50T 34M 50T 1% /uscms_data/d5
cms-nfs-uscms:/uscms/scratch1
50T 6.7T 43T 14% /uscms_data/scratch1
cmsblue1.fnal.gov:/uscms_data/d2
66T 52T 14T 80% /uscms_data/d2
cvmfs2 20G 14G 6.0G 70% /cvmfs/cms.cern.ch
AFS 2.0T 0 2.0T 0% /afs
eosstoreuser 6.9P 2.7P 4.2P 39% /eos/uscms
eostest 33T 2.5G 33T 1% /eos/test
[root@cmslpc41 sysconfig]# grep test /etc/sysconfig/eos
export EOS_FUSE_MOUNTS="storeuser test"
[root@cmslpc41 sysconfig]# cat /etc/sysconfig/eos.test
export EOS_NOACCESS=1
export EOS_KERNELCACHE=1
export EOS_DIRECTIO=0

export EOS_FUSE_RDAHEAD=1

export EOS_READAHEADSIZE=4000000
#export EOS_READCACHESIZE=16000000
export EOS_READCACHESIZE=0
# set eos client log level to NOTICE
export EOS_FUSE_LOGLEVEL=5
export EOS_FUSE_MGM_ALIAS=cmssrv152.fnal.gov
export EOS_FUSE_MOUNTDIR=/eos/test

export EOS_FUSE_DEBUG=0
export EOS_FUSE_READCACHESIZE=1048576
export EOS_FUSE_READAHEADSIZE=262144
export EOS_FUSE_KERNELCACHE=1
export EOS_FUSE_CACHE_WRITE=1
export EOS_FUSE_CACHE_READ=0

# this line manages EOS threading 0=on 1=off
export EOS_FUSE_NO_MT=0
# this line manages bigwrites 0=off 1=on
export EOS_FUSE_BIGWRITES=1
# This line manages IO Operations in Fuse 0=off 1=on
export EOS_FUSE_NOPIO=1
[root@cmslpc41 sysconfig]#
[root@cmslpc41 sysconfig]# ll /eos/test
ls: cannot access /eos/test: No such file or directory
[root@cmslpc41 sysconfig]#

There is something wrong still...
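
As a sanity check on the setup above, each name in EOS_FUSE_MOUNTS appears to have its own eos.<name> file next to /etc/sysconfig/eos. That naming convention is inferred from this ticket only, and the helper below is a hypothetical sketch that lists which mounts have such a file.

```shell
# For each mount name, report the matching per-mount config file or "missing".
# Arguments: space-separated mount names, optional config directory.
list_mount_configs() {
  mounts="$1"
  confdir="${2:-/etc/sysconfig}"
  for m in $mounts; do
    if [ -f "$confdir/eos.$m" ]; then
      echo "$m: $confdir/eos.$m"
    else
      echo "$m: missing"
    fi
  done
}

# Mount names as in the EOS_FUSE_MOUNTS line shown above.
list_mount_configs "storeuser test"
```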

#5 Updated by Gerard Bernabeu Altayo almost 5 years ago

I had to create /eos/test in cmssrv153 (the test instance MGM) with 'eos mkdir /eos/test; eos chmod 777 /eos/test'. However I can't read remotely yet:

[gerard1@cmslpc41 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 40G 9.7G 28G 26% /
tmpfs 16G 27M 16G 1% /dev/shm
/dev/sda1 1008M 114M 844M 12% /boot
/dev/sda5 857G 21G 793G 3% /storage/local/data1
cmsblue2.fnal.gov:/uscms_data/d3
49T 46T 2.5T 95% /uscms_data/d3
cmsblue2.fnal.gov:/uscms_data/d1
2.0G 75M 2.0G 4% /uscms_data/d1
cmsblue2.fnal.gov:/uscmst1b_scratch/lpc1
70T 32T 39T 46% /uscmst1b_scratch/lpc1
cms-nas-0.fnal.gov:/uscms
2.5T 1.9T 697G 73% /uscms
cms-nfs-uscms:/uscms/data1
50T 34M 50T 1% /uscms_data/d4
cms-nfs-uscms:/uscms/data2
50T 34M 50T 1% /uscms_data/d5
cms-nfs-uscms:/uscms/scratch1
50T 6.7T 43T 14% /uscms_data/scratch1
cmsblue1.fnal.gov:/uscms_data/d2
66T 52T 14T 80% /uscms_data/d2
cvmfs2 20G 14G 6.0G 70% /cvmfs/cms.cern.ch
AFS 2.0T 0 2.0T 0% /afs
eostest 33T 2.5G 33T 1% /eos/test
eosstoreuser 6.9P 2.7P 4.2P 39% /eos/uscms
[gerard1@cmslpc41 ~]$ cat /eos/test/test.content
cat: /eos/test/test.content: No such file or directory
cat: /eos/test/test.content: Input/output error
[gerard1@cmslpc41 ~]$

And this is what is logged in the logfile:

150723 10:05:24 time=1437663924.390425 func=Open level=ERROR tid=00007f8890d12700 source=XrdIo:154 error=opening remote XrdClFile
150723 10:05:24 time=1437663924.390466 func=Open level=ERROR tid=00007f8890d12700 source=PlainLayout:78 failed stat for file=root:////eos/test/test.content
150723 10:05:24 time=1437663924.390481 func=xrd_open level=ERROR tid=00007f8890d12700 source=xrdposix:2497 open failed for root:////eos/test/test.content.
150723 10:05:24 time=1437663924.390638 func=xrd_get_file level=ERROR tid=00007f8891b64700 source=xrdposix:743 no file abst for fd=0
150723 10:05:24 time=1437663924.390927 func=xrd_get_file level=ERROR tid=00007f8892587700 source=xrdposix:743 no file abst for fd=0
150723 10:05:24 time=1437663924.391021 func=xrd_get_file level=ERROR tid=00007f8890d12700 source=xrdposix:743 no file abst for fd=0
150723 10:05:38 time=1437663938.234485 func=xrd_pread level=WARN tid=00007f2dd3fff700 source=xrdposix:2681 read size=4096, returned=23

I didn't manage to get the EOS FUSE mount to work, but via xrdcp I can access the file after the following change in the cmsstor150.fnal.gov FST /etc/xrd.cf.fst config file:

xrootd.seclib libXrdSec.so
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protbind * only sss unix

(added unix references)

I think this is VERY unsafe... Will contact EOS developers!
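
To spot this kind of setting across several FSTs, a small filter can list the sec.protbind lines that bind the wildcard host '*' to the unix protocol. This is a sketch written for this ticket, not an EOS or XRootD tool; the function name is hypothetical.

```shell
# Print every "sec.protbind * ..." line that includes the unix protocol.
wildcard_unix_binds() {
  awk '$1 == "sec.protbind" && $2 == "*" {
         for (i = 3; i <= NF; i++) if ($i == "unix") { print; next }
       }' "$1"
}

# Example (path as used in this ticket):
# wildcard_unix_binds /etc/xrd.cf.fst
```

No output means no wildcard bind to unix was found in that file. Host-specific binds such as the ones in the MGM config are not reported.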

#6 Updated by Gerard Bernabeu Altayo almost 5 years ago

email sent:
Hi,

in order to schedule the upgrade I'd like to know when EOS 0.3.125+ will be released in the stable repository.

Also, I have been testing from the MGM node so far, but now I want to give some users access from another node (cmslpc41.fnal.gov). I found that the FUSE mount does mount but does not allow me to write/read:

I had to create /eos/test in cmssrv153 (the test instance MGM) with 'eos mkdir /eos/test; eos chmod 777 /eos/test'. However I can't read remotely yet:

[gerard1@cmslpc41 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 40G 9.7G 28G 26% /
tmpfs 16G 27M 16G 1% /dev/shm
/dev/sda1 1008M 114M 844M 12% /boot
/dev/sda5 857G 21G 793G 3% /storage/local/data1
cmsblue2.fnal.gov:/uscms_data/d3
49T 46T 2.5T 95% /uscms_data/d3
cmsblue2.fnal.gov:/uscms_data/d1
2.0G 75M 2.0G 4% /uscms_data/d1
cmsblue2.fnal.gov:/uscmst1b_scratch/lpc1
70T 32T 39T 46% /uscmst1b_scratch/lpc1
cms-nas-0.fnal.gov:/uscms
2.5T 1.9T 697G 73% /uscms
cms-nfs-uscms:/uscms/data1
50T 34M 50T 1% /uscms_data/d4
cms-nfs-uscms:/uscms/data2
50T 34M 50T 1% /uscms_data/d5
cms-nfs-uscms:/uscms/scratch1
50T 6.7T 43T 14% /uscms_data/scratch1
cmsblue1.fnal.gov:/uscms_data/d2
66T 52T 14T 80% /uscms_data/d2
cvmfs2 20G 14G 6.0G 70% /cvmfs/cms.cern.ch
AFS 2.0T 0 2.0T 0% /afs
eostest 33T 2.5G 33T 1% /eos/test
eosstoreuser 6.9P 2.7P 4.2P 39% /eos/uscms
[gerard1@cmslpc41 ~]$ cat /eos/test/test.content
cat: /eos/test/test.content: No such file or directory
cat: /eos/test/test.content: Input/output error
[gerard1@cmslpc41 ~]$

And this is what is logged in the logfile:

150723 10:05:24 time=1437663924.390425 func=Open level=ERROR tid=00007f8890d12700 source=XrdIo:154 error=opening remote XrdClFile
150723 10:05:24 time=1437663924.390466 func=Open level=ERROR tid=00007f8890d12700 source=PlainLayout:78 failed stat for file=root:////eos/test/test.content
150723 10:05:24 time=1437663924.390481 func=xrd_open level=ERROR tid=00007f8890d12700 source=xrdposix:2497 open failed for root:////eos/test/test.content.
150723 10:05:24 time=1437663924.390638 func=xrd_get_file level=ERROR tid=00007f8891b64700 source=xrdposix:743 no file abst for fd=0
150723 10:05:24 time=1437663924.390927 func=xrd_get_file level=ERROR tid=00007f8892587700 source=xrdposix:743 no file abst for fd=0
150723 10:05:24 time=1437663924.391021 func=xrd_get_file level=ERROR tid=00007f8890d12700 source=xrdposix:743 no file abst for fd=0
150723 10:05:38 time=1437663938.234485 func=xrd_pread level=WARN tid=00007f2dd3fff700 source=xrdposix:2681 read size=4096, returned=23

I didn't manage to get the EOS FUSE mount to work, but via xrdcp I can access the file after the following change in the cmsstor150.fnal.gov FST /etc/xrd.cf.fst config file:

xrootd.seclib libXrdSec.so
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protbind * only sss unix

This security config is copied from our production FSTs... and this looks VERY unsafe, doesn't it? Basically any UNIX box around the globe could write/read to any of our FSTs, right? What setup do you have? At a minimum I guess I should copy the setup I have in the MGM, which is safe(r), right?

Any idea on how to fix the FUSE mount? I can't find any error logs in the MGM machine... It's also true that I didn't manage to set it to 'debug info', the suggested commands don't seem to work:

[root@cmssrv153 ~]# eos debug info *
success: switched to mgm.debuglevel=info on nodes mgm.nodename=a
[root@cmssrv153 ~]# tail /var/log/eos/mgm/xrdlog.mgm
150723 10:16:50 28802 mgmofs_Sched: scheduling underused thread monitor in 64 seconds
150723 10:17:54 28801 mgmofs_Sched: running underused thread monitor inq=0
150723 10:17:54 28801 mgmofs_Sched: 2 threads; 1 idle
150723 10:17:54 28801 mgmofs_Sched: scheduling underused thread monitor in 64 seconds
150723 10:18:58 28802 mgmofs_Sched: running underused thread monitor inq=0
150723 10:18:58 28802 mgmofs_Sched: 2 threads; 1 idle
150723 10:18:58 28802 mgmofs_Sched: scheduling underused thread monitor in 64 seconds
150723 10:18:59 27180 XrootdXeq: root.15941: login as daemon
150723 10:18:59 27180 mgmofs_SendMessage: Unable to Querying relative path 'a?xrdmqmessage.header=24c5bbb2-314e-11e5-a65b-782bcb3c9986^^/eos/cmssrv153.fnal.gov/mgm^^^a^debug^1437664739^881678000^0^0^0^0^^^^0^0^&xrdmqmessage.body=#and#eos.rgid=0#and#eos.ruid=0#and#mgm.cmd=debug#and#mgm.debuglevel=info#and#mgm.nodename=a' is disallowed.; unknown error 3010
150723 10:18:59 27180 XrootdXeq: root.15941: disc 0:00:00
[root@cmssrv153 ~]# ll /eos/
total 0
drwxr-xr-x 1 root root 1 Jul 17 12:23 cmseos-test.fnal.gov
drwxrwxrwx 1 root root 4 Jul 20 11:54 gerard
drwxrwxrwx 1 root root 1 Jul 23 09:53 test
[root@cmssrv153 ~]#

Note that from localhost in the MGM it does work well.

#7 Updated by Gerard Bernabeu Altayo almost 5 years ago

Further changes I need to puppetize:

  • FST: mountpoint fix, add unix in /etc/xrd.cf.fst
  • Add clients as VIDs in the MGM: 'eos vid add gateway cmslpc*.fnal.gov'

To be consistent and after some testing, I've removed the individual entries from production too with:

[root@cmssrv222 ~]# for i in `seq -w 01 42`; do eos vid remove gateway cmslpc${i}; eos vid remove gateway cmslpc${i}.fnal.gov; done

Done in production on:

[root@cmssrv222 ~]# date
Thu Jul 23 14:21:35 CDT 2015
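
A more cautious variant of the removal loop is to print the commands first and review them before running anything. The sketch below only echoes the same 'eos vid remove gateway' commands used above; the function name is hypothetical.

```shell
# Print (do not run) the vid removals for hosts <prefix><first>..<prefix><last>,
# both as short names and as .fnal.gov names.
print_vid_removals() {
  prefix="$1"
  first="$2"
  last="$3"
  for i in $(seq -w "$first" "$last"); do
    echo "eos vid remove gateway ${prefix}${i}"
    echo "eos vid remove gateway ${prefix}${i}.fnal.gov"
  done
}

# Show the first two commands that would be issued for cmslpc01..cmslpc42.
print_vid_removals cmslpc 1 42 | sed -n '1,2p'
```

Once the list looks right, it could be piped to sh, ideally after saving a copy of the current 'eos vid ls' output.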

#8 Updated by Gerard Bernabeu Altayo almost 5 years ago

I should add rules for the following:

[root@cmssrv153 ~]# cat /etc/xrd.cf.mgm | grep unix
sec.protocol unix
# KRB authentication (GBA 05/2015: we are not using this, we can potentially get rid of krb5 everywhere, or maybe enable it and disable unsafe unix...)
sec.protbind cmssrv153.fnal.gov sss unix
sec.protbind cmssrv151.fnal.gov sss unix
sec.protbind cmssrv152.fnal.gov sss unix
sec.protbind localhost.localdomain sss unix
sec.protbind localhost sss unix
sec.protbind cmseos*.fnal.gov sss unix
#Client servers that can mount via FUSE and accept (weak) UNIX authentication (we should change the auth to 'krb5 gsi unix sss')
sec.protbind cmslpc*.fnal.gov unix gsi
sec.protbind cmswn*.fnal.gov unix gsi
sec.protbind cmssrv*.fnal.gov unix gsi
sec.protbind cmsdev*.fnal.gov unix gsi
sec.protbind cmsfts*.fnal.gov unix gsi
sec.protbind cmsphedex*.fnal.gov unix gsi
sec.protbind cmsgwms*.fnal.gov unix gsi
sec.protbind cmsdev*.fnal.gov unix gsi
sec.protbind cmsdcam*.fnal.gov unix gsi

And remove the individuals from production too...
[root@cmssrv222 ~]# eos vid add gateway cmswn*.fnal.gov
[root@cmssrv222 ~]# for i in `seq 900 2200`; do eos vid remove gateway cmswn$i; eos vid remove gateway cmswn$i.fnal.gov; done

#9 Updated by Gerard Bernabeu Altayo almost 5 years ago

We went from ~6K entries to 243 VIDs:

[root@cmssrv222 ~]# eos vid ls | grep cmswn
hostmatch:"protocol=* pattern=cmswn*.fnal.gov
tident:"*@cmswn*.fnal.gov":gid => root
tident:"*@cmswn*.fnal.gov":uid => root
[root@cmssrv222 ~]# eos vid ls | wc -l
243

#10 Updated by Gerard Bernabeu Altayo almost 5 years ago

OK, this was a BIG mistake. My checks were not good because I was not checking what userid files were being written as, and it happened that everyone was being mapped as nobody (99) instead of themselves... So when I removed the individual mappings things stopped working because of lack of permissions....
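
The missing check could look roughly like this: write a probe file as a normal user and compare its numeric owner with the caller's uid. A sketch only; the function name is made up, and it relies on 'ls -ln' printing the numeric uid in the third column.

```shell
# Return 0 if a file created in the directory is owned by the calling user,
# 1 if it is owned by someone else (e.g. nobody/99), 2 if it cannot be created.
written_as_self() {
  dir="$1"
  probe="$dir/.owner_probe.$$"
  touch "$probe" 2>/dev/null || return 2
  owner=$(ls -ln "$probe" | awk '{print $3}')
  rm -f "$probe"
  [ "$owner" = "$(id -u)" ]
}

# Example against a local scratch directory; run it on the FUSE mount as a normal user.
if written_as_self "${TMPDIR:-/tmp}"; then
  echo "files are written as uid $(id -u)"
else
  echo "files are NOT written as the calling user"
fi
```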

I couldn't manage to rollback the change by adding the manual mappings back so I have stopped the MGM and copied the old backed up config in:

puppet agent --disable
service eos stop mgm
service eos status
cd /var/eos/config/cmssrv222.fnal.gov/
cp default.eoscf /root/
cp default.autosave.1437575489.eoscf default.eoscf
service eos start mgm
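
The autosave file names appear to carry a Unix timestamp (an assumption based on the single file name above). Under that assumption, the newest autosave taken before a given time could be picked like this; the function name is hypothetical.

```shell
# Print the newest default.autosave.<epoch>.eoscf in a directory whose
# epoch is strictly older than the cutoff. Returns 1 if there is none.
latest_autosave_before() {
  confdir="$1"
  cutoff="$2"
  best=""
  best_ts=""
  for f in "$confdir"/default.autosave.*.eoscf; do
    [ -e "$f" ] || continue
    ts=${f##*/default.autosave.}
    ts=${ts%.eoscf}
    # Skip names whose middle part is not purely numeric.
    case "$ts" in ''|*[!0-9]*) continue ;; esac
    if [ "$ts" -lt "$cutoff" ] && { [ -z "$best_ts" ] || [ "$ts" -gt "$best_ts" ]; }; then
      best="$f"
      best_ts="$ts"
    fi
  done
  [ -n "$best" ] && echo "$best"
}

# Example (cutoff = epoch of the time just before the change was made):
# latest_autosave_before /var/eos/config/cmssrv222.fnal.gov "$(date -d '2015-07-23 14:00' +%s)"
```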

Here are the differences, I don't understand why I could not roll it back yet:

[root@cmssrv222 cmssrv222.fnal.gov]# diff /root/default.eoscf /var/eos/config/cmssrv222.fnal.gov/default.eoscf
2939,2942d2938
< vid:tident:"*@cmslpc*":gid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmslpc*" mgm.vid.uid=0
< vid:tident:"*@cmslpc*":uid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmslpc*" mgm.vid.uid=0
< vid:tident:"*@cmslpc*.fnal.gov":gid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmslpc*.fnal.gov" mgm.vid.uid=0
< vid:tident:"*@cmslpc*.fnal.gov":uid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmslpc*.fnal.gov" mgm.vid.uid=0
3033,3034c3029,3030
< vid:tident:"*@cmslpc23.fnal.gov":gid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmslpc23.fnal.gov" mgm.vid.uid=0
< vid:tident:"*@cmslpc23.fnal.gov":uid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmslpc23.fnal.gov" mgm.vid.uid=0
---
> vid:tident:"*@cmslpc23.fnal.gov":gid => mgm.cmd=vid mgm.subcmd=set mgm.vid.cmd=map mgm.vid.auth=tident mgm.vid.pattern="*@cmslpc23.fnal.gov" mgm.vid.uid=0 mgm.vid.gid=0 mgm.vid.key=<key> eos.ruid=0 eos.rgid=0
> vid:tident:"*@cmslpc23.fnal.gov":uid => mgm.cmd=vid mgm.subcmd=set mgm.vid.cmd=map mgm.vid.auth=tident mgm.vid.pattern="*@cmslpc23.fnal.gov" mgm.vid.uid=0 mgm.vid.gid=0 mgm.vid.key=<key> eos.ruid=0 eos.rgid=0
3147,3148d3142
< vid:tident:"*@cmswn*.fnal.gov":gid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmswn*.fnal.gov" mgm.vid.uid=0
< vid:tident:"*@cmswn*.fnal.gov":uid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="*@cmswn*.fnal.gov" mgm.vid.uid=0
9151,9152d9144
< vid:tident:"unix@cmslpc*.fnal.gov":gid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="unix@cmslpc*.fnal.gov" mgm.vid.uid=0
< vid:tident:"unix@cmslpc*.fnal.gov":uid => eos.rgid=0 eos.ruid=0 mgm.cmd=vid mgm.subcmd=set mgm.vid.auth=tident mgm.vid.cmd=map mgm.vid.gid=0 mgm.vid.key=<key> mgm.vid.pattern="unix@cmslpc*.fnal.gov" mgm.vid.uid=0
[root@cmssrv222 cmssrv222.fnal.gov]#
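
Note that the cmslpc23 entries in the diff carry the same key=value fields on both sides, only in a different order. To see just the real differences, both files can be normalized before diffing. A sketch, assuming values contain no spaces (true for the lines shown here); the function name is hypothetical.

```shell
# Sort the key=value fields after " => " on every line, then sort the lines,
# so that two configs differing only in field or line order compare equal.
normalize_eoscf() {
  awk '{
    idx = index($0, " => ")
    if (idx == 0) { print; next }
    key = substr($0, 1, idx - 1)
    n = split(substr($0, idx + 4), kv, " ")
    # Insertion sort of the fields (portable across awk implementations).
    for (i = 2; i <= n; i++) {
      v = kv[i]
      j = i - 1
      while (j >= 1 && kv[j] > v) { kv[j + 1] = kv[j]; j-- }
      kv[j + 1] = v
    }
    out = key " =>"
    for (i = 1; i <= n; i++) out = out " " kv[i]
    print out
  }' "$1" | sort
}

# Example:
# normalize_eoscf /root/default.eoscf > /tmp/a
# normalize_eoscf /var/eos/config/cmssrv222.fnal.gov/default.eoscf > /tmp/b
# diff /tmp/a /tmp/b
```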

#11 Updated by Gerard Bernabeu Altayo almost 5 years ago

  • Status changed from New to Resolved

This is done, we're running 0.3.125 in production now.


