Support #15795

Milestone #15057: Minos SLF5 retirement

Support #15792: Individual Minos SLF5 node shutdowns

minos25 retirement

Added by Arthur Kreymer over 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: High
Start date: 03/08/2017
Due date: 03/31/2017
% Done: 100%
Estimated time: 10.00 h
Duration: 24

Description

Retire minos25 by 2017 April 1

History

#1 Updated by Arthur Kreymer over 3 years ago

Ran /grid/fermiapp/minos/scripts/procscan

Found many user crontabs still running kcron,
as well as a wingmc background process.

20170303_16:07:01_procs.gz
 4023 jdejong   18   0 59880 3892 2584 D  1.9  0.0   0:00.01 sendmail: [127.0.0.1]: idle
 3917 wingmc    21   0  106m 2036 1252 S  0.0  0.0   0:00.00 crond
 3919 jdejong   23   0  106m 2096 1256 S  0.0  0.0   0:00.01 crond
 3920 rtoner    23   0  106m 2096 1256 S  0.0  0.0   0:00.00 crond
 3953 wingmc    25   0  8728 1044  880 S  0.0  0.0   0:00.00 /bin/sh /local/scratch25/grid/kproxy -r Production
 3971 wingmc    25   0  3812  428  368 S  0.0  0.0   0:00.00 sleep 1632
 3978 jdejong   23   0     0    0    0 Z  0.0  0.0   0:00.00 [kproxy] <defunct>
 3990 rtoner    25   0     0    0    0 Z  0.0  0.0   0:00.00 [kproxy] <defunct>
 4008 rtoner    21   0 59828 3840 2544 D  0.0  0.0   0:00.00 sendmail: [127.0.0.1]: idle

also nsmayer and ahimmel
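A scan like the one above can be condensed into a list of accounts that still need attention. The following is only a sketch: `cron_users` is a hypothetical helper, not part of procscan, and it assumes the top-style column layout shown in the snapshot (user in field 2, command from field 12 onward).

```shell
# Hypothetical helper: list users who still have crond or kproxy processes
# in a top-style snapshot read from standard input.
# Assumes the layout above: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME COMMAND
cron_users() {
    awk '{
        # Rebuild the command column, which may contain spaces.
        cmd = ""
        for (i = 12; i <= NF; i++) cmd = cmd " " $i
        if (cmd ~ /crond|kproxy/) print $2
    }' | sort -u
}

# Example use against a compressed snapshot (file name taken from the scan above):
#   zcat 20170303_16:07:01_procs.gz | cron_users
```

Matching on the command column rather than the whole line avoids false hits on a user name that happens to contain one of the patterns.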

Need to update /local/scratch/grid/kproxy.removecron
    it only removes Analysis entries
    it writes CRONX to nonexistent /scratch25/<user>/CRONX, for gpsn01

Should remove entire crontab entry,
and write to /local/scratch/grid/CRONX-<user> for now
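The intended behavior (save the crontab, then remove all of it) might look roughly like the sketch below. This is not the actual kproxy.removecron or kproxy.nocron code, which is not shown in this ticket. `retire_crontab`, `CRONX_DIR`, and `CRONTAB` are hypothetical names, and the sketch assumes the user can write to the CRONX directory.

```shell
# Hypothetical sketch: save a user's crontab, then remove it entirely.
# CRONX_DIR and CRONTAB are illustrative variables, not part of the real scripts.
CRONX_DIR=${CRONX_DIR:-/local/scratch/grid}
CRONTAB=${CRONTAB:-crontab}

retire_crontab() {
    user=${1:-$(id -un)}
    out="$CRONX_DIR/CRONX-$user"
    # Keep a copy first; remove nothing if there is no crontab to save.
    if "$CRONTAB" -l > "$out.tmp" 2>/dev/null && [ -s "$out.tmp" ]; then
        mv "$out.tmp" "$out"
        "$CRONTAB" -r    # drop the whole crontab, not only selected entries
    else
        rm -f "$out.tmp"
        return 1
    fi
}

# Example use, run as the affected user:
#   retire_crontab
```

Writing the copy before calling `crontab -r` means a failed save leaves the crontab in place, so nothing is lost if the target directory is not writable.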

#2 Updated by Arthur Kreymer over 3 years ago

As grid@minos25,
in /local/scratch25/grid linked kproxy.nocron instead of kproxy.removecron.

Tested with jdejong account. Seems to work, removing crontab unconditionally.

Deployed

date
ln -sf  kproxy.nocron kproxy # was kproxy.removecron

Thu Mar  9 00:15:43 CST 2017

#3 Updated by Arthur Kreymer over 3 years ago

See scan /minos/app/users/mindata/log/procsum/minos25-20170309-0825

no kproxy activity since 02:07

ls -l /scratch/scratch25/*/CRONX
-rw-r--r-- 1 ahimmel e875  80 Mar  9 02:07 /scratch/scratch25/ahimmel/CRONX
-rw-r--r-- 1 grafnj  e875 430 Mar  9 02:07 /scratch/scratch25/grafnj/CRONX
-rw-rw-r-- 1 jdejong e875 105 Mar  9 00:13 /scratch/scratch25/jdejong/CRONX
-rw-r--r-- 1 nsmayer e875 479 Mar  9 02:07 /scratch/scratch25/nsmayer/CRONX
-rw-r--r-- 1 rtoner  e875  79 Mar  9 02:07 /scratch/scratch25/rtoner/CRONX
-rw-r--r-- 1 wingmc  e875  36 Mar  9 02:07 /scratch/scratch25/wingmc/CRONX

#4 Updated by Arthur Kreymer over 3 years ago

Local disks for archival:

df -h  | grep dev
/dev/vda2              14G  6.8G  6.1G  53% /
/dev/vda8              47G  1.2G   43G   3% /scratch
/dev/vda7             2.0G   37M  1.9G   2% /tmp
/dev/vda5             5.9G  1.5G  4.1G  26% /opt
/dev/vda3             7.8G  2.6G  4.8G  35% /var
/dev/vda1             122M   35M   81M  31% /boot
tmpfs                 5.9G  4.0K  5.9G   1% /dev/shm

In /opt, I will abandon the condor areas, last accessed May 1 2012,
and the empty mindata and minospro directories.

/scratch/scratch25 is symlinked to /local/scratch25

storage in scratch25 is mostly small,
user grid areas are protected, contain old proxies

also protected are a few directories in 
/gridold/VDT/vdt
and kreymer/AFS.201110/* which was used to recover a single file in 2011.

 du -sm * | sort -n 
1    xbhuang
1    zisvan
74    condor
87    rmehdi
153    nickd
209    gridold

I removed kreymer/AFS.201110
I removed gridold/VDT. gridold was a copy of grid made in 2012

    mindata@minos25

mkdir /minos/data/mindata/archive/minos25
date
cp -ax /scratch/scratch25 /minos/data/mindata/archive/minos25/scratch25
Thu Mar  9 09:31:30 CST 2017

du -sm /scratch/scratch25 /minos/data/mindata/archive/minos25/scratch25
314    /scratch/scratch25
255    /minos/data/mindata/archive/minos25/scratch25
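The two du totals differ, and this ticket does not say why. Block accounting can differ between filesystems, and the protected areas noted above may not have been readable by mindata; both explanations are assumptions. One way to check is to list paths that exist in the source but not in the archive. `missing_in_archive` below is a hypothetical helper, sketched under the assumption that file names contain no newlines.

```shell
# Hypothetical helper: print paths present under SRC ($1) but absent under DST ($2).
# DST should be an absolute path, or relative to the current directory.
missing_in_archive() {
    ( cd "$1" && find . 2>/dev/null ) | while IFS= read -r p; do
        # Count symlinks as present even when their target is missing.
        [ -e "$2/$p" ] || [ -L "$2/$p" ] || printf '%s\n' "$p"
    done
}

# Example use with the paths from this note:
#   missing_in_archive /scratch/scratch25 /minos/data/mindata/archive/minos25/scratch25
```

An empty result would suggest the size difference comes from filesystem accounting rather than missing files, though unreadable directories in the source would hide their contents from this check as well.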

   after the archive
I removed rmehdi/N00010660_0015.mdaq.root
I removed nickd/rocklogs job logs from 2009

Archived minos25 grid home area

rm -r .nedit
rm -r .globus
rmdir .ssh
rm    .lesshst
rm    .Xauthority
rm    .bash_history
rmdir hostcerts
rm -r minos25test # hostcert and key pem files
rm -r vdt2.0
rm vdt
mindata
cp -ax /home/grid /minos/data/mindata/archive/minos25/homegrid

#5 Updated by Arthur Kreymer over 3 years ago

minos25 seems to be ready to shut down soon

  • Ganglia shows little to no activity
  • local files are archived to /minos/data/mindata/archive/minos25/scratch25
  • prochistory shows no current activity. See
    • /minos/app/users/mindata/log/procsum/minos25-20170309-0825

I will check prochistory again on Monday 2017/03/13
If all is quiet, I will issue a RITM for shutting down the system.
We will ask to have the image retained for a month before removal (minos25 is a VM).

#6 Updated by Arthur Kreymer over 3 years ago

  • % Done changed from 10 to 90

#7 Updated by Arthur Kreymer over 3 years ago

Summary

Task       Completed  Comment
Ganglia    03/16      Idle
procsum    03/14      Users off 03/09, idle from 03/10
scratch25  03/09      /minos/data/mindata/archive/minos25/scratch25
home       03/09      /minos/data/mindata/archive/minos25/homegrid
opt        03/16      will not retain /opt/condor files and logs

Ganglia monthly averages
Idle 99.5 %
Load 1 28m
Net in 3k
Net out 2k

#8 Updated by Arthur Kreymer over 3 years ago

RITM0539526 03/16 minos25 shutdown

At your convenience, please shut down the minos25 VM.
This SLF5 system is being retired.

See preparation details in https://cdcvs.fnal.gov/redmine/issues/15795

Please hold the image for a month before permanent removal.
_________________________________________________________________

2017-03-16 09:58:08 CDT - Christophe Bonnaud
Minos25 has been shutdown.

#9 Updated by Arthur Kreymer over 3 years ago

  • Status changed from Work in progress to Resolved
  • % Done changed from 90 to 100

#10 Updated by Arthur Kreymer about 3 years ago

  • Status changed from Resolved to Closed

