Arthur Kreymer, 12/03/2012 06:08 PM


ADMINISTRATION

FILES

Working directories and files are under /grid/data/${GROUP}/LOCK

On the grid, the username does not identify the person who submitted
the job, so the lock script takes the identity from the grid proxy.

/LOCKS - active lock files
The lock files are empty, with names containing
date, time queued, host, pid, user, identity

/QUEUE - pending locks, empty files with names containing
date, host, pid, user, identity

/LOG - empty files with names reflecting completed locks
date, time queued, time locked, host, pid, user, identity

/LOGS - monthly text summaries built from LOG file names.

/STALE - record of locks that have timed out

glimit - global activity limit, including all user groups
set this near the actual Bluearc capacity
this is not implemented as of 2012-11-06

limit - local activity limit, for the users' own group
set this well under Bluearc capacity

perf - performance MB/sec required in PERF before locking

PERF - actual MB/sec performance, measured by external agent
( No agents implemented as of 2010-08-02 )

rate - net retry rate target, in retries per second

small - MBytes: files smaller than this are not locked by cpn.

wait - minimum time to wait before retrying, regardless of the load.
the time delay before retrying a lock is the larger of
  • wait
  • (number of queued locks)/rate
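
The retry delay can be sketched as a small bash function. This is only
an illustration, not code taken from the lock script. It assumes that
wait acts as a floor, as its description states, so the delay is the
larger of wait and (number of queued locks)/rate. It also assumes that
rate is a positive whole number. The function name retry_delay and its
argument order are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not part of the lock script.
# retry_delay WAIT RATE QUEUED
#   WAIT   - minimum retry delay in seconds (contents of the wait file)
#   RATE   - target retries per second (contents of the rate file),
#            assumed here to be a positive whole number
#   QUEUED - number of entries currently in QUEUE
retry_delay() {
    local wait=$1 rate=$2 queued=$3
    # Round the division up so the net retry rate stays at or below RATE.
    local by_rate=$(( (queued + rate - 1) / rate ))
    if [ "$by_rate" -gt "$wait" ]; then
        echo "$by_rate"
    else
        echo "$wait"
    fi
}

retry_delay 5 1 3      # short queue: the wait floor applies, prints 5
retry_delay 5 1 120    # long queue: queued/rate dominates, prints 120
```

With the typical values wait=5 and rate=1, the delay stays at 5 seconds
until more than 5 locks are queued, then grows with the queue length.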

MAINTENANCE

Lock files should be owned by an appropriate group account, such as mindata.

That account should occasionally remove expired locks and queue entries,
and concatenate LOG entries into monthly summary files.

You can run the lockclean script, which will do this hourly :

set nohup ; /grid/fermiapp/common/tools/lockclean &

Get an idea of activity by counting lines in log files.

For example, for Minos,

$ wc -l /grid/data/e875/LOCK/LOGS/*.log
9124 /grid/data/e875/LOCK/LOGS/200908.log
140794 /grid/data/e875/LOCK/LOGS/200909.log
181895 /grid/data/e875/LOCK/LOGS/200910.log
196327 /grid/data/e875/LOCK/LOGS/200911.log
125084 /grid/data/e875/LOCK/LOGS/200912.log
272598 /grid/data/e875/LOCK/LOGS/201001.log
284000 /grid/data/e875/LOCK/LOGS/201002.log
275479 /grid/data/e875/LOCK/LOGS/201003.log
354725 /grid/data/e875/LOCK/LOGS/201004.log
1840026 total
$ wc -l /grid/data/e875/LOCK/STALE/LOCKS/*.log
$ wc -l /grid/data/e875/LOCK/STALE/QUEUE/*.log
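
For a snapshot of current rather than historical activity, one option is
to count the files in LOCKS and QUEUE and show them next to the local
limit. The following is a minimal bash sketch. It assumes the directory
layout described under FILES and a limit file holding a plain whole
number. The function name lock_activity is hypothetical and is not one
of the installed tools.

```shell
#!/bin/bash
# Hypothetical helper, not one of the installed lock tools.
# lock_activity LOCKDIR
#   Prints the number of active locks, the number of queued locks,
#   and the local limit for the given LOCK directory.
lock_activity() {
    local dir=$1
    local active queued limit
    # Each lock or queue entry is one empty file, so count the files.
    active=$(find "$dir/LOCKS" -maxdepth 1 -type f | wc -l)
    queued=$(find "$dir/QUEUE" -maxdepth 1 -type f | wc -l)
    limit=$(cat "$dir/limit")
    # Arithmetic expansion strips any padding around the numbers.
    echo "active=$((active)) queued=$((queued)) limit=$((limit))"
}
```

For Minos this would be called as: lock_activity /grid/data/e875/LOCK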

INITIALIZATION

To start up a new group's LOCKs,
give REX DH people access to the account,
and issue a ServiceNow ticket to have the files set up.
The .k5login should include

dbox@FNAL.GOV
illingwo@FNAL.GOV
kreymer@FNAL.GOV
lyon@FNAL.GOV
mengel@FNAL.GOV
rs@FNAL.GOV
votava@FNAL.GOV

They will log in to the account and cut/paste the output from
the following commands :

GROUP=`id -gn`
BASE=/grid/data
    printf " 
    You should first be logged into the group account

mkdir -p  ${BASE}/${GROUP}
mkdir -p  ${BASE}/${GROUP}/LOCK
mkdir -p  ${BASE}/${GROUP}/LOCK/DO
mkdir -p  ${BASE}/${GROUP}/LOCK/LOCKS
mkdir -p  ${BASE}/${GROUP}/LOCK/LOG
mkdir -p  ${BASE}/${GROUP}/LOCK/LOGS
mkdir -p  ${BASE}/${GROUP}/LOCK/QUEUE
mkdir -p  ${BASE}/${GROUP}/LOCK/STALE
mkdir -p  ${BASE}/${GROUP}/LOCK/STALE/LOCKS
mkdir -p  ${BASE}/${GROUP}/LOCK/STALE/QUEUE

chmod -R 775 ${BASE}/${GROUP}/LOCK
find ${BASE}/${GROUP}/LOCK -type d -exec chmod 775 {} \;

    Then create these files under LOCK, with typical values:

echo  1000000 > ${BASE}/${GROUP}/LOCK/small # small file size for cpn, 1 MB

echo  99 > ${BASE}/${GROUP}/LOCK/glimit # global open file limit
echo   5 > ${BASE}/${GROUP}/LOCK/limit  #  local open file limit
echo   3 > ${BASE}/${GROUP}/LOCK/perf   #  required performance MB/sec before locking
echo  50 > ${BASE}/${GROUP}/LOCK/PERF   #  measured performance MB/sec set by outside program like bluwatch
echo   1 > ${BASE}/${GROUP}/LOCK/rate   #  target polling rate, per second
echo   3 > ${BASE}/${GROUP}/LOCK/stale  #  ignore locks this old ( minutes )
echo 600 > ${BASE}/${GROUP}/LOCK/staleq #  ignore queue entries this old ( minutes )
echo   5 > ${BASE}/${GROUP}/LOCK/wait   #  minimum retry delay ( seconds )
"