ADMINISTRATION » History » Version 2

Arthur Kreymer, 12/03/2012 06:08 PM

h1. ADMINISTRATION

h2. FILES

Working directories and files are under /grid/data/${GROUP}/LOCK

On the grid, the username does not reflect the identity of the
person who submitted the job.
So the lock script gets the identity from the grid proxy.

/LOCKS - active lock files
  The lock files are empty, with names containing
  date, time queued,               host, pid, user, identity

/QUEUE - pending locks, empty files with names containing
  date,                            host, pid, user, identity

/LOG   - empty files with names reflecting completed locks
  date, time queued, time locked,  host, pid, user, identity

/LOGS  - monthly text summaries built from LOG file names.

/STALE - record of locks that have timed out
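
Because lock and queue entries are empty files, the current load can be read off by counting file names. The following is a minimal sketch, not part of the lock script itself; the function names count_entries and lock_summary are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of the lock tools.

# count_entries LOCK_DIR SUBDIR
# Count the files in one subdirectory (LOCKS, QUEUE, ...) of a LOCK area.
# Entries are empty files, so only their names carry information.
count_entries() {
    find "$1/$2" -type f 2>/dev/null | wc -l | tr -d ' '
}

# lock_summary LOCK_DIR
# Print a one-line summary of active and pending locks.
lock_summary() {
    printf 'active=%s queued=%s\n' \
        "$(count_entries "$1" LOCKS)" "$(count_entries "$1" QUEUE)"
}
```

For Minos this could be called as: lock_summary /grid/data/e875/LOCK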

glimit - global activity limit, including all user groups
         set this near the actual Bluearc capacity
         this is not implemented as of 2012-11-06

limit  - local activity limit, for the users' own group
         set this well under Bluearc capacity

perf   - performance MB/sec required in PERF before locking

PERF   - actual MB/sec performance, measured by external agent
         ( No agents implemented as of 2010-08-02 )

rate   - net retry rate target, in retries per second

small  - MBytes: files smaller than this are not locked by cpn.

wait   - minimum time to wait before retrying, regardless of the load.
         The time delay before retrying a lock is the larger of
* wait
* (number of queued locks)/rate
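
The retry delay can be sketched as below. This assumes the delay is the larger of the two values, which is what a minimum wait "regardless of the load" implies; the function name retry_delay is hypothetical and the real lock script may compute this differently.

```shell
#!/bin/sh
# retry_delay WAIT QUEUED RATE   (hypothetical helper, for illustration)
# Print the delay in seconds before the next lock attempt.
# Assumption: the delay is the larger of WAIT and QUEUED/RATE,
# so that WAIT acts as a floor regardless of the load.
retry_delay() {
    awk -v w="$1" -v q="$2" -v r="$3" 'BEGIN {
        w = w + 0; q = q + 0; r = r + 0   # force numeric values
        d = (r > 0) ? q / r : w           # guard against a zero rate
        if (d < w) d = w
        print d
    }'
}
```

With the typical values wait=5 and rate=1, 20 queued locks give a delay of 20 seconds, while 2 queued locks give the 5 second floor.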

h1. MAINTENANCE

Lock files should be owned by some appropriate group account, like mindata.

That account should occasionally remove expired locks and queue entries,
and concatenate LOG entries into monthly summary files.

You can run the lockclean script, which will do this hourly:

    set nohup ; /grid/fermiapp/common/tools/lockclean &
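
For orientation only, one cleanup pass of the kind described above might look like the sketch below. This is not the lockclean script. It assumes that an entry's age is judged by its modification time, that expired entries are recorded by name in a text file, and that the function names record_stale and fold_log are free to use; all of these are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical cleanup helpers; NOT the lockclean script itself.

# record_stale SRC_DIR LOG_FILE MINUTES
# Append the names of entries older than MINUTES to LOG_FILE, then remove them.
# Assumption: age is taken from the file modification time.
record_stale() {
    find "$1" -type f -mmin +"$3" | while IFS= read -r entry; do
        basename "$entry" >> "$2"
        rm -f "$entry"
    done
}

# fold_log LOCK_DIR MONTH
# Append every completed-lock name in LOCK_DIR/LOG to LOCK_DIR/LOGS/MONTH.log,
# one name per line, then remove the empty LOG file.
fold_log() {
    for entry in "$1"/LOG/*; do
        [ -f "$entry" ] || continue
        basename "$entry" >> "$1/LOGS/$2.log"
        rm -f "$entry"
    done
}
```

A pass over the Minos area could then call record_stale on LOCKS and QUEUE with the ages stored in the stale and staleq files, followed by fold_log with the current month from date +%Y%m.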

Get an idea of activity by counting lines in log files.

For example, for Minos,

  $ wc -l /grid/data/e875/LOCK/LOGS/*.log
    9124 /grid/data/e875/LOCK/LOGS/200908.log
  140794 /grid/data/e875/LOCK/LOGS/200909.log
  181895 /grid/data/e875/LOCK/LOGS/200910.log
  196327 /grid/data/e875/LOCK/LOGS/200911.log
  125084 /grid/data/e875/LOCK/LOGS/200912.log
  272598 /grid/data/e875/LOCK/LOGS/201001.log
  284000 /grid/data/e875/LOCK/LOGS/201002.log
  275479 /grid/data/e875/LOCK/LOGS/201003.log
  354725 /grid/data/e875/LOCK/LOGS/201004.log
 1840026 total

  $ wc -l /grid/data/e875/LOCK/STALE/LOCKS/*.log
  $ wc -l /grid/data/e875/LOCK/STALE/QUEUE/*.log
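
The two kinds of counts can be combined into a rough health figure. The sketch below assumes that the STALE/LOCKS summaries use the same YYYYMM.log naming as LOGS, and that stale locks are not also counted among the completed ones; both are assumptions, and stale_percent is a hypothetical name.

```shell
#!/bin/sh
# stale_percent LOCK_DIR MONTH   (hypothetical helper, for illustration)
# Print stale locks as a percentage of all locks recorded for MONTH.
# Assumptions: STALE/LOCKS/MONTH.log exists alongside LOGS/MONTH.log,
# and the two files count disjoint sets of locks.
stale_percent() {
    done_n=$(wc -l < "$1/LOGS/$2.log")
    stale_n=$(wc -l < "$1/STALE/LOCKS/$2.log")
    awk -v s="$stale_n" -v d="$done_n" 'BEGIN {
        if (s + d == 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * s / (s + d)
    }'
}
```

For example: stale_percent /grid/data/e875/LOCK 201004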

h2. INITIALIZATION

To start up a new group's LOCKs,
give REX DH people access to the account,
and issue a ServiceNow ticket to have the files set up.
The .k5login should include:
<pre>
dbox@FNAL.GOV
illingwo@FNAL.GOV
kreymer@FNAL.GOV
lyon@FNAL.GOV
mengel@FNAL.GOV
rs@FNAL.GOV
votava@FNAL.GOV
</pre>

They will log in to the account and cut/paste the output from
the following commands:

<pre>
GROUP=`id -gn`
BASE=/grid/data
# Print the setup commands for this group, to be reviewed and then cut/pasted.
# The text is passed as an argument, not as the printf format string,
# so that backslashes and percent signs are printed unchanged.
printf '%s\n' "
    You should first be logged into the group account

mkdir -p  ${BASE}/${GROUP}
mkdir -p  ${BASE}/${GROUP}/LOCK
mkdir -p  ${BASE}/${GROUP}/LOCK/DO
mkdir -p  ${BASE}/${GROUP}/LOCK/LOCKS
mkdir -p  ${BASE}/${GROUP}/LOCK/LOG
mkdir -p  ${BASE}/${GROUP}/LOCK/LOGS
mkdir -p  ${BASE}/${GROUP}/LOCK/QUEUE
mkdir -p  ${BASE}/${GROUP}/LOCK/STALE
mkdir -p  ${BASE}/${GROUP}/LOCK/STALE/LOCKS
mkdir -p  ${BASE}/${GROUP}/LOCK/STALE/QUEUE

chmod -R 775 ${BASE}/${GROUP}/LOCK
find ${BASE}/${GROUP}/LOCK -type d -exec chmod 775 {} \;

    Then create these files under LOCK, with typical values:

echo  1000000 > ${BASE}/${GROUP}/LOCK/small # small file size for cpn, 1 MB

echo  99 > ${BASE}/${GROUP}/LOCK/glimit #  global open file limit
echo   5 > ${BASE}/${GROUP}/LOCK/limit  #  local open file limit
echo   3 > ${BASE}/${GROUP}/LOCK/perf   #  required performance MB/sec before locking
echo  50 > ${BASE}/${GROUP}/LOCK/PERF   #  measured performance MB/sec set by outside program like bluwatch
echo   1 > ${BASE}/${GROUP}/LOCK/rate   #  target polling rate, per second
echo   3 > ${BASE}/${GROUP}/LOCK/stale  #  ignore locks this old ( minutes )
echo 600 > ${BASE}/${GROUP}/LOCK/staleq #  ignore queue entries this old ( minutes )
echo   5 > ${BASE}/${GROUP}/LOCK/wait   #  minimum retry delay ( seconds )
"
</pre>
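
After the printed commands have been run, the layout can be checked with a short script. This is a sketch under the assumption that the directory and file names are exactly those listed above; check_lock_area is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# check_lock_area LOCK_DIR   (hypothetical helper, for illustration)
# Report any directory or parameter file from the setup list that is missing.
# Returns 0 when the area is complete, 1 otherwise.
check_lock_area() {
    missing=0
    for d in DO LOCKS LOG LOGS QUEUE STALE STALE/LOCKS STALE/QUEUE; do
        [ -d "$1/$d" ] || { echo "missing directory: $1/$d"; missing=1; }
    done
    for f in small glimit limit perf PERF rate stale staleq wait; do
        # -s also catches parameter files that exist but are empty
        [ -s "$1/$f" ] || { echo "missing or empty file: $1/$f"; missing=1; }
    done
    return "$missing"
}
```

For example: check_lock_area /grid/data/e875/LOCK && echo "LOCK area looks complete"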