ADMINISTRATION » History » Version 20

Arthur Kreymer, 04/30/2015 09:42 AM
cleaned up format, tested with cdms

h1. ADMINISTRATION

h2. FILES

Working directories and files are under /grid/data/${GROUP}/LOCK.

On the grid, the username does not reflect the identity of the person who submitted the job,
so the lock script takes the identity from the grid proxy.

/LOCKS - active lock files.
  The lock files are empty; their names contain
  date, time queued,               host, pid, user, identity

/QUEUE - pending locks: empty files with names containing
  date,                            host, pid, user, identity

/LOG   - empty files whose names record completed locks:
  date, time queued, time locked, host, pid, user, identity

/LOGS  - monthly text summaries built from the LOG file names

/STALE - record of locks that have timed out

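The name fields can be pulled apart in the shell. A minimal sketch, using the sample name 20140122.16:37:07.0.minos27.7436.kreymer.kreymer from the lock test in the INITIALIZATION section. It assumes the date and time are separate dot-separated fields, that the third field is the time queued, and that the host, user and identity contain no dots (a fully qualified host name would shift the fields). The function name parse_lock_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split an active lock file name (from LOCK/LOCKS)
# into fields. Assumed layout: date.time.queued.host.pid.user.identity,
# with no dots inside the host, user or identity fields.
parse_lock_name() {
    IFS=. read -r day clock queued host pid user identity <<EOF
$1
EOF
    printf 'date=%s time=%s queued=%s host=%s pid=%s user=%s identity=%s\n' \
        "$day" "$clock" "$queued" "$host" "$pid" "$user" "$identity"
}
```

For example, parse_lock_name 20140122.16:37:07.0.minos27.7436.kreymer.kreymer prints date=20140122 time=16:37:07 queued=0 host=minos27 pid=7436 user=kreymer identity=kreymer.
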

glimit - global activity limit, including all user groups.
         Set this near the actual Bluearc capacity.
         Not implemented as of 2012-11-06.

limit  - local activity limit, for the user's own group.
         Set this well under the Bluearc capacity.

perf   - minimum performance, in MB/sec, that PERF must show before locking

PERF   - actual performance in MB/sec, measured by an external agent
         ( No agents implemented as of 2010-08-02 )

rate   - target net retry rate, in retries per second

small  - size threshold in MB: cpn does not lock files smaller than this

wait   - minimum time to wait before retrying, regardless of the load.
         The delay before retrying a lock is the larger of:

* wait
* (number of queued locks)/rate

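Reading wait as a floor on the delay (it applies regardless of the load) and rate as a cap on the combined retry rate, N queued clients that each retry every d seconds generate N/d retries per second, so the delay meeting both conditions is max(wait, N/rate). The sketch below computes that value. It is an interpretation of the parameters above, not code taken from the lock script, and the function name retry_delay is hypothetical. It uses awk because rate may be fractional.

```shell
#!/bin/sh
# Hypothetical helper: retry delay in seconds for one queued lock request.
#   $1 = wait   (floor on the delay, seconds)
#   $2 = number of queued locks
#   $3 = rate   (target net retry rate, retries per second)
# Prints max(wait, queued/rate); falls back to wait if rate is not positive.
retry_delay() {
    awk -v w="$1" -v n="$2" -v r="$3" 'BEGIN {
        d = (r > 0) ? n / r : w   # spacing that keeps n clients at rate r
        if (w > d) d = w          # never retry sooner than wait
        print d
    }'
}
```

For example, with wait=10 and rate=2, five queued locks give a 10 second delay, while fifty give 25 seconds.
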
h2. MAINTENANCE

Lock files should be owned by an appropriate group account, such as mindata.

That account should periodically remove expired locks and queue entries,
and concatenate LOG entries into monthly summary files.

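The monthly concatenation can be pictured with a short sketch. It is not the lockclean script: it assumes LOG file names begin with YYYYMMDD, as the LOCKS names do, and that summaries are named LOGS/YYYYMM.log, as in the USAGE example. The function name summarize_month is hypothetical, and the sketch only appends names; it does not delete the LOG entries.

```shell
#!/bin/sh
# Hypothetical sketch, not the real lockclean script.
# Append the names of one month's completed-lock files to LOGS/YYYYMM.log.
#   $1 = lock area, e.g. /grid/data/${GROU}/LOCK
#   $2 = month as YYYYMM
# LOG entries are left in place, so a second run duplicates the lines.
summarize_month() {
    lockdir=$1
    month=$2
    for f in "$lockdir"/LOG/"$month"*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        basename "$f" >> "$lockdir/LOGS/$month.log"
    done
}
```
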

The lockclean script does this hourly. You can start it manually,
but be careful: interactive logins on gpsn01 are in group gpcf,
so use 'sg' to switch to the proper group first:
<pre>
    GRP=<mygroup>
    nohup sg ${GRP} -c /grid/fermiapp/common/tools/lockclean &
</pre>
Each account should have a crontab entry like
<pre>
@reboot sg <mygroup> -c /grid/fermiapp/common/tools/lockclean
</pre>

| Group     | Account@Host            | crontab |
| coupp     | ifmon@gpsn01            | @reboot sleep 300 ; sg coupp  -c /grid/fermiapp/common/tools/lockclean |
| des       | desdata@gpsn01          | @reboot sleep 300 ; sg des    -c /grid/fermiapp/common/tools/lockclean |
| e875      | mindata@minos27         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| e898      | ifmon@gpsn01            | @reboot sleep 300 ; sg e898   -c /grid/fermiapp/common/tools/lockclean |
| e938      | minervadat@if02         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| gm2       | gm2dat@gpsn01           | @reboot sleep 300 ; sg gm2    -c /grid/fermiapp/common/tools/lockclean |
| gpcf      | ifmon@gpsn01            | @reboot sleep 300 ; sg gpcf   -c /grid/fermiapp/common/tools/lockclean |
| lbne      | lbnedata@lbnegpvm01     | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| marslbne  | marslbne@lbnegpvm01     | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| marsmu2e  | marsmu2e@detsim         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| mu2e      | mu2e@mu2egpvm01         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| mu2epro   | mu2epro@mu2egpvm01      | RETIRED FEB 4 2013                                                     |
| t-962     | argoneut@argoneutgpvm01 | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| t-1034    | ifmon@gpsn01            | @reboot sleep 300 ; sg t-1034 -c /grid/fermiapp/common/tools/lockclean |
| uboone    | uboone@uboonegpvm01     | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| nova      | novadata@gpcf028        | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| numix     | ifmon@gpsn01            | @reboot sleep 300 ; sg numix  -c /grid/fermiapp/common/tools/lockclean |

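Adding a table row's entry to an account's crontab by hand risks duplicate entries. A minimal sketch of an idempotent append, assuming the entry format of the sg rows in the table; rows that run lockclean without sg would need the entry adjusted. The function name add_lockclean_entry is hypothetical; it reads an existing crontab on standard input and writes the updated one to standard output.

```shell
#!/bin/sh
# Hypothetical helper: read a crontab on stdin and write it to stdout with a
# lockclean entry for group $1 appended, unless an identical line is present.
# The entry format copies the sg rows of the table above.
LOCKCLEAN=/grid/fermiapp/common/tools/lockclean

add_lockclean_entry() {
    entry="@reboot sleep 300 ; sg $1 -c $LOCKCLEAN"
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}
```

Typical use would be crontab -l 2>/dev/null | add_lockclean_entry numix > crontab.new, then crontab crontab.new after checking the file.
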
h1. USAGE

Get an idea of activity by counting lines in the monthly LOGS summaries.
For example, for Minos (e875):

<pre>
$ wc -l /grid/data/e875/LOCK/LOGS/*.log
    9124 /grid/data/e875/LOCK/LOGS/200908.log
  140794 /grid/data/e875/LOCK/LOGS/200909.log
  181895 /grid/data/e875/LOCK/LOGS/200910.log
  196327 /grid/data/e875/LOCK/LOGS/200911.log
  125084 /grid/data/e875/LOCK/LOGS/200912.log
  272598 /grid/data/e875/LOCK/LOGS/201001.log
  284000 /grid/data/e875/LOCK/LOGS/201002.log
  275479 /grid/data/e875/LOCK/LOGS/201003.log
  354725 /grid/data/e875/LOCK/LOGS/201004.log
 1840026 total
</pre>

Stale locks and queue entries can be counted the same way:

<pre>
$ wc -l /grid/data/e875/LOCK/STALE/LOCKS/*.log
$ wc -l /grid/data/e875/LOCK/STALE/QUEUE/*.log
</pre>

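To see who generates the activity, a summary can be tallied by identity. A minimal sketch, assuming each summary line is one LOG file name whose last dot-separated field is the identity, following the /LOG field list in FILES with the date and time as separate fields. The function name locks_by_identity is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count completed locks per identity in one monthly
# summary, busiest first. Assumes one LOG file name per line, with the
# identity as the last dot-separated field
# (date.time.queued.locked.host.pid.user.identity).
locks_by_identity() {
    awk -F. 'NF { print $NF }' "$1" | sort | uniq -c | sort -rn
}
```

For example, locks_by_identity /grid/data/e875/LOCK/LOGS/201004.log | head would list the ten busiest identities for April 2010.
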
h2. INITIALIZATION

New LOCK areas should be created by the ifmon account.
If necessary, root@if-admin-minos can be used to create this area.

<pre>
    ifmon@gpsn01

GROU=<name of the group>
sg ${GROU}
GROU=<name of the group>   # set again: sg starts a new shell

. /grid/fermiapp/products/common/etc/setups.sh
ups tailor cpn          # echoes the proposed commands
ups tailor cpn -O write # executes the commands

cd /grid/data/${GROU}/LOCK
chmod 777 DO LOCKS LOG QUEUE
</pre>
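
After the chmod, a quick sanity check can confirm the four directories are in place and world-writable. A minimal sketch; the function name check_lock_area is hypothetical, and it checks only the directories named in the chmod line.

```shell
#!/bin/sh
# Hypothetical check: confirm that the directories made world-writable above
# exist under a LOCK area and carry all of the 0777 permission bits.
#   $1 = lock area, e.g. /grid/data/${GROU}/LOCK
# Prints one line per problem; returns 0 only if nothing is wrong.
check_lock_area() {
    status=0
    for d in DO LOCKS LOG QUEUE; do
        if [ ! -d "$1/$d" ]; then
            echo "missing: $1/$d"
            status=1
        elif [ -z "$(find "$1/$d" -prune -perm -0777)" ]; then
            echo "not mode 777: $1/$d"
            status=1
        fi
    done
    return $status
}
```
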

Update crontab.gpsn01 in the ifmon account on gpsn01, adding an entry for the new group,
then install it and start the lockclean process manually:

<pre>
crontab crontab.gpsn01

nohup sg ${GROU} -c /grid/fermiapp/common/tools/lockclean &
</pre>

Test the locks as a normal user:

<pre>
GROU=<name of the group>
. /grid/fermiapp/products/common/etc/setups.sh
setup cpn
export CPN_LOCK_GROUP=${GROU}

lock
LOCK - Wed Jan 22 16:37:07 UTC 2014 lock  /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:07.0.minos27.7436.kreymer.kreymer

lock free
LOCK - Wed Jan 22 16:37:09 UTC 2014 freed /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:07.0.minos27.7436.kreymer.kreymer

cpn /usr/bin/crash /dev/null
LOCK - Wed Jan 22 16:37:44 UTC 2014 lock  /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:44.1.minos27.30874.kreymer.kreymer
LOCK - Wed Jan 22 16:37:44 UTC 2014 freed /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:44.1.minos27.30874.kreymer.kreymer
</pre>

h3. If /grid/data/${GROU} does not exist, or needs to be linked to an alternate name

<pre>
DATA=${GROU}  # area to be locked; usually ${GROU}, giving /${GROU}/data, but not always

# Determine the gid of the data area
stat /${DATA}/data | grep Gid

    root@if-admin-minos

IFUID=45438  # ifmon UID
GRGID=....   # group GID from /${DATA}/data
GROU=<name of the group>

mkdir -p                /grid/data/${GROU}  # if necessary
ls -ld                  /grid/data/${GROU}  # note the original owner
chown ${IFUID}:${GRGID} /grid/data/${GROU}  # set ownership while tailoring cpn
</pre>

REX will verify that the group name is the same on Fermigrid nodes and in the Lab GID registry, at
http://www-giduid.fnal.gov/cd/FUE/uidgid/gid_id.lis
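
The stat line above shows the gid among several other fields. To set GRGID directly, a portable sketch can read the numeric group from ls -n, which lists numeric user and group IDs; on GNU systems stat -c %g gives the same number. The helper name dir_gid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric group ID of a directory.
# ls -n lists numeric user and group IDs; the group is the 4th field.
dir_gid() {
    ls -ldn -- "$1" | awk '{ print $4 }'
}
```

With DATA set as above, GRGID=$(dir_gid /${DATA}/data) would replace the manual lookup.
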

WARNING - if the group name contains a special character such as -,
that group name will not be used on Fermigrid worker nodes.
Find out what the group name is there,
and make a symlink in /grid/data for compatibility:

<pre>
GRID=<name of Fermigrid group>

ln -s /grid/data/${GROU} /grid/data/${GRID}
</pre>