ADMINISTRATION » History » Version 18

Marc Mengel, 01/27/2015 09:59 AM

h1. ADMINISTRATION

h2. FILES

Working directories and files are under /grid/data/${GROUP}/LOCK

On the grid, the username does not reflect the identity of the
person who submitted the job.
So the lock script gets the identity from the grid proxy.

/LOCKS - active lock files
  The lock files are empty, with names containing
  date, time queued,               host, pid, user, identity

/QUEUE - pending locks, empty files with names containing
  date,                            host, pid, user, identity

/LOG   - empty files with names reflecting completed locks
  date, time queued, time locked,  host, pid, user, identity

/LOGS  - monthly text summaries built from LOG file names.

/STALE - record of locks that have timed out
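
Because the lock files are empty, all of the information is in the file name. The sketch below splits such a name into its fields. The field order is inferred from the example name shown in the test session later on this page (20140122.16:37:07.0.minos27.7436.kreymer.kreymer), not taken from the lock script itself, and the function and variable names are hypothetical. It assumes the host name contains no dots.

```shell
# parse_lock_name NAME
# Split a LOCKS file name on "." into named fields (hypothetical helper).
# Assumed layout: date.time.queued.host.pid.user.identity
# Anything after the sixth dot ends up in lk_identity.
parse_lock_name() {
    IFS=. read -r lk_date lk_time lk_queued lk_host lk_pid lk_user lk_identity <<EOF
$1
EOF
}

parse_lock_name '20140122.16:37:07.0.minos27.7436.kreymer.kreymer'
echo "host=${lk_host} pid=${lk_pid} user=${lk_user} queued=${lk_queued}s"
```

QUEUE and LOG names carry fewer or more time fields, so the same approach would need a different variable list for those directories.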

glimit - global activity limit, including all user groups
         set this near the actual Bluearc capacity
         this is not implemented as of 2012-11-06

limit  - local activity limit, for the users' own group
         set this well under Bluearc capacity

perf   - performance MB/sec required in PERF before locking

PERF   - actual MB/sec performance, measured by external agent
         ( No agents implemented as of 2010-08-02 )

rate   - net retry rate target, in retries per second

small  - MBytes: files smaller than this are not locked by cpn.

wait   - minimum time to wait before retrying, regardless of the load.
         Because wait is a floor, the time delay before retrying a lock is the larger of
* wait
* (number of queued locks)/rate
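
The retry delay can be sketched as follows. This reads wait as a floor on the delay, which is an interpretation of the description of wait as a minimum time, not something taken from the lock script itself. The function name and its arguments are hypothetical, and the fallback for a zero rate is an added assumption.

```shell
# retry_delay WAIT QUEUED RATE
# Print the delay in whole seconds before retrying a lock:
# QUEUED/RATE, but never less than WAIT.
retry_delay() {
    awk -v wait="$1" -v queued="$2" -v rate="$3" 'BEGIN {
        # Assumption: with no usable rate, fall back to the plain wait time.
        delay = (rate > 0) ? queued / rate : wait
        if (delay < wait) delay = wait
        printf "%d\n", delay
    }'
}

retry_delay 5 100 10   # 100 queued locks at 10 retries/sec
retry_delay 5 20 10    # light load, so the wait floor applies
```

With many queued locks the delay grows in proportion to the queue, which keeps the total retry rate near the rate target; with few, the wait floor applies.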

h2. MAINTENANCE

Lock files should be owned by an appropriate group account, such as mindata.

That account should occasionally remove expired locks and queue entries,
and concatenate LOG entries into monthly summary files.

The lockclean script does this hourly once it is running. You can start it manually,
but be careful: interactive logins on gpsn01 are in group gpcf.
Use 'sg' to set the proper group first:
<pre>
    GRP=<mygroup>
    nohup sg ${GRP} -c /grid/fermiapp/common/tools/lockclean &
</pre>
There should be a crontab entry for each account, like
<pre>
@reboot sg ${GRP} -c /grid/fermiapp/common/tools/lockclean
</pre>

| Group     | Account@Host            | crontab |
| coupp     | ifmon@gpsn01            | @reboot sleep 300 ; sg coupp  -c /grid/fermiapp/common/tools/lockclean |
| des       | desdata@gpsn01          | @reboot sleep 300 ; sg des    -c /grid/fermiapp/common/tools/lockclean |
| e875      | mindata@minos27         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| e898      | ifmon@gpsn01            | @reboot sleep 300 ; sg e898   -c /grid/fermiapp/common/tools/lockclean |
| e938      | minervadat@if02         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| gm2       | gm2dat@gpsn01           | @reboot sleep 300 ; sg gm2    -c /grid/fermiapp/common/tools/lockclean |
| gpcf      | ifmon@gpsn01            | @reboot sleep 300 ; sg gpcf   -c /grid/fermiapp/common/tools/lockclean |
| lbne      | lbnedata@lbnegpvm01     | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| marslbne  | marslbne@lbnegpvm01     | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| marsmu2e  | marsmu2e@detsim         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| mu2e      | mu2e@mu2egpvm01         | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| mu2epro   | mu2epro@mu2egpvm01      | RETIRED FEB 4 2013                                                     |
| t-962     | argoneut@argoneutgpvm01 | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| t-1034    | ifmon@gpsn01            | @reboot sleep 300 ; sg t-1034 -c /grid/fermiapp/common/tools/lockclean |
| uboone    | uboone@uboonegpvm01     | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| nova      | novadata@gpcf028        | @reboot sleep 300 ;              /grid/fermiapp/common/tools/lockclean |
| numix     | ifmon@gpsn01            | @reboot sleep 300 ; sg numix  -c /grid/fermiapp/common/tools/lockclean |

h1. USAGE

Get an idea of activity by counting lines in the monthly log files.

For example, for Minos:

  $ wc -l /grid/data/e875/LOCK/LOGS/*.log
     9124 /grid/data/e875/LOCK/LOGS/200908.log
   140794 /grid/data/e875/LOCK/LOGS/200909.log
   181895 /grid/data/e875/LOCK/LOGS/200910.log
   196327 /grid/data/e875/LOCK/LOGS/200911.log
   125084 /grid/data/e875/LOCK/LOGS/200912.log
   272598 /grid/data/e875/LOCK/LOGS/201001.log
   284000 /grid/data/e875/LOCK/LOGS/201002.log
   275479 /grid/data/e875/LOCK/LOGS/201003.log
   354725 /grid/data/e875/LOCK/LOGS/201004.log
  1840026 total

Stale locks and stale queue entries can be counted the same way:

  $ wc -l /grid/data/e875/LOCK/STALE/LOCKS/*.log
  $ wc -l /grid/data/e875/LOCK/STALE/QUEUE/*.log
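
For scripted monitoring, the same counts can be produced one month per line. This is a minimal sketch; the function name monthly_counts is hypothetical, and it assumes the summaries are named YYYYMM.log as in the listing above.

```shell
# monthly_counts LOGDIR
# Print "<month> <entries>" for each monthly summary file in LOGDIR.
monthly_counts() {
    logdir=$1
    for f in "$logdir"/*.log; do
        [ -f "$f" ] || continue             # skip the literal glob when nothing matches
        month=$(basename "$f" .log)
        count=$(wc -l < "$f" | tr -d ' ')   # some wc versions pad the count with spaces
        printf '%s %s\n' "$month" "$count"
    done
}

# Example call, using the Minos path from the text above:
# monthly_counts /grid/data/e875/LOCK/LOGS
```

The output is easy to feed into sort or a plotting tool, and the same function works on the STALE/LOCKS and STALE/QUEUE directories.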

h2. INITIALIZATION

New LOCK areas should be created by the ifmon account.
We can use root@if-admin-minos to create the /grid/data/${GROU} directory that holds the new area.

<pre>
    ifmon@gpsn01

GROU=<name of the group>
DATA=${GROU}  # directory to be locked, usually ${GROU}/data, but not always

Determine what the gid is for the area.
stat /${DATA}/data | grep Gid

    root@if-admin-minos

IFUID=45438  # ifmon UID
GRGID=....   # group GID from /${DATA}/data
GROU=<name of the group>

mkdir -p                /grid/data/${GROU}  # if necessary
ls -ld                  /grid/data/${GROU}  # note original owner
chown ${IFUID}:${GRGID} /grid/data/${GROU}  # set ownership while tailoring cpn

REX will verify that the group id name is the same on Fermigrid nodes and in the Lab GID registry, at
http://www-giduid.fnal.gov/cd/FUE/uidgid/gid_id.lis

WARNING - if the group name contains a special character like -,
that group name will not be used on Fermigrid worker nodes.
You need to find out what the group name is there,
and make a symlink in /grid/data for compatibility

GRID=<name of Fermigrid group>

ln -s /grid/data/${GROU} /grid/data/${GRID}


    CREATE LOGIN AREA

    ifmon@gpsn01

GROU=<name of the group>
sg ${GROU}

. /grid/fermiapp/products/common/etc/setups.sh
ups tailor cpn           # will echo the commands proposed
ups tailor cpn -O write  # will execute the commands

cd /grid/data/${GROU}/LOCK
chmod 777 DO LOCKS LOG QUEUE

   Update gpsn01:ifmon:crontab.gpsn01,
   adding an entry for the new group,
   and start the lockclean process manually

crontab crontab.gpsn01

nohup sg ${GROU} -c /grid/fermiapp/common/tools/lockclean &

    root@if-admin-minos

reset ownership of /grid/data/${GROU} if desired

    Test the locks with a normal user

GROU=<name of the group>

. /grid/fermiapp/products/common/etc/setups.sh

setup cpn

export CPN_LOCK_GROUP=${GROU}

lock
LOCK - Wed Jan 22 16:37:07 UTC 2014 lock  /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:07.0.minos27.7436.kreymer.kreymer

lock free
LOCK - Wed Jan 22 16:37:09 UTC 2014 freed /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:07.0.minos27.7436.kreymer.kreymer

cpn /usr/bin/crash /dev/null
LOCK - Wed Jan 22 16:37:44 UTC 2014 lock  /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:44.1.minos27.30874.kreymer.kreymer
LOCK - Wed Jan 22 16:37:44 UTC 2014 freed /grid/data/t-1034/LOCK/LOCKS/20140122.16:37:44.1.minos27.30874.kreymer.kreymer

</pre>
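
Before handing the area to users, it may help to confirm that the chmod step took effect, since a directory that is not world-writable would stop other accounts from creating lock files. The following is a minimal sketch: the function name check_lock_dirs is hypothetical, and it only checks the four directories named in the chmod command above.

```shell
# check_lock_dirs LOCKDIR
# Report any of DO LOCKS LOG QUEUE under LOCKDIR that is missing
# or does not have mode 777. Returns non-zero if anything is wrong.
check_lock_dirs() {
    lockdir=$1
    status=0
    for d in DO LOCKS LOG QUEUE; do
        if [ ! -d "$lockdir/$d" ]; then
            echo "missing: $lockdir/$d"
            status=1
        else
            # First 10 characters of ls -ld are the type and permission bits.
            perms=$(ls -ld "$lockdir/$d" | cut -c1-10)
            if [ "$perms" != "drwxrwxrwx" ]; then
                echo "unexpected mode $perms: $lockdir/$d"
                status=1
            fi
        fi
    done
    return $status
}

# Example call, with GROU set as in the steps above:
# check_lock_dirs /grid/data/${GROU}/LOCK && echo "LOCK area looks ready"
```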