Version 13, Arthur Kreymer, 01/21/2014 04:46 PM

h1. ADMINISTRATION

h2. FILES

Working directories and files are under /grid/data/${GROUP}/LOCK.

On the grid, the username does not reflect the identity of the person who submitted the job,
so the lock script gets the identity from the grid proxy.

/LOCKS - active lock files.
  The lock files are empty, with names containing
  date, time queued,              host, pid, user, identity

/QUEUE - pending lock requests: empty files with names containing
  date,                           host, pid, user, identity

/LOG   - empty files whose names record completed locks, containing
  date, time queued, time locked, host, pid, user, identity

/LOGS  - monthly text summaries built from the LOG file names

/STALE - record of locks that have timed out

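Because every active lock and every pending request is a single empty file, current activity can be read off by counting directory entries. This is a minimal sketch assuming exactly that layout; `lock_counts` is a hypothetical helper name, not part of the existing lock tools.

```shell
#!/bin/sh
# Hypothetical helper: count active locks (LOCKS/) and pending
# requests (QUEUE/) in a LOCK area, assuming one empty file per entry.
# Usage: lock_counts /grid/data/e875/LOCK
lock_counts() {
  lockdir=$1
  for sub in LOCKS QUEUE; do
    # ls -A omits . and .. and prints one name per line when piped;
    # tr strips the padding some wc implementations add.
    n=$(ls -A "$lockdir/$sub" 2>/dev/null | wc -l | tr -d ' ')
    printf '%s %s\n' "$sub" "$n"
  done
}
```

A missing subdirectory simply counts as 0, which keeps the helper usable on a partially initialized area.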

glimit - global activity limit, including all user groups;
         set this near the actual Bluearc capacity
         (not implemented as of 2012-11-06)

limit  - local activity limit, for the users' own group;
         set this well under Bluearc capacity

perf   - minimum performance, in MB/sec, that PERF must show before locking

PERF   - actual MB/sec performance, measured by an external agent
         (no agents implemented as of 2010-08-02)

rate   - net retry rate target, in retries per second

small  - size in MBytes; cpn does not take a lock for files smaller than this

wait   - minimum time to wait before retrying, regardless of the load.
         The time delay before retrying a lock is the larger of
* wait
* (number of queued locks)/rate

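The retry delay can be sketched as follows. Because `wait` is a floor that applies regardless of load, the delay here is taken as the larger of `wait` and (queued locks)/`rate`. With N queued requests each retrying every N/rate seconds, the queue as a whole retries about `rate` times per second. This illustrates that reading of the parameters and is not the lock script's own code; `retry_delay` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch of the retry delay, in seconds:
#   delay = max(wait, queued / rate)
# Arguments: wait (seconds), number of queued locks, rate (retries/second).
retry_delay() {
  awk -v w="$1" -v n="$2" -v r="$3" 'BEGIN {
    d = (r > 0) ? n / r : w   # spread N retries so they total about r per second
    if (d < w) d = w          # never retry sooner than the wait floor
    printf "%g\n", d
  }'
}

# retry_delay 10 50 2   prints 25 (load-limited)
# retry_delay 10 5 2    prints 10 (wait floor applies)
```

awk is used for the division because `rate` may be fractional and POSIX shell arithmetic is integer-only.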

h2. MAINTENANCE

Lock files should be owned by an appropriate group account, such as mindata.

That account should periodically remove expired locks and queue entries
and concatenate LOG entries into monthly summary files.

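Before anything is deleted, it can help to see which entries would count as expired. This is a minimal read-only sketch that assumes expiry is judged by file age; the actual rule used by lockclean is not documented here, and `list_expired` and its age argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical read-only helper: print LOCKS/ and QUEUE/ entries whose
# modification time is more than MAXAGE minutes old. Nothing is removed.
# find -mmin is supported by GNU and BSD find (it is not strictly POSIX).
# Usage: list_expired /grid/data/e875/LOCK 120
list_expired() {
  lockdir=$1
  maxage=$2   # minutes
  find "$lockdir/LOCKS" "$lockdir/QUEUE" -type f -mmin +"$maxage" -print 2>/dev/null
}
```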

You can start the lockclean script manually; it then repeats this cleanup hourly.
Be careful: interactive logins on gpsn01 are in group gpcf,
so use 'sg' to set the proper group first:
<pre>
    GRP=<mygroup>
    nohup sg ${GRP} -c /grid/fermiapp/common/tools/lockclean &
</pre>
There should be a crontab entry for each account, such as
<pre>
@reboot sleep 300 ; sg <mygroup> -c /grid/fermiapp/common/tools/lockclean
</pre>

| Group     | Account@Host            | crontab entry |
| coupp     | ifmon@gpsn01            | @reboot sleep 300 ; sg coupp -c /grid/fermiapp/common/tools/lockclean |
| des       | desdata@gpsn01          | @reboot sleep 300 ; sg des   -c /grid/fermiapp/common/tools/lockclean |
| e875      | mindata@minos27         | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| e898      | ifmon@gpsn01            | @reboot sleep 300 ; sg e898  -c /grid/fermiapp/common/tools/lockclean |
| e938      | minervadat@if02         | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| gm2       | gm2dat@gpsn01           | @reboot sleep 300 ; sg gm2   -c /grid/fermiapp/common/tools/lockclean |
| gpcf      | ifmon@gpsn01            | @reboot sleep 300 ; sg gpcf  -c /grid/fermiapp/common/tools/lockclean |
| lbne      | lbnedata@lbnegpvm01     | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| marslbne  | marslbne@lbnegpvm01     | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| marsmu2e  | marsmu2e@detsim         | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| mu2e      | mu2e@mu2egpvm01         | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| mu2epro   | mu2epro@mu2egpvm01      | RETIRED FEB 4 2013 |
| t-962     | argoneut@argoneutgpvm01 | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| uboone    | uboone@uboonegpvm01     | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |
| nova      | novadata@gpcf028        | @reboot sleep 300 ;             /grid/fermiapp/common/tools/lockclean |

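To confirm that an account listed in the table actually has its entry, its crontab can be checked for an @reboot line that starts lockclean. This is a small sketch; `has_lockclean_reboot` is a hypothetical name, and it reads crontab text on standard input so it can be fed from `crontab -l`.

```shell
#!/bin/sh
# Hypothetical check: succeed if the crontab text on stdin contains an
# active (not commented-out) @reboot entry that runs
# /grid/fermiapp/common/tools/lockclean.
# Usage: crontab -l | has_lockclean_reboot && echo "lockclean entry present"
has_lockclean_reboot() {
  grep -Eq '^[[:space:]]*@reboot[[:space:]].*/grid/fermiapp/common/tools/lockclean'
}
```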

h1. USAGE

Get an idea of activity by counting lines in the log files.

For example, for Minos:
<pre>
$ wc -l /grid/data/e875/LOCK/LOGS/*.log
    9124 /grid/data/e875/LOCK/LOGS/200908.log
  140794 /grid/data/e875/LOCK/LOGS/200909.log
  181895 /grid/data/e875/LOCK/LOGS/200910.log
  196327 /grid/data/e875/LOCK/LOGS/200911.log
  125084 /grid/data/e875/LOCK/LOGS/200912.log
  272598 /grid/data/e875/LOCK/LOGS/201001.log
  284000 /grid/data/e875/LOCK/LOGS/201002.log
  275479 /grid/data/e875/LOCK/LOGS/201003.log
  354725 /grid/data/e875/LOCK/LOGS/201004.log
 1840026 total

$ wc -l /grid/data/e875/LOCK/STALE/LOCKS/*.log
$ wc -l /grid/data/e875/LOCK/STALE/QUEUE/*.log
</pre>

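The monthly counts can be turned into a rough daily rate by dividing by the number of days in each month. This sketch assumes the summaries are named YYYYMM.log, as in the listing above, and that each line corresponds to one completed lock; `locks_per_day` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <lines> <lines per day>" for each
# monthly summary named YYYYMM.log, e.g. LOGS/201004.log.
# Assumes one line per completed lock; the "total" line from wc is skipped.
# Usage: locks_per_day /grid/data/e875/LOCK/LOGS/*.log
locks_per_day() {
  wc -l "$@" | awk '
    BEGIN { split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ") }
    $2 ~ /[0-9][0-9][0-9][0-9][0-9][0-9]\.log$/ {
      name = $2
      sub(/.*\//, "", name)                 # keep only YYYYMM.log
      y = substr(name, 1, 4) + 0
      m = substr(name, 5, 2) + 0
      days = dim[m]
      if (m == 2 && ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0)) days = 29
      printf "%s %d %.0f\n", name, $1, $1 / days
    }'
}
```

Applied to the Minos listing above, 201004.log (354725 lines over April's 30 days) works out to about 11824 completed locks per day.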

h2. INITIALIZATION

New LOCK areas are owned by the ifmon account.
Creating the LOCK directory and assigning it to ifmon requires root, so use root@if-admin-minos for that step;
ifmon then creates the default files.

<pre>
# As ifmon@gpsn01:

GROU=<name of the group>
DATA=${GROU}  # the area to be locked is /${DATA}/data; DATA is usually ${GROU}, but not always

# Determine the GID of the area.
stat /${DATA}/data | grep Gid

# As root@if-admin-minos:

IFUID=45438  # ifmon UID
GRGID=....   # group GID of /${DATA}/data, from the stat output above
GROU=<name of the group>

mkdir                   /grid/data/${GROU}/LOCK
chown ${IFUID}:${GRGID} /grid/data/${GROU}/LOCK

# REX will verify that the group name and GID are the same on FermiGrid nodes
# and in the Lab GID registry, at
# http://www-giduid.fnal.gov/cd/FUE/uidgid/gid_id.lis

# REX will then log in to ifmon@gpsn01
# and use 'ups tailor cpn' to create the default files.

export GROU=t-1034   # example group; exported so the shell started by sg inherits it
sg ${GROU}

. /grid/fermiapp/product/common/etc/setups.sh
ups tailor cpn           # echoes the proposed commands
ups tailor cpn -O write  # executes the commands

cd /grid/data/${GROU}/LOCK
chmod 777 DO LOCKS LOG QUEUE