BOSCO submission setup » History » Version 16
Farrukh Khan, 09/27/2017 04:40 PM
1 | 1 | Farrukh Khan | h1. BOSCO submission setup |
---|---|---|---|
2 | 2 | Farrukh Khan | |
3 | 2 | Farrukh Khan | This wiki page contains step-by-step instructions for installing and configuring BOSCO submission from a glideinWMS factory. |
4 | 2 | Farrukh Khan | |
5 | 2 | Farrukh Khan | {{toc}} |
6 | 2 | Farrukh Khan | |
7 | 3 | Farrukh Khan | h2. Terminology |
8 | 3 | Farrukh Khan | |
9 | 3 | Farrukh Khan | | *Term* | *Description* | |
10 | 3 | Farrukh Khan | | BOSCO_HOST | This is the remote login node from which glideins will be submitted to the local batch queue. For example, in the instructions below BOSCO_HOST is 'cori.nersc.gov' | |
11 | 3 | Farrukh Khan | | FACTORY_HOST | This is the node where glideinWMS factory service has been installed and configured to run. For example, in the instructions below FACTORY_HOST is 'fermifactory01.fnal.gov' | |
12 | 3 | Farrukh Khan | | FRONTEND_HOST | This is the node where glideinWMS frontend service has been installed and configured to run. For example, in the instructions below FRONTEND_HOST is 'cmssrv279.fnal.gov' | |
13 | 3 | Farrukh Khan | |
14 | 2 | Farrukh Khan | h2. Vanilla installation |
15 | 1 | Farrukh Khan | |
16 | 6 | Farrukh Khan | # Log into any node from which you can SSH into BOSCO_HOST. It is strongly recommended that the architecture and operating system of the host where you set BOSCO up match those of BOSCO_HOST as closely as possible, since you may need to copy some libraries over. In the instructions below, the host used to set up BOSCO is 'lxplus030.cern.ch'. <pre> |
17 | 4 | Farrukh Khan | [fkhan@dhcp-131-225-82-129 ~]$ ssh fakhan@lxplus030.cern.ch |
18 | 4 | Farrukh Khan | Password: |
19 | 4 | Farrukh Khan | Last login: Tue Sep 19 22:44:19 2017 from dhcp-131-225-82-129.dhcp.fnal.gov |
20 | 4 | Farrukh Khan | * ******************************************************************** |
21 | 4 | Farrukh Khan | * Welcome to lxplus030.cern.ch, SLC, 6.9 |
22 | 4 | Farrukh Khan | * Archive of news is available in /etc/motd-archive |
23 | 4 | Farrukh Khan | * Reminder: You have agreed to comply with the CERN computing rules |
24 | 4 | Farrukh Khan | * https://cern.ch/ComputingRules |
25 | 4 | Farrukh Khan | * Puppet environment: production, Roger state: production |
26 | 4 | Farrukh Khan | * Foreman hostgroup: lxplus/nodes/login |
27 | 4 | Farrukh Khan | * LXPLUS Public Login Service |
28 | 4 | Farrukh Khan | * ******************************************************************** |
29 | 1 | Farrukh Khan | [fakhan@lxplus030 ~]$ |
30 | 5 | Farrukh Khan | </pre> |
31 | 6 | Farrukh Khan | # Take a look at the FTP server at UW Madison hosting different BOSCO versions "here":ftp://ftp.cs.wisc.edu/condor/bosco/. Select the appropriate version to download and wget the relevant boscoinstaller.tar.gz file. For example for version 1.2.10, fetch the installer as follows:<pre> |
32 | 5 | Farrukh Khan | [fakhan@lxplus030 ~]$ wget ftp://ftp.cs.wisc.edu/condor/bosco/1.2.10/boscoinstaller.tar.gz |
33 | 5 | Farrukh Khan | --2017-09-19 22:29:10-- ftp://ftp.cs.wisc.edu/condor/bosco/1.2.10/boscoinstaller.tar.gz |
34 | 5 | Farrukh Khan | => “boscoinstaller.tar.gz” |
35 | 5 | Farrukh Khan | Resolving ftp.cs.wisc.edu... 128.105.2.31 |
36 | 5 | Farrukh Khan | Connecting to ftp.cs.wisc.edu|128.105.2.31|:21... connected. |
37 | 5 | Farrukh Khan | Logging in as anonymous ... Logged in! |
38 | 5 | Farrukh Khan | ==> SYST ... done. ==> PWD ... done. |
39 | 5 | Farrukh Khan | ==> TYPE I ... done. ==> CWD (1) /condor/bosco/1.2.10 ... done. |
40 | 5 | Farrukh Khan | ==> SIZE boscoinstaller.tar.gz ... 20480 |
41 | 5 | Farrukh Khan | ==> PASV ... done. ==> RETR boscoinstaller.tar.gz ... done. |
42 | 1 | Farrukh Khan | Length: 20480 (20K) (unauthoritative) |
43 | 5 | Farrukh Khan | |
44 | 5 | Farrukh Khan | 100%[==================================================================================================================>] 20,480 --.-K/s in 0.1s |
45 | 5 | Farrukh Khan | |
46 | 5 | Farrukh Khan | 2017-09-19 22:29:12 (182 KB/s) - “boscoinstaller.tar.gz” saved [20480] |
47 | 5 | Farrukh Khan | </pre> |
48 | 6 | Farrukh Khan | # Untar the downloaded installer and run it to install BOSCO on the current machine. For example:<pre> |
49 | 5 | Farrukh Khan | [fakhan@lxplus030 ~]$ tar -xvf boscoinstaller.tar.gz |
50 | 5 | Farrukh Khan | boscoinstaller |
51 | 5 | Farrukh Khan | |
52 | 5 | Farrukh Khan | [fakhan@lxplus030 ~]$ ./boscoinstaller |
53 | 5 | Farrukh Khan | Downloading BOSCO from ftp://ftp.cs.wisc.edu/condor/bosco/1.2/bosco-1.2-x86_64_RedHat6.tar.gz |
54 | 5 | Farrukh Khan | Installing BOSCO in ~/bosco |
55 | 5 | Farrukh Khan | Installing Condor from /tmp/fakhan/tmpbEI905/condor-8.6.6-x86_64_RedHat6-stripped to /afs/cern.ch/user/f/fakhan/bosco |
56 | 5 | Farrukh Khan | |
57 | 5 | Farrukh Khan | Condor has been installed into: |
58 | 5 | Farrukh Khan | /afs/cern.ch/user/f/fakhan/bosco |
59 | 5 | Farrukh Khan | |
60 | 5 | Farrukh Khan | Configured condor using these configuration files: |
61 | 5 | Farrukh Khan | global: /afs/cern.ch/user/f/fakhan/bosco/etc/condor_config |
62 | 5 | Farrukh Khan | local: /afs/cern.ch/user/f/fakhan/bosco/local.bosco/condor_config.local |
63 | 5 | Farrukh Khan | |
64 | 5 | Farrukh Khan | In order for Condor to work properly you must set your CONDOR_CONFIG |
65 | 5 | Farrukh Khan | environment variable to point to your Condor configuration file: |
66 | 1 | Farrukh Khan | /afs/cern.ch/user/f/fakhan/bosco/etc/condor_config before running Condor |
67 | 5 | Farrukh Khan | commands/daemons. |
68 | 5 | Farrukh Khan | Created a script you can source to setup your Condor environment |
69 | 5 | Farrukh Khan | variables. This command must be run each time you log in or may |
70 | 5 | Farrukh Khan | be placed in your login scripts: |
71 | 5 | Farrukh Khan | source /afs/cern.ch/user/f/fakhan/bosco/bosco_setenv |
72 | 5 | Farrukh Khan | |
73 | 5 | Farrukh Khan | Congratulations, you installed BOSCO succesfully! |
74 | 5 | Farrukh Khan | </pre> |
75 | 7 | Farrukh Khan | # Create a _.bosco_ directory. For example: <pre> |
76 | 7 | Farrukh Khan | [fakhan@lxplus030 ~]$ mkdir ~/.bosco |
77 | 7 | Farrukh Khan | </pre> |
78 | 7 | Farrukh Khan | # If you do not have an existing key pair to access BOSCO_HOST (in our case, _cori.nersc.gov_), generate a passwordless RSA key: press Enter twice, without typing a passphrase, when prompted. Note that the key must be named bosco_key.rsa: <pre> |
79 | 7 | Farrukh Khan | $ ssh-keygen -t rsa -f ~/.ssh/bosco_key.rsa |
80 | 7 | Farrukh Khan | </pre> *If you already have a key pair, there is no need to generate a new one* |
81 | 7 | Farrukh Khan | # If you do have an existing key pair to access BOSCO_HOST (in our case, _cori.nersc.gov_), copy it to your ssh directory and name it bosco_key.rsa (with the public key as bosco_key.rsa.pub). For example, your ~/.ssh/ directory should resemble this: <pre> |
82 | 7 | Farrukh Khan | [fakhan@lxplus030 ~]$ ls -al ~/.ssh/ |
83 | 7 | Farrukh Khan | total 99 |
84 | 7 | Farrukh Khan | drwx------. 3 fakhan def-cg 2048 Sep 13 19:59 . |
85 | 7 | Farrukh Khan | drwxr-xr-x. 17 fakhan def-cg 4096 Sep 19 22:32 .. |
86 | 7 | Farrukh Khan | -rw-------. 1 fakhan zh 1671 Sep 12 00:38 bosco_key.rsa |
87 | 7 | Farrukh Khan | -rw-------. 1 fakhan zh 405 Sep 12 00:42 bosco_key.rsa.pub |
88 | 7 | Farrukh Khan | -rw-------. 1 fakhan zh 1743 Feb 1 2017 id_rsa |
89 | 7 | Farrukh Khan | -rw-r--r--. 1 fakhan zh 408 Feb 1 2017 id_rsa.pub |
90 | 7 | Farrukh Khan | -rw-r--r--. 1 fakhan def-cg 83355 Sep 18 19:17 known_hosts |
91 | 7 | Farrukh Khan | </pre> |
92 | 7 | Farrukh Khan | # Source the bosco environment temporarily. |
93 | 7 | Farrukh Khan | <pre> |
94 | 7 | Farrukh Khan | [fakhan@lxplus055 ~]$ source ~/bosco/bosco_setenv |
95 | 7 | Farrukh Khan | </pre> |
96 | 8 | Farrukh Khan | # Start bosco on the host. |
97 | 8 | Farrukh Khan | <pre> |
98 | 8 | Farrukh Khan | [fakhan@lxplus055 ~]$ bosco_start |
99 | 8 | Farrukh Khan | BOSCO Started |
100 | 8 | Farrukh Khan | </pre> |
101 | 8 | Farrukh Khan | # Now add the BOSCO_HOST as a cluster you would like to submit to. You need to know the platform and the batch system of the BOSCO_HOST. In our example, BOSCO_HOST is cori.nersc.gov and it runs a variant of RH6 with Slurm. The eventual command will be: |
102 | 8 | Farrukh Khan | <pre> |
103 | 8 | Farrukh Khan | [fakhan@lxplus055 ~]$ bosco_cluster --platform RH6 --add timm@cori.nersc.gov slurm |
104 | 8 | Farrukh Khan | Enter the password to copy the ssh keys to timm@cori.nersc.gov: |
105 | 8 | Farrukh Khan | ***************************************************************** |
106 | 8 | Farrukh Khan | * * |
107 | 8 | Farrukh Khan | * NOTICE TO USERS * |
108 | 8 | Farrukh Khan | * --------------- * |
109 | 8 | Farrukh Khan | * * |
110 | 8 | Farrukh Khan | * Lawrence Berkeley National Laboratory operates this * |
111 | 8 | Farrukh Khan | * computer system under contract to the U.S. Department of * |
112 | 8 | Farrukh Khan | * Energy. This computer system is the property of the United * |
113 | 8 | Farrukh Khan | * States Government and is for authorized use only. *Users * |
114 | 8 | Farrukh Khan | * (authorized or unauthorized) have no explicit or implicit * |
115 | 8 | Farrukh Khan | * expectation of privacy.* * |
116 | 8 | Farrukh Khan | * * |
117 | 8 | Farrukh Khan | * Any or all uses of this system and all files on this system * |
118 | 8 | Farrukh Khan | * may be intercepted, monitored, recorded, copied, audited, * |
119 | 8 | Farrukh Khan | * inspected, and disclosed to site, Department of Energy, and * |
120 | 8 | Farrukh Khan | * law enforcement personnel, as well as authorized officials * |
121 | 8 | Farrukh Khan | * of other agencies, both domestic and foreign. *By using * |
122 | 8 | Farrukh Khan | * this system, the user consents to such interception, * |
123 | 8 | Farrukh Khan | * monitoring, recording, copying, auditing, inspection, and * |
124 | 8 | Farrukh Khan | * disclosure at the discretion of authorized site or * |
125 | 8 | Farrukh Khan | * Department of Energy personnel.* * |
126 | 8 | Farrukh Khan | * * |
127 | 8 | Farrukh Khan | * *Unauthorized or improper use of this system may result in * |
128 | 8 | Farrukh Khan | * administrative disciplinary action and civil and criminal * |
129 | 8 | Farrukh Khan | * penalties. _By continuing to use this system you indicate * |
130 | 8 | Farrukh Khan | * your awareness of and consent to these terms and conditions * |
131 | 8 | Farrukh Khan | * of use. LOG OFF IMMEDIATELY if you do not agree to the * |
132 | 8 | Farrukh Khan | * conditions stated in this warning._* * |
133 | 8 | Farrukh Khan | * * |
134 | 8 | Farrukh Khan | ***************************************************************** |
135 | 8 | Farrukh Khan | Password: |
226 | 8 | Farrukh Khan | Downloading for timm@cori.nersc.gov....... |
227 | 8 | Farrukh Khan | Unpacking.. |
228 | 8 | Farrukh Khan | Sending libraries to timm@cori.nersc.gov. |
229 | 8 | Farrukh Khan | Creating BOSCO for the WN's............................................ |
230 | 8 | Farrukh Khan | Installing on cluster timm@cori.nersc.gov...... |
231 | 8 | Farrukh Khan | Installation complete |
232 | 8 | Farrukh Khan | The cluster timm@cori.nersc.gov has been added to BOSCO |
233 | 8 | Farrukh Khan | It is available to run jobs submitted with the following values: |
234 | 8 | Farrukh Khan | > universe = grid |
235 | 8 | Farrukh Khan | > grid_resource = batch slurm timm@cori.nersc.gov |
236 | 8 | Farrukh Khan | </pre> This command will prompt you for a password. Note that it may take a while, since it copies the BOSCO binaries over to the BOSCO_HOST. Do not panic; wait for the command to return. |
237 | 8 | Farrukh Khan | # Log onto the BOSCO_HOST and check for the 'bosco' directory. For example, |
238 | 8 | Farrukh Khan | <pre> |
239 | 8 | Farrukh Khan | [fakhan@lxplus055 ~]$ ssh -i ~/.ssh/bosco_key.rsa timm@cori.nersc.gov |
240 | 7 | Farrukh Khan | |
241 | 8 | Farrukh Khan | timm@cori07:~> ls -al bosco |
242 | 8 | Farrukh Khan | total 8 |
243 | 8 | Farrukh Khan | drwxr-xr-x 5 timm timm 512 Sep 19 13:58 . |
244 | 8 | Farrukh Khan | drwx--x--x 22 timm timm 4096 Sep 19 14:05 .. |
245 | 8 | Farrukh Khan | drwxr-xr-x 2 timm timm 512 Sep 19 13:58 campus_factory |
246 | 8 | Farrukh Khan | drwxr-xr-x 7 timm timm 512 Sep 19 13:57 glite |
247 | 8 | Farrukh Khan | drwxr-xr-x 2 timm timm 512 Sep 19 13:57 sandbox |
248 | 8 | Farrukh Khan | </pre> |
249 | 9 | Farrukh Khan | # Create a new file inside the bosco directory with information about the version and deployment date. This is not strictly required, but it helps to keep track of things. For example: |
250 | 9 | Farrukh Khan | <pre> |
251 | 9 | Farrukh Khan | timm@cori07:~/bosco> touch ~/bosco/version_info |
252 | 9 | Farrukh Khan | timm@cori07:~/bosco> echo "bosco: 1.2.10" >> ~/bosco/version_info |
253 | 9 | Farrukh Khan | timm@cori07:~/bosco> echo "condor:8.6.6" >> ~/bosco/version_info |
254 | 9 | Farrukh Khan | timm@cori07:~/bosco> echo "deployed: Sep. 19, 2017" >> ~/bosco/version_info |
255 | 9 | Farrukh Khan | timm@cori09:~/bosco> cat ~/bosco/version_info |
256 | 9 | Farrukh Khan | bosco: 1.2.10 |
257 | 9 | Farrukh Khan | condor:8.6.6 |
258 | 9 | Farrukh Khan | deployed: Sep. 19, 2017 |
259 | 9 | Farrukh Khan | </pre> |
260 | 9 | Farrukh Khan | # The above steps should set up a clean install of BOSCO. For additional NERSC-specific changes, follow the instructions in the next section. |
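The version-tracking step above can also be scripted. The following is only a sketch: @write_version_info@ is a hypothetical helper name, not part of BOSCO, and it overwrites the file instead of appending so that re-running it does not duplicate lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOSCO): records what was deployed
# in <bosco_dir>/version_info.
#   $1 = bosco directory, $2 = BOSCO version, $3 = Condor version
#   $4 = optional deployment date (defaults to today's date)
write_version_info() {
    bosco_dir=$1
    bosco_version=$2
    condor_version=$3
    deployed=${4:-$(date '+%b. %d, %Y')}

    # Overwrite rather than append so repeated runs leave exactly three lines.
    {
        echo "bosco: ${bosco_version}"
        echo "condor: ${condor_version}"
        echo "deployed: ${deployed}"
    } > "${bosco_dir}/version_info"
}

# Example call, using the values from the walkthrough above:
# write_version_info "$HOME/bosco" 1.2.10 8.6.6 "Sep. 19, 2017"
```

Running the example call on BOSCO_HOST would leave a three-line ~/bosco/version_info similar to the one shown in the step above.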
261 | 4 | Farrukh Khan | |
262 | 13 | Farrukh Khan | h3. NERSC site specific configuration instructions |
263 | 10 | Farrukh Khan | |
264 | 14 | Farrukh Khan | These instructions assume that you have followed the instructions in the previous section and have a vanilla installation of bosco already in place. Please follow the additional steps below for NERSC: |
265 | 9 | Farrukh Khan | # A vanilla BOSCO install does not include libcrypto.so.10 and libssl.so.10. These two libraries need to be copied over from any 64-bit SL6/RH6/CC6 machine. You can use the commands below to identify the relevant library files to copy: |
266 | 9 | Farrukh Khan | <pre> |
267 | 9 | Farrukh Khan | [fakhan@lxplus055 ~]$ ldconfig -p | grep "libcrypto.so.10" |
268 | 9 | Farrukh Khan | libcrypto.so.10 (libc6,x86-64) => /usr/lib64/libcrypto.so.10 |
269 | 9 | Farrukh Khan | libcrypto.so.10 (libc6) => /usr/lib/libcrypto.so.10 |
270 | 9 | Farrukh Khan | [fakhan@lxplus055 ~]$ ldconfig -p | grep "libssl.so.10" |
271 | 9 | Farrukh Khan | libssl.so.10 (libc6,x86-64) => /usr/lib64/libssl.so.10 |
272 | 9 | Farrukh Khan | libssl.so.10 (libc6) => /usr/lib/libssl.so.10 |
273 | 9 | Farrukh Khan | </pre> |
274 | 9 | Farrukh Khan | # Copy the files over to ~/bosco/glite/lib/ on cori.nersc.gov: |
275 | 9 | Farrukh Khan | <pre> |
276 | 9 | Farrukh Khan | [fakhan@lxplus055 ~]$ scp -i .ssh/bosco_key.rsa /usr/lib64/libssl.so.10 timm@cori.nersc.gov:~/bosco/glite/lib/ |
307 | 9 | Farrukh Khan | libssl.so.10 100% 433KB 433.0KB/s 00:01 |
308 | 9 | Farrukh Khan | [fakhan@lxplus055 ~]$ scp -i .ssh/bosco_key.rsa /usr/lib64/libcrypto.so.10 timm@cori.nersc.gov:~/bosco/glite/lib/ |
339 | 9 | Farrukh Khan | libcrypto.so.10 100% 1925KB 962.6KB/s 00:02 |
340 | 9 | Farrukh Khan | </pre> |
341 | 11 | Farrukh Khan | # Verify that the files have successfully been copied over: |
342 | 11 | Farrukh Khan | <pre> |
343 | 11 | Farrukh Khan | [fakhan@lxplus055 ~]$ ssh -i ~/.ssh/bosco_key.rsa timm@cori.nersc.gov |
344 | 11 | Farrukh Khan | timm@cori11:~/bosco> ls -al ~/bosco/glite/lib/ |
345 | 11 | Farrukh Khan | total 7232 |
346 | 11 | Farrukh Khan | drwxr-xr-x 3 timm timm 512 Sep 19 14:26 . |
347 | 11 | Farrukh Khan | drwxr-xr-x 7 timm timm 512 Sep 19 13:57 .. |
348 | 11 | Farrukh Khan | drwxr-xr-x 2 timm timm 8192 Sep 11 22:46 condor |
349 | 11 | Farrukh Khan | lrwxrwxrwx 1 timm timm 15 Sep 19 13:57 libclassad.so -> libclassad.so.8 |
350 | 11 | Farrukh Khan | lrwxrwxrwx 1 timm timm 19 Sep 19 13:57 libclassad.so.8 -> libclassad.so.8.6.6 |
351 | 11 | Farrukh Khan | -rwxr-xr-x 1 timm timm 605360 Sep 11 22:46 libclassad.so.8.6.6 |
352 | 11 | Farrukh Khan | -rwxr-xr-x 1 timm timm 4358312 Sep 11 22:46 libcondor_utils_8_6_6.so |
353 | 11 | Farrukh Khan | -rwxr-xr-x 1 timm timm 1971488 Sep 19 14:26 libcrypto.so.10 |
354 | 11 | Farrukh Khan | -rwxr-xr-x 1 timm timm 443416 Sep 19 14:26 libssl.so.10 |
355 | 13 | Farrukh Khan | </pre> |
356 | 11 | Farrukh Khan | # Modify batch_gahp configuration file to add Slurm and update the blah_job_wrapper to accommodate shifter: |
357 | 11 | Farrukh Khan | <pre> |
358 | 11 | Farrukh Khan | timm@cori11:~/bosco> vim ~/bosco/glite/etc/batch_gahp.config |
359 | 11 | Farrukh Khan | </pre> On line 2, modify the configuration parameters so that they read as follows (the previous supported_lrms is commented out and slurm is added): |
360 | 11 | Farrukh Khan | <pre> |
361 | 11 | Farrukh Khan | #Supported batch systems |
362 | 11 | Farrukh Khan | #supported_lrms=pbs,lsf,sge,condor |
363 | 11 | Farrukh Khan | supported_lrms=slurm |
364 | 11 | Farrukh Khan | </pre> In the same file, go to line 115. This should bring you to the Slurm specific configuration section. Add 'blah_job_wrapper' here so that the configuration file looks as follows: |
365 | 11 | Farrukh Khan | <pre> |
366 | 11 | Farrukh Khan | ## SLURM |
367 | 11 | Farrukh Khan | |
368 | 1 | Farrukh Khan | #path to the slurm executables |
369 | 1 | Farrukh Khan | slurm_binpath=`which sbatch 2>/dev/null|sed 's|/[^/]*$||'` |
370 | 1 | Farrukh Khan | |
371 | 1 | Farrukh Khan | # Needed for correct SLURM submission |
372 | 1 | Farrukh Khan | blah_job_wrapper='srun shifter' |
373 | 1 | Farrukh Khan | </pre> |
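If you prefer not to edit batch_gahp.config by hand, the two changes above could be applied with a small script. This is a sketch only: @configure_batch_gahp@ is a hypothetical helper, and it assumes the file contains a @supported_lrms=@ line and a @slurm_binpath=@ line as shown above. Line numbers may differ between BOSCO versions, so the sketch matches on the setting names instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOSCO): applies the batch_gahp.config
# edits described above.
#   $1 = path to batch_gahp.config
#   $2 = wrapper command, e.g. "srun shifter"
configure_batch_gahp() {
    config_file=$1
    wrapper=$2
    tmp_file=$(mktemp) || return 1

    # Remember whether an active wrapper line already exists.
    if grep -q '^blah_job_wrapper=' "$config_file"; then
        has_wrapper=1
    else
        has_wrapper=0
    fi

    awk -v wrapper="$wrapper" -v has_wrapper="$has_wrapper" '
        BEGIN { line = "blah_job_wrapper=\047" wrapper "\047" }
        # Already switched to slurm: keep the line untouched.
        /^supported_lrms=slurm$/ { print; next }
        # Comment out the previous list and enable slurm only.
        /^supported_lrms=/ { print "#" $0; print "supported_lrms=slurm"; next }
        # Replace an existing wrapper line in place.
        /^blah_job_wrapper=/ { print line; done = 1; next }
        { print }
        # Otherwise add the wrapper right after the Slurm binary path.
        /^slurm_binpath=/ && !has_wrapper && !done {
            print ""
            print "# Needed for correct SLURM submission"
            print line
            done = 1
        }
        # Fall back to appending if neither anchor line was found.
        END { if (!done) print line }
    ' "$config_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }

    # Write back through the original path to preserve its permissions.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}

# Example call for the site-specific setup:
# configure_batch_gahp "$HOME/bosco/glite/etc/batch_gahp.config" "srun shifter"
```

Calling the same helper on an entry-specific copy with "srun --no-kill shifter" would replace the wrapper line in place. Keep a copy of the original file until you have confirmed the result.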
374 | 14 | Farrukh Khan | |
375 | 14 | Farrukh Khan | h3. NERSC entry specific configuration instructions |
376 | 14 | Farrukh Khan | |
377 | 14 | Farrukh Khan | These instructions assume that you have followed the instructions in the previous two sections. You should have a bosco directory with a vanilla installation and relevant dependency libraries already in place. The instructions below vary per entry. In the examples below, changes are being made for Cori KNL fullnode entry. Please modify the entry name in the examples to your entry name while following the instructions. |
378 | 14 | Farrukh Khan | |
379 | 14 | Farrukh Khan | # Make sure you are logged into BOSCO_HOST (_cori.nersc.gov_ in our case). |
380 | 14 | Farrukh Khan | # Copy the vanilla installation to another directory, and name the new directory so that you can tell different entries apart. This is useful because it keeps the vanilla install intact, and the same install can be used for multiple entries later if needed. For example: |
381 | 14 | Farrukh Khan | <pre> |
382 | 14 | Farrukh Khan | hufnagel@cori04:~> cp -R ~/bosco ~/bosco_cori_knl_fullnode |
383 | 14 | Farrukh Khan | </pre> |
384 | 14 | Farrukh Khan | # Given that we are setting up a full node entry, we need to modify batch_gahp configuration again. *Note*: You can safely skip this if the entry isn't supposed to run fullnode pilots. |
385 | 14 | Farrukh Khan | <pre> |
386 | 14 | Farrukh Khan | hufnagel@cori04:~> vim ~/bosco_cori_knl_fullnode/glite/etc/batch_gahp.config |
387 | 14 | Farrukh Khan | </pre> Go to line 115 and update 'blah_job_wrapper' so it looks as follows: |
388 | 14 | Farrukh Khan | <pre> |
389 | 14 | Farrukh Khan | # Needed for correct SLURM submission |
390 | 14 | Farrukh Khan | blah_job_wrapper='srun --no-kill shifter' |
391 | 14 | Farrukh Khan | </pre> |
392 | 14 | Farrukh Khan | # Next, we need to update the default log and sandbox locations so that the pilots do not pollute the vanilla install. Update 'condor_config.ft-gahp' for this as follows: |
393 | 14 | Farrukh Khan | <pre> |
394 | 14 | Farrukh Khan | hufnagel@cori04:~> vim ~/bosco_cori_knl_fullnode/glite/etc/condor_config.ft-gahp |
395 | 14 | Farrukh Khan | </pre> Update the locations of these variables to match the bosco directory of your entry. For example, in our case (Cori KNL fullnode) the directory name is 'bosco_cori_knl_fullnode' (see step 2): |
396 | 14 | Farrukh Khan | <pre> |
397 | 14 | Farrukh Khan | BOSCO_SANDBOX_DIR=$ENV(HOME)/bosco_cori_knl_fullnode/sandbox |
398 | 14 | Farrukh Khan | LOG=$ENV(HOME)/bosco_cori_knl_fullnode/glite/log |
399 | 14 | Farrukh Khan | FT_GAHP_LOG=$(LOG)/FTGahpLog |
400 | 14 | Farrukh Khan | SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD |
401 | 14 | Farrukh Khan | SEC_PASSWORD_FILE = $ENV(HOME)/bosco_cori_knl_fullnode/glite/etc/passwdfile |
402 | 14 | Farrukh Khan | USE_SHARED_PORT = False |
403 | 14 | Farrukh Khan | ENABLE_URL_TRANSFERS = False |
404 | 14 | Farrukh Khan | </pre> |
405 | 15 | Farrukh Khan | # The final step is to edit the 'slurm_local_submit_attributes.sh' file. This file contains a list of job directives that tell Slurm different attributes of the job to run and these attributes can be specific for each entry. For example, for Cori KNL fullnode the file is as follows: |
406 | 15 | Farrukh Khan | <pre> |
407 | 15 | Farrukh Khan | hufnagel@cori01:~> cat ~/bosco_cori_knl_fullnode/glite/bin/slurm_local_submit_attributes.sh |
408 | 15 | Farrukh Khan | #!/bin/sh |
409 | 15 | Farrukh Khan | |
410 | 15 | Farrukh Khan | echo "#SBATCH --account=m2612" |
411 | 15 | Farrukh Khan | |
412 | 15 | Farrukh Khan | echo "#SBATCH --partition=regular" |
413 | 15 | Farrukh Khan | echo "#SBATCH --constraint=knl" |
414 | 15 | Farrukh Khan | |
415 | 15 | Farrukh Khan | echo "#SBATCH --qos=normal" |
416 | 15 | Farrukh Khan | |
417 | 15 | Farrukh Khan | echo "#SBATCH -N 1" |
418 | 15 | Farrukh Khan | |
419 | 15 | Farrukh Khan | echo "#SBATCH --ntasks-per-node=1" |
420 | 15 | Farrukh Khan | echo "#SBATCH --cpus-per-task=138" |
421 | 15 | Farrukh Khan | |
422 | 15 | Farrukh Khan | echo "#SBATCH --image=custom:cms_cvmfs:latest" |
423 | 15 | Farrukh Khan | echo "#SBATCH -L cscratch1" |
424 | 15 | Farrukh Khan | echo "#SBATCH --volume=\"/global/cscratch1/sd/hufnagel/SITECONF:/cvmfs/cms.cern.ch/SITECONF;/global/cscratch1/sd/hufnagel/node_cache:/tmp:perNodeCache=size=1780G\"" |
425 | 15 | Farrukh Khan | |
426 | 15 | Farrukh Khan | echo "#SBATCH -t 24:00:00" |
427 | 15 | Farrukh Khan | </pre> In plain English: the pilot (a job from Slurm's perspective) runs under account m2612, in the regular partition with the normal QOS on KNL nodes, using one node with a single task of 138 CPUs, a maximum runtime of 24 hours and the cms_cvmfs:latest shifter image. More details about these (and many other) attributes can be found "here":http://www.nersc.gov/users/computational-systems/cori/running-jobs/batch-jobs/. You can also see the details of these flags by running 'srun --help' on cori.nersc.gov. |
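The path changes in 'condor_config.ft-gahp' can be sketched as a small script. This is an illustration under one assumption that the page does not confirm: that the vanilla file refers to its directory as @$ENV(HOME)/bosco/@. The helper name @retarget_ft_gahp_config@ is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOSCO): points condor_config.ft-gahp
# at an entry-specific directory.
#   $1 = path to condor_config.ft-gahp
#   $2 = entry directory name under $HOME, e.g. bosco_cori_knl_fullnode
retarget_ft_gahp_config() {
    config_file=$1
    entry_dir=$2

    # Accept only simple directory names so the sed expression stays safe.
    case $entry_dir in
        ''|*[!A-Za-z0-9._-]*)
            echo "invalid entry directory: $entry_dir" >&2
            return 1
            ;;
    esac

    tmp_file=$(mktemp) || return 1
    # Assumption: the vanilla file uses $ENV(HOME)/bosco/ as its base path.
    sed "s|[\$]ENV(HOME)/bosco/|\$ENV(HOME)/${entry_dir}/|g" \
        "$config_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }

    # Write back through the original path to preserve its permissions.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}

# Example call for the Cori KNL fullnode entry:
# retarget_ft_gahp_config \
#     "$HOME/bosco_cori_knl_fullnode/glite/etc/condor_config.ft-gahp" \
#     bosco_cori_knl_fullnode
```

After running it, compare the file against the listing in the step above. If your vanilla file uses different base paths, edit it by hand instead.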
428 | 13 | Farrukh Khan | |
429 | 16 | Farrukh Khan | h2. Updating an existing installation (NERSC) |
430 | 1 | Farrukh Khan | |
431 | 16 | Farrukh Khan | The instructions below assume that a working bosco setup is already in place and only needs to be updated to a newer version. Follow the vanilla and NERSC site specific installation steps under the headings above to set up the vanilla install for the newer version. Skip the entry specific instructions above if you are updating an existing install. |
432 | 16 | Farrukh Khan | |
433 | 1 | Farrukh Khan | h3. Manual update |
434 | 16 | Farrukh Khan | # It is always a good idea to back up a working install. In the example below we are upgrading from version 1.2.9 to version 1.2.10. |
435 | 16 | Farrukh Khan | <pre> |
436 | 16 | Farrukh Khan | hufnagel@cori04:~> mv ~/bosco_cori_haswell_fullnode ~/bosco_cori_haswell_fullnode.129 |
437 | 16 | Farrukh Khan | </pre> |
438 | 16 | Farrukh Khan | # Copy the vanilla install over as the base directory for the entry: |
439 | 16 | Farrukh Khan | <pre> |
440 | 16 | Farrukh Khan | hufnagel@cori04:~> cp -R ~/bosco ~/bosco_cori_haswell_fullnode |
441 | 16 | Farrukh Khan | </pre> |
442 | 16 | Farrukh Khan | # Three main files need to be copied over as is: batch_gahp.config, condor_config.ft-gahp and slurm_local_submit_attributes.sh. Copy them from the older version to the newer one: |
443 | 16 | Farrukh Khan | <pre> |
444 | 16 | Farrukh Khan | hufnagel@cori04:~> cp ~/bosco_cori_haswell_fullnode.129/glite/etc/condor_config.ft-gahp ~/bosco_cori_haswell_fullnode/glite/etc/condor_config.ft-gahp |
445 | 1 | Farrukh Khan | |
446 | 16 | Farrukh Khan | hufnagel@cori04:~> cp ~/bosco_cori_haswell_fullnode.129/glite/etc/batch_gahp.config ~/bosco_cori_haswell_fullnode/glite/etc/batch_gahp.config |
447 | 16 | Farrukh Khan | |
448 | 16 | Farrukh Khan | hufnagel@cori04:~> cp ~/bosco_cori_haswell_fullnode.129/glite/bin/slurm_local_submit_attributes.sh ~/bosco_cori_haswell_fullnode/glite/bin/slurm_local_submit_attributes.sh |
449 | 16 | Farrukh Khan | </pre> |
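The three copy commands above can be wrapped in a loop so that no file is forgotten. This is a sketch rather than the planned update script: @carry_over_entry_config@ is a hypothetical helper, and it assumes both directories already exist with the standard glite/etc and glite/bin layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOSCO): copies the three entry-specific
# files from a backed-up install into a fresh copy of the vanilla install.
#   $1 = old entry directory (the backup)
#   $2 = new entry directory (fresh copy of ~/bosco)
carry_over_entry_config() {
    old_dir=$1
    new_dir=$2
    for rel_path in \
        glite/etc/batch_gahp.config \
        glite/etc/condor_config.ft-gahp \
        glite/bin/slurm_local_submit_attributes.sh
    do
        # -p keeps mode and timestamps, so the script stays executable.
        cp -p "${old_dir}/${rel_path}" "${new_dir}/${rel_path}" || return 1
    done
}

# Example call for the upgrade shown above:
# carry_over_entry_config \
#     "$HOME/bosco_cori_haswell_fullnode.129" \
#     "$HOME/bosco_cori_haswell_fullnode"
```

The function stops at the first failed copy and returns a non-zero status, so a missing file in the backup is noticed immediately.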
450 | 16 | Farrukh Khan | |
451 | 1 | Farrukh Khan | h3. Script based update |
452 | 16 | Farrukh Khan | |
453 | 16 | Farrukh Khan | The script is a work in progress. |