Version 11 (Farrukh Khan, 09/26/2017 03:24 PM) → Version 12/19 (Farrukh Khan, 09/27/2017 03:11 PM)

h1. BOSCO submission setup

This wiki page contains step-by-step instructions for installing and configuring BOSCO submission from a glideinWMS factory.

{{toc}}

h2. Terminology

| *Term* | *Description* |
| BOSCO_HOST | The remote login node from which glideins are submitted to the local batch queue. In the instructions below, BOSCO_HOST is 'cori.nersc.gov'. |
| FACTORY_HOST | The node where the glideinWMS factory service is installed and configured to run. In the instructions below, FACTORY_HOST is 'fermifactory01.fnal.gov'. |
| FRONTEND_HOST | The node where the glideinWMS frontend service is installed and configured to run. In the instructions below, FRONTEND_HOST is 'cmssrv279.fnal.gov'. |

h2. Vanilla installation

# Log into any node from which you can SSH into BOSCO_HOST. The host you set BOSCO up from should have an architecture and operating system similar to those of BOSCO_HOST, because you may later need to copy some libraries from it to BOSCO_HOST. In the instructions below, the host used to set up BOSCO is 'lxplus030.cern.ch'. <pre>
[fkhan@dhcp-131-225-82-129 ~]$ ssh fakhan@lxplus030.cern.ch
Password:
Last login: Tue Sep 19 22:44:19 2017 from dhcp-131-225-82-129.dhcp.fnal.gov
* ********************************************************************
* Welcome to lxplus030.cern.ch, SLC, 6.9
* Archive of news is available in /etc/motd-archive
* Reminder: You have agreed to comply with the CERN computing rules
* https://cern.ch/ComputingRules
* Puppet environment: production, Roger state: production
* Foreman hostgroup: lxplus/nodes/login
* LXPLUS Public Login Service
* ********************************************************************
[fakhan@lxplus030 ~]$
</pre>
# The different BOSCO versions are hosted on the FTP server at UW Madison, "here":ftp://ftp.cs.wisc.edu/condor/bosco/. Select the version you want and use wget to download the corresponding boscoinstaller.tar.gz file. For example, for version 1.2.10, fetch the installer as follows:<pre>
[fakhan@lxplus030 ~]$ wget ftp://ftp.cs.wisc.edu/condor/bosco/1.2.10/boscoinstaller.tar.gz
--2017-09-19 22:29:10-- ftp://ftp.cs.wisc.edu/condor/bosco/1.2.10/boscoinstaller.tar.gz
=> “boscoinstaller.tar.gz”
Resolving ftp.cs.wisc.edu... 128.105.2.31
Connecting to ftp.cs.wisc.edu|128.105.2.31|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /condor/bosco/1.2.10 ... done.
==> SIZE boscoinstaller.tar.gz ... 20480
==> PASV ... done. ==> RETR boscoinstaller.tar.gz ... done.
Length: 20480 (20K) (unauthoritative)

100%[==================================================================================================================>] 20,480 --.-K/s in 0.1s

2017-09-19 22:29:12 (182 KB/s) - “boscoinstaller.tar.gz” saved [20480]
</pre>
# Untar the downloaded installer and run it to install BOSCO on the current machine. For example:<pre>
[fakhan@lxplus030 ~]$ tar -xvf boscoinstaller.tar.gz
boscoinstaller

[fakhan@lxplus030 ~]$ ./boscoinstaller
Downloading BOSCO from ftp://ftp.cs.wisc.edu/condor/bosco/1.2/bosco-1.2-x86_64_RedHat6.tar.gz
Installing BOSCO in ~/bosco
Installing Condor from /tmp/fakhan/tmpbEI905/condor-8.6.6-x86_64_RedHat6-stripped to /afs/cern.ch/user/f/fakhan/bosco

Condor has been installed into:
/afs/cern.ch/user/f/fakhan/bosco

Configured condor using these configuration files:
global: /afs/cern.ch/user/f/fakhan/bosco/etc/condor_config
local: /afs/cern.ch/user/f/fakhan/bosco/local.bosco/condor_config.local

In order for Condor to work properly you must set your CONDOR_CONFIG
environment variable to point to your Condor configuration file:
/afs/cern.ch/user/f/fakhan/bosco/etc/condor_config before running Condor
commands/daemons.
Created a script you can source to setup your Condor environment
variables. This command must be run each time you log in or may
be placed in your login scripts:
source /afs/cern.ch/user/f/fakhan/bosco/bosco_setenv

Congratulations, you installed BOSCO succesfully!
</pre>
# Create a _.bosco_ directory. For example: <pre>
[fakhan@lxplus030 ~]$ mkdir ~/.bosco
</pre>
# If you do not have an existing key pair to access BOSCO_HOST (in our case, _cori.nersc.gov_), generate a passwordless RSA key: when ssh-keygen prompts for a passphrase, press Enter twice without typing one. The key must be named bosco_key.rsa: <pre>
$ ssh-keygen -t rsa -f ~/.ssh/bosco_key.rsa
</pre> *If you already have a key pair, there is no need to generate a new one.*
# If you already have a key pair to access BOSCO_HOST (in our case, _cori.nersc.gov_), copy it to your ~/.ssh/ directory and name the private key bosco_key.rsa and the public key bosco_key.rsa.pub. For example, your ~/.ssh/ directory should resemble this: <pre>
[fakhan@lxplus030 ~]$ ls -al ~/.ssh/
total 99
drwx------. 3 fakhan def-cg 2048 Sep 13 19:59 .
drwxr-xr-x. 17 fakhan def-cg 4096 Sep 19 22:32 ..
-rw-------. 1 fakhan zh 1671 Sep 12 00:38 bosco_key.rsa
-rw-------. 1 fakhan zh 405 Sep 12 00:42 bosco_key.rsa.pub
-rw-------. 1 fakhan zh 1743 Feb 1 2017 id_rsa
-rw-r--r--. 1 fakhan zh 408 Feb 1 2017 id_rsa.pub
-rw-r--r--. 1 fakhan def-cg 83355 Sep 18 19:17 known_hosts
</pre>
# Source the BOSCO environment for the current shell session. This must be repeated after each login.
<pre>
[fakhan@lxplus055 ~]$ source ~/bosco/bosco_setenv
</pre>
# Start bosco on the host.
<pre>
[fakhan@lxplus055 ~]$ bosco_start
BOSCO Started
</pre>
# Add BOSCO_HOST as a cluster to submit to. You need to know the platform and the batch system of BOSCO_HOST. In our example, BOSCO_HOST is cori.nersc.gov, which runs a variant of RH6 with Slurm, so the command is:
<pre>
[fakhan@lxplus055 ~]$ bosco_cluster --platform RH6 --add timm@cori.nersc.gov slurm
Enter the password to copy the ssh keys to timm@cori.nersc.gov:
*****************************************************************
* *
* NOTICE TO USERS *
* --------------- *
* *
* Lawrence Berkeley National Laboratory operates this *
* computer system under contract to the U.S. Department of *
* Energy. This computer system is the property of the United *
* States Government and is for authorized use only. *Users *
* (authorized or unauthorized) have no explicit or implicit *
* expectation of privacy.* *
* *
* Any or all uses of this system and all files on this system *
* may be intercepted, monitored, recorded, copied, audited, *
* inspected, and disclosed to site, Department of Energy, and *
* law enforcement personnel, as well as authorized officials *
* of other agencies, both domestic and foreign. *By using *
* this system, the user consents to such interception, *
* monitoring, recording, copying, auditing, inspection, and *
* disclosure at the discretion of authorized site or *
* Department of Energy personnel.* *
* *
* *Unauthorized or improper use of this system may result in *
* administrative disciplinary action and civil and criminal *
* penalties. _By continuing to use this system you indicate *
* your awareness of and consent to these terms and conditions *
* of use. LOG OFF IMMEDIATELY if you do not agree to the *
* conditions stated in this warning._* *
* *
*****************************************************************
Password:
Downloading for timm@cori.nersc.gov.......
Unpacking..
Sending libraries to timm@cori.nersc.gov.
Creating BOSCO for the WN's............................................
Installing on cluster timm@cori.nersc.gov......
Installation complete
The cluster timm@cori.nersc.gov has been added to BOSCO
It is available to run jobs submitted with the following values:
> universe = grid
> grid_resource = batch slurm timm@cori.nersc.gov
</pre> This command prompts for a password. It can take a while to finish because it copies the BOSCO binaries over to BOSCO_HOST, so wait for it to return.
# Log in to BOSCO_HOST and check that the 'bosco' directory exists. For example:
<pre>
[fakhan@lxplus055 ~]$ ssh -i ~/.ssh/bosco_key.rsa timm@cori.nersc.gov

timm@cori07:~> ls -al bosco
total 8
drwxr-xr-x 5 timm timm 512 Sep 19 13:58 .
drwx--x--x 22 timm timm 4096 Sep 19 14:05 ..
drwxr-xr-x 2 timm timm 512 Sep 19 13:58 campus_factory
drwxr-xr-x 7 timm timm 512 Sep 19 13:57 glite
drwxr-xr-x 2 timm timm 512 Sep 19 13:57 sandbox
</pre>
# Create a new file inside the bosco directory that records the version and deployment date. This step is optional, but it helps track what is deployed. For example:
<pre>
timm@cori07:~/bosco> touch ~/bosco/version_info
timm@cori07:~/bosco> echo "bosco: 1.2.10" >> ~/bosco/version_info
timm@cori07:~/bosco> echo "condor:8.6.6" >> ~/bosco/version_info
timm@cori07:~/bosco> echo "deployed: Sep. 19, 2017" >> ~/bosco/version_info
timm@cori09:~/bosco> cat ~/bosco/version_info
bosco: 1.2.10
condor:8.6.6
deployed: Sep. 19, 2017
</pre>
# The above steps should set up a clean install of BOSCO. For additional NERSC-specific changes, follow the instructions in the next section.
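
The key preparation and version-file steps above can be collected into a small script. The following is a minimal sketch, not part of the original procedure: the function names ensure_bosco_key and write_version_info are hypothetical, and the sketch assumes that ssh-keygen is available and that the target directories already exist.

```shell
#!/bin/sh
# Sketch only: helper functions for the key and version-file steps.
# The function names are illustrative and are not part of BOSCO.

# Generate a passwordless RSA key at the given path unless one already exists.
ensure_bosco_key() {
    key_path="$1"
    if [ -f "$key_path" ]; then
        echo "Using existing key: $key_path"
    else
        # -N "" sets an empty passphrase, equivalent to pressing Enter twice.
        ssh-keygen -t rsa -N "" -f "$key_path" || return 1
    fi
    # ssh refuses private keys that are readable by other users.
    chmod 600 "$key_path"
}

# Record the BOSCO version, Condor version and deployment date in
# <target_dir>/version_info.
write_version_info() {
    target_dir="$1"
    bosco_version="$2"
    condor_version="$3"
    deployed="$4"
    {
        echo "bosco: $bosco_version"
        echo "condor: $condor_version"
        echo "deployed: $deployed"
    } > "$target_dir/version_info"
}
```

With these definitions, ensure_bosco_key ~/.ssh/bosco_key.rsa would be run on the setup host and write_version_info ~/bosco 1.2.10 8.6.6 "Sep. 19, 2017" on BOSCO_HOST. The version file is overwritten rather than appended to, so running the function again does not duplicate lines.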

h3. NERSC specific configuration instructions

These instructions assume that you have followed the instructions in the previous section and already have a vanilla installation of BOSCO in place. Follow the additional steps below for NERSC:
# A vanilla BOSCO install does not include libcrypto.so.10 and libssl.so.10. These two libraries need to be copied over from any 64-bit SL6/RH6/CC6 machine. You can use the commands below to identify the relevant library files to copy:
<pre>
[fakhan@lxplus055 ~]$ ldconfig -p | grep "libcrypto.so.10"
libcrypto.so.10 (libc6,x86-64) => /usr/lib64/libcrypto.so.10
libcrypto.so.10 (libc6) => /usr/lib/libcrypto.so.10
[fakhan@lxplus055 ~]$ ldconfig -p | grep "libssl.so.10"
libssl.so.10 (libc6,x86-64) => /usr/lib64/libssl.so.10
libssl.so.10 (libc6) => /usr/lib/libssl.so.10
</pre>
# Copy the files over to ~/bosco/glite/lib/ on cori.nersc.gov:
<pre>
[fakhan@lxplus055 ~]$ scp -i .ssh/bosco_key.rsa /usr/lib64/libssl.so.10 timm@cori.nersc.gov:~/bosco/glite/lib/
libssl.so.10 100% 433KB 433.0KB/s 00:01
[fakhan@lxplus055 ~]$ scp -i .ssh/bosco_key.rsa /usr/lib64/libcrypto.so.10 timm@cori.nersc.gov:~/bosco/glite/lib/
libcrypto.so.10 100% 1925KB 962.6KB/s 00:02
</pre>
# Verify that the files have successfully been copied over:
<pre>
[fakhan@lxplus055 ~]$ ssh -i ~/.ssh/bosco_key.rsa timm@cori.nersc.gov
timm@cori11:~/bosco> ls -al ~/bosco/glite/lib/
total 7232
drwxr-xr-x 3 timm timm 512 Sep 19 14:26 .
drwxr-xr-x 7 timm timm 512 Sep 19 13:57 ..
drwxr-xr-x 2 timm timm 8192 Sep 11 22:46 condor
lrwxrwxrwx 1 timm timm 15 Sep 19 13:57 libclassad.so -> libclassad.so.8
lrwxrwxrwx 1 timm timm 19 Sep 19 13:57 libclassad.so.8 -> libclassad.so.8.6.6
-rwxr-xr-x 1 timm timm 605360 Sep 11 22:46 libclassad.so.8.6.6
-rwxr-xr-x 1 timm timm 4358312 Sep 11 22:46 libcondor_utils_8_6_6.so
-rwxr-xr-x 1 timm timm 1971488 Sep 19 14:26 libcrypto.so.10
-rwxr-xr-x 1 timm timm 443416 Sep 19 14:26 libssl.so.10
</pre>
# Modify the batch_gahp configuration file to add Slurm, and set blah_job_wrapper to accommodate shifter:
<pre>
timm@cori11:~/bosco> vim ~/bosco/glite/etc/batch_gahp.config
</pre> On line 2, change the configuration parameters as follows (the previous supported_lrms line is commented out and a Slurm-only line is added):
<pre>
#Supported batch systems
#supported_lrms=pbs,lsf,sge,condor
supported_lrms=slurm
</pre> In the same file, go to line 115, which should be the Slurm-specific configuration section. Add 'blah_job_wrapper' there so that the section looks as follows:
<pre>
## SLURM

#path to the slurm executables
slurm_binpath=`which sbatch 2>/dev/null|sed 's|/[^/]*$||'`

# Needed for correct SLURM submission
blah_job_wrapper='srun shifter'
</pre>
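
The two manual edits to batch_gahp.config can also be scripted instead of made in an editor. The following is a minimal sketch under stated assumptions: the function name configure_batch_gahp is hypothetical, and the sketch assumes the file contains an uncommented supported_lrms= line and a slurm_binpath= line, as in the excerpts above. It keeps a copy of the original file with an .orig suffix.

```shell
#!/bin/sh
# Sketch only: apply the Slurm/shifter edits to a batch_gahp.config file.
# The function name is illustrative; the settings are the ones shown above.

configure_batch_gahp() {
    config="$1"
    [ -f "$config" ] || { echo "No such file: $config" >&2; return 1; }

    # Do nothing if both edits are already in place, so reruns are harmless.
    if grep -q '^supported_lrms=slurm$' "$config" &&
       grep -q '^blah_job_wrapper=.*shifter' "$config"; then
        echo "Already configured: $config"
        return 0
    fi

    # Keep a copy of the original so the change can be reverted.
    cp -p "$config" "$config.orig" || return 1

    awk '
        # Comment out the active batch-system list and add the Slurm-only one.
        /^supported_lrms=/ {
            print "#" $0
            print "supported_lrms=slurm"
            next
        }
        { print }
        # Add the job wrapper right after the Slurm binary path setting.
        # \047 is the octal escape for a single quote.
        /^slurm_binpath=/ {
            print ""
            print "# Needed for correct SLURM submission"
            print "blah_job_wrapper=\047srun shifter\047"
        }
    ' "$config.orig" > "$config"
}
```

On BOSCO_HOST this would be called as configure_batch_gahp ~/bosco/glite/etc/batch_gahp.config. Matching on the setting names rather than on line numbers 2 and 115 is a deliberate choice, since line numbers may differ between BOSCO versions.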


h2. GlideinWMS frontend configuration

h2. GlideinWMS factory configuration