Farrukh Khan, 09/27/2017 04:30 PM


BOSCO submission setup

This wiki page contains step-by-step instructions on how to install and configure BOSCO submission from a glideinWMS factory.

Terminology

Term | Description
BOSCO_HOST | The remote login node from which glideins will be submitted to the local batch queue. In the instructions below, BOSCO_HOST is 'cori.nersc.gov'.
FACTORY_HOST | The node where the glideinWMS factory service has been installed and configured to run. In the instructions below, FACTORY_HOST is 'fermifactory01.fnal.gov'.
FRONTEND_HOST | The node where the glideinWMS frontend service has been installed and configured to run. In the instructions below, FRONTEND_HOST is 'cmssrv279.fnal.gov'.

Vanilla installation

  1. Log into any node from which you can SSH into BOSCO_HOST. It is strongly recommended that the architecture and operating system of the host you set BOSCO up from be similar to those of BOSCO_HOST, because you may need to copy some libraries over. In the instructions below, the host used to set up BOSCO is 'lxplus030.cern.ch'.
    [fkhan@dhcp-131-225-82-129 ~]$ ssh fakhan@lxplus030.cern.ch
    Password: 
    Last login: Tue Sep 19 22:44:19 2017 from dhcp-131-225-82-129.dhcp.fnal.gov
    * ********************************************************************
    * Welcome to lxplus030.cern.ch, SLC, 6.9
    * Archive of news is available in /etc/motd-archive
    * Reminder: You have agreed to comply with the CERN computing rules
    * https://cern.ch/ComputingRules
    * Puppet environment: production, Roger state: production
    * Foreman hostgroup: lxplus/nodes/login
    * LXPLUS Public Login Service
    * ********************************************************************
    [fakhan@lxplus030 ~]$
    
  2. Browse the FTP server at UW Madison that hosts the different BOSCO versions. Select the appropriate version and use wget to fetch the corresponding boscoinstaller.tar.gz file. For example, for version 1.2.10, fetch the installer as follows:
    [fakhan@lxplus030 ~]$ wget ftp://ftp.cs.wisc.edu/condor/bosco/1.2.10/boscoinstaller.tar.gz 
    --2017-09-19 22:29:10--  ftp://ftp.cs.wisc.edu/condor/bosco/1.2.10/boscoinstaller.tar.gz
               => “boscoinstaller.tar.gz”
    Resolving ftp.cs.wisc.edu... 128.105.2.31
    Connecting to ftp.cs.wisc.edu|128.105.2.31|:21... connected.
    Logging in as anonymous ... Logged in!
    ==> SYST ... done.    ==> PWD ... done.
    ==> TYPE I ... done.  ==> CWD (1) /condor/bosco/1.2.10 ... done.
    ==> SIZE boscoinstaller.tar.gz ... 20480
    ==> PASV ... done.    ==> RETR boscoinstaller.tar.gz ... done.
    Length: 20480 (20K) (unauthoritative)
    
    100%[==================================================================================================================>] 20,480      --.-K/s   in 0.1s    
    
    2017-09-19 22:29:12 (182 KB/s) - “boscoinstaller.tar.gz” saved [20480]
    
  3. Untar the downloaded installer and run it to install BOSCO on the current machine. For example:
    [fakhan@lxplus030 ~]$ tar -xvf boscoinstaller.tar.gz 
    boscoinstaller
    
    [fakhan@lxplus030 ~]$ ./boscoinstaller 
    Downloading BOSCO from ftp://ftp.cs.wisc.edu/condor/bosco/1.2/bosco-1.2-x86_64_RedHat6.tar.gz
    Installing BOSCO in ~/bosco
    Installing Condor from /tmp/fakhan/tmpbEI905/condor-8.6.6-x86_64_RedHat6-stripped to /afs/cern.ch/user/f/fakhan/bosco
    
    Condor has been installed into:
        /afs/cern.ch/user/f/fakhan/bosco
    
    Configured condor using these configuration files:
      global: /afs/cern.ch/user/f/fakhan/bosco/etc/condor_config
      local:  /afs/cern.ch/user/f/fakhan/bosco/local.bosco/condor_config.local
    
    In order for Condor to work properly you must set your CONDOR_CONFIG
    environment variable to point to your Condor configuration file:
    /afs/cern.ch/user/f/fakhan/bosco/etc/condor_config before running Condor
    commands/daemons.
    Created a script you can source to setup your Condor environment
    variables. This command must be run each time you log in or may
    be placed in your login scripts:
       source /afs/cern.ch/user/f/fakhan/bosco/bosco_setenv
    
    Congratulations, you installed BOSCO succesfully!
    
  4. Create a .bosco directory. For example:
    [fakhan@lxplus030 ~]$ mkdir ~/.bosco
    
  5. If you do not have an existing key pair to access BOSCO_HOST (in our case, cori.nersc.gov), generate an RSA key pair with an empty passphrase by pressing Enter at both passphrase prompts. The key must be named bosco_key.rsa:
    $ ssh-keygen -t rsa -f ~/.ssh/bosco_key.rsa
    
    If you already have a key pair, there is no need to generate a new one.
  6. If you do have an existing key pair to access BOSCO_HOST (in our case, cori.nersc.gov), copy it to your ~/.ssh directory and name it bosco_key.rsa (with the public key as bosco_key.rsa.pub). For example, your ~/.ssh/ directory should resemble this:
    [fakhan@lxplus030 ~]$ ls -al ~/.ssh/
    total 99
    drwx------.  3 fakhan def-cg  2048 Sep 13 19:59 .
    drwxr-xr-x. 17 fakhan def-cg  4096 Sep 19 22:32 ..
    -rw-------.  1 fakhan zh      1671 Sep 12 00:38 bosco_key.rsa
    -rw-------.  1 fakhan zh       405 Sep 12 00:42 bosco_key.rsa.pub
    -rw-------.  1 fakhan zh      1743 Feb  1  2017 id_rsa
    -rw-r--r--.  1 fakhan zh       408 Feb  1  2017 id_rsa.pub
    -rw-r--r--.  1 fakhan def-cg 83355 Sep 18 19:17 known_hosts
    
  7. Source the BOSCO environment for the current shell session.
    [fakhan@lxplus055 ~]$ source ~/bosco/bosco_setenv
    
  8. Start bosco on the host.
    [fakhan@lxplus055 ~]$ bosco_start
    BOSCO Started
    
  9. Now add the BOSCO_HOST as a cluster you would like to submit to. You need to know the platform and the batch system of the BOSCO_HOST. In our example, BOSCO_HOST is cori.nersc.gov and it runs a variant of RH6 with Slurm. The resulting command is:
    [fakhan@lxplus055 ~]$ bosco_cluster --platform RH6 --add timm@cori.nersc.gov slurm
    Enter the password to copy the ssh keys to timm@cori.nersc.gov:
     *****************************************************************
     *                                                               *
     *                      NOTICE TO USERS                          *
     *                      ---------------                          *
     *                                                               *
     *  Lawrence Berkeley National Laboratory operates this          *
     *  computer system under contract to the U.S. Department of     *
     *  Energy.  This computer system is the property of the United  *
     *  States Government and is for authorized use only.  *Users    *
     *  (authorized or unauthorized) have no explicit or implicit    *
     *  expectation of privacy.*                                     *
     *                                                               *
     *  Any or all uses of this system and all files on this system  *
     *  may be intercepted, monitored, recorded, copied, audited,    *
     *  inspected, and disclosed to site, Department of Energy, and  *
     *  law enforcement personnel, as well as authorized officials   *
     *  of other agencies, both domestic and foreign.  *By using     *
     *  this system, the user consents to such interception,         *
     *  monitoring, recording, copying, auditing, inspection, and    *
     *  disclosure at the discretion of authorized site or           *
     *  Department of Energy personnel.*                             *
     *                                                               *
     *  *Unauthorized or improper use of this system may result in   *
     *  administrative disciplinary action and civil and criminal    *
     *  penalties.  _By continuing to use this system you indicate   *
     *  your awareness of and consent to these terms and conditions  *
     *  of use.  LOG OFF IMMEDIATELY if you do not agree to the      *
     *  conditions stated in this warning._*                         *
     *                                                               *
     *****************************************************************
    Password:
    Downloading for timm@cori.nersc.gov.......
    Unpacking..
    Sending libraries to timm@cori.nersc.gov.
    Creating BOSCO for the WN's............................................
    Installing on cluster timm@cori.nersc.gov......
    Installation complete
    The cluster timm@cori.nersc.gov has been added to BOSCO
    It is available to run jobs submitted with the following values:
    > universe = grid
    > grid_resource = batch slurm timm@cori.nersc.gov
    
    This command prompts for the password of the remote account. It can take a while to return because it copies the BOSCO binaries over to the BOSCO_HOST, so wait for it to finish.
  10. Log onto the BOSCO_HOST and check for the 'bosco' directory. For example:
    [fakhan@lxplus055 ~]$ ssh -i ~/.ssh/bosco_key.rsa timm@cori.nersc.gov
    
    timm@cori07:~> ls -al  bosco
    total 8
    drwxr-xr-x  5 timm timm  512 Sep 19 13:58 .
    drwx--x--x 22 timm timm 4096 Sep 19 14:05 ..
    drwxr-xr-x  2 timm timm  512 Sep 19 13:58 campus_factory
    drwxr-xr-x  7 timm timm  512 Sep 19 13:57 glite
    drwxr-xr-x  2 timm timm  512 Sep 19 13:57 sandbox
    
  11. Create a new file inside the bosco directory with information about the version and deployment date. This is not strictly needed but is helpful for tracking things. For example:
    timm@cori07:~/bosco> touch ~/bosco/version_info
    timm@cori07:~/bosco> echo "bosco: 1.2.10" >> ~/bosco/version_info
    timm@cori07:~/bosco> echo "condor:8.6.6" >> ~/bosco/version_info
    timm@cori07:~/bosco> echo "deployed: Sep. 19, 2017" >> ~/bosco/version_info
    timm@cori09:~/bosco> cat ~/bosco/version_info
    bosco: 1.2.10  
    condor:8.6.6
    deployed: Sep. 19, 2017
    
  12. The above steps should set up a clean install of BOSCO. For additional NERSC-specific changes, please follow the instructions in the next section.
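
Steps 2 through 9 above can be collected into a single script. The following is a minimal sketch and not part of BOSCO itself: the function names `run` and `install_bosco` and the `DRY_RUN` switch are hypothetical, and the default values are simply copied from the example in this section. It assumes that `ssh-keygen -N ''` is an acceptable way to create the key with an empty passphrase. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps 2-9 as one script (hypothetical helper, not shipped with BOSCO).
# All defaults below are the example values from this page; adjust for your site.
BOSCO_VERSION="${BOSCO_VERSION:-1.2.10}"
REMOTE_USER="${REMOTE_USER:-timm}"
BOSCO_HOST="${BOSCO_HOST:-cori.nersc.gov}"
PLATFORM="${PLATFORM:-RH6}"
LRMS="${LRMS:-slurm}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_bosco() {
    # Steps 2-3: download, unpack, and run the installer.
    run wget "ftp://ftp.cs.wisc.edu/condor/bosco/${BOSCO_VERSION}/boscoinstaller.tar.gz"
    run tar -xvf boscoinstaller.tar.gz
    run ./boscoinstaller

    # Step 4: create the .bosco directory.
    run mkdir -p "$HOME/.bosco"

    # Steps 5-6: generate a key only if bosco_key.rsa does not exist yet.
    # -N '' requests an empty passphrase.
    if [ ! -f "$HOME/.ssh/bosco_key.rsa" ]; then
        run ssh-keygen -t rsa -N '' -f "$HOME/.ssh/bosco_key.rsa"
    fi

    # Step 7: source the environment into the current shell.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ . $HOME/bosco/bosco_setenv"
    else
        . "$HOME/bosco/bosco_setenv"
    fi

    # Steps 8-9: start BOSCO and add the remote cluster (prompts for a password).
    run bosco_start
    run bosco_cluster --platform "$PLATFORM" --add "${REMOTE_USER}@${BOSCO_HOST}" "$LRMS"
}
```

To execute the steps rather than print them, set DRY_RUN=0 before loading the script and then call `install_bosco`. The `bosco_cluster` step will still prompt for the remote password.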

NERSC site specific configuration instructions

These instructions assume that you have followed the instructions in the previous section and have a vanilla installation of bosco already in place. Please follow the additional steps below for NERSC:
  1. A vanilla BOSCO install does not include libcrypto.so.10 and libssl.so.10. These two libraries need to be copied over from any 64-bit SL6/RH6/CC6 machine. You can use the commands below to identify the relevant library files to copy:
    [fakhan@lxplus055 ~]$ ldconfig -p | grep "libcrypto.so.10" 
        libcrypto.so.10 (libc6,x86-64) => /usr/lib64/libcrypto.so.10
        libcrypto.so.10 (libc6) => /usr/lib/libcrypto.so.10
    [fakhan@lxplus055 ~]$ ldconfig -p | grep "libssl.so.10" 
        libssl.so.10 (libc6,x86-64) => /usr/lib64/libssl.so.10
        libssl.so.10 (libc6) => /usr/lib/libssl.so.10
    
  2. Copy the files over to ~/bosco/glite/lib/ on cori.nersc.gov:
    [fakhan@lxplus055 ~]$ scp -i .ssh/bosco_key.rsa /usr/lib64/libssl.so.10 timm@cori.nersc.gov:~/bosco/glite/lib/
    libssl.so.10                                                                                                              100%  433KB 433.0KB/s   00:01    
    [fakhan@lxplus055 ~]$ scp -i .ssh/bosco_key.rsa /usr/lib64/libcrypto.so.10 timm@cori.nersc.gov:~/bosco/glite/lib/
    libcrypto.so.10                                                                                                           100% 1925KB 962.6KB/s   00:02    
    
  3. Verify that the files have successfully been copied over:
    [fakhan@lxplus055 ~]$ ssh -i ~/.ssh/bosco_key.rsa timm@cori.nersc.gov
    timm@cori11:~/bosco> ls -al ~/bosco/glite/lib/
    total 7232
    drwxr-xr-x 3 timm timm     512 Sep 19 14:26 .
    drwxr-xr-x 7 timm timm     512 Sep 19 13:57 ..
    drwxr-xr-x 2 timm timm    8192 Sep 11 22:46 condor
    lrwxrwxrwx 1 timm timm      15 Sep 19 13:57 libclassad.so -> libclassad.so.8
    lrwxrwxrwx 1 timm timm      19 Sep 19 13:57 libclassad.so.8 -> libclassad.so.8.6.6
    -rwxr-xr-x 1 timm timm  605360 Sep 11 22:46 libclassad.so.8.6.6
    -rwxr-xr-x 1 timm timm 4358312 Sep 11 22:46 libcondor_utils_8_6_6.so
    -rwxr-xr-x 1 timm timm 1971488 Sep 19 14:26 libcrypto.so.10
    -rwxr-xr-x 1 timm timm  443416 Sep 19 14:26 libssl.so.10
    
  4. Modify the batch_gahp configuration file to add Slurm and update the blah_job_wrapper to accommodate shifter:
    timm@cori11:~/bosco> vim ~/bosco/glite/etc/batch_gahp.config
    
    On line 2, modify configuration params so they are as follows (previous supported_lrms is commented out and slurm is added):
    #Supported batch systems
    #supported_lrms=pbs,lsf,sge,condor
    supported_lrms=slurm
    
    In the same file, go to line 115. This should bring you to the Slurm specific configuration section. Add 'blah_job_wrapper' here so that the configuration file looks as follows:
    ## SLURM
    
    #path to the slurm executables
    slurm_binpath=`which sbatch 2>/dev/null|sed 's|/[^/]*$||'`
    
    # Needed for correct SLURM submission
    blah_job_wrapper='srun shifter'
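
The two edits in step 4 can also be applied without an editor and without depending on line numbers. The following is a sketch under stated assumptions: `configure_batch_gahp` is a hypothetical helper name, and the file is assumed to contain one uncommented `supported_lrms=` line and a `slurm_binpath=` line, as in the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: apply the batch_gahp.config edits from step 4.
# Usage: configure_batch_gahp FILE [WRAPPER]
# WRAPPER defaults to 'srun shifter', the value used in step 4.
configure_batch_gahp() {
    cfg="$1"
    wrapper="${2:-srun shifter}"
    tmp="${cfg}.tmp.$$"
    awk -v wrapper="$wrapper" '
        # Already Slurm-only: keep the line unchanged so reruns are harmless.
        /^supported_lrms=slurm$/ { print; next }
        # Comment out the previous list and add the Slurm-only setting.
        /^supported_lrms=/ { print "#" $0; print "supported_lrms=slurm"; next }
        # Drop any existing wrapper line so it is never duplicated.
        /^blah_job_wrapper=/ { next }
        { print }
        # Add the wrapper directly after the Slurm binpath setting.
        # \047 is the octal escape for a single quote.
        /^slurm_binpath=/ { print "blah_job_wrapper=\047" wrapper "\047" }
    ' "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}
```

For the site-wide setup this would be called as `configure_batch_gahp ~/bosco/glite/etc/batch_gahp.config`. Passing a second argument sets a different wrapper string.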
    

NERSC entry specific configuration instructions

These instructions assume that you have followed the instructions in the previous two sections. You should have a bosco directory with a vanilla installation and the relevant dependency libraries already in place. The instructions below vary per entry. In the examples below, changes are being made for the Cori KNL fullnode entry. Replace the entry name in the examples with your own entry name while following the instructions.

  1. Make sure you are logged into BOSCO_HOST (cori.nersc.gov in our case).
  2. Copy the vanilla installation to another directory and name the new directory so that you can tell different entries apart. This is useful because it keeps the vanilla install intact, and the same install can be reused for multiple entries later if needed. For example:
    hufnagel@cori04:~> cp -R ~/bosco ~/bosco_cori_knl_fullnode
    
  3. Given that we are setting up a fullnode entry, we need to modify the batch_gahp configuration again. Note: You can safely skip this if the entry isn't supposed to run fullnode pilots.
    hufnagel@cori04:~> vim ~/bosco_cori_knl_fullnode/glite/etc/batch_gahp.config 
    
    Go to line 115 and update 'blah_job_wrapper' so it looks as follows:
    # Needed for correct SLURM submission
    blah_job_wrapper='srun --no-kill shifter'
    
  4. Next, we need to update the default log and sandbox locations so that the pilots do not pollute the vanilla install. Update 'condor_config.ft-gahp' for this as follows:
    hufnagel@cori04:~> vim ~/bosco_cori_knl_fullnode/glite/etc/condor_config.ft-gahp
    
    Update the locations in these variables to match the bosco directory of your entry. For example, in our case (Cori KNL fullnode) the directory name is 'bosco_cori_knl_fullnode' (see step 2):
    BOSCO_SANDBOX_DIR=$ENV(HOME)/bosco_cori_knl_fullnode/sandbox
    LOG=$ENV(HOME)/bosco_cori_knl_fullnode/glite/log
    FT_GAHP_LOG=$(LOG)/FTGahpLog
    SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD
    SEC_PASSWORD_FILE = $ENV(HOME)/bosco_cori_knl_fullnode/glite/etc/passwdfile
    USE_SHARED_PORT = False
    ENABLE_URL_TRANSFERS = False
    
  5. The final step is to edit the 'slurm_local_submit_attributes.sh' file. This file contains a list of job directives that tell Slurm different attributes of the job to run and these attributes can be specific for each entry. For example, for Cori KNL fullnode the file is as follows:
    hufnagel@cori01:~> cat ~/bosco_cori_knl_fullnode/glite/bin/slurm_local_submit_attributes.sh
    #!/bin/sh
    
    echo "#SBATCH --account=m2612" 
    
    echo "#SBATCH --partition=regular" 
    echo "#SBATCH --constraint=knl" 
    
    echo "#SBATCH --qos=normal" 
    
    echo "#SBATCH -N 1" 
    
    echo "#SBATCH --ntasks-per-node=1" 
    echo "#SBATCH --cpus-per-task=138" 
    
    echo "#SBATCH --image=custom:cms_cvmfs:latest" 
    echo "#SBATCH -L cscratch1" 
    echo "#SBATCH --volume=\"/global/cscratch1/sd/hufnagel/SITECONF:/cvmfs/cms.cern.ch/SITECONF;/global/cscratch1/sd/hufnagel/node_cache:/tmp:perNodeCache=size=1780G\"" 
    
    echo "#SBATCH -t 24:00:00" 
    
    In plain terms: the pilot (a job from Slurm's perspective) runs under account m2612 in the regular partition with the normal QOS on a KNL node, using one node, a maximum runtime of 24 hours, and the cms_cvmfs:latest shifter image. More details about these (and many other) attributes can be found at http://www.nersc.gov/users/computational-systems/cori/running-jobs/batch-jobs/. You can also see the details of these flags by running 'srun --help' on cori.nersc.gov.
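
Steps 2 and 4 above (copying the vanilla install and pointing the sandbox, log, and password file paths at the copy) can be sketched as a small helper. This is an illustration only: `make_entry_dir` is a hypothetical name, and the sketch assumes that the vanilla condor_config.ft-gahp refers to the install as `$ENV(HOME)/bosco/...`, which should be checked against your file before use. The source path must be given without a trailing slash.

```shell
#!/bin/sh
# Hypothetical helper: create a per-entry copy of a vanilla BOSCO directory
# and rewrite the paths in its condor_config.ft-gahp.
# Usage: make_entry_dir SRC_DIR ENTRY_SUFFIX
#   e.g. make_entry_dir "$HOME/bosco" cori_knl_fullnode
make_entry_dir() {
    src="$1"
    entry="$2"
    dest="${src}_${entry}"
    base="$(basename "$src")"

    if [ ! -d "$src" ]; then
        echo "source directory $src not found" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists, refusing to overwrite" >&2
        return 1
    fi

    # Step 2: keep the vanilla install intact and work on a copy.
    cp -R "$src" "$dest" || return 1

    # Step 4: point every $ENV(HOME)/<base>/ path at the per-entry copy.
    # Assumption: the vanilla file uses exactly this path prefix.
    cfg="$dest/glite/etc/condor_config.ft-gahp"
    pattern='\$ENV(HOME)/'"$base"'/'
    replacement='$ENV(HOME)/'"${base}_${entry}"'/'
    sed "s|${pattern}|${replacement}|g" "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
}
```

The batch_gahp.config wrapper change from step 3 and the slurm_local_submit_attributes.sh file from step 5 are entry-specific and still need to be edited by hand in the new directory.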

Updating an existing installation

Manual update

Script based update