Configure the Server

Configuration Files

  • the jobsub RPM creates two important configuration files
    • /etc/httpd/conf.d/jobsub_api.conf
    • /opt/jobsub/server/conf/jobsub.ini

configure jobsub_api.conf settings

The relevant section of jobsub_api.conf:

    WSGIDaemonProcess jobsub user=grid group=condor processes=2 threads=25 python-path=/opt/jobsub/server/webapp:/opt/jobsub/lib/JobsubConfigParser:/opt/jobsub/lib/logger
    WSGIProcessGroup jobsub
    WSGIScriptAlias / /opt/jobsub/server/webapp/jobsub_api.py

    SetEnv JOBSUB_INI_FILE /opt/jobsub/server/conf/jobsub.ini
    SetEnv JOBSUB_LOG_DIR /opt/jobsub/server/log
    SetEnv JOBSUB_APP_NAME jobsub
    SetEnv JOBSUB_ENV_RUNNER /opt/jobsub/server/webapp/jobsub_env_runner.sh

    # Assuming user grid
    SetEnv JOBSUB_CREDENTIALS_DIR /home/grid/.security
    SetEnv KADMIN_PASSWD_FILE /home/grid/.security/kadmin_passwd
  • If you are running the server as a different uid:gid than grid:condor, edit the line 'WSGIDaemonProcess jobsub user=grid group=condor ....' to use the uid:gid you prefer.
  • make the security directory corresponding to JOBSUB_CREDENTIALS_DIR. The RPM sets this value to ~grid/.security, so if you do not change it in the config file you would do the following:
    • mkdir ~grid/.security
    • copy fifegrid.keytab and kadmin_passwd into ~grid/.security (I am not at liberty to tell you where you get these keytab and password files from)
    • chmod -R 700 ~grid/.security
    • chown -R grid:condor ~grid/.security
  • make sure JOBSUB_LOG_DIR exists and is writable by whoever is defined by WSGIDaemonProcess (grid:condor in this example). For the default install, that would mean the following 2 commands:
    • mkdir -p /opt/jobsub/server/log
    • chown grid:condor /opt/jobsub/server/log
  • consider placing your log files in a more 'standard' area (/var/log/ or wherever your organization's standard log area is) and enabling log rotation if this is a heavily used production machine.
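
Once the directories above exist, it can be useful to confirm their ownership and permissions before starting the server. The following is a minimal sketch, not part of jobsub: check_dir is a hypothetical helper that compares the owner, group and mode string reported by `ls -ld` against the values you expect.

```shell
#!/bin/sh
# check_dir DIR OWNER:GROUP PERMS   (hypothetical helper, not shipped with jobsub)
# PERMS is the symbolic mode string printed by `ls -ld`, e.g. drwx------ for mode 700.
check_dir() {
    dir=$1
    want_owner=$2
    want_perms=$3

    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi

    # Field 1 of `ls -ld` is the mode string; keep 10 characters to drop
    # any trailing '.' or '+' flag. Fields 3 and 4 are owner and group.
    got_perms=$(ls -ld "$dir" | awk '{ print substr($1, 1, 10) }')
    got_owner=$(ls -ld "$dir" | awk '{ print $3 ":" $4 }')

    if [ "$got_owner" != "$want_owner" ] || [ "$got_perms" != "$want_perms" ]; then
        echo "BAD $dir: $got_owner $got_perms (expected $want_owner $want_perms)"
        return 1
    fi
    echo "OK $dir"
}
```

For the default install, a call such as `check_dir /home/grid/.security grid:condor drwx------` should print an OK line if the chmod and chown steps above were applied.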

configure jobsub.ini settings

Here is the relevant section of jobsub.ini:

[REPLACE_THIS_WITH_SUBMIT_HOST]
command_path_root = /scratch/uploads
condor_tmp = /scratch/uploads/${GROUP}/${LOGNAME}/${WORKDIR_ID}
condor_exec = /scratch/uploads/${GROUP}/${LOGNAME}/${WORKDIR_ID}
x509_user_proxy = /scratch/proxies/${LOGNAME}/${LOGNAME}.${GROUP}.proxy
desired_os = ''
storage_group=condor

  • replace REPLACE_THIS_WITH_SUBMIT_HOST with the hostname of the machine you are installing on
    • ex: if you are installing on fifebatch1.fnal.gov, replace the entry '[REPLACE_THIS_WITH_SUBMIT_HOST]' with '[fifebatch1.fnal.gov]'
  • the 'x509_user_proxy' setting governs where generated proxies for users are put. By default it is under /scratch/proxies. In this case, /scratch/proxies must be readable and writable by grid:condor and by no other user
  • IMPORTANT: change the x509_user_proxy setting to (wherever_you_decide_proxies_go)/${GROUP}/x509cc_${LOGNAME}
    • example: if you decide to use the default, /scratch/proxies, set x509_user_proxy to:
      • x509_user_proxy = /scratch/proxies/${GROUP}/x509cc_${LOGNAME}
  • the 'command_path_root' setting governs where the user's 'sandbox' for a particular job submission is created. This directory must be writable by grid:condor and readable by others
  • if you changed the uid:gid that the server runs as in jobsub_api.conf, you need to change the storage_group=condor setting to storage_group=(whatever_gid_you_are_using) in the jobsub.ini file
  • create working directories that jobsub server expects based on the jobsub.ini settings. For the default case that the RPM generates, this would be:
    • mkdir -p /scratch/proxies
    • mkdir -p /scratch/uploads/
    • touch /scratch/uploads/job.log
    • chown -R grid:condor /scratch/proxies
    • chown -R grid:condor /scratch/uploads
    • chmod 755 /scratch
    • chmod -R 700 /scratch/proxies
    • chmod -R 775 /scratch/uploads
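
The two jobsub.ini edits described above (the section header and the x509_user_proxy value) can also be scripted. This is a rough sketch rather than an official tool: customize_jobsub_ini is a hypothetical helper name, and it assumes the file still contains the placeholder header and an x509_user_proxy line as generated by the RPM.

```shell
#!/bin/sh
# customize_jobsub_ini INI_FILE SUBMIT_HOST [PROXY_ROOT]   (hypothetical helper)
# PROXY_ROOT defaults to /scratch/proxies, the default used in this page.
customize_jobsub_ini() {
    ini=$1
    host=$2
    proxy_root=${3:-/scratch/proxies}

    tmp=$(mktemp) || return 1

    # 1st expression: swap the placeholder section header for [SUBMIT_HOST].
    # 2nd expression: rewrite the proxy path; \$ keeps ${GROUP} and ${LOGNAME}
    # literal so that jobsub, not the shell, expands them.
    sed -e "s|^\[REPLACE_THIS_WITH_SUBMIT_HOST\]|[$host]|" \
        -e "s|^x509_user_proxy *=.*|x509_user_proxy = $proxy_root/\${GROUP}/x509cc_\${LOGNAME}|" \
        "$ini" > "$tmp" && cat "$tmp" > "$ini"
    status=$?

    rm -f "$tmp"
    return $status
}
```

A call such as `customize_jobsub_ini /opt/jobsub/server/conf/jobsub.ini fifebatch1.fnal.gov` would apply both changes in place. Writing back with `cat` rather than `mv` keeps the original file's owner and mode. Keep a backup copy of the file before trying it.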

configure the condor schedd

  • edit the QUEUE_SUPER_USERS and QUEUE_SUPER_USER_MAY_IMPERSONATE values as below:
    # whatever you need to talk to external collector, plus these two QUEUE_SUPER_USER values
    #
    # nb: if you change the server to run as some other uid than 'grid',
    # you have to put that uid in the QUEUE_SUPER_USERS list instead of 'grid'
    QUEUE_SUPER_USERS       = root, condor, grid
    QUEUE_SUPER_USER_MAY_IMPERSONATE = .*

You will also need to configure the GlideinWMS frontend to look for jobs in this schedd.

configure the cert and crl updates

  • /usr/sbin/osg-ca-manage setupCA --location root --url osg
  • /sbin/service osg-update-certs-cron start
  • /sbin/chkconfig fetch-crl-boot on
  • /sbin/chkconfig fetch-crl-cron on

enable proxy refresh script in a cron job

  • /opt/jobsub/server/admin/krbrefresh.sh must be run from a cron job
  • it must be run from the account that runs the webserver, i.e. 'grid' in this example, 'rexbatch' on fifebatch1.fnal.gov
  • here is the online help for krbrefresh.sh:
$ /opt/jobsub/server/admin/krbrefresh.sh
###################################################################
file:krbrefresh.sh
usage: krbrefresh.sh [ -h ] 
                     [--help] 
                     [--refresh-proxies ]  [age_in_seconds]

it must be run as user grid who has the ability to refresh user 
kerberos principals and voms-proxies in $JOBSUB_CREDENTIALS_DIR

This script refreshes the kerberos proxies of any user in the queue 
that has a kerberos principal older than [age_in_seconds].  If no
[age_in_seconds] argument is given, the default of 3600 seconds is used.

This script logs its actions to file /opt/jobsub/server/log/admin.log
##################################################################
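
As an illustration of the cron setup, the sketch below builds a crontab entry from the options shown in the help text. The hourly schedule is an assumption, not a documented recommendation, and 3600 is simply the default age from the help output. Whether the script also needs variables such as JOBSUB_CREDENTIALS_DIR set in the cron environment is not stated here, so check that on your installation.

```shell
#!/bin/sh
# Assumed schedule: once an hour, on the hour. Adjust to your site's needs.
# 3600 is the default age_in_seconds given in the krbrefresh.sh help text.
CRON_LINE='0 * * * * /opt/jobsub/server/admin/krbrefresh.sh --refresh-proxies 3600'

# Print the entry so it can be reviewed before it is installed.
printf '%s\n' "$CRON_LINE"

# One way to install it for user grid, run as root
# (appends to any crontab that user already has):
#   ( crontab -u grid -l 2>/dev/null; printf '%s\n' "$CRON_LINE" ) | crontab -u grid -
```

If the server runs under a different account, substitute that account for grid in the commented install command.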

changes necessary for HA servers

  • there is typically more than one HA server sitting behind a DNS round-robin alias
  • add a line for each server in the DNS round-robin to allow the servers to authenticate condor commands to each other. In this example there are two servers, fermicloud114 and fermicloud396:
    • GSI "/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=fermicloud114.fnal.gov" submitter
    • GSI "/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=fermicloud396.fnal.gov" submitter
  • add this line to CERTIFICATE_MAPFILE to identify jobsub users to condor:
    • GSI "/DC=gov/DC=fnal/O=Fermilab/OU=Robots/CN=fifegrid/CN=batch/CN=(.*)/CN=UID:(.*)/" rexbatch

Start the server

  • service httpd start
  • service condor start

Server v1.1 additional settings

jobsub_api.conf

  • SSLVerifyDepth 5

jobsub.ini

  • authentication_methods = gums,kca-dn