MINOS CONDOR AND GLIDEINWMS ADMINISTRATION

Overview

Minos runs a local Condor pool with batch slots on minos50/51/52/53.
The same setup serves as a gateway to the larger FermiGrid and OSG via glideinWMS.

The CS/SCD/SCS/GCSO FermiGrid group supports these systems.

Minos has limited administrative access for operations.

  • minos25 is the submit machine and runs the Condor scheduler (schedd).
  • minos54 runs glideinWMS 2.5.2 and its accompanying Condor daemons to enable batch submission to the grid, usually to the fermigrid-osg general purpose pool and sometimes to the fermigrid cdf pool.
  • Both machines are currently at Condor 7.4.4.
  • minos50-53 are attached as a local batch pool running 32 slots.
  • Here is the crontab for gfactory@minos54:
    [gfactory@minos54 ~]$ crontab -l
    MAILTO='minos-admin@fnal.gov'
    
    @reboot     /home/gfactory/start_glideinWMS.sh
    
    55 5 * * *  /home/gfactory/refresh_cert
    

HOWTO set priority factors

Starting/Stopping

  • minos54: log in as 'gfactory'. The start_glideinWMS.sh and stop_glideinWMS.sh scripts are in the home directory.
  • The web server on minos54 must be running as well. If necessary, start it with sudo /etc/init.d/httpd start.
  • minos25, minos50-53: log in and run sudo /etc/init.d/condor start (or stop).
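
The web server check above can be sketched as a small shell helper. This is only an illustration: the function name ensure_httpd and the HTTPD_INIT and SUDO variables are hypothetical, and it assumes the httpd init script follows the usual convention of exiting 0 from "status" when the service is up.

```shell
# Hypothetical helper: start httpd on minos54 only if it is not already running.
# HTTPD_INIT and SUDO are illustrative overrides, not part of the original setup.
HTTPD_INIT="${HTTPD_INIT:-/etc/init.d/httpd}"
SUDO="${SUDO:-sudo}"

ensure_httpd() {
    # SysV init scripts conventionally exit 0 from "status" when the service is up.
    if $SUDO "$HTTPD_INIT" status >/dev/null 2>&1; then
        echo "httpd already running"
    else
        echo "httpd not running, starting it"
        $SUDO "$HTTPD_INIT" start
    fi
}
```

Calling ensure_httpd before start_glideinWMS.sh would cover the "if necessary" case without restarting a healthy web server.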

Stopping (Draining) the Queues

  • To prevent new jobs from starting, run condor_off -peaceful. Running jobs continue until they finish, but the node does not accept new jobs.
    [dbox@minos25 ~]$ . /opt/condor/condor.sh
    [dbox@minos25 ~]$ sudo /opt/condor/sbin/condor_off -peaceful
    
    
  • Run the stop_factory.sh script as uid gfactory on minos54.
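
After a peaceful shutdown it can help to watch the running jobs drain. The following is a minimal sketch, assuming condor_q on minos25 still answers while jobs are finishing. The function names running_jobs and wait_for_drain are hypothetical. JobStatus == 2 is Condor's code for a running job.

```shell
# Count jobs the schedd reports as running (JobStatus 2 means Running).
# If condor_q cannot reach the schedd, the count comes out as 0.
running_jobs() {
    condor_q -constraint 'JobStatus == 2' -format '%d\n' ClusterId 2>/dev/null \
        | wc -l | tr -d ' '
}

# Hypothetical helper: poll until no running jobs are reported.
# The optional argument is the polling interval in seconds (default 60).
wait_for_drain() {
    interval="${1:-60}"
    while n=$(running_jobs) && [ "$n" -gt 0 ]; do
        echo "$n job(s) still running"
        sleep "$interval"
    done
    echo "no running jobs reported"
}
```

A count of 0 can also mean the schedd has already exited, so treat the final message as a hint rather than proof that the queue drained cleanly.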

Resuming the Queues

  • Run stop_glideinWMS.sh on minos54 (as uid gfactory).
  • Run sudo /etc/init.d/condor stop on minos25 (as a user in sudoers).
  • Run start_glideinWMS.sh on minos54 (as uid gfactory).
  • Run sudo /etc/init.d/condor start on minos25 (as a user in sudoers).
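
The four steps above could be driven from one shell over ssh. This is a sketch under assumptions: it presumes ssh access to both hosts and that sudo works over ssh -t. DRY_RUN, run_step and restart_queues are hypothetical names. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
# Sketch only: the restart sequence from the list above, driven over ssh.
# With DRY_RUN=1 the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_queues() {
    # The glideinWMS scripts live in gfactory's home directory on minos54.
    run_step ssh gfactory@minos54 ./stop_glideinWMS.sh
    run_step ssh -t minos25 sudo /etc/init.d/condor stop
    run_step ssh gfactory@minos54 ./start_glideinWMS.sh
    run_step ssh -t minos25 sudo /etc/init.d/condor start
}
```

Review the printed commands first, then set DRY_RUN=0 and call restart_queues to execute them.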

Checking/Clearing Held Glideins

  • Glideins report to their own Condor collector (called the WMSCollector) on minos54, so a regular condor_q from minos25 will not see them. To use the Condor tools on the WMSCollector:
  • ssh gfactory@minos54
  • source working/v2_5_2/wmscollectorcondor/condor.sh
  • Run condor_q, condor_status -any, etc.
  • This is also where you can condor_rm a misbehaving glidein.
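
As an illustration of the last step, held glideins can be listed by constraint before removing them. This is a sketch, to be run as gfactory on minos54 after sourcing the WMSCollector condor.sh shown above. The function names held_glideins and remove_held_glideins are hypothetical. JobStatus == 5 is Condor's code for a held job.

```shell
# Print "cluster.proc" for every held job (JobStatus 5 means Held).
held_glideins() {
    condor_q -constraint 'JobStatus == 5' \
        -format '%d.' ClusterId -format '%d\n' ProcId
}

# Hypothetical helper: remove every held glidein, one job at a time.
# Inspect the output of held_glideins before calling this.
remove_held_glideins() {
    for id in $(held_glideins); do
        echo "removing $id"
        condor_rm "$id"
    done
}
```

Running held_glideins on its own is harmless and shows exactly which jobs remove_held_glideins would pass to condor_rm.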

Requirements

Minos Batch/Condor requirements per FIFE survey request of May 2013