Installing GlideinWMS 2.5.1 on an SL5 development machine using the ini file installer

  • tar xzvf glideinWMS_v2_5_1.tgz
  • cd glideinWMS/install ; export src=`pwd`
  • the installs have to be done in this order; if it is all on the same machine you can do it this way:
    • $src/manage-glideins --ini ~dbox/glideinWMS.ini --install wmscollector; $src/manage-glideins --ini ~dbox/glideinWMS.ini --install factory; $src/manage-glideins --ini ~dbox/glideinWMS.ini --install usercollector ; $src/manage-glideins --ini ~dbox/glideinWMS.ini --install submit; $src/manage-glideins --ini ~dbox/glideinWMS.ini --install vofrontend
  • A screen capture of how I answered the installer's questions. When it asks whether you want entry points from RESS, I say yes, then follow up by saying I only want the entry point ress_ITB_INSTALL_TEST_3. After testing that this works, I edit glideinWMS.xml to add additional entry points. If you say you want them all, you end up with a zillion entry points that you will never use and that just make site maintenance more complex.
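The install sequence above can also be written as a loop. This is only a sketch: the function name install_all and the DRY_RUN switch are made up for illustration, and it assumes manage-glideins exits non-zero on failure, which the original notes do not confirm. Unlike the one-line version joined with semicolons, it stops at the first component that fails.

```shell
# Components in the order the installer needs them.
COMPONENTS="wmscollector factory usercollector submit vofrontend"

# install_all INI_FILE
# Expects $src to point at glideinWMS/install, as exported earlier.
# With DRY_RUN=1 the commands are only printed, not executed.
install_all() {
    ini_file=$1
    for component in $COMPONENTS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$src/manage-glideins --ini $ini_file --install $component"
        else
            # Stop here so later components are not installed on top of
            # a broken earlier one (assumes a non-zero exit on failure).
            "$src/manage-glideins" --ini "$ini_file" --install "$component" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 install_all ~dbox/glideinWMS.ini` prints the five commands for review before anything is installed.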
  • now test your setup to see if you can submit to ress_ITB_INSTALL_TEST_3
    • you have installed 4 condors. Much confusion results from using the wrong one for a given task. The submitcondor is used for submission, oddly enough.
    • . /home/gfactory/working/submitcondor/condor.sh
    • my test files in this example are test.cmd (a condor submit file) and test.sh (what I want to run on the worker node)
      [gfactory@sngpvm03 tmpwork]$ condor_submit test.cmd
      Submitting job(s).
      Logging submit event(s).
      1 job(s) submitted to cluster 2.
      [gfactory@sngpvm03 tmpwork]$ condor_q
      
      -- Submitter: sngpvm03.fnal.gov : <131.225.67.18:46314> : sngpvm03.fnal.gov
       ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
         2.0   gfactory        3/15 10:30   0+00:00:00 I  0   0.0  test.sh           
      
      1 jobs; 1 idle, 0 running, 0 held
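The contents of test.cmd and test.sh are not shown on this page. As a rough idea of what such a pair could look like, here is a sketch that writes a minimal version of both files. Everything in it is an assumption: the helper name write_test_job is hypothetical, the payload only prints the host name, and the output and log names were chosen to match the test.sh.<cluster>.<process>.* files that appear in the listings here. Your own submit file may need extra requirements for your site.

```shell
# write_test_job DIR: create a minimal payload script and submit file in DIR.
write_test_job() {
    dir=$1
    mkdir -p "$dir" || return 1

    # Payload: what runs on the worker node (illustrative only).
    cat > "$dir/test.sh" <<'EOF'
#!/bin/sh
echo "Hostname: $(uname -n)"
echo "Date: $(date)"
EOF
    chmod +x "$dir/test.sh"

    # Submit description file; $(Cluster) and $(Process) are expanded by
    # condor_submit, giving names such as test.sh.2.0.output.
    cat > "$dir/test.cmd" <<'EOF'
universe = vanilla
executable = test.sh
output = test.sh.$(Cluster).$(Process).output
log = test.sh.$(Cluster).$(Process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue
EOF
}
```

After `write_test_job ~/tmpwork`, the job would be submitted from that directory with `condor_submit test.cmd`, as in the transcript above.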
      
    • check that the job ran
      [gfactory@sngpvm03 tmpwork]$ condor_q
      
      -- Submitter: sngpvm03.fnal.gov : <131.225.67.18:46314> : sngpvm03.fnal.gov
       ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
      
      0 jobs; 0 idle, 0 running, 0 held
      
      [gfactory@sngpvm03 tmpwork]$ ls -lart test.sh.2*
      -rw-r--r-- 1 gfactory gpcf 15959 Mar 15 10:33 test.sh.2.0.output
      -rw-r--r-- 1 gfactory gpcf  1298 Mar 15 10:33 test.sh.2.0.log
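Rather than rerunning condor_q by hand until the queue is empty, the check can be scripted. The sketch below relies on the summary line format shown in the transcripts here ("N jobs; N idle, N running, N held"); other condor versions may print a different summary, in which case the pattern needs adjusting. The function names and the retry defaults are made up for illustration.

```shell
# jobs_in_queue: read condor_q output on stdin and print the job total
# taken from the summary line.
jobs_in_queue() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\) jobs; .*/\1/p'
}

# wait_for_empty_queue [INTERVAL_SECONDS] [MAX_TRIES]
# Returns 0 once condor_q reports 0 jobs, 1 if it never does.
wait_for_empty_queue() {
    interval=${1:-30}
    tries=${2:-20}
    while [ "$tries" -gt 0 ]; do
        [ "$(condor_q | jobs_in_queue)" = "0" ] && return 0
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

A typical use would be `wait_for_empty_queue 60 30 && ls -lart test.sh.2*`. Note that an empty queue only says the job left the queue; the output file still has to be inspected to see whether it actually ran.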
      
      
  • edit original glideinWMS.xml file to add fermigrid entry
    [gfactory@sngpvm03 glidein_v2_4.cfg]$ pwd
    /home/gfactory/working/factory/glidein_v2_4.cfg
    [gfactory@sngpvm03 glidein_v2_4.cfg]$ ls
    glideinWMS.xml  glideinWMS.xml~
    [gfactory@sngpvm03 glidein_v2_4.cfg]$ cp glideinWMS.xml glideinWMS.xml.orig
    [gfactory@sngpvm03 glidein_v2_4.cfg]$ vi glideinWMS.xml
    
  • copy the xml section that starts: <entry name="ress_ITB_INSTALL_TEST_3" enabled="True" gatekeeper="cms-xen9.fnal.gov/jobmanager-condor" all the way through its xml close '</entry>'
  • paste this copy back in and change it to name="fermigrid" enabled="True" gatekeeper="fnpcfg1.fnal.gov/jobmanager-condor". Make sure that the <entry ...> and </entry> tags match up so that every entry opens and closes correctly, xml-wise.
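A quick sanity check after the copy and paste is to count opening and closing entry tags and list the entry names. This is a crude sketch, not an XML parser: it assumes entries are written as <entry ...> ... </entry> pairs (not self-closing), and the function names are hypothetical. If xmllint is installed, `xmllint --noout glideinWMS.xml` gives a stricter well-formedness check.

```shell
# count_matches STRING FILE: number of occurrences of a fixed string.
count_matches() {
    grep -o -F -- "$1" "$2" | wc -l | tr -d ' '
}

# entries_balanced FILE: succeed when <entry ...> and </entry> counts agree.
entries_balanced() {
    opens=$(count_matches '<entry ' "$1")
    closes=$(count_matches '</entry>' "$1")
    [ "$opens" -eq "$closes" ]
}

# entry_names FILE: print the name attribute of each <entry ...> line.
entry_names() {
    sed -n 's/.*<entry name="\([^"]*\)".*/\1/p' "$1"
}
```

Running `entries_balanced glideinWMS.xml && entry_names glideinWMS.xml` should list both ress_ITB_INSTALL_TEST_3 and fermigrid before you reconfigure the factory.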
  • now enable the entry point you just created in the xml
    [gfactory@sngpvm03 glidein_v2_4.cfg]$ cd ..
    [gfactory@sngpvm03 factory]$ ls
    client_files  factory_logs  factory.sh  glidein_v2_4  glidein_v2_4.cfg
    [gfactory@sngpvm03 factory]$ cd glidein_v2_4
    [gfactory@sngpvm03 glidein_v2_4]$ ls
    attributes.cfg                 glidein_startup.sh     monitor
    client_log                     glideinWMS.b3ehJu.xml  params.cfg
    client_proxies                 glideinWMS.xml         rsa.key
    entry_ress_ITB_INSTALL_TEST_3  job_submit.sh          signatures.sha1
    factory_startup                local_start.sh         update_proxy.py
    frontend.descript              lock
    glidein.descript               log
    [gfactory@sngpvm03 glidein_v2_4]$ . ../factory.sh
    [gfactory@sngpvm03 glidein_v2_4]$ ./factory_startup reconfig ../glidein_v2_4.cfg/glideinWMS.xml
    Shutting down glideinWMS factory v2_4@factory:             [OK]
    Reconfigured glidein 'v2_4'
    Active entries are:
      ress_ITB_INSTALL_TEST_3
      fermigrid
    Submit files are in /home/gfactory/working/factory/glidein_v2_4
    Reconfiguring the factory                                  [OK]
    Starting glideinWMS factory v2_4@factory:                  [OK]
    
  • turn off the old entry point so you can test submission to fermigrid:
    ./factory_startup down -entry ress_ITB_INSTALL_TEST_3 -delay 0
    Setting downtime...                                        [OK]
    [gfactory@sngpvm03 glidein_v2_4]$ 
    
  • submit a test job
    [gfactory@sngpvm03 ~]$ cd tmpwork/
    [gfactory@sngpvm03 tmpwork]$ which condor_submit
    /scratch/gfactory/wmscollectorcondor/bin/condor_submit
    [gfactory@sngpvm03 tmpwork]$#remember about all those condor installations? change back to submitcondor
    [gfactory@sngpvm03 tmpwork]$ . $HOME/working/submitcondor/condor.sh
    
    [gfactory@sngpvm03 tmpwork]$ condor_submit test.cmd
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 3.
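Since picking up the wrong condor is such an easy mistake, a small guard before submitting can help. The sketch below assumes the directory layout seen in the transcripts, where each installation lives in a directory named after its role (submitcondor, wmscollectorcondor, and so on) with the tools under bin/. The function names are hypothetical.

```shell
# condor_install_of PATH: print the installation directory name a condor
# binary belongs to, e.g. ".../submitcondor/bin/condor_submit" gives
# "submitcondor".
condor_install_of() {
    # Strip the trailing /bin/<tool>, then keep the last path component.
    basename "$(dirname "$(dirname "$1")")"
}

# require_condor NAME: fail unless condor_submit on PATH comes from NAME.
require_condor() {
    tool=$(command -v condor_submit) || {
        echo "condor_submit not on PATH" >&2
        return 1
    }
    active=$(condor_install_of "$tool")
    if [ "$active" != "$1" ]; then
        echo "active condor is '$active', expected '$1'" >&2
        return 1
    fi
}
```

With this in place, `require_condor submitcondor && condor_submit test.cmd` refuses to submit while the wmscollector environment is still sourced.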
    
    
  • when the job has run, verify that it ran on a fermigrid node
    [gfactory@sngpvm03 tmpwork]$ grep Host test.sh.3.0.output 
    Grid Job pid(22942) 20110315 115155: Hostname........... fnpc2072.fnal.gov
    

At this point, you are done installing glideinWMS 2.5.1 on a single node installation.

  • What follows is how to connect your glideinWMS installation to an existing local condor pool that was installed via rpms and runs as root. The idea is this: the rpm condor originally runs these daemons: COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
  • However, glideinWMS, installed as a non-privileged user, runs all of these daemons as well.
  • To get the two installations to play together nicely, the rpm condor must run only the MASTER, SCHEDD, and STARTD, while glideinWMS runs the COLLECTOR and NEGOTIATOR. They all have to agree on which ports to use and must authenticate with each other (glideinWMS is picky about authentication).
Here's how I got it all to work.
  1. stop glideinWMS
    $src/manage-glideins --stop all --ini ~dbox/glideinWMS.ini
    
  2. edit the (rpm) condor config files
    • COLLECTOR_HOST = $(CONDOR_HOST) becomes COLLECTOR_HOST = $(CONDOR_HOST):9640 on schedd machine and all local worker nodes
    • all local worker nodes get /etc/grid-security hostcert.pem hostkey.pem and condor-mapfile to make sure they can talk to the glideinWMS user collector running on port 9640
    • DAEMON_LIST = MASTER, SCHEDD, STARTD
    • I ended up putting a ton of GSI_ and SEC_ related settings in the condor config files; I am still trying to figure out how much of this is necessary.
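The two settings named in step 2 can be generated as a small fragment to append to the rpm condor's local configuration. This sketch covers only COLLECTOR_HOST and DAEMON_LIST; the GSI_ and SEC_ settings are deliberately left out because the notes above do not list them. The function name is hypothetical, and which file to append to depends on your installation (`condor_config_val LOCAL_CONFIG_FILE` should report it).

```shell
# print_local_pool_overrides [PORT]
# Print the rpm-condor overrides described in step 2; PORT defaults to the
# user collector port used on this page.
print_local_pool_overrides() {
    port=${1:-9640}
    cat <<EOF
# Point the rpm daemons at the glideinWMS user collector.
COLLECTOR_HOST = \$(CONDOR_HOST):$port
# Leave COLLECTOR and NEGOTIATOR to glideinWMS.
DAEMON_LIST = MASTER, SCHEDD, STARTD
EOF
}
```

Review the output of `print_local_pool_overrides` first, then append it to the local config file on the schedd machine and the worker nodes as root.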
  3. edit the glideinWMS usercollector condor-mapfile
    [gfactory@sngpvm03 working]$ cd usercollectorcondor/
    [gfactory@sngpvm03 usercollectorcondor]$ . condor.sh
    [gfactory@sngpvm03 usercollectorcondor]$ condor_config_val -dump | grep MAP
    CERTIFICATE_MAPFILE = /home/gfactory/working/usercollectorcondor/certs/condor_mapfile
    [gfactory@sngpvm03 usercollectorcondor]$ #need to add schedd and local worker nodes to mapfile so find them first
    [gfactory@sngpvm03 usercollectorcondor]$ openssl x509 -in /etc/grid-security/hostcert.pem -subject -noout
    subject= /DC=org/DC=doegrids/OU=Services/CN=sngpvm03.fnal.gov
    [gfactory@sngpvm03 usercollectorcondor]$ ssh sngpvm02 openssl x509 -in /etc/grid-security/hostcert.pem -subject -noout
    subject= /DC=org/DC=doegrids/OU=Services/CN=sngpvm02.fnal.gov
    
    [gfactory@sngpvm03 usercollectorcondor]$ vi /home/gfactory/working/usercollectorcondor/certs/condor_mapfile
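The lines added to the mapfile are not shown above. Condor mapfile entries have the form `GSI "<regex for the DN>" <mapped name>`, so a host subject can be turned into an entry roughly as sketched below. Treat this as an assumption-laden illustration: the helper names are made up, the mapped name (localcondor in the usage example) is a placeholder for whatever name your setup expects, and the parsing relies on the `subject= /DC=...` format printed by the openssl version used here. If the mapfile ends with a catch-all line, new entries presumably need to go before it.

```shell
# subject_to_dn: read "subject= /DC=..." lines on stdin, print only the DN.
subject_to_dn() {
    sed -n 's/^subject= *//p'
}

# mapfile_line DN NAME: print a GSI mapfile entry for DN mapped to NAME.
# Characters that are special in the regex (or customarily escaped in
# these files) get a backslash, and the pattern is anchored at both ends.
mapfile_line() {
    escaped=$(printf '%s\n' "$1" | sed 's|[/=.-]|\\&|g')
    printf 'GSI "^%s$" %s\n' "$escaped" "$2"
}
```

For example, `mapfile_line "$(openssl x509 -in /etc/grid-security/hostcert.pem -subject -noout | subject_to_dn)" localcondor` prints a line that can be pasted into condor_mapfile.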
    
  4. edit the frontend.xml file to know about the rpm schedd
    [gfactory@sngpvm03 usercollectorcondor]$ cd
    [gfactory@sngpvm03 ~]$ cd working/frontend
    [gfactory@sngpvm03 frontend]$ ls
    frontend_frontend-v2_4  frontend.sh  instance_v2_4.cfg
    [gfactory@sngpvm03 frontend]$ cd instance_v2_4.cfg/
    [gfactory@sngpvm03 instance_v2_4.cfg]$ ls
    frontend.xml  frontend.xml~
    [gfactory@sngpvm03 instance_v2_4.cfg]$ vi frontend.xml
    
    
    • change all instances of <schedd DN="/DC=org/DC=doegrids/OU=Services/CN=gfactory/sngpvm03.fnal.gov"
    • to <schedd DN="/DC=org/DC=doegrids/OU=Services/CN=sngpvm03.fnal.gov"
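If there are many schedd lines, the same change can be made with sed instead of by hand in vi. The sketch below prints the edited file to standard output so that it can be compared with the original before replacing it. The function name is hypothetical. The old DN is used as a regular expression, so the dots in it match any character; that is harmless for DNs like these but worth knowing.

```shell
# replace_schedd_dn OLD_DN NEW_DN FILE
# Print FILE with OLD_DN replaced by NEW_DN inside <schedd DN="..."> attributes.
# '|' is used as the sed delimiter because DNs contain slashes.
replace_schedd_dn() {
    old=$1
    new=$2
    file=$3
    sed "s|<schedd DN=\"$old\"|<schedd DN=\"$new\"|g" "$file"
}
```

A cautious way to use it is to write the result to frontend.xml.new, run `diff frontend.xml frontend.xml.new`, and only then move the new file into place.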
  5. now reconfigure the frontend with your edited frontend.xml
    [gfactory@sngpvm03 instance_v2_4.cfg]$ cd ../frontend_frontend-v2_4/
    [gfactory@sngpvm03 frontend_frontend-v2_4]$ . ../frontend.sh
    [gfactory@sngpvm03 frontend_frontend-v2_4]$ ls
    frontend.b3ehVC.xml     frontend.mapfile  group_main  monitor
    frontend.condor_config  frontend_startup  lock        params.cfg
    frontend.descript       frontend.xml      log         signatures.sha1
    [gfactory@sngpvm03 frontend_frontend-v2_4]$ ./frontend_startup reconfig 
    Usage: frontend_startup reconfig <fname>
    [gfactory@sngpvm03 frontend_frontend-v2_4]$ ./frontend_startup reconfig  ../instance_v2_4.cfg/frontend.xml
    Reconfigured frontend 'frontend-v2_4'
    Active entries are:
      main
    Work files are in /home/gfactory/working/frontend/frontend_frontend-v2_4
    Reconfiguring the frontend                                 [OK]
    [gfactory@sngpvm03 frontend_frontend-v2_4]$ 
    
  6. Start everything up!
    • as user gfactory
      export src=/home/gfactory/working/glideinWMS/install
      $src/manage-glideins --start wmscollector --ini ~dbox/glideinWMS.ini
      $src/manage-glideins --start factory --ini ~dbox/glideinWMS.ini
      $src/manage-glideins --start usercollector --ini ~dbox/glideinWMS.ini
      
    • as user who has sudo to run condor commands
      sudo /etc/init.d/condor start
      
    • as user gfactory again:
      $src/manage-glideins --start vofrontend --ini ~dbox/glideinWMS.ini