PMT Development

PMT (process management tool) is a Ruby script that takes as input a list of task/node/port triplets (program name, host name, XMLRPC port), then starts and monitors those processes using MPI. It monitors their output and reports their exit status.

Config file format

If the configuration is passed to PMT via a file, the file must have the following format:

<program name> <host name> <xmlrpc port>
<program name> <host name> <xmlrpc port>
...

Note that each program/host/port triplet must be unique. The host name must match the output of the hostname command on that node. The config file is documented in section 4.3.2 of the DS50 DAQ overview document, which gives the following example:

progFr DSFR1 11080 
progFr DSFR1 11081 
progFr DSFR1 11082 
progFr DSFR1 11083 
progFr DSFR2 11080 
progFr DSFR2 11081 
progFr DSFR2 11082
progFr DSFR2 11083 
progEb DSEB1 11080 
progEb DSEB1 11081 
progEb DSEB2 11080 
progEb DSEB2 11081 
progEb DSEB3 11080 
progEb DSEB3 11081 
progAg DSEB1 11085 
progAg DSEB2 11085 
progAg DSEB3 11085
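
As a rough illustration of the format and the uniqueness rule, a file like this could be read as follows. This is a sketch only, not the code in pmt.rb: the Triplet struct and parse_pmt_config method are hypothetical names, and it assumes fields are separated by whitespace and that blank lines are ignored.

```ruby
# Parse PMT-style config text into program/host/port triplets.
# Assumptions: whitespace-separated fields, blank lines ignored.
# Triplet and parse_pmt_config are illustrative names, not taken from pmt.rb.
Triplet = Struct.new(:program, :host, :port)

def parse_pmt_config(text)
  triplets = []
  text.each_line.with_index(1) do |line, lineno|
    fields = line.split
    next if fields.empty?

    unless fields.size == 3 && fields[2] =~ /\A\d+\z/
      raise ArgumentError, "line #{lineno}: expected <program> <host> <port>"
    end

    triplet = Triplet.new(fields[0], fields[1], fields[2].to_i)
    # Each program/host/port triplet must be unique.
    if triplets.include?(triplet)
      raise ArgumentError, "line #{lineno}: duplicate triplet #{fields.join(' ')}"
    end
    triplets << triplet
  end
  triplets
end

sample = <<~CONFIG
  progFr DSFR1 11080
  progFr DSFR1 11081
  progEb DSEB1 11080
  progAg DSEB1 11085
CONFIG

config = parse_pmt_config(sample)
config.each { |t| puts "#{t.program} on #{t.host}, XMLRPC port #{t.port}" }
```

Rejecting duplicates at parse time means a bad file is reported with a line number before anything is handed to MPI.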

XMLRPC interface

There are currently two XMLRPC methods defined:
  • pmt.status - Returns the status of all of the applications that are currently defined. It returns a hash with the following keys:
    • program - Name of the executable
    • host - Name of the host the executable is to run on
    • options - Command line options to the executable
    • status - Current status of the executable
      • idle - Has not started running yet
      • running - Currently running
      • finished - Completed running on its own
      • interrupted - Was shut down by PMT before it completed running
    • exitcode - The exit code of the executable. Only valid when the executable has a finished status.
  • pmt.startSystem - Causes PMT to start any configured executables. A list of dictionaries can be passed with this call to configure PMT if it has not been configured on the command line. The dictionaries should have the following keys:
    • program - Name of the executable
    • host - Name of the host the executable is to run on
    • port - The XMLRPC port
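
A caller might drive these two methods roughly as sketched below. The helper names (start_system_payload, start_system, fetch_status) and the RecordingClient stand-in are hypothetical, and sending the port as an integer is an assumption. The helpers only need an object with a call(method_name, *args) method, which is the interface XMLRPC::Client from Ruby's xmlrpc library provides.

```ruby
# Hypothetical client-side helpers for pmt.startSystem and pmt.status.
# `client` is any object that responds to call(method_name, *args).

# Turn [program, host, port] triplets into the list of dictionaries
# that pmt.startSystem accepts. Sending port as an integer is an assumption.
def start_system_payload(triplets)
  triplets.map do |program, host, port|
    { 'program' => program, 'host' => host, 'port' => Integer(port) }
  end
end

# Start the system, optionally configuring PMT in the same call.
def start_system(client, triplets = nil)
  if triplets
    client.call('pmt.startSystem', start_system_payload(triplets))
  else
    client.call('pmt.startSystem')
  end
end

def fetch_status(client)
  client.call('pmt.status')
end

# Stand-in client that records calls instead of using the network,
# so the sketch can be run without a PMT instance.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def call(method, *args)
    @calls << [method, args]
    true
  end
end

client = RecordingClient.new
start_system(client, [['progFr', 'DSFR1', 11080], ['progEb', 'DSEB1', '11080']])
fetch_status(client)
client.calls.each { |method, args| puts "#{method} #{args.inspect}" }

# Against a real PMT, the client would be created along these lines
# (host, path and port are placeholders, not values from this page):
#   require 'xmlrpc/client'
#   client = XMLRPC::Client.new('pmt-host', '/RPC2', 8080)
```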

Implementation

The PMT Ruby script currently has two classes: PMTRPCHandler and MPIHandler. The script creates an instance of MPIHandler and, if a configuration file is specified on the command line, passes it the configuration triplets. When prompted, the MPIHandler class spawns a thread and creates configuration files for MPI. It then spawns the mpirun_rsh process and monitors its output.
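
The thread-plus-child-process pattern described here can be sketched with Ruby's Open3 library. This is not the MPIHandler code: spawn_and_monitor is a hypothetical name, and a short Ruby one-liner stands in for the mpirun_rsh command so that the example runs anywhere.

```ruby
require 'open3'
require 'rbconfig'

# Run a command in a background thread, pass each line of its combined
# stdout/stderr to the given block, and return the thread.
# The thread's value is the command's exit code.
def spawn_and_monitor(*command, &on_line)
  Thread.new do
    Open3.popen2e(*command) do |stdin, output, wait_thread|
      stdin.close
      output.each_line { |line| on_line.call(line.chomp) }
      wait_thread.value.exitstatus
    end
  end
end

# Stand-in for the real mpirun_rsh invocation: prints two lines, exits with 3.
child_script = 'puts "started"; puts "done"; exit 3'

lines = []
monitor = spawn_and_monitor(RbConfig.ruby, '-e', child_script) do |line|
  lines << line
end

exit_code = monitor.value # blocks until the child has terminated
puts "output: #{lines.inspect}, exit code: #{exit_code}"
```

Running the child in its own thread leaves the main thread free for other work, such as answering XMLRPC requests, while output is still being collected.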

Each executable is spawned by the mpi_wrapper.sh script. This script prints the host name, application name and command line parameters for the application once it is started on a host by MPI. It will also print out the exit code once the application terminates. The wrapper script makes it easier for PMT to monitor what is happening in the distributed application.
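
To show how the wrapper output could feed the status values reported by pmt.status, here is a minimal sketch. The line format is invented for illustration ("START <host> <program> <options>" and "EXIT <host> <program> <options> <code>"); the real output of mpi_wrapper.sh is not described on this page. The StatusTable class is likewise hypothetical.

```ruby
# Track per-process status from wrapper output lines.
# Assumed (hypothetical) line formats:
#   START <host> <program> <options>
#   EXIT  <host> <program> <options> <exit code>
class StatusTable
  def initialize(triplets)
    @entries = {}
    triplets.each do |program, host, port|
      @entries[[host, program, port.to_s]] = {
        'program' => program, 'host' => host, 'options' => port.to_s,
        'status' => 'idle', 'exitcode' => nil
      }
    end
  end

  # Apply one wrapper line; returns true if the line was recognised.
  def consume(line)
    tag, host, program, options, code = line.split
    entry = @entries[[host, program, options]]
    return false unless entry

    case tag
    when 'START'
      entry['status'] = 'running'
    when 'EXIT'
      return false unless code.to_s.match?(/\A\d+\z/)
      # Keep 'interrupted' if PMT already shut this process down.
      unless entry['status'] == 'interrupted'
        entry['status'] = 'finished'
        entry['exitcode'] = code.to_i
      end
    else
      return false
    end
    true
  end

  # Called when PMT shuts the system down before processes complete.
  def interrupt_running
    @entries.each_value do |entry|
      entry['status'] = 'interrupted' if entry['status'] == 'running'
    end
  end

  def to_a
    @entries.values
  end
end

table = StatusTable.new([['progFr', 'DSFR1', 11080], ['progEb', 'DSEB1', 11080]])
table.consume('START DSFR1 progFr 11080')
table.consume('START DSEB1 progEb 11080')
table.consume('EXIT DSFR1 progFr 11080 0')
table.interrupt_running
table.to_a.each { |entry| puts entry.inspect }
```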

Open Questions
  • We (I) need to clean up where code is placed and run from. I think the best way to do this is to:
    • Refactor pmt.rb into two files. Most of the code goes into pmt_imp.rb or something similar, and then all pmt.rb would do is require pmt_imp, instantiate the class and call start.
    • pmt_imp.rb would go into some ruby or scripts directory that is pointed to by the RUBYLIB environment variable
    • pmt.rb would go into bin/ and pmt_t.rb would also go to bin/.
  • How exactly are we passing the exit code? Does PMT return it? If PMT returns it how does SystemControl know what happened?
  • Should we change the port parameter in the configuration to more generic command line options? Will we have to do any configuration of the processes that are started up?