Matthew Tamsett, 04/05/2016 09:47 AM
Complete first draft


Tunnel monitoring

This page explains the ROC tunnel monitor page.

Introduction

NOvA ROCs have to connect to VNC sessions in order to control the NOvA detectors. These VNC sessions live on machines that are only accessible within the fnal.gov domain. To connect to these machines remotely one must connect via gateway machines as explained on this page. This page documents the software used to monitor the status of these tunnels.

The ROC tunnel monitor software is provided in the devs repository here. It consists of a lightweight monitoring client that runs on each of the two gateway nodes. The results of this monitoring are sent to a server that logs this information. Both the clients and the server are run via crontabs. Finally, a web page visualises this information.

Common tools

The monitoring clients and server are both written in python and use a common set of tools provided here.

Client

The tunnel monitoring client runs on both of the gateway machines (novadaq-far-gateway-01, novadaq-near-gateway-01). The software to run it is installed here:

/home/novadaq/DAQ-gateway/monitoring

This software consists of a clone of the devs repository mentioned above. The software was manually put in place rather than checked out via SVN; this should be corrected. The client software can be run via the following commands:

$ cd /home/novadaq/DAQ-gateway/monitoring
$ ./run_tunnel_monitoring.sh 

This will result in the output:

--- Run status monitoring
Tue Apr  5 07:51:17 CDT 2016
--- user: novadaq
--- whoami: novadaq
--- Running
tools:     machine: novadaq-far-gateway-01.fnal.gov
tools:     2016-04-05 07:51:17
5900 5951 novacr01 novadaq-far-master-02 novacr01 Apr04 1:56
5901 5952 novacr02 novadaq-far-master-02 novacr02 Apr04 1:39
--- done
Tue Apr  5 07:51:17 CDT 2016

The above command runs a very simple wrapper script, run_tunnel_monitoring.sh, which in turn runs the python script tunnel_monitoring_client.py with no arguments. Run without any command line arguments, this python script determines the status of the tunnels using the functions in the common tools. Specifically, it runs the FindTunnels function, which uses the subprocess python module to run the ps aux shell command. The output of this command is parsed to find ssh tunnel processes. The details of any processes found are stored in a dictionary object and then transmitted, along with a timestamp and the machine ID, as a string via the UDP protocol to a server running on the machine and port configured in the script. Note that the machine and port can be overridden using the "-a" and "-p" command line options to the tunnel_monitoring_client.py script if desired.
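
As a rough illustration of the steps just described, the sketch below parses ps aux output for ssh port-forwarding processes and sends the result as a string over UDP. It is not the code in the devs repository: the names run_ps, find_tunnels and send_status, the dictionary layout, the use of JSON as the string format and the rule that a tunnel is an ssh process with a -L option are all assumptions made for this example.

```python
import json
import socket
import subprocess
import time


def run_ps():
    """Return the text output of `ps aux` (hypothetical helper)."""
    return subprocess.check_output(["ps", "aux"], universal_newlines=True)


def find_tunnels(ps_output):
    """Return {pid: details} for ssh processes that forward a port with -L.

    Assumption: the standard 11-column `ps aux` layout, where the last
    column is the full command line.
    """
    tunnels = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        words = command.split()
        is_ssh = words[0].endswith("ssh")  # matches ssh and /usr/bin/ssh, not sshd
        forwards = any(word.startswith("-L") for word in words[1:])
        if is_ssh and forwards:
            tunnels[fields[1]] = {
                "user": fields[0],
                "start": fields[8],
                "command": command,
            }
    return tunnels


def send_status(tunnels, host, port):
    """Send the tunnel dictionary, a timestamp and the machine ID over UDP."""
    payload = json.dumps({
        "machine": socket.gethostname(),
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "tunnels": tunnels,
    })
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("utf-8"), (host, port))
    finally:
        sock.close()
    return payload

# Typical use (host and port are placeholders):
#   send_status(find_tunnels(run_ps()), "novagpvm10.fnal.gov", 12399)
```

Because UDP is connectionless, a client written this way does not block or fail if the server is down; the status message is simply lost.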

The client can also be used to send a few other simple commands to the monitoring server. These include a simple heartbeat message:

$ python tunnel_monitoring_client.py -b

a custom message:

$ python tunnel_monitoring_client.py -m "my message" 

or a shutdown command:

$ python tunnel_monitoring_client.py -s

This message will result in the termination of the server.
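
A minimal sketch of how the options used on this page could be handled with argparse is given below. Only the short flags (-a, -p, -b, -m, -s) come from this page; the function names, the default address and port, the message strings and the choice to make the three message options mutually exclusive are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the client options mentioned on this page."""
    parser = argparse.ArgumentParser(description="Tunnel monitoring client (sketch)")
    # Placeholder defaults; the real script has its own configured values.
    parser.add_argument("-a", dest="address", default="localhost",
                        help="address of the monitoring server")
    parser.add_argument("-p", dest="port", type=int, default=12399,
                        help="UDP port of the monitoring server")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-b", dest="heartbeat", action="store_true",
                       help="send a heartbeat message")
    group.add_argument("-m", dest="message", default=None,
                       help="send a custom message")
    group.add_argument("-s", dest="shutdown", action="store_true",
                       help="ask the server to shut down")
    return parser


def choose_message(args):
    """Return the text to send, or None to send the tunnel status instead."""
    if args.shutdown:
        return "SHUTDOWN"   # hypothetical wire format
    if args.heartbeat:
        return "HEARTBEAT"  # hypothetical wire format
    if args.message is not None:
        return args.message
    return None
```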

Server

A small python based UDP server is run on the machine novagpvm10.fnal.gov. This server receives information from both the monitoring clients. It then parses and logs this information into a web accessible JSON file that will later be read by the monitoring web page. It also records and writes heartbeats to enable users to tell that the server is currently active.

The server is installed here:

/nova/app/users/tamsett/devsrepo/TunnelMonitoring

It is run via the tunnel_monitoring_server.py script. A new server can be initiated via the following command:

$ python tunnel_monitoring_server.py -p 12399 -b /tmp/tmp_mon.json

Once initiated, a server will persist until it is terminated (either via ^c or by receiving a shutdown message from a client). To probe this server, start a fresh terminal and from the tunnel monitoring directory run the following (assuming the above server was started on novagpvm09):

$ python tunnel_monitoring_client.py -p 12399 -a novagpvm09.fnal.gov -b

On the terminal with the server running you should see:

2016-04-05 08:38:13 Received HEARTBEAT message from novagpvm10.fnal.gov

The server works by listening on the configured port for incoming messages from clients. If a received message is a shutdown message the server terminates. If it is of a non-standard format then it is simply printed to screen. If the message is a heartbeat then the time of the heartbeat is written into the heartbeat file. If the message is the expected dictionary of information then the server will append this information to an output file using the appendToFile function in the common tools. The output file is a JSON formatted file containing information on all the status checks run. An individual output file is written for each machine from which the server receives information. Output files are automatically truncated to a maximum of 1,000 entries.
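
The message handling described above can be sketched as follows. This is an illustrative outline rather than the actual server: the function names, the SHUTDOWN and HEARTBEAT strings, the JSON message layout with a "machine" key and the one-file-per-machine naming are all assumptions; only the 1,000 entry limit comes from this page.

```python
import json
import os
import socket
import time

MAX_ENTRIES = 1000  # limit quoted on this page


def append_to_file(path, entry, max_entries=MAX_ENTRIES):
    """Append one status entry to a JSON list file, keeping the newest entries."""
    entries = []
    if os.path.exists(path):
        with open(path) as handle:
            entries = json.load(handle)
    entries.append(entry)
    entries = entries[-max_entries:]
    with open(path, "w") as handle:
        json.dump(entries, handle)
    return len(entries)


def handle_message(text, data_dir, heartbeat_file, now):
    """Process one message; return False when the server should stop."""
    if text == "SHUTDOWN":       # hypothetical shutdown string
        return False
    if text == "HEARTBEAT":      # hypothetical heartbeat string
        with open(heartbeat_file, "w") as handle:
            handle.write(now)
        return True
    try:
        status = json.loads(text)
    except ValueError:
        status = None
    if isinstance(status, dict) and "machine" in status:
        # One output file per machine that reports in.
        name = os.path.basename(str(status["machine"])) + ".json"
        append_to_file(os.path.join(data_dir, name), status)
    else:
        print(text)              # non-standard message: just show it
    return True


def serve(port, data_dir, heartbeat_file):
    """Listen for UDP messages until a shutdown message arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    try:
        running = True
        while running:
            data, _sender = sock.recvfrom(65535)
            now = time.strftime("%Y-%m-%d %H:%M:%S")
            running = handle_message(data.decode("utf-8"), data_dir,
                                     heartbeat_file, now)
    finally:
        sock.close()
```

Rewriting the whole file on every message is simple and adequate at one message per minute per gateway, and truncating on write keeps the file the web page downloads small.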

Crontab control

The running of the server and clients is controlled via crontabs that run shell scripts on each of the respective machines. For example the client on novadaq@novadaq-far-gateway-01:

$ crontab -l
SHELL=/bin/bash
MAILTO="" 
#-----------------------------------------------------------------------------
BASE=/home/novadaq/DAQ-gateway/monitoring
LOGS=/home/novadaq/DAQ-gateway/logs
#-----------------------------------------------------------------------------
1-59/2 * * * *           $KCRON $BASE/run_tunnel_monitoring.sh &> $LOGS/cronlog_run_tunnel_monitoring.txt

An example client crontab is provided here. The actual crontabs used are slight variations on this (with the far gateway sending information every odd minute and the near gateway every even minute).
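
The minute field 1-59/2 in the crontab above fires on every odd minute. The short sketch below expands such a field so that two schedules can be checked for overlap. The function name expand_minutes is made up for this example, it handles only the simple forms shown here rather than the full cron syntax, and the */2 field used for the near gateway is an assumption, since that crontab is not reproduced on this page.

```python
def expand_minutes(field):
    """Expand a cron minute field such as '1-59/2', '*/2' or '7' into a sorted list."""
    minutes = set()
    for part in field.split(","):
        span, _, step = part.partition("/")
        step = int(step) if step else 1
        if span == "*":
            first, last = 0, 59
        elif "-" in span:
            first, last = (int(value) for value in span.split("-"))
        else:
            first = last = int(span)
        minutes.update(range(first, last + 1, step))
    return sorted(minutes)


far = expand_minutes("1-59/2")   # far gateway, as shown above
near = expand_minutes("*/2")     # assumed field for the near gateway
overlap = set(far) & set(near)   # empty when the two schedules never collide
```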

The server crontab runs both the server itself (which is restarted once a day) as well as an instance of the client which simply outputs heartbeat messages:

SHELL=/bin/bash
MAILTO="" 
#-----------------------------------------------------------------------------
KCRON=/usr/krb5/bin/kcron
LOGS=/nova/app/users/tamsett/cronlogs
TUN=/nova/app/users/tamsett/devsrepo/TunnelMonitoring
#-----------------------------------------------------------------------------
32 07 * * *         $KCRON $TUN/run_tunnel_monitor_server.sh    &> $LOGS/cronlog_tunnel_monitor_server.txt
* * * * *           $KCRON $TUN/run_tunnel_monitor_heartbeat.sh &> $LOGS/cronlog_tunnel_monitor_heartbeat.txt

Web page

The results of the monitoring are visualised using a web page. This page consists of a skeleton HTML page and a JavaScript programme.

An example of the web page can be seen here. This page is written in HTML and formatted using the Bootstrap framework. It loads the D3 JS library that is used to visualise the results (the installation it loads is that used by the production web pages). It then also loads a small JS program which runs the loading spinner, and the visualisation script itself.

The actual versions of the web page and visualisation script live in a web accessible folder here:

/nusoft/app/web/htdoc/nova/users/tamsett/tunnel_monitor/

The visualisation script reads in two JSON files, one for each gateway, and runs the setup function on each of these. This function reads the appropriate input file, then runs the draw_plot, fill_summaries and fill_open_tunnels functions. These functions draw the n_tunnels vs time plot, fill the summary panel (the one above the plot) and fill the information on open tunnels panel (the one below the plot) respectively. If the number of tunnels found in the latest check matches the expected number of tunnels configured here then the status is marked as good.

The script also runs the server_status function. This function opens the heartbeat file to find the date of the last heartbeat. It then sets the status of the top panel of the page appropriately.
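
The two status checks described above are implemented in JavaScript on the real page. Purely as an illustration of the logic, here is a Python sketch. The entry layout (a list of records each holding a "tunnels" dictionary), the heartbeat timestamp format, the five minute threshold and the status strings are all assumptions, not taken from the actual script.

```python
from datetime import datetime, timedelta


def tunnel_status(entries, expected):
    """Compare the number of tunnels in the latest entry with the expected number."""
    if not entries:
        return "unknown"
    latest = entries[-1]
    return "good" if len(latest.get("tunnels", {})) == expected else "bad"


def server_status(heartbeat_text, now, max_age=timedelta(minutes=5)):
    """Report whether the last heartbeat is recent enough.

    Assumes the heartbeat file holds one timestamp such as
    '2016-04-05 08:38:13'; max_age is a hypothetical threshold.
    """
    last = datetime.strptime(heartbeat_text.strip(), "%Y-%m-%d %H:%M:%S")
    return "good" if now - last <= max_age else "stale"
```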

Setup

To set up an instance of the monitoring from scratch, one would do the following (not tested):

Server setup

Make a working directory:

$ mkdir -p /nova/app/users/<user name>/TunnelMonitoring
$ cd !$

Download the latest version of the software:

$ export DEVSREPO=svn+ssh://p-novaart@cdcvs.fnal.gov/cvs/projects/novaart-devs
$ svn checkout $DEVSREPO/trunk/users/tamsett/TunnelMonitoring .

Edit the software so that all paths point to your desired locations. Specifically:

  • Change the "prefix" default in this line to where the output data will be held (see Tunnel_monitoring for details on this path).
  • Change the "HEARTBEAT_FILE" here to where the output heartbeat file will live.
  • Change LOGS and TUN here.
  • You should then also change the address of the server and the port (note this needs to be done in both the server and client software, or the shell scripts need to be updated to provide new defaults).

The above is likely a non-exhaustive list - the first person to do this should be careful to check for all non-relative paths in the software - anything with "tamsett" in it is suspect.
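
One way to carry out that check is to scan the checked-out directory for the string "tamsett". The helper below is a small sketch for that purpose; the function name find_hardcoded and the decision to skip .svn directories are choices made for this example and are not part of the tunnel monitoring software.

```python
import os


def find_hardcoded(root, needle="tamsett"):
    """Return (path, line number, line) for every line under root containing needle."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into SVN metadata directories.
        dirnames[:] = [name for name in dirnames if name != ".svn"]
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it
    return hits

# Example use from the working directory:
#   for path, number, line in find_hardcoded("."):
#       print("%s:%d: %s" % (path, number, line))
```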

Set up a crontab to run the server:

$ crontab server_crontab

Web area setup

Make a web accessible folder:

$ mkdir -p /nusoft/app/web/htdoc/nova/users/<user name>/tunnel_monitor/data/

Then copy the two files in the web subdirectory into it:

$ cp web/* /nusoft/app/web/htdoc/nova/users/<user name>/tunnel_monitor/

Your web page should now be visible here: http://nusoft.fnal.gov/nova/users/<user name>/tunnel_monitor/index.html

Client setup

For each of the near and far gateways one should log on as novadaq. Navigate to an appropriate folder and copy the latest version of the tunnel monitoring software in place (I do this by tarring up the client software and scp-ing it over).

Next you should check that the client software is pointing to your new server. Then, when ready, set up the crontabs on each of these machines (note the below command will overwrite the existing crontab so be sure to make a copy):

$ crontab client_crontab

That should be all. In a minute or so data should start rolling in to the server.

To do

  • Replace manual copies of software on each of the gateway machines with SVN versioned software. This was not done initially as neither SVN nor NOvA-soft is provided out-of-the-box on these machines.
  • Move existing tunnel monitoring out of tamsett's area.
  • Test the setup instructions.