Introduction

Every 5 minutes this web app takes a snapshot of the SAM database and stores these snapshots in a fully relational SQLite database. Every snapshot has a timestamp and a type associated with it; the current types are datatiers, analysis projects, data locations, and consumed files. With this high-level view of the SAM database, the experiments can better assess their data, both quantitatively and visually, on various metrics.

SAM Metrics currently supports monitoring of datatiers by file count and storage space, and provides an up-to-the-minute sparkline view of the file count per datatier over the last 3 hours, so experiments can watch files coming in as it happens. Experiments can also monitor analysis projects to see an accumulated total of completed projects, or view sparklines of active users who have completed projects in the last 3 hours. The data locations section shows how many files an experiment has on tape or disk. Lastly, experiments can see how many of their files have been consumed and delivered.

Production

Deployment / Install Notes

1. Log into the node.

kinit <username>
ssh sam@sammongpvm01

2. Install the new app.

cd /home/sam/web2py/applications
setup git v1_6_4a
git clone ssh://p-sam-metrics@cdcvs.fnal.gov/cvs/projects/sam-metrics
mv sam-metrics metrics

3. Add database connection strings to the application's db.py.

cd metrics
cd models
vim db.py
/*add connection strings*/

4. Add routes to web2py instance.

cd /home/sam/web2py
vim routes.py

routers = dict(
BASE = dict(path_prefix='sam'),
metrics = dict(default_language = '', languages = ['nova', 'minerva', 'minos', 'gm2', 'mu2e', 'coupp', 'holometer']),             
)

5. Add ProxyPass lines and restart Apache.

vim /home/sam/httpd/conf.d/sam_web_servers.conf
ProxyPass /sam/metrics/ http://localhost:21000/sam/metrics/

apachectl configtest
apachectl graceful

6. Start web2py.

source /fnal/ups/etc/setups.sh  //this is actually declared in bashrc
setup python v2_7_3
setup cx_Oracle v5_1_1_o11_py2_7
setup postgres_client v9_3_5_p2_7
nohup python web2py.py --nogui -p 21000 -i 127.0.0.1 -a '' -l /home/sam/private/logs/sam_web2py/httpserver.log &  //do this in the web2py directory

7. Initialise database.
http://sammongpvm01.fnal.gov/sam/metrics/other/populate_all
Now comment out everything in other.py, which lives in web2py/applications/metrics/controllers.

8. Start scheduler to consume task.
The web application is preloaded with a queued task: a Python script that takes the snapshots, set to run every 300 seconds. For the task to run, the scheduler must be started.

nohup python web2py.py -K metrics &  //do this in the web2py directory

9. Visit the following to see everything running:
http://sammongpvm01.fnal.gov:8480/sam/metrics/nova
http://sammongpvm01.fnal.gov:8480/sam/metrics/minerva
http://sammongpvm01.fnal.gov:8480/sam/metrics/minos
http://sammongpvm01.fnal.gov:8480/sam/metrics/gm2
http://sammongpvm01.fnal.gov:8480/sam/metrics/coupp
http://sammongpvm01.fnal.gov:8480/sam/metrics/mu2e
http://sammongpvm01.fnal.gov:8480/sam/metrics/holometer

Optional / Other

1. Add crontab item:

@reboot . /fnal/ups/etc/setups.sh && setup python v2_7_3 && setup cx_Oracle v5_1_1_o11_py2_7 && setup postgres_client v9_3_5_p2_7 && /home/sam/web2py/web2py.py

2. To view the web2py processes that are running do:

pgrep -fl web2py

3. To update the application source do the following in the web2py applications directory:

git pull ssh://p-sam-metrics@cdcvs.fnal.gov/cvs/projects/sam-metrics master

Web2py Instance Install

This is a web2py application that lives inside a web2py instance, so web2py is the first dependency for this application to run. Fortunately, web2py is easy to run from source. You don't need to follow these steps if a web2py instance already exists, as it does in the samweb production environment.

wget http://www.web2py.com/examples/static/web2py_src.zip
unzip web2py_src.zip
rm -r web2py_src.zip
cd web2py
nohup python web2py.py --nogui -p 21000 -i 127.0.0.1 -a '' &

If you deployed this locally on your machine, navigate to localhost:21000 to see it running. You can use any unused port number you like.

Suppose you deployed web2py on fermicloud and you want to expose the web2py administration screen, which can only be accessed locally. You can start web2py with the following:

nohup python web2py.py --nogui -p 23000 -i 0.0.0.0 -a '<ask>'

You will then be asked to create a password.
On your local machine do:

kinit
ssh -L 23000:127.0.0.1:23000 <your user name>@fermicloud###.fnal.gov

If you now navigate to http://localhost:23000/welcome/default/index on your local machine, you should be able to get into the administration area from there if need be. You can update the web2py instance to a newer version with the click of a button in the administration area, and you can also create/read/update/delete/deploy applications there if you choose; it is very robust.

System Dependencies

cx_Oracle //not anymore
postgres_client v9_3_5_p2_7
Python 2.5, 2.6, or 2.7

To alleviate the pain of system dependencies, the following conveniently ship with web2py:
pg8000 - a driver that can be used to talk to PostgreSQL databases.
sqlite3 - the driver used to talk to the SQLite database.
simplejson - used in place of json.
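
If you ever need to pin which PostgreSQL driver is used rather than rely on the default, the web2py DAL connection URI can name the adapter explicitly. A minimal sketch with placeholder credentials, hosts, and variable names (all of them assumptions):

# Placeholder credentials and host; only the URI schemes are the point here.
db_auto = DAL('postgres://<user>:<password>@<host>:5432/<dbname>')              # psycopg2 if available, else pg8000
db_psycopg2 = DAL('postgres:psycopg2://<user>:<password>@<host>:5432/<dbname>') # require psycopg2
db_pg8000 = DAL('postgres:pg8000://<user>:<password>@<host>:5432/<dbname>')     # force the bundled pg8000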

Metrics Web App Specifics

The only thing that talks to the experiment databases is the task that the scheduler runs every 5 minutes. The task is defined in scheduler.py. It runs the following queries against each of the SAM experiment databases and stores the results in a fully relational, local SQLite 3 database:

SELECT c.DATA_TIER, COUNT(U.DATA_TIER_ID), SUM(u.FILE_SIZE_IN_BYTES) FROM DATA_FILES u RIGHT OUTER JOIN DATA_TIERS c ON u.DATA_TIER_ID = c.DATA_TIER_ID GROUP BY DATA_TIER;
SELECT c.USERNAME, u.PROJECT_STATUS, COUNT(*) FROM ANALYSIS_PROJECTS u JOIN PERSONS c ON u.PERSON_ID = c.PERSON_ID WHERE u.END_TIME > (current_timestamp - interval '5' minute) GROUP BY c.USERNAME, u.PROJECT_STATUS ORDER BY c.USERNAME;
SELECT PROJECT_STATUS,COUNT(PROJECT_STATUS) FROM (SELECT PROJECT_STATUS FROM ANALYSIS_PROJECTS WHERE PROJECT_STATUS IN ('running','reserved','starting', 'ended complete', 'ended incomplete', 'unknown') ) ATABLE GROUP BY PROJECT_STATUS;
SELECT b.LOCATION_TYPE, COUNT(*) FROM DATA_FILE_LOCATIONS a JOIN DATA_STORAGE_LOCATIONS b ON a.LOCATION_ID = b.LOCATION_ID GROUP BY b.LOCATION_TYPE;
SELECT CONSUMED_FILE_STATUS,COUNT(CONSUMED_FILE_STATUS) FROM (SELECT CONSUMED_FILE_STATUS FROM CONSUMED_FILES WHERE CONSUMED_FILE_STATUS IN ('failed','delivered','skipped', 'consumed')) ATABLE GROUP BY CONSUMED_FILE_STATUS;

This is a multi-tenant application that currently supports nova, minerva, minos, gm2, coupp, mu2e, and holometer. Every request gets associated with an exp_id (experiment id), which is then used to fetch the appropriate data, per tenant, from the SQLite database. This is possible because each snapshot is associated with an experiment.
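
For illustration, here is a rough sketch of what one pass of the task might look like for the data tier query, using the web2py DAL. The DAL connection names, the snapshot/metric field names, and the function name are assumptions; only the SQL itself comes from the list above.

import datetime

def take_data_tier_snapshot(db, remote_db, exp_id, type_id):
    # Run the data tier query against one experiment's SAM database.
    rows = remote_db.executesql(
        "SELECT c.DATA_TIER, COUNT(u.DATA_TIER_ID), SUM(u.FILE_SIZE_IN_BYTES) "
        "FROM DATA_FILES u RIGHT OUTER JOIN DATA_TIERS c "
        "ON u.DATA_TIER_ID = c.DATA_TIER_ID GROUP BY DATA_TIER")
    # Record the snapshot metadata in the local SQLite database
    # (field names here are illustrative).
    snapshot_id = db.snapshot.insert(
        experiment_id=exp_id,
        type_id=type_id,
        created_on=datetime.datetime.utcnow())
    # Store one metric row per data tier.
    for tier_name, file_count, total_size in rows:
        db.metric.insert(
            snapshot_id=snapshot_id,
            datatier_name=tier_name,
            total_count=file_count or 0,
            total_size=total_size or 0)
    db.commit()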

Models

The application is currently designed to work with both Oracle and PostgreSQL databases; at present most experiments use PostgreSQL. The only change required to run against a PostgreSQL database or an Oracle database is to update the connection strings in the db.py file.
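
For reference, the connection strings in db.py are ordinary web2py DAL URIs. A minimal sketch with placeholder credentials, hosts, and variable names (the actual names in db.py may differ):

# In a web2py model file DAL is already in scope, so no import is needed there.

# Local SQLite database used by the application itself (file name is a placeholder).
db = DAL('sqlite://storage.sqlite')

# A PostgreSQL experiment database
nova_db = DAL('postgres://<username>:<password>@<hostname>:5432/<dbname>', migrate=False)

# An Oracle experiment database
minos_db = DAL('oracle://<username>/<password>@<tnsname>', migrate=False)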

If it is ever decided to move away from SQLite, the sql.log can be found at: sql.log

SQLite Tables

You can view the SQLite graph model here: Graph Model

Here is a short description of the tables and how they are used:

datatier: This is the application's internal representation of the data tiers. Every experiment has its own list of data tiers in this table, and all of an experiment's data tiers are tagged. If the web application detects a change in data tiers for a particular experiment, it immediately marks the current ones as inactive, creates a new tag, and then creates a new active set based on that tag (a rough sketch of this flow follows these table descriptions). This change effectively resets the data tier chart to prevent the chart from breaking. A bookmark is created in the bookmark table for bookkeeping and backtracking.

type: Every snapshot is associated with a type. The current types are data_tier, analysis_project, analysis_project_2, data_file_location, and consumed_file.

experiment: Every snapshot is associated with an experiment. The current experiments are nova, minerva, minos, gm2, mu2e, coupp, and holometer.

tag: This is the table used to tag the data tiers. Every experiment starts off with one tag. The tag.id auto-increments, so if a data tier change is detected for a particular experiment, that experiment gets a new entry in this table. When it is time to create the data tier chart for an experiment, its latest tag.id is used to retrieve the appropriate data tiers.

snapshot: Here is where the metadata for the snapshots gets stored. Every query listed above is run for every experiment, and each time a query runs, a new record is inserted in this table. As of now, one pass creates 35 records (7 experiments * 5 types = 35), so 35 records are added to this table every 5 minutes.

metric: This is where the total count and total size for each of the data tiers are stored. This table references the tag table.

project_status: This table contains starting, running, ended complete, ended incomplete, reserved, and unknown.

analysis_project: This table does not always get populated with data per snapshot, because it queries the PostgreSQL database with an interval of 3 hours ago. If there is data to be stored, it records the username, project status, and total count of users who are running analysis projects. It is used by the analysis project sparklines.

analysis_project_2: This table stores the total counts per project status. It is used by the analysis project chart.

location_type: You guessed it. This table contains tape, disk, and station.

data_file_location: This table stores the total counts per location_type. It is used by the data locations chart.

consumed_file_status: Contains delivered, consumed, skipped, and failed.

consumed_file: This is the table that stores the total counts per consumed file status. It is used by the consumed files chart.

response_time: Only one query is timed per experiment. The response time is stored here in seconds and is used by the response time row on the index page, which simply shows the last response time for the experiment.

bookmark: This is the bookkeeping table for data tier changes. It stores what changed (the delta) and records the old data tiers before the new ones are stored.

connectivity_notification: When the web app can't connect to a remote database, it sends out an email notification and logs the exception in this table. Only one email is sent per experiment in any 2-hour span, to prevent spam.

scheduler_task: This is the table where the queued task lives. The scheduler picks up the task from this table and logs information in the record such as times run, times failed, last run time, and status. A task is defined by its function, how many times to repeat, how many times to retry when it fails, the period between runs in seconds, and a timeout in seconds.

scheduler_run: Things like run output, run result, and tracebacks are stored in this table along with the worker name and its associated pid.

scheduler_worker: Contains the heartbeat of the worker, and its status. Heartbeat is used in the heartbeat row of the index page.
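
As a rough sketch of the data tier reset flow described under the datatier, tag, and bookmark tables, something along these lines happens when a change is detected. All field names below are assumptions; only the table names match the schema described above.

def reset_data_tiers(db, exp_id, new_tier_names, old_tier_names):
    # Record the delta and the outgoing tiers for bookkeeping and backtracking.
    db.bookmark.insert(
        experiment_id=exp_id,
        delta=str(sorted(set(new_tier_names) ^ set(old_tier_names))),
        previous_tiers=str(old_tier_names))
    # Mark the current tiers inactive, then create a new tag
    # and a new active set based on that tag.
    db(db.datatier.experiment_id == exp_id).update(active=False)
    tag_id = db.tag.insert(experiment_id=exp_id)
    for name in new_tier_names:
        db.datatier.insert(experiment_id=exp_id, tag_id=tag_id,
                           name=name, active=True)
    db.commit()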

Pre-populating Tables

A function must be called once to populate the database with required reference data. You can do this through the following URL: http://sammongpvm01.fnal.gov/sam/metrics/other/populate_all
After population it is necessary to comment everything out of other.py, as this is an exposed function.

The populate_all function populates the following tables:
experiment
tag
datatier
project_status
type
location_type
consumed_file_status
scheduler_task

If you add another experiment to this application, the SQLite database needs to know about it. You will need to add an entry in the experiment table with the experiment name, and an entry in the tag table with the new experiment's id.
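
A hedged sketch of what that might look like from the web2py shell; the field names (name, experiment_id) are assumptions, only the table names come from the schema above:

# Start a shell with the application's models loaded:
#   python web2py.py -S metrics -M
exp_id = db.experiment.insert(name='newexperiment')   # register the experiment
db.tag.insert(experiment_id=exp_id)                   # give it its first tag
db.commit()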

Scheduler

The script that takes the snapshots is called scheduler.py and lives in: /web2py/applications/metrics/models
This script is run as the task every 5 minutes. If you add a new experiment, you must add new blocks of code to this file; it should be rather straightforward to do.
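
For context, tasks like this are queued through web2py's scheduler API. A minimal sketch, assuming the task function in scheduler.py is named take_snapshots (the real name may differ); in this application the queueing itself is done once as part of populate_all:

from gluon.scheduler import Scheduler

def take_snapshots():
    # ... run the queries listed above for every experiment and store the results ...
    return 'done'

scheduler = Scheduler(db)

# Queue the task once: repeats=0 means run forever, period is in seconds,
# retry_failed=-1 means always retry a failed run.
if db(db.scheduler_task.function_name == 'take_snapshots').isempty():
    scheduler.queue_task(take_snapshots, period=300, repeats=0,
                         retry_failed=-1, timeout=300)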

Controllers

All the controllers are in default.py which lives in: /web2py/applications/metrics/controllers

The following functions are exposed:
index
datatier_sparklines_ajax
datatier_chart
datatier_pie_charts
datatier_sparklines
analysis_project_sparklines
analysis_project_chart
data_locations_chart
consumed_files_chart
analysis_project_sparklines_ajax
user*
download*
call*
data*

*denotes web2py specific

Every non-web2py-specific controller whose name does not end in "ajax" is associated with a view.
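
As a small illustration of that convention, a controller sketch (the bodies, field names, and helper are made up; only the controller names come from the list above):

def datatier_chart():
    # Returning a dict renders views/default/datatier_chart.html.
    exp_id = get_experiment_id()   # hypothetical helper resolving the tenant
    snapshots = db(db.snapshot.experiment_id == exp_id).select()
    return dict(snapshots=snapshots)

def datatier_sparklines_ajax():
    # "ajax" controllers return data for the page scripts instead of rendering a view.
    return response.json([])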

Views

All the views live in: /web2py/applications/metrics/views

The following views are exposed:

default/analysis_project_chart.html extends layout.html
default/analysis_project_sparklines.html extends layout2.html
default/consumed_files_chart.html extends layout.html
default/data_locations_chart.html extends layout.html
default/datatier_chart.html extends layout.html
default/datatier_pie_charts.html extends layout.html
default/datatier_sparklines.html extends layout2.html
default/index.html extends layout.html

layout.html includes web2py_ajax.html
layout2.html includes web2py_ajax2.html
web2py_ajax.html
web2py_ajax2.html

All static files are in the static directory which lives in: web2py/applications/metrics/static
This includes the css, images, and js directories.

Routes

The file routes.py, which lives in the root directory of web2py, is essential for everything to work correctly. Web2py does not ship with this file; we added it manually. Its contents are as follows:

routers = dict(
BASE = dict(path_prefix='sam'),
metrics = dict(default_language = '', languages = ['nova', 'minerva', 'minos', 'gm2', 'mu2e', 'coupp', 'holometer']),             
)

It adds a path prefix to all of web2py and then assigns an application-specific router. The languages are used to specify the experiments; this language mechanism is what makes the application multi-tenant.
Thanks to routes.py, we can now effectively have a path of sam/metrics/<experiment>. Remember to append any new experiment to this list. Changes take effect upon server restart, or when you click the "Reload routes" button in the web2py admin screen.

General Web2py

httpserver.log

Web2py uses a built-in server called Rocket. By default it logs all requests to a file called httpserver.log in the web2py root directory; however, we specify a different path when we start the web2py instance, so the full path to the file is /home/sam/private/logs/sam_web2py/httpserver.log. Because the Apache server proxies everything, you can also look in the following directory for Apache log files: /home/sam/httpd/logs

Resources

http://www.web2py.com/book

https://groups.google.com/d/forum/web2py

https://developers.google.com/chart/interactive/docs/reference

https://developers.google.com/chart/interactive/docs/gallery/annotationchart

Other

Psycopg2

This application uses the psycopg2 driver. If it is not available, it falls back on web2py's built-in pg8000 driver. These steps show how to install psycopg2 on a new virtual machine and how to use it in Python.

Install

kinit
ssh fermicloud060
ksu products
source /fnal/ups/etc/setups.sh
setup upd
upd list postgres_client -aK+
ups flavor  //this gives useful information so we can pick the right product
rpm -q glibc  //so does this
upd install postgres_client v9_3_5_p2_7
ups list postgres_client -aK+
setup postgres_client v9_3_5_p2_7
setup python v2_7_6

Use

python
import psycopg2
# Open a connection (fill in the placeholders).
c = psycopg2.connect('dbname=<databasename> host=<hostname> port=<portnumber> user=<username> password=<password>')
cur = c.cursor()
# Run a harmless test query.
cur.execute("select * from pg_stat_activity")
cur.fetchall()
# Discard the implicit transaction opened by the query.
c.rollback()

VM Rebooted?

last reboot
uptime
cd web2py
setup python v2_7_3
setup postgres_client v9_3_5_p2_7
nohup python web2py.py --nogui -p 21000 -i 127.0.0.1 -a '' -l /home/sam/private/logs/sam_web2py/httpserver.log &
nohup python web2py.py -K metrics &

Other Possible Queries

#Get file counts by user.
SELECT COUNT(df.FILE_ID), df.CREATE_USER_ID, p.USERNAME FROM DATA_FILES df LEFT JOIN PERSONS p ON df.CREATE_USER_ID = p.PERSON_ID GROUP BY df.CREATE_USER_ID, p.USERNAME
SELECT COUNT(df.FILE_ID), df.CREATE_USER_ID, p.USERNAME FROM DATA_FILES df RIGHT OUTER JOIN PERSONS p ON df.CREATE_USER_ID = p.PERSON_ID GROUP BY df.CREATE_USER_ID, p.USERNAME

Runaway Processes?

pkill -9 -f "python web2py.py" 
pstree -p 51566