
SISPI Database Guide

The SISPI database uses PostgreSQL as the underlying database server. Instructions on how to install and initially configure the database can be found in DES DocDB 4029. The notes below describe the SISPI-specific configuration of the database. They assume you are familiar with performing admin tasks on a PostgreSQL database.

Database Configuration

The database software is installed as an eups product in the SISPI products tree, /usr/remote/software/products. We are currently using version 9.0.3. The master database runs on server2; its database directory is /raid2/database/sispi_db. The mirror runs on server3; its database directory is /raid3/database/sispi_db. The database runs as user postgres and is started at boot time by a System V style start script, /etc/init.d/postgresql. The database directories are owned by the postgres account.

The following configuration options in postgresql.conf are set differently from the defaults:

# CONNECTIONS AND AUTHENTICATION

listen_addresses = '*'
max_connections = 400

# ERROR REPORTING AND LOGGING

logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m.log'
log_truncate_on_rotation = on
log_rotation_age = 43200

# CLIENT CONNECTION DEFAULTS

search_path = 'constants,telemetry,alarms,users,exposure,bss,obstac,"$user",public' 
timezone = 'UTC'

pg_hba.conf

# All SISPI machines in subnet 139.229.14 (addresses < 128)
host    all         all         139.229.14.0/25       md5
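As a sanity check on the netmask: a /25 prefix leaves 7 host bits, so it covers 128 addresses, 139.229.14.0 through 139.229.14.127, which is exactly the "addresses < 128" noted in the comment:

```shell
# 32 - 25 = 7 host bits -> 2^7 = 128 addresses in the /25,
# i.e. 139.229.14.0 .. 139.229.14.127 ("addresses < 128").
prefix=25
hosts=$(( 1 << (32 - prefix) ))
echo "$hosts addresses"
```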

Database Replication

We are using PostgreSQL 9 streaming replication to make a standby copy of the database. We are also streaming a copy to Fermilab on des10. Here are the notes on setting up the replication and performing a failover. The primary runs on server2 and the standby on server3. Note that the configuration of each database must be the same, e.g., if the primary has max_connections = 400 then the standby must as well.

CTIO Primary and standby database configuration
  1. In postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 64
    
FNAL mirror database configuration
  1. In postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 64
    hot_standby = on
    

    Setting wal_level = hot_standby writes enough information to the WAL for a standby to serve read-only queries. If wal_keep_segments is large enough then we do not need to archive WAL segments to disk; the current value is good for about 1.5 weeks. If the standby gets further out of synch than that, it must be re-synched as described below. Note that we have added the hot_standby = on option, which enables read-only queries on the FNAL mirror. The standby on server3 is not configured to do this.
  2. In pg_hba.conf on the primary
    # Replication for the standby server3.ctio.noao.edu
    host    replication dbadmin     139.229.14.36/32       md5
    # Replication for the standby server at FNAL
    host    replication dbadmin     131.225.58.29/32       md5
    # Access from des10.fnal.gov
    host    decam_dev   dbadmin     131.225.58.29/32       md5
    
  3. To start up a new standby, or to re-synch an existing one, you need to make a base backup by copying the primary's data directory to the standby. The rsync approach below will only work between server2 and server3 at CTIO.
    psql -c "SELECT pg_start_backup('label', true)" -U dbadmin postgres
    rsync -a ${PGDATA}/ standby:/srv/pgsql/standby/ --exclude postmaster.pid
    psql -c "SELECT pg_stop_backup()" -U dbadmin postgres
    

    If the FNAL mirror gets out of synch then instead of rsync you will need to make a tar file of the /raid2/database/sispi_db directory. Exclude the files pg_hba.conf, postgresql.conf and postmaster.pid as well as the pg_log directory when making the tar file. Instructions on copying this tar file to FNAL will be forthcoming (due to the large size we cannot use scp).
  4. Create a recovery.conf file in the standby data directory on server3 (/raid3/database/sispi_db).
    # this parameter must be set to on.
    standby_mode          = 'on'
    
    # Specifies a connection string which is used for the standby server to connect
    # with the primary.
    primary_conninfo      = 'host=server2.ctio.noao.edu port=5442 user=dbadmin'
    
    # Specifies a trigger file whose presence should cause streaming replication to
    # end (i.e., failover).
    trigger_file = '/raid3/database/failover_trigger'
    
  5. Start standby server
  6. You should see a wal receiver process on the standby and a wal sender process on the primary.
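If the FNAL mirror needs the tar-based re-synch described in step 3, the tar file can be built along the following lines. This is only a sketch, demonstrated on a mock directory so it can run anywhere; on server2 the real data directory is /raid2/database/sispi_db, and the transfer to FNAL is a separate (still open) step:

```shell
# Mock stand-in for /raid2/database/sispi_db so the sketch is runnable.
WORK=$(mktemp -d)
DATA="$WORK/sispi_db"
mkdir -p "$DATA/pg_log" "$DATA/base"
touch "$DATA/pg_hba.conf" "$DATA/postgresql.conf" "$DATA/postmaster.pid"
touch "$DATA/base/12345"

# Tar the data directory, excluding the config files, the pid file and
# the pg_log directory, as prescribed in step 3 above.
tar -czf "$WORK/sispi_db.tar.gz" \
    --exclude='pg_hba.conf' --exclude='postgresql.conf' \
    --exclude='postmaster.pid' --exclude='pg_log' \
    -C "$WORK" sispi_db

tar -tzf "$WORK/sispi_db.tar.gz"
```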
Triggering failover
  1. Create the trigger file in the specified path by running the failover.sh script in /usr/remote/user/postgres. The standby will detect this file and switch out of replication mode. It will copy the recovery.conf file to recovery.done in the data directory.
  2. When the machine that originally hosted the primary is restored, you can make it the new standby by running the script recover_standby.sh in /usr/remote/user/postgres. Note that the SISPI configuration will need changes for it to see the new database, the FNAL mirror would have to be switched, and the pg_hba.conf file would also need to be edited. This needs some more thought.
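Mechanically, the failover trigger is just a file: the standby polls for the path named in recovery.conf and exits recovery once the file appears. A minimal sketch, using a scratch path so it can run anywhere (the real path on server3 is /raid3/database/failover_trigger, and the actual failover.sh may do more than this):

```shell
# Stand-in for /raid3/database/failover_trigger (see recovery.conf above).
TRIGGER=$(mktemp -u)
touch "$TRIGGER"            # creating this file is what triggers failover
[ -f "$TRIGGER" ] && echo "trigger present"
```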

Database Web Viewers

Configuring Apache
The web space is in /usr/remote/software/www. The Apache server runs on system1. The following configuration options are set in httpd.conf:

# This needs to be on a local disk
LockFile /data_local/web_logs/accept.lock 

# Pick up our version of mod_python and wsgi
LoadModule python_module /usr/remote/software/products/mod_python-3.3.1/lib/mod_python.so 
LoadModule wsgi_module /usr/remote/software/products/mod_wsgi-3.3/lib/mod_wsgi.so

The modules directory is a link to /etc/httpd/modules. The server should be started, stopped, or restarted by running the httpd script in ~codemanager/bin with the argument start/stop/restart. It should be run as user codemanager. This is normally taken care of when the machine boots.

The *.conf files for TV, ConstantsDB, QE and AV are in the conf.d directory. These applications all use wsgi. The ECL application is also in the conf.d directory but still uses mod_python.

Notes on installing mod_wsgi at CTIO

Installing a mod_wsgi Web Viewer

Notes on installing the ECL

python.conf

<Directory "">
        AddHandler mod_python .py
        PythonHandler handler
        PythonOption session_directory /data_local/web_logs/tmp/mp_sessions
        PythonOption SessionStoragePath /data_local/web_logs/tmp/mp_sessions
        PythonDebug on
</Directory>

Install robots.txt in html.

User-agent: *
Disallow: /

The application configurations are in html/cfg. There is one file for the primary database server and one for the standby, with a link pointing to the one currently in use:

constants.cfg -> constants.cfg_server2
constants.cfg_server2
constants.cfg_server3
des_alarms.cfg -> des_alarms.cfg_server2
des_alarms.cfg_server2
des_alarms.cfg_server3
des_tv.cfg -> des_tv.cfg_server2
des_tv.cfg_server2
des_tv.cfg_server3

In the event of the database failing over to the standby on server3, you will need to run the script switch_web_tools.sh in ~codemanager/bin (as user codemanager) to switch the links to the server3 versions.
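The link switch performed by switch_web_tools.sh amounts to repointing each cfg symlink at its server3 variant. A sketch of the idea, demonstrated in a scratch directory that stands in for html/cfg (the real script may do more):

```shell
CFG=$(mktemp -d)            # stands in for /usr/remote/software/www/html/cfg
cd "$CFG"
for app in constants des_alarms des_tv; do
    touch "${app}.cfg_server2" "${app}.cfg_server3"
    ln -s "${app}.cfg_server2" "${app}.cfg"       # normal state: primary
done

# Failover: repoint every link at the server3 variant.
for app in constants des_alarms des_tv; do
    ln -sfn "${app}.cfg_server3" "${app}.cfg"
done
readlink des_tv.cfg
```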

The applications are installed in /usr/remote/software/dbwebtools. The support applications ChartDirector, jinja2 and dbiapp_pylib are in the webpylib subdirectory:

Versions

ConstantsDB v1_5b.wsgi
TV 2.0a.wsgi
AV 1.0.wsgi
dbiapi_pylib v1_4a
wsgi_py v1_3
ChartDirector 5.0.2 (64 bit version)

Install ChartDirector-5.0.2/lib into ChartDirector. It was necessary to add a Python __init__.py file containing:

from pychartdir import *

The current version of each product is pointed to by a current link. In the www/html directory there is a link to this current version:

AV -> /usr/remote/software/dbwebtools/des_alarms/current/product
ConstantsDB -> /usr/remote/software/dbwebtools/ConstantsDB/current
TV -> /usr/remote/software/dbwebtools/tv/current/product

Database Monitoring

#
# Every hour, check if database is running. If not send mail to postgres-status alias.
1 * * * * /usr/remote/user/postgres/bin/db_status.sh

alias postgres-status buckley@fnal.gov elliotta@mps.ohio-state.edu

Database Backups

The SISPI database is backed up on a weekly basis. The eups package pgTools contains the script used for the backup as well as a set of utilities that are used by the script. The backup currently runs via cron every Sunday morning at 11:02 am local CTIO time. On completion it sends email to the alias defined above.

#
# Run weekly database backups at 2 minutes past 11 am on Sunday
#
2 11 * * Sun source /usr/remote/software/products/eups/bin/setups.sh; setup pgTools; run_backup.sh
The backup has the following features:
  • We use pg_dump with the -F c option to reduce the size of the dump files. This is the PostgreSQL custom format; the dumps are binary and must be restored using pg_restore.
  • The backup directory is /raid2/database/db_backup/server2. Each backup has its own directory.
  • The script dumps out the current size of the database tables.
  • Large tables are backed up into individual dump files. The tables in this list currently are:
    photo_diodes
    focal_plane_temperature_cntrl
    image_health
    facility_utilities
    ion_pump
    ln2_process
    environmental_data
    cage_temperatures
    telescope_data
    donut_fn1
    donut_fn2
    donut_fn3
    donut_fn4
    donut_fs1
    donut_fs2
    donut_fs3 
    donut_fs4
    
  • The monsoon telemetry tables are put into another dump file, as are all the rest of the small telemetry tables.
  • The other schemas are fairly small and are each put into their own dump file (obstac, constants, alarms, exposure).
  • The desbrdev (Quick Reduce) database and the decamcrl (ECL) database each have their own dump file. It may be necessary to divide up the desbrdev into individual dump files if some of the tables are expected to be large. I am still waiting for this information from the QR team.
  • The user accounts and other administrative things are put into another dump file.
  • A copy of the configuration files is made.
  • Old dump files are cleaned up.
  • The backup directories are rsynced to /raid3/database/db_backup on server3.
  • The plan is to also copy the files to FNAL, but this is not in place yet.
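The dump-file cleanup step can be pictured as an age-based sweep of the backup directory. A hypothetical sketch using find, demonstrated in a scratch directory; the retention period actually used by run_backup.sh is not documented here, so the 28 days below is an assumption:

```shell
BACKUPS=$(mktemp -d)        # stands in for /raid2/database/db_backup/server2
touch -d '40 days ago' "$BACKUPS/old.dump"
touch "$BACKUPS/fresh.dump"

# Delete custom-format dumps older than ~4 weeks (assumed retention).
find "$BACKUPS" -name '*.dump' -mtime +28 -delete

ls "$BACKUPS"
```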