DCS setup diary

2016/06/09

Install the postgres server needed by the CSS AlarmServer and ArchiveEngine

- Use the postgres (not postgresql) UPS package, which includes the postgres server. (Note: the postgresql package on scisoft only has clients.) Copied from MicroBooNE.
- Install any dependencies (openssl, openldap)
- UPS package copied to /mu2e directory 2016/06/09 by Glenn

Use epics and epics_css UPS packages already made (by me a while ago)

- Copied to /mu2e directory 2016/06/09 by Glenn
- Source the script ~mu2edcs/bin/setup-EPICS.sh to set up epics and CSS

Set up postgres database

- Run under the mu2edcs account for now
- Run initdb in /home/mu2edcs/pgsql/data as follows:
-- initdb -D /home/mu2edcs/pgsql/data --auth-local=trust --auth-host=md5
- Start server: pg_ctl -D /home/mu2edcs/pgsql/data -l logfile start
- Verify it works using psql: psql -U mu2edcs postgres
- Set a password for the mu2edcs user using the SQL command
-- alter user mu2edcs encrypted password 'XXXXXXX';
- Edit pg_hba.conf and change “trust” to “md5” for authentication method.
- Restart server: pg_ctl -D /home/mu2edcs/pgsql/data -l logfile restart
- Try psql again and verify password required.
- All set!
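The pg_hba.conf edit can also be scripted. A minimal Python sketch, assuming the standard pg_hba.conf column layout (local lines: type, database, user, method; host lines: type, database, user, address, optional netmask, method). The helper name is hypothetical, and the result should still be reviewed by eye before restarting the server with pg_ctl.

```python
import re

# Authentication method keywords that may appear in pg_hba.conf.
AUTH_METHODS = {
    "trust", "reject", "md5", "password", "scram-sha-256", "gss", "sspi",
    "ident", "peer", "pam", "ldap", "radius", "cert",
}


def switch_trust_to_md5(conf_text: str) -> str:
    """Hypothetical helper: change every 'trust' auth method to 'md5'.

    Comments and column alignment are preserved. Only the method column is
    touched, so a database or user that happens to be named 'trust' is left alone.
    """
    out_lines = []
    for line in conf_text.splitlines(keepends=True):
        code, hash_sign, comment = line.partition("#")
        # Split but keep the whitespace runs so alignment survives the edit.
        parts = re.split(r"(\s+)", code)
        field_idx = [i for i, p in enumerate(parts) if p and not p.isspace()]
        if field_idx:
            conn_type = parts[field_idx[0]]
            # The method follows type/database/user (local) or
            # type/database/user/address (host*); a netmask may sit before it.
            first = 3 if conn_type == "local" else 4
            for i in field_idx[first:]:
                if parts[i] in AUTH_METHODS:
                    if parts[i] == "trust":
                        parts[i] = "md5"
                    break
        out_lines.append("".join(parts) + hash_sign + comment)
    return "".join(out_lines)
```

Typical use would be to read /home/mu2edcs/pgsql/data/pg_hba.conf, pass its text through switch_trust_to_md5, write it back, and then restart the server as above.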

Set up a running EPICS system for mu2e: our data providers (IOCs and data scrapers)

- Get the stuff from git: git clone ssh:///cvs/projects/mu2e-dcs
- Copy pyepics-3.2.4-py2.6.egg from http://cars9.uchicago.edu/software/python/pyepics3/ to ~/.local/lib/python2.6/site-packages/ so python-based data-scrapers will work
- [... update code to de-MicroBooNE it ... future set-uppers won’t have to do that ...]
- Run "make" in make_db

2016/06/10

[... continue de-uBooNE-ing the dcs code reused from uB…]

-- IFBeamDataReader -- done
-- GangliaReader -- in progress
-- Rackmonbox -- next

2016/06/14

GangliaReader

- Got EPICS db files set up for DAQStatus
-- For now, only acquiring Data Logger and Online Monitor values for Run Number, Average Event Size, Data Rate, Event Rate, Average Input Wait Time, art Queue Wait Time
-- There are 40 event builders with various data rates -- do all need to be imported into EPICS for alarm purposes? We can do it if needed, not done now.
- Also got CompStatus (computer status) db set up
- Need to get the CompStatus data provider running from cron jobs
-- IPMI access issue on mu2edaq01
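For reference, the kind of record that goes into a DAQStatus db file can be generated from a list of metric names. A minimal Python sketch; the PV names, descriptions and units are hypothetical (the real ones come from the make_db inputs), and each value is assumed to be pushed in by an external data scraper with caput, hence SCAN "Passive".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Metric:
    name: str   # hypothetical metric name used in the PV name
    desc: str   # human-readable description for the DESC field
    egu: str    # engineering units for the EGU field
    prec: int = 2


def ai_record(prefix: str, m: Metric) -> str:
    """Return one EPICS database 'ai' record as text.

    SCAN is Passive because an external scraper writes the value.
    DESC holds at most 40 characters, so the description is truncated.
    """
    return (
        f'record(ai, "{prefix}{m.name}") {{\n'
        f'    field(DESC, "{m.desc[:40]}")\n'
        f'    field(EGU,  "{m.egu}")\n'
        f'    field(PREC, "{m.prec}")\n'
        f'    field(SCAN, "Passive")\n'
        f'}}\n'
    )


def make_db_text(prefix: str, metrics: list[Metric]) -> str:
    """Concatenate one record per metric, separated by blank lines."""
    return "\n".join(ai_record(prefix, m) for m in metrics)


if __name__ == "__main__":
    # Hypothetical prefix and metrics, for illustration only.
    metrics = [
        Metric("DataLogger/EventRate", "Data Logger event rate", "Hz"),
        Metric("DataLogger/DataRate", "Data Logger data rate", "MB/s"),
        Metric("OnlineMonitor/AvgEventSize", "Online Monitor average event size", "kB"),
    ]
    print(make_db_text("Mu2e_DAQStatus_", metrics))
```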

2016/06/16

Crontab

- Worked on crontab setup a bit

GangliaReader, cont

- Still needed to set up ganglia_reader_config.csv
- Ganglia isn’t multicasting -- can monitor work anyway?
- Set up ganglia_reader_config.csv, beamdata_config.csv
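On the multicast question: gmond also answers plain TCP connections (port 8649 by default) with an XML dump of every host and metric it currently knows about, so a reader can poll it without joining the multicast group. Which hosts a given gmond knows about still depends on its send/receive channel configuration. A minimal Python sketch of reading and parsing that dump into a {host: {metric: value}} dict; the host and port defaults are assumptions.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host: str = "localhost", port: int = 8649, timeout: float = 5.0) -> bytes:
    """Connect to gmond's TCP port (assumed default 8649) and read until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_ganglia_xml(xml_bytes: bytes) -> dict:
    """Return {host_name: {metric_name: value_string}} from a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
    return result
```

On a host running gmond, parse_ganglia_xml(read_gmond_xml()) gives the same load_one, disk_free, etc. values that the GangliaReader needs. Hosts that report no standard metrics show up with an empty dict.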

Finish setting up some run scripts, crontab

- The scripts run many things in "screen"
-- especially the IOCs, since they really work best with console input,
-- but also the java-based apps, since they sometimes dump things to stdout without timestamps, and "screen" can fix that
- Had to restart gmond and gmetad -- they were being slow and not responding to new metrics, not sure why, but restart seemed to fix
- Gmetad not running on any host except daq01, so no load_one, disk_free, or other statistics
- Now running all things necessary for IFBeamDBReader (BeamData, Weather) and GangliaReader (CompStatus, DAQStatus) and running cron job to keep them running and keep pushing data in.
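The keep-running cron job boils down to: list the screen sessions, then start any expected session that is missing. A minimal Python sketch, assuming each service runs in a screen session with a known name; the session names and commands are hypothetical (the real ones live in the run scripts). It relies on `screen -ls` listing sessions as `<pid>.<name>` and on `screen -dmS <name> <cmd>` starting a detached, named session.

```python
import re
import subprocess

# Hypothetical session name -> command map; the real run scripts define these.
SERVICES = {
    "ioc_weather": ["./run_ioc_weather.sh"],
    "archive_engine": ["./run_archiver.sh"],
}


def running_sessions(screen_ls_output: str) -> set:
    """Extract session names from `screen -ls` lines like '\\t12345.name\\t(Detached)'."""
    return set(re.findall(r"^\s*\d+\.(\S+)\s", screen_ls_output, flags=re.MULTILINE))


def missing_sessions(screen_ls_output: str, wanted) -> list:
    """Names in `wanted` that have no running screen session, sorted."""
    have = running_sessions(screen_ls_output)
    return sorted(name for name in wanted if name not in have)


def restart_missing() -> None:
    # The exit status of `screen -ls` is not a reliable success signal, so no check=True.
    ls = subprocess.run(["screen", "-ls"], capture_output=True, text=True).stdout
    for name in missing_sessions(ls, SERVICES):
        # -d -m: start detached; -S: name the session.
        subprocess.run(["screen", "-dmS", name, *SERVICES[name]], check=True)
```

A crontab entry would then just run a small script that calls restart_missing() every few minutes.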

2016/06/20

Checking “rackmon box” BeagleBones -- do they still work?

- Reminder: the Beagles are on the 192.168.1.* subnet, on the “eth1:1” interface of daq01 (192.168.1.1), with addresses 192.168.1.204-207.
- IOC runs OK; can read a metric with EPICS_CA_ADDR_LIST=192.168.1.204, but not with 192.168.1.255
-- My mistake, need to use 192.168.1.0 in EPICS_CA_ADDR_LIST for broadcast.
- Strange broadcast settings for subnets, also strange mask for eth1:1:

eth1      Link encap:Ethernet  HWaddr 00:25:90:88:0E:E5  
          inet addr:192.168.157.1  Bcast:192.168.157.127  Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                  
          RX packets:82000395 errors:0 dropped:0 overruns:0 frame:0           
          TX packets:17051151 errors:0 dropped:0 overruns:0 carrier:0         
          collisions:0 txqueuelen:1000                                        
          RX bytes:111002453916 (103.3 GiB)  TX bytes:5096015428 (4.7 GiB)    

eth1:0    Link encap:Ethernet  HWaddr 00:25:90:88:0E:E5  
          inet addr:192.168.157.200  Bcast:192.168.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                  

eth1:1    Link encap:Ethernet  HWaddr 00:25:90:88:0E:E5  
          inet addr:192.168.1.1  Bcast:192.168.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1            

- Problems:
-- The eth1:1 interface should have bcast 192.168.1.255 and mask 255.255.255.0
-- The eth1:0 interface should have bcast 192.168.157.255 and mask 255.255.255.0.
- I fixed those using ksu privilege: “ifconfig eth1:1 netmask 255.255.255.0; ifconfig eth1:1 broadcast 192.168.1.255”, and likewise for eth1:0.
-- Not sure that was necessary, but it was weird as it was.
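The intended settings follow directly from each address and its prefix length. A small Python check using the standard ipaddress module; the /24 prefixes are the ones stated in the Problems list, and the helper names are illustrative. It also shows what the old masks implied: with mask 255.255.0.0, 192.168.1.1 sits in 192.168.0.0/16, whose broadcast address is indeed 192.168.255.255 (and likewise 192.168.157.200 with 255.255.128.0 sits in 192.168.128.0/17).

```python
import ipaddress


def expected_settings(addr_with_prefix: str) -> tuple:
    """Return (netmask, broadcast) for an interface address like '192.168.1.1/24'."""
    iface = ipaddress.ip_interface(addr_with_prefix)
    return str(iface.network.netmask), str(iface.network.broadcast_address)


def same_subnet(iface_addr: str, peer: str) -> bool:
    """True if peer falls inside the interface's network."""
    return ipaddress.ip_address(peer) in ipaddress.ip_interface(iface_addr).network


if __name__ == "__main__":
    # Intended settings for eth1:1 and eth1:0.
    print(expected_settings("192.168.1.1/24"))
    print(expected_settings("192.168.157.200/24"))
    # What the original masks implied.
    print(ipaddress.ip_interface("192.168.1.1/255.255.0.0").network)
    print(ipaddress.ip_interface("192.168.157.200/255.255.128.0").network)
    # A BeagleBone address is on the same /24 as eth1:1.
    print(same_subnet("192.168.1.1/24", "192.168.1.204"))
```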

Now turn to setting up archiver

- I discover https://panda-wiki.gsi.de/foswiki/bin/view/DCS/PANDACSS and think it looks interesting, get distracted
-- it's just a different bundle of CSS 3.1.1
-- apparently abandoned in 2013? Promised update to 3.2 on March 28, 2013, never came.
- (… then I waste hours trying to see if I can trivially download some sources and type a few commands to build from source… nope…)
- Get PostgreSQL schema for archiver from https://github.com/ControlSystemStudio/cs-studio/blob/master/applications/archive/archive-plugins/org.csstudio.archive.rdb/dbd/postgres_schema.txt
-- Better URL for direct download: https://raw.githubusercontent.com/ControlSystemStudio/cs-studio/master/applications/archive/archive-plugins/org.csstudio.archive.rdb/dbd/postgres_schema.txt
- Then EDIT to change “timestamp” to “timestamp with time zone”
- Further edits to name the engine and table users as I like
- Set up demo channels, works fine
- Then set up archiving of all channels set up so far (except current time and the like) semi-manually using add_channels.py
- Archiver running on Beam, Computer, DAQ, and Weather status!
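The timestamp edit is mechanical enough to script, and the same edit applies to the alarm server schema. A minimal Python sketch, assuming the columns to change are declared with the bare keyword TIMESTAMP (optionally with a precision). It leaves CURRENT_TIMESTAMP and columns already declared WITH or WITHOUT TIME ZONE alone, but it will also rewrite the word inside SQL comments, so review the diff before loading the schema with psql.

```python
import re

# Bare TIMESTAMP (optionally TIMESTAMP(n)) not already followed by
# WITH/WITHOUT TIME ZONE. The (?!\s*\() guard stops the regex from
# backtracking out of a precision suffix and matching just the keyword.
_BARE_TIMESTAMP = re.compile(
    r"\b(timestamp\b(?:\s*\(\s*\d+\s*\))?)"
    r"(?!\s*\()"
    r"(?!\s+with(?:out)?\s+time\s+zone)",
    re.IGNORECASE,
)


def add_time_zone(sql: str) -> str:
    """Rewrite bare TIMESTAMP types as TIMESTAMP WITH TIME ZONE, matching the keyword's case."""
    def repl(m: re.Match) -> str:
        word = m.group(1)
        suffix = " WITH TIME ZONE" if word[:9].isupper() else " with time zone"
        return word + suffix
    return _BARE_TIMESTAMP.sub(repl, sql)
```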

2016/06/23

Set up alarm server

- Get PostgreSQL schema for the alarm server from https://github.com/ControlSystemStudio/cs-studio/blob/master/applications/alarm/alarm-plugins/org.csstudio.alarm.beast/dbd/ALARM_POSTGRES.sql
- Then EDIT to change TIMESTAMP to TIMESTAMP WITH TIME ZONE
- also change user names, table names as I like
- run that in psql -- tables now set up
- now run the AlarmConfigTool in just the right way to set things up
-- in mu2e-dcs/apps/alarmserver_setup directory, use alarm-modify.sh with ../../make_db/alarm.xml made by running "make" in make_db earlier.
-- Note: I didn't use alarm-modify.sh exactly as-is for the new install; it will work as-is for later modifications. A slight change to its options is needed if you want to import alarm.xml in a way that creates all database tables, possibly throwing away what was there before.
- Updated the run_alarmserver script.
- Installed Apache ActiveMQ in a local directory, made symlink from setup_alarmserver
- Run -- seems to be working!
- Set the CSS GUI configuration to use the right JMS and RDB.
-- Alarm panels show up correctly in CSS now!
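For orientation, alarm.xml is a tree of components holding PVs. A minimal Python sketch that builds such a tree with the standard library; the element names (config, component, pv, description, enabled) follow the BEAST alarm configuration format as far as I know it, but the authoritative layout is whatever make_db emits, and the root and component names here are hypothetical. Setting enabled to false is one way to handle hosts that are offline.

```python
import xml.etree.ElementTree as ET


def build_alarm_config(root_name: str, groups: dict) -> ET.Element:
    """Build a BEAST-style alarm tree (element names assumed, see lead-in).

    groups maps a component name to a list of (pv_name, description, enabled) tuples.
    """
    config = ET.Element("config", name=root_name)
    for comp_name, pvs in groups.items():
        comp = ET.SubElement(config, "component", name=comp_name)
        for pv_name, desc, enabled in pvs:
            pv = ET.SubElement(comp, "pv", name=pv_name)
            ET.SubElement(pv, "description").text = desc
            ET.SubElement(pv, "enabled").text = "true" if enabled else "false"
    return config


if __name__ == "__main__":
    # Hypothetical root/component names; PV names taken from the alarm table.
    cfg = build_alarm_config("Mu2e", {
        "Weather": [
            ("Mu2e_Weather_2/windspeed_mph", "wind speed from G:WINDSP", True),
            ("Mu2e_Weather_2/solarintensity", "solar intensity from G:SOLRAD", True),
        ],
    })
    print(ET.tostring(cfg, encoding="unicode"))
```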

Now address some alarms

- See the following alarms in Weather system (copied from Alarm Table):

wind speed from G:WINDSP Mu2e_Weather_2/windspeed_mph
PV                      : Mu2e_Weather_2/windspeed_mph
Alarm Time              : 2016/06/23 10:28:03 (Time since event: 00:18:09)
Alarm Severity/Message  : INVALID/UDF_ALARM
Alarm Value             : 0
Current Severity/Message: INVALID/UDF_ALARM

solar intensity from G:SOLRAD Mu2e_Weather_2/solarintensity
PV                      : Mu2e_Weather_2/solarintensity
Alarm Time              : 2016/06/23 10:28:953 (Time since event: 00:18:09)
Alarm Severity/Message  : INVALID/UDF_ALARM
Alarm Value             : 0.0
Current Severity/Message: INVALID/UDF_ALARM

- Seems to be a problem with how I was reading those metrics from IFBeamDB
-- There was a misspelling in ifbdreader_config.csv; fixed.
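A misspelled device name fails quietly, showing up only as a UDF alarm, so it may be worth checking the config against the names one expects before starting the scraper. A minimal Python sketch using difflib to suggest near matches; the CSV layout (IFBeam device name in the first column) and the list of known devices are assumptions for illustration, not the actual ifbdreader_config.csv format.

```python
import csv
import difflib
import io

# Hypothetical list of IFBeam device names the scraper is expected to read.
KNOWN_DEVICES = ["G:WINDSP", "G:SOLRAD"]


def check_config(csv_text: str, known=KNOWN_DEVICES) -> list:
    """Return (row_number, device, suggestions) for each device not in `known`.

    Assumes the first column of each row is the IFBeam device name;
    blank rows and rows starting with '#' are skipped.
    """
    problems = []
    for rownum, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row or row[0].startswith("#"):
            continue
        device = row[0].strip()
        if device not in known:
            suggestions = difflib.get_close_matches(device, known, n=3, cutoff=0.6)
            problems.append((rownum, device, suggestions))
    return problems
```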

2016/06/24

Switch to running (most) things on mu2edaq08

A port-scanner alarm was triggered by the ActiveMQ web page; the archive engine web page probably would have triggered one eventually too. Switching to a machine with no address on the public net should eliminate that.

- Could also reconfigure servers to use only certain interfaces, and/or configure iptables, but it was thought this would be quicker.
- Actually took many hours on a Friday night. :(
- subnet issues of various kinds
-- UDP broadcasts not working on the 192.168.122.* net (virbr0); switched to 192.168.157.* (ctrl net).
-- mu2edaq08 can't reach external world! route table says mu2edaq01-ctrl should be gateway, but can't reach ifbeam db server, for example.
--- Running ifbeamdb data scraper on daq01 still for now. (No web page, pure pull from outside, caput to net)
- switched postgresql server to daq08 too -- not sure this was a good idea, but I did it anyway.
-- had to change history .plt files for saved history plots in CSS
- mu2edaq01 is broadcasting its metrics as host "mu2edaq01.fnal.gov" instead of "mu2edaq01-ctrl.fnal.gov" --- all others use "*-ctrl.fnal.gov"
- Real pain: some of our Java programs didn't like that /etc/hosts on mu2edaq08 had no entry matching what "hostname" was set to. The hostname was "mu2edaq08.fnal.gov", but DNS has no such entry, nor does /etc/hosts.
-- fixed by changing hostname to "mu2edaq08" instead.
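A quick way to catch this hostname problem before the Java programs do is to check that the name returned by hostname actually resolves; this is roughly what Java's InetAddress.getLocalHost() needs in order to succeed. A minimal Python sketch using the standard socket module; the helper name is illustrative.

```python
import socket


def name_resolves(name: str) -> bool:
    """True if `name` resolves to at least one address via /etc/hosts or DNS."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except OSError:
        # socket.gaierror (a subclass of OSError) signals a failed lookup.
        return False


if __name__ == "__main__":
    me = socket.gethostname()
    if name_resolves(me):
        print(f"{me} resolves: OK")
    else:
        print(f"{me} does not resolve; add it to /etc/hosts or change the hostname")
```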

2016/06/25

Finished work on mu2edaq08 switch

- and committed.

Investigate variables in alarm

- Adjusted fan speed ranges
- Some hosts are offline or gone, need to disable alarms for those (have not done yet)
- daq04, 05, and 07 don't seem to be running the standard Ganglia metrics (load_one, disk_free, etc.)
-- investigate later
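For reference when adjusting ranges like the fan speeds: an EPICS ai record's limit alarms come from its HIHI/HIGH/LOW/LOLO fields and the matching HHSV/HSV/LSV/LLSV severities, checked in the order HIHI, LOLO, HIGH, LOW, with a limit skipped when its severity is NO_ALARM. A simplified Python model of that check, ignoring hysteresis (HYST); the RPM limits in the usage are made up, not the values actually set.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    hihi: float
    high: float
    low: float
    lolo: float
    hhsv: str = "MAJOR"   # severity for the HIHI limit
    hsv: str = "MINOR"    # severity for the HIGH limit
    lsv: str = "MINOR"    # severity for the LOW limit
    llsv: str = "MAJOR"   # severity for the LOLO limit


def check_limits(val: float, lim: Limits) -> tuple:
    """Return (status, severity), checking HIHI, LOLO, HIGH, LOW in that order.

    A limit whose severity is NO_ALARM is not checked. Hysteresis is ignored.
    """
    checks = [
        ("HIHI", lim.hhsv, val >= lim.hihi),
        ("LOLO", lim.llsv, val <= lim.lolo),
        ("HIGH", lim.hsv, val >= lim.high),
        ("LOW", lim.lsv, val <= lim.low),
    ]
    for status, sevr, tripped in checks:
        if sevr != "NO_ALARM" and tripped:
            return status, sevr
    return "NO_ALARM", "NO_ALARM"
```

For example, with hypothetical fan limits Limits(hihi=9000, high=8000, low=2000, lolo=1000), a reading of 500 RPM gives ("LOLO", "MAJOR") and 5000 RPM gives no alarm.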

2016/06/26

Improved displays