How to run the production DAQ system

Standard running

For standard data taking, the following steps should be used:
  • log into the dsfr1 node as user daq
    • ssh -X -l daq -p 2021 ds50daq.lngs.infn.it
  • connect to the existing tmux session
    • tmux attach -t current
  • the first window runs the primary program for the DAQ. In principle, it can be left alone. However, if it needs to be restarted, type ctrl-c and then type
    • startCommPhase2System.sh
  • the second window is used for the commands that control the state of data taking (for a full list of the options available to the manageCommPhase2System.sh script, see the manageCommPhase2System.sh reference section further down this page)
    • manageCommPhase2System.sh <optional parameters> init
    • manageCommPhase2System.sh <optional parameters> start
    • manageCommPhase2System.sh <optional parameters> stop
    • manageCommPhase2System.sh <optional parameters> reinit
    • manageCommPhase2System.sh <optional parameters> fast-reinit

PLEASE NOTE that when we are writing data to disk, it is very important to stop the run as gracefully as possible to ensure that the disk file(s) are closed correctly. (If the data files are not closed correctly, it will be impossible to read them offline.) So, please always try to send the "stop" command to the DAQ, and please be patient while the manageCommPhase2System.sh script sends it to the individual DAQ processes.

Different options to the manageCommPhase2System.sh script apply to different commands. For example, the gate width and other "configuration" options go with the "init" command, and the run type and run comment options go with the "start" command. There are also options to the "stop" command that let the user specify an automated way to end the run (based on number of events, run duration, or size of the disk file).
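As an illustration of the automated end-of-run options, here is a minimal sketch that builds a "stop" command from one end condition. The -n, -d, and -s options come from the help text of manageCommPhase2System.sh; the function name build_stop_cmd, its mode keywords, and the value 50000 are hypothetical and only serve the example. The function prints the command rather than running it, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: build a "stop" command with one automated end-of-run condition.
# build_stop_cmd and its mode keywords are hypothetical helpers, not part of
# the DAQ software. The -n/-d/-s options are from the script's help text.
build_stop_cmd() {
    local mode="$1" value="$2" flag
    case "$mode" in
        events)  flag="-n" ;;   # desired number of events in the run
        minutes) flag="-d" ;;   # desired length of the run, in minutes
        size_mb) flag="-s" ;;   # desired file size, in MB
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "manageCommPhase2System.sh $flag $value stop"
}

# Print the command for review (50000 is an arbitrary placeholder value).
build_stop_cmd events 50000
```

The printed line can then be typed in the second tmux window.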

The four trigger-related options are:
  • -l, which specifies the laser trigger rate
  • -p, which specifies the internal pulser trigger rate
  • -r, which enables the random trigger
  • -t, which specifies the minimum number of phototubes in the majority logic trigger
Here are sample commands for a typical laser run:
  • vme_sysreset; manageCommPhase2System.sh -l<laserRate> -g<gateWidth> reinit
    • this command has disk-writing and online monitoring ON, by default.
    • if you prefer to run without disk-writing, add the "-D" option to the "reinit" command listed above
    • if you prefer to run without the online monitoring, add "-m off" to the "reinit" command
  • manageCommPhase2System.sh -T laser -C <run comment> start
  • manageCommPhase2System.sh stop
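The laser-run sequence above could be wrapped in a few small shell functions, sketched here. The function names and the DRY_RUN variable are hypothetical. By default the sketch only prints each command (dry-run mode), since actually running them resets the VME system and reinitializes the DAQ. The commands themselves are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch only: wrappers for the laser-run sequence. Names and the DRY_RUN
# variable are hypothetical. DRY_RUN=1 (the default) prints each command
# instead of executing it; set DRY_RUN=0 on dsfr1 to really run them.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        local line
        line="$(printf '%q ' "$@")"   # shell-quoted, safe to copy and paste
        printf '%s\n' "${line% }"
    else
        "$@"
    fi
}

laser_prepare() {   # args: <laserRate> <gateWidth in usec>
    run_step vme_sysreset &&
    run_step manageCommPhase2System.sh -l "$1" -g "$2" reinit
}

laser_start() {     # arg: <run comment>
    run_step manageCommPhase2System.sh -T laser -C "$1" start
}

laser_stop() {      # graceful stop, so that the data files are closed correctly
    run_step manageCommPhase2System.sh stop
}

# Example (placeholder values):
#   laser_prepare 500 400
#   laser_start "laser calibration"
#   laser_stop
```

Keeping "stop" as its own step reflects the note above about ending runs gracefully: it is sent only once data taking is finished.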

Online monitoring configuration

The configuration of the online monitoring is now contained in a special file. This file is currently located at
  • /home/daq/current/onmon_config.rb
It contains Ruby variables that specify
  • the prescale that should be used for events that are sent to the online monitoring
  • which online monitoring modules should be run
  • the configuration of each of the online monitoring modules

After you change and save this file, you will need to send a "reinit" command to the system for your changes to take effect.
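Before editing the configuration, it may be convenient to keep a copy of the previous version. The sketch below is one way to do that. The function name backup_config and the backup naming scheme are hypothetical; only the file path and the need for a "reinit" come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: save a timestamped copy of the online monitoring configuration
# before editing it. backup_config and the ".bak" naming are hypothetical.
backup_config() {
    local cfg="${1:-/home/daq/current/onmon_config.rb}"
    local backup
    backup="${cfg}.$(date +%Y%m%d-%H%M%S).bak"
    cp -p -- "$cfg" "$backup" && printf '%s\n' "$backup"
}

# Typical use on dsfr1:
#   backup_config                                   # prints the backup file name
#   "${EDITOR:-vi}" /home/daq/current/onmon_config.rb
#   manageCommPhase2System.sh <optional parameters> reinit
```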

XPRA

To start the XPRA server on the dsag machine:
  • log into dsag as user daq
    • ssh -X -l daq -p 2100 ds50daq.lngs.infn.it
  • xpra start --no-pulseaudio --enable-sharing :50
To run an XPRA client (PLEASE NOTE that these instructions may change as we find and document better ways to run the xpra client):
  • log into dsag as user daq
    • ssh -X -l daq -p 2100 ds50daq.lngs.infn.it
  • xpra attach ssh:dsag:50 --enable-sharing --encoding=jpeg
  • from other nodes (not dsag), the following command can be used:
    • xpra attach --ssh="ssh -p 2100 -l daq" ssh:ds50daq.lngs.infn.it:50 --encoding=jpeg --enable-sharing
If you have problems connecting from dsag (for example, if it prompts you for a password), please try the following:
  • xpra attach --ssh="ssh -i $HOME/.ssh/id_rsa_empty -p 2100 -l daq" ssh:ds50daq.lngs.infn.it:50 --encoding=jpeg --enable-sharing

Data files

The data files are written to the /data disk on the dsag machine. They are first written to the /data/incoming area and then are moved to /data/complete.
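To check how many files are still being written and how many have been moved, something like the following sketch could be used on dsag. The function name count_files is hypothetical, and the sketch assumes that the data files sit directly in the two directories rather than in subdirectories.

```shell
#!/usr/bin/env bash
# Sketch only: count the regular files directly inside a directory.
# count_files is a hypothetical helper; it assumes a flat directory layout.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Typical use on dsag:
#   printf 'incoming: %s  complete: %s\n' \
#       "$(count_files /data/incoming)" "$(count_files /data/complete)"
```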

Log files

The log files are in the directory /daqlogs, which is shared and available from all of the nodes on the DAQ cluster. There are subdirectories for each type of application:

  • boardreader
  • eventbuilder
  • aggregator

and there are directories for the main program that runs the DAQ ("pmt") and for the commands that are sent to the DAQ ("masterControl").
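A small helper for finding the most recently modified log in one of these directories might look like the sketch below. The function name latest_log is hypothetical, and the sketch makes no assumption about how the log files are named: it simply picks the newest entry by modification time.

```shell
#!/usr/bin/env bash
# Sketch only: print the path of the most recently modified entry in a
# log subdirectory. latest_log is a hypothetical helper.
# Args: <subdirectory> [log root, default /daqlogs]
latest_log() {
    local dir="${2:-/daqlogs}/$1"
    local newest
    newest="$(ls -t -- "$dir" 2>/dev/null | head -n 1)"
    [ -n "$newest" ] && printf '%s/%s\n' "$dir" "$newest"
}

# Typical use:
#   tail -f "$(latest_log eventbuilder)"
#   less "$(latest_log masterControl)"
```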

TMUX tips

To switch between windows:
  • 'ctrl-a, <number>'
  • (there are other ways to do this if your keyboard supports it)
To disconnect from a TMUX session:
  • 'ctrl-a, d'
If you accidentally exit one of the windows in the TMUX session, you can use the following steps to recover:
  • type 'ctrl-a, c' (that is, type 'ctrl-a', release those keys, and then type the letter 'c' [for create window])
  • if you want to specify a special name for the window, you can type 'ctrl-a' then a comma (','). This will display the current name of the window, which you can then change.
  • to set up the environment in the new window, type 'setup_ds50daq'
To start a new TMUX session:
  • simply type 'tmux' after logging in to the daq account on dsfr1
  • to change the name of the session, use 'ctrl-a, $' and overwrite the existing name
    • you may need to do this if you are creating the primary session for data taking that usually has the name "current"
To scroll up in a TMUX window (in case the console history isn't working naturally for you):
  • type 'ctrl-a, ['
  • then use 'ctrl-b' and 'ctrl-f' to scroll backward and forward
  • hit Enter to leave the special scroll mode
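The session handling described above can also be done from the command line with standard tmux subcommands (has-session, attach, new-session). The sketch below attaches to a session if it exists and otherwise creates it under that name, which avoids renaming it afterwards. The function name attach_or_create is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: attach to a named tmux session, creating it if it is missing.
# attach_or_create is a hypothetical helper; the tmux subcommands are standard.
attach_or_create() {
    local name="${1:-current}"
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach -t "$name"
    else
        # New session: remember to run 'setup_ds50daq' in each new window.
        tmux new-session -s "$name"
    fi
}

# Typical use after logging in to the daq account on dsfr1:
#   attach_or_create current
```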

Here is a reference page on TMUX: http://www.openbsd.org/cgi-bin/man.cgi?query=tmux&sektion=1

manageCommPhase2System.sh reference

For reference, here is the help text associated with the manageCommPhase2System.sh command:

[dsfr1:1004:0]~/current/profile$ manageCommPhase2System.sh --help

Usage: manageCommPhase2System.sh [options] <command> [command options]
Where command is one of:
  init, start, pause, resume, stop,
  shutdown, start-system, restart, reinit, exit,
  fast-shutdown, fast-restart, fast-reinit, or fast-exit
Options:
  -h, --help: prints this usage message
  -p <pulser rate>: specifies the rate of the pulser trigger
  -l <laser rate>: specifies the rate of the laser trigger (default is zero, i.e. laser is off)
  -r : enables the random trigger
  -g <gate width>: specifies the acquisition gate width in usec [default=400]
  -t <low threshold>: specifies the low threshold for the majority
      logic trigger [default=5]
  -c <compression level>: specifies the ADC data compression level
      0 = no compression
      1 = compression, both raw and compressed data kept
      2 = compression, only compressed data kept [default]
  -m <on|off>: specifies whether to run online monitoring [default=on]
  -o <data dir>: specifies the directory for data files [default=/data/test]
  -C <comment>: specify the run comment
  -T <type>: specify the run type
  -D : disables the writing of data to disk
  -n <event count>: specifies the desired number of events in the run (stop command)
  -d <run duration>: specifies the desired length of the run (minutes, stop command)
  -s <file size>: specifies the desired file size (in MB, stop command)
Notes:
  The primary commands are the following:
   * init - initializes (configures) the DAQ processes
   * start - starts a run
   * pause - pauses the run
   * resume - resumes the run
   * stop - stops the run
  Additional commands include:
   * shutdown - stops the run (if one is going), resets the DAQ processes
       to their ground state (if needed), and stops the MPI program (DAQ processes)
   * start-system - starts the MPI program (the DAQ processes)
   * restart - this is the same as a shutdown followed by a start-system
   * reinit - this is the same as a shutdown followed by a start-system and an init
   * exit - this resets the DAQ processes to their ground state, stops the MPI
       program, and exits PMT.
  Expert-level commands:
   * fast-shutdown - stops the MPI program (all DAQ processes) no matter what
       state they are in. This could have bad consequences if a run is going!
   * fast-restart - this is the same as a fast-shutdown followed by a start-system
   * fast-reinit - this is the same as a fast-shutdown followed by a start-system
       and an init
   * fast-exit - this stops the MPI program, and exits PMT.
Examples: manageCommPhase2System.sh -p 32768 init

Hints for logging into various computers

alias dsag='ssh -X -p 2100 ds50daq.lngs.infn.it -l daq'
alias dseb1='ssh -X -p 2001 ds50daq.lngs.infn.it -l daq'
alias dseb2='ssh -X -p 2002 ds50daq.lngs.infn.it -l daq'
alias dseb3='ssh -X -p 2003 ds50daq.lngs.infn.it -l daq'
alias dseb4='ssh -X -p 2004 ds50daq.lngs.infn.it -l daq'
alias dseb5='ssh -X -p 2005 ds50daq.lngs.infn.it -l daq'
alias dsfr1='ssh -X -p 2021 ds50daq.lngs.infn.it -l daq'
alias dsfr2='ssh -X -p 2022 ds50daq.lngs.infn.it -l daq'
alias dsfr3='ssh -X -p 2023 ds50daq.lngs.infn.it -l daq'
alias dsfr4='ssh -X -p 2024 ds50daq.lngs.infn.it -l daq'
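As an alternative to maintaining ten aliases, the port numbers can be derived from the node name, since the aliases above follow a regular pattern (2100 for dsag, 200N for dsebN, 202N for dsfrN). The following is a sketch under that assumption. The function names daq_port and daq_login are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: map a DAQ node name to its forwarded ssh port, following the
# pattern of the aliases above. daq_port and daq_login are hypothetical names.
daq_port() {
    case "$1" in
        dsag)      echo 2100 ;;
        dseb[1-5]) echo "200${1#dseb}" ;;   # dseb1..dseb5 -> 2001..2005
        dsfr[1-4]) echo "202${1#dsfr}" ;;   # dsfr1..dsfr4 -> 2021..2024
        *) echo "unknown node: $1" >&2; return 1 ;;
    esac
}

daq_login() {
    local port
    port="$(daq_port "$1")" || return 1
    ssh -X -p "$port" -l daq ds50daq.lngs.infn.it
}

# Typical use:
#   daq_login dsfr1
```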

Typical rates

In tests on 16-Sep-2013, the following maximum rates were observed (these numbers have relatively large error bars):
  • 10 usec gate width - 375 events/sec
  • 100 usec gate width - 180 events/sec
  • 300 usec gate width - 64 events/sec
  • 500 usec gate width - 43 events/sec

These results were achieved with a system of six boardreaders (five 1720s and one 1495), 16 eventbuilders, and two aggregators. Disk writing and online monitoring were turned off in these tests.
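These maximum rates give a rough lower bound on how long a run of a given size will take. The sketch below does the arithmetic (events divided by rate, rounded up) for the four measured gate widths. The function name min_run_seconds and the event count of 100000 are hypothetical. Since the rates were measured with disk writing and online monitoring off, and have large error bars, real runs should be expected to take longer.

```shell
#!/usr/bin/env bash
# Sketch only: minimum run length in seconds for a given number of events at
# a given maximum event rate (integer division, rounded up).
# min_run_seconds is a hypothetical helper.
min_run_seconds() {
    local events="$1" rate="$2"
    echo $(( (events + rate - 1) / rate ))
}

# Gate width (usec) and measured maximum rate (events/sec) from the list above.
for pair in 10:375 100:180 300:64 500:43; do
    gate="${pair%%:*}"
    rate="${pair##*:}"
    printf '%4s usec gate: at least %s s for 100000 events\n' \
        "$gate" "$(min_run_seconds 100000 "$rate")"
done
```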

Please note that the rate shown under Trigger Rate in the online monitoring is a prescaled fraction of the full event rate. The full event rate itself is reported periodically by the disk-writing aggregator in the first TMUX window.