Version 28 (John Freeman, 07/14/2014 12:13 PM) → Version 29/223 (John Freeman, 07/14/2014 06:03 PM)

h1. Running DAQ Interface

DAQ Interface is designed to be run, along with the rest of the run control code, on lbnedaqtest01.fnal.gov. To obtain an account on this system, contact John Freeman, jcfree@fnal.gov. Once you have an account, you can do the following:

* *Check out the run control software*:

Create a new directory, cd into it, and execute <pre>git clone ssh://p-lbnerc@cdcvs.fnal.gov/cvs/projects/lbnerc</pre>

* *Make sure you're on the feature/DAQInterface branch*:
cd into lbnerc/ and execute
<pre>git checkout feature/DAQInterface</pre>

* *Set up the environment*:
From the lbnerc/ directory, execute <pre>source source_me</pre> This does two things: it creates a local build of /usr/local/lbne-artdaq-base/lbne-artdaq, and it sets up the Python virtual environment needed by the LBNE RC code in a directory called "env" in the parent directory of lbnerc (in other words, "env" and "lbnerc" are at the same level of the directory hierarchy). If this is the first time you've built lbne-artdaq and/or set up the Python virtual environment, each of these steps will take roughly two minutes. A few error/warning messages will be displayed at various points during the setup; at the end, however, you should see <pre>Environment ready; consider running the unit tests via command nosetests</pre>
n.b. As of 7/8/14, running <code>nosetests</code> results in 4 of the 65 tests failing; if more than 4 fail, there may be a problem that will affect the running of DAQInterface. The most likely cause is that an lbnecontrol and/or daqinterface process is already running (see the next two steps).

* *Start LBNE run control*: <pre> lbnecontrol & </pre> Note that this won't work if lbnecontrol is already running; to check, run "<code>ps -A | grep lbnecontrol</code>".

* *Start DAQ Interface*: <pre> daqinterface -n daqint -r 5570 -c localhost -H localhost & </pre> Like lbnecontrol, this won't work if daqinterface is already running.
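Because both programs fail to start if a copy is already running, it can be handy to script the <code>ps -A | grep</code> check. Below is a minimal Python 3 sketch that is not part of lbnerc; the helper names <code>running_commands</code> and <code>is_running</code> are hypothetical. It parses the CMD column of <code>ps -A</code> and looks for an exact name match, so, unlike grep, it won't match longer names that merely contain the search string.

```python
import subprocess
from typing import Optional


def running_commands(ps_output: str) -> list[str]:
    """Return the CMD column of `ps -A` output, skipping the header line."""
    names = []
    for line in ps_output.splitlines()[1:]:
        # Columns are PID, TTY, TIME, CMD; split at most 3 times so CMD stays whole
        fields = line.split(None, 3)
        if len(fields) == 4:
            names.append(fields[3].strip())
    return names


def is_running(name: str, ps_output: Optional[str] = None) -> bool:
    """True if a process whose command name is exactly `name` is listed by `ps -A`."""
    if ps_output is None:
        ps_output = subprocess.run(
            ["ps", "-A"], capture_output=True, text=True, check=True
        ).stdout
    return name in running_commands(ps_output)
```

For example, <code>is_running("daqinterface")</code> returns True if a daqinterface process is already listed, in which case it should be killed before starting a new one.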

* *Take DAQ Interface through the standard transitions*:
Launch the artdaq processes, which run in a new shell/terminal window that opens, and initialize them with FHiCL documents by executing the following:
<pre>
lbnecmd init daq
</pre>
Start the toy fragment generator, which produces simulated CAEN board data, and plot the data using an Art module:
<pre>
lbnecmd start daq
</pre>
Pause it, ending the subrun but not the run:
<pre>
lbnecmd pause daq
</pre>
Resume DAQ running:
<pre>
lbnecmd resume daq
</pre>
Halt the running of the DAQ:
<pre>
lbnecmd stop daq
</pre>
Kill all the artdaq processes:
<pre>
lbnecmd terminate daq
</pre>
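The sequence of transitions above can be thought of as a small state machine. The following Python sketch is a hypothetical model for illustration only: apart from "stopped" (the state DAQInterface returns to after "recover"), the state names are assumptions, as is the choice to allow "stop" from a paused run; none of this is taken from DAQInterface's own code.

```python
class TransitionError(Exception):
    """Raised when a transition is requested from a state that doesn't allow it."""


# transition -> (states it may be requested from, resulting state)
# Only "stopped" comes from this page; the other state names are assumed.
TRANSITIONS = {
    "init": ({"stopped"}, "ready"),
    "start": ({"ready"}, "running"),
    "pause": ({"running"}, "paused"),
    "resume": ({"paused"}, "running"),
    "stop": ({"running", "paused"}, "ready"),
    "terminate": ({"ready"}, "stopped"),
}


def apply_transition(state: str, transition: str) -> str:
    """Return the state reached by requesting `transition` while in `state`."""
    if transition == "recover":
        # "recover" kills any remaining artdaq processes and always ends in "stopped"
        return "stopped"
    if transition not in TRANSITIONS:
        raise TransitionError(f"unknown transition {transition!r}")
    allowed_from, target = TRANSITIONS[transition]
    if state not in allowed_from:
        raise TransitionError(f"cannot {transition} while {state}")
    return target


def run_sequence(transitions, state: str = "stopped") -> str:
    """Apply each transition in order, starting from `state`."""
    for transition in transitions:
        state = apply_transition(state, transition)
    return state
```

Running the sequence on this page (init, start, pause, resume, stop, terminate) from "stopped" ends back in "stopped", ready for another "init".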

* *A Closer Look*

As mentioned briefly above, the "init" transition both creates the artdaq processes and initializes them via a set of FHiCL documents. But which documents? They are listed in the file "docs/fcl_devel.txt"; the name of the file DAQInterface reads to find the FHiCL documents can be changed via the DAQInterface member string "fcl_list_filename". "docs/fcl_devel.txt" will look something like the following:

<pre>
# This list of fcl files, when read in by DAQInterface, will be used
# to initialize the artdaq processes

/data/fcl/daqinterface/BoardReader_TOY1_lbnedaqtest01.fnal.gov_5205.fcl
/data/fcl/daqinterface/BoardReader_TOY2_lbnedaqtest01.fnal.gov_5206.fcl
/data/fcl/daqinterface/EventBuilder_lbnedaqtest01.fnal.gov_5235.fcl
/data/fcl/daqinterface/EventBuilder_lbnedaqtest01.fnal.gov_5236.fcl
/data/fcl/daqinterface/Aggregator_lbnedaqtest01.fnal.gov_5265.fcl
/data/fcl/daqinterface/Aggregator_lbnedaqtest01.fnal.gov_5266.fcl
</pre>

As of 7/14/14, the FHiCL files are all in a public area; however, you can edit this file to point to your own private FHiCL files. If you do, keep the following in mind:
# Each FHiCL document's filename should contain one of the tokens "BoardReader", "EventBuilder" or "Aggregator"; this is how DAQInterface determines whether the document is designed to initialize a BoardReaderMain, EventBuilderMain or AggregatorMain process
# Order matters: BoardReaderMain documents should be listed before EventBuilderMain documents, which in turn should be listed before AggregatorMain documents
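Both rules can be checked before an "init" is issued. The Python sketch below is hypothetical and is not DAQInterface's own parser. It assumes, as the example file suggests, that blank lines and lines beginning with # are ignored, and it is slightly stricter than the rule above in that it rejects a filename containing more than one of the tokens.

```python
import os

# Tokens DAQInterface looks for, in the order the documents must be listed
PROCESS_ORDER = ["BoardReader", "EventBuilder", "Aggregator"]


def read_fcl_list(text: str) -> list[str]:
    """Return FHiCL paths from an fcl list file, skipping blank lines and # comments."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def process_type(path: str) -> str:
    """Classify a FHiCL document by the token in its filename (not its directory)."""
    name = os.path.basename(path)
    matches = [token for token in PROCESS_ORDER if token in name]
    if len(matches) != 1:
        raise ValueError(f"{name}: expected exactly one of {PROCESS_ORDER}, found {matches}")
    return matches[0]


def check_fcl_list(paths: list[str]) -> None:
    """Raise ValueError unless BoardReader documents precede EventBuilder ones,
    which in turn precede Aggregator ones."""
    last_rank = -1
    for path in paths:
        rank = PROCESS_ORDER.index(process_type(path))
        if rank < last_rank:
            raise ValueError(f"{path} is listed after a document of a later type")
        last_rank = rank
```

For example, passing the contents of docs/fcl_devel.txt through <code>read_fcl_list</code> and then <code>check_fcl_list</code> raises ValueError if an edited list is out of order or contains an unclassifiable filename.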

* *Error handling*
As of this writing (7/14/14), there has not yet been extensive user feedback concerning DAQInterface; nevertheless, certain potential problems have been anticipated and are handled within DAQInterface. These problems include:
# An artdaq process returns an error state after a transition request, or an exception is thrown by the XML-RPC library during the request
# During periodic checks, one or more artdaq processes expected to exist are not found

In either case, an error is reported via 0MQ to run control, and the "Recover" transition is automatically triggered. This transition is a fairly blunt instrument: it will kill any remaining artdaq processes and return DAQInterface to its original state of "stopped" (i.e., one in which it requires the "init" transition before anything else is done).
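The second case can be pictured as a periodic comparison between the processes DAQInterface expects and those it actually finds. In this hypothetical Python sketch, each process is identified by a unique label (e.g. its type plus host and port), and <code>find_processes</code>, <code>report_error</code> (standing in for the 0MQ error report) and <code>recover</code> are placeholder callables, not DAQInterface functions.

```python
from typing import Callable, Iterable


def missing_processes(expected: Iterable[str], found: Iterable[str]) -> list[str]:
    """Return the expected process labels that were not found, sorted."""
    return sorted(set(expected) - set(found))


def periodic_check(
    expected: Iterable[str],
    find_processes: Callable[[], Iterable[str]],
    report_error: Callable[[str], None],
    recover: Callable[[], None],
) -> bool:
    """Run one check; if any expected process is missing, report it and recover.

    Returns True if all expected processes were found.
    """
    missing = missing_processes(expected, find_processes())
    if missing:
        report_error("artdaq processes not found: " + ", ".join(missing))
        recover()
        return False
    return True
```

Reporting before recovering mirrors the order described above: run control learns what went wrong even though "recover" then discards the remaining processes.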

To see this for yourself, you can deliberately sabotage one of the transitions. For example, during the "init" transition, the FHiCL documents listed in docs/fcl_devel.txt (located in /data/fcl/daqinterface) are used to initialize the artdaq processes after those processes have been started. If you replace one of these filenames with one of your own files, intentionally written as improper FHiCL, a "recover" transition will be triggered automatically when an "init" transition is requested. The same thing will happen if you use the name of a file that doesn't exist. After the "recover" transition, you can use the "lbnecmd check" command to confirm that DAQ Interface has returned to its original state. Another experiment: after the "init" transition, once the artdaq process terminal pops up, close it; this terminates the artdaq processes, which also triggers a call to "recover". Note that currently, if "recover" finds that the artdaq processes don't exist, it will raise an exception, which in turn causes DAQInterface to exit.


Please note that if you issue an "init" transition and follow it directly with a "terminate" transition, you'll see an exception in the artdaq terminal window like the snippet below. This is expected: the statistics collection that occurs during termination fails if no data has been collected.

<pre>
Tue Jul 08 14:16:33 -0500 2014: Time Summary:
Tue Jul 08 14:16:33 -0500 2014: Min: 0
Tue Jul 08 14:16:33 -0500 2014: Max: 0
Tue Jul 08 14:16:33 -0500 2014: Avg: inf
Tue Jul 08 14:16:33 -0500 2014: %MSG-s ArtException: Aggregator-lbnedaqtest01-5265 JobSetup
Tue Jul 08 14:16:33 -0500 2014: cet::exception caught in art
Tue Jul 08 14:16:33 -0500 2014: ---- DataCorruption BEGIN
Tue Jul 08 14:16:33 -0500 2014: NetMonInputDetail: Could not receive message!
Tue Jul 08 14:16:33 -0500 2014: ---- DataCorruption END
Tue Jul 08 14:16:33 -0500 2014: %MSG
Tue Jul 08 14:16:33 -0500 2014: %MSG-s ArtException: Aggregator-lbnedaqtest01-5266 JobSetup
Tue Jul 08 14:16:33 -0500 2014: cet::exception caught in art
Tue Jul 08 14:16:33 -0500 2014: ---- DataCorruption BEGIN
Tue Jul 08 14:16:33 -0500 2014: NetMonInputDetail: Could not receive message!
Tue Jul 08 14:16:33 -0500 2014: ---- DataCorruption END
</pre>

* *Troubleshooting*
** *Your change to daqinterface.py doesn't seem to do anything*
Make sure you kill the existing daqinterface process and restart it.
** *On the initial transition ("lbnecmd init daq"), you see "error: [Errno 111] Connection refused"*
If a "Recover" is triggered and "lbnecmd check" shows that DAQInterface is in the "stopped" state, try initializing again. If that doesn't work, try increasing the value of the pauseBeforeCommands member variable in the DAQInterface class; this is the pause, in seconds, between when the artdaq processes are launched and when they're sent the FHiCL documents that initialize them. Empirically, a pause of at least 4 seconds appears to be needed on lbnedaqtest01.fnal.gov before the FHiCL documents can be successfully sent to the processes via XML-RPC; increasing this value may make the "Connection refused" error less likely. Remember to kill and then restart DAQInterface if you've altered its source code.
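The idea behind pauseBeforeCommands can be combined with a few retries. This Python sketch is an assumption, not DAQInterface's code: <code>send</code> is a hypothetical callable wrapping the XML-RPC request, and in Python 3 a refused connection (errno 111 on Linux) surfaces as <code>ConnectionRefusedError</code>. The <code>sleep</code> parameter exists only so the waits can be replaced when testing.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retry(
    send: Callable[[], T],
    pause_before_commands: float = 4.0,
    attempts: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Pause after launch, then call `send`, retrying while the connection is refused."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Give freshly launched artdaq processes time to start listening
    sleep(pause_before_commands)
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except ConnectionRefusedError:
            # ECONNREFUSED (errno 111 on Linux): nothing is listening yet
            if attempt == attempts:
                raise
            sleep(retry_delay)
    raise AssertionError("unreachable")
```

Retrying only on a refused connection keeps genuine errors, such as a process rejecting a malformed FHiCL document, from being silently retried.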