Some high-level notes on DAQ Interface

Available tools

It appears that information can be sent to RC unprompted, in JSON (JavaScript Object Notation) format, via 0MQ. JSON represents hierarchical data, somewhat like FHiCL and XML.

Configuration format

It seems appropriate to exclude artdaq-specific info (e.g., number of event builders, parameters controlling the online monitoring modules, etc.) from the configuration info in the database. In other words, separate the physics-relevant info (trigger table, etc.) from the details of how that physics is collected.

JCF, 4/10/14 According to Erik, the labeling of configurations is taken from Nova and is high-level. It's not yet 100% clear what the information DAQ Interface will use to configure itself and artdaq will look like, but it should be relatively easy to implement a "stub" to work with in the meantime (e.g., an ASCII file with config info).
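As a minimal sketch of such a stub, assume a hypothetical ASCII format with one "key: value" pair per line and "#" comments. Neither the format nor the key names below come from Run Control or artdaq; they are placeholders for illustration only.

```python
def read_stub_config(text):
    """Parse 'key: value' lines into a dict, skipping blanks and '#' comments."""
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if ":" not in line:
            raise ValueError(f"line {lineno}: expected 'key: value', got {raw!r}")
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


# Hypothetical file contents; the keys are placeholders, not real settings.
EXAMPLE = """\
# physics-relevant settings only
config_label: demo_config
trigger_table: placeholder_table
"""

if __name__ == "__main__":
    print(read_stub_config(EXAMPLE))
```

Keeping the stub this simple means it can be swapped for the real configuration source later without touching the code that consumes the resulting dictionary.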

Principles:

Don't let RC know there's a problem if it can be easily fixed

--Retries?

Transparency

If a problem can't be fixed, send RC a JSON object with as much info as possible (e.g., the time at which DAQ Interface realized the problem occurred, which specific board reader / event builder caused the error if that is the case, what the error string was, etc.) [ JCF, 4/10/14: Thinking on this has changed; see below ]

Be poll-able at all times, and send as much info as is reasonably possible to send when polled

For example, if RC queries DAQ Interface because it seems to be taking an unusually long time to respond to a "stop" command while running, the response might look like:

state: "running" 
transition_req: "stop" 
transition_complete: "no" 
any_errors: "yes" 
artdaq_state: "running" 
artdaq_transition_req: "stop" 

processes: {
  boardreader1: {last_req: "stop", status: "OK", response: "a longer string", time_of_response: "time_here"},
  boardreader2: {last_req: "stop", status: "error", response: "error message", time_of_response: "time_here"},
  boardreader3: {last_req: "stop", status: "OK", response: "a longer string", time_of_response: "time_here"},
  ...
}
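The response sketched above could be built and serialized as valid JSON roughly as follows. This is an illustration only: the function name is hypothetical, the per-process entries are held in a dictionary keyed by process name, and the "yes"/"no" strings are replaced by JSON booleans, which is a choice made here rather than anything agreed with RC.

```python
import json


def build_status_report(state, transition_req, artdaq_state,
                        artdaq_transition_req, processes,
                        transition_complete=False):
    """Assemble the status message as a plain dict ready for json.dumps."""
    # Derive the error flag from the per-process entries instead of
    # tracking it separately.
    any_errors = any(p["status"] == "error" for p in processes.values())
    return {
        "state": state,
        "transition_req": transition_req,
        "transition_complete": transition_complete,
        "any_errors": any_errors,
        "artdaq_state": artdaq_state,
        "artdaq_transition_req": artdaq_transition_req,
        "processes": processes,
    }


if __name__ == "__main__":
    processes = {
        "boardreader1": {"last_req": "stop", "status": "OK",
                         "response": "a longer string",
                         "time_of_response": "time_here"},
        "boardreader2": {"last_req": "stop", "status": "error",
                         "response": "error message",
                         "time_of_response": "time_here"},
    }
    report = build_status_report("running", "stop", "running", "stop", processes)
    print(json.dumps(report, indent=2))
```

The serialized string is what would travel over 0MQ; the transport itself is left out of the sketch.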

In fact, this doesn't even need to be a response to a query: as soon as DAQ Interface receives an error from one of the processes, it could send this message up to RC via 0MQ unprompted. That way RC doesn't need to worry about a scenario where an error occurs and DAQ Interface knows about it, but several seconds (or worse) go by without corrective action being applied. The parameters given above should be pretty self-explanatory, except perhaps for the distinction between "state" and "artdaq_state". If, e.g., RC has only a "start" and a "stop" state, then at the artdaq level "start" would translate into an "initialize" followed by a "start". If something goes wrong during the RC-level "start", it's worth knowing whether the error occurred because a boardreader process couldn't initialize its upstream hardware, or because it could initialize but then hit a problem during its attempt to go into "running" mode.
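The state / artdaq_state distinction can be illustrated with a small sketch in which one RC-level command expands into an ordered list of artdaq-level commands, and the result records which artdaq-level step failed. The "start" expansion follows the example in the text; the "stop" entry, the function name, the result keys, and the send_to_artdaq callable are assumptions made for illustration.

```python
# RC-level command -> ordered artdaq-level commands.
# "start" follows the example in the notes; "stop" is assumed.
RC_TO_ARTDAQ = {
    "start": ["initialize", "start"],
    "stop": ["stop"],
}


def run_rc_transition(rc_command, send_to_artdaq):
    """Run each artdaq-level step in order, stopping at the first failure.

    send_to_artdaq is a caller-supplied callable that takes the artdaq
    command name and returns True on success.
    """
    completed = []
    for step in RC_TO_ARTDAQ[rc_command]:
        if not send_to_artdaq(step):
            return {
                "transition_req": rc_command,
                "artdaq_transition_req": step,  # the step that failed
                "completed_steps": completed,
                "transition_complete": False,
            }
        completed.append(step)
    return {
        "transition_req": rc_command,
        "artdaq_transition_req": completed[-1],  # last step carried out
        "completed_steps": completed,
        "transition_complete": True,
    }


if __name__ == "__main__":
    # Pretend hardware initialization works but entering "running" fails.
    result = run_rc_transition("start", lambda step: step != "start")
    print(result)
```

With a report like this, RC (or an expert reading the logs) can tell an initialization failure apart from a failure to enter "running" mode.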

JCF, 4/10/14 After today's discussion with Kurt and Erik, the plan for communication between Run Control and DAQ Interface is as follows:

  • When RC sends DAQ Interface a state transition request via XML-RPC, DAQ Interface immediately responds via XML-RPC letting RC know that it has received the request
  • Upon receiving the request, DAQ Interface will send JSON structures via 0MQ providing simple monitoring update info to RC (e.g., "N of M processes running")
  • Upon entering a new state (either the desired one or the "Error" state), DAQ Interface will send RC a JSON structure describing the new state of artdaq
  • Extensive information will be logged (though not sent to RC) so that experts can troubleshoot if an Error state is entered and DAQ Interface can't itself recover artdaq. It may be worthwhile to specify a "verbosity" level so that more log info than normal can be requested when troubleshooting the DAQ.
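The sequence in the list above can be sketched as two steps: an immediate acknowledgement, followed by progress messages and a final state message. In this sketch the XML-RPC and 0MQ transports are replaced by a return value and a caller-supplied publish callable, and all function names and message fields are hypothetical.

```python
import json


def acknowledge(command):
    """Immediate reply to RC's request (would travel back over XML-RPC)."""
    return {"received": command}


def carry_out(command, target_state, process_actions, publish):
    """Run the transition, publishing progress and then the final state.

    process_actions maps a process name to a callable returning True on
    success; publish takes one JSON string (a stand-in for a 0MQ send).
    """
    total = len(process_actions)
    done = 0
    failed = []
    for name, action in process_actions.items():
        if action():
            done += 1
        else:
            failed.append(name)
        # Simple monitoring update, in the spirit of "N of M processes running".
        publish(json.dumps({
            "type": "progress",
            "command": command,
            "summary": f"{done} of {total} processes done",
        }))
    final_state = target_state if not failed else "Error"
    publish(json.dumps({"type": "state", "state": final_state, "failed": failed}))
    return final_state


if __name__ == "__main__":
    sent = []
    print(acknowledge("start"))
    actions = {"boardreader1": lambda: True, "eventbuilder1": lambda: True}
    print(carry_out("start", "running", actions, sent.append))
    for message in sent:
        print(message)
```

In a real implementation carry_out would run in a background thread, so that the acknowledgement reaches RC before the work begins.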

Program layout

Always have a thread capable of responding when polled. The response should include the state we're in (stopping, stopped, error, etc.), and perhaps some detail about that state as well (if we're stopping, which processes have already successfully stopped?)
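One way to keep DAQ Interface poll-able during a transition is to have the worker thread record its progress in a small lock-protected object that the polling thread can read at any time. The sketch below assumes this design; the class and method names are hypothetical.

```python
import threading


class StatusBoard:
    """Thread-safe record of the current state, readable at any time."""

    def __init__(self, process_names):
        self._lock = threading.Lock()
        self._state = "stopped"
        self._done = {name: False for name in process_names}

    def begin(self, transitional_state):
        """Enter a transitional state such as "stopping"."""
        with self._lock:
            self._state = transitional_state
            for name in self._done:
                self._done[name] = False

    def mark_done(self, name):
        """Record that one process has finished the current transition."""
        with self._lock:
            self._done[name] = True

    def finish(self, new_state):
        with self._lock:
            self._state = new_state

    def poll(self):
        """Return a snapshot: the state plus the processes already done."""
        with self._lock:
            finished = sorted(n for n, done in self._done.items() if done)
            return {"state": self._state, "processes_done": finished}


if __name__ == "__main__":
    board = StatusBoard(["boardreader1", "boardreader2"])

    def stop_everything():
        board.begin("stopping")
        for name in ("boardreader1", "boardreader2"):
            board.mark_done(name)  # a real worker would talk to the process here
        board.finish("stopped")

    worker = threading.Thread(target=stop_everything)
    worker.start()
    print(board.poll())  # may be observed before, during, or after the stop
    worker.join()
    print(board.poll())
```

Because poll only holds the lock long enough to copy a few values, a query from RC never has to wait for a slow process to respond.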

Command queue issues: should a user be allowed to hit "init" and then immediately hit "start", with "start" executed after "init" completes? Or should we not use a queue, and instead require that the system not be in a transition state when a command is issued? [ John, 5/6/14: Reflecting on it, I would choose the latter over the former ]

What happens if an RC operator gets impatient and hits the same transition twice in a row? Perhaps a "busy" error. [ John, 5/6/14: if you're not allowed to request another transition while DAQ Interface is transitioning, then the question reduces to: what happens if you request a transition into the state you're already in? Probably nothing, as opposed to sending an "invalid transition" error (see next sentence) ] Also a good idea, brought up a couple of months ago: an "invalid transition" error. Both the "busy" and the "invalid transition" errors should require no further troubleshooting from the RC side.
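The no-queue option, together with the "busy" and "invalid transition" responses, might look something like the sketch below. The set of states and commands here is an assumed model, not the actual RC or artdaq state machine, and the class name and return strings are hypothetical.

```python
import threading

# Assumed model: command -> (required current state, resulting state).
TRANSITIONS = {
    "init": ("stopped", "ready"),
    "start": ("ready", "running"),
    "stop": ("running", "ready"),
}


class TransitionGate:
    """Accepts at most one transition at a time; there is no command queue."""

    def __init__(self, state="stopped"):
        self._lock = threading.Lock()
        self.state = state
        self.in_transition = False
        self._target = None

    def request(self, command):
        with self._lock:
            if self.in_transition:
                return "busy"
            if command not in TRANSITIONS:
                return "invalid transition"
            source, target = TRANSITIONS[command]
            if self.state == target:
                return "no-op"  # already in the requested state: do nothing
            if self.state != source:
                return "invalid transition"
            self.in_transition = True
            self._target = target
            return "accepted"

    def complete(self):
        """Called by the worker once the transition has finished."""
        with self._lock:
            if self.in_transition:
                self.state = self._target
                self.in_transition = False


if __name__ == "__main__":
    gate = TransitionGate()
    print(gate.request("init"))   # accepted
    print(gate.request("start"))  # busy, since "init" has not completed
    gate.complete()
    print(gate.request("start"))  # accepted
```

Neither "busy" nor "invalid transition" changes the state, which is what lets RC treat both as harmless.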

Implementation language (JCF, 4/10/14)

Competitors are Ruby, C++ and Python. Pros and cons are as follows:

Ruby

--Can cannibalize DemoControl.rb from artdaq-demo
--Scripting language good for gluing together the FHiCL documents needed to control the artdaq processes

Python

--Run Control is written in Python and provides a base class (component.py) from which DAQ Interface can inherit (Erik will write a reasonably complex inheriting class which will contain much of the functionality DAQ Interface will need)
--Like Ruby, it's a scripting language good for gluing together the FHiCL documents needed to control the artdaq processes
--More popular than Ruby among physicists

C++

--Not a good choice for a glue language (slower development time than Ruby or Python)
--Most popular language among physicists
--Great power and flexibility