NOvA DataLogger User's Guide - version 1.23
NOvA DataLogger (NDL) Purpose and Functions
The NDL is the last module in the NOvA DAQ software chain. Its primary purpose is to write the accumulated data to disk files. It receives input from the Global Trigger and the Buffer Nodes, writes completed events to Run and Subrun output disk files, and continuously populates a shared memory segment with events for online monitoring. It resides on its own dedicated CPU node (named xxxx) in the NOvA service building. The version number in the title of this guide decodes as follows: 1 means that this is the first DataLogger ever written by the author, and the numbers after the dot correspond to the version number contained in the code.
NDL Process Overview
Activity in the NOvA DataLogger centers on the EventPool. The EventPool holds incomplete Events called DataAtoms, each consisting of a TriggerBlock, one or more DataBlocks, or both. Every DataAtom has an associated Trigger Number, which is used to build Events. When a new TriggerBlock or DataBlock enters the EventPool, its Trigger Number is compared with those of the DataAtoms already in the EventPool: a match adds the block to the existing DataAtom, while no match creates a new DataAtom.
A DataAtom becomes an Event when 1) it consists of a TriggerBlock and all DataBlocks (1 per BufferNode), or 2) a timeout is reached while it consists of a TriggerBlock and an incomplete set of DataBlocks. Case 1) is normal operation of the NOvA DataLogger, and the resulting complete Events are written to disk. Case 2) indicates a DAQ problem: a Warning message is sent to Run Control, and the incomplete Event is still written to disk. Because there may be no way to know whether the missing DataBlocks contained data, the run should be stopped and the DAQ problem fixed before continuing. If the timeout is reached for a DataAtom that has no TriggerBlock, this indicates a fatal DAQ problem and an Error message is sent to Run Control. This should never happen, since TriggerBlocks from the Global Trigger should arrive at the DataLogger before any DataBlocks with the corresponding Trigger Number arrive from the BufferNodes.
So, in normal NOvA DataLogger operation, each incoming TriggerBlock enters the EventPool and initiates a new DataAtom with a unique Trigger Number. As DataBlocks are received from the BufferNodes, they are added to the existing DataAtoms according to their Trigger Number. In normal operation, one and sometimes two DataBlocks contain data for a given Trigger Number (for the anticipated 30 microsecond trigger width); the others contain no data and consist only of a 6-word header and checksum. At present, DataBlocks from all available BufferNodes are included in a complete Event, whether or not they contain data.
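The matching rule described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the NDL code: the names `DataAtom`, `EventPoolSketch`, `addTriggerBlock`, `addDataBlock` and `pendingAtoms` are hypothetical, blocks are reduced to their Trigger Number, and timeouts are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a DataAtom; the real NDL class is not shown in this guide.
struct DataAtom {
    bool hasTriggerBlock = false;
    std::size_t dataBlockCount = 0;
};

// Hypothetical sketch of the EventPool matching logic (no timeouts, no payload).
class EventPoolSketch {
public:
    explicit EventPoolSketch(std::size_t bufferNodeCount)
        : bufferNodeCount_(bufferNodeCount) {}

    // Each add returns true when the DataAtom became a complete Event.
    bool addTriggerBlock(std::uint32_t triggerNumber) {
        // operator[] creates a new DataAtom when no Trigger Number matches.
        atoms_[triggerNumber].hasTriggerBlock = true;
        return promoteIfComplete(triggerNumber);
    }

    bool addDataBlock(std::uint32_t triggerNumber) {
        atoms_[triggerNumber].dataBlockCount += 1;
        return promoteIfComplete(triggerNumber);
    }

    std::size_t pendingAtoms() const { return atoms_.size(); }

private:
    bool promoteIfComplete(std::uint32_t triggerNumber) {
        const DataAtom& atom = atoms_.at(triggerNumber);
        const bool complete =
            atom.hasTriggerBlock && atom.dataBlockCount == bufferNodeCount_;
        if (complete) {
            // A real implementation would write the Event out at this point.
            atoms_.erase(triggerNumber);
        }
        return complete;
    }

    std::size_t bufferNodeCount_;
    std::map<std::uint32_t, DataAtom> atoms_;
};
```

With two BufferNodes, a TriggerBlock followed by two DataBlocks carrying the same Trigger Number leaves the pool empty again, which matches the statement that normally only one DataAtom is present at a time.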
Events are written to disk. At Run start, one data stream is formed for each trigger type in the Run, plus one stream for all events, so each Event appears in 2 output files. These files consist of various headers, data blocks, tails, and checksum words (the data format is described later).
Events are also written to a shared memory segment which is formed at Run setup on the DataLogger node. This memory segment is overwritten at a constant rate with new Events. An Event Viewer is available to view the data structure of these Events and an online Event Display can also be run on them.
NDL Classes and Methods
The following classes and header files reside in the NDLTest subdirectory of the DAQ:
EventPool – EventPool.cpp, EventPool.h
This class contains the bulk of the NDL processing: combining a TriggerBlock from the Global Trigger with a set of DataBlocks from the BufferNodes into an Event. In the EventPool, both DataBlocks and TriggerBlocks become DataAtoms; a DataAtom is defined as an incomplete Event whose Trigger Number differs from that of all other DataAtoms. A DataAtom becomes a complete Event when it consists of one TriggerBlock and n DataBlocks, where n is the number of BufferNodes participating in the run (fixed for each run). At some future date it may be decided to include only DataBlocks containing data in an Event, but at present exactly n DataBlocks plus one TriggerBlock is the necessary condition for a DataAtom to be considered a complete Event. Many DataAtoms can exist in the EventPool at any given time; however, at least for trigger rates of a few Hertz, normally only one DataAtom is present at a time. To avoid overfilling the EventPool, a timeout of 20 sec is set for each DataAtom when it is created. If the timeout is reached before the DataAtom becomes a complete Event, and the DataAtom has a TriggerBlock, the DataAtom becomes an incomplete Event: it is written out with a flag, and a warning message is sent to Run Control. This condition indicates a problem with the DAQ and should result in a Stop Run condition until the DAQ is fixed. An incomplete Event may or may not contain the data for its Trigger Number, so it cannot be used reliably for offline analysis. In very rare cases, a DataAtom could consist of only DataBlocks with no TriggerBlock. If the timeout is reached in this condition, no Event is formed and an error message is sent to Run Control. This should never occur, but if it does, it probably indicates a severe mismatch of timeout values between the BufferNodeEVB and the NDL.
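The possible fates of a DataAtom can be summarized as a small decision function. The sketch below is an assumption-laden illustration, not the EventPool implementation: `AtomOutcome`, `AtomState` and `classify` are hypothetical names, and only the 20 second default comes from the text above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical outcome labels for a DataAtom; not the NDL's own types.
enum class AtomOutcome {
    StillWaiting,     // timeout not reached, blocks still missing
    CompleteEvent,    // TriggerBlock plus n DataBlocks: written to disk
    IncompleteEvent,  // timeout with a TriggerBlock: written with a flag, warning to Run Control
    ErrorNoTrigger    // timeout without a TriggerBlock: no Event, error to Run Control
};

// Hypothetical snapshot of one DataAtom.
struct AtomState {
    bool hasTriggerBlock;
    std::size_t dataBlockCount;
    std::chrono::seconds age;  // time since the DataAtom was created
};

// Applies the completion and timeout rules described in the text.
inline AtomOutcome classify(const AtomState& atom, std::size_t bufferNodeCount,
                            std::chrono::seconds timeout = std::chrono::seconds(20)) {
    if (atom.hasTriggerBlock && atom.dataBlockCount == bufferNodeCount) {
        return AtomOutcome::CompleteEvent;
    }
    if (atom.age < timeout) {
        return AtomOutcome::StillWaiting;
    }
    return atom.hasTriggerBlock ? AtomOutcome::IncompleteEvent
                                : AtomOutcome::ErrorNoTrigger;
}
```

For example, with two BufferNodes, a DataAtom holding a TriggerBlock and one DataBlock is classified as `StillWaiting` at an age of 5 seconds and as `IncompleteEvent` once its age reaches 20 seconds.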
DLRCC – DLRCC.cpp, DLRCC.h
This class handles communications between the NDL and the Run Control. The section “NDL Response to Run Control” describes the messages and DataLogger responses to the various Run Control requests.
NDL Response to Run Control
The actions and activities of the NDL are initiated in response to Run Control requests. These are as follows, with messages as passed through the Message Facility:
The NDL is asked to join a Partition Number for data-taking. The NDL saves the partition number and responds with a message to Run Control indicating it has joined the Partition.
INFO / DLRCC 19-Oct-2010 13:46:29 CDT novatest01.fnal.gov (188.8.131.52) NovaDataLoggerapp (14253) DLRCC.cpp (94) DataLogger / DataLogger / MF-online EstablishPartitionRequest received with partition number 0
There is no NDL response to this message – the connection between the BufferNodes and the NDL is made and controlled by the BufferNode process.
Run Control requests the NDL to read its RunConfiguration file. This file contains information about the number of trigger types and the Trigger Masks associated with them. The NDL uses this to determine the number and file names of the output data streams for the upcoming run. The NDL responds with a message to Run Control indicating it has read the file.
INFO / DLRCC 19-Oct-2010 13:46:34 CDT novatest01.fnal.gov (184.108.40.206) NovaDataLoggerapp (14253) DLRCC.cpp (279) DataLogger / DataLogger / MF-online LoadRunConfigurationRequest received
Run Control requests the NDL to set up run configurations. Here, an initial check of the available memory and disk space is done, and messages are sent to Run Control indicating that memory and disk space are available.
INFO / DLRCC 19-Oct-2010 13:46:34 CDT novatest01.fnal.gov (220.127.116.11) NovaDataLoggerapp (14253) DLRCC.cpp (350) DataLogger / DataLogger / MF-online ConfigureRunRequest received
Run Control requests the NDL to begin processing data. The Run and Subrun numbers are set here and the output data files are created. The shared memory segment is defined and an attachment is made. Appropriate messages are sent to Run Control indicating the status of these actions.
INFO / DLRCC 19-Oct-2010 13:46:37 CDT novatest01.fnal.gov (18.104.22.168) NovaDataLoggerapp (14253) DLRCC.cpp (411) DataLogger / DataLogger / MF-online BeginRunRequest received for partition number 0
WARNING / DLRCC_cppL1 19-Oct-2010 13:46:37 CDT novatest01.fnal.gov (22.214.171.124) NovaDataLoggerapp (14253) DLRCC.cpp (415) DataLogger / DataLogger / MF-online DLRCC::Setting the DUMMY ConfigurationBlock HERE for now
INFO / DLRCC 19-Oct-2010 13:46:37 CDT novatest01.fnal.gov (126.96.36.199) NovaDataLoggerapp (14253) DLRCC.cpp (444) DataLogger / DataLogger / MF-online DLRCC::Attempting to connect to shared memory ID: 0x4d454d53
INFO / DLRCC 19-Oct-2010 13:46:37 CDT novatest01.fnal.gov (188.8.131.52) NovaDataLoggerapp (14253) DLRCC.cpp (450) DataLogger / DataLogger / MF-online Attached to shared memory at 0x2aaab4000000
Run Control requests to pause data-taking. The NDL sets its running flag to “false” and clears the EventPool of any remaining data. A response is sent to Run Control.
Run Control requests to resume data-taking after a pause. The NDL sets its running flag to “true” and resumes processing. A response is sent to Run Control.
Run Control requests that the run end.
INFO / DLRCC 19-Oct-2010 13:54:42 CDT novatest01.fnal.gov (184.108.40.206) NovaDataLoggerapp (14253) DLRCC.cpp (543) DataLogger / DataLogger / MF-online StopRunRequest received for partition number 0
Run Control requests the NDL to start a new Subrun. The existing Subrun files are closed and new ones are opened to receive data. Data is processed continuously; only the output destination changes. A response is sent to Run Control, which includes setting a variable to the number of events written to the previous Subrun, and the data size variable is reset.
Run Control requests that a new run start with a new sequence of Subruns.
Run Control requests the NDL to exit from the current Partition. The NDL goes into its idle state and waits for new instructions.
No NDL response to this message – connection is controlled by the BufferNodes.
Run Control sends a “ping” to test presence of the NDL. The NDL responds if alive and sets a variable to the current data size of the Subrun file containing all of the events.
Debugging the NOvA DataLogger
Information about the performance and operation of the NDL is provided through the TRACE utility. TRACE levels are defined by bits in a 32-bit word, and different bits correspond to different parts of the NDL software. The levels can be set and changed at any time, including during operation, either by a command or with the tracemanager utility. For the NDL, the 32 bits are divided into eight 4-bit fields, each addressing a particular class and/or its methods and objects. Usually the lowest bit of a 4-bit field is reserved for class constructors and destructors, and the higher bits give increasing levels of detail for the methods and objects. In this way, one can examine the performance of the NDL class by class and object by object if desired. The following table defines the bit fields and the corresponding classes, methods and objects:
Bits in field  Class  Methods and Objects
0-3  DLRCC  RC Messages
4-7  Low Rate Information
8-11  NDL Debug
12-15  EventPool  DataAtoms
16-19  DataBlockReader  DataBlocks
20-23  DataLoggerGTC  TriggerBlocks
24-27  Shared Memory
28-31  RunStream  Output Streams
On startup, the current default setting for the DataLogger is 0x220000ff. This corresponds to bits 0-3, 4-7, 25, and 29 set on and all others off. Bits 4 and 5 are run start up and configuration messages which occur only once per run. Bits 6 and 7 are low rate (every 10 secs by default) messages that display periodic statistics from the EventPool. Bit 25 displays the size of the Event written to shared memory, and bit 29 reports which disk files the Event is written into – both of these report once per Event. For standard NOvA data-taking, the default will be changed to 0xcf so that only the periodic statistics and critical Run Control messages are displayed.
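To check which TRACE bits a given mask enables, the mask can be decoded bit by bit or field by field. The helpers below are a minimal sketch; `setBits` and `fieldValue` are hypothetical names and are not part of the TRACE utility or the NDL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the bit positions (0-31) that are set in a TRACE level mask.
inline std::vector<int> setBits(std::uint32_t mask) {
    std::vector<int> bits;
    for (int bit = 0; bit < 32; ++bit) {
        if (mask & (std::uint32_t{1} << bit)) {
            bits.push_back(bit);
        }
    }
    return bits;
}

// Extracts one of the eight 4-bit fields: field 0 is bits 0-3, field 7 is bits 28-31.
inline std::uint32_t fieldValue(std::uint32_t mask, int field) {
    return (mask >> (4 * field)) & 0xFu;
}
```

Applied to the default 0x220000ff, `setBits` yields bits 0 through 7, 25 and 29, and fields 6 and 7 each hold the value 0x2. Applied to 0xcf, it yields bits 0 through 3, 6 and 7, so the once-per-run bits 4 and 5 and the per-Event bits 25 and 29 are off.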
The following is a snapshot of the NDL default TRACE output from start up through the end of a run:
insert snapshot of test run
Testing the DataLogger
The DataLogger can be tested on the node novatest01 using simulated data in a test of the full DAQ software chain – called integtest. In particular, a version of integtest using the commands contained in a file called SimMode1 is useful for a variety of tests involving the DataLogger.
a) Continuous running of DAQ Software chain
The most basic test is to start running with simulated data in one continuous run. This tests the setup procedures and startup of the DAQ chain including the DataLogger. The DataLogger should respond to the following RC requests and should display TRACE messages on the console: 1) Establish Partition, 2) etc.
b) DAQ running with Subruns
Normal running of NOvA will have Runs divided into Subruns which correspond to the output files of the DAQ. Upon start of a new NOvA run, the first Subrun(0) is begun. The DataLogger opens files for writing out the data. Timing tests – subrun, run times seen by DL.
c) Temporary loss of BufferNode(s)
One useful test is to temporarily disable a BufferNode to see how the DataLogger copes with the absence of data. Presently, the DataLogger requirements define a complete event as including datablocks from all available BufferNodes. In the case where a datablock is missing from the EventPool upon reaching its DataAtom timeout, the event is written out (if it contains a TriggerBlock) as an incomplete event. The following text shows the results of the DataLogger response to a temporary loss of one of 2 BufferNodes in the test.
On-Off-On BN test
Trigger #s 13, 14, 15, and 16 were written out as incomplete events, each missing one datablock, and a Warning message was sent to RC for each of them. At present, RC stops a run after 10 skipped heartbeats, so this test cannot be done if a BN is off for longer than 10 heartbeats or if 10 skipped heartbeats accumulate over multiple tests. The BN was disabled for ~8 seconds, so it formed no datablocks for 4 triggers (the triggers occur every 2 seconds in this test). Starting with Trigger # 13, the DataAtom for each of these trigger numbers enters its timeout phase, waiting for the missing datablock. When the BN was re-enabled, it sent datablocks with the current trigger number, so event #13 was built as a complete event from trigger #17. In the sequence below, the incomplete events (19, 21, 23, and 25) are written out as their timeouts are reached, with the corresponding trigger #s 13, 14, 15, and 16. There is enough time between triggers for the NDL to catch up while data is being transmitted, and by event # 26 the event #s and trigger #s are again the same. The BNs and the NDL perform correctly: the NDL catches up after the BN comes back.
Event #   Trigger #   # DataBlocks   Incomplete flag
1         1           2              0
. . .     . . .       . . .          . . .
12        12          2              0
13        17          2              0
14        18          2              0
15        19          2              0
16        20          2              0
17        21          2              0
18        22          2              0
19        13          1              1
20        23          2              0
21        14          1              1
22        24          2              0
23        15          1              1
24        25          2              0
25        16          1              1
26        26          2              0
. . .     . . .       . . .          . . .
34        34          2              0    last event in run
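The ordering in this table can be reproduced with a small timing model. The sketch below is a toy model, not the NDL: the function and type names are hypothetical, triggers are assumed to arrive exactly every 2 seconds, the DataAtom timeout is taken as the 20 seconds quoted earlier, and a timeout that expires at the same instant as a new trigger is assumed to be flushed first.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// One row of the table: an event as written to disk (hypothetical type).
struct WrittenEvent {
    int trigger;
    int dataBlocks;
    bool incomplete;
};

// Internal helper: when an event is written and how ties are ordered.
struct Scheduled {
    int time;   // seconds since run start (model time)
    int order;  // 0 = timeout flush, 1 = normal completion; assumed tie-break
    WrittenEvent event;
};

// Toy model of the On-Off-On test with 2 BufferNodes, one of which is off
// for triggers firstLost..lastLost. All timing values are assumptions.
inline std::vector<WrittenEvent> simulateOnOffOn(int lastTrigger, int firstLost,
                                                 int lastLost, int periodSec = 2,
                                                 int timeoutSec = 20) {
    std::vector<Scheduled> schedule;
    for (int trigger = 1; trigger <= lastTrigger; ++trigger) {
        const int arrival = trigger * periodSec;
        const bool lost = trigger >= firstLost && trigger <= lastLost;
        if (lost) {
            // Only 1 of 2 datablocks arrives; written when the timeout expires.
            schedule.push_back({arrival + timeoutSec, 0, {trigger, 1, true}});
        } else {
            schedule.push_back({arrival, 1, {trigger, 2, false}});
        }
    }
    std::stable_sort(schedule.begin(), schedule.end(),
                     [](const Scheduled& a, const Scheduled& b) {
                         return std::tie(a.time, a.order) < std::tie(b.time, b.order);
                     });
    std::vector<WrittenEvent> written;
    for (const Scheduled& s : schedule) {
        written.push_back(s.event);
    }
    return written;
}
```

Calling `simulateOnOffOn(34, 13, 16)` returns 34 events in which element 12 (event #13) carries trigger 17, elements 18, 20, 22 and 24 (events #19, 21, 23, 25) are the incomplete triggers 13 through 16, and element 25 (event #26) carries trigger 26 again. The agreement depends on the assumed tie-break and exact 2 second spacing, so it illustrates the mechanism rather than proving the real timing.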
Debugging the Output Data File
The data is written out by the DataLogger as a binary file. The format of the data can be found in complete detail in the document DocDB-4390 “NOvA DAQ DataFormats”. The DataLogger forms complete events and packages them with a Run Header and Tail and a Configuration Block. To view a raw data file, use the octal dump utility (od). There are various ways to display the data – the following command line input writes the data to an output text file in an easy to understand format:
od -t x4 <input path and filename> > <output filename>
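The option “-t x4” displays the data in hexadecimal format with 4 bytes per integer, so the data appears as four 32-bit words per row after the counter field. The counter field is the byte offset of the row, which od prints in octal by default, and a row containing only “*” stands for one or more rows identical to the row above it. The following is an example of a dump of the raw data: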
0000000 41764f4e 39323945 00000211 80000000    word 1 - ascii "NOvA", word 2 ascii "E929"
0000020 00000000 00000013 4cfbe331 4cfbe331    25 words in Run Header
0000040 4cfbe3a3 4cfbe3a3 00000039 00000000
0000060 00000000 00000000 00000000 00001d07
0000100 00000000 00000000 00000000 00000000
*
0000140 aad56127 464e4f43 54525453 000c2000    word 26 - ascii "CONF", word 27 ascii "STRT"
0000160 00000000 00000211 00000000 00000000    12 words in empty Configuration Block
0000200 00000000 00000000 464e4f43 44444e45
0000220 b308f63f e929e929 00000078 00000000    word 38 - first event delimiter "e929e929"
0000240 00000211 00000000 00000001 00000000    8 words in Event Header
0000260 00000002 aaaa0001 28000000 00000001    event word 9 - TriggerBlock delimiter "aaaa" in bits 16-31
0000300 00000000 00000000 00000000 7d1cda40    20 words in TriggerBlock
0000320 0006a7dc 4cfbe331 000ec529 000000ff
0000340 00000000 00000000 00000000 00000000
0000360 0000003c 80000001 00000000 00000000
0000400 00000001 dabc0000 00000006 00000002    event word 29 - first data block delimiter in bits 16-31
0000420 00000001 20000000 b9722664 dabc0000    6 words in empty datablock, next datablock has data
0000440 00000051 00060001 00000001 20000000    (>6 words)
0000460 00006101 00000008 00000000 00000000
0000500 00006101 00000008 00000140 00000000
0000520 00006101 c00000a4 00000280 00000000
0000540 200080fb 000003a0 00910000 200180fb
0000560 000003a4 00820000 200280fb 000003a8
0000600 00840000 00000000 00000000 00000000
0000620 00000000 00000000 00000000 00000000
*
0000760 00000000 00000000 00000000 00006101
0001000 c000002c 000003c0 00000000 200b809b
0001020 0000042c 00000000 00000000 00000000
0001040 00000000 00000000 00000000 00000000
0001060 00006101 c0000014 00000500 00000000
0001100 00000000 00000000 00000000 00006101
0001120 00000008 00000640 00000000 4e184287
0001140 eeee0000 00000001 00000078 929e929e    EventTail delimiter "eeee" - word 4 "929e929e" event end
0001160 00000000 e929e929 00000090 00000000    1 checksum word after event end delimiter, then next event
. . .
0071640 eeee0000 00000039 00000084 929e929e    last EventTail delimiter "eeee"
0071660 00000000 4c494154 4c494154 00000211    last event checksum word, RunTail 2 words ascii "TAIL"
0071700 80000000 00000000 00000013 4cfbe331    25 words in RunTail
0071720 4cfbe331 4cfbe3a3 4cfbe3a3 00000039
0071740 00000000 00000000 00000000 00000000
0071760 00001d07 00000000 00000000 00000000
0072000 00000000 00000000 00000000 39323945    words 24, 25 of RunTail ascii "E929", "NOvA"
0072020 41764f4e 654ce72a 00000000             RunTail checksum word and final 0
0072034
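Two small calculations help when reading such a dump: turning an od row offset into the word numbers used in the annotations, and reading a marker word as ASCII. The helpers below are a sketch with hypothetical names (`wordNumber`, `asciiTag`); they assume the default octal byte offsets of od and that the dump was made on a little-endian machine, which is what the "NOvA" annotation implies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Converts an od row offset (octal, in bytes) plus a 0-based column (0-3)
// into the 1-based 32-bit word number within the file.
inline unsigned long wordNumber(const std::string& octalOffset, unsigned column) {
    const unsigned long byteOffset = std::stoul(octalOffset, nullptr, 8);
    return byteOffset / 4 + column + 1;
}

// Reads a 32-bit word as four ASCII characters, least significant byte first.
inline std::string asciiTag(std::uint32_t word) {
    std::string tag;
    for (int shift = 0; shift < 32; shift += 8) {
        tag.push_back(static_cast<char>((word >> shift) & 0xFFu));
    }
    return tag;
}
```

For instance, `wordNumber("0000140", 1)` gives 26 and `asciiTag(0x464e4f43)` gives "CONF", in agreement with the annotation on that row; `wordNumber("0000220", 1)` gives 38, the position of the first event delimiter.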