Bug #7246

Passing --file-events on init results in extra TOY2 Event in subrun 1

Added by Eric Flumerfelt about 6 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Category: Known Issues
Target version:
Start date: 10/30/2014
Due date:
% Done: 0%
Estimated time:
Co-Assignees:
Duration:

Description

I ran artdaq-demo using the manage2x2x2System.sh script with the following commands:
manage2x2x2System.sh -v -m on --event-size 1048576 --file-events 100 init
manage2x2x2System.sh -N 104 start
(--event-size is an option I added to specify the size, in bytes, of each event from each BoardReader.)

When I examined the artdaqdemo_r000104_sr01_<TIME>.root files, I found that one of them had an extra TOY2 fragment at event ID 101. In sr02, event 101 had a TOY1 fragment but no TOY2 fragment, and the file had an extra TOY2 fragment at the end (presumably from the same issue).

History

#1 Updated by Kurt Biery about 6 years ago

John has asked whether this issue may affect all artdaq-based systems, not just artdaq-demo. It certainly can, and that may mean we should change the Project for this Issue to artdaq.

As Eric has correctly determined, this issue can affect any artdaq-based system that has multiple BoardReaders, uses one of the configurable options to have the Aggregator periodically close files, and does not use a hardware trigger to initiate the readout of the different parts of the system. (I have also seen this symptom in the DS-50 DAQ, which does have a hardware trigger, but it is much rarer there, and it seems to be a problem in the readout of the hardware rather than in artdaq itself.)

When it is time to close a file, the Aggregator sends Pause messages to the BoardReaders, but there is currently no synchronization to ensure that all of the BoardReaders have sent the same number of Fragments. This creates a race condition in which some BoardReaders receive the Pause command after Fragment N and others receive it after Fragment N+1.
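To make the race concrete, here is a toy model (not artdaq code; the function name and the reader names are hypothetical). It assumes Fragments are numbered by sequence ID starting at 1, one per event per BoardReader, and reports which sequence IDs were sent by some but not all readers before the Pause arrived. The 100/101 split simply mirrors the numbers in the original report.

```python
from typing import Dict, List


def partial_sequence_ids(sent_before_pause: Dict[str, int]) -> List[int]:
    """Return the sequence IDs sent by some, but not all, BoardReaders.

    sent_before_pause maps a (hypothetical) BoardReader name to the number
    of Fragments it sent, numbered 1..N, before the Pause command reached it.
    """
    fewest = min(sent_before_pause.values())
    most = max(sent_before_pause.values())
    # IDs 1..fewest were sent by every reader and form complete events;
    # IDs fewest+1..most were sent only by the readers paused later.
    return list(range(fewest + 1, most + 1))


# TOY1 receives the Pause after Fragment 100, TOY2 after Fragment 101.
print(partial_sequence_ids({"TOY1": 100, "TOY2": 101}))  # [101]
```

With this split, event 101 is built from a TOY2 Fragment alone, which matches the symptom in the description.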

In the current system, when the Pause command is sent to the EventBuilders, partial events are drained from the EventStore and sent downstream without any error flags, so the partial event(s) are dutifully included in the data stream (and file).
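The drain step can be sketched as follows. This is a simplified stand-in, not artdaq's EventStore API: the store is assumed to map each sequence ID to the set of BoardReaders whose Fragments have arrived. The sketch computes a completeness flag for every drained event, which is exactly the information the comment says is currently not attached to partial events.

```python
from typing import Dict, List, Set, Tuple


def drain_event_store(store: Dict[int, Set[str]],
                      expected: Set[str]) -> List[Tuple[int, bool]]:
    """Empty a toy event store at Pause time, in sequence-ID order.

    store maps a sequence ID to the set of BoardReader names whose
    Fragments have arrived for that event. Each drained event is returned
    together with a flag saying whether all expected Fragments were present.
    """
    drained = [(seq_id, store[seq_id] == expected) for seq_id in sorted(store)]
    store.clear()
    return drained


store = {100: {"TOY1", "TOY2"}, 101: {"TOY2"}}
print(drain_event_store(store, {"TOY1", "TOY2"}))  # [(100, True), (101, False)]
print(store)  # {}
```

Without a flag like the boolean above, downstream consumers have no way to tell event 101 apart from a complete event.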

This Issue is somewhat related to #5974 (Improve the robustness of the automatic pause & resume mechanism).

I'm not sure there is an obvious solution to this. Maybe we could (mis?)use the timestamp parameter that is now part of the Pause command to tell the BoardReaders to pause after a specified number of events. We would only want to enable this feature in certain installations of the system, and the Aggregator would need reliable information on the number of Fragments processed by each BoardReader.
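A minimal sketch of the "pause after a specified number of events" idea, assuming the Aggregator already has a Fragment count from each BoardReader. The function names are hypothetical and nothing here reflects the actual Pause command interface.

```python
from typing import Dict


def pause_after(counts: Dict[str, int]) -> int:
    """Pick the sequence ID after which every BoardReader should pause.

    counts maps each BoardReader to the number of Fragments it has sent so
    far, as reported to the Aggregator. The largest count is chosen because
    a reader cannot un-send a Fragment; slower readers keep running until
    they catch up.
    """
    return max(counts.values())


def should_pause(fragments_sent: int, target: int) -> bool:
    """BoardReader-side check, made after each Fragment is sent."""
    return fragments_sent >= target


counts = {"TOY1": 100, "TOY2": 101}
target = pause_after(counts)  # 101
# TOY2 is already at the target; TOY1 must send one more Fragment first.
print({r: should_pause(n, target) for r, n in counts.items()})
# {'TOY1': False, 'TOY2': True}
```

The sketch glosses over the hard part noted in the comment: readers keep sending while their counts are collected, so a real scheme needs counts that are still valid when the Pause arrives (or a target chosen with some margin).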

#2 Updated by Eric Flumerfelt over 4 years ago

  • Target version set to 981

#3 Updated by Eric Flumerfelt almost 4 years ago

  • Category set to Known Issues
  • Target version deleted (981)

I have evidence from this morning that this issue still exists in artdaq v1_13_03. There should definitely be some form of sequenceID-based synchronization for system transitions, but I don't know the best way to implement it.
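One possible shape for sequenceID-based synchronization, sketched under the assumption that sequence IDs start at 1 and each file should hold a fixed number of events: decide the file boundary from the sequence ID itself rather than from when a Pause command happens to arrive, so every Fragment of an event lands in the same file. The helper names are hypothetical; this is not how artdaq implements file closing.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[str, int]  # (BoardReader name, sequence ID)


def file_index(sequence_id: int, events_per_file: int) -> int:
    """0-based index of the output file an event belongs in.

    IDs 1..events_per_file map to file 0, the next events_per_file IDs
    to file 1, and so on.
    """
    if sequence_id < 1 or events_per_file < 1:
        raise ValueError("sequence_id and events_per_file must be >= 1")
    return (sequence_id - 1) // events_per_file


def route(fragments: List[Fragment], events_per_file: int) -> Dict[int, List[Fragment]]:
    """Group Fragments by output file, keeping their arrival order."""
    files: Dict[int, List[Fragment]] = {}
    for reader, seq_id in fragments:
        files.setdefault(file_index(seq_id, events_per_file), []).append((reader, seq_id))
    return files


# Arrival order no longer matters: both halves of event 101 go to file 1.
arrivals = [("TOY1", 100), ("TOY2", 100), ("TOY2", 101), ("TOY1", 101)]
print(route(arrivals, 100))
```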

#4 Updated by Eric Flumerfelt over 3 years ago

  • Status changed from New to Resolved
  • Assignee set to Eric Flumerfelt
  • Target version set to artdaq-demo Next Release

This issue has disappeared, most likely due to the replacement of Aggregator-based file closing with art-based file closing.

#5 Updated by Eric Flumerfelt over 3 years ago

  • Status changed from Resolved to Closed

#6 Updated by Eric Flumerfelt over 3 years ago

  • Target version changed from artdaq-demo Next Release to artdaq-demo v2_10_00
