Bug #21863

mediumsystem_with_routing_master fails during integration testing

Added by Eric Flumerfelt over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Category: Known Issues
Target version:
Start date: 02/08/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Experiment: -
Co-Assignees:
Duration:

Description

On mu2edaq12, mediumsystem_with_routing_master fails the 3-run x 60 s integration test. The problems appear to be related to stopping the DataLogger art process in the first run.

The following error message is sometimes printed by the DL art process at the start of the second run:

Error / ArtException
07-Feb-2019 10:46:10 CST
mu2edaq12.fnal.gov (192.168.157.12)
UDPMessage 33 / PID 47210
art / PostEndJob / ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
  ---- DataCorruption BEGIN
    readNext returned a new Run and Event without a SubRun
  ---- DataCorruption END
---- OtherArt END
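
The message says the input source handed art an Event belonging to a new Run before any SubRun for that Run had been delivered. As a rough illustration only, the ordering rule being enforced can be sketched as below. This is not art or artdaq code: `check_sequence`, the `(kind, run)` tuples, and the local `DataCorruption` class are all hypothetical names invented for this sketch.

```python
class DataCorruption(Exception):
    """Hypothetical stand-in for the DataCorruption category in the log above."""


def check_sequence(items):
    """Validate Run -> SubRun -> Event nesting.

    `items` is an iterable of (kind, run_number) tuples, with kind one of
    "Run", "SubRun", "Event". Returns the number of items accepted.
    """
    current_run = None
    have_subrun = False
    accepted = 0
    for kind, run in items:
        if kind == "Run":
            # A new Run resets the SubRun state.
            current_run = run
            have_subrun = False
        elif kind == "SubRun":
            if run != current_run:
                raise DataCorruption("SubRun without a matching Run")
            have_subrun = True
        elif kind == "Event":
            # An Event is only valid inside an open Run *and* SubRun.
            if run != current_run or not have_subrun:
                raise DataCorruption("new Run and Event without a SubRun")
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
        accepted += 1
    return accepted


good = [("Run", 1), ("SubRun", 1), ("Event", 1)]
bad = good + [("Run", 2), ("Event", 2)]  # run 2 starts, but no SubRun arrives

print(check_sequence(good))
try:
    check_sequence(bad)
except DataCorruption as exc:
    print("rejected:", exc)
```

In this toy model, the second sequence is rejected at the point where the run-2 Event shows up without a preceding run-2 SubRun.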


Subtasks

Feature #21868: Periodic warnings when backpressure detected (Closed, Eric Flumerfelt)

Bug #21869: Data can persist in TransferPlugins between runs (stop/start transitions) (Closed, Eric Flumerfelt)

Bug #21870: DataReceiverManager::stop_threads ends before all threads are actually stopped (Closed, Eric Flumerfelt)

History

#1 Updated by Eric Flumerfelt over 1 year ago

I believe that the configuration I was running on mu2edaq12 was inadvertently testing the case where disk writing does not keep up. However, we should definitely handle this case more gracefully...

#2 Updated by Eric Flumerfelt over 1 year ago

  • Start date changed from 02/07/2019 to 02/08/2019
  • Due date set to 02/08/2019

due to changes in a related task: #21868

#3 Updated by Eric Flumerfelt over 1 year ago

  • Assignee set to Eric Flumerfelt
  • Status changed from New to Work in progress
  • Category set to Known Issues

I have added artdaq/bugfix/21863_ArtdaqInput_Tweaks, which resolves the error message seen by art. I believe the problems I am currently seeing are in part due to the fact that data is kept in the shared memory buffers between runs, causing art to switch back and forth between the first run and the second, exacerbating the disk-writing delay.
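
To make the stale-data effect concrete, here is a toy model of a buffer that survives a stop/start transition. It is only a sketch under assumptions: `TransferBuffer`, `flush`, and `runs_seen` are hypothetical names, not the artdaq shared memory or TransferInterface API, and the flush here simply discards leftovers, whereas a real fix might instead deliver the remaining data before the stop completes.

```python
from collections import deque


class TransferBuffer:
    """Hypothetical FIFO standing in for a buffer that outlives a run."""

    def __init__(self):
        self._queue = deque()

    def put(self, run, payload):
        self._queue.append((run, payload))

    def get(self):
        # Returns None when the buffer is empty.
        return self._queue.popleft() if self._queue else None

    def flush(self):
        """Drop everything still queued; returns how many items were dropped."""
        dropped = len(self._queue)
        self._queue.clear()
        return dropped


def runs_seen(buffer):
    """Drain the buffer and list the run numbers in the order the reader meets them."""
    seen = []
    item = buffer.get()
    while item is not None:
        if not seen or seen[-1] != item[0]:
            seen.append(item[0])
        item = buffer.get()
    return seen


# Without a flush at stop: run-1 leftovers are read during run 2.
buf = TransferBuffer()
buf.put(1, "late fragment from run 1")
buf.put(2, "first fragment of run 2")
print(runs_seen(buf))

# With a flush at stop: the reader only ever sees run 2.
buf = TransferBuffer()
buf.put(1, "late fragment from run 1")
buf.flush()
buf.put(2, "first fragment of run 2")
print(runs_seen(buf))
```

With several such buffers being read in turn, leftovers from the first run can interleave with data from the second, which would match the back-and-forth switching described above.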

#4 Updated by Eric Flumerfelt over 1 year ago

  • Due date set to 02/08/2019

due to changes in a related task: #21869

#5 Updated by Eric Flumerfelt over 1 year ago

  • Due date set to 02/08/2019

due to changes in a related task: #21870

#6 Updated by Eric Flumerfelt over 1 year ago

The combination of branches
bugfix/21863_ArtdaqInput_Tweaks
bugfix/21869_TransferInterface_FlushBuffers
bugfix/21870_DRM_StopThreads_WaitForAllThreads

leads to this example working on mu2edaq12.
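
As the third branch name suggests, part of the fix is having the stop call wait for every receiver thread rather than returning early. A minimal sketch of that pattern follows. It is written in Python for brevity and is not the artdaq implementation: `ReceiverManager`, `_receive`, and the simulated shutdown delays are assumptions made for illustration.

```python
import threading
import time


class ReceiverManager:
    """Hypothetical manager owning one receiver thread per data source."""

    def __init__(self, n_sources):
        self._stop_requested = threading.Event()
        self._lock = threading.Lock()
        self.finished = []  # source ids whose threads have fully exited
        self._threads = [
            threading.Thread(target=self._receive, args=(i,))
            for i in range(n_sources)
        ]

    def start(self):
        for thread in self._threads:
            thread.start()

    def _receive(self, source_id):
        while not self._stop_requested.is_set():
            time.sleep(0.01)  # stand-in for receiving data
        # Simulate threads that take different amounts of time to wind down.
        time.sleep(0.05 * source_id)
        with self._lock:
            self.finished.append(source_id)

    def stop_threads(self):
        """Request a stop, then block until every thread has exited."""
        self._stop_requested.set()
        for thread in self._threads:
            thread.join()


manager = ReceiverManager(3)
manager.start()
manager.stop_threads()
# Every thread has recorded its exit by the time stop_threads() returns.
print(sorted(manager.finished))
```

Joining each thread, rather than only signalling the stop, is what guarantees that no receiver is still writing into a buffer when the next run starts.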

#7 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Work in progress to Resolved

#8 Updated by John Freeman over 1 year ago

  • Status changed from Resolved to Reviewed

Things are looking good at this point. I've added issue-specific comments to Issue #21870 and Issue #21869. Here, I'll add that art's "readNext" error hasn't shown up in my runs since I merged bugfix/21863_ArtdaqInput_Tweaks into /home/jcfree/artdaq-demo_test_fixes_to_v3_03_02/srcs/artdaq. The one dangling question about this config is covered in an Issue I just added, Issue #21908, but it may not be a big deal...

#9 Updated by Eric Flumerfelt over 1 year ago

  • Target version set to artdaq v3_04_00
  • Co-Assignees John Freeman added

#10 Updated by Eric Flumerfelt over 1 year ago

  • Status changed from Reviewed to Closed

