Bug #21267

Problems seen with large, and non-unique, request windows

Added by Kurt Biery 5 months ago. Updated 15 days ago.

Status: Reviewed
Priority: Normal
Category: Known Issues
Target version: artdaq v3_04_00
Start date: 01/08/2019
Due date: 01/08/2019
% Done: 100%
Estimated time: (Total: 0.00 h)
Experiment: -
Co-Assignees: Kurt Biery
Duration:

Description

On protoDUNE, the Trigger FragmentGenerator generates many more fragments than are ultimately associated with events. To provide insight into what happened between the triggers that generated events, they would like to read out all of those fragments, and to do that they configured their BoardReader to read out 3-second windows (with the majority of each window extending into the past). Thinking that event requests would be received in order, they also set the request_windows_are_unique config parameter to true, in the hope that this would keep the data volume manageable.

However, on protoDUNE, data requests from the EventBuilders (EBs) are not necessarily received in order, and this led to events that had no Trigger fragments in them (because an out-of-order request had already claimed all of the existing fragments). The work-around on protoDUNE was to narrow the trigger window so that the data requests never ask for overlapping windows; as an alternative, we tried changing the request_windows_are_unique parameter to false.
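To illustrate the failure mode, here is a toy model (not artdaq code; the container and function names are invented for this sketch). With unique request windows, each fragment can satisfy at most one request, so a request that arrives out of order can drain fragments that an earlier-timestamped, overlapping request still needs. The window bounds follow the convention from the BoardReader configuration shown later in this thread: tzero - offset to tzero - offset + width.

// Toy model of unique request windows served out of order; NOT artdaq code.
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

using Timestamp = uint64_t;
std::map<Timestamp, int> buffer; // timestamp -> toy fragment payload

// Serve a request window [tzero - offset, tzero - offset + width).
// "Unique" windows: served fragments are erased and cannot be reused.
std::vector<int> applyRequest(Timestamp tzero, Timestamp offset, Timestamp width)
{
  Timestamp begin = tzero - offset;
  Timestamp end = begin + width;
  std::vector<int> matched;
  for (auto it = buffer.lower_bound(begin); it != buffer.end() && it->first < end;)
  {
    matched.push_back(it->second);
    it = buffer.erase(it); // each fragment serves exactly one request
  }
  return matched;
}

int main()
{
  for (Timestamp ts = 0; ts < 100; ++ts) { buffer[ts] = 1; } // fill toy buffer

  // Two overlapping windows, served out of order (the later one arrives first):
  auto second = applyRequest(60, 50, 60); // window [10, 70): claims 60 fragments
  auto first = applyRequest(50, 50, 60);  // window [0, 60): mostly already claimed
  std::cout << "later request, served first: " << second.size() << " fragments\n"
            << "earlier request, served second: " << first.size() << " fragments\n";
  // The earlier request is left with only the fragments in [0, 10).
}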

With non-unique request windows, circular buffer mode on, and the 3-second request window, data taking would work fine for a while but would then run into trouble.

Here is a note from the elog:

%MSG-w CTB_Receiver: Early 23-Oct-2018 07:33:18 CEST pre-events
Word Buffer full and cannot store more
%MSG
/nfs/sw/scripts/boot.sh: line 13: 1824 Killed $2 -c "id: $3 commanderPluginType: xmlrpc rank: $4 application_name: $5 partition_number: $6"

It also seemed like there were more "timeout waiting for data" messages:

%MSG-w trigger_0_CommandableFragmentGenerator: Early 23-Oct-2018 14:42:31 CEST pre-events CommandableFragmentGenerator.cc:931
applyRequests: A timeout occurred waiting for data to close the request window ({77014927312623432-77014927462623432}, buffer={77014925946619888-77014927446369888} ). Time waiting: 2538545 us (> 2000000 us).
%MSG

This is probably something that we should try to reproduce at protoDUNE once beam running finishes.


Subtasks

artdaq Utilities - Feature #21641: Improve performance of metric reporting subsystem (Closed; Eric Flumerfelt)


Related issues

Related to artdaq - Bug #21640: Memory leak in ContainerFragmentLoader (Closed; 2019-01-08)

Related to artdaq - Feature #21717: It would be nice for ContainerFragmentLoader/Fragment/QuickVec to use fewer memcpy's when adding lots of fragments to a Container (Closed; 2019-01-18)

History

#1 Updated by Eric Flumerfelt 5 months ago

I have created a simple_test_config, circular_buffer_mode_example (on artdaq-utilities-daqinterface/feature/CircularBufferMode_NonUniqueTest), which I have been using to study this issue. The main issue appears to be a performance bottleneck in CommandableFragmentGenerator: it spends too much time calculating the byte-size of the data buffer when the buffer is composed of many small Fragments.
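For illustration, this is the kind of pattern that produces such a bottleneck (a sketch with invented types, not the actual CommandableFragmentGenerator code): if the buffer's byte-size is recomputed by walking the fragment list on every check, each check costs O(n) in the number of buffered fragments, which is harmless for a few large fragments but dominates when the buffer holds thousands of tiny ones.

// Sketch of an O(n) size check (invented types; not the real artdaq code).
#include <cstddef>
#include <list>
#include <numeric>
#include <vector>

struct Fragment { std::vector<unsigned char> payload; };

std::list<Fragment> dataBuffer;

// Called on every buffer-limit check and every request: O(n) per call.
std::size_t dataBufferSizeBytes()
{
  return std::accumulate(dataBuffer.begin(), dataBuffer.end(), std::size_t{0},
                         [](std::size_t sum, Fragment const& f) { return sum + f.payload.size(); });
}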

#2 Updated by Eric Flumerfelt 5 months ago

Candidate fix implemented on bugfix/CFG_ExplicitlyTrackDataBufferSizeBytes. Running circular_buffer_mode_example without the change leads to many "A timeout occurred waiting for data to close the request window" error messages as the BoardReader falls behind. With the change, the run is clean. Note that this is on ironwork; more performant nodes may give different results.
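The branch name suggests the shape of the fix; a minimal sketch, assuming the same toy types as the snippet above (again, not the actual artdaq code): maintain a running byte counter as fragments enter and leave the buffer, so each size check becomes O(1).

// Sketch of explicit byte-size tracking (invented types; not the real artdaq code).
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

struct Fragment { std::vector<unsigned char> payload; };

std::list<Fragment> dataBuffer;
std::size_t dataBufferSizeBytes = 0; // updated incrementally, read in O(1)

void addFragment(Fragment frag)
{
  dataBufferSizeBytes += frag.payload.size();
  dataBuffer.push_back(std::move(frag));
}

void dropOldestFragment() // e.g. circular buffer mode evicting old data
{
  if (dataBuffer.empty()) return;
  dataBufferSizeBytes -= dataBuffer.front().payload.size();
  dataBuffer.pop_front();
}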

#3 Updated by Eric Flumerfelt 5 months ago

  • Category set to Known Issues
  • Status changed from New to Work in progress
  • Assignee set to Eric Flumerfelt
  • Target version set to artdaq v3_04_00

#4 Updated by Kurt Biery 3 months ago

There still seems to be some dependence on the data size (which is unexpected).

I merged the develop branch of artdaq-utilities-daqinterface into the feature/CircularBufferMode_NonUniqueTest branch and modified boot.txt to use the new pattern for the location of the 'DAQ setup script'.

With this change, I ran some tests on mu2edaq01. As expected, with the new code, and the circular_buffer_mode_example configuration, I don't see any problems. One side note, though, is that the BoardReader for component03 typically takes ~95% of a CPU during these runs.

Changing the nADCcounts parameter in the component03_hw_cfg.fcl file from 10 to 1000 causes that BoardReader to take 100% of a CPU in subsequent runs, and causes errors to appear (and fragments to go missing).

Making the additional configuration change of setting request_windows_are_unique to false in the component03 FHiCL file brings the BR CPU usage down to ~55%.

#5 Updated by Kurt Biery 3 months ago

Surprisingly, making the subsequent configuration change of setting nADCcounts to 100 (from 1000) caused the CPU usage of the component03 BR to go up (to ~95%).

#6 Updated by Eric Flumerfelt 3 months ago

Are the parameters

usecs_between_sends: 1000
throttle_usecs: 0

still at these values? What does tshow|grep "getNext_"|tdelta -d 1 show?

#7 Updated by Kurt Biery 3 months ago

Yes, the parameters that determine the rates of fragments are still the same.

[biery@mu2edaq01 circular_buffer_mode_example]$ pwd
/home/biery/331Demo/artdaq-utilities-daqinterface/simple_test_config/circular_buffer_mode_example
[biery@mu2edaq01 circular_buffer_mode_example]$ date
Wed Jan  2 13:45:08 CST 2019
[biery@mu2edaq01 circular_buffer_mode_example]$ git diff .
diff --git a/simple_test_config/circular_buffer_mode_example/component03_hw_cfg.fcl b/simple_test_config/circular_buffer_mode_example/component03_hw_cfg.fcl
index 7983abb..fccd2e5 100644
--- a/simple_test_config/circular_buffer_mode_example/component03_hw_cfg.fcl
+++ b/simple_test_config/circular_buffer_mode_example/component03_hw_cfg.fcl
@@ -32,7 +32,7 @@ request_address: "227.128.12.26" # Multicast request address
 request_window_offset: 2000 # Request message contains tzero. Window will be from tzero - offset to tz-o + width
 request_window_width: 2000 # These settings correspond to "two before, two after" 
 stale_request_timeout: "0xFFFFFFFF" # How long to wait before discarding request messages that are outside the available data
-request_windows_are_unique: false # If request windows are unique, avoids a copy operation, but the same data point cannot be used for two requests. If this is no
+request_windows_are_unique: true # If request windows are unique, avoids a copy operation, but the same data point cannot be used for two requests. If this is not
 circular_buffer_mode: true

 separate_data_thread: true # MUST be true for requests to be applied! If requesting is not desired, but a separate readout thread is, set this to true, requests_e
@@ -40,7 +40,7 @@ separate_data_thread: true # MUST be true for requests to be applied! If request
 separate_monitoring_thread: false # Whether a thread should be started which periodically calls checkHWStatus_, a user-defined function which should be used to ch
 hardware_poll_interval_us: 0 # How often, in microseconds, checkHWStatus_() should be called. 0 to disable.

-nADCcounts: 10
+nADCcounts: 100

 usecs_between_sends: 1000
 throttle_usecs: 0
[biery@mu2edaq01 circular_buffer_mode_example]$ grep usec *
component01_hw_cfg.fcl:usecs_between_sends: 1000000 # 10 Hz
component01_hw_cfg.fcl:throttle_usecs: 0
component02_hw_cfg.fcl:throttle_usecs: 1000000 # 1 Hz
component03_hw_cfg.fcl:usecs_between_sends: 1000
component03_hw_cfg.fcl:throttle_usecs: 0

#8 Updated by Eric Flumerfelt 2 months ago

  • Due date set to 01/08/2019
  • Start date changed from 10/31/2018 to 01/08/2019

due to changes in a related task: #21641

#9 Updated by Kurt Biery 2 months ago

  • Related to Bug #21640: Memory leak in ContainerFragmentLoader added

#10 Updated by Kurt Biery 2 months ago

With the changes from Issues 21641 and 21640, the CPU usage of the component03 BoardReader (the BR with many fragments per event) is essentially the same as that of the other BR when running the following command:
  • sh ./run_demo.sh --config circular_buffer_mode_example --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/circular_buffer_mode_example/boot.txt --comps component01 component03 --runduration 120 --no_om --partition=3

[reminder: the circular_buffer_mode_example currently needs advanced memory usage turned off]

#11 Updated by Kurt Biery 2 months ago

Results of some tests on mu2edaq01...

I used the circular_buffer_mode_example configuration on the daqinterface feature/CircularBufferMode_NonUniqueTest branch, with one change: the nADCcounts in component03_hw_cfg.fcl was changed from 10 to 400.

Comparing the older and newer artdaq code (develop branch vs. bugfix/CFG_ExplicitlyTrackDataBufferSizeBytes branch):
  • the time to copy all of the 2000 small fragments into the Container fragment was about the same (350-500 msec)
  • however, the time to "getNext_()" each tiny fragment was much improved: ~1 msec down to ~60 usec
Comparing the older and newer artdaq_core code (develop branch vs. bugfix/ContainerFragmentLoader_MemoryLeak branch):
  • the time to build the Container fragment goes down from ~900 msec to ~350-500 msec per event with the new code
Comparing the older and newer artdaq_utilities code (develop branch vs. feature/AggregateMetricsInMetricManager branch):
  • there isn't much observable difference in the time to build the Container fragment (about 350 msec in both cases).

[sh ./run_demo.sh --config circular_buffer_mode_example --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/circular_buffer_mode_example/boot.txt --comps component01 component03 --runduration 100 --no_om --partition=4]

#12 Updated by Kurt Biery 2 months ago

  • Related to Feature #21717: It would be nice for ContainerFragmentLoader/Fragment/QuickVec to use fewer memcpy's when adding lots of fragments to a Container added

#13 Updated by Kurt Biery about 2 months ago

As one more validation of these changes, I ran the protoDUNE system with a small number (4) of Trigger fragments per event and with a large number (1200), both with the old and the new code. (The event rate was 30 Hz with the small number of fragments and 1 Hz with the large number.) The fragment size was 375-550 bytes.

I didn't see any errors, and the data files on disk were comparable between the old and new code, so I'm going to go ahead and merge this branch to develop.

#14 Updated by Eric Flumerfelt about 2 months ago

  • Status changed from Work in progress to Resolved
  • Co-Assignees Kurt Biery added

#15 Updated by Eric Flumerfelt about 2 months ago

  • Status changed from Resolved to Reviewed

