Feature #21641

artdaq - Bug #21267: Problems seen with large, and non-unique, request windows

Improve performance of metric reporting subsystem

Added by Eric Flumerfelt 5 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Category: artdaq-utilities
Start date: 01/08/2019
Due date:
% Done: 50%
Estimated time:
Experiment: -
Co-Assignees:
Duration:
Description

While performing tests with a BoardReader generating small Fragments at high rate (DAQInterface's circular_buffer_mode_example simple_test_config), it was noticed that the metric reporting thread (MetricManager::sendMetricLoop_) was consuming a large amount of CPU. Further investigation showed that this was due to the large number of metric calls being processed. Similar effects had been noted before (commit:daa4fbcd64) and had been mitigated on the caller side.

MetricManager's metric caching was redesigned to perform metric aggregation on the callee side, presenting a cleaner interface to callers. This change should also significantly reduce MetricManager's memory usage in situations where metrics are reported at a high rate.

Changes are on artdaq-utilities/feature/AggregateMetricsInMetricManager, and have been tested using circular_buffer_mode_example. Further testing/validation and code review are still needed.
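The callee-side aggregation described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the actual artdaq-utilities API: rather than queuing every sendMetric call for the sender thread, the manager keeps a running sum and count per metric name, so each high-rate call is O(1) and the reporting thread emits one averaged value per interval.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Per-metric running totals for the current reporting interval.
struct MetricAccumulator {
  double sum = 0.0;
  std::size_t count = 0;
};

// Hypothetical sketch of a manager that aggregates at the call site
// ("callee side") instead of queuing every metric value.
class AggregatingMetricManager {
public:
  // Called at high rate by instrumented code; no per-call queue growth.
  void sendMetric(const std::string& name, double value) {
    auto& acc = accumulators_[name];
    acc.sum += value;
    acc.count++;
  }

  // Called once per reporting interval by the sender thread.
  // Returns the interval average and resets the accumulator.
  double report(const std::string& name) {
    auto& acc = accumulators_[name];
    double avg = (acc.count > 0) ? acc.sum / acc.count : 0.0;
    acc = MetricAccumulator{};  // reset for the next interval
    return avg;
  }

private:
  std::map<std::string, MetricAccumulator> accumulators_;
};
```

Under this scheme, memory usage is bounded by the number of distinct metric names rather than by the metric call rate, which matches the memory reduction claimed above. (A real implementation would also need locking between the callers and the sender thread, and per-metric handling of rate/min/max modes.)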

RCGUI_Jan18C.png (315 KB): Screen capture of protoDUNE Run Control strip charts (Kurt Biery, 01/18/2019 12:49 PM)
RCGUI_Jan18B.png (398 KB) (Kurt Biery, 01/18/2019 12:52 PM)

Related issues

Related to artdaq - Feature #21717: It would be nice for ContainerFragmentLoader/Fragment/QuickVec to use fewer memcpy's when adding lots of fragments to a Container (Closed, 2019-01-18)

History

#1 Updated by Eric Flumerfelt 4 months ago

  • Status changed from New to Resolved

#2 Updated by Kurt Biery 4 months ago

  • Related to Feature #21717: It would be nice for ContainerFragmentLoader/Fragment/QuickVec to use fewer memcpy's when adding lots of fragments to a Container added

#3 Updated by Kurt Biery 4 months ago

I haven't had time to investigate this yet, but I noticed that when I used the code on this branch at protoDUNE, some of the metrics in the Run Control strip chart looked different. I'm going to try to attach an image that shows this effect. The first run in the strip charts used this code. The second run used the code on the develop branch of artdaq_utilities.

#4 Updated by Kurt Biery 4 months ago

Unfortunately, that image got ruined by a pop-up window. Here is a different view. It's the last two runs that are of interest here.

#5 Updated by Kurt Biery 3 months ago

On 23-Jan, I found a bug in MetricPlugin.hh in which a 'count' variable was being initialized to zero outside of a loop, when instead it should be set to zero each time through the loop. I believe that this was the cause of the difference in reported metrics that I saw in earlier testing at protoDUNE and the artdaq-demo. I committed a fix to the feature/AggregateMetricsInMetricManager branch.
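The class of bug described here can be illustrated with a short sketch (hypothetical code, not the actual MetricPlugin.hh source): when the count used as an averaging denominator is initialized once outside the reporting loop, it carries over between intervals and the computed averages drift low.

```cpp
#include <cassert>
#include <vector>

// BUG version: 'count' is initialized once, outside the loop, so each
// interval's average divides by the accumulated count of all intervals.
double buggyLastIntervalAverage(const std::vector<std::vector<double>>& intervals) {
  double avg = 0.0;
  int count = 0;  // BUG: never reset between intervals
  for (const auto& interval : intervals) {
    double sum = 0.0;
    for (double v : interval) { sum += v; count++; }
    avg = (count > 0) ? sum / count : 0.0;  // denominator keeps growing
  }
  return avg;
}

// FIXED version: 'count' is reset each time through the loop, so each
// interval's average uses only that interval's sample count.
double fixedLastIntervalAverage(const std::vector<std::vector<double>>& intervals) {
  double avg = 0.0;
  for (const auto& interval : intervals) {
    double sum = 0.0;
    int count = 0;  // FIX: set to zero each time through the loop
    for (double v : interval) { sum += v; count++; }
    avg = (count > 0) ? sum / count : 0.0;
  }
  return avg;
}
```

For example, with two intervals of samples {10, 10} and {20, 20}, the buggy version reports the second interval's average as 40/4 = 10 instead of 40/2 = 20, which is consistent with the "different-looking" strip-chart metrics seen in the earlier protoDUNE tests.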

At this time, I believe that we can merge the changes on the feature/AggregateMetricsInMetricManager branch into the develop branch once someone validates the bug fix that I made.

I should have reported on this earlier: I see a noticeable reduction in the amount of CPU that the BoardReader uses with the code on the feature/AggregateMetricsInMetricManager branch (compared to develop). For example, in artdaq-demo tests on mu2edaq01, I ran

sh ./run_demo.sh --config circular_buffer_mode_example --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/circular_buffer_mode_example/boot.txt --comps component01 component03 --runduration 100 --no_om --partition=4

with the component03 config file modified to use 400 ADC counts instead of 10. I see the following "top" output during a run when using the code on the develop branch:

top - 12:58:13 up 54 days, 35 min, 40 users,  load average: 1.12, 3.33, 2.96
Threads: 1438 total,   2 running, 1431 sleeping,   4 stopped,   1 zombie
%Cpu(s):  4.9 us,  1.0 sy,  0.0 ni, 93.9 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32746204 total, 10752956 free,  5619432 used, 16373816 buff/cache
KiB Swap:  8388604 total,  8344184 free,    44420 used. 24246392 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                         
 3102 biery     20   0 37.523g  92164  11832 R 96.7  0.3   0:48.82 BoardReaderMain                                                                 
 3107 biery     20   0 37.523g  92164  11832 S  6.6  0.3   0:03.44 BoardReaderMain                                                                 
 3104 biery     20   0 37.523g  92164  11832 S  4.6  0.3   0:01.89 BoardReaderMain                                                                 
29333 ron       20   0  648484 219852   5184 S  2.0  0.7  42:19.68 kwalletd                                                                        
 1758 biery     20   0 30.098g 224940  88296 S  1.7  0.7   0:02.48 art                                                                             
 9303 biery     20   0 2578460  13860   4176 S  1.3  0.0   0:00.88 python                                                                          
  254 root      39  19       0      0      0 S  1.0  0.0   1001:13 kipmi0                                                                          
 2241 ganglia   20   0  283868   3676   2700 S  1.0  0.0 100:49.97 gmetad                                                                          
 6031 biery     20   0  157164   3572   1512 R  1.0  0.0   0:00.06 top                                                                             

With the code from the feature/AggregateMetricsInMetricManager branch, I see the following:

top - 13:07:08 up 54 days, 44 min, 40 users,  load average: 2.81, 4.45, 3.63
Threads: 1448 total,   2 running, 1441 sleeping,   4 stopped,   1 zombie
%Cpu(s):  1.2 us,  1.3 sy,  0.0 ni, 97.4 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32746204 total, 10597148 free,  5562364 used, 16586692 buff/cache
KiB Swap:  8388604 total,  8344184 free,    44420 used. 24288772 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                         
25933 biery     20   0 37.523g  39732   7616 S  7.6  0.1   0:04.44 BoardReaderMain                                                                 
25928 biery     20   0 37.523g  39732   7616 S  3.6  0.1   0:03.71 BoardReaderMain                                                                 
25931 biery     20   0 37.523g  39732   7616 S  3.0  0.1   0:01.72 BoardReaderMain                                                                 
30731 biery     20   0 2578460  13868   4176 S  1.7  0.0   0:01.03 python                                                                          
  254 root      39  19       0      0      0 S  1.3  0.0   1001:20 kipmi0                                                                          
24098 biery     20   0 30.098g 178908  42116 R  1.3  0.5   0:02.70 art                                                                             
24746 biery     20   0 26.103g 170732  40228 S  1.0  0.5   0:02.02 art                                                                             
24747 biery     20   0 26.103g 172472  40292 S  1.0  0.5   0:02.11 art                                                                             
28804 biery     20   0  157196   3604   1516 R  1.0  0.0   0:00.23 top                                                                             

#6 Updated by Eric Flumerfelt 3 months ago

  • Status changed from Resolved to Reviewed
  • Co-Assignees Kurt Biery added

I have checked Kurt's suggested code change and merged the branch into develop.

#7 Updated by Eric Flumerfelt 3 months ago

  • Target version set to artdaq_utilities v1_04_10
  • Status changed from Reviewed to Closed
  • Category changed from Needed Enhancements to artdaq-utilities
  • Project changed from artdaq to artdaq Utilities
