In advanced memory usage, DAQInterface should provide enough space when small fragments are used without wasting it
Eric observed that when advanced memory usage (Issue #23979) was used in runs with very small fragments, its algorithm of adding 10% overhead to the sum of fragment sizes did not leave enough room for assembled events, because of the init message, configs, etc. The workaround that currently exists on the develop branch is that none of the simple_test_config boardreaders has a max_fragment_size_bytes less than 100k; however, this isn't intuitive (why should a 20-ADC count fragment need 100 KB of space?), and the real issue isn't a lack of space for fragments between boardreaders and eventbuilders, it's a lack of space for assembled events.
This was discussed during today's meeting. Eric's provided me with code to set a floor on the event size, and I plan to replace the 100 KB values for max_fragment_size_bytes with smaller, more "realistic" values.
JCF: Issue #24155: fix up some max_fragment_size_bytes values in the simple_test_configs
#1 Updated by John Freeman 9 months ago
- % Done changed from 0 to 100
- Status changed from New to Resolved
Resolved with commit 798a4ff82c99e13ee5c35081dd4255bbaa4de061 at the head of feature/24155_floor_on_event_size. The max_fragment_size_bytes settings are now more reasonable (e.g., if a ToySimulator has 10 ADCs per fragment, the max_fragment_size_bytes is set to 100). However, the floor on the max event size is 102400 bytes. This can be witnessed if, e.g., you run on the demo config with advanced memory usage.
#2 Updated by Kurt Biery 9 months ago
I'm looking at the behavior of this new code with the pdune_swtrig_DFO sample config...
Here are the max_*_size_bytes parameter values from a run with this code:
[biery@mu2edaq13 18]$ grep max_ * | grep bytes
felix01.fcl: max_fragment_size_bytes: 12000
felix02.fcl: max_fragment_size_bytes: 12000
felix03.fcl: max_fragment_size_bytes: 12000
ssp01.fcl: max_fragment_size_bytes: 1000
ssp02.fcl: max_fragment_size_bytes: 1000
ssp03.fcl: max_fragment_size_bytes: 1000
swtrig.fcl: max_fragment_size_bytes: 200
In this sample config (that has subsystems), the swtrig BR sends data to the DFO; and the DFO, the SSPs, and the FELIX BRs send data to the EBs.
I understand the max_event_size_bytes value of 102400 for the DFO - the max_fragment_size_bytes of the swtrig BR is under the 100k floor, so the DFO gets the minimum value of 102400. I'm not sure how the max_event_size_bytes values of 145304 were determined for the EBs, though.
- 39000 plus 102400 doesn't equal 145304, nor does 1.1 times (39000 plus 102400)
I must be missing something obvious, sorry about that, but I'd appreciate learning how 145304 was determined.
#3 Updated by John Freeman 9 months ago
I'll go through what happens step-by-step below, but the general algorithm for calculating the max event size for a subsystem is:
- Take 1.1*(the sum of the fragment sizes for that subsystem's boardreaders) and then pad the number so it's evenly divisible by 8
- Add in the max event sizes of any parent subsystems
- Set the max event size of the subsystem in question to be the greater of 102400 or the calculated figure. Needless to say, it's only in subsystems without parents that you'd ever have to resort to the 102400 bytes.
Here's what happens in the concrete case of pdune_swtrig_DFO:
Subsystem 1's eventbuilder receives a fragment from 1 source: its swtrig boardreader
Subsystem 2's eventbuilders receive fragments from 7 sources: the parent subsystem 1 (specifically subsystem 1's "DFO" eventbuilder), and the three ssp and three felix boardreaders in subsystem 2.
Subsystem 1's calculation is simple: the 200-byte max fragment size * 1.1 is 220 bytes, rounded up to 224. This is much less than 102400, so the max event size for subsystem 1 is 102400
Subsystem 2 involves taking 3 ssps of 1000 bytes each and 3 felixes of 12000 bytes each, and multiplying their total size of 39000 bytes by 1.1. This comes out to 42900, which needs to be padded to 42904. This value is then added to the size of the fragment coming from subsystem 1, which is 102400, for a total of 145304.
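For reference, the calculation above can be sketched in Python as follows. This is only an illustration of the steps described in this comment, not the DAQInterface code itself: the function and constant names are hypothetical, and rounding the 1.1x figure up to a whole byte before padding is an assumption (the steps above don't say how fractional bytes are handled).

```python
# Illustrative sketch only; names are hypothetical, not taken from DAQInterface.
FLOOR_BYTES = 102400  # floor on the max event size quoted in this issue


def max_event_size(fragment_sizes, parent_event_sizes=()):
    """Max event size for one subsystem, following the three steps above."""
    total = sum(fragment_sizes)
    # Step 1: 1.1 * (sum of fragment sizes), done in integer arithmetic and
    # rounded up to a whole byte (assumption), then padded to a multiple of 8.
    with_overhead = (total * 11 + 9) // 10
    padded = -(-with_overhead // 8) * 8
    # Step 2: add in the max event sizes of any parent subsystems.
    calculated = padded + sum(parent_event_sizes)
    # Step 3: take the greater of the floor and the calculated figure.
    return max(FLOOR_BYTES, calculated)


# pdune_swtrig_DFO: subsystem 1 has a single 200-byte swtrig boardreader.
subsystem1 = max_event_size([200])

# Subsystem 2 has three ssps (1000 bytes each) and three felixes (12000 bytes
# each), plus subsystem 1 as its parent.
subsystem2 = max_event_size([1000] * 3 + [12000] * 3, [subsystem1])

print(subsystem1, subsystem2)
```

With the fragment sizes from comment #2 this gives 102400 for subsystem 1 and 145304 for subsystem 2, matching the values seen in the run.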
#4 Updated by Kurt Biery 9 months ago
Great, thanks, that definitely helps.
I've run artdaq-demo tests with both pdune_swtrig_DFO and mediumsystem_with_routing_master configs on the feature/24155_floor_on_event_size branch, and the calculated sizes of max_event_size_bytes look good, as do the reported sizes of actual events compared to buffer sizes ("tshow | grep Releasing | more").
I've modified the max_fragment_size_bytes values in the SSP and FELIX standard config files in the pdune_swtrig_DFO config to take into account the settings of request_window_width and timestamp_scale_factor in those files. I've also slightly decreased the max_fragment_size_bytes values in component_standard.fcl in the mediumsystem_with_routing_master since we can do that with the new book-keeping calculations. I'll commit those changes to the feature/24155_floor_on_event_size branch momentarily.
#5 Updated by Kurt Biery 9 months ago
Do the generated_fragments_per_event or sends_no_fragments parameters play a role in advanced memory management?
I suspect not. So at protoDUNE, we'll need to explicitly set max_fragment_size_bytes: 0 in the WIB config files. And, we'll need to multiply the single FELIX fragment size by five for the ohFelix BoardReaders, which generate 5 fragments per event. Does this sound right?
#7 Updated by Kurt Biery 9 months ago
In parallel, I've confirmed that generated_fragments_per_event: 5 in the ohFelix config files does have an effect.
The part that tripped me up was having to specify a max_fragment_size_bytes value at all in the WIB*.fcl files when generated_fragments_per_event: 0 or sends_no_fragments: true was set. But that was trivial to add, and it's now included in my "next candidate" config.
#8 Updated by Kurt Biery 9 months ago
Based on my tests with the artdaq-demo on the mu2edaq cluster and the tests at protoDUNE, my sense is that these code changes are ready to be merged into develop.
Should I mark this issue Reviewed and do that merge, or do you want to review the changes that I made to the pdune_swtrig_DFO and mediumsystem_with_routing_master configs first?