Idea #23979

The standard DAQInterface settings file should set advanced_memory_usage to true

Added by John Freeman 9 months ago. Updated 8 months ago.

Status: Reviewed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 01/31/2020
Due date:
% Done: 100%
Estimated time:
Experiment: -
Duration:

Description

One of Ron's ideas. Right now advanced_memory_usage (covered in the DAQInterface manual at https://cdcvs.fnal.gov/redmine/projects/artdaq-utilities/wiki/_memory_management_details_) is set to false in the sample $DAQINTERFACE_SETTINGS file. Furthermore, many of the simple_test_configs aren't configured to work with advanced_memory_usage, though some (demo, for example) are. The sample settings file should set advanced_memory_usage to true, and the simple_test_configs should be made to work with this new setting.

Associated revisions

Revision 8591c216 (diff)
Added by John Freeman 9 months ago

JCF: Issue #23979: in the sample settings file, toggle advanced_memory_usage from false to true

Revision 0cc35217 (diff)
Added by John Freeman 9 months ago

JCF: Issue #23979: modify the simple_test_configs so they work with advanced memory usage

Revision 0ea7e8a6 (diff)
Added by John Freeman 9 months ago

JCF: Issue #23979: perform the fhicl-dump on the documents before bookkeeping, rather than after

The motivation behind this is that advanced memory usage was failing
on mediumsystem_with_routing_master, since component_standard.fcl overrides
max_fragment_size_bytes for component01 via the line:

component01_standard.fragment_receiver.max_fragment_size_bytes: 8192

...but DAQInterface's AI isn't sophisticated enough to understand that
this override is intended solely for component01. By performing the
fhicl-dump before the bookkeeping, the X.Y.Z type qualification gets
stripped away, and bookkeeping will do what you want it to do.
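
To illustrate the override described in this commit message: a FHiCL qualified name such as X.Y.Z assigns to member Z of table Y inside table X, and fhicl-dump emits the document with that assignment already folded into the nested table. The sketch below mimics that resolution on plain Python dictionaries. It is not how fhicl-dump or DAQInterface is implemented; the function name apply_qualified_override and the starting value 1024 are hypothetical, chosen only for illustration.

```python
import copy


def apply_qualified_override(document, qualified_name, value):
    """Return a copy of `document` with `qualified_name` (e.g. "X.Y.Z") set to `value`.

    `document` is a nested dict standing in for a parsed FHiCL document.
    This is an illustrative stand-in, not the real FHiCL parser.
    """
    result = copy.deepcopy(document)
    *parents, leaf = qualified_name.split(".")
    table = result
    for name in parents:
        # Descend into each enclosing table, creating it if it is absent.
        table = table.setdefault(name, {})
    table[leaf] = value
    return result


# Hypothetical starting document; only the 8192 override comes from the commit message.
before = {
    "component01_standard": {
        "fragment_receiver": {"max_fragment_size_bytes": 1024}
    }
}
after = apply_qualified_override(
    before,
    "component01_standard.fragment_receiver.max_fragment_size_bytes",
    8192,
)
```

Once the override is resolved this way, a line-oriented tool only ever sees a plain `max_fragment_size_bytes: 8192` entry inside the table it belongs to, which is the effect the commit message attributes to running fhicl-dump first.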

Revision 9707ba60 (diff)
Added by John Freeman 9 months ago

JCF: Issue #23979: since the standard $DAQINTERFACE_SETTINGS file now has advanced_memory_usage set to true, eliminate the now-confusing max_event_size_bytes from the simple_test_configs

Revision 0d5cff3e (diff)
Added by John Freeman 9 months ago

JCF: Issue #23979: deprecate use of max_event_size_bytes in FHiCL documents when advanced_memory_usage is set to true, since this parameter is misleading as it will get overwritten with the value DAQInterface calculates

Revision 7ad92ec1 (diff)
Added by John Freeman 8 months ago

JCF: Issue #23979: come up with a more robust way to bookkeep the sender and receiver ranks in a RoutingMaster now that fhicl-dump occurs before, rather than after, the bookkeeping

Revision 2c25c433 (diff)
Added by John Freeman 8 months ago

JCF: Issue #23979: add tests for the enclosing table bookkeeping functions

Revision 443a52b5 (diff)
Added by John Freeman 8 months ago

JCF: Issue #23979: get enclosing_table_range and enclosing_table_name to use the same logic

Revision d9ae51bc (diff)
Added by John Freeman 8 months ago

JCF: Issue #23979: fix the logic in enclosing_table_range so it works with complex cases such as the RootDAQOut table in mediumsystem_with_routing_master

History

#1 Updated by John Freeman 9 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Resolved
Resolved with the current head of feature/23979_advanced_memory_usage_default, commit 0d5cff3e9869b8b0555bd134c2da51444595e3e8. A couple of comments:
  • I've added a requirement to advanced memory usage that max_event_size_bytes not appear in any of the FHiCL documents. The existence of this parameter would be misleading as DAQInterface will calculate and add this parameter where needed
  • The simple_test_config configurations now work with advanced_memory_usage, and would need to be modified in order to work without advanced_memory_usage
  • For reasons described in commit 0ea7e8a6ce3f31f32ca6a79afa6cb981907f6db1's comment, fhicl-dump is now run before, rather than after, the bookkeeping
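
The first comment above (no max_event_size_bytes in any FHiCL document under advanced memory usage) can be pictured as a simple scan. The following is a minimal sketch and not the check DAQInterface actually performs; the function name find_forbidden_parameter is hypothetical, and the comment handling is deliberately naive (it would mis-handle a `#` or `//` inside a quoted string).

```python
import re


def find_forbidden_parameter(fhicl_text, parameter="max_event_size_bytes"):
    """Return the 1-based line numbers on which `parameter` is assigned.

    Text after a '#' or '//' comment marker is ignored. This is a naive
    line-based scan, not a real FHiCL parser.
    """
    pattern = re.compile(r"\b" + re.escape(parameter) + r"\s*:")
    hits = []
    for lineno, line in enumerate(fhicl_text.splitlines(), start=1):
        code = line.split("#", 1)[0]
        code = code.split("//", 1)[0]
        if pattern.search(code):
            hits.append(lineno)
    return hits


def check_document(fhicl_text, advanced_memory_usage):
    """Raise ValueError if the parameter appears while advanced memory usage is on."""
    if not advanced_memory_usage:
        return
    hits = find_forbidden_parameter(fhicl_text)
    if hits:
        raise ValueError(
            "max_event_size_bytes must not appear when advanced_memory_usage "
            "is true (found on line(s) %s)" % ", ".join(str(n) for n in hits)
        )
```

Reporting line numbers rather than a bare yes/no makes it easier to find and delete the offending entries in a configuration.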

#2 Updated by Kurt Biery 8 months ago

I ran a validation test with mediumsystem_with_routing_master on the mu2edaq cluster, and the book-keeping of the RoutingMaster1.fcl file didn't seem to work.

I used an artdaq-demo system based on v3_07_02.

The command that I used to run the demo was the following:
  • sh ./run_demo.sh --config mediumsystem_with_routing_master --bootfile `pwd`/artdaq-utilities-daqinterface/simple_test_config/mediumsystem_with_routing_master/boot.txt --comps component01 component02 component03 component04 component05 component06 component07 component08 component09 component10 --runduration 40 --partition 0 --no_om

The differences between the resulting RoutingMaster1.fcl files with the develop and feature/23979_advanced_memory_usage_default branches were the following:

[biery@mu2edaq13 run_records]$ diff 12/RoutingMaster1.fcl 13/RoutingMaster1.fcl
5a6

19,21c20,21
< 10,
< 11,
< 12
---
> 2,
> 3
29,37c29
< 1,
< 2,
< 3,
< 4,
< 5,
< 6,
< 7,
< 8,
< 9
---
> 1
48a41


Not sure why there were different numbers of sender_ranks and receiver_ranks between the two different versions of the artdaq-utilities-daqinterface software...

#3 Updated by John Freeman 8 months ago

Good catch. This is fixed with commit 7ad92ec119a96bb756304b1728d6cfb01a4095ed on feature/23979_advanced_memory_usage_default. Essentially what was happening is that since the fhicl-dump is now being run before, rather than after, bookkeeping, the sender_ranks and receiver_ranks lists in the RoutingMaster FHiCL got spread across multiple lines, meaning that the bookkeeping substitution was failing.
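
As an illustration of the failure mode described here: a substitution that assumes the whole list sits on one line stops matching once fhicl-dump spreads the list over several lines. The sketch below shows one way to make such a substitution independent of line breaks. It is not the code from commit 7ad92ec1; the function name set_rank_list is hypothetical, and it assumes the list contains no nested brackets.

```python
import re


def set_rank_list(fhicl_text, key, ranks):
    """Replace the list assigned to `key`, whether it spans one line or many.

    The character class [^\\]] also matches newlines, so the pattern covers
    a list that fhicl-dump has written one element per line.
    """
    pattern = re.compile(r"(\b" + re.escape(key) + r"\s*:\s*)\[[^\]]*\]")
    replacement = "[" + ", ".join(str(rank) for rank in ranks) + "]"
    new_text, count = pattern.subn(lambda m: m.group(1) + replacement, fhicl_text)
    if count == 0:
        raise ValueError("no list found for key: " + key)
    return new_text


# A list laid out across several lines, as in a fhicl-dumped document.
dumped = "sender_ranks: [\n   0,\n   1\n]\nreceiver_ranks: [ 2 ]\n"
updated = set_rank_list(dumped, "sender_ranks", [0, 1, 2])
```

Raising an error when nothing matches means a silently skipped substitution, like the one found in the validation test above, is surfaced immediately.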

#4 Updated by Kurt Biery 8 months ago

Thanks for the fix.

With the updated code, I noticed that the raw data files are being written to /tmp instead of the directory that I've specified in the data_directory_override parameter in the settings_example file.

I can run some additional tests to try to help narrow down the problem, but I wanted to mention this now in case it might be obvious what is happening.

#5 Updated by John Freeman 8 months ago

The situation is that the enclosing_table_range function from utilities.py, called to find the enclosing table around the line "module_type: RootDAQOut", isn't robust enough to work as advertised on the fhicl-dumped version of the RootDAQOut module's table (here, taken from the demo config):

   normalOutput: {
      compressionLevel: 3
      fastCloning: false
      fileName: "/tmp/artdaqdemo_r%06r_sr%02s_%to_%#.root"
      fileProperties: {
         maxRuns: 1
         maxSubRuns: 1
      }
      module_type: "RootDAQOut"
   }

...whereas it was able to deal with the non-fhicl-dumped version:
  normalOutput: {
    module_type: RootDAQOut
    fileName: "/tmp/artdaqdemo_r%06r_sr%02s_%to_%#.root" 
    fileProperties: { maxSubRuns: 1 maxRuns: 1  }
    fastCloning: false
    compressionLevel: 3
  }

I'll figure out how to get it to handle the fhicl-dumped case...

#6 Updated by John Freeman 8 months ago

Fixed with commit d9ae51bc635f8c025636ee3023a665befb41709d. Now the enclosing_table_range function is able to handle scenarios more complex than "there are no tables lying between the token and the open brace of its enclosing table". I've gotten it to work with the normalOutput tables for both demo and mediumsystem_with_routing_master. It can also be tested via:

python $ARTDAQ_DAQINTERFACE_DIR/rc/control/utilities.py <name of FHiCL file> <name of token found inside of a table in the FHiCL file>

In a practical sense, among other things this means that the data directory override will work correctly.
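
The general idea of finding the table around a token when nested tables sit in between can be sketched with brace counting. This is not the implementation in utilities.py; the function name find_enclosing_table is hypothetical, and the sketch assumes braces never appear inside strings or comments.

```python
import re


def find_enclosing_table(fhicl_text, token):
    """Return (name, start_line, end_line) of the innermost table holding `token`.

    Line numbers are 1-based and inclusive. Nested tables between the token
    and the enclosing braces are skipped by tracking brace depth.
    """
    pos = fhicl_text.find(token)
    if pos < 0:
        raise ValueError("token not found: " + token)

    # Walk backward: each '}' starts a nested table to skip; the first '{'
    # seen at depth 0 opens the enclosing table.
    depth, start = 0, None
    for i in range(pos - 1, -1, -1):
        ch = fhicl_text[i]
        if ch == "}":
            depth += 1
        elif ch == "{":
            if depth == 0:
                start = i
                break
            depth -= 1
    if start is None:
        raise ValueError("token is not inside any table")

    # Walk forward: mirror image of the backward scan.
    depth, end = 0, None
    for i in range(pos, len(fhicl_text)):
        ch = fhicl_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                end = i
                break
            depth -= 1
    if end is None:
        raise ValueError("unbalanced braces after token")

    # The table name is the identifier just before "name: {".
    match = re.search(r"([A-Za-z_]\w*)\s*:\s*$", fhicl_text[:start])
    name = match.group(1) if match else None
    start_line = fhicl_text.count("\n", 0, start) + 1
    end_line = fhicl_text.count("\n", 0, end) + 1
    return name, start_line, end_line
```

With a fhicl-dumped layout in which a nested fileProperties table precedes the module_type line, the backward scan skips the nested table instead of stopping at its open brace, which is the case the original function could not handle.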

#7 Updated by Kurt Biery 8 months ago

Great, thanks.
Testing so far looks good.
One thing that I noticed is that, when using the mediumsystem_with_routing_master sample config, the multicast_interface_ip parameter in the component01.fcl file is no longer book-kept with these advanced memory management code changes.
Is this by design?
Thanks,

#8 Updated by Kurt Biery 8 months ago

John and I talked about the book-keeping of the component01::multicast_interface_ip parameter in the mediumsystem_with_routing_master sample configuration, and we agreed that the observed behavior with the code on this branch is actually an improvement: since component01 is a push-mode BoardReader, it won't be receiving Data Requests from the EventBuilders, so it doesn't need the multicast_interface_ip parameter to be set.

I consider this issue to now be verified, and I'll merge the code to the develop branch later this afternoon.

One request: please double-check the existing Wiki documentation on memory management. My sense is that the Non-Advanced Memory Usage section implies that the max_event_size_bytes parameters in the EB, DL, Dispatcher config files will be automatically filled in when advanced memory usage is turned off, but I suspect that this behavior has changed with the new code.

#9 Updated by Kurt Biery 8 months ago

  • Status changed from Resolved to Reviewed

