Necessary Maintenance #18997

Verify that the run and subrun level data products are written during the proper art phase

Added by Gianluca Petrillo over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Category: Data products
Target version: -
Start date: 02/14/2018
Due date:
% Done: 100%
Estimated time:
Spent time:
Experiment: LArSoft
Duration:

Description

Data products associated with a run or subrun may be written (that is, put() into the art principal) either at the beginning or at the end of their reference period (run or subrun).
Each data product is associated with a range of events: products written at the beginning of the period are associated with the entire period, while those written at the end are associated with the exact range of events actually visited.
While the preferred practice is to write the product at the end of the period, there may be reasons to choose either behaviour.

The range of associated events also matters when processing multiple files, each with its own version of the data product, when several of them refer to the same period. An example is a summary data product collecting the POT of a run: one file [A] might contain a summary for events 1 to 50 of run 1, another file [B] a summary for events 51 to 100 of the same run 1, and a third file [C] again events 1 to 50 of run 1.
When processing [A] and [B] together, the summary data should be aggregated into a record covering run 1 from event 1 to 100. When processing [A] and [C] together, the two records should be identical and either one can be used (although in this example the safer choice might be to forbid processing those two files together). If the information were saved as associated with the whole run, there would be no way to do any of that.
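
The rules above can be illustrated with a minimal stand-alone sketch (C++17). It is not art code: EventRange, POTRecord and combine() are hypothetical names introduced only for this example, the POT values are made up, and the sketch handles only identical or adjacent ranges, which is simpler than the bookkeeping art actually performs.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for illustration; these are not art classes.
struct EventRange {
  unsigned run;
  unsigned first;  // first event covered (inclusive)
  unsigned last;   // last event covered (inclusive)
};

struct POTRecord {
  EventRange range;
  double pot;
};

// Combines two summary records following the rules described above.
// Returns std::nullopt when the records cannot be combined safely.
std::optional<POTRecord> combine(POTRecord const& a, POTRecord const& b) {
  if (a.range.run != b.range.run) return std::nullopt;

  // Same range (files [A] and [C]): the records must agree; keep one copy.
  if (a.range.first == b.range.first && a.range.last == b.range.last) {
    if (a.pot != b.pot) return std::nullopt;
    return a;
  }

  // Adjacent ranges (files [A] and [B]): extend the range and sum the content.
  POTRecord const& lo = (a.range.first < b.range.first) ? a : b;
  POTRecord const& hi = (a.range.first < b.range.first) ? b : a;
  if (lo.range.last + 1 != hi.range.first) return std::nullopt;  // gap or overlap
  return POTRecord{{lo.range.run, lo.range.first, hi.range.last}, lo.pot + hi.pot};
}

// Worked example with the three files of the description (POT values are made up).
inline void combineExample() {
  POTRecord const A{{1, 1, 50}, 2.0};
  POTRecord const B{{1, 51, 100}, 3.0};
  POTRecord const C{{1, 1, 50}, 2.0};
  auto const ab = combine(A, B);
  assert(ab && ab->range.first == 1u && ab->range.last == 100u && ab->pot == 5.0);
  auto const ac = combine(A, C);
  assert(ac && ac->pot == 2.0);
}
```

If [A] and [B] had both been stored as covering the whole run, they would reach combine() with the same range but different content, and the sketch could not tell them apart from an inconsistent duplicate.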

This ticket is about verifying that the products in run and subrun from LArSoft code are put into art at the right time.


Related issues

Related to LArSoft - Bug #18943: Problem with sumdata::RunData aggregation (Closed, 02/09/2018)

Related to uBooNE code - Necessary Maintenance #18998: Verify the aggregation criteria for data product MuCS::MuCSDTOffset (New, 02/14/2018)

History

#1 Updated by Gianluca Petrillo over 2 years ago

  • Related to Bug #18943: Problem with sumdata::RunData aggregation added

#2 Updated by Gianluca Petrillo over 2 years ago

  • Status changed from New to Assigned
  • Assignee set to Gianluca Petrillo

I have identified the following run/subrun products:

Data product                      Level   Header
sumdata::RunData                  run     larcoreobj:source:larcoreobj/SummaryData/RunData.h
sumdata::POTSummary               subrun  larcoreobj:source:larcoreobj/SummaryData/RunData.h
std::vector<MuCS::MuCSDTOffset>   run     uboonecode:source:uboone/MuCS/MuCSDTOffset.h

The first two data products are in LArSoft. Proper aggregate() methods have been defined as the solution to issue #18943.
The other data product is effectively a std::vector, so there is no way to add an aggregate() function directly to the object. art provides a default aggregation, which concatenates the vectors. The author of MuCS::MuCSDTOffset (Matt Bass) should consider whether this is the desired behaviour. If not, art experts will need to advise on how to direct art to a custom aggregation free function.
That data product is written at the end of the run by the MuCSDT module.
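
To make the consequence of concatenation concrete, here is a small stand-alone sketch. It does not use art: DTOffset is a hypothetical stand-in for MuCS::MuCSDTOffset, aggregateByConcatenation() is a hypothetical helper that only mimics the default behaviour described above, and the values are made up.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for MuCS::MuCSDTOffset; the real class has its own content.
struct DTOffset {
  double offset;
};

// Mimics aggregation by concatenation: the entries of `from` are appended to `into`.
void aggregateByConcatenation(std::vector<DTOffset>& into,
                              std::vector<DTOffset> const& from) {
  into.insert(into.end(), from.begin(), from.end());
}

// Two fragments of the same run carrying the same two entries (made-up values).
inline void concatenationExample() {
  std::vector<DTOffset> fragment1{{0.5}, {1.5}};
  std::vector<DTOffset> const fragment2{{0.5}, {1.5}};
  aggregateByConcatenation(fragment1, fragment2);
  assert(fragment1.size() == 4);  // the entries now appear twice
}
```

Whether duplicated entries are acceptable depends on what the vector is meant to represent, which is the question left to the class author.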

#3 Updated by Gianluca Petrillo over 2 years ago

  • % Done changed from 0 to 70

The object sumdata::POTSummary is put() into each subrun by the modules GENIEGen (larsim:source:larsim/EventGenerator/GENIE/GENIEGen_module.cc), TGMuon (argoneutcode:source:TGMuon/TGMuon_module.cc) and BeamData (uboonecode:source:uboone/BeamData/BeamData_module.cc).
In all cases, the put() call happens at the end of the subrun.

No further action is needed.

#4 Updated by Gianluca Petrillo over 2 years ago

#5 Updated by Gianluca Petrillo over 2 years ago

  • % Done changed from 70 to 100

The data product sumdata::RunData is written by 24 modules. All of them write it at the beginning of the run, so the data product is associated with the whole run.
This is an acceptable solution because the purpose of the data product is to capture configuration that does not change within a run.

The aggregation method requires that the configuration of the different fragments being aggregated is consistent. This is also the correct behaviour, as having two different configurations in the same run is incorrect. If this causes problems with the Monte Carlo simulation, either this choice or the experiment workflow should be reconsidered.
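
The consistency requirement can be sketched as follows. This is only an illustration of the idea, not the actual sumdata::RunData code: RunConfig, its detectorName member and the consistent() helper are hypothetical, and the real implementation may check different fields and report errors differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical run-level configuration record; not the actual sumdata::RunData.
struct RunConfig {
  std::string detectorName;

  // Merges another fragment of the same run into this one.
  void aggregate(RunConfig const& other) {
    if (other.detectorName != detectorName) {
      throw std::runtime_error("RunConfig fragments disagree: '" + detectorName +
                               "' vs. '" + other.detectorName + "'");
    }
    // Consistent fragments carry the same information: nothing to add.
  }
};

// Returns whether `b` can be aggregated into a copy of `a`.
inline bool consistent(RunConfig a, RunConfig const& b) {
  try {
    a.aggregate(b);
    return true;
  } catch (std::runtime_error const&) {
    return false;
  }
}

// Matching fragments aggregate silently; mismatching ones are rejected.
inline void aggregateExample() {
  assert(consistent(RunConfig{"detA"}, RunConfig{"detA"}));
  assert(!consistent(RunConfig{"detA"}, RunConfig{"detB"}));
}
```

Failing loudly on a mismatch matches the reasoning above: two different configurations within one run indicate a problem upstream rather than something to merge.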

#6 Updated by Gianluca Petrillo over 2 years ago

All the instances of run and subrun products in LArSoft have been checked, and no further action was required.
I have also checked the experiment code, and opened a ticket for the only such data product found (issue #18998).

This concludes the work on this issue.

#7 Updated by Gianluca Petrillo over 2 years ago

  • Status changed from Assigned to Resolved

#8 Updated by Gianluca Petrillo over 2 years ago

  • Status changed from Resolved to Closed

