Feature #18319

Two requests regarding the file index field in output filenames

Added by Kurt Biery over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: I/O
Target version:
Start date: 11/16/2017
Due date:
% Done: 100%
Estimated time: 8.00 h
Spent time:
Scope: Internal
Experiment: -
SSI Package: art
Duration:

Description

Based on our experience with the protoDUNE DAQ, we would like to make two requests:
  1. We would like the ability to zero-pad the file index field in a filename
    • (e.g. artdaqdemo_r000032_sr01_0019_dl1.root, using "%04#" in the filename pattern)
  2. We would like the file index to reset to one at the start of each run
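
For reference, the zero-padding in request 1 could be sketched as below. This is a minimal Python illustration, not art's implementation: the function name `expand_file_index` is hypothetical, and it assumes the optional digits between "%" and "#" give the minimum field width, as in the "%04#" example.

```python
import re

# Matches plain "%#" as well as a width-qualified form such as "%04#".
# The width-qualified form is the syntax proposed in this request.
_INDEX_FIELD = re.compile(r"%(\d*)#")


def expand_file_index(pattern, index):
    """Replace every file-index field in `pattern` with `index`.

    Hypothetical helper for illustration only; it handles just the
    file-index field and leaves all other placeholders untouched.
    """
    def _substitute(match):
        # An empty width (plain "%#") means no padding.
        width = int(match.group(1) or 0)
        # zfill pads on the left with zeros and never truncates.
        return str(index).zfill(width)

    return _INDEX_FIELD.sub(_substitute, pattern)


print(expand_file_index("artdaqdemo_r000032_sr01_%04#_dl1.root", 19))
```

With index 19 this prints artdaqdemo_r000032_sr01_0019_dl1.root. An index wider than the requested field (e.g. 12345 with "%04#") is written in full rather than truncated.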

I've attached modified art source code files (based on art tag v2_06_03) that implement these two changes for your consideration.

I believe that the changes are straightforward, but I'll be glad to meet to discuss any part of the requests or the sample implementation.

One note: I was unable to verify the typical art behavior of appending "_1" to the filename when an attempt is accidentally made to write a file whose name is identical to that of an existing file. This may simply be because I don't remember what we decided about how duplicate output files should be handled.

Thanks,
Kurt

rootOutputConfigurationTools.cc (2.94 KB) - rootOutputConfigurationTools.cc in art/Framework/IO/Root/detail - Kurt Biery, 11/16/2017 12:58 PM
FileStatsCollector.h (3.1 KB) - FileStatsCollector.h in art/Framework/IO - Kurt Biery, 11/16/2017 12:58 PM
FileStatsCollector.cc (2.82 KB) - FileStatsCollector.cc in art/Framework/IO - Kurt Biery, 11/16/2017 12:58 PM

History

#1 Updated by Christopher Green over 2 years ago

  • Status changed from New to Feedback

Would you want the file index resetting to be an option, or a behavior change? If the latter, we will need to bring the matter to the stakeholders. The formatting option should be doable without changing existing behavior.

As to your final comment, the index-addition behavior (by request) only occurs when an output file would overwrite a previous output file of the same name in the same art invocation. Output files from previous jobs (e.g. for tests and development cycle activities) are overwritten.

#2 Updated by Christopher Green over 2 years ago

  • Category set to I/O

#3 Updated by Kurt Biery over 2 years ago

Hi Chris,
It is fine for the resetting of the file-index number to be an option (presumably enabled with a configuration parameter).

I believe that the tests that I ran which did not demonstrate the "_1" filename suffix were within a single art job. I don't know how my candidate code changes might have affected that behavior, but I'll try to run some more tests with unmodified vs. modified code.
Thanks!
Kurt

#4 Updated by Kyle Knoepfel over 2 years ago

Kurt, your last statement is correct: we entirely removed the automatic addition of a '_1' suffix if a file of the same name already exists. However, as a safeguard against losing data, if file-switching has been enabled (by specifying the fileProperties table in the RootOutput configuration), it is a configuration error not to specify the %# pattern in the filename.
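
The rule above can be illustrated with a small sketch. This is Python written for illustration, not art's validation code: the function name `check_output_filename` and its boolean argument are hypothetical, and accepting a width-qualified field such as "%04#" is an assumption based on the request in this issue.

```python
import re

# Plain "%#", or (assumed) a zero-padded variant such as "%04#".
_INDEX_FIELD = re.compile(r"%\d*#")


def check_output_filename(file_name, file_switching_enabled):
    """Raise ValueError if file switching is on but no index field is present.

    Hypothetical check: without a file-index field, every switched file
    would get the same name and earlier output would be overwritten.
    """
    if file_switching_enabled and not _INDEX_FIELD.search(file_name):
        raise ValueError(
            f"file switching is enabled but '{file_name}' has no %# field"
        )


# Accepted: switching enabled and an index field is present.
check_output_filename("out_r%r_%#.root", True)
# Accepted: no switching, so a fixed name is fine.
check_output_filename("out.root", False)
```

Calling check_output_filename("out.root", True) raises ValueError, mirroring the configuration error described above.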

#5 Updated by Kyle Knoepfel over 2 years ago

There is an issue regarding resetting the index of the file that is worth bringing up. Consider the situation where events are presented out of order. If the output module were configured to switch to a new file after 1000 events, reasonable behavior would be to create output files that look like:

2000 events from Run 1:
r1_1.root
r1_2.root

2000 events from Run 2:
r2_1.root
r2_2.root

2000 more events from Run 1:
r1_3.root
r1_4.root

Note that the filenames for the "2000 more events from Run 1" do not have an index reset to 1; they continue from the last index used for Run 1. Providing this behavior requires extra caching that would not be necessary if events were presented in order: essentially a table of the prefixes (e.g. "r1" or "r2") and their associated indices. Even with 1000 output files in the job, this would not lead to significant bloat. The question, however, is whether such caching would be acceptable to artdaq, or whether you would prefer an option to disable it under the assumption that events from a given Run/SubRun will always be together.
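
The prefix table described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the art implementation: the class name `FileIndexTable` and the "<prefix>_<index>.root" naming are hypothetical and chosen to match the example filenames.

```python
class FileIndexTable:
    """Remembers the last file index used for each filename prefix."""

    def __init__(self):
        # Maps a prefix such as "r1" to the last index handed out for it.
        self._last_index = {}

    def next_filename(self, prefix):
        # A prefix seen for the first time starts at 1; a prefix seen
        # before continues from where it left off, even if files for
        # other prefixes were written in between.
        index = self._last_index.get(prefix, 0) + 1
        self._last_index[prefix] = index
        return f"{prefix}_{index}.root"


table = FileIndexTable()
# Two files from Run 1, two from Run 2, then two more from Run 1.
names = [table.next_filename(p) for p in ["r1", "r1", "r2", "r2", "r1", "r1"]]
print(names)
```

This prints the six filenames from the example in order: r1_1.root, r1_2.root, r2_1.root, r2_2.root, r1_3.root, r1_4.root. The table holds one integer per distinct prefix, which is why even a job with 1000 output files adds little overhead.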

#6 Updated by Kurt Biery over 2 years ago

Hi Kyle,
I believe that the run number caching that you describe would not be strictly needed in online running, since there should be external controls against earlier run numbers being re-used.
However, if such a caching mechanism will be implemented for non-online use cases, it seems like a useful safety net that we would likely take advantage of in online environments.
As to whether it would need a disable option - that doesn't seem to be necessary.
Thanks,
Kurt

#7 Updated by Kyle Knoepfel over 2 years ago

  • Status changed from Feedback to Assigned
  • Assignee set to Kyle Knoepfel
  • Target version set to 2.10.00
  • Estimated time set to 8.00 h

#8 Updated by Kyle Knoepfel over 2 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100
  • SSI Package art added

Implemented with commits:

#9 Updated by Kyle Knoepfel over 2 years ago

  • Status changed from Resolved to Closed

