Two requests regarding the file index field in output filenames
- we would like the ability to zero-pad the file index field in a filename (e.g. artdaqdemo_r000032_sr01_*0019*_dl1.root using "%04#" in the filename pattern)
- we would like the file index to reset to one at the start of each run
I've attached modified art source code files (based on art tag v2_06_03) that implement these two changes for your consideration.
I believe that the changes are straightforward, but I'll be glad to meet to discuss any part of the requests or the sample implementation.
One note: I was unable to reproduce the typical art behavior of appending "_1" to the filename when an accidental attempt is made to write a file whose name is identical to that of an existing file. This may simply be because I don't remember what we decided about how duplicate output files should be handled.
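To make the zero-padding request concrete, here is a minimal sketch of how a "%04#" placeholder could be expanded. It is not the attached code and not art's implementation; the helper name expandFileIndex and the choice of a regex are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <iomanip>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of art): replaces every "%#" or "%0N#"
// placeholder in `pattern` with `index`, zero-padded to width N when given.
std::string expandFileIndex(std::string const& pattern, unsigned index)
{
  // Group 2 captures the requested width N in "%0N#"; it is unmatched for "%#".
  static std::regex const placeholder{R"(%(0(\d+))?#)"};

  std::string result;
  std::size_t last = 0;
  auto const end = std::sregex_iterator();
  for (auto it = std::sregex_iterator(pattern.begin(), pattern.end(), placeholder);
       it != end; ++it) {
    auto const& m = *it;
    auto const pos = static_cast<std::size_t>(m.position());
    auto const len = static_cast<std::size_t>(m.length());

    // Copy the literal text between the previous placeholder and this one.
    result.append(pattern, last, pos - last);

    std::ostringstream os;
    if (m[2].matched) {
      os << std::setfill('0') << std::setw(std::stoi(m[2].str()));
    }
    os << index;
    result += os.str();

    last = pos + len;
  }
  // Copy whatever follows the final placeholder.
  result.append(pattern, last, std::string::npos);
  return result;
}
```

Under these assumptions, expandFileIndex("artdaqdemo_r000032_sr01_%04#_dl1.root", 19) would yield artdaqdemo_r000032_sr01_0019_dl1.root, while a plain "%#" would keep the current unpadded form.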
#1 Updated by Christopher Green almost 3 years ago
- Status changed from New to Feedback
Would you want the file index resetting to be an option, or a behavior change? If the latter, we will need to bring the matter to the stakeholders. The formatting option should be doable without changing existing behavior.
As to your final comment, the index-addition behavior (by request) only occurs when an output file would overwrite a previous output file of the same name in the same art invocation. Output files from previous jobs (e.g. for tests and development cycle activities) are overwritten.
#3 Updated by Kurt Biery almost 3 years ago
It is fine for the resetting of the file-index number to be an option (presumably enabled with a configuration parameter).
I believe that the tests that I ran which did not demonstrate the "_1" filename suffix were within a single art job. I don't know how my candidate code changes might have affected that behavior, but I'll try to run some more tests with unmodified vs. modified code.
#4 Updated by Kyle Knoepfel almost 3 years ago
Kurt, your last statement is correct: we entirely removed the automatic addition of a '_1' suffix if a file of the same name already exists. However, as a safeguard against losing data, if file-switching has been enabled (via specifying the fileProperties table in the RootOutput configuration), it is a configuration error to not specify the %# pattern in the filename.
#5 Updated by Kyle Knoepfel almost 3 years ago
There is an issue regarding resetting the index of the file that is worth bringing up. Consider the situation where events are presented out of order. If the output module were configured to switch to a new file after 1000 events, reasonable behavior would be to create output files that look like:
2000 events from Run 1:
2000 events from Run 2:
2000 more events from Run 1:
Note that the filenames for the "2000 more events from Run 1" do not have an index reset to 1. Providing this behavior does require extra caching that would not be necessary if events were presented in order. The extra caching would basically be to keep a table of the prefixes (e.g. "r1" or "r2") and their associated indices. Even if you had 1000 output files in the job, this would not lead to significant bloat. However, the question is whether such caching would be acceptable to artdaq, or whether you would prefer an option to disable the extra caching under the assumption that events from a given Run/SubRun will always be together.
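The prefix table described above could look roughly like the following. This is only a sketch of the idea, not art's implementation; the class name FileIndexCache and its interface are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical sketch (not part of art): remembers the last file index
// handed out for each filename prefix, e.g. "r1" or "r2".
class FileIndexCache {
public:
  // Returns the next index for `prefix`. A prefix not seen before in this
  // job starts at 1; a prefix seen earlier continues where it left off.
  unsigned next(std::string const& prefix)
  {
    // operator[] value-initializes a missing entry to 0 before the increment.
    return ++indices_[prefix];
  }

private:
  std::map<std::string, unsigned> indices_;
};
```

With a switch every 1000 events, the first 2000 events from Run 1 would receive indices 1 and 2 for prefix "r1", the 2000 events from Run 2 would receive 1 and 2 for "r2", and the later 2000 events from Run 1 would continue with 3 and 4 rather than resetting. The memory cost is one map entry per distinct prefix, which is why even a job with 1000 output files would carry little overhead.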
#6 Updated by Kurt Biery almost 3 years ago
I believe that the run number caching that you describe would not be strictly needed in the online environment, since there should be external controls against earlier run numbers being re-used.
However, if such a caching mechanism will be implemented for non-online use cases, it seems like a useful safety net that we would likely take advantage of in online environments.
As to whether it would need a disable option - that doesn't seem to be necessary.