Feature #18058
Add to TFileService a mechanism to switch file during processing
Description
I would say that the general use case is simply a trigger (event number periodicity, a flag in the data, a signal coming from some control unit) that makes TFileService clone the current objects, save them in the open file, close the file, open a new one, and possibly reset the histograms and related objects.
A higher level description is that we have an online monitor made of a chain of producers and analyzers, and we want to switch to a new file every now and then instead of waiting for the run to finish to have one single file properly closed.
Our current approach in the data logger is to switch to a new file every 1000 events, and RootOutput allows it easily. A similar, or even more generic, behaviour would be desirable also outside of the data path, i.e. for online monitoring histograms and event display.
We would need this feature rather urgently, for ProtoDUNE ColdBox tests, and we are willing to help to implement it.
History
#1 Updated by Kyle Knoepfel about 3 years ago
Tonino, are you able to discuss this issue by video conference? If so, I propose sometime Monday morning (FNAL time)--perhaps 9 am--I believe that would be 3 pm CERN time once you take into account daylight savings ending this weekend in Europe but not the US.
#2 Updated by Antonino Sergi about 3 years ago
Hi Kyle,
since the usual daq meeting has been cancelled, it would be better at 10am, if it's ok for you.
#3 Updated by Kyle Knoepfel about 3 years ago
10 am should be okay.
Connection information
Join from PC, Mac, Linux, iOS or Android: https://fnal.zoom.us/j/6766408743
Or iPhone one-tap :
US: +16699006833,,6766408743# or +16465588656,,6766408743#
Or Telephone:
Dial(for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 646 558 8656
Meeting ID: 676 640 8743
International numbers available: https://fnal.zoom.us/zoomconference?m=n6YR0dydyhMd-ezhEBk0Imqh2uEuXhSI
Or an H.323/SIP room system:
H.323:
162.255.37.11 (US West)
162.255.36.11 (US East)
221.122.88.195 (China)
115.114.131.7 (India)
213.19.144.110 (EMEA)
202.177.207.158 (Australia)
209.9.211.110 (Hong Kong)
64.211.144.160 (Brazil)
69.174.57.160 (Canada)
Meeting ID: 676 640 8743
#4 Updated by Antonino Sergi about 3 years ago
The meeting I had in 10 minutes was cancelled, so if you prefer 9am I'm available. I'm already connected
#5 Updated by Kyle Knoepfel about 3 years ago
- Priority changed from Normal to Urgent
#6 Updated by Rob Kutschke about 3 years ago
I won't be able to make the meeting but here is my input.
Please think about the use case that only some modules in the job want this behaviour. Is it practical to support that or is this an all or none proposition?
I recommend that you minimize the scope of work done by art and maximize the scope of work done by user code that is registered with art and is called by art at the appropriate times. This gives end users maximum flexibility.
Here is one example of minimizing work within art. There can be many triggers for the close-file-open-new-file sequence. A few are straightforward to define and can be implemented within art: assert the trigger every N events, every Run, every Subrun, every input file, every output file on the output module with a specified modulelabel. If people want more complex triggers they should go into user written registered callbacks, not into art infrastructure. An example of this might be writing output files and resetting histograms when some particular histogram reaches a certain number of entries or when a statistically significant anomaly is discovered.
A second example of minimizing work within art. I think that art's job should be limited to opening and closing files; and poking user written callbacks to do the rest. I think it's far too much work to develop a configuration language that explains to art which of the possible actions it should take on which histograms in which modules.
As much as possible things like making copies of histograms in the new file or resetting histograms etc should be done by callbacks to user code. If possible the trigger should also be asserted by registering callbacks to user code.
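The division of labour suggested above can be sketched as follows. This is a minimal, self-contained illustration and not art code: FileSwitcher, onFileSwitch, everyNEvents, runSwitcherDemo and the "hist_N.root" naming are all hypothetical. The framework side only counts events, evaluates a trigger, and pokes the registered user callbacks; copying or resetting histograms would live entirely in those callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the framework side: it counts events, asks the
// trigger whether to switch, and then calls the registered user callbacks.
class FileSwitcher {
public:
  using Trigger = std::function<bool(std::size_t eventsInFile)>;
  using Callback = std::function<void(std::string const& newFileName)>;

  explicit FileSwitcher(Trigger trigger) : trigger_{std::move(trigger)} {}

  // User code registers what should happen after a new file is opened.
  void onFileSwitch(Callback cb) { callbacks_.push_back(std::move(cb)); }

  // Called once per processed event.
  void eventProcessed()
  {
    ++eventsInFile_;
    if (!trigger_(eventsInFile_)) return;
    ++fileIndex_;       // models "close the old file, open the next one"
    eventsInFile_ = 0;
    auto const name = "hist_" + std::to_string(fileIndex_) + ".root";
    for (auto const& cb : callbacks_) cb(name);
  }

private:
  Trigger trigger_;
  std::vector<Callback> callbacks_;
  std::size_t eventsInFile_{0};
  std::size_t fileIndex_{0};
};

// A simple built-in trigger: switch every n events.
inline FileSwitcher::Trigger everyNEvents(std::size_t n)
{
  return [n](std::size_t eventsInFile) { return eventsInFile >= n; };
}

// Demo: process 2500 events, switching every 1000 events. Returns the file
// names that the user callback was told about.
inline std::vector<std::string> runSwitcherDemo()
{
  FileSwitcher switcher{everyNEvents(1000)};
  std::vector<std::string> seen;
  switcher.onFileSwitch([&seen](std::string const& f) { seen.push_back(f); });
  for (int i = 0; i < 2500; ++i) switcher.eventProcessed();
  return seen;
}
```

A more complex trigger (for example, a histogram reaching a given number of entries) would simply be another function object passed to the constructor, with no change to the switching code.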
#7 Updated by Kyle Knoepfel about 3 years ago
- Status changed from New to Assigned
- Assignee set to Kyle Knoepfel
We believe we can provide an implementation that places reasonable demands on the user without introducing unreasonable entanglements within art.
#8 Updated by Kyle Knoepfel about 3 years ago
- Category set to Application
- Estimated time set to 16.00 h
- SSI Package art added
#9 Updated by Kyle Knoepfel about 3 years ago
- % Done changed from 0 to 90
This feature has been largely committed with commit art:5c502b4c. Some remaining issues may need to be discussed with the stakeholders and with the art team.
Documentation regarding how it can be enabled is forthcoming.
#10 Updated by Kyle Knoepfel about 3 years ago
- File CMakeLists.txt CMakeLists.txt added
- File TFileDirectory.cc TFileDirectory.cc added
- File TFileDirectory.h TFileDirectory.h added
- File TFileService_service.cc TFileService_service.cc added
- File TFileService.h TFileService.h added
- Scope changed from Internal to External
Please see the attached files that include versions of TFileService and TFileDirectory, which support the file-switching behavior you seek. In addition, there is a CMakeLists.txt file that gives you the appropriate library dependencies, assuming you're using mrb/cetbuildtools.
There will likely be some adjustments required whenever the updated TFileService is released with art 2.10.00. For that reason, I encourage you to rename your copy of the service to something else. This will avoid unintentional collisions with the officially-released version.
We spent considerable time trying to find a way to allow TFileService to seamlessly switch to a new output file without users needing to change their C++ source code. Unfortunately, since TFileService exposes various cached objects and allows users to cache pointers to ROOT objects, it was not feasible to do this given ROOT's memory management. Even though it may be possible to retain the pointer values for anything associated with a TFileService file, whenever a new file is opened, the directory pointer cached by the TObjects would be incorrect. Adjusting that pointer is difficult, as is recreating a new TFile object at the same memory address as the old file without causing memory corruption issues.
For that reason, we have added a new interface to TFileService that allows users to specify which actions should be performed after a file switch has occurred. This gives users the opportunity to update their pointers at the right time, but it still conceals the file-switching mechanics that are necessary for the framework to handle. See below for documentation.
Job configuration
To configure the TFileService to switch to a new output file for a given condition, the optional fileProperties table can be specified, just as it can be specified for any RootOutput module:
    ## Any parameters prefaced with '#' are optional.
    TFileService: {
       closeFileFast: true  # default
       fileName: <string>
       tmpDir: "<parent-path-of-filename>"  # default
     # fileProperties: {
     #
     #    maxEvents: 4294967295  # default
     #
     #    maxSubRuns: 4294967295  # default
     #
     #    maxRuns: 4294967295  # default
     #
     #    maxInputFiles: 4294967295  # default
     #
     #    ## Maximum size of file (in KiB)
     #
     #    maxSize: 2130706432  # default
     #
     #    ## Maximum age of output file (in seconds)
     #
     #    maxAge: 4294967295  # default
     #
     #    ## The 'granularity' parameter specifies the level at which
     #    ## an output file may be closed, and thereby the granularity
     #    ## of the file.  The following values are possible:
     #    ##
     #    ##     Value        Meaning
     #    ##    =======================================================
     #    ##    "Event"       Allow file switch at next Event
     #    ##    "SubRun"      Allow file switch at next SubRun
     #    ##    "Run"         Allow file switch at next Run
     #    ##    "InputFile"   Allow file switch at next InputFile
     #    ##    "Job"         File closes at the end of Job
     #    ##
     #    ## For example, if a granularity of "SubRun" is specified, but the
     #    ## output-module has reached the maximum events written to disk (as
     #    ## specified by the 'maxEvents' parameter), the output module will NOT
     #    ## switch to a new file until a new SubRun has been reached (or
     #    ## there are no more Events/SubRuns/Runs to process).
     #
     #    granularity: "Event"  # default
     # }
    }
The comment needs to be updated so that it does not refer to the "output module", but to a "service" instead.
N.B. If the fileProperties parameter is specified, then the only modules using TFileService that can be used in the job are those that implement the changes described in the next section.
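The interplay between a limit such as maxEvents and the granularity parameter can be illustrated with a small model. This is only a sketch of the rule stated in the configuration comment, not art's implementation: the names Granularity, FileProperties and shouldSwitch are hypothetical, and it assumes the boundaries nest in the order listed in the table (a new Run is also a new SubRun, and so on).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 'granularity' rule; not art's implementation.
// Values are ordered from the finest to the coarsest boundary.
enum class Granularity { Event, SubRun, Run, InputFile, Job };

struct FileProperties {
  std::uint32_t maxEvents{4294967295u};         // default from the printed configuration
  Granularity granularity{Granularity::Event};  // default from the printed configuration
};

// 'boundary' is the coarsest boundary being crossed at this point in the job.
// A switch happens only if the limit has been reached AND the boundary is at
// least as coarse as the configured granularity. With "Job" granularity the
// file is never switched mid-job; it is only closed at the end of the job.
inline bool shouldSwitch(FileProperties const& p,
                         std::uint32_t eventsWritten,
                         Granularity boundary)
{
  bool const limitReached = eventsWritten >= p.maxEvents;
  bool const boundaryAllowsSwitch =
    p.granularity != Granularity::Job &&
    static_cast<int>(boundary) >= static_cast<int>(p.granularity);
  return limitReached && boundaryAllowsSwitch;
}
```

With maxEvents set to 10 and granularity set to SubRun, the model does not switch at an Event boundary even after 10 events have been written; it switches at the next SubRun (or Run) boundary, which matches the example given in the configuration comment.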
Usage in modules
If you want your module to be usable whenever a user enables TFileService file-switching, then it must provide a file-switch callback. For example:
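    TestTFileService::TestTFileService(Parameters const& p)
      : EDAnalyzer{p}
    {
      ServiceHandle<TFileService> fs;
      // Re-create the ROOT objects after every file switch.
      fs->registerFileSwitchCallback(this, &TestTFileService::setRootObjects);
      // Create them once now; the callback is not invoked at construction.
      setRootObjects();
    }

    void
    TestTFileService::setRootObjects()
    {
      ServiceHandle<TFileService> fs;
      // h1_ is set to a histogram made in the currently open file.
      h1_ = fs->make<TH1F>("test1", "test histogram #1", 100, 0., 100.);
    }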
In this example, we have provided the 'void setRootObjects()' function as a callback to the TFileService. After a file switch has occurred, the callback will be invoked, resetting h1_ to a new histogram.
N.B. The callbacks are invoked only after a file switch has occurred; they are not invoked after the constructor of the module has been called. That is why setRootObjects is called in the constructor as well as registered as a callback with the TFileService.
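The timing described in the note above can be checked with a small stand-alone model. MockFileService and MockAnalyzer are hypothetical stand-ins written for this illustration; only the shape of registerFileSwitchCallback(this, &Class::method) is taken from the example above, and the real TFileService does considerably more.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for TFileService, just enough to show callback timing.
class MockFileService {
public:
  // Same call shape as in the module example: an object plus a member function.
  template <typename T>
  void registerFileSwitchCallback(T* obj, void (T::*method)())
  {
    callbacks_.push_back([obj, method] { (obj->*method)(); });
  }

  // What the framework would do after closing one file and opening the next.
  void switchFile()
  {
    for (auto const& cb : callbacks_) cb();
  }

private:
  std::vector<std::function<void()>> callbacks_;
};

// Hypothetical stand-in for a module that creates objects via the service.
class MockAnalyzer {
public:
  explicit MockAnalyzer(MockFileService& fs)
  {
    fs.registerFileSwitchCallback(this, &MockAnalyzer::setRootObjects);
    setRootObjects();  // registering does not invoke the callback
  }

  int timesBooked() const { return timesBooked_; }

private:
  // In a real module this would (re)make the histograms.
  void setRootObjects() { ++timesBooked_; }

  int timesBooked_{0};
};
```

Constructing a MockAnalyzer creates its objects exactly once (through the explicit call in the constructor); each later call to switchFile() creates them again through the registered callback.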
If the job is configured to switch TFileService files, and one of the modules does not register a callback, you will see an error like the following:
    ---- Configuration BEGIN
      A TFileService error occured while attempting to make a directory or ROOT object.
      File-switching has been enabled for TFileService. All modules must register
      a callback function to be invoked whenever a file switch occurs. The callback
      must ensure that any pointers to ROOT objects have been updated.
      No callback has been registered for module 'a1'.
      Contact artists@fnal.gov for guidance.
    ---- Configuration END
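The check behind this error can be pictured roughly as follows. This is a guess at the logic, written only from the error text above: GuardedFileService, registerCallbackFor, make(label) and makeThrows are hypothetical names, and the real service identifies the calling module itself instead of taking a label argument.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical model: when file switching is enabled, making an object on
// behalf of a module that has not registered a callback is an error.
class GuardedFileService {
public:
  explicit GuardedFileService(bool fileSwitchingEnabled)
    : switching_{fileSwitchingEnabled}
  {}

  void registerCallbackFor(std::string const& moduleLabel)
  {
    registered_.insert(moduleLabel);
  }

  // Stand-in for make<T>(...); only the guard is modelled here.
  void make(std::string const& moduleLabel) const
  {
    if (switching_ && registered_.count(moduleLabel) == 0) {
      throw std::runtime_error(
        "No callback has been registered for module '" + moduleLabel + "'.");
    }
  }

private:
  bool switching_;
  std::set<std::string> registered_;
};

// Helper for the demo: reports whether make() throws for the given label.
inline bool makeThrows(GuardedFileService const& fs, std::string const& label)
{
  try {
    fs.make(label);
  }
  catch (std::runtime_error const&) {
    return true;
  }
  return false;
}
```

In this model a job with file switching disabled never triggers the error, which is consistent with existing modules continuing to work when fileProperties is not specified.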
An unfortunate consequence
Because art does not directly interact with any objects created via the TFileService, art cannot be smart about when to open a ROOT file; art must be cautious. To that end, a ROOT file is created (a) whenever the TFileService is constructed, and (b) immediately after a file has been closed during a file switch. This means that in some cases, it is possible to get an extra file that is not necessarily desired. For example:
- TFileService is configured to switch after 10 events
- The job is configured to process only 10 events
- After the 10th event is processed, TFileService closes one file and opens a new one
- The endSubRun and endRun calls are made to all the modules
- The newly opened file is then closed, even though the histograms may be empty if none of them were filled during endSubRun or endRun.
This consequence is largely unavoidable. It may be possible to improve the situation, but there is no apparent way to do so right now without imposing additional restrictions on TFileService usage.
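The extra-file scenario can be reproduced with a toy simulation of rules (a) and (b). The function entriesPerFile is hypothetical and assumes one histogram fill per event and nothing filled during endSubRun or endRun; it only counts how many entries end up in each file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of histogram entries stored in each file the job produces.
inline std::vector<std::size_t> entriesPerFile(std::size_t nEvents,
                                               std::size_t maxEvents)
{
  std::vector<std::size_t> files;
  files.push_back(0);  // (a) a file is opened when the service is constructed
  std::size_t eventsInFile = 0;
  for (std::size_t i = 0; i < nEvents; ++i) {
    ++files.back();    // one fill per event goes into the currently open file
    if (++eventsInFile == maxEvents) {
      files.push_back(0);  // (b) a new file is opened right after the switch
      eventsInFile = 0;
    }
  }
  return files;  // the last file is closed at the end of the job
}
```

For 10 events with a switch after 10 events the model yields two files, the second one empty, which is the case described in the list above. For 25 events it yields three files with 10, 10 and 5 entries, so the empty file only appears when the job ends exactly on a switch.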
Future directions
There was a request for the art team to be cognizant of user-defined conditions that would trigger a file switch. Although we have not exposed any interface to allow that at this time, it would be straightforward to add the feature in a non-breaking way in the future.
#11 Updated by Kyle Knoepfel about 3 years ago
- Target version set to 2.10.00
#12 Updated by Kyle Knoepfel about 3 years ago
- Status changed from Assigned to Resolved
- % Done changed from 90 to 100
#13 Updated by Kyle Knoepfel about 3 years ago
- File TFileDirectory.p TFileDirectory.p added
The directory-setting when invoking callbacks was not being done properly. This has been fixed--please see the attached patch file.
#14 Updated by Kyle Knoepfel about 3 years ago
- Status changed from Resolved to Closed