Proposal for data slimming in MCC84 Nov 2017

Motivation

In order to reduce storage volume and drastically decrease the time needed to process data and MC samples, MicroBooNE is planning to drop several data products at the Reco2 stage. The current event sizes and computing model will not allow for the processing of events in time for Neutrino2018.

Brief version of the proposal

The following data products will be removed from the Reco2 data files: Raw digits and Reco1 Filtered Raw Digits.

The following data products will be removed from the Reco2 MC files: Raw digits from Detsim, Filtered Raw Digits and MCHits from Reco1, and SimChannels from G4.

In addition to dropping those products from Reco2 MC files, associations (along with metadata) will be created between the reco hits and MCParticles. These can be used within AnalysisTree and MCTruthMatching_module in lieu of the backtracker, which will no longer work.

The specific names of the data products and fcl parameters are given at the bottom of this page.

Data Product volumes and reduction factors

All numbers are quoted in MiB/evt.

| MCC Version | Def Reco2 Size | Raw Digit | SimChan G4 | MCHit | Reco1 Filtered Raw Digit | Slim Reco2 | Ratio |
| MCC 8.3     | 190.2          | 40.9      | 32.3       | 32.0  | 27.3                     | 58.6       | 0.308 |
| MCC 8.4     | 188.9          | 40.8      | 31.4       | 31.0  | 27.2                     | 58.5       | 0.309 |
| Data 8.4    | 85.6           | 45.1      | N/A        | N/A   | 24.7 (filtered raw)      | 15.7       | 0.184 |
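
The Ratio column can be cross-checked from the per-product sizes. The Python sketch below assumes that the slimmed size is roughly the default Reco2 size minus the dropped products; the dictionary keys and function names are illustrative only and are not data product labels. The numbers are copied from the table, so results can differ from the tabulated ratios in the last digit because of rounding.

```python
# Sizes in MiB/evt, copied from the table above.
# "dropped" lists the products removed for that sample (illustrative grouping).
SAMPLES = {
    "MCC 8.3": {"reco2": 190.2, "dropped": [40.9, 32.3, 32.0, 27.3], "slim": 58.6},
    "MCC 8.4": {"reco2": 188.9, "dropped": [40.8, 31.4, 31.0, 27.2], "slim": 58.5},
    "Data 8.4": {"reco2": 85.6, "dropped": [45.1, 24.7], "slim": 15.7},
}


def slim_ratio(sample):
    """Slim Reco2 size divided by the default Reco2 size."""
    return sample["slim"] / sample["reco2"]


def estimated_slim(sample):
    """Default Reco2 size minus the dropped products (an assumption, not a measurement)."""
    return sample["reco2"] - sum(sample["dropped"])


for name, sample in SAMPLES.items():
    print(f"{name}: ratio {slim_ratio(sample):.3f}, "
          f"estimated slim size {estimated_slim(sample):.1f} MiB/evt")
```

The simple subtraction lands within about 1 MiB/evt of the quoted slim sizes, which suggests the four (MC) or two (data) dropped products account for essentially all of the reduction.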

Data storage model for MicroBooNE

Right now these are just the thoughts of Michael Kirby. The current resources that are available to MicroBooNE for writing data to tape storage are T10KD drives and tapes. There are 19 total drives of that type on site. The current status (read/write/seek) of drives can be seen here:

http://www-stken.fnal.gov/cgi-bin/active_volumes.sh (only available on site or through FNAL VPN)

The expectation is that these drives can read or write at approximately 125 MB/s, and that MicroBooNE should not expect to have more than 4 drives available in steady state, although more access could be requested for campaigns. For running at 1 Hz, the data volume for the raw and swizzled data is based on 32 MiB/evt for ubdaq files and 30 MiB/evt for swizzled files. This is 62 MiB/s at 1 Hz DAQ readout and occupies a drive half of the time. If we add running reconstruction on those same events (85 MiB/evt), steady-state swizzling and reconstruction will occupy one tape drive completely at 147 MiB/s. This leaves 3 drives for reading and writing files.

During reprocessing, expect 1 tape drive to be occupied staging files and 1 tape drive to be writing newly processed files back to tape (3 out of 4 drives are now accounted for). The remaining drive would be used to write MC samples back into permanent storage. For an event size of ~190 MiB/evt, this drive will have a throughput of no greater than 1 evt/s, for a storage rate of at most 86400 evts/day. Even if the number of drives available to MicroBooNE goes to 8, and therefore 5 drives for MC, the time to process and write 10 million MC events is 23 days. This is 2 months for the full reprocessing of the MCC8.0 samples into MCC8.4.
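
The 23-day figure follows from the 1 evt/s per-drive ceiling. A minimal Python sketch of that arithmetic is given here; the function names are illustrative, and the second helper additionally assumes that MB and MiB can be treated as equal, which is only good to about 5%.

```python
SECONDS_PER_DAY = 86_400


def days_to_write(n_events, n_drives, evts_per_sec_per_drive=1.0):
    """Days to write n_events when each drive sustains evts_per_sec_per_drive."""
    if n_drives <= 0 or evts_per_sec_per_drive <= 0:
        raise ValueError("n_drives and evts_per_sec_per_drive must be positive")
    return n_events / (n_drives * evts_per_sec_per_drive * SECONDS_PER_DAY)


def bandwidth_limited_rate(drive_rate_mb_s, evt_size_mib):
    """Events per second one drive can sustain (treats MB and MiB as equal)."""
    return drive_rate_mb_s / evt_size_mib


# 10 million MC events on 5 drives at the 1 evt/s ceiling quoted in the text.
print(round(days_to_write(10_000_000, 5), 1))  # 23.1

# Same sample if each drive is limited by 125 MB/s at ~190 MiB/evt instead.
rate = bandwidth_limited_rate(125.0, 190.0)
print(round(days_to_write(10_000_000, 5, rate), 1))
```

Since 1 evt/s is an upper bound, the 23 days should be read as a best case; the bandwidth-limited variant gives a longer estimate for full-size events.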

Data Keep Up

| Data Stream  | Evt Rate | Evt Size | Data Volume | Tape Drives |
| Raw DAQ      | 1 Hz     | 32 MiB   | 32 MiB/s    | 0.26        |
| Raw Swizzle  | 1 Hz     | 30 MiB   | 30 MiB/s    | 0.25        |
| Reco Keep up | 1 Hz     | 86 MiB   | 86 MiB/s    | 0.688       |
| Total Data   | 1 Hz     | 146 MiB  | 146 MiB/s   | 1.17        |
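
The Tape Drives column is the data rate divided by the per-drive rate. The Python sketch below reproduces that calculation under the assumption of 125 MB/s per drive, treating MiB and MB as equal; the names are illustrative. Because the inputs are the rounded event sizes from the table, the results agree with the tabulated fractions only to within a few percent.

```python
DRIVE_RATE = 125.0  # MB/s per T10KD drive, as quoted in the text (MiB ~ MB assumed)

# Event sizes in MiB/evt for each stream, copied from the table above.
STREAMS = {"Raw DAQ": 32.0, "Raw Swizzle": 30.0, "Reco Keep up": 86.0}


def drive_fraction(evt_size_mib, rate_hz=1.0, drive_rate=DRIVE_RATE):
    """Fraction of one tape drive needed to keep up with a stream."""
    return evt_size_mib * rate_hz / drive_rate


for name, size in STREAMS.items():
    print(f"{name:12s} {drive_fraction(size):.3f}")

# Sum of the three streams, i.e. the total keep-up load at 1 Hz.
total_size = sum(STREAMS.values())
print(f"{'Total':12s} {drive_fraction(total_size):.3f}")
```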

Data Reco

| Sample               | Num Evts | Evt Size | Data Volume (MiB) | Days to R or W with 1 Drive |
| BNB 5E19 Read        | 547616   | 40 MiB   | 21904640          | 1.90                        |
| BNB 5E19 Write       | 547616   | 85 MiB   | 46547360          | 4.31                        |
| Slim BNB 5E19 Write  | 547616   | 15 MiB   | 8214240           | 0.79                        |
| Run 1 BNB Read       | 2001827  | 40 MiB   | 80154369          | 7.42                        |
| Run 1 BNB Write      | 2001827  | 85 MiB   | 170155295         | 15.8                        |
| Slim Run 1 BNB Write | 2001827  | 15 MiB   | 30027405          | 2.78                        |
| Run 2 BNB Read       | 5937997  | 30 MiB   | 178139910         | 16.5                        |
| Run 2 BNB Write      | 5937997  | 85 MiB   | 504729745         | 46.7                        |
| Slim Run 2 BNB Write | 5937997  | 15 MiB   | 89069955          | 8.6                         |

MC

| Sample                          | Num Evts | Evt Size | Data Volume (MiB) | Days to R or W with 1, 2, or 3 Drives |
| BNB + Cosmics DetSim Read       | 10000000 | 120 MiB  | 1200000000        | 111, 55.6, 37.3                       |
| BNB + Cosmics Reco2 Write       | 10000000 | 190 MiB  | 1900000000        | 176, 88, 59.7                         |
| Slim BNB + Cosmics Reco2 Write  | 10000000 | 58 MiB   | 580000000         | 54, 27, 18                            |
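
The days in the Data Reco and MC tables are consistent with dividing the data volume by the drive bandwidth. As a rough cross-check, the Python sketch below assumes 125 MB/s per drive, treats MiB and MB as equal, and assumes that throughput scales linearly with the number of drives; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400
DRIVE_RATE = 125.0  # MB/s per drive, as quoted in the text (MiB ~ MB assumed)


def transfer_days(n_events, evt_size_mib, n_drives=1, drive_rate=DRIVE_RATE):
    """Days to read or write n_events of evt_size_mib each using n_drives."""
    if n_drives <= 0:
        raise ValueError("n_drives must be positive")
    volume_mib = n_events * evt_size_mib
    return volume_mib / (drive_rate * n_drives * SECONDS_PER_DAY)


# The three MC rows: 10 million events at 120, 190, and 58 MiB/evt.
MC_ROWS = [("DetSim read", 120), ("Reco2 write", 190), ("Slim Reco2 write", 58)]
for label, size in MC_ROWS:
    days = [round(transfer_days(10_000_000, size, n), 1) for n in (1, 2, 3)]
    print(f"{label:18s} {days}")
```

With one drive this gives roughly 111, 176, and 54 days for the three MC rows, so slimming the Reco2 output cuts the write time by about a factor of three.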

Proposed dropped data products from MCC8.4 Reco2 files

For MC files

outputs:
{
 out1:
 {
   module_type: RootOutput
   fileName:    "drop_wires.root" #default file name, can override from command line with -o or --output
   dataTier:    "reconstructed" 
   outputCommands: ["keep *_*_*_*",  "drop raw::RawDigits_daq__Detsim", "drop sim::SimChannels_largeant__G4", "drop sim::MCHitCollections_mchitfinder__McRecoStage1", "drop raw::RawDigits_wcNoiseFilter__McRecoStage1"]
   compressionLevel: 1
 }
}

For Data files

outputs:
{
 out1:
 {
   module_type: RootOutput
   fileName:    "drop_wires.root" #default file name, can override from command line with -o or --output
   dataTier:    "reconstructed" 
   outputCommands: ["keep *_*_*_*",  "drop raw::RawDigits_daq__Swizzler", "drop raw::RawDigits_wcNoiseFilter__DataRecoStage1"]
   compressionLevel: 1
 }
}
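
Each entry in outputCommands is an action (keep or drop) followed by a product specification of the form friendlyClassName_moduleLabel_instanceName_processName; an empty instance name appears as the double underscore. As a reading aid only, the Python sketch below splits such strings into their fields. The function and field names are illustrative and are not part of art or LArSoft, and the sketch assumes that no individual field contains an underscore.

```python
from collections import namedtuple

# Illustrative container for the pieces of one outputCommands entry.
OutputCommand = namedtuple(
    "OutputCommand", ["action", "class_name", "module_label", "instance", "process"]
)


def parse_output_command(command):
    """Split e.g. 'drop raw::RawDigits_daq__Detsim' into its five fields."""
    action, spec = command.split(maxsplit=1)
    if action not in ("keep", "drop"):
        raise ValueError(f"unexpected action: {action!r}")
    fields = spec.split("_")
    if len(fields) != 4:
        raise ValueError(f"expected 4 underscore-separated fields in {spec!r}")
    return OutputCommand(action, *fields)


# The MC outputCommands listed above.
MC_COMMANDS = [
    "keep *_*_*_*",
    "drop raw::RawDigits_daq__Detsim",
    "drop sim::SimChannels_largeant__G4",
    "drop sim::MCHitCollections_mchitfinder__McRecoStage1",
    "drop raw::RawDigits_wcNoiseFilter__McRecoStage1",
]

for cmd in MC_COMMANDS:
    print(parse_output_command(cmd))
```

Read this way, the MC configuration keeps everything and then drops four products by class, producing module, and process name, while the data configuration drops the two raw digit collections from the Swizzler and DataRecoStage1 processes.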