A std::bad_alloc exception is thrown when processing data/MC files
When trying to produce output ROOT data files (but not histogram files) with NOVASOFT, a std::bad_alloc exception is thrown:
%MSG-s ArtException: PostCloseFile 31-Oct-2011 17:10:17 CDT PostEndRun
cet::exception caught in art
---- EventProcessorFailure BEGIN
An exception occurred during current event processing
---- ScheduleExecutionFailure BEGIN
---- BadAlloc BEGIN
A std::bad_alloc exception occurred during a call to the module
ExploringData/ExploringData run: 12110 subRun: 0 event: 10
The job has probably exhausted the virtual memory available to the
---- BadAlloc END
Exception going through path mcana
---- ScheduleExecutionFailure END
cet::exception caught in EventProcessor and rethrown
---- EventProcessorFailure END
This can be replicated in NOVASOFT by doing the following:
nova -c job/trackmergejob.fcl /nova/data/mc/S11.07.27/singlenu/ndos/sim_singlenu_numubar_cc_ndos_nhc_none_10000_0.root
This is an MC file from an earlier tag. It has also been reported that, when opening files with the EventDisplay, the exception is accompanied by the message "no dictionary for class art::Ptr<rb::CellHit> is available". For example:
nova -c evd.fcl /nova/data/novaroot/NDOS/S11.07.27/000121/12117/cosmic/reco_r00012117_s01_t02_cosmic_S11.07.27.root
using the evd_services.fcl configuration file /nova/app/users/gsdavies/art/evd_services.fcl.
#1 Updated by Marc Paterno almost 8 years ago
- Category set to Infrastructure
- Status changed from New to Assigned
- Assignee set to Marc Paterno
- Target version set to 1.00.04
Gavin, Walter, Philippe and Marc have isolated what appears to be the source of the problem: a bug in ROOT. Philippe is working on a fix, while Walter and Marc are building an integration test that will verify the upcoming fix. The test must be run against an old data file, which we have generated.
We have good evidence that the bug is caused by reading an incorrect size for a collection (PtrVector<rb::CellHit>): the apparent size was 0x400000e6 (1,073,742,054, i.e. more than 1 billion elements). It manifests as a std::bad_alloc exception on NOvA's virtual machine and as a segmentation violation on oink, which has more memory than NOvA's VM.
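As a rough illustration of why a size like this fails, the following C++ sketch works through the arithmetic at compile time. It assumes a hypothetical 16 bytes per in-memory element (the real sizeof(art::Ptr<rb::CellHit>) may differ) and introduces a hypothetical plausibleCount() check of the kind a reader could apply before allocating; neither is taken from art or ROOT. It also notes that 0x400000e6 equals 0x40000000 | 0xe6, and 0x40000000 is the flag ROOT's TBufferFile ORs into streamed byte counts (kByteCountMask). This thread does not say so, but the value is at least consistent with a byte count of 230 having been read as the element count.

```cpp
#include <cstdint>

// Flag that ROOT's TBufferFile ORs into streamed byte counts (kByteCountMask
// in TBufferFile.cxx). Mirrored here for illustration only, not included
// from ROOT headers.
constexpr std::uint32_t kByteCountFlag = 0x40000000u;

// Collection size reported in this issue for PtrVector<rb::CellHit>.
constexpr std::uint32_t kApparentSize = 0x400000e6u;

// Bytes a container would request to hold `count` elements of `elemBytes` each.
constexpr std::uint64_t bytesRequested(std::uint64_t count, std::uint64_t elemBytes) {
  return count * elemBytes;
}

// True if the byte-count flag bit is set in a 32-bit word read from a buffer.
constexpr bool hasByteCountFlag(std::uint32_t word) {
  return (word & kByteCountFlag) != 0;
}

// The word with the byte-count flag bit cleared.
constexpr std::uint32_t withoutFlag(std::uint32_t word) {
  return word & ~kByteCountFlag;
}

// Hypothetical sanity check: a count is only plausible if, at the minimum
// on-disk size per element, that many elements fit in the bytes remaining
// in the input buffer.
constexpr bool plausibleCount(std::uint64_t count, std::uint64_t minBytesPerElement,
                              std::uint64_t bytesRemaining) {
  return minBytesPerElement == 0 || count <= bytesRemaining / minBytesPerElement;
}

// "More than 1 billion" elements.
static_assert(kApparentSize == 1073742054u, "decimal value of 0x400000e6");
// 0x400000e6 decomposes as the flag bit plus 0xe6 (230).
static_assert(hasByteCountFlag(kApparentSize) && withoutFlag(kApparentSize) == 230u,
              "0x400000e6 == 0x40000000 | 0xe6");
// At an assumed 16 bytes per element, roughly 16 GiB would be requested.
static_assert(bytesRequested(kApparentSize, 16) == 17179872864ull, "~16 GiB");
// Such a count cannot come from a 1 MiB buffer, even at 4 bytes per element.
static_assert(!plausibleCount(kApparentSize, 4, 1u << 20), "implausible count");
```

In a real reader, a check like plausibleCount() would run before reserve() or resize(), so a bad count produces a clear error instead of a multi-gigabyte allocation. Where that allocation fails, the result is std::bad_alloc; on a machine with more memory it may succeed, and the job can then crash later while filling the oversized collection, which would fit the different symptoms seen on the VM and on oink.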
#2 Updated by Walter E Brown almost 8 years ago
The good news: as of a few minutes ago, Philippe reports that he has been able to correct the ROOT code that gave rise to this issue.
The bad news: the corrected code has led to the discovery of another problem in ROOT, which must also be addressed as part of this issue. We are awaiting its resolution.
#3 Updated by Christopher Green almost 8 years ago
- Status changed from Assigned to Resolved
- Target version changed from 1.00.04 to 1.00.05
- % Done changed from 0 to 100
Version 1.00.05 works around this problem, so I am marking this issue resolved. When we move to a ROOT version that includes this bug fix, we will remove the workaround and test.