Bug #2087

A std::bad_alloc exception is being thrown on data/mc files

Added by Gavin Davies almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Urgent
Assignee:
Category: Infrastructure
Target version:
Start date: 11/01/2011
Due date:
% Done: 100%
Estimated time:
Occurs In:
Scope: Internal
Experiment: -
SSI Package:
Duration:

Description

When trying to produce output ROOT data files (but not histogram files) with NOVASOFT, a std::bad_alloc exception is thrown:

%MSG-s ArtException: PostCloseFile 31-Oct-2011 17:10:17 CDT PostEndRun
cet::exception caught in art
---- EventProcessorFailure BEGIN
An exception occurred during current event processing
---- ScheduleExecutionFailure BEGIN
ProcessingStopped.
---- BadAlloc BEGIN
A std::bad_alloc exception occurred during a call to the module
ExploringData/ExploringData run: 12110 subRun: 0 event: 10
The job has probably exhausted the virtual memory available to the
process.
---- BadAlloc END
Exception going through path mcana
---- ScheduleExecutionFailure END
cet::exception caught in EventProcessor and rethrown
---- EventProcessorFailure END
%MSG

This can be replicated in NOVASOFT by doing the following:

source /grid/fermiapp/nova/novaart/novasoft/setup/setup_novasoft_nusoft.sh

nova -c job/trackmergejob.fcl /nova/data/mc/S11.07.27/singlenu/ndos/sim_singlenu_numubar_cc_ndos_nhc_none_10000_0.root

This is an MC file from an earlier tag. It has also been reported that the following message accompanies the exception when trying to open files with the EventDisplay: no dictionary for class art::Ptr<rb::CellHit> is available

For example: nova -c evd.fcl /nova/data/novaroot/NDOS/S11.07.27/000121/12117/cosmic/reco_r00012117_s01_t02_cosmic_S11.07.27.root, run with the following evd_services.fcl configuration file: /nova/app/users/gsdavies/art/evd_services.fcl

History

#1 Updated by Marc Paterno almost 8 years ago

  • Category set to Infrastructure
  • Status changed from New to Assigned
  • Assignee set to Marc Paterno
  • Target version set to 1.00.04

Gavin, Walter, Philippe and Marc have isolated what seems to be the source of the problem: a bug in ROOT. Philippe is working on a fix, and Walter and Marc are working toward an integration test that will verify that the upcoming fix works. The test needs to be run against an old data file, which we have generated.

We have good evidence that the bug arises when reading the size of a collection (PtrVector<rb::CellHit>); the apparent size was 0x400000e6 (more than 1 billion). It manifests as a std::bad_alloc exception when run on NOvA's virtual machine, and as a segmentation violation on oink, which has more memory than NOvA's VM.
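
To make the scale of that size concrete, here is a minimal C++ sketch of the arithmetic, along with one kind of sanity check a reader could apply to a size field before allocating. Only the value 0x400000e6 comes from this ticket. The 8-byte element size, the 4096-byte buffer, and the names bytes_needed and checked_size are hypothetical and chosen for illustration; this is not the ROOT fix or the art workaround.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Apparent collection size reported in this ticket (1,073,742,054 elements).
constexpr std::uint32_t kReportedSize = 0x400000e6u;

// Bytes needed to hold `count` elements of `element_size` bytes each.
// The 64-bit result avoids overflow for the small element sizes assumed here.
constexpr std::uint64_t bytes_needed(std::uint32_t count,
                                     std::uint64_t element_size) {
    return static_cast<std::uint64_t>(count) * element_size;
}

// Hypothetical guard (not ROOT or art code): reject a size field that claims
// more elements than the bytes remaining in the input buffer could hold.
inline std::uint32_t checked_size(std::uint32_t raw_size,
                                  std::uint64_t element_size,
                                  std::uint64_t bytes_remaining) {
    assert(element_size > 0);  // precondition: elements occupy space on disk
    if (raw_size > bytes_remaining / element_size) {
        throw std::runtime_error("implausible collection size in input");
    }
    return raw_size;
}
```

With an assumed 8 bytes per element, bytes_needed(kReportedSize, 8) is 8,589,936,432 bytes, roughly 8 GiB for a single collection. That is consistent with an allocation failure on a memory-constrained VM and a different failure mode on a machine with more memory.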

#2 Updated by Walter E Brown almost 8 years ago

The good news: As of a few minutes ago, Philippe reports that he has been able to correct the ROOT code that gave rise to this issue.

The bad news: The corrected code has led to the discovery of another problem in ROOT, which must also be addressed as part of this issue. We are awaiting its resolution.

#3 Updated by Christopher Green almost 8 years ago

  • Status changed from Assigned to Resolved
  • Target version changed from 1.00.04 to 1.00.05
  • % Done changed from 0 to 100

1.00.05 works around this problem, so I will mark this resolved. When we move to a ROOT version that includes the bug fix, we will remove the workaround and test.

#4 Updated by Christopher Green almost 8 years ago

  • Status changed from Resolved to Closed

