Bug #1151

Navigation within file of mixed runs and subruns

Added by Mark Messier over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Urgent
Category: Navigation
Target version:
Start date: 04/05/2011
Due date:
% Done: 100%
Estimated time:
Occurs In:
Scope: Internal
Experiment: -
SSI Package:
Duration:

Description

I'm getting crashes when trying to navigate within a file which contains events from multiple runs and subruns. For example, an input source "skipEvents(-1)" fails with the message:

/archives/nova/novasoft/art_suite/v0_06_00/source/art/art/Framework/Core/DecrepitRelicInputSourceImplementation.cc:297: virtual std::auto_ptr<art::EventPrincipal> art::DecrepitRelicInputSourceImplementation::readEvent(boost::shared_ptr<art::SubRunPrincipal>): Assertion `srp->run() == result->run()' failed.
Aborted

Also, an attempt to skip to a selected event using code like:

art::EventID id(art::SubRunID::invalidSubRun(art::RunID(targetRun)), targetEvent);
fInputSource->readEvent(id);

produces:

/archives/nova/novasoft/art_suite/v0_06_00/source/art/art/Framework/Core/DecrepitRelicInputSourceImplementation.cc:297: virtual std::auto_ptr<art::EventPrincipal> art::DecrepitRelicInputSourceImplementation::readEvent(boost::shared_ptr<art::SubRunPrincipal>): Assertion `srp->run() == result->run()' failed

Both of the above work fine when the file contains events from only one run and subrun.


Related issues

Related to art - Necessary Maintenance #1197: Unit / regression tests for RootInput random access. Accepted, 04/14/2011, 09/30/2013

Associated revisions

Revision 29c9a5b8 (diff)
Added by Christopher Green over 8 years ago

Fix for issue #1151: support random access for event display.

Revision 318a3e95 (diff)
Added by Christopher Green over 8 years ago

Fix for remaining aspect of issue #1151.

Revision 197a4e29 (diff)
Added by Christopher Green over 8 years ago

Fix for remaining aspect of issue #1151.

Revision d96239f9 (diff)
Added by Christopher Green over 8 years ago

Bump version for further fix to issue #1151.

History

#1 Updated by Christopher Green over 8 years ago

  • Category set to Navigation
  • Status changed from New to Feedback
  • Assignee set to Christopher Green

Please provide the location of this composite data file, details of how you were reading it, and how to regenerate it. This will allow us to investigate the problem fully and efficiently.

#2 Updated by Mark Messier over 8 years ago

The file I'm using is:

/nova/app/users/messier/data/filterevents.root

which you should be able to see from any of the novagpvm's. The file was created by filtering the output of many input files and sending the results to a single file. I don't know the exact command but it would have been something like "nova -c canajob.xml numi_files_*.root". If you need more details presumably you can find out from the provenance information.

To produce the crash, start the event display:

nova -c evd.fcl filterevents.root

The first event in the file is run/event = 10893/314724. The second event in the file is 10999/456138; third is 11107/39469.

I can make this crash in three ways, all of which seem to be a symptom of the same underlying problem.

#1: Click "Next" then "Previous"
#2: Open up any of the dialogs listed under "Edit" and click "Apply"
#3: Try to go to event 11107/39468 by typing the run and event into the boxes in the upper right

#1 triggers a skipEvents(-2), #2 triggers a skipEvents(-1), and #3 triggers a readEvent(...).

All three of these work fine when using files that contain events from a single run and subrun.
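
A toy illustration of why "Previous" would map to skipEvents(-2) while a reload maps to skipEvents(-1). It assumes (this is not stated above) that the source's read position sits one past the event currently on display. ToySource and its members are hypothetical stand-ins, not the art interface; the run/event numbers are the three listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input source; NOT the art API.
// Events are (run, event) pairs. `next_` is the index of the event the
// next read will return, i.e. one past the event on display (assumption).
class ToySource {
public:
  using Id = std::pair<int, int>;

  explicit ToySource(std::vector<Id> events) : events_(std::move(events)) {}

  // Read the event at the cursor and advance past it.
  Id readNext() { return events_.at(next_++); }

  // Move the cursor by a signed offset; refuse moves outside [0, size].
  bool skip(long offset) {
    long target = static_cast<long>(next_) + offset;
    if (target < 0 || target > static_cast<long>(events_.size())) return false;
    next_ = static_cast<std::size_t>(target);
    return true;
  }

private:
  std::vector<Id> events_;
  std::size_t next_ = 0;
};
```

With two events already read, skip(-1) followed by readNext() returns the second event again (a reload), and skip(-2) followed by readNext() returns the first (a "Previous").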

#3 Updated by Christopher Green over 8 years ago

We have examined the input file and it appears to have a somewhat corrupted metadata structure. We really need to be able to reproduce this file. If you can point us to the numi_file*.root files (or similar) that you used, we would be very grateful.

#4 Updated by Mark Messier over 8 years ago

I've reduced this to what I think is the simplest case possible. Using the NOvA software, run the following program:

% nova -c runeventfiltjob.fcl /nova/app/users/messier/novasoft/art-work/Utilities/ndos*.root

Using this event list file:

/nova/app/users/messier/novasoft/art-work/Utilities/event-list.txt

This picks out the first event in the three runs listed above and writes them to a single file called "runevtfilt-evt.root". Now look at that file with the event display:

% nova -c evd.fcl runevtfilt-evt.root

and try any of the three cases above:

1. Try "Next" followed by "Previous"
2. Try Edit->[anything]->Apply, then "Next"
3. Try going to run/event 11886/1

All three of these will crash.

#5 Updated by Christopher Green over 8 years ago

  • Status changed from Feedback to Resolved
  • Target version set to 0.06.02
  • % Done changed from 0 to 100
This issue has been fixed. Necessary updates to EventDisplay.cxx will be synchronized with updating the NOvA software to use v0_06_02, but the relevant points are:
  1. RootInput is now used directly (as a result of dynamic_cast<>) instead of indirectly through InputSource's virtual interface.
  2. The post-event routine now looks like this:
      if (NavState::Which() == kNEXT_EVENT) {
        // Contrary to appearances, this is *not* a NOP: it ensures run and
        // subRun are (re-)read as necessary if we've been doing random
        // access. Come the revolution ...
        //
        // 2011/04/10 CG.
        fInputSource->seekToEvent(0);
      }
      else if (NavState::Which() == kPREV_EVENT) {
        fInputSource->seekToEvent(-2);
      }
      else if (NavState::Which() == kRELOAD_EVENT) {
        fInputSource->seekToEvent(evt.id());
      }
      else if (NavState::Which() == kGOTO_EVENT) {
        art::EventID id(art::SubRunID::invalidSubRun(art::RunID(NavState::TargetRun())), NavState::TargetEvent());
        if (!fInputSource->seekToEvent(id)) { // Couldn't find event
          std::cout << "Unable to find " 
                    << id
                    << " -- reloading current event." 
                    << std::endl;
          // Reload current event.
          fInputSource->seekToEvent(evt.id());
        }
      }
      else abort();
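
For point 1, a minimal sketch of obtaining the concrete source through dynamic_cast<>. The types below are hypothetical stand-ins so the snippet compiles on its own; the real art::InputSource and RootInput classes have much larger interfaces, and the seekToEvent signature shown here is only illustrative.

```cpp
#include <cassert>

// Hypothetical stand-ins; NOT the real art classes.
struct InputSource {
  virtual ~InputSource() = default;
};

struct RootInput : InputSource {
  int lastOffset = 0;
  // Illustrative signature only: record the requested offset and succeed.
  bool seekToEvent(int offset) { lastOffset = offset; return true; }
};

struct OtherSource : InputSource {};

// Returns the source as a RootInput, or nullptr when it is some other type,
// so the caller can refuse random access instead of crashing.
RootInput* asRootInput(InputSource* src) {
  return dynamic_cast<RootInput*>(src);
}
```

A caller would check the result for nullptr once at start-up and only then call seekToEvent on it.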
    

#6 Updated by Mark Messier over 8 years ago

Thanks, Chris, for the quick attention to this.

#7 Updated by Mark Messier over 8 years ago

  • Status changed from Resolved to Feedback

Uh oh. This might not be resolved. I am again having trouble navigating through a file produced from a filter job. Here's the coordinates:

I produced this file: /nova/app/users/messier/novasoft/art-work/Commissioning/cana-evt-save.root

using the command:

% nova -c canajob.fcl -S short.txt [reference the same directory above]

When I try to view this file with the event display, it produces an error:

% nova -c evd.fcl cana-evt-save.root

% nova: /nusoft/app/externals/art_suite/v0_06_02/source/art/art/Framework/IO/Root/RootInputFile.cc:715: boost::shared_ptr<art::SubRunPrincipal> art::RootInputFile::readSubRun(cet::exempt_ptr<const art::ProductRegistry>, boost::shared_ptr<art::RunPrincipal>): Assertion `fileIndexIter_->getEntryType() == FileIndex::kSubRun' failed.
Aborted

after advancing 6 events into the file. The file has 22 events in it.

Sorry, but again this is urgent...

#8 Updated by Christopher Green over 8 years ago

  • Status changed from Feedback to Assigned
  • Priority changed from High to Urgent

Analyzing now.

Chris.

#9 Updated by Christopher Green over 8 years ago

  • % Done changed from 100 to 70

#10 Updated by Christopher Green over 8 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 70 to 100

The incomplete fix in version v0_06_02 has been completed with commit d96239f. My apologies for the omission.

Lynn is preparing release v0_06_03 as we speak. No changes to NOvA or ArgoNeuT software are required. Please be sure to do a clean build, though.

#11 Updated by Mark Messier over 8 years ago

Thanks again for the quick attention to this. I hope we can take it into our release ASAP.

I'll close this issue with a question and suggestion.

What sort of unit testing of this navigation is there? I think this is something that one has to pay very rigorous and strict attention to. FWIW, in FMWK I had test executables which would generate standard files and then perform all the permutations of navigation (skipping ahead, skipping back, trying to read past the end of a file, trying to read past the beginning of a file, skipping to a run, skipping to an event, skipping to a file, skipping across a file boundary forward and backward, etc.) and test that the target event numbers were exactly correct or that the correct exceptions were thrown. This stuff really needs to be bulletproof.
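
To make the suggestion concrete, here is a rough sketch of that kind of permutation test against an in-memory stand-in for a generated file. ToyNavigator, step, seek and runNavigationChecks are hypothetical names and have nothing to do with the FMWK or art code; the event list reuses the run/event numbers quoted earlier in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Id = std::pair<int, int>;  // (run, event)

// Hypothetical in-memory "file"; stands in for a generated test file.
class ToyNavigator {
public:
  explicit ToyNavigator(std::vector<Id> events) : events_(std::move(events)) {
    if (events_.empty()) throw std::invalid_argument("no events");
  }

  const Id& current() const { return events_[pos_]; }

  // Move by a signed offset; throw rather than run past either end.
  const Id& step(long offset) {
    long target = static_cast<long>(pos_) + offset;
    if (target < 0 || target >= static_cast<long>(events_.size()))
      throw std::out_of_range("navigation past file boundary");
    pos_ = static_cast<std::size_t>(target);
    return current();
  }

  // Jump to an exact (run, event); leave the position unchanged on a miss.
  bool seek(const Id& id) {
    for (std::size_t i = 0; i < events_.size(); ++i) {
      if (events_[i] == id) { pos_ = i; return true; }
    }
    return false;
  }

private:
  std::vector<Id> events_;
  std::size_t pos_ = 0;
};

// Returns true only if every navigation step lands on the expected id
// and both out-of-range moves throw.
bool runNavigationChecks() {
  ToyNavigator nav({{10893, 314724}, {10999, 456138}, {11107, 39469}});
  bool ok = nav.current() == Id(10893, 314724);
  ok = ok && nav.step(+1) == Id(10999, 456138);  // forward across a run boundary
  ok = ok && nav.step(-1) == Id(10893, 314724);  // and back across it
  ok = ok && nav.seek({11107, 39469}) && nav.current().first == 11107;
  ok = ok && !nav.seek({11107, 39468}) && nav.current().first == 11107;  // miss keeps position
  bool threwPastEnd = false;
  try { nav.step(+1); } catch (const std::out_of_range&) { threwPastEnd = true; }
  bool threwBeforeBegin = false;
  try { nav.step(-3); } catch (const std::out_of_range&) { threwBeforeBegin = true; }
  return ok && threwPastEnd && threwBeforeBegin && nav.current().first == 11107;
}
```

A real test would swap the vector for a file written by the framework and drive the actual input source, but the shape of the checks (exact target ids, exact failure behaviour at both ends) would be the same.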

#12 Updated by Christopher Green over 8 years ago

The framework as inherited from CMS had no unit testing of the random access software, and this feature had not previously appeared on our radar as something that didn't "just work."

The fixes to this issue (both the initial one and the amendment in d96239f) were tested entirely using the NOvA event display and the files you provided to us exhibiting the problem, in the interest of addressing your issue as quickly as possible. Our intention is certainly to do as you suggest: we already have the ability to create a file containing an arbitrary set of runs, subRuns and events. To that we would add unit tests utilizing a service to do random access in much the same way as the NOvA event display, and an analyzer to verify events are indeed read in the expected order.

#13 Updated by Christopher Green over 8 years ago

  • Target version changed from 0.06.02 to 0.06.03

#14 Updated by Christopher Green over 8 years ago

  • Status changed from Resolved to Closed

