Bug #7530

MixFilter can't handle mix files that have had events filtered out

Added by Christopher Backhouse over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
High
Category:
I/O
Target version:
Start date:
12/17/2014
Due date:
% Done:

100%

Estimated time:
3.00 h
Spent time:
Occurs In:
Scope:
Internal
Experiment:
NOvA
SSI Package:
art
Duration:

Description

When using a trivial MixFilter module I run into this error:

%MSG-w ScheduleExecutionFailure:  PostProcessPath p1 17-Dec-2014 17:07:00 CST  run: 16838 subRun: 0 event: 96 
an exception occurred and all paths for the event are being skipped: 
---- ScheduleExecutionFailure BEGIN
  ProcessingStopped.

  ---- ProductNotFound BEGIN
    While processing products of type lem::PIDDetailss for merging: a secondary event was missing a product.
    cet::exception going through module LEMMixer/lemmix run: 16838 subRun: 0 event: 96
  ---- ProductNotFound END
  Exception going through path p1
---- ScheduleExecutionFailure END
%MSG

The "main" file is
/nova/ana/users/bckhouse/tmp/fardet_genie_geojittered_fhc_swap_none_1000_r00016838_s00_FA14-10-28_v1_20141012_162044_iberis67.farm.particle.cz_1413162956_21517_0_sim.pidpart.root
and the file to be mixed is
/nova/ana/users/bckhouse/tmp/fardet_genie_geojittered_fhc_swap_none_1000_r00016838_s00_FA14-10-28_v1_20141012_162044_iberis67.farm.particle.cz_1413162956_21517_0_sim.lempart.root

Everything is fine until event 96, when the above message appears, and repeats for every subsequent event. No products are mixed in for those events.

I think that what happens is that event 96 was filtered out of these files at an earlier stage of processing (the files are generated with 1000 events, but I only get 999 from Events->GetEntries()).

The expected behaviour is for the mixing logic to skip over this event when drawing from the input file, but apparently that doesn't happen.

This may be a regression, but we also may never have tried mixing filtered files before.
The oldest version of our software we have that can still read these files uses art v1_11_02.

The mixer module is at /nova/app/home/novasoft/slf6/novasoft/releases/development/LEM/LEMMixer_module.cc with fcl /nova/app/home/novasoft/slf6/novasoft/releases/development/LEM/LEMMixer.fcl
To run you would specify the mixer file in fileNames and set FetchLempartFromSAM: false

This bug is holding up important NOvA processing. We can't recombine our PID files in any case where events have been filtered out (which sometimes happens in our MC simulation).

Associated revisions

Revision f3f10691 (diff)
Added by Christopher Green over 5 years ago

Test for fix to issue #7530.

Revision 15a4a1b4 (diff)
Added by Christopher Green over 5 years ago

Actually fix issue #7530.

History

#1 Updated by Christopher Green over 5 years ago

  • Category set to I/O
  • Status changed from New to Assigned
  • Assignee set to Christopher Green
  • Target version set to 1.12.05
  • Estimated time set to 3.00 h
  • SSI Package art added
  • SSI Package deleted ()

Depending on what you mean by "skip over this event when drawing from ...", that solution is not as straightforward as one might like: the mixing system handles each product separately, as a "MixOp". One or more products may already have been mixed from the secondary and their output placed in the primary before a missing product is encountered in a subsequent mix operation for the same primary. We could only reject the whole secondary event by compiling the list of required products and checking for them in each secondary at the time the EventIDs are drawn, which would take more effort to implement and would be time-consuming on a per-event basis.

We propose that the default behavior of the system be simply to leave a nullptr in the corresponding slot in the sequence of product pointers passed to the user's mixing function when a product is missing from a particular secondary event. The user's mixing function must then check that each product pointer is non-null before attempting to access it. We will also provide an option to the MixFilter template, "compactMissingProducts", which, if set to true, will cause the missing product to be omitted entirely, compacting the sequence. One should be aware, though, that with compaction there will no longer be an exact correspondence between the EventID sequence provided to the detail object's processEventIDs function and the sequence of products for any given mix operation.
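To make the null check concrete, here is a minimal sketch of a mixing function that tolerates missing products. It is plain C++ and does not use art: Hit, HitCollection and mixHits are hypothetical names, and the real art mixing function signature (which also receives a PtrRemapper) is deliberately not reproduced.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mixed product; not a real art or NOvA type.
struct Hit {
  int channel;
};
using HitCollection = std::vector<Hit>;

// Sketch of a user mixing function under the proposed default behaviour:
// 'in' holds one slot per secondary event, and a slot is nullptr when that
// secondary did not contain the product.
// Returns the number of secondaries that contributed nothing.
std::size_t mixHits(std::vector<HitCollection const*> const& in,
                    HitCollection& out)
{
  std::size_t nMissing = 0;
  for (HitCollection const* coll : in) {
    if (coll == nullptr) {  // product absent from this secondary: skip it
      ++nMissing;
      continue;
    }
    out.insert(out.end(), coll->begin(), coll->end());
  }
  return nMissing;
}
```

Returning the count of missing products is only one possible design; a mixer could equally log a warning or insert a default-constructed product for each empty slot.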

#2 Updated by Christopher Backhouse over 5 years ago

  • Assignee deleted (Christopher Green)
  • Target version deleted (1.12.05)

I misdiagnosed the problem. It is indeed a missing product like you're talking about, and not a filtered (i.e. rejected by an EDFilter) Event like I was assuming. So you can ignore the stuff about skipping over the event.

Passing a null pointer sounds ideal. Hopefully this is an easy thing to add?

Another option would be to not call the MixOp at all, but then downstream modules would have to deal with missing products. Perhaps one could provide a default value to be filled in this case. This sounds like it's getting complicated, so maybe it's not a good option.

I didn't understand how compactMissingProducts will work. If I have two products in the input, A and B, and some of them are missing:

Event 1: A1 B1
Event 2: A2
Event 3: A3 B3
Event 4: B4
Event 5: A5 B5

Then the mixer would see (A1, B1), (A2, B3), (A3, B4), (A5, B5) ???

For my purposes any kind of getting out of sync is bad. I'm not sure where you'd want the behaviour I sketched above, I've probably misunderstood it.
If there were only the A products then giving A1, A2, A3, A5 with no gap would make sense, though again not in my case.

#3 Updated by Rob Kutschke over 5 years ago

Is the underlying issue that an upstream module decided to put no data product in the event instead of putting an empty data product in the event?

#4 Updated by Christopher Backhouse over 5 years ago

Yeah. I'm checking on exactly why, but it turns out to have happened way back in the simulation.

It looks like there's a fairly easy method to work around this (handing the mixer a null rather than throwing the exception) so hopefully we can patch the files up.

#5 Updated by Christopher Green over 5 years ago

  • Target version set to 1.12.05

#6 Updated by Christopher Green over 5 years ago

  • Status changed from Assigned to Resolved
  • Assignee set to Christopher Green
  • % Done changed from 0 to 100

Chris, if I have 5 secondaries as you describe in your example, then mixing those five secondaries into a single primary:

  • The A mixer would see a sequence [ A1, A2, A3, nullptr, A5 ] (no compaction) or [ A1, A2, A3, A5 ] (compaction).
  • The B mixer would see a sequence [ B1, nullptr, B3, B4, B5 ] (no compaction) or [ B1, B3, B4, B5 ] (compaction).

Using the normal mixing tools (e.g. from CollectionUtilities.h) and the PtrRemapper should work in either case, but with compaction the correspondence with the sequence of EventID provided to the mixer would be off, assuming you care.
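The loss of correspondence under compaction can be illustrated with a short sketch. This is plain C++ rather than art code; compact is a hypothetical helper that mimics what the compaction option is described as doing, not the framework's implementation.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: drop the null slots from a sequence of product
// pointers, as the compaction option is described as doing.
template <typename T>
std::vector<T const*> compact(std::vector<T const*> const& slots)
{
  std::vector<T const*> out;
  std::copy_if(slots.begin(), slots.end(), std::back_inserter(out),
               [](T const* p) { return p != nullptr; });
  return out;
}
```

With the uncompacted sequence [ A1, A2, A3, nullptr, A5 ], index i always refers to the i-th secondary EventID. After compact, A5 moves from index 4 to index 3, so an index into the product sequence no longer identifies the secondary event the product came from.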

This fix has been implemented with commits 272c5cd and 0016eaa.

#7 Updated by Rob Kutschke over 5 years ago

In the previous update Chris Green described a scenario that is unacceptable to Mu2e. We have downstream code that relies on event identity of mix-ins being preserved across mixing.

Mu2e's preference is that a missing product be a hard error: it means that we have an upstream error that we need to fix, and we would prefer to know about it at the earliest possible opportunity. We appreciate that Chris Backhouse is developing a fix for an issue that originated in upstream code and that he needs the tools to do that job, but we prefer that the default behaviour not change; we do not want art compacting mix-in event streams.

#8 Updated by Christopher Green over 5 years ago

  • Status changed from Resolved to Feedback

The compaction of missing products is not the default behavior; however the default behavior in the just-cut 1.12.05 is to not throw on a missing product. With this release, the user's mixer would need to detect a missing product and deal with it accordingly.

We could provide another parameter (e.g. errorOnMissingProduct), but it is true that the previous behavior was problematic: the ProductNotFound would cause the primary event to be skipped (unless --rethrow-all was used) and, in the case of sequential event selection, the same secondary would cause all subsequent primaries to be skipped for the same reason.

Rob, please advise on how we should proceed. I am in FCC today, but I think a conversation with Marc would be in order.

#9 Updated by Rob Kutschke over 5 years ago

Mu2e is OK with this behaviour.

We will add a function template that can be called as the first line of each mixop member function to throw an exception in the event of a nullptr.
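A guard of this kind might look like the sketch below. It is an illustration only: requireAllPresent is a hypothetical name, and std::runtime_error stands in for whatever exception type an experiment would actually throw from within art.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical guard: restore the old "hard error" behaviour by throwing
// if any secondary event failed to supply the product.
template <typename T>
void requireAllPresent(std::vector<T const*> const& in)
{
  bool const anyMissing =
    std::any_of(in.begin(), in.end(),
                [](T const* p) { return p == nullptr; });
  if (anyMissing) {
    throw std::runtime_error("mix-in product missing from a secondary event");
  }
}
```

Calling such a function first in each mixing function keeps the check in one place, so the rest of the mixer can dereference the pointers without further tests.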

#10 Updated by Christopher Green over 5 years ago

  • Status changed from Feedback to Resolved

#11 Updated by Christopher Green over 5 years ago

  • Status changed from Resolved to Closed

