Feature #7852

A module failing to put() a product it produces() should be an error

Added by Christopher Backhouse over 5 years ago. Updated almost 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
I/O
Target version:
Start date:
02/12/2015
Due date:
08/01/2015
% Done:

100%

Estimated time:
8.00 h
Spent time:
Scope:
Internal
Experiment:
NOvA
SSI Package:
art
Duration: 171

Description

As discussed in today's stakeholders' meeting:

The idea is that if an EDProducer declares produces<Foo>() in its constructor and its produce() function then ever fails to evt.put() such a product, the framework should throw an exception, with the default behaviour being that this exception ends the job.

Such a pattern is almost always a mistake in the producing code, but it isn't discovered until some downstream module falls victim to it, possibly even in a separate job if the products were written to a file in the meantime.

There probably also needs to be a fcl parameter that can be set to restore the old behaviour, for modules that do this and that experiments want to continue to tolerate until they're fixed.
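A minimal, framework-free sketch of the requested check, for illustration only. The function name, the use of product names as plain strings, and the errorOnFailure argument are all assumptions made for this example; none of them are art's API.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical check run after a module's produce() returns.
// 'declared' stands for what the module announced via produces<>(),
// 'put' for what it actually put onto the event.
void checkDeclaredProductsWerePut(std::set<std::string> const& declared,
                                  std::set<std::string> const& put,
                                  bool errorOnFailure = true)
{
  // Old behaviour: a missing product is silently tolerated.
  if (!errorOnFailure) return;

  for (auto const& name : declared) {
    if (put.count(name) == 0) {
      // Proposed behaviour: fail at the producer, not downstream.
      throw std::runtime_error("declared product was never put: " + name);
    }
  }
}
```

With the default argument, a call such as checkDeclaredProductsWerePut({"Foo"}, {}) throws, while passing false as the third argument returns quietly, which mirrors the opt-out requested above.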

History

#1 Updated by Christopher Green over 5 years ago

  • Category set to I/O
  • Status changed from New to Accepted
  • Estimated time set to 8.00 h
  • SSI Package art added
  • SSI Package deleted ()

This change in behavior was discussed and viewed favorably at a stakeholder meeting.

#2 Updated by Kyle Knoepfel over 5 years ago

  • Target version set to 521

#3 Updated by Marc Paterno over 5 years ago

  • Target version changed from 521 to 1.18.00

#4 Updated by Christopher Green over 5 years ago

  • Due date set to 08/01/2015

#5 Updated by Kyle Knoepfel about 5 years ago

  • Subject changed from A module failing to put() a product it decalares it produces() should be an error to A module failing to put() a product it declares it produces() should be an error
  • Assignee set to Kyle Knoepfel
  • Experiment NOvA added
  • Experiment deleted (-)

#6 Updated by Kyle Knoepfel about 5 years ago

For this feature to be implemented, the BranchIDs of "put" products must be compared with the BranchIDs of products that were declared to the system through produces. This requires changing the container type of DataViewImpl::putProducts_ from std::vector<std::pair<EDProduct*, BranchDescription const*>> to something like std::unordered_map<BranchID, PMValue>, where PMValue holds the two members of the std::pair element in the vector (the product and its BranchDescription).

Along the way, it was determined that handling a bare EDProduct pointer was not a good idea, so I changed the ownership semantics so that PMValue claims ownership of the product before the product is placed into the principal.
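As a rough sketch of the container change and ownership semantics described above (not the actual art code): BranchID, EDProduct and BranchDescription below are simplified stand-ins, and recordPut and missingProducts are hypothetical helper names. Only the std::unordered_map<BranchID, PMValue> shape and the idea that PMValue owns the product come from the comment.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for the art types (assumptions for this sketch).
using BranchID = std::uint32_t;
struct EDProduct { virtual ~EDProduct() = default; };
struct BranchDescription { BranchID id; std::string name; };

// PMValue owns the product instead of holding a bare EDProduct pointer.
struct PMValue {
  std::unique_ptr<EDProduct> prod;  // owning
  BranchDescription const* desc;    // non-owning
};

using PutProducts = std::unordered_map<BranchID, PMValue>;

// Record a put product, keyed by BranchID. Ownership of 'prod' is taken
// either way; returns false if that branch already had a product.
bool recordPut(PutProducts& puts,
               std::unique_ptr<EDProduct> prod,
               BranchDescription const& desc)
{
  return puts.emplace(desc.id, PMValue{std::move(prod), &desc}).second;
}

// Compare declared BranchIDs against those actually put.
std::vector<BranchID> missingProducts(std::vector<BranchID> const& declared,
                                      PutProducts const& puts)
{
  std::vector<BranchID> missing;
  for (auto id : declared) {
    if (puts.find(id) == puts.end()) missing.push_back(id);
  }
  return missing;
}
```

Keying by BranchID makes each declared-versus-put lookup a hash lookup rather than a scan of the old vector of pairs.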

Now that the necessary maintenance has been taken care of, implementation of the issue proper can commence. The estimated time for this issue, however, is too low.

#7 Updated by Kyle Knoepfel about 5 years ago

  • % Done changed from 0 to 30

#8 Updated by Kyle Knoepfel about 5 years ago

  • % Done changed from 30 to 70

#9 Updated by Rob Kutschke about 5 years ago

What is the expected behaviour of the new system for Run and SubRun data products? I may call run.put() on a data product in beginRun or endRun. Suppose that the "error" is in the beginRun code, which did not call run.put() when it should have; art can't know that it failed to call run.put() until after the end of endRun. Is this OK?

To my mind this is OK but my opinion may change as use cases develop.

#10 Updated by Kyle Knoepfel about 5 years ago

Good question, Rob. Currently, the feature will be implemented only for Event-level products, thus avoiding the difficulty you mention with SubRuns and Runs. If there is a desire to implement put-product checking at the SubRun and Run levels, then the stakeholders and artists should discuss the various conceptual issues involved.

#11 Updated by Kyle Knoepfel about 5 years ago

  • % Done changed from 70 to 90

The implementation is primarily complete. I am awaiting feedback from the users re. which configurability options they prefer. This is the email I circulated to the stakeholders earlier today:

Feature #7852 stipulates that failure to put an expected product onto the event should, by default, result in an error. In addition, a request has been made that this behavior be overridable via a user’s configuration so that the old behavior (i.e. no exception being thrown) can be retained. There are several options for achieving this configurability:

1) Provide a global parameter that, when set, overrides/defines the behavior for all modules/sources.
2) Provide a parameter that can be specified per module, allowing finer granularity of behavior handling.
3) Provide a global parameter and allow a per-module parameter. This would allow for maximum flexibility, but one would need to decide how to resolve conflicts between the global setting and local settings.

We would appreciate your feedback as to which option you would find most valuable.

#12 Updated by Kyle Knoepfel about 5 years ago

  • Subject changed from A module failing to put() a product it declares it produces() should be an error to A module failing to put() a product it produces() should be an error

#13 Updated by Kyle Knoepfel about 5 years ago

  • Status changed from Accepted to Resolved
  • % Done changed from 90 to 100

New configuration parameter for EDProducers and EDFilters

For any EDProducer or EDFilter module, the errorOnFailureToPut parameter may be specified. By default, this parameter is set to true, per stakeholder input, and thus need not be specified by the user. If the user wishes to disable the put-product checking, the following configuration should be used:

moduleLabel : {
   module_type: MyModule
   errorOnFailureToPut: false
}

A global flag "services.scheduler.errorOnFailureToPut" is also provided. The behavior is as follows:

  1. If the global flag is true, the individual module instances can override the behavior by setting the local flag "moduleLabel.errorOnFailureToPut" to false, as shown above.
  2. If the global flag is false, then any attempt to override the global flag at the per-module level will be ignored.

The asymmetry is motivated by the following use case. Consider the configuration:

services.scheduler.errorOnFailureToPut: true

physics.producers: {

    p1: @local::experiment.p1
    p2: { ... }

    t1: [p1,p2]    

}

In this example, "experiment.p1" is owned by the experiment. Should the experiment decide that, for that particular module, it is okay to allow a failure to 'put' without throwing an exception, it can set the module's flag to false and the user's job continues without incident.

However, if the flags were reversed, i.e. the global flag were set to false and the "experiment.p1" flag were set to true, the user might be surprised to find that his/her job has failed; this is why a false global flag cannot be overridden. We believe this approach is the most user-friendly, and it is therefore incumbent on module owners or the software leads of experiments to determine reasonable policies for their own experiments.
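The precedence rule described above can be sketched as a small function. This is only an illustration of the stated rule, not the code in art; the function name and the use of std::optional (C++17) to model an unset per-module parameter are assumptions.

```cpp
#include <optional>

// Hypothetical helper: resolve the effective setting for one module from
// services.scheduler.errorOnFailureToPut (globalFlag) and the module's own
// errorOnFailureToPut (moduleFlag, empty if not specified).
bool effectiveErrorOnFailureToPut(bool globalFlag,
                                  std::optional<bool> moduleFlag)
{
  // A false global flag wins outright; per-module settings are ignored.
  if (!globalFlag) return false;

  // Global flag is true: the module may opt out by setting false.
  // An unspecified module flag defaults to true.
  return moduleFlag.value_or(true);
}
```

Under this rule checking can only ever be switched off, never back on, by a more local setting, so a user who disables it globally is not surprised by a module that re-enables it.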

Implemented with art:84e5e8e.

#14 Updated by Christopher Green almost 5 years ago

  • Target version changed from 1.18.00 to 1.17.00

#15 Updated by Kyle Knoepfel almost 5 years ago

  • Status changed from Resolved to Closed

