ExceptionInformation

Do not read this yet. It is still being updated.

EDM Exception Analysis

Condition reporting: using an "out of band" channel or nonstandard path to report information to interested parties outside the context of the currently active code. Reporting is needed, for example, to announce failures or anomalous conditions, or to produce execution traces.

Goal

First, define the various types of condition reporting that code must do, how users of the code will obtain this information, and how they will control the actions taken upon receiving it.

  1. define the various types of reporting.
  2. list some requirements for the exception interface and define it, describe its correct use, describe the output that will be produced, and show how it is configured and used by the framework.
  3. describe the design and implementation of the exception processing subsystem.
  4. list requirements for the other condition reporting APIs and describe how they should be used.

Types of condition reporting

Here are the different ways in which developer-written code communicates information to external entities (within the same program or outside of it). This list reflects how things ought to be, not necessarily how they are in all circumstances. The guidelines for determining which category a particular problem falls into, and the policy for handling information in each category, will be explained in later sections. This document should cover the definition, nature, properties, and usage policy for each of these categories.

No change in current program flow

Debugging and progress announcements are included in this category. This is text announcing where the program is, including state information. With appropriate compiler options, it should be possible to remove this code from the program entirely. A configurable level should control how much information appears. The text produced probably needs to be decorated in a similar way to that of the other reporting methods, but the interface and controls are likely to be different.
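One way to meet both requirements (removable at compile time, adjustable at run time) is a macro guarded by a preprocessor symbol. The sketch below is only an illustration: `TRACE_MSG`, `traceLevel()`, and the `DISABLE_TRACE` symbol are hypothetical names, not part of art or cetlib.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical runtime verbosity threshold: a message is written only if
// its level is <= this value. In a real job it would come from configuration.
inline int& traceLevel()
{
  static int level = 0;
  return level;
}

// Compiling with -DDISABLE_TRACE removes every trace statement, including
// the evaluation of its arguments.
#ifdef DISABLE_TRACE
#define TRACE_MSG(level, stream, msg) \
  do {                                \
  } while (false)
#else
#define TRACE_MSG(level, stream, msg)                          \
  do {                                                         \
    if ((level) <= traceLevel()) {                             \
      (stream) << "[trace " << (level) << "] " << msg << '\n'; \
    }                                                          \
  } while (false)
#endif
```

Because the message argument is pasted into a stream expression, callers can write `TRACE_MSG(1, std::cerr, "nHits=" << n);` in the same style they would use with std::cout.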

Nonlocal change of standard program flow possible

Pathologies or "important notices" are included in this category. This is the domain of the MessageFacility. When code discovers an interesting condition and decides to continue without throwing an exception, perhaps producing a lower-quality or truncated result, it will want to announce that this has happened. The error or message logger is the facility that captures this information and delivers it to a destination.

Local change of standard program flow

Locally issued exceptions are included in this category. In C++ such an exception is raised by a throw expression. An exception occurs when a piece of code cannot continue and cannot produce what it promised. The exception contains the reason why the algorithm could not continue.

Handling Exceptions

The places where something bad can happen boil down to:

  • user code - algorithms that developers write
  • infrastructure - exceptions that we know can happen
  • system level - things from standard libraries
  • external libraries
  • tools we know about - examples are ROOT and Boost (even though it is unlikely that ROOT will produce an exception)
  • lower-level utilities that developers use, external to us

These categories of exceptions correspond to what the framework will catch. The following pseudo-code shows the catch clauses, ordered from the most to the least specific type.


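 // The framework's catch sequence around user code (pseudo-code).
 // Handlers are ordered from most to least derived type, because
 // art::Exception derives from cet::exception, which derives from std::exception.
 try {
   invokeSomeUserCode();
 }
 catch (art::Exception& e) { /* coded art category */ }
 catch (cet::exception& e) { /* string category */ }
 catch (std::exception& e) { /* standard library exception */ }
 catch (...) { /* anything else */ }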

In addition, the infrastructure software will catch exceptions of type "char*" and std::string, and terminate the program if one of these is caught.
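The catch sequence above, together with the char* and std::string handlers, can be exercised in a self-contained sketch. Because the real cetlib and art headers are not assumed to be available, `sketch::cet_exception` and `sketch::art_exception` are hypothetical stand-ins that reproduce only the derivation order. Note that a thrown string literal has type `char const*`, so that is the type the handler must name.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Minimal stand-ins for the cetlib/art classes (assumption: not their
// actual definitions). Only the derivation order matters here.
namespace sketch {
  class cet_exception : public std::runtime_error {
  public:
    explicit cet_exception(std::string const& category)
      : std::runtime_error(category) {}
  };

  class art_exception : public cet_exception {
  public:
    using cet_exception::cet_exception;
  };
}

// Invoke user code and report which handler caught the exception.
// A handler for a base class placed earlier would hide the derived ones.
template <typename F>
std::string invokeAndClassify(F&& userCode)
{
  try {
    userCode();
    return "none";
  }
  catch (sketch::art_exception&) { return "art::Exception"; }
  catch (sketch::cet_exception&) { return "cet::exception"; }
  catch (std::exception&) { return "std::exception"; }
  catch (char const*) { return "char* (fatal)"; }
  catch (std::string&) { return "std::string (fatal)"; }
  catch (...) { return "unknown"; }
}
```

In the framework the two "fatal" handlers would terminate the program; the sketch returns labels only so the matching can be checked.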
The action or control-flow change associated with each of these exceptions is configured at runtime from the following set:

  • skip this event altogether
  • continue as though nothing happened
  • stop processing in the current path
  • rethrow the exception (stop event processing)

In the future we can consider additional features like running user-written functions that decide what to do.
The cet::exception class is derived from std::exception. The art::Exception class is derived from cet::exception.
Developers like to write out information in a form accepted by std::cout. It is useful to capture in the exception the information that would normally be sent to std::cout, and to have that information also sent to the MessageFacility. The cet::exception holds an internal std::stringstream so that data can be added to it with operator<<, just as with std::cout (jbk - is this still true?)

The cet::exception allows a category string to be used for runtime configuration of the action taken when the exception is caught.
The art::Exception uses an enumerated (integer) identifier for the same purpose.
The constructor of cet::exception takes a string category and an optional string message.
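To show how a category, an optional message, and stream insertion fit together, here is a minimal stand-in written for this page. `SketchException` is hypothetical and is not the cetlib source; in particular, the text returned by `what()` uses an invented "category: message" layout.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <utility>

class SketchException : public std::exception {
public:
  explicit SketchException(std::string category, std::string const& message = "")
    : category_(std::move(category))
  {
    buffer_ << message;
  }

  // std::ostringstream is not copyable, so copy its contents explicitly;
  // a thrown exception object must be copy- or move-constructible.
  SketchException(SketchException const& other)
    : std::exception(other), category_(other.category_)
  {
    buffer_ << other.buffer_.str();
  }

  std::string const& category() const noexcept { return category_; }

  // Category followed by everything appended so far.
  char const* what() const noexcept override
  {
    what_ = category_ + ": " + buffer_.str();
    return what_.c_str();
  }

  // Append anything that std::cout would accept.
  template <typename T>
  SketchException& operator<<(T const& value)
  {
    buffer_ << value;
    return *this;
  }

private:
  std::string category_;
  std::ostringstream buffer_;
  mutable std::string what_;
};
```

Because `operator<<` returns the exception itself, a whole message can be built in the throw expression, for example `throw SketchException("CorruptData") << "Unknown ID " << id;`.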

The rules for using the exceptions:

  1. cet::exception: Can be thrown as is, without derivation. Derived types must give a string category that matches the derived class name. This is the only type of exception that user code is allowed to propagate through a module boundary; infrastructure code expects all other exceptions to be caught and dealt with within the developer code. If these exceptions are rethrown, new location/state information will be appended. Unique actions can be assigned to each category.
  2. art::Exception: Similar to cet::exception, but with its own distinct set of categories and actions.
  3. std::exception: rethrown after printing additional state information.
  4. ... (any other type): rethrown after printing additional state information.

The developers propagate high-level announcements of what has happened. They should not throw any sort of resolution; deciding on a remedy is up to the user configuring the job, not up to the code developer.
Any cet::exception that passes through infrastructure code will have context information added, including the module description and perhaps the event and run/subrun information.
In summary, each cet::exception should contain a category specifying the recognized problem, contextual information such as module type and label, and a user-supplied string. If the exception propagated through layers of infrastructure, the user-supplied string will be a concatenation of the "what" information of the previously caught exceptions. In other words, the original exception object goes away and only its string data remains. The category of the root cause will be maintained in the rethrown exception; at minimum, the caught exception will contain the category with which it was last thrown.
Multi-line output (messages containing many newlines) poses a problem for automated tools that inspect the output.
All exceptions will be caught by reference (nonconst).

Early design note, retained for now: should exception information be sent to the logger automatically upon construction, or should it be sent by the catcher? The catcher must send it, because only then is the information complete.

Examples of what to throw

As mentioned earlier, the system expects to see cet::exceptions that are a diagnosis or recognition of a problem, not a prescription or remedy for it.
Here is a list of examples:

  1. data corrupt
  2. too many hits in a detector
  3. failed to converge on a solution
  4. infeasible solution calculated
  5. invalid detector component

Categories may conflict across different modules or even algorithms. The module context or description information can be used to disambiguate them, and choosing a descriptive string category also minimizes the chance of a conflict.

Types of problems encountered

  1. Cannot continue
  2. Pathologies

Discussion with developers

Here is an old list of some opinions from algorithm developers about presentation of output and about when exceptions or logging will be used.

Labeling data where fishy conditions are observed is good, but so prescribed action.
Always run to some completion - build something, even if it is somewhat defective (example is algorithm does not converge). Use an auxiliary channel to note the defect or use provenance labeling for this purpose.
No calotower -> no jet collection. This is an exceptional condition. In this case, the framework event "get" will throw and exit the current algorithm.
Will need an example of how and when to log data from an exception and how and when to propagate exception information out of a module.

Interface summary

 throw cet::exception(category) << "Something bad occurred: " << some_value << "\n";

The interface permits derivation from this basic type and allows catching as the derived type. The category corresponds to a classifier for this exception. If the exception is a concrete derived type, then the classifier can just be the derived class name. Otherwise, the classifier is any string that can be used to determine an action based on configuration parameters at runtime.
See the documentation in the header file for examples of how to use the class cet::Exception.

art::Exception Use

This page discusses throwing exceptions and handling exceptions when using the art software framework.

Where to find things

  1. cetlib/exception.h: the main header for the cet::exception class.
  2. cetlib/coded_exception.h: A class template derived from cet::exception, permitting categories to be enumerations.
  3. art/Utilities/Exception.h: The main exception class art::Exception is defined here.

The exception inheritance hierarchy is small:

cet::coded_exception --> cet::exception --> std::exception

Most of the time the best and easiest design is to use the cet::exception class directly, although defining new exception types derived from it is allowed and occasionally more convenient.

cet::exception

This is the main exception class within art. The framework can recognize information contained within this exception, add context or other information, print it out, and take an appropriate action. Any exception allowed to propagate from a processing module should be a cet::exception or something derived from it. The action that the framework takes when one of these exceptions is caught is based on the category string given in the constructor.
Example use:

   if (dataLooksCorrupt) {          // hypothetical check on the event data
      throw cet::exception("CorruptData")
         << "It seems as though something is dreadfully wrong.\n"
         << "Unknown ID " << x << " found\n";
   }
   else if (tooMuchTime) {          // hypothetical timeout check
      throw cet::exception("Timeout")
         << "Taking too long to process "
         << y << " hits\n";
   }

The first argument in the constructor is the category. Category names should be short. The category name can be thought of as the general name of the problem. If you want to be able to configure a specific action for an exception or group of exceptions, then you should assign them a unique category (configuring actions is described below). Categories can also be useful for text-based searches of log files when using a tool like 'grep'.
This exception type (or anything derived from it) allows any object with a stream insertion operator (operator<<) defined to be appended to the exception object in the same expression as the constructor call, as shown in the code segment.

If you want to establish an exception hierarchy and want to allow exceptions to leave your module code, the base class should be cet::exception. It is uncommon to create an exception class hierarchy. The documentation in the header file for this exception explains further how to use this class as a base class. You must propagate a category name to the base class for each unique derived class. One easy way to do this is to use the derived class name as the category name.
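The following sketch shows the naming convention with a hypothetical derived class, `TrackFitFailure`. `BaseException` stands in for cet::exception (an assumption, so that the example compiles without cetlib).

```cpp
#include <stdexcept>
#include <string>

// Stand-in for cet::exception (assumption): stores the category string.
class BaseException : public std::runtime_error {
public:
  explicit BaseException(std::string const& category,
                         std::string const& message = "")
    : std::runtime_error(category + ": " + message), category_(category) {}

  std::string const& category() const noexcept { return category_; }

private:
  std::string category_;
};

// Hypothetical derived type: the category passed to the base is the
// derived class name, so a configured action keyed on "TrackFitFailure"
// applies to every instance, while code can still catch the derived type.
class TrackFitFailure : public BaseException {
public:
  explicit TrackFitFailure(std::string const& message)
    : BaseException("TrackFitFailure", message) {}
};
```

A framework that only knows the base class still sees the right category when it catches `BaseException&`.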

art::Exception

Exceptions that are generated from calls to framework functions (e.g. access to products in the event) are of this type. art::Exception is actually a typedef for an instantiation of the class template cet::coded_exception, which allows an enum to be used instead of strings for category names.

Exception handling rules

  1. Developers in general should not catch exceptions. As described below, the framework itself is responsible for catching exceptions in a configurable way.
  2. Developers should throw cet::exceptions whenever they determine that they will not be able to perform the task they were called to do (e.g. produce an object to be put into the event).

What the framework catches

The framework catches a fixed set of exceptions at every important place in its call stack. The exceptions caught are:

  1. art::Exception
  2. cet::exception
  3. std::exception

The important places these exceptions are caught include

  1. code surrounding a call to a processing module
  2. the schedule executor
  3. the event loop
  4. the art application

The cet::exception allows for nesting or concatenation of exception information. At each place mentioned above, the framework will throw a new cet::exception (if the corresponding action is to do so) with the caught exception contents plus new context information. New context information may include:

  1. event ID (includes relevant time stamps)
  2. active module type
  3. active module label
  4. current path
  5. product being operated on
  6. report on the action taken

depending on where the exception was caught. The final exception printout will contain a trace of all exceptions caught.
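The layering can be modeled with a stand-in exception type. In this sketch (`ContextException` and `rethrowWithContext` are hypothetical), each catching layer throws a new exception that keeps the root-cause category and prepends its own context line, so the final text reads from the outermost layer down to the original message. That ordering is a choice made for the sketch, not necessarily art's printout format.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for cet::exception (assumption): a category plus accumulated text.
class ContextException : public std::runtime_error {
public:
  ContextException(std::string const& category, std::string const& text)
    : std::runtime_error(text), category_(category) {}

  std::string const& category() const noexcept { return category_; }

private:
  std::string category_;
};

// Called from a catch block at one layer: build a new exception that keeps
// the root-cause category and prepends this layer's context to the text
// of everything caught so far.
[[noreturn]] inline void rethrowWithContext(ContextException const& caught,
                                            std::string const& context)
{
  throw ContextException(caught.category(), context + "\n" + caught.what());
}
```

Only the category and the text survive each step; the original exception object is discarded, which matches the behavior described in the design notes.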

Altering framework flow

Currently only a filter module can change the flow of control in an EventProcessor. This is an event pass/no-pass return code and is not considered an error condition. The only way to change the framework flow outside this specific case without terminating the job (actually exiting the EventProcessor) is to throw something that is a cet::exception. Private or vendor-specific exceptions should not be allowed to escape from a module, because the framework will not know what to do with them and valuable context information may not be reported in a useful way. This situation is rare, but if you invoke code that can throw an unrecognized exception, catch it and rethrow it as the known exception type, cet::exception.
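A sketch of the translation step, assuming a hypothetical vendor library whose error type `VendorError` does not derive from std::exception. `FrameworkException` stands in for cet::exception, and the category name `FitFailure` is invented for the example.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for cet::exception (assumption): category plus message.
class FrameworkException : public std::runtime_error {
public:
  FrameworkException(std::string const& category, std::string const& message)
    : std::runtime_error(category + ": " + message), category_(category) {}

  std::string const& category() const noexcept { return category_; }

private:
  std::string category_;
};

// Hypothetical third-party error type; the framework could only catch it
// with catch(...), losing its contents.
struct VendorError {
  int code;
};

// Hypothetical vendor call that reports failure by throwing VendorError.
inline double vendorFit(int nPoints)
{
  if (nPoints < 3) {
    throw VendorError{7};
  }
  return 1.0;
}

// Wrap the vendor call inside the module and translate its private
// exception type into the one the framework recognizes.
inline double fitOrThrow(int nPoints)
{
  try {
    return vendorFit(nPoints);
  }
  catch (VendorError& e) {
    throw FrameworkException("FitFailure",
                             "vendor fit failed with code " + std::to_string(e.code));
  }
}
```

The vendor's error code is preserved in the message, and the framework receives a category it can map to a configured action.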

The module developers propagate high-level announcements of what has happened. They should not throw any sort of resolution; deciding on a remedy is up to the user configuring the job, not up to the code developer.
All exceptions will be caught by reference (non-constant).

Currently understood actions

There is currently a fixed set of actions that can be assigned to any of the category names delivered in a cet::exception. The framework currently understands the following actions:

  1. Rethrow: let the caller deal with the exception (This terminates the job with a non-zero return code).
  2. SkipEvent: stop further processing of this event and continue with the next event
  3. FailPath: stop processing in the path, mark it as failed, and continue with the next path
  4. FailModule: stop the module and mark it as failed, and proceed with the next module
  5. IgnoreCompletely: pretend the exception never happened (if possible)

These actions apply to exceptions thrown while a module (e.g. an EDProducer, EDFilter, EDAnalyzer, or OutputModule) or the input source is processing an event. Exceptions thrown at other times, such as while processing the beginning or end of a run, always result in a Rethrow action.
The above actions occur as stated if thrown during module execution on a path (as opposed to an endpath). If thrown during the execution of an input source, there is no path involved, so FailPath or FailModule is treated as SkipEvent. If thrown while executing a module on an endpath, FailPath or SkipEvent is treated as FailModule, so that other modules on the endpath are unaffected. (jbk - these statements need to be verified)
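The substitution rules above can be written as a small pure function, which would also make them easy to test once they have been verified against the framework. All names here (`Action`, `Where`, `effectiveAction`) are hypothetical, and the mapping encodes the paragraph as written, including the statements still marked for verification.

```cpp
// Hypothetical enums; the action names follow the list above.
enum class Action { Rethrow, SkipEvent, FailPath, FailModule, IgnoreCompletely };
enum class Where { Path, EndPath, InputSource, OutsideEvent };

// Map a configured action to the one actually applied.
constexpr Action effectiveAction(Action configured, Where where)
{
  switch (where) {
    case Where::OutsideEvent:
      // e.g. beginning or end of a run: always Rethrow.
      return Action::Rethrow;
    case Where::InputSource:
      // No path is involved, so path/module failures become SkipEvent.
      return (configured == Action::FailPath || configured == Action::FailModule)
               ? Action::SkipEvent
               : configured;
    case Where::EndPath:
      // Keep the other modules on the end path unaffected.
      return (configured == Action::FailPath || configured == Action::SkipEvent)
               ? Action::FailModule
               : configured;
    case Where::Path:
      break;
  }
  return configured;
}
```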

Parameter set options

Each of the exception categories can be assigned an associated action at runtime. The syntax for making the assignment is covered elsewhere. These options are also discussed at ART_framework_parameters.

Framework categories

The art framework produces exceptions with the category names defined in the following enumeration. The table after it gives the default action the framework takes for each category; the default action for any category not found in the table is Rethrow.

    enum ErrorCodes {
      OtherArt = 1,
      StdException,
      Unknown,
      BadAlloc,
      BadExceptionType,
      ProductNotFound,
      DictionaryNotFound,
      InsertFailure,
      Configuration,
      LogicError,
      UnimplementedFeature,
      InvalidReference,
      TypeConversion,
      NullPointerError,
      EventTimeout,
      DataCorruption,
      ScheduleExecutionFailure,
      EventProcessorFailure,
      EndJobFailure,
      FileOpenError,
      FileReadError,
      FatalRootError,
      MismatchedInputFiles,
      CatalogServiceError,
      ProductDoesNotSupportViews,
      ProductDoesNotSupportPtr,
      SQLExecutionError,
      InvalidNumber,
      NotFound
    };

Category Name               Default Action
ProductNotFound             SkipEvent
DictionaryNotFound          Rethrow (stops the job)
NoProductSpecified          Rethrow (stops the job)
InsertFailure               Rethrow (stops the job)
Configuration               Rethrow (stops the job)
LogicError                  Rethrow (stops the job)
UnimplementedFeature        Rethrow (stops the job)
InvalidReference            SkipEvent
NullPointerError            SkipEvent
EventTimeout                SkipEvent
EventCorruption             SkipEvent
FileInPathError             Rethrow (stops the job)
FileOpenError               Rethrow (stops the job)
FileReadError               Rethrow (stops the job)
FatalRootError              Rethrow (stops the job)
MismatchedInputFiles        Rethrow (stops the job)
ProductDoesNotSupportViews  Rethrow (stops the job)
ProductDoesNotSupportPtr    Rethrow (stops the job)
NotFound                    SkipEvent
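For reference, the table's rule (explicit entries, Rethrow otherwise) amounts to a map lookup with a default. The function `defaultAction` is hypothetical and only transcribes the SkipEvent rows; it does not show how art actually stores or overrides these settings.

```cpp
#include <map>
#include <string>

// Default actions transcribed from the table above; any category that is
// not listed (including every "Rethrow" row) falls back to Rethrow.
inline std::string defaultAction(std::string const& category)
{
  static std::map<std::string, std::string> const table{
    {"ProductNotFound", "SkipEvent"},
    {"InvalidReference", "SkipEvent"},
    {"NullPointerError", "SkipEvent"},
    {"EventTimeout", "SkipEvent"},
    {"EventCorruption", "SkipEvent"},
    {"NotFound", "SkipEvent"},
  };
  auto const it = table.find(category);
  return it == table.end() ? "Rethrow" : it->second;
}
```

A job's configuration would then only need to record the categories whose action differs from these defaults.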