ArtTasksDiscussion01Sep2011

High-level Project areas

  1. art development (cleanup, features, applications)
  2. beyond art - workflow and data handling
  3. strategic - multi-threaded framework, new I/O system
  4. other factor - LSST DAQ

Goals

We need to decide what our short-term (Oct) goals, our mid-term (Jan 2012) goals, and our longer-term goals are.

Art task list

Documentation

  • set up the documentation system (I) (4 days)
  • expert class reference docs (F) (Assns, FindAll, PtrRemapper, ContainerUtilities, MixFilter) About 10 things here? 1 man-week?
  • common class reference docs (F) (Event, Run/Subrun, Handle, Ptr, PtrVector, View, ParameterSet, "Module", "service system", MapVector, "usable services", FindOne, FindMany) About 30 things here? 2 man-weeks?
  • message facility docs (unknown time guess)
  • tutorial docs (F) (adding pages to the tutorial document slide show) (1 week?)
  • release docs (per-package release notes, and notes for combinations of all packages, back to 0.7.0? more git scripts needed?) (F) (2 days)
  • internal documentation (for each class, minimally, a description of what the class is for) (I)
  • produce our own "best practice" document, for the guidelines we're going to follow (difficulty here is getting agreement) (I) (2 weeks)
  • #1062: Quick reference guide for documentation site
  • #1464: Why does schema evolution behaviour depend on how many events I run over?
  • #1075: Post on the tutorial web site some documentation for FileDumper
  • #943: Document in-use parameters for all framework code.
  • #917: Make FHICL language document pretty.
  • #916: Make a note in the cet-is Wiki on detail namespace / header / source file subdirectory.
  • #946: Review and provide precise definitions of each signal watchpoint.

Infrastructure

  • non-event object manipulations - histogram and TObjects (F)
    • obtain agreement on requirements from current stakeholders
    • document features and limitations of the current system
    • design the "new and improved" system
  • event filtering for Analyzers (F)
    • identify a package that will do the logical expression parsing we need
    • extract the current "event selection" from Output modules
  • #2246: Do-nothing art job reports over 500 megabytes memory usage
  • #2233: SimpleMemoryChecker needs to be able to cope with different procfs formats on different SLF versions.
  • #1338: Output histograms don't obey custom ROOT style
  • #1091: Review all signal emission in the light of sentry considerations and exception safety.
  • #942: Design mechanism for library versioning and consistency checks
  • #938: code in art should not be directly throwing cet::exception; use art::Exception
  • #920: Design and implement a mechanism for specifying obsolete parameters with info/warnings when seen.
  • #919: Investigate report of problem with floating point exception control service.
  • #918: Add architecture-specific capture of call stack at time of construction of cet::exception.
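
The "event filtering for Analyzers" item above needs logical expression parsing over path results. The sketch below only illustrates the scope of that parsing; it is not a proposal for the package to adopt. Everything in it is an assumption: the operator spelling (`&&`, `||`, `!`), the function names `tokenize` and `evaluate`, and the idea that a selection is evaluated against the set of path names that accepted the event. It is written in Python for brevity, although art itself is C++.

```python
import re

# One token: an operator, a parenthesis, or a path name (identifier).
TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[A-Za-z_]\w*)")


def tokenize(text):
    """Split a selection expression into tokens; reject unrecognised input."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, passed):
    """Evaluate `expression` against `passed`, the set of accepted path names.

    Grammar (hypothetical): or := and ('||' and)* ; and := not ('&&' not)* ;
    not := '!' not | atom ; atom := NAME | '(' or ')'.
    """
    tokens = tokenize(expression)
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def take():
        nonlocal index
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        index += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in ("&&", "||", "!", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in passed  # a path name is true if that path accepted

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

For example, `evaluate("(p1 || p2) && !p3", {"p2"})` is true because `p2` accepted the event and `p3` did not. A real solution would also have to decide how to treat paths that did not run or that ended in an exception, which this sketch ignores.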

Integration and review

  • #908: Review structure, code and other aspects of cpp0x and cetlib
  • #974: Review non-running legacy tests to see if they are worth porting or should be removed.
  • #907: Review and rationalize use of exceptions across art.
  • #903: Create forwarding headers in each library with standard naming (fwd.h).
  • Audit art for:
    • Copyability/movability of classes
    • Applicability of smart pointers (e.g., possible replacement with boost::optional<>)
    • "Make changes as necessary to conform with rational design"

I/O

  • I/O revamp (I)
    • correct the Root modules to obey the rules of the state machine; get rid of DRISI
    • introduce an abstracted I/O layer with which modules interact and which allows Root to "fast clone" more cleanly
  • #1759: Periodic refreshing of histogram files
  • #1470: Limit output file sizes?
  • #1316: Snapshot histogram file
  • #1191: file list comments
  • #1043: art::getFileFormatVersion and art::getFileFormatEra should be renamed
  • #896: Design and specify maxEvents for output streams

Metadata

  • sqlite integration (I,S) (1 day)
  • Reorganize metadata (I)
    • remove storage of ParameterSets from its current location, and move them to a ParameterSets table in sqlite (2 days)
    • determine what are the remaining distinct steps of the reorganization (1 week)
  • #1918: Thinking ahead to event mixing and art::Ptr to run product
  • #1026: Duplicated code for branch name element verification
  • #899: Should module_label continue to be injected into each module?
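
To make the "ParameterSets table in sqlite" item above concrete, here is a minimal sketch of what such a table could look like. The schema (columns `ID` and `PSetBlob`), the JSON encoding, the SHA-1 content ID, and all function names are assumptions made for illustration; the real ParameterSet encoding and ID scheme are not specified on this page. It uses Python's standard `sqlite3` module rather than the C API that art would use.

```python
import hashlib
import json
import sqlite3


def pset_id(pset):
    """Return (id, blob) for a parameter set given as a plain dict.

    Hypothetical scheme: the ID is the SHA-1 of a canonical JSON encoding,
    so identical parameter sets always map to the same row.
    """
    blob = json.dumps(pset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest(), blob


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS ParameterSets ("
        " ID TEXT PRIMARY KEY,"
        " PSetBlob TEXT NOT NULL)"
    )
    return db


def store_pset(db, pset):
    """Insert the parameter set unless an identical one is already stored."""
    ident, blob = pset_id(pset)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR IGNORE INTO ParameterSets (ID, PSetBlob) VALUES (?, ?)",
            (ident, blob),
        )
    return ident


def load_pset(db, ident):
    """Return the stored parameter set for `ident`, or None if absent."""
    row = db.execute(
        "SELECT PSetBlob FROM ParameterSets WHERE ID = ?", (ident,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Keying the table on a content-derived ID means that storing the same configuration twice costs one row, which is one possible answer to how the metadata could be de-duplicated during the reorganization.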

Navigation

  • unification of run/subrun/event (I)
    • this needs to be done after the metadata have been sorted out
    • this includes introduction of Run and SubRun fragments, and modification of several module base class interfaces to deal with the fragments and also to deal with a new concept of begin/end Run and begin/end SubRun
    • Also to be considered: should FileBlock be a similar entity, with products?
  • revamp of processing intervals (adding subrun and run header, trailers, etc.) (I)
    • write down design proposal
    • review the new functionality with stakeholders, determine what backwards compatibility is necessary
  • #1246: Reco on demand not finding products in the file
  • #1632: -e command line option ignored
  • #2135: make FindMany and FindOne return art::Ptr for the found products
  • #1463: Configuration blocks should enforce contents
  • #1214: Ptr to Run or SubRun product
  • #1197: Unit / regression tests for RootInput random access.
  • #1127: Re-write art::Group::getProductType() to not use product()
  • #1025: end_paths and trigger_paths are now unnecessary and should be retired.
  • #1000: Run/event range
  • #911: Investigate storage of exceptions in Handle and friends and whether it can be improved.
  • #910: Investigate whether Event::fillView_ should use reflex_cast instead of static_cast.
  • #909: Event::getView should return false (per design) rather than throw in specified cases.
  • #2352: ProductList to allow same module label, instance name and product type for different branch types.

User code

  • art event display program (F)
    • gather requirements for what an event display does (2 days)
    • define an interaction model we can support (2 weeks)
    • See also Event Display Notes
  • #1340: Allow creation of subdirectories with TFileService

Validation

  • add an external testing/example product (I)
    • create tests that can be used to verify an installation of art is complete, without us needing to build an experiment's code
    • provide a place to store data files usable for easier backwards-compatibility tests
    • provide explanatory examples for how to use features of art.
  • #897: Institute validation tests for an installed art package.

Unclassified

  • #2275: On-demand loading of module libraries
  • #1144: Defining categories for the error logger
  • #1758: Histograms in shared memory
  • #1460: Dump configuration by process name from data file.
  • #1057: Specifying maximum output file size does not work
  • #1014: some parameter sets in the ServicesManager do not have service_type defined
  • #956: Testing of signal emission is needed
  • #939: Print cumulative timing information for each module at end of job when Timing service is used
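
For #939 (cumulative timing per module at end of job), the bookkeeping involved is small. The sketch below shows one way to accumulate and report it; the class name `TimingSummary`, its methods, and the report format are all hypothetical and say nothing about how the existing Timing service is implemented.

```python
import time
from collections import defaultdict


class TimingSummary:
    """Accumulate wall-clock time per module label and report it at end of job."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the class can be tested
        self._total = defaultdict(float)
        self._calls = defaultdict(int)

    def timed(self, label, func, *args, **kwargs):
        """Call func(*args, **kwargs) and charge the elapsed time to `label`."""
        start = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            # Charged even if func raises, so failures still show up.
            self._total[label] += self._clock() - start
            self._calls[label] += 1

    def report(self):
        """Return one line per label, slowest first."""
        labels = sorted(self._total, key=self._total.get, reverse=True)
        return [
            f"{label}: {self._calls[label]} calls, {self._total[label]:.6f} s total"
            for label in labels
        ]
```

A framework would call `timed` around each module invocation and print `report()` once, after the last event.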

Tasks

This is a brainstorm list of tasks that might be undertaken. We need to assign a relative priority and a complexity factor to each item.
Each item is classified as: strategic (S), internal (I), or feature (F).

For documentation: doxygen-like stuff within the code.

Our first task will be to break down the tasks into smaller units.

Really important tasks

"Parallel ART" - introduction of multiple Schedules each of which is run in its own thread (S)

Preparatory work

  • Deploy GCC v4.6.1 on cluck (SLF6) (1 day)
  • Move to C++11 compilation of ART and all dependent code (S,F)
  • Deploy Intel ArBB and/or TBB on cluck via UPS (1-2 days)
  • Deploy Intel CnC on cluck via UPS (1 day)
  • Deploy an MPI 2.x implementation on cluck usable via setup (0.5 day)
  • Clearly define the constraints put on this development experiment (2 days)
  • Evaluate the work involved in limited harmonization of Run, SubRun and Event (0.5 day).
  • Determine the requirements for parallel handling of run/subrun data

4-day retreat.

  • Connect input system to MPI event builder (1 day).
  • Excise the output system (0.25 day).
  • Excise code not needed for basic test of parallel functionality.
  • Audit remaining code for basic safety with respect to parallelism (services, statics, etc).

Longer-term strategic goals

  • Modify the use of the input system to feed multiple Schedules (2 days)
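
The structure behind "one input system feeding multiple Schedules, each in its own thread" can be sketched as a single producer and several consumers sharing a bounded queue. This is only an illustration of the shape of the problem: the function `run_schedules`, its arguments, and the error policy are assumptions, and a real implementation would be C++ (possibly on TBB or similar). In CPython the interpreter lock also limits CPU parallelism, so the sketch shows the structure, not the speed-up.

```python
import queue
import threading

_STOP = object()  # sentinel telling a schedule that the input is exhausted


def run_schedules(events, n_schedules, process):
    """Feed `events` from one input source to `n_schedules` worker threads.

    `process(schedule_id, event)` stands in for running one Schedule on one
    event. Results are returned in completion order, not input order.
    """
    work = queue.Queue(maxsize=2 * n_schedules)  # bounded: input cannot run far ahead
    results, errors = [], []
    lock = threading.Lock()  # protects the two shared lists

    def schedule(schedule_id):
        while True:
            event = work.get()
            if event is _STOP:
                return
            try:
                outcome = process(schedule_id, event)
            except Exception as exc:
                # Keep draining the queue so the producer never blocks forever.
                with lock:
                    errors.append(exc)
            else:
                with lock:
                    results.append(outcome)

    threads = [
        threading.Thread(target=schedule, args=(i,)) for i in range(n_schedules)
    ]
    for thread in threads:
        thread.start()
    for event in events:  # the single "input system"
        work.put(event)
    for _ in threads:  # one stop marker per schedule
        work.put(_STOP)
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
    return results
```

The shared `results` list and its lock show why the audit of services and statics matters: any state touched by more than one Schedule needs the same care.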

Data-driven trigger - NOvA (S,F)

  • formalize requirements for the trigger program
  • implement these requirements in the current framework (not multithreaded!)
  • migrate to use of the multi-schedule art
  • try to migrate Andrew Norman's trigger algorithm to a "new technology"

Ancillary tasks

These are tasks that distract from our direct progress with art, but are related to the development of art.
Some involve other groups.

  • clean-up of messagefacility package
    • introduce unit tests
    • clean up dependency problems
    • consider request to add the ability to direct specific severity levels to specific destinations
    • clarify issue of who will maintain this package
    • clarify usage by NOvA online and NOvA offline
  • UPS enhancement
    • manage prerequisites according to our specification (no unsetup of product before setup)
    • make it work on Mac OS X
  • Build system enhancement
    • Write a utility to generate a UPS table file to allow use of a non-UPS product (0.125 day).
    • add more automation, so more people can create builds and releases
    • make it work on Mac OS X
    • #890: Review linking of shared libraries.
    • #1027: simple_plugin to extend underscore checking to include path within project.
    • Review of the Message Analyzer code

New SAM data handling tasks

These still need to be worked out.

  • SAM protocol handling within the I/O subsystem (ROOT file sequence? As a service?)
  • sqlite handling in ROOT
  • sample use of sqlite in ROOT (make tables, insert and select data)
  • command-line tool for using the sqlite data within the ROOT file (and optionally extracting it)
  • exception callbacks available in services
  • record all protocol interactions and arguments from SAM into sqlite metadata
  • find and understand the HTTP transactions for SAM (available on their wiki)
  • Data handling interface with SAM (F)
    • discover interaction protocol for communication with SAM
    • make sure to understand anomalous conditions and failure modes
    • determine division of responsibilities between SAM and art and workflow
    • determine how we will be able to test against SAM
    • design how we make art interact with SAM's protocol
    • discover interaction protocol with workflow system
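
As a starting point for the "make tables, insert and select data" and "record all protocol interactions" items above, the following sketch logs interactions to an sqlite table and reads them back. The table name `sam_interactions`, its columns, the function names, and the request names used with it are hypothetical; the actual SAM protocol still has to be discovered, as the list says. It uses a standalone sqlite database through Python's standard `sqlite3` module and does not address embedding the database in a ROOT file.

```python
import json
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the database and make sure the log table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sam_interactions ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " request TEXT NOT NULL,"
        " arguments TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return db


def record(db, request, arguments, status):
    """Insert one interaction; arguments are stored as a JSON string."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT INTO sam_interactions (request, arguments, status)"
            " VALUES (?, ?, ?)",
            (request, json.dumps(arguments, sort_keys=True), status),
        )
    return cursor.lastrowid


def history(db, status=None):
    """Select logged interactions in order, optionally filtered by status."""
    sql = "SELECT seq, request, arguments, status FROM sam_interactions"
    params = ()
    if status is not None:
        sql += " WHERE status = ?"
        params = (status,)
    sql += " ORDER BY seq"
    return [
        (seq, request, json.loads(arguments), state)
        for seq, request, arguments, state in db.execute(sql, params)
    ]
```

Filtering on `status` is one way to pull out the anomalous conditions and failure modes that the list asks us to understand.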