TasksDiscussion26June2012

This list was initially populated with art tasks from 1 Sept 2011, and then we added more.

See also Framework FWP

High-level Project areas

  1. Tasks for art
  2. Tasks for Framework FWP
  3. Tasks for artdaq
  4. Any other products we should be working on?
  5. Lower-level tools: build system, administration of machines, packaging
  6. Lower-level packages: fhicl, message logger

Goals

We need to decide what our short-term (end of August), mid-term (end of Nov 2012), and longer-term (May 2013) goals are.

Art task list

Infrastructure

  • question from Perevalov (use of art service in standalone scripts not depending on art). #2751
  • introduce the ability to create a service of a concrete type known to the service system as the implementation of an interface; allow configuration of only one concrete type that implements that interface
    • #1014: some parameter sets in the ServicesManager do not have service_type defined
  • Make service libraries load on demand, rather than loading at the start of the program.
  • non-event object manipulations - histogram and TObjects (F). This has to do with the TFileService, and enhancing what it does.
    • Features #1758 #1759 #1316
    • We need to have a clean line of division between what we do in art, and allowing those who want to do so to make use of Root, as much as they like.
    • Do users want to store THistogram and such in subruns and runs? Or do they want a Root-flavored store of such things, in the same file or another, in which one can store THistograms and such things?
    • obtain agreement on requirements from current stakeholders
    • document features and limitations of the current system
    • design the "new and improved" system
  • event filtering for Analyzers (F) and enhancement of output modules
    • identify a package that will do the logical expression parsing we need
    • extract the current "event selection" from Output modules
    • add the ability to execute user-defined code in association with arbitrary output modules.
  • #2233: SimpleMemoryChecker needs to be able to cope with different procfs formats on different SLF versions. The SimpleMemoryChecker needs a thorough review, to consider the means of reporting and the values reported.
  • #1338: Output histograms don't obey custom ROOT style. Chris thinks he knows how to solve this easily. If it is not easy, this should remain at low priority.
  • #1091: Review all signal emission in the light of sentry considerations and exception safety.
  • #942: Design mechanism for library versioning and consistency checks
  • #938: code in art should not be directly throwing cet::exception; use art::Exception. This carries through to FHiCL as well. We should make sure that art emits only art exceptions.
  • #920: Design and implement a mechanism for specifying obsolete parameters with info/warnings when seen. We do not need to diagnose user code without their help.
  • #2672 #2664 #2445: we need more contextual information in some exceptions; #918: Add architecture-specific capture of call stack at time of construction of cet::exception.
  • #919: Investigate report of problem with floating point exception control service. This problem was solved in late 2010 in the CMS code; look at what they did.
  • Under some circumstances, the module timing aggregation is garbage; CMS has solved this problem. Look at what they did. Overhaul how the TimingService writes its output, so that multithreaded jobs don't record garbage, and so that single-threaded jobs don't require grepping of output to see the timing results. The output should also be easy to read into R.
    • #939: Print cumulative timing information for each module at end of job when Timing service is used
  • Finish (for Mu2e) the removal of the problematic throwing of an exception by Assns when one of the Ptrs it needs can't be dereferenced.
  • There is no user-accessible mechanism for connecting to a user-defined SQLite database.
  • Design and document the design of the overhauled signals-and-slots mechanism, probably massively modifying or even removing the ActivityRegistry.
  • Remove all the asynchronous behavior from EventProcessor. Do the analysis to see what asynchronous behavior is actually required for the EventProcessor, so that it can be used in an online system. This does NOT include having the EventProcessor be the application object in an event display program. This DOES include the EventProcessor being part of a larger DAQ application, and the controls and monitoring necessary for that DAQ application. We need to be able to report states, turn diagnostics on and off, and pause and restart.
    • #1759: Periodic refreshing of histogram files needs to be coordinated.
    • #1316: Snapshot histogram file
    • #1758: Histograms in shared memory
  • The TriggerResults class needs to be overhauled and simplified. This also means looking closely at the module that makes this object. Consider making this data a piece of the Event header, rather than a data product.
  • The EventProcessor's state machine needs to be reworked. Most of the requirements it was implemented to fulfill no longer apply.
    • #956: Testing of signal emission is needed
  • Re-implement EmptyEvent by use of ReaderSource.
  • Consider re-implementing RootInput by use of ReaderSource. Does this help move us towards the great revamping of Root I/O that we want to do?

Integration and review

  • Move build of art to C++ 2011.
    • Change all use of boost::shared_ptr to std::shared_ptr; boost::regex to std::regex, etc.
    • We need the full toolchain of GCC 4.7.x
    • We need to verify Root operation under C++ 2011: make release of art and subpackages.
    • We need to verify operation of Geant4.
    • Build against Mu2e and NOvA and ArgoNEUT.
  • cling
    • Audit Reflex use.
    • Change everything we can to use of TClass interface. Report anything that can't be done with TClass to the Root team.
    • If the July preview release of Root doesn't have the genreflex wrapper, we need to pressure Root team to give us a new preview as soon as possible.
    • Explore integration issues with first version of Root6 (should be in November).
    • #910: Investigate whether Event::fillView_ should use reflex_cast instead of static_cast.
  • #908: Review structure, code and other aspects of cpp0x. Seriously consider removal of cpp0x.
  • Perform code review of cetlib.
  • #974: Review non-running legacy tests to see if they are worth porting or should be removed. Look at each, and if it isn't important enough to fix now, remove it.
  • Replace use of boost::program_options with another solution.
  • Audit art for:
    • Copyability/movability of classes
    • Applicability of smart pointers (e.g., possible replacement with something like boost::optional<>)
    • "Make changes as necessary to conform with rational design"
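
As a rough illustration of the boost-to-std migration listed under the C++ 2011 item, the sketch below uses std::shared_ptr and std::regex where the boost equivalents would have been. The Version struct, the parseVersion function and the "v1_02_03" string format are assumptions made for this example, not art code. It also assumes a standard library with a complete <regex> implementation, which is worth verifying for whichever GCC release is chosen.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>

// Hypothetical value type holding a three-part version number.
struct Version {
  int majorNum;
  int minorNum;
  int patchNum;
};

// Parse strings of the (assumed) form "v<major>_<minor>_<patch>".
// Returns a null pointer when the string does not match.
std::shared_ptr<Version> parseVersion(std::string const& text) {
  static std::regex const pattern(R"(v(\d+)_(\d+)_(\d+))");
  std::smatch match;
  if (!std::regex_match(text, match, pattern)) {
    return std::shared_ptr<Version>();
  }
  auto result = std::make_shared<Version>();
  result->majorNum = std::stoi(match[1].str());
  result->minorNum = std::stoi(match[2].str());
  result->patchNum = std::stoi(match[3].str());
  return result;
}
```

In simple cases like this the change from boost is mostly a matter of namespaces and headers; differences in regex grammar defaults would still need to be checked case by case.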

I/O

  • I/O revamp (I)
    • introduce abstracted i/o layer with which modules interact and which allows Root to "fast clone" more cleanly.
    • Look at CMS and ATLAS use of TTreeCache. Construct use cases that demonstrate the relevance to IF experiments. We need evidence that this is relevant before we do any work in this direction. This must be requirements-driven.
  • #1191: file list comments
  • #1043: art::getFileFormatVersion and art::getFileFormatEra should be renamed (to note that they are Root-file specific).

Metadata

  • Reorganize metadata (I)
    • Finish enumeration of all stored metadata (start of document is in art/doc directory)
    • Design table structure for storing these data
    • Save the tables in SQLite database
    • Rework code to use data from the relational tables
    • #2352: ProductList to allow same module label, instance name and product type for different branch types.
    • #1460: Dump configuration by process name from data file.
  • #1026: Duplicated code for branch name element verification.

Navigation

  • unification of run/subrun/event (I)
    • this needs to be done after the metadata have been sorted out
    • this includes introduction of Run and SubRun fragments, and modification of several module base class interfaces to deal with the fragments and also to deal with a new concept of begin/end Run and begin/end SubRun
    • Also to be considered: should FileBlock be a similar entity, with products?
      • #1470: Limit output file sizes
      • #1057: Specifying maximum output file size does not work
      • #896: Design and specify maxEvents for output streams
      • Consider an emergency exception throw before allowing a file to become so large that it damages grid nodes.
      • #1918: Thinking ahead to event mixing and art::Ptr to run product
      • #1214: Ptr to Run or SubRun product
  • revamp of processing intervals (adding subrun and run header, trailers, etc.) (I)
    • write down design proposal
    • review the new functionality with stakeholders, determine what backwards compatibility is necessary
  • Overhaul of on-demand reconstruction system.
    • #1246: Reco on demand not finding products in the file; a user has requested a different behavior
    • Audit the tests of the on-demand system to ensure that the functionality is sufficiently tested; augment tests as necessary.
    • Analyze and re-design as needed for parallel execution of modules within a Schedule.
  • Explore the feature request in #1632 (specification of where to begin reading). Also analyze the request #1000: Run/event range
    • Evaluate the time needed for the solution.
    • Specify the behavior precisely.
  • #1463: Configuration blocks should enforce contents. We should make sure that anything described (in the configuration) as an analyzer really is an analyzer, etc. This will require changes to the configuration reading mechanism in Schedule.cc and to the module factory, and possibly to the module declaration macros (to make different macros for each type of module).
    • Modify the specification of the workflow to specify each type of data only once. This includes #1025: end_paths and trigger_paths are now unnecessary and should be retired.
  • Document behavior of getView; fix any cases that need to be fixed
    • #909: Event::getView should return false (per design) rather than throw in specified cases.
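
The "return false rather than throw" behavior requested in #909 can be sketched with a toy stand-in. ToyEvent and Hit below are hypothetical types, not art::Event; the sketch assumes only that a missing product is one of the "specified cases" and that the caller's view should be left empty in that case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical product element.
struct Hit {
  int channel;
};

// Toy stand-in for an event holding labelled collections of hits.
class ToyEvent {
public:
  void put(std::string const& label, std::vector<Hit> hits) {
    products_[label] = std::move(hits);
  }

  // Fill 'view' with pointers to the elements stored under 'label'.
  // Returns false (and leaves 'view' empty) when no such product exists,
  // instead of throwing.
  bool getView(std::string const& label, std::vector<Hit const*>& view) const {
    view.clear();
    auto const it = products_.find(label);
    if (it == products_.end()) {
      return false;
    }
    for (auto const& hit : it->second) {
      view.push_back(&hit);
    }
    return true;
  }

private:
  std::map<std::string, std::vector<Hit>> products_;
};
```

With this shape, callers test the return value and reserve exceptions for genuine errors; which cases count as which is exactly what the documentation task needs to pin down.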

User code

  • art event display program (F)
    • gather requirements for what an event display does (2 days)
    • define an interaction model we can support (2 weeks)
    • See also Event Display Notes

Validation

  • add an external testing/example product (I)
    • create tests that can be used to verify an installation of art is complete, without us needing to build an experiment's code
    • provide a place to store data files usable for easier backwards-compatibility tests
    • provide explanatory examples for how to use features of art.
    • #897: Institute validation tests for an installed art package.

MessageFacility

  • #1144: Defining categories for the error logger. We should make good use in art, and define a recommended policy for experiments. It will be up to them to follow best practice.

Documentation

  • Describe how the service system works, how to use it well. What is required of the T in ServiceHandle<T>? What is the lifetime? How are they constructed? In what order will the services that register for a particular callback get invoked?
  • Describe paths (in such a way that it makes sense when we move to on-demand). Describe when you should use a filter, analyzer, producer. These descriptions should be for the reworked version of analyzer that allows filtering in the manner of output modules.
  • set up the documentation system (I) (4 days)
  • expert class reference docs (F) (Assns, FindAll, PtrRemapper, ContainerUtilities, MixFilter) About 10 things here? 1 man-week?
    • common class reference docs (F) (Event, Run/Subrun, Handle, Ptr, PtrVector, View, ParameterSet, "Module", "service system", MapVector, "usable services", FindOne, FindMany) About 30 things here? 2 man-weeks?
  • message facility docs (unknown time guess)
  • tutorial docs (F) (adding pages to the tutorial document slide show) (1 week?)
  • release docs (per-package release notes, and notes for combinations of all packages, back to 0.7.0? more git scripts needed?) (F) (2 days)
  • internal documentation (for each class, minimally, a description of what the class is for) (I)
  • produce our own "best practice" document, for the guidelines we're going to follow (difficulty here is getting agreement) (I) (2 weeks)
  • #1062: Quick reference guide for documentation site
  • #1464: Why does schema evolution behaviour depend on how many events I run over?
  • #1075: Post on the tutorial web site some documentation for FileDumper
  • #943: Document in-use parameters for all framework code.
  • #917: Make FHICL language document pretty.
  • #916: Make a note in the cet-is Wiki on detail namespace / header / source file subdirectory.
  • #946: Review and provide precise definitions of each signal watchpoint.

Tasks

This is a brainstorm list of tasks that might be undertaken. We need to assign a relative priority and a complexity factor to each item.
Each item is classified as: strategic (S), internal (I), or feature (F).

For documentation: doxygen-like stuff within the code.

Our first task will be to break down the tasks into smaller units.

Really important tasks

"Parallel ART" - introduction of multiple Schedules each of which is run in its own thread (S)

Preparatory work

  • Deploy Intel TBB on cluck via UPS (this can be a "product stub" if necessary).
  • Deploy OpenMPI usable via setup (product stub).
  • Deploy MVAPICH (1) usable via setup (product stub).

Data-driven trigger - NOvA (S,F)

  • Implement OpenMP version of Hough transform.
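
A minimal sketch of what an OpenMP Hough transform for straight lines might look like follows. It assumes 2D hits and the usual parameterization rho = x cos(theta) + y sin(theta); Hit2D, HoughAccumulator and houghTransform are hypothetical names, and nothing here reflects the NOvA trigger design. The loop is parallelized over theta bins so that each thread writes only to its own row of the accumulator and no locking is needed. Without OpenMP enabled the pragma is ignored and the code runs serially with the same result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D hit position.
struct Hit2D {
  double x;
  double y;
};

// Vote counts in (theta, rho) space, stored row-major by theta bin.
struct HoughAccumulator {
  int nTheta;
  int nRho;
  std::vector<int> votes;
  int at(int iTheta, int iRho) const {
    return votes[static_cast<std::size_t>(iTheta) * nRho + iRho];
  }
};

// Theta bin i corresponds to the angle i * pi / nTheta.
// Rho bins are centered on multiples of rhoStep and cover [-rhoMax, rhoMax].
HoughAccumulator houghTransform(std::vector<Hit2D> const& hits,
                                int nTheta, double rhoMax, double rhoStep) {
  assert(nTheta > 0 && rhoMax > 0.0 && rhoStep > 0.0);
  double const pi = std::acos(-1.0);
  HoughAccumulator acc;
  acc.nTheta = nTheta;
  acc.nRho = static_cast<int>(std::lround(2.0 * rhoMax / rhoStep)) + 1;
  acc.votes.assign(static_cast<std::size_t>(nTheta) * acc.nRho, 0);

  // Each iteration touches only row iTheta, so iterations are independent.
#pragma omp parallel for
  for (int iTheta = 0; iTheta < nTheta; ++iTheta) {
    double const theta = iTheta * pi / nTheta;
    double const c = std::cos(theta);
    double const s = std::sin(theta);
    for (auto const& hit : hits) {
      double const rho = hit.x * c + hit.y * s;
      long const iRho = std::lround((rho + rhoMax) / rhoStep);
      if (iRho >= 0 && iRho < acc.nRho) {
        ++acc.votes[static_cast<std::size_t>(iTheta) * acc.nRho + iRho];
      }
    }
  }
  return acc;
}
```

For hits lying on the line y = x, every hit votes for the cell at theta = 135 degrees and rho = 0. Parallelizing over hits instead would require atomic updates or per-thread accumulators.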

Ancillary tasks

These are tasks that distract from our direct progress with art, but are related to the development of art.
Some involve other groups.

  • clean-up of messagefacility package
    • introduce unit tests
    • clean up dependency problems
    • consider request to add the ability to direct specific severity levels to specific destinations
    • Resolve any issue of maintaining multiple versions of MessageFacility. Make sure NOvA is not using a fork.
  • Build system enhancement
    • We need a "find package" CMake system, so that the same templating system that creates the UPS metadata also generates what packages built with cetbuildtools need in order to find the necessary headers and libraries.
    • Make cetbuildtools be able to set up for development of several packages simultaneously. The system must assure that user-installed products are used in preference to experiment-installed packages. The system must enforce consistency of the build and setup of products. Review the scripts Adam has written, make sure we can execute the workflow he needs. We may need more than what he specified.
    • make it work on Mac OS X (but don't spend too long on this because if it is hard we don't want it so much)
    • #1027: simple_plugin to extend underscore checking to include path within project. This is done for the name, but not for the subdirectory path.

SAM data handling tasks

These still need to be worked out.

  • SAM protocol handling within the I/O subsystem (ROOT file sequence? As a service?)
  • command-line tool for using the sqlite data within the ROOT file (and optionally extracting it)
  • exception callbacks available in services
  • record all protocol interactions and arguments from SAM into sqlite metadata
  • find and understand the HTTP transactions for SAM (available on their wiki)
  • Data handling interface with SAM (F)
    • discover interaction protocol for communication with SAM
    • make sure to understand anomalous conditions and failure modes
    • locate the documentation or document the division of responsibilities between SAM and art and workflow
    • determine how we will be able to test against SAM
    • design how we make art interact with SAM's protocol
    • discover interaction protocol with workflow system

FWP tasks

Making art parallel

  • Test to see if new kernel/glibc releases solve the problem with OpenMP and multiple libraries with thread-local storage.
  • Look into Google protocol buffers for serialization and Oracle Berkeley DB file format for thread-parallel i/o.
  • Get Philippe to write up an explanation of how to use the MPI & multithreaded profiling tool(s) he is using. Make sure they're installed on the grunts and cluck.

Multi-schedule art

  • Services and ActivityRegistry need to have their in-progress changes completed and documented.
    • Simplified ActivityRegistry scheme needs to be completed
      • Revisit template in the light of improved variadic templates in GCC 4.7.1.
    • Service registration by category (global/schedule) including factory changes, as necessary
    • Introduce NUMA control into code (at least for testing). Determine how to configure using FHiCL.
    • Make existing framework services operate correctly. Some services may need to be broken up (e.g., Timing service).
  • Modify EventProcessor to have multiple schedules and to feed them events in parallel. Note that no output is necessary.
  • Design the reduction step to combine non-event data (e.g., histograms).
  • Make sure that all TClass::GetClass calls are made before we enter multithreaded running.
  • Finish the design to allow the factory system to make multiple instances of the same modules (multiple instances made by the factory, not cloning or copying); need to reconcile the ProductRegistry with this plan.
  • Make sure interactions with MessageFacility do not rely on thread ID, but rather on Schedule identification. Our use of OpenMP task parallelism means that thread-local storage is not a solution to our problems.
  • Investigate use of TBB, in place of OpenMP. Check to see if we can replace use of OpenMP in 3 days.
  • Investigate parallel task execution of all analyzers. Identify some analysis tasks to be our use cases. Can we avoid use of Root in these analyzers? What tool can be used for histogramming? Maybe this is something to which we can attach the NIU effort.
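
The reduction step for non-event data can be sketched as follows, under the assumption that each schedule fills its own private histogram and the results are summed once all schedules finish. Histogram, fillPerSchedule and mergeHistograms are hypothetical names, events are dealt to schedules round-robin purely for illustration, and the OpenMP pragma is optional (without it the loop runs serially with the same result).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A histogram is just a vector of bin counts in this sketch.
using Histogram = std::vector<long>;

// Each schedule s handles events s, s + nSchedules, s + 2 * nSchedules, ...
// and fills only its own histogram, so no synchronization is needed.
std::vector<Histogram> fillPerSchedule(std::vector<int> const& eventValues,
                                       int nSchedules, int nBins) {
  assert(nSchedules > 0 && nBins > 0);
  std::vector<Histogram> perSchedule(nSchedules, Histogram(nBins, 0));
#pragma omp parallel for
  for (int s = 0; s < nSchedules; ++s) {
    for (std::size_t i = static_cast<std::size_t>(s); i < eventValues.size();
         i += static_cast<std::size_t>(nSchedules)) {
      int const bin = eventValues[i];
      if (bin >= 0 && bin < nBins) {
        ++perSchedule[s][bin];  // out-of-range values are ignored
      }
    }
  }
  return perSchedule;
}

// Reduction: sum the per-schedule histograms bin by bin.
Histogram mergeHistograms(std::vector<Histogram> const& perSchedule) {
  assert(!perSchedule.empty());
  Histogram total(perSchedule.front().size(), 0);
  for (auto const& h : perSchedule) {
    assert(h.size() == total.size());
    for (std::size_t b = 0; b < h.size(); ++b) {
      total[b] += h[b];
    }
  }
  return total;
}
```

Keeping the per-schedule objects private until the end avoids contention on a shared histogram; the cost is memory proportional to the number of schedules.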

artdaq/MPI issues

  • Briefly investigate use of process-parallel HDF5 for storing DS50 data; if it fails, look at parallel netCDF.
  • Decouple from direct dependence on MVAPICH2; introduce abstraction layer to allow use of other MPI implementations.
  • Investigate (and document) the need for running algorithms on fragments before the event-building process.

GPU tasks

  • Obtain a sample of realistic hits (from NOvA simulation) and write them to a textual data format that allows us to do the algorithm work outside of art.
  • Implementation of Hough transform on GPU, using CUDA; need not be in the art environment.
  • Identify and learn how to use the tools to evaluate the efficiency of our CUDA code. Ask Qiming what he has so far.

Multithreaded algorithms

  • Produce multithreaded (using OpenMP) version of Hough Transform.
  • Implement run-length encoder with Rice compression.
  • Implement some simple zero-suppression algorithm for DS50 data. [Maybe this will be moved out of FWP work]
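
For the Rice compression item, a minimal sketch of Rice coding with a fixed parameter k is given below. Each value is split into a quotient (written in unary as that many 1 bits followed by a 0) and a k-bit remainder. The function names and the use of std::vector<bool> as a bit stream are assumptions made for illustration; in a run-length encoder the values passed in would be the run lengths, and choosing k to suit the data is not addressed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit stream represented as a vector of bools (simple, not space-optimal).
using Bits = std::vector<bool>;

// Append the Rice code of 'value' with parameter k to 'out':
// quotient (value >> k) in unary, then the k low bits, MSB first.
void riceEncode(std::uint32_t value, unsigned k, Bits& out) {
  assert(k < 32);
  std::uint32_t const quotient = value >> k;
  for (std::uint32_t i = 0; i < quotient; ++i) {
    out.push_back(true);
  }
  out.push_back(false);  // terminator of the unary part
  for (unsigned b = k; b-- > 0;) {
    out.push_back(((value >> b) & 1u) != 0);
  }
}

// Decode one value starting at 'pos', advancing 'pos' past it.
// Throws std::out_of_range (from at()) if the stream is truncated.
std::uint32_t riceDecode(Bits const& in, std::size_t& pos, unsigned k) {
  assert(k < 32);
  std::uint32_t quotient = 0;
  while (in.at(pos++)) {
    ++quotient;
  }
  std::uint32_t remainder = 0;
  for (unsigned b = 0; b < k; ++b) {
    remainder = (remainder << 1) | (in.at(pos++) ? 1u : 0u);
  }
  return (quotient << k) | remainder;
}
```

With k = 2 the value 9 is encoded as the five bits 1 1 0 0 1 (quotient 2, remainder 1). Small values cost k + 1 bits, which is why the scheme suits data dominated by short runs or small residuals.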

artdaq productization tasks

We'll write this list when and if we get the KA-15 funding.

  • Consider production of an application object with a state machine to allow asynchronous control of an artdaq application.