High-level Project areas¶
- art development (cleanup, features, applications)
- beyond art - workflow and data handling
- strategic - multi-threaded framework, new I/O system
- other factor - LSST DAQ
We need to decide what our short-term (Oct), mid-term (Jan 2012), and longer-term goals are.
Art task list¶
- set up the documentation system (I) (4 days)
- expert class reference docs (F) (Assns, FindAll, PtrRemapper, ContainerUtilities, MixFilter) About 10 things here? 1 man-week?
- common class reference docs (F) (Event, Run/Subrun, Handle, Ptr, PtrVector, View, ParameterSet, "Module", "service system", MapVector, "usable services", FindOne, FindMany) About 30 things here? 2 man-weeks?
- message facility docs (unknown time guess)
- tutorial docs (F) (adding pages to the tutorial document slide show) (1 week?)
- release docs (per-package release notes, and notes for combinations of all packages, back to 0.7.0? more git scripts needed?) (F) (2 days)
- internal documentation (for each class, minimally, a description of what the class is for) (I)
- produce our own "best practice" document, for the guidelines we're going to follow (difficulty here is getting agreement) (I) (2 weeks)
- #1062: Quick reference guide for documentation site
- #1464: Why does schema evolution behaviour depend on how many events I run over?
- #1075: Post on the tutorial web site some documentation for FileDumper
- #943: Document in-use parameters for all framework code.
- #917: Make FHICL language document pretty.
- #916: Make a note in the cet-is Wiki on detail namespace / header / source file subdirectory.
- #946: Review and provide precise definitions of each signal watchpoint.
- non-event object manipulations - histogram and TObjects (F)
- obtain agreement on requirements from current stakeholders
- document features and limitations of the current system
- design the "new and improved" system
- event filtering for Analyzers (F)
- identify a package that will do the logical expression parsing we need
- extract the current "event selection" from Output modules
- #2246: Do-nothing art job reports over 500 megabytes memory usage
- #2233: SimpleMemoryChecker needs to be able to cope with different procfs formats on different SLF versions.
- #1338: Output histograms don't obey custom ROOT style
- #1091: Review all signal emission in the light of sentry considerations and exception safety.
- #942: Design mechanism for library versioning and consistency checks
- #938: code in art should not be directly throwing cet::exception; use art::Exception
- #920: Design and implement a mechanism for specifying obsolete parameters with info/warnings when seen.
- #919: Investigate report of problem with floating point exception control service.
- #918: Add architecture-specific capture of call stack at time of construction of cet::exception.
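To make item #920 above more concrete, here is a minimal sketch of how obsolete parameters might be flagged. Everything in it is hypothetical: `Config` is a plain `std::map` stand-in rather than the real ParameterSet class, and `ObsoleteParam`, `checkObsolete` and `exampleUsage` are illustrative names, not existing art or fhicl-cpp API. It only shows the idea of a registry of obsolete names consulted when a configuration is seen.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a configuration block: parameter name -> raw value.
// This is NOT the real ParameterSet class; it only illustrates the idea.
using Config = std::map<std::string, std::string>;

// Hypothetical registry entry: an obsolete parameter name and the advice to show.
struct ObsoleteParam {
  std::string name;
  std::string advice;
};

// Return one warning string per obsolete parameter that actually appears in cfg.
std::vector<std::string>
checkObsolete(Config const& cfg, std::vector<ObsoleteParam> const& obsolete)
{
  std::vector<std::string> warnings;
  for (auto const& p : obsolete) {
    if (cfg.count(p.name) != 0) {
      warnings.push_back("parameter '" + p.name + "' is obsolete: " + p.advice);
    }
  }
  return warnings;
}

// Small self-check showing the intended use.
void exampleUsage()
{
  Config cfg;
  cfg["maxEvents"] = "10";
  cfg["oldLabel"] = "x";

  std::vector<ObsoleteParam> obsolete;
  obsolete.push_back(ObsoleteParam{"oldLabel", "use 'newLabel' instead"});
  obsolete.push_back(ObsoleteParam{"unusedKnob", "no longer has any effect"});

  std::vector<std::string> const w = checkObsolete(cfg, obsolete);
  assert(w.size() == 1);  // only 'oldLabel' is present in cfg
  assert(w[0] == "parameter 'oldLabel' is obsolete: use 'newLabel' instead");
}
```

In a real design the warnings would presumably go through the message facility at info or warning severity, and the registry would live alongside each module's parameter documentation (see #943); both choices are still open.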
Integration and review¶
- #908: Review structure, code and other aspects of cpp0x and cetlib
- #974: Review non-running legacy tests to see if they are worth porting or should be removed.
- #907: Review and rationalize use of exceptions across art.
- #903: Create forwarding headers in each library with standard naming (fwd.h).
- Audit art for:
- Copyability/movability of classes
- Applicability of smart pointers (e.g., possible replacement with
- "Make changes as necessary to conform with rational design"
- I/O revamp (I)
- correct the Root modules to obey the rules of the state machine; get rid of DRISI
- introduce abstracted i/o layer with which modules interact and which allows Root to "fast clone" more cleanly
- #1759: Periodic refreshing of histogram files
- #1470: Limit output file sizes?
- #1316: Snapshot histogram file
- #1191: file list comments
- #1043: art::getFileFormatVersion and art::getFileFormatEra should be renamed
- #896: Design and specify maxEvents for output streams
- sqlite integration (I,S) (1 day)
- Reorganize metadata (I)
- remove storage of ParameterSets from its current location, and move them to a ParameterSets table in sqlite (2 days)
- determine what are the remaining distinct steps of the reorganization (1 week)
- #1918: Thinking ahead to event mixing and art::Ptr to run product
- #1026: Duplicated code for branch name element verification
- #899: Should module_label continue to be injected into each module?
- unification of run/subrun/event (I)
- this needs to be done after the metadata have been sorted out
- this includes introduction of Run and SubRun fragments, and modification of several module base class interfaces to deal with the fragments and also to deal with a new concept of begin/end Run and begin/end SubRun
- Also to be considered: should FileBlock be a similar entity, with products?
- revamp of processing intervals (adding subrun and run header, trailers, etc.) (I)
- write down design proposal
- review the new functionality with stakeholders, determine what backwards compatibility is necessary
- #1246: Reco on demand not finding products in the file
- #1632: -e command line option ignored
- #2135: make FindMany and FindOne return art::Ptr for the found products
- #1463: Configuration blocks should enforce contents
- #1214: Ptr to Run or SubRun product
- #1197: Unit / regression tests for RootInput random access.
- #1127: Re-write art::Group::getProductType() to not use product()
- #1025: end_paths and trigger_paths are now unnecessary and should be retired.
- #1000: Run/event range
- #911: Investigate storage of exceptions in Handle and friends and whether it can be improved.
- #910: Investigate whether Event::fillView_ should use reflex_cast instead of static_cast.
- #909: Event::getView should return false (per design) rather than throw in specified cases.
ProductList to allow same module label, instance name and product type for different branch types.
- art event display program (F)
- gather requirements for what an event display does (2 days)
- define an interaction model we can support (2 weeks)
- See also Event Display Notes
- #1340: Allow creation of subdirectories with TFileService
- add an external testing/example product (I)
- create tests that can be used to verify an installation of art is complete, without us needing to build an experiment's code
- provide a place to store data files usable for easier backwards-compatibility tests
- provide explanatory examples for how to use features of art.
- #897: Institute validation tests for an installed art package.
- #2275: On-demand loading of module libraries
- #1144: Defining categories for the error logger
- #1758: Histograms in shared memory
- #1460: Dump configuration by process name from data file.
- #1057: Specifying maximum output file size does not work
- #1014: some parameter sets in the ServicesManager do not have service_type defined
- #956: Testing of signal emission is needed
- #939: Print cumulative timing information for each module at end of job when Timing service is used
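Part of the audit item above on copyability/movability of classes could be checked at compile time. The following is a minimal sketch using the standard `<type_traits>` header (C++11). The classes `OwningThing` and `ValueThing` and the helpers `isCopyable` and `isMovable` are hypothetical names invented for illustration; they are not art classes.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical class that owns a resource through std::unique_ptr:
// its implicit copy operations are deleted, so it is movable but not copyable.
struct OwningThing {
  std::unique_ptr<int> payload;
};

// Hypothetical plain value class: copyable and movable.
struct ValueThing {
  int x;
};

// True if T can be both copy-constructed and copy-assigned.
template <typename T>
constexpr bool isCopyable()
{
  return std::is_copy_constructible<T>::value &&
         std::is_copy_assignable<T>::value;
}

// True if T can be both move-constructed and move-assigned.
// Note: a copyable type also satisfies this, because copying serves as the fallback.
template <typename T>
constexpr bool isMovable()
{
  return std::is_move_constructible<T>::value &&
         std::is_move_assignable<T>::value;
}

// The audit itself: each line records the intended semantics of a class,
// and the build fails if a later change silently alters them.
static_assert(!isCopyable<OwningThing>(), "OwningThing must not be copyable");
static_assert(isMovable<OwningThing>(), "OwningThing must be movable");
static_assert(isCopyable<ValueThing>(), "ValueThing must be copyable");
static_assert(isMovable<ValueThing>(), "ValueThing must be movable");
```

Assertions like these could sit next to each class or in a dedicated test source file, so the intended copy/move behaviour is documented and enforced together.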
This is a brainstorm list of tasks that might be undertaken. We need to assign a relative priority and a complexity factor to each item.
Each item is classified as: strategic (S), internal (I), or feature (F).
For documentation: doxygen-like stuff within the code.
Our first task will be to break down the tasks into smaller units.
Really important tasks¶
"Parallel ART" - introduction of multiple Schedules, each of which is run in its own thread (S)¶
- Deploy GCC v4.6.1 on cluck (SLF6) (1 day)
- Move to C++11 compilation of ART and all dependent code (S,F)
- Deploy Intel ArBB and/or TBB on cluck via UPS (1-2 days)
- Deploy Intel CnC on cluck via UPS (1 day)
- Deploy an MPI 2.x implementation on cluck usable via setup (0.5 day)
- Clearly define the constraints put on this development experiment (2 days)
- Evaluate the work involved in limited harmonization of Run, SubRun and Event (0.5 day).
- Determine the requirements for parallel handling of run/subrun data
- Connect input system to MPI event builder (1 day).
- Excise the output system (0.25 day).
- Excise code not needed for basic test of parallel functionality.
- Audit remaining code for basic safety w.r.t. parallelism (services, statics, etc).
Longer-term strategic goals¶
- Modify the use of the input system to feed multiple Schedules (2 days)
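As a rough illustration of one input system feeding multiple Schedules, each running in its own thread, here is a minimal sketch using only C++11 standard library threading. It is not a design proposal: `Event`, `InputSource`, `runSchedule`, `runSchedules` and `exampleUsage` are hypothetical names, the "processing" is just a counter, and the real work (modules, services, run/subrun handling) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an event; real events carry data products.
struct Event {
  int id;
};

// Hypothetical shared input source: hands out each event exactly once.
class InputSource {
public:
  explicit InputSource(int nEvents) : next_(0), end_(nEvents) {}

  // Fill e with the next event; return false when the input is exhausted.
  bool next(Event& e)
  {
    std::lock_guard<std::mutex> lock(m_);
    if (next_ == end_) {
      return false;
    }
    e.id = next_++;
    return true;
  }

private:
  std::mutex m_;
  int next_;
  int end_;
};

// One "schedule": pulls events until the source is exhausted and counts them.
void runSchedule(InputSource& src, std::size_t& nProcessed)
{
  Event e{0};
  while (src.next(e)) {
    ++nProcessed;  // placeholder for running the schedule's modules on e
  }
}

// Run nSchedules schedules, each in its own thread, against one shared source.
// Returns the number of events each schedule processed.
std::vector<std::size_t> runSchedules(int nEvents, unsigned nSchedules)
{
  InputSource src(nEvents);
  std::vector<std::size_t> counts(nSchedules, 0);
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < nSchedules; ++i) {
    threads.emplace_back(runSchedule, std::ref(src), std::ref(counts[i]));
  }
  for (auto& t : threads) {
    t.join();
  }
  return counts;
}

// Small self-check: every event is processed exactly once across all schedules.
void exampleUsage()
{
  std::vector<std::size_t> const counts = runSchedules(100, 3);
  std::size_t total = 0;
  for (std::size_t c : counts) {
    total += c;
  }
  assert(counts.size() == 3);
  assert(total == 100);
}
```

How the events are split between schedules varies from run to run; only the total is fixed. The single mutex around the source is the simplest possible choice and is exactly the kind of serialization point that the audit for parallel safety (services, statics) would need to look for elsewhere.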
Data-driven trigger - NOvA (S,F)¶
- formalize requirements for the trigger program
- implement these requirements in the current framework (not multithreaded!)
- migrate to use of the multi-schedule art
- try to migrate Andrew Norman's trigger algorithm to a "new technology"
These are tasks that distract from our direct progress with art, but are related to the development of art.
Some involve other groups.
- clean-up of messagefacility package
- introduce unit tests
- clean up dependency problems
- consider request to add the ability to direct specific severity levels to specific destinations
- clarify issue of who will maintain this package
- clarify usage by NOvA online and NOvA offline
- UPS enhancement
- manage prerequisites according to our specification (no unsetup of product before setup)
- make it work on Mac OS X
- Build system enhancement
- Write a utility to generate a UPS table file to allow use of a non-UPS product (0.125 day).
- add more automation, so more people can create builds and releases
- make it work on Mac OS X
- #890: Review linking of shared libraries.
- #1027: simple_plugin to extend underscore checking to include path within project.
- Review of the Message Analyzer code
New SAM data handling tasks¶
These still need to be worked out.
- SAM protocol handling within the I/O subsystem (ROOT file sequence? As a service?)
- sqlite handling in ROOT
- sample use of sqlite in ROOT (make tables, insert and select data)
- command-line tool for using the sqlite data within the ROOT file (and optionally extracting it)
- exception callbacks available in services
- record all protocol interactions and arguments from SAM into sqlite metadata
- find and understand the HTTP transactions for SAM (available on their wiki)
- Data handling interface with SAM (F)
- discover interaction protocol for communication with SAM
- make sure to understand anomalous conditions and failure modes
- determine division of responsibilities between SAM and art and workflow
- determine how we will be able to test against SAM
- design how we make art interact with SAM's protocol
- discover interaction protocol with workflow system