Room for Improvement

This is a collection of issues that have arisen in the CMS framework for which there may be a simpler, more maintainable, easier to use, or otherwise superior solution.

Navigation infrastructure is too complicated

Questions come up frequently on the CMS hypernews lists about how to deal with the navigation class templates (edm::Ptr, edm::Ref, etc.). Related sources of confusion are the various user-invented container class templates, and the edm::View template and its relatives. These seem to be too complicated for many people to bother understanding.

Perceived need for separate ntuple format

The CMS event model was designed with the intent to avoid user-defined ntuples. The hope was that users could use the EDM format directly as their ntuple format. Users commonly resist doing so. The reason most often given is the file size overhead caused by the metadata stored in the file. Some users would be satisfied only with zero overhead.

Redundant metadata

The stored form of the data contains some redundancies. In other words, the data is not in the equivalent of third normal form.
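As an illustration of what such a redundancy looks like, here is a minimal sketch in C++. All of the type and field names (FlatProductEntry, ProcessInfo, ProductEntry, Metadata, normalize) are hypothetical and are not taken from the CMS metadata classes. The sketch assumes that the release version depends only on the process name, so a flat record that stores both alongside every product has a transitive dependency, which is what third normal form rules out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical denormalized record: the process name and its release
// version are repeated for every product made by that process.
struct FlatProductEntry {
  std::string productName;
  std::string processName;
  std::string releaseVersion;  // assumed to depend only on processName
};

// Normalized form: facts about a process are stored exactly once.
struct ProcessInfo {
  std::string processName;
  std::string releaseVersion;
};

// Each product refers to its process by index instead of repeating it.
struct ProductEntry {
  std::string productName;
  std::size_t processIndex;  // index into Metadata::processes
};

struct Metadata {
  std::vector<ProcessInfo> processes;
  std::vector<ProductEntry> products;
};

// Factor the repeated process information out of the flat records.
Metadata normalize(const std::vector<FlatProductEntry>& flat) {
  Metadata out;
  for (const FlatProductEntry& entry : flat) {
    // Linear search is enough for a sketch; the process table is small.
    std::size_t index = 0;
    while (index < out.processes.size() &&
           out.processes[index].processName != entry.processName) {
      ++index;
    }
    if (index == out.processes.size()) {
      out.processes.push_back({entry.processName, entry.releaseVersion});
    } else {
      // The assumed dependency: one process name, one release version.
      assert(out.processes[index].releaseVersion == entry.releaseVersion);
    }
    out.products.push_back({entry.productName, index});
  }
  return out;
}
```

In the flat form, an inconsistent release version for the same process can be written without anything noticing. In the normalized form that state cannot be represented at all, which is the practical benefit of removing the redundancy.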

Inflexibility of metadata storage

Backward compatibility is difficult to achieve (as can be seen from the frequency with which backward compatibility bugs appear) because the metadata class structure is complex. ROOT schema evolution limits what sorts of changes can be made to the metadata storage.

Reasoning about metadata storage is difficult

The metadata storage system is based on a set of classes influenced by a fairly large group of developers over several years, and the design is no longer very clear. Because it is a special-purpose system, it does not rely on any well-established methodology. This stands in contrast to the relational model, and more specifically to relational calculus, which offers an established means for reasoning about the organization of data.

Too many string operations

The use of string labels to identify data in the CMS EDM has led to complaints that the code that looks up data products is too slow. Profiling of the CMS application shows that a significant fraction of the program time is spent in string comparisons. A significant effort is still underway to reduce the number of string comparisons being done. Additional effort has gone into replacing some uses of std::string with const char*, in places where profiling shows that the creation and destruction of std::string instances is common.
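One common way to reduce string comparisons in lookup code is to intern each label once and compare small integer IDs afterwards. The following is a minimal sketch of that general technique, not a description of what the CMS framework does; LabelRegistry and ProductKey are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of the CMS framework: maps each distinct
// label string to a small integer ID.
class LabelRegistry {
public:
  using Id = std::size_t;

  // Returns the ID for `label`, assigning a new one the first time the
  // label is seen. String hashing and comparison happen only here.
  Id intern(const std::string& label) {
    auto found = ids_.find(label);
    if (found != ids_.end()) {
      return found->second;
    }
    Id id = labels_.size();
    labels_.push_back(label);
    ids_.emplace(label, id);
    return id;
  }

  // Recovers the original text, e.g. for error messages. The returned
  // reference is only valid until the next call to intern().
  const std::string& label(Id id) const {
    assert(id < labels_.size());
    return labels_[id];
  }

  std::size_t size() const { return labels_.size(); }

private:
  std::unordered_map<std::string, Id> ids_;
  std::vector<std::string> labels_;
};

// Hypothetical lookup key: comparing two keys costs two integer
// comparisons and never touches the label text.
struct ProductKey {
  LabelRegistry::Id module;
  LabelRegistry::Id instance;

  bool operator==(const ProductKey& other) const {
    return module == other.module && instance == other.instance;
  }
};
```

With this arrangement a module would call intern() once, for example at configuration time, and use the resulting ProductKey in the per-event lookup path. Whether this fits the existing lookup interfaces is a separate question that the sketch does not address.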

Problems with fast copy mode

CMS's fast copy mode (in which branches in an input ROOT file which are not being read and reconstituted into C++ objects are carried unmodified into the output file) is a frequent source of errors. This seems to be because of a combination of the complexity of the code that handles it, the imperfect separation of the handling from the other reading of data (and especially interaction with the delayed-reading code), and frequently-encountered bugs in ROOT itself.