AnalysisToolsDiscussion2010

Introduction

I have been asked to organize a discussion about data analysis tools.
In particular, the goal is to explore the features of R that we find
useful and where we are going with this tool, and to explore the
features of the new version of ROOT. This is an information gathering
meeting to help determine the direction we will head as we move into
the future. Other tools are also in use, such as Maxima,
Mathematica, Matlab, and IDL. We can discuss whether these tools
are worth covering in a future meeting.

The goal of this discussion will be to formulate a list of necessary
and useful features, and list how ROOT and R satisfy those needs.
It is also meant to introduce people to other systems that might be of
use for analysis tasks, such as R.

Scope

We will be focused on the needs of the scientists and others in our department.
We are not looking to make a decision the way CDF and D0 did,
choosing a single tool that everyone will march forward with.

Due to the time constraints, we will first focus on ROOT and R.

We want input from the following people: Jim Simone, Adam Lyon, and Erik Gottschalk from elsewhere;
Steve Mrenna, Daniel Elvira, Rob Kutschke, and others from the ADS department.

Organization

Some questions to think about when preparing the feature list and for our discussion are:

  1. where are the tools that you currently use inadequate? (graphics or analysis)
  2. what is too slow? (performance: will the tool carry out the task in a timely fashion and utilize the available computing power?)
  3. what is too difficult to do? (programming is too awkward, or data is not accessible in a convenient way)

For presenting material, I want to know the following:

  1. what is the community of users?
  2. how are releases and add-on packages distributed and managed?
  3. what sorts of problems does the community of users address with the tool, and what is the tool best at?
  4. how does the tool satisfy our needs at Fermilab?
  5. how is the tool being used at Fermilab?
  6. what is the data analysis language and what makes it good or interesting for us?
  7. how does the tool work in the environment we are in? (GRID, licensing, file formats)

Target date: August.
Time needed: 2 hours.

Format:

  1. Presentation of high-level needs for analysis. (must have a hard time limit and focus on the first three questions above)
  2. Brief talks showing R and future-ROOT. (using the second set of questions as a guide)
  3. Open Discussion

Because of the number of things that can be discussed on this topic, this might be the first of
a couple of sessions.

Marc's input and discussion at CSR meeting

questions to data analysts

  1. data formats - what form is your data in? (need examples here, including informal ones)
  2. what tools do you use to manipulate data? (C++, mathematica, ...)
  3. how easy is it to exchange data or obtain and share data from various experiments or simulations?
  4. how easy is it to exchange data or obtain and share data from a multitude of tools?
  5. what facilities are there for creating and maintaining larger analysis codes?
  6. is the workspace an interesting aspect of analysis? (mathcad, mathematica)
  7. is integration with documentation tools interesting? (literate programming aspects)
  8. are "reproducible research" aspects important?

There are two aspects: (1) data manipulation and (2) graphics for plotting and other visualization, including
interaction.