Making HighStat AnalysisTrees

Creating analysis trees from a samweb definition with a specific number of events

At a certain point you'll probably want to look at some analysis trees with decent statistics.
To do that, you first have to figure out which sample you want to use.

You can find MCC8 Physics data here: https://www-microboone.fnal.gov/at_work/AnalysisTools/data/ub_datasets_mcc8_val.html
And MCC8 MC here: https://www-microboone.fnal.gov/at_work/AnalysisTools/mc/mcc8.0/details.html

Click on the 'Describe' button for the type of sample and stage you want, and copy the definition.
E.g.: if we want analysis trees for, let's say, bnb numu events, we can copy: prodgenie_bnb_nu_uboone_mcc8_ana

To get a merged analysis tree, we'll need to create a smaller subset of the sample, get the full paths to those files, and prestage the files to disk (unless you want to wait forever). Once the files are prestaged, you can finally merge them. Let's go step by step:

samweb list-definition-files prodgenie_bnb_nu_uboone_mcc8_ana

will show you all the names of the files in the definition.

By copying the name of one of the files, we can also get an idea of the number of events per file.

samweb get-metadata ana_hist_e8aa9915-63ce-42a2-964b-a2e5a6313e27.root

which tells us there are 50 events in this particular file.

We're also interested in how many files and events there are in the definition, so we can add the --summary flag:

samweb list-definition-files --summary prodgenie_bnb_nu_uboone_mcc8_ana

which tells us that there are 20k files with 1 million events, for a total size of 111GB.

Let's say we want 50k events in total. With 50 events per file, that means we need 50000 / 50 = 1000 files.
We can now select a sub-sample of the definition and create a new definition representing it with the following command:

samweb create-definition sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k defname:prodgenie_bnb_nu_uboone_mcc8_ana with limit 1000

You will have to replace sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k with the name you want to give to your own definition.
If you get a permission error at this stage, try running the following command to update your certificate:

kx509
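
The number of files passed to "with limit" follows from dividing the target number of events by the events per file. A minimal sketch of that arithmetic, assuming every file in the definition holds the same number of events (the helper name files_needed is hypothetical, not part of samweb):

```shell
# Hypothetical helper: number of files needed to reach a target event count.
# $1 = target number of events, $2 = events per file (assumed constant).
files_needed() {
    # Ceiling division, so a partially used file still counts as one file.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 50k events at 50 events per file, as in the example above.
MYNFILES=$(files_needed 50000 50)
echo "${MYNFILES}"
```

For the numbers in this walkthrough it prints 1000. If the files do not all hold the same number of events, treat the result as an estimate and check the new definition with --summary.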

Let's make sure the definition contains the number of events we're interested in:

samweb list-definition-files --summary sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k

which confirms we have 50k events in our definition (and a more manageable total size of 5GB).

Now we can merge those files, but first we have to prestage them.
That means copying the files of a sam dataset or dataset definition from tape to dCache.
If you try to access them interactively while they are still on tape, it will probably take you forever (not that prestaging is incredibly fast, but it's fast_er_).
Prestaging can take a while, so it wouldn't hurt to run it inside screen or with nohup.

samweb prestage-dataset --defname=sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k

Great! Now we just have to pass them to hadd (the ROOT utility to merge trees).
We can do that by providing a list containing the full paths of all the files we want to merge.
Getting the full path of a samweb file is not exactly a smooth process: one command gives you the file names and another gives you the directory, but neither gives you both.
There are probably better ways of doing this, but unless you know one, you can use the command below.
First store your definition name in a shell variable:

MYDEFNAME=sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k

and remember to change sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k to your definition name.

Then you can run this:

( samweb list-definition-files ${MYDEFNAME} | while read FILENAME; do PREFIX='enstore:'; SUFFIX='(.*)'; DIR_TO_FILE=$(samweb locate-file ${FILENAME} | sed -e "s@${PREFIX}@@" -e "s@${SUFFIX}@@"); echo ${DIR_TO_FILE}/${FILENAME}; done ) 2>&1 | tee list.list

which will save all the paths to a file called list.list.
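
To see what the two sed expressions in that command do, here is a minimal sketch applied to a single location string. The location and file name below are made up for illustration, and the helper name strip_location is hypothetical; the only assumption about the format is the one the command above already makes, namely an "enstore:" prefix and a trailing part in parentheses.

```shell
# Strip the "enstore:" prefix and the trailing "(...)" part from one
# location string, leaving only the directory.
strip_location() {
    printf '%s\n' "$1" | sed -e 's@^enstore:@@' -e 's@(.*)$@@'
}

# Made-up example values, for illustration only.
EXAMPLE_LOCATION='enstore:/pnfs/uboone/example/dir(123@abc456)'
EXAMPLE_FILENAME='ana_hist_example.root'

# Directory plus file name gives the full path that goes into list.list.
DIR_TO_FILE=$(strip_location "${EXAMPLE_LOCATION}")
echo "${DIR_TO_FILE}/${EXAMPLE_FILENAME}"
```

With these example values it prints /pnfs/uboone/example/dir/ana_hist_example.root. The patterns are anchored to the start and end of the string here, which is slightly stricter than the one-liner above.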

Almost there, time to merge the files. Run:

hadd myMergedAna.root @list.list

myMergedAna.root can be whatever name you want for your output file.
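
Since the command that builds list.list also redirects error messages into the list (2>&1), it can be worth checking the list before merging. A minimal sketch of such a check, assuming one path per line; the function name check_file_list is hypothetical:

```shell
# Hypothetical helper: report every entry of a list file that is not a
# readable file. Returns non-zero if the list is empty or any entry is missing.
check_file_list() {
    list_file=$1
    total=0
    missing=0
    while IFS= read -r entry; do
        # Skip blank lines.
        [ -n "${entry}" ] || continue
        total=$((total + 1))
        if [ ! -r "${entry}" ]; then
            echo "missing: ${entry}" >&2
            missing=$((missing + 1))
        fi
    done < "${list_file}"
    echo "${total} paths checked, ${missing} missing"
    [ "${total}" -gt 0 ] && [ "${missing}" -eq 0 ]
}
```

One possible way to use it is: check_file_list list.list && hadd myMergedAna.root @list.list, so that the merge only starts when every entry in the list is a readable file.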


In short, once you know what you are doing, these are all the commands you need:

MYORIGINALDEF=prodgenie_bnb_nu_uboone_mcc8_ana
MYDEFNAME=sdporzio_prodgenie_bnb_nu_uboone_mcc8_ana_50k
MYNFILES=1000

samweb create-definition $MYDEFNAME defname:$MYORIGINALDEF with limit $MYNFILES
samweb prestage-dataset --defname=$MYDEFNAME
( samweb list-definition-files ${MYDEFNAME} | while read FILENAME; do PREFIX='enstore:'; SUFFIX='(.*)'; DIR_TO_FILE=$(samweb locate-file ${FILENAME} | sed -e "s@${PREFIX}@@" -e "s@${SUFFIX}@@"); echo ${DIR_TO_FILE}/${FILENAME}; done ) 2>&1 | tee list.list
hadd myMergedAna.root @list.list


Script

There's a script which does most of what has been outlined above but with a little less effort. It's located in

/uboone/app/users/alister1/ub_tools/getFileList/getFileList.sh

Usage:

This script takes an input sam definition, prestages a selected number of files and returns a list of files to the user.
Usage is ./getFileList.sh <options> with:
-i: input SAM definition 
-n: number of files 
-o: output file list

Then follow the instructions above to hadd the analysis trees together, if that's what you want.