Making Concats

Concats are essentially combined CAF files that contain only events passing certain selections ("decafing") and/or have some unnecessary branches turned off ("reduction"). They are used extensively in the oscillation analyses to speed up processing: our CAFs are now quite large, and even relatively straightforward selections thin them down significantly. Making them is relatively straightforward and just involves some wrapper job submission scripts around a CAFAna macro that does the CAF reduction/decafing.
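As a toy illustration of the two operations, the sketch below treats events as plain Python dictionaries. It is not CAFAna code: the branch names, values and the cut are invented for the example, and real concats are produced by a CAFAna macro running over CAF files.

```python
# Toy model only: events are dicts, and branch names and values are invented.

def decaf(events, cut):
    """Keep only the events passing the selection ("decafing")."""
    return [ev for ev in events if cut(ev)]

def reduce_branches(events, keep):
    """Drop every branch not listed in `keep` ("reduction")."""
    return [{k: v for k, v in ev.items() if k in keep} for ev in events]

events = [
    {"pid": 0.9, "energy": 2.1, "nhit": 120},
    {"pid": 0.2, "energy": 1.4, "nhit": 45},
    {"pid": 0.7, "energy": 3.0, "nhit": 80},
]

def loose_pid_cut(ev):
    # Hypothetical loose PID cut, standing in for a real decaf cut.
    return ev["pid"] > 0.5

concat = reduce_branches(decaf(events, loose_pid_cut), keep={"pid", "energy"})
print(len(concat), sorted(concat[0]))
```

Two of the three toy events survive the cut, and the surviving events no longer carry the nhit branch.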

Making reduce macros

One can write a custom reduce macro to make concats for personal use. Examples are provided in the 3flavor context, where the reduce macros are often made once analysis selections are finalized. For the 2020 Analysis, the macros are

 CAFAna/3flavor/Ana2020/reducer/reduce_prod5_nue.C
 CAFAna/3flavor/Ana2020/reducer/reduce_prod5_numu.C

The main cuts used for decafing are:

kNue2020NDDecafCut, kNue2020FDDecafCut stored in CAFAna/Cuts/NueCuts2020.h
kNumu2020NDDecafCut, kNumu2020FDDecafCut stored in CAFAna/Cuts/NumuCuts2020.h

These cuts follow the respective analyses' quality and containment cuts at ND. At FD, the nue appearance analysis just applies a basic quality cut and a loose PID cut of 0.5. This is because the analysis utilises events that fail the core containment criteria ("peripheral events"). For the numu disappearance analysis at FD, there's also an additional loose cosmic rejection cut to manage the size of the output concats.

An additional complication is the decomposition procedure at ND, which constrains the backgrounds in the nue appearance analysis at FD. One of the techniques to constrain the beam nue backgrounds, called BEN, utilises contained and uncontained numu events to set constraints on the pi/K ratio of the flux. To be able to do the decomposition on the fly from the nue ND concats, one must store these events as well. This can blow up the size quite a bit, since it amounts to almost a few million extra events. To select these events, two other cuts are used in the nue ND decafing:

kNumuContainNDDecafCut
kNumuUncontainNDDecafCut

both stored in CAFAna/Cuts/BeamNueCuts.h

These cuts follow the current BEN selections but also allow for the possibility of applying more modern numu selections to do the BEN decomposition. For these BEN events, however, not all the CAF information is required. Only a few numu energy variables are needed to do the decomposition, so the large majority of the branches can be turned off. This keeps the nue ND concat sizes reasonable.

Important to keep in mind, however:
Turning off branches that are used in the nominal nue ND selections can let events into the concats that would fail those selections in the CAFs! For example, an analysis selection that requires hitsperplane < 8 would accept an event whose hitsperplane branch has been turned off and reset to -5, even though the event should fail. Care must therefore be taken not to turn off branches that are used in the analysis cuts. All this is done using a custom reduce function:

ReduceForBEN2020Decaf stored in CAFAna/Decomp/BENDecomp.cxx
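The pitfall with turned-off branches can be reproduced in a few lines. The sketch below is a toy model, not the logic of ReduceForBEN2020Decaf: events are dictionaries, the reset value of -5 and the hitsperplane < 8 cut are taken from the example in the text, and the helper names are hypothetical.

```python
RESET_VALUE = -5  # value a turned-off branch is reset to, per the example above

def passes_selection(event):
    # Example analysis cut from the text: hitsperplane < 8
    return event["hitsperplane"] < 8

def turn_off(event, branches):
    """Return a copy of the event with the given branches reset."""
    out = dict(event)
    for name in branches:
        out[name] = RESET_VALUE
    return out

def check_safe_to_turn_off(off_branches, cut_branches):
    """Hypothetical guard: refuse to turn off a branch that a cut reads."""
    clash = sorted(set(off_branches) & set(cut_branches))
    if clash:
        raise ValueError("branches used by analysis cuts: " + ", ".join(clash))

caf_event = {"hitsperplane": 12.0, "numu_energy": 1.8}
concat_event = turn_off(caf_event, ["hitsperplane"])

print(passes_selection(caf_event))     # False: 12 is not < 8
print(passes_selection(concat_event))  # True: -5 < 8, a spurious pass
```

A guard like check_safe_to_turn_off, run against the list of branches the analysis cuts read, is one way to catch this before any jobs are submitted.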

Submitting concat jobs

The main job submission script that handles concatting on the grid is `submit_concat_project.py`, stored in `NovaGridUtils/bin`. One can either pass a custom macro as an argument to the script or use one of the analysis-specific keywords that it already knows about. For the 2020 3flavor analysis, the keywords are "numu2020" and "nue2020". For newer analyses, one can just add an entry to the dictionary below and commit the change to NovaGridUtils (NGU).

DECAF_MACROS={
    "validation"         : cafana_dir + "/CAFAna/nus/reduce_nue_or_numu_or_nus.C",
    "concat"             : "$NOVAGRIDUTILS_DIR/bin/concat_dataset.C",
    "nueSA"              : cafana_dir + "/CAFAna/nue/reduce_nue_sa.C",
    "nue2017"            : cafana_dir + "/CAFAna/nue/reducer/reduce_bendecomp2017.C",
    "nue2018"            : cafana_dir + "/CAFAna/nue/reducer/reduce_nue_2018.C",
    "nue2019"            : cafana_dir + "/CAFAna/nue/reducer/reduce_nue_2018.C",
    "numuSA"             : cafana_dir + "/CAFAna/numu/FirstAnalysis/reduce_numu_fa.C",
    "numu2017"           : cafana_dir + "/CAFAna/numu/Analysis2017/reduce_numu_ana2017.C",
    "numu2018"           : cafana_dir + "/CAFAna/numu/Analysis2018/reduce_numu_ana2018.C",
    "numu2019"           : cafana_dir + "/CAFAna/numu/Analysis2018/reduce_numu_ana2018.C",
    "nus"                : cafana_dir + "/CAFAna/nus/reduce_nus.C",
    "nus2019"            : cafana_dir + "/CAFAna/nus/reduce_nus_ana2019.C",
    "numu2020"           : cafana_dir + "/CAFAna/3flavor/Ana2020/reducer/reduce_prod5_numu.C",
    "nue2020"            : cafana_dir + "/CAFAna/3flavor/Ana2020/reducer/reduce_prod5_nue.C",
    "nus2020"            : cafana_dir + "/CAFAna/nus/reduce_nus_ana2020.C",
    "nue_or_numu_SA"     : cafana_dir + "/CAFAna/nue/reduce_nue_or_numu_sa.C",
    "nu_or_numu_or_nus"  : cafana_dir + "/CAFAna/nus/reduce_nue_or_numu_or_nus.C",
}
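A minimal sketch of how such a lookup could work is shown below. It is an assumption about the behaviour, not a copy of `submit_concat_project.py`: the function resolve_macro, the placeholder value of cafana_dir and the "nue20XX" entry are all invented for illustration.

```python
# Placeholder value; the real script defines cafana_dir itself.
cafana_dir = "/path/to/release"

# Two entries copied from the dictionary above.
DECAF_MACROS = {
    "numu2020": cafana_dir + "/CAFAna/3flavor/Ana2020/reducer/reduce_prod5_numu.C",
    "nue2020": cafana_dir + "/CAFAna/3flavor/Ana2020/reducer/reduce_prod5_nue.C",
}

# Hypothetical entry for a newer analysis; both key and path are invented.
DECAF_MACROS["nue20XX"] = cafana_dir + "/CAFAna/3flavor/Ana20XX/reducer/reduce_nue.C"

def resolve_macro(arg, known=DECAF_MACROS):
    """Map a known keyword to its macro, or accept a custom .C macro path."""
    if arg in known:
        return known[arg]
    if arg.endswith(".C"):
        return arg  # treated as a user-supplied macro
    raise ValueError("unknown decaf keyword or macro: %r" % arg)

print(resolve_macro("nue2020"))
print(resolve_macro("my_reduce.C"))
```

Failing loudly on an unrecognised keyword, rather than falling back to a default macro, avoids producing concats with the wrong selection.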

Concats have to be made for all the ND and FD CAF definitions: the nonswap/fluxswap/tauswap versions of the nominal and special systematic samples, as well as the ND and FD cosmic and beam data. Additional scripts are therefore provided to handle this many concat job submissions at once. They are stored in `NovaGridUtils/bin/extra_concat_scripts`. To use them, first make a txt file listing the CAF definitions to make concats over, one per line, each followed by a comma and the number of output concat files needed:

prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_ckv-proton-shift-down_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_ckv-proton-shift-down_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_lightmodel-lightup-calibdown_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_lightmodel-lightup-calibdown_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_lightmodel-lightdown-calibup_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_lightmodel-lightdown-calibup_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_calib-shift-fd-func_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_calib-shift-fd-func_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_calib-shift-fd-xyview-pos-offset_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_calib-shift-fd-xyview-pos-offset_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_calib-shift-fd-xyview-neg-offset_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_calib-shift-fd-xyview-neg-offset_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_tau_fhc_nova_v08_full_calib-shift-fd-xyview-pos-offset_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_tau_fhc_nova_v08_full_calib-shift-fd-xyview-neg-offset_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_tau_fhc_nova_v08_full_calib-shift-fd-func_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_tau_fhc_nova_v08_full_lightmodel-lightup-calibdown_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_tau_fhc_nova_v08_full_lightmodel-lightdown-calibup_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_tau_fhc_nova_v08_full_ckv-proton-shift-down_v1,100

To make 100 concats for each of the above CAF definitions, run:

$NOVAGRIDUTILS_DIR/bin/extra_concat_scripts/submit_multiple_concats.sh $OUTPUTDIR $RELEASE $DECAF $CAFSETS
This submits concat projects for many different CAF definitions in one go. The arguments are:

OUTPUTDIR: the output pnfs scratch directory of the concat project
RELEASE: the novasoft release in which the concats will be processed
DECAF: the analysis decaf variable passed to submit_concat_project.py
CAFSETS: a text file listing, one per line, each CAF definition and the number of concat files to produce, separated by a comma (see NovaGridUtils/bin/extra_concat_scripts/datasets.txt for an example)

For example, in the 3flavor case one can just do:

$NOVAGRIDUTILS_DIR/bin/extra_concat_scripts/submit_multiple_concats.sh "/pnfs/nova/scratch/users/${USER}/" "development" "nue2020" nue_fd_datasets.txt
$NOVAGRIDUTILS_DIR/bin/extra_concat_scripts/submit_multiple_concats.sh "/pnfs/nova/scratch/users/${USER}/" "development" "numu2020" numu_fd_datasets.txt

and so on for ND as well. The script runs `submit_concat_project.py` once for each CAF definition in the txt file. It's a relatively simple script that doesn't try to handle too many cases but is useful for analysis-specific concat-making.
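Since one submission is made per line of the txt file, a malformed line is worth catching before anything goes to the grid. The sketch below parses the "definition,count" format described above; parse_cafsets is a hypothetical helper, not part of NovaGridUtils, and it does not submit anything.

```python
def parse_cafsets(text):
    """Parse '<definition>,<nfiles>' lines into (definition, nfiles) pairs."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last comma, so only the trailing field is the count.
        definition, sep, count = line.rpartition(",")
        definition, count = definition.strip(), count.strip()
        if not sep or not definition or not count.isdigit():
            raise ValueError(
                "line %d: expected '<definition>,<nfiles>', got %r" % (lineno, raw)
            )
        pairs.append((definition, int(count)))
    return pairs

example = """\
prod_caf_R17-03-01-prod3reco.j_fd_genie_nonswap_fhc_nova_v08_full_calib-shift-fd-func_v1,100
prod_caf_R17-03-01-prod3reco.j_fd_genie_fluxswap_fhc_nova_v08_full_calib-shift-fd-func_v1,100
"""

for definition, nfiles in parse_cafsets(example):
    print(nfiles, definition)
```

Running a check like this over nue_fd_datasets.txt and its siblings first means a typo shows up locally instead of as a failed submission partway through the list.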

For the 2020 Analysis, I'd suggest 100 jobs for each CAF definition, whether nue or numu, ND or FD, MC or beam/cosmic data. The FD concats could be split into fewer files, but 100 is about the right number for ND, and keeping the same number of files for FD and ND is useful when submitting CAFAna jobs for the analysis.

FTS and Making SAM definitions

For custom concats, one can simply make SAM definitions out of the output concats using SAM4Users (https://cdcvs.fnal.gov/redmine/projects/nova_sam/wiki/User_Datasets), and they're ready to go.

For analysis-specific concats that will be used by others, one needs to use the File Transfer Service (FTS) to copy them over to persistent dCache and declare them to the samweb database. This requires production-level permissions and needs to be done from the novapro account. One can either ask the production conveners to do this step or, if you expect to be doing this often, request the permissions yourself, which is easy. To copy and declare the concat files, another helper script is provided in `NovaGridUtils/bin/extra_concat_scripts/cp_dropbox.sh`.
If you have production permissions, simply run:

setup_fnal_security -p --force
$NOVAGRIDUTILS_DIR/bin/extra_concat_scripts/cp_dropbox.sh $CONCATDIR "nue2020" nue_fd_datasets.txt

where $CONCATDIR is the top-level directory containing all the outputs of the concat jobs from before.

Once the concat files are in persistent dCache and declared to SAM, the only remaining step is to make SAM definitions out of them. To do that, simply run:

$NOVAGRIDUTILS_DIR/bin/extra_concat_scripts/make_definitions.sh "nue2020" nue_fd_datasets.txt

This script finds the files declared to SAM in the above step by matching some key metadata parameters. It checks that all the expected files actually exist in SAM and aborts if that is not the case, so it should be pretty safe to use. It also takes snapshots of the definitions once they're made and saves the concat definition names in a text file in the current directory.
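The kind of completeness check involved can be sketched as follows. This is not the logic of `make_definitions.sh` and it does not query SAM: check_complete is a hypothetical helper, and the definition and file names are invented. It only compares the number of files expected per CAF definition with the number of files found, and aborts on any mismatch.

```python
def check_complete(expected, found):
    """Raise if any definition has a different number of files than expected.

    expected: dict mapping CAF definition -> number of concat files requested
    found:    dict mapping CAF definition -> list of file names located
    """
    problems = []
    for definition, nfiles in expected.items():
        have = len(found.get(definition, []))
        if have != nfiles:
            problems.append("%s: expected %d files, found %d" % (definition, nfiles, have))
    if problems:
        raise RuntimeError("incomplete concats:\n" + "\n".join(problems))

# Invented names, for illustration only.
expected = {"caf_def_a": 3, "caf_def_b": 3}
found = {
    "caf_def_a": ["a_1.root", "a_2.root", "a_3.root"],
    "caf_def_b": ["b_1.root", "b_2.root"],  # one file missing
}

try:
    check_complete(expected, found)
except RuntimeError as err:
    print(err)
```

Collecting every mismatch before raising gives the full list of incomplete definitions in one pass.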

The End

It'd be helpful for different analyzers to be able to look up the concat definitions in one place, so once the definitions are made, please post them on a wiki page and notify the relevant slack channels. Happy concatting!