Running Cosmic Filtering

Cosmic filtering is a two-stage process. First, CPU jobs are submitted to preprocess the raw2root'ed files and prepare the inputs for the cosmic CVN network. Second, GPU-based jobs run the network and filter out the hits from timeslices which lack neutrinos. To avoid creating a large duplicate of all our cosmic data on tape, the output of the first-stage jobs goes to the temp data tier, which means it is only stored on scratch. The follow-up GPU jobs must therefore be submitted relatively quickly to ensure that these files aren't lost in the meantime.

Preprocessing

These jobs are relatively lightweight (~3000 s/file and 1800 MB of RAM); the real limit is the prestaging of the input data. The rate at which these jobs can run is bounded by how quickly cosmic data can be prestaged from tape, but also by how quickly their output can be consumed: if the temp files sit around for too long before being used by the GPU jobs, they will be lost and will need to be recovered.
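As a minimal sketch, the input dataset can be checked and prestaged with standard samweb commands before (or while) the prep jobs are submitted; the definition name below just follows the pattern used in the configuration that follows:

# Check how many files the input definition contains
samweb count-definition-files prod_artdaq_<release>_fd_cosmic_<period>

# Pull the dataset back from tape so the jobs aren't starved for input
samweb prestage-dataset --defname=prod_artdaq_<release>_fd_cosmic_<period>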

Here is an example submission configuration, which optionally enables running in Singularity containers on the OSG.

#########################
# This job specifically #
#########################

-f production.inc

#Type the name of your job here
--jobname cosfiltprep_<period/epoch>
--defname prod_artdaq_<release>_fd_cosmic_<period>

--njobs <jobs>

####################################
# Let's try running in singularity #
####################################

-f offsite_singularity.inc
# Could also be everywhere.inc

#######################################
# Configure cosmic filtering prep job #
#######################################

-c prod_cosfilterprep_job.fcl
--tag R19-10-03-cosfilter.c

#This adds in stashcache; currently only a couple of sites support this, but it should be harmless to leave in
--export IFDH_COPY_XROOTD=1

# Requested memory in MB
--mem 1800
--disk 5000

# Time in seconds per file
--dynamic_lifetime 3000

# Copyout
--outTier=out1:temp
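Assuming the configuration above is saved to a file (the name here is only an example), it can be passed to submit_nova_art.py the same way the .inc files are included, via -f:

# Submit the preprocessing jobs, reading all options from the config file above
submit_nova_art.py -f cosfiltprep_<period>.cfg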

GPU Processing

The actual cosmic filtering uses specialized resources available from a few OSG sites (Syracuse/SU-ITS, Nebraska/Omaha, and UCSD) which have GPUs on their nodes; additional resources are being explored as well. These jobs must run in a Singularity container, and a number of special changes are needed in the submission configuration. These jobs can most likely always be run from the simple draining definition defined as:

prod_cosfilterprep_draining "data_tier temp minus isparentof:(data_tier cosfilt)" 

While jobs are out on the grid, you can continue the previous project. Once that project runs out of files, start again with a new submission against this draining definition. These datasets are likely to end up having a very large number of files, so it will be useful to keep the draining definition itself quite simple.
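If the draining definition ever needs to be (re)created, a sketch with samweb, using the dimensions quoted above, would be:

# Create the draining definition from the dimensions above
samweb create-definition prod_cosfilterprep_draining "data_tier temp minus isparentof:(data_tier cosfilt)"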

Here's an example configuration for running on GPUs:

#This is a template to help you configure your jobs on the grid.

#########################
# This job specifically #
#########################

-f production.inc

#Type the name of your job here
--jobname cosfiltergpu
--defname prod_cosfilterprep_draining

--njobs 200

######################################
# Configure cosmic filtering GPU job #
######################################

#Leave this as mcgen; required for submit_nova_art.py to handle the job correctly
-c prod_cosfilteronly_job.fcl
--tag R19-10-03-cosfilter.c

# Launch the python job
--earlyscript /cvmfs/nova.opensciencegrid.org/externals/novaproduction/$NOVAPRODUCTION_VERSION/NULL/bin/start_python_cvncosmicrej.sh

#This adds in stashcache; currently only a couple of sites support this, but it should be harmless to leave in
--export IFDH_COPY_XROOTD=1

# Requested memory in MB -- try 4000, see if we still get O(100) nodes
--mem 4000 
--disk 5000

# Time in seconds
--dynamic_lifetime 2500

# GPU only available offsite
--offsite_only
--singularity /cvmfs/singularity.opensciencegrid.org/novaexperiment/el7-tensorflow-gpu:latest
--gpu
--disable_cvmfs_version_matching
--export NOVASOFT_BYPASS_OS_CHECK=1

# Copy out
--outTier=out1:cosfilt
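Before each new GPU submission it is worth checking how much of the temp dataset is still waiting, since those files only live on scratch. A sketch of one round of the resubmission loop, assuming the configuration above is saved to a file (the file name is only an example):

# See how many temp files have not yet been filtered
samweb count-definition-files prod_cosfilterprep_draining

# Submit another round of GPU jobs against the draining definition
submit_nova_art.py -f cosfiltergpu.cfg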

Preparing Cosmic Datasets

Dealing with cosmic datasets is always a challenge since they are generally so large. There is the additional complexity that the release used for raw2root processing has changed over time. The releases, the periods and run ranges they cover, and the corresponding definitions are:

Release   | Periods | First Beam Run | Last Beam Run | Definition
S15-03-11 | 1-4     | 129421         | 23670         | prod_artdaq_fd_cosmic_S15-03-11_all
S16-11-02 | 5       | 24614          | 25412         | prod_artdaq_fd_cosmic_S16-11-02_all
S17-02-21 | 6       | 25413          | 26685         | prod_artdaq_fd_cosmic_S17-02-21_all
S17-10-30 | 7-now   | 28037          | -             | prod_artdaq_fd_cosmic_S17-10-30_all

There is also a common joint definition prod_artdaq_fd_cosmic_combo_all based on def_snapshot of the above separate release definitions:

prod_artdaq_fd_cosmic_combo_all:
"def_snapshot 'prod_artdaq_fd_cosmic_S15-03-11_all' or def_snapshot 'prod_artdaq_fd_cosmic_S16-11-02_all' or
def_snapshot 'prod_artdaq_fd_cosmic_S17-02-21_all' or def_snapshot 'prod_artdaq_fd_cosmic_S17-10-30_all'" 
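Since the combo definition is built from def_snapshot clauses, it only sees new files through the snapshots of the per-release definitions; if one of them grows, its snapshot presumably needs to be refreshed, e.g.:

# Refresh the snapshot of one of the per-release definitions
samweb take-snapshot prod_artdaq_fd_cosmic_S17-10-30_all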

This definition has been further subdivided into 50 subsets using stride and offset. Draining definitions have then been constructed from these based only on the release, rather than the data tier, so they are generic for making either temp files or going directly to cosfilt files:

prod_artdaq_fd_cosmic_combo_X_of_50:
"defname: prod_artdaq_fd_cosmic_combo_all with offset X stride 50" 

prod_artdaq_fd_cosmic_combo_cosfiltdrain_X_of_50:
"defname: prod_artdaq_fd_cosmic_combo_X_of_50 minus isparentof:(nova.release R19-10-03-cosfilter.c)" 

The above definitions snapshot in a reasonable amount of time and each contain ~18k files as of the time of writing.
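A sketch of how these subsets and their draining definitions could be (re)built with samweb, directly from the dimension strings above (whether the offset should run 0-49 or 1-50 is an assumption to check against the existing definitions):

# Create the 50 stride/offset subsets and their draining definitions
for X in $(seq 1 50); do
  samweb create-definition prod_artdaq_fd_cosmic_combo_${X}_of_50 \
    "defname: prod_artdaq_fd_cosmic_combo_all with offset ${X} stride 50"
  samweb create-definition prod_artdaq_fd_cosmic_combo_cosfiltdrain_${X}_of_50 \
    "defname: prod_artdaq_fd_cosmic_combo_${X}_of_50 minus isparentof:(nova.release R19-10-03-cosfilter.c)"
done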