- Table of contents
- Using Docker with NOvA Software
- Step 1. Install OSX Fuse (https://osxfuse.github.io) if you are using macOS. Skip this step for Linux.
- Step 2. Install CVMFS
- Step 3. Install Docker
- Step 4. Run the docker container
- Step 5. Set up the NOvA software environment, e.g.
- Step 6. Get a valid voms proxy
- Step 7. Make addpkg_svn work properly
- Step 8. Create a test release and build a package
- Step 9. Run the NOvA EventDisplay
- Feldman-Cousins Corrections in Docker
- Feldman-Cousins Corrections in Docker without cvmfs
- Building docker images for NERSC
- Run the container interactively at NERSC
- hadd ROOT files in parallel at NERSC
- Feldman-Cousins with DIY
Using Docker with NOvA Software¶
Contact Pengfei or join the nova-docker Slack channel with any questions.
Step 1. Install OSX Fuse (https://osxfuse.github.io) if you are using macOS. Skip this step for Linux.¶
Step 2. Install CVMFS¶
Follow the instructions at https://cernvm.cern.ch/portal/filesystem/quickstart to install CVMFS on macOS or Linux (Windows is not supported yet). After installation, do the following to configure CVMFS properly.
sudo wget http://home.fnal.gov/~dingpf/cvmfs.tar.gz
sudo rm -rf /etc/cvmfs/*
sudo tar zxvf cvmfs.tar.gz; sudo mv cvmfs/* /etc/cvmfs/
sudo mkdir /cvmfs/nova.opensciencegrid.org
sudo mkdir /cvmfs/fermilab.opensciencegrid.org
sudo cvmfs_config reload
sudo mount -t cvmfs nova.opensciencegrid.org /cvmfs/nova.opensciencegrid.org
sudo mount -t cvmfs fermilab.opensciencegrid.org /cvmfs/fermilab.opensciencegrid.org
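You can check that the client is working with the standard CVMFS probe command:
cvmfs_config probe   # should report OK for each configured repository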
Alternatively, here is a Wiki page with more detailed instructions.
Step 3. Install Docker¶
- For macOS, refer to https://store.docker.com/editions/community/docker-ce-desktop-mac
- After the installation finishes, add "/cvmfs" to the "File Sharing" list in Docker Preferences, then click "Apply & Restart" to restart Docker.
- For Linux, e.g. Ubuntu, refer to https://docs.docker.com/install/linux/docker-ce/ubuntu/ for installation instructions; you can choose a different Linux distribution from the menu on the left side of that page. A typical Ubuntu sequence is sketched below.
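For reference, the Ubuntu installation usually reduces to something like the following sketch; the repository-setup details change over time, so defer to the linked page if they differ:
# install prerequisites, then Docker CE (sketch; see the official page for
# the current GPG-key and repository-setup commands)
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl
# ... add Docker's GPG key and apt repository as described on the page ...
sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world   # quick smoke test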
Step 4. Run the docker container¶
docker run --rm -it -v /cvmfs:/cvmfs:cached -v $HOME:/scratch dingpf/slf6.7
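Once the container is up, a quick sanity check that the mounts came through (illustration only):
# inside the container: confirm the CVMFS repositories and scratch area are visible
ls /cvmfs/nova.opensciencegrid.org      # should list the NOvA CVMFS repository
ls /scratch                             # should show your host home directory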
Step 5. Set up the NOvA software environment, e.g.¶
source /cvmfs/nova.opensciencegrid.org/novasoft/slf6/novasoft/setup/setup_nova.sh \
    -e /cvmfs/nova.opensciencegrid.org/externals \
    -5 /cvmfs/nova.opensciencegrid.org/novasoft/slf5/novasoft \
    -6 /cvmfs/nova.opensciencegrid.org/novasoft/slf6/novasoft \
    -r S18-02-25 -b maxopt
Step 6. Get a valid voms proxy¶
kinit YOUR_USER_NAME@FNAL.GOV  # replace YOUR_USER_NAME with your Fermilab user name
# kx509 is installed in the image and is the recommended way of getting the
# certificate instead of using cigetcert directly.
# Use cigetcert -i "Fermi National Accelerator Laboratory" in case kx509 fails.
kx509
voms-proxy-init --rfc --voms=fermilab:/fermilab/nova/Role=Analysis --noregen
Step 7. Make addpkg_svn work properly¶
mkdir ~/.ssh
# create ~/.ssh/config as follows (replace YOUR_USER_NAME with your Fermilab user name):
host cdcvs.fnal.gov
    User YOUR_USER_NAME
    ForwardX11 no
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
Step 8. Create a test release and build a package¶
Create your working directory under /scratch (note that only files under paths mounted into the container will be kept after the container shuts down).
newrel -t S18-02-25 testrel_s180225
cd testrel_s180225
addpkg_svn CAFAna S18-02-25
make all
# The build will fail due to missing shared library links (we will fix this in
# the development release). You will need to add the following line to
# "CAFAna/Core/GNUmakefile":
#   override CPPFLAGS += -I$(BOOST_INC)
# Add a similar line to CAFAna/XSec/GNUmakefile:
#   override CPPFLAGS += -I$(NUTOOLS_INC) -I$(GENIE_INC)/GENIE/ -I$(BOOST_INC)
Step 9. Run the NOvA EventDisplay¶
- If you want to open an event display, start the docker container with:
docker run --rm -it -p 5900:5900 -v /cvmfs:/cvmfs:cached -v $HOME:/scratch dingpf/slf6.7
- Then run the following script to start the VNC server in the container:
/home/me/start-xvnc.sh &
- Once the VNC server is up and running, you can connect to the VNC session from your host machine via any VNC client at vnc://localhost:5900. The password is password.
- For macOS, you can start a VNC viewer by pressing Cmd+K in Finder and connecting to vnc://localhost:5900.
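- On Linux, any VNC client should work as well; for example, with TigerVNC's viewer (assuming it is installed on your host):
vncviewer localhost::5900   # the double colon selects a TCP port rather than a display number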
- In the VNC session, an xterm is opened for you. You can do your usual software setup with "setup_nova" and get a valid voms proxy. You can use xrdcp to fetch any file from dCache into your container. For example, to run the EventDisplay on the file fardet_r00014054_s38_t00_R16-03-03-prod2reco.d_v1_data.pid.root from one of the official datasets, do the following:
# setup nova
source /cvmfs/nova.opensciencegrid.org/novasoft/slf6/novasoft/setup/setup_nova.sh \
    -e /cvmfs/nova.opensciencegrid.org/externals \
    -5 /cvmfs/nova.opensciencegrid.org/novasoft/slf5/novasoft \
    -6 /cvmfs/nova.opensciencegrid.org/novasoft/slf6/novasoft \
    -r S18-02-25 -b maxopt
# get the xrootd file access url
fpath=`samweb get-file-access-url --schema=xroot fardet_r00014054_s38_t00_R16-03-03-prod2reco.d_v1_data.pid.root`
# get a valid voms proxy
kinit YOUR_USER_NAME@FNAL.GOV  # replace YOUR_USER_NAME with your Fermilab user name
kx509
voms-proxy-init --rfc --voms=fermilab:/fermilab/nova/Role=Analysis --noregen
# do an xrootd copy of the file to your local disk in the container
xrdcp $fpath ./
# start the EventDisplay on this file; the following command needs to be run
# INSIDE the xterm in the VNC session (together with the setup_nova)
nova -c evd.fcl fardet_r00014054_s38_t00_R16-03-03-prod2reco.d_v1_data.pid.root
Feldman-Cousins Corrections in Docker¶
If you're crazy enough to want to run the 2017 Analysis Feldman-Cousins corrections on a local machine, here's how you would do it. You cannot do batch submissions to the grid this way, but this illustrates how you might run a single FC job using the docker image dingpf/slf6.7.
Download Prerequisites¶
The 2017 FC script requires some input ROOT files. These need to be downloaded to the local machine, and can be scp'd from /pnfs/nova/persistent/users/ddoyle/localnova.tar.gz
Unpack this to some /path/to/localnova, e.g. as sketched below.
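A minimal sketch (assuming the tarball unpacks into a directory named localnova; adjust paths to taste):
# copy the tarball from a NOvA GPVM, then unpack it locally
scp novagpvm02.fnal.gov:/pnfs/nova/persistent/users/ddoyle/localnova.tar.gz .
tar zxvf localnova.tar.gz   # the unpacked directory is referred to below as /path/to/localnova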
Enter Docker¶
See previous instructions for setting up Docker on your local machine. Once installed, run:
sudo docker run --rm -it -v /path/to/localnova:/scratch -v /cvmfs:/cvmfs:cached dingpf/slf6.7
# set the environment variable used in the FC script
export FCHELPERANA2017_LIB_PATH=/scratch/FCHELPERANA2017_LIB_PATH
The -v /path/to/localnova:/scratch option mounts the localnova volume at /scratch inside the container. It is important that localnova is mounted there for the code to find the required files. Feel free to use this volume as storage, since changes made elsewhere in the container are not persistent.
Setup test release¶
Create a test release from S18-02-25 then add CAFAna from the FC-at-NERSC branch.
newrel -t S18-02-25 <mylocalfc>
cd <mylocalfc>
srt_setup -a
addpkg_svn -b CAFAna FC-at-NERSC
novasoft_build -t
Run the script¶
CAFAna/nue/Ana2017/joint_fit_2017_make_fc_surf.C¶
void joint_fit_2017_make_fc_surf(int NPts, int bin, bool nh, int N, std::string plot)
This takes NPts, the number of experiments to throw at a bin; bin, which bin to throw to; a bool specifying the mass hierarchy; N, a bookkeeping parameter; and a string plot specifying either "ssth23dmsq32" or "deltassth23" contours.
Example:
cafe -bq joint_fit_2017_make_fc_surf.C 10 10 true 0 ssth23dmsq32
CAFAna/nue/Ana2017/joint_fit_2017_make_fc_slice.C¶
void joint_fit_2017_make_fc_slice(int NPts, int bin, bool nh, int N, std::string plot="delta")
The procedure for slices is similar. The options for plot are "delta", "ssth23", or "dmsq32".
Example:
cafe -bq joint_fit_2017_make_fc_slice.C 10 10 true 0 ssth23
Feldman-Cousins Corrections in Docker without cvmfs¶
Follow these instructions to run the Feldman-Cousins corrections in Docker if you do not have CVMFS on your local system.
# Get the standalone docker image with software and spectra for running FC corrections
scp novagpvm02.fnal.gov:/pnfs/nova/persistent/users/dingpf/dingpf--fc-on-nersc--S18-02-25-maxopt-Ana2017.tar .
# Register it with the docker daemon
docker load -i dingpf--fc-on-nersc--S18-02-25-maxopt-Ana2017.tar
# Get the script to run the docker image
wget http://home.fnal.gov/~dingpf/run_fc.tar.gz
tar zxvf run_fc.tar.gz
cd run_fc
# Run FC experiments
./run_fc.sh --macro=CAFAna/nue/Ana2017/joint_fit_2017_make_fc_slice.C --npoints=2 --bin=1 --hierarchy=true --id=0 --plot=ssth23
./run_fc.sh --macro=CAFAna/nue/Ana2017/joint_fit_2017_make_fc_surf.C --npoints=2 --bin=1 --hierarchy=true --id=0 --plot=ssth23dmsq32
Building docker images for NERSC¶
Currently the image building procedure includes the following steps:
- Prepare a test release of CAFAna locally with the version of code you want;
- Run a generic docker container, mount the local test release and the cvmfs repo into it, and build the test release in the container;
- Also in the container, compile the CAFAna macro you want to run at NERSC;
- Gather the files the CAFAna package needs from cvmfs and pull them into a local directory;
- Build the docker image from the local test release and local cvmfs directory;
- Push the image to NERSC and pull it onto Cori or Edison.
I've created scripts to locate and pull files from cvmfs, as well as to build the docker image. They are under the following directory on docker-bd.fnal.gov:
/home/dingpf/cvmfs_dev
Under this directory, you will see:
cvmfs_dev
├── build.sh
├── copy_cvmfs_dir.py
├── copy_cvmfs_file.py
├── dirs.list
├── Dockerfile_nova-fc-2018:fc-on-nersc:development-maxopt-base
├── Dockerfile_nova-fc-2018:fc-on-nersc:development-maxopt-v0.0
├── image
│   ├── cvmfs
│   └── development
├── libs.list
└── make_libs.sh
The test release is under
/home/dingpf/cvmfs_dev/image/development
You should follow this directory structure to make the build script work.
Rebuilding the Test Release¶
At the top level, execute the "build_dev.sh" script, which runs the following docker command:
IMAGE=<path to image subdir in build directory>
sudo docker run --rm -it -v $IMAGE/development:/development -v /cvmfs:/cvmfs dingpf/slf6.7 /development/run.sh make CAFAna.all
This is built against the full CVMFS repo.
Now you need to rebuild the CAFAna macro. To do this, run the macro once (ROOT compiles it on first use):
IMAGE=<path to image subdir in build directory>
sudo docker run --rm -it -v $IMAGE/development:/development -v $IMAGE/cvmfs:/cvmfs dingpf/slf6.7 /development/run.sh cafe -bq <fullpath in image>/<macro_name> <options for macro>
Example Syntax:
sudo docker run --rm -it -v $IMAGE/development:/development -v /cvmfs:/cvmfs \
    registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:fcinput-cvmfs \
    /development/run.sh cafe -bq /development/CAFAna/nue/Ana2018/FitandFC/make_fc_slices_nersc_2018.C \
    1 0 0 true 0 10 5 ssth23 both false false
Rebuilding the Images and pushing to NERSC¶
To pull the files from cvmfs, run the following:
./copy_cvmfs_dir.py dirs.list image/cvmfs
./copy_cvmfs_file.py libs.list image/cvmfs
To build the image, copy "Dockerfile_nova-fc-2018:fc-on-nersc:development-maxopt-v0.0" to "Dockerfile_nova-fc-2018:fc-on-nersc:development-maxopt-v${NEW_VERSION}", where ${NEW_VERSION} is your desired version number, and then run:
sudo ./build.sh Dockerfile_nova-fc-2018:fc-on-nersc:development-maxopt-v${NEW_VERSION}
This script builds the new image and tags it. At the end it prompts you to push the image to the NERSC registry. Run the command it prints to do the push. This must be done AS ROOT (not via sudo):
ksu  # become root
# Push the image
docker push registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:development-maxopt-v0.3
If you forget to increment the version number, the image can be retagged via:
sudo docker images  # list the images
# then retag with
sudo docker tag ba7c53be347e registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:development-maxopt-<New Version>
Once the image is pushed to the NERSC private registry, log in to Cori and/or Edison and run the following to make the image available on the supercomputers:
# ON CORI: load the Shifter module
$ module load shifter-registry
# Log in to the registry
$ shifterimg-beta login registry.services.nersc.gov
registry.services.nersc.gov username: <NERSC username>
registry.services.nersc.gov password: <Application Token>
# Pull the image into the Shifter registry
$ shifterimg-beta pull registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:development-maxopt-v0.0
Run the container interactively at NERSC¶
Once the image is pulled to Cori, you can do the following to run it in the interactive queue.
dingpf@cori02:~> salloc -N 1 -C haswell -q interactive --image=docker:registry.services.nersc.gov/nova-nus18/median-sensitivity:development-maxopt-v0.5 -t 04:00:00
salloc: Pending job allocation 12513427
salloc: job 12513427 queued and waiting for resources
salloc: job 12513427 has been allocated resources
salloc: Granted job allocation 12513427
salloc: Waiting for resource configuration
salloc: Nodes nid00050 are ready for job
dingpf@nid00050:~> shifter --volume=$CSCRATCH:/output /development/run.sh cafe -bq -nr /development/CAFAna/nus/Nus18/MakeSurfaceMedian.C fhc th24vsdm41 systs both
Running cafe -bq -nr /development/CAFAna/nus/Nus18/MakeSurfaceMedian.C fhc th24vsdm41 systs both
** NOvA Common Analysis Format Executor **
root -l -n -b -q /development/CAFAna/load_libs.C /development/CAFAna/nus/Nus18/MakeSurfaceMedian.C+("fhc","th24vsdm41","systs","both")
hadd ROOT files in parallel at NERSC¶
- Copy the "merge_template" directory to your scratch area, and rename it as "merge";
cp -r /global/project/projectdirs/m2612/merge_template $CSCRATCH/merge
# You need to rename any existing "merge" directory first if you have one.
- Make a list file containing the ROOT files you want to merge, and name it "<plot_name>_<mass_hierarchy>.list" (or "<plot_name>.list"), e.g.
find /global/cscratch1/sd/burt/output/ssth23 -type f -size +7k -name "comb_ssth23_ih*.root" | tee ssth23_ih.list
find $PLOT_OUTPUT -type f -name "comb_mass.root" > mass.list
- Start an interactive job on Cori as:
salloc -N 1 -C haswell -q interactive -t 04:00:00
# swap haswell for knl if you want KNL nodes
- Once you are logged into the interactive node, run the following to merge the ROOT files:
module swap PrgEnv-intel PrgEnv-gnu
module load root
$CSCRATCH/merge/run_hadd.sh ssth23_ih.list 10
# the 1st argument is the filelist name, the 2nd the number of parallel hadd
# jobs to run; 10 is a good number for a list of 2000 files
- The script above creates a directory <plot_name>_<mass_hierarchy> under $CSCRATCH/merge, with two subdirectories, "hadd" and "list". The input filelist is split into N smaller filelists, where N is the number of parallel hadd jobs; the "list" subdirectory stores these split filelists, and the "hadd" subdirectory stores the merged ROOT file for each of them (a simplified sketch of what run_hadd.sh does is shown after this list).
- After the files in each split filelist are merged, you can either copy the merged files back to a NOvA GPVM and hadd them there, or do the following on Cori to merge them:
module swap PrgEnv-intel PrgEnv-gnu
module load root
hadd <plot_name>_<mass_hierarchy>.root hadd/*.root
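For the curious, run_hadd.sh does something like the following simplified sketch (not the actual script; filenames and bookkeeping are illustrative only):
# sketch: split the input list into N chunks and run one hadd per chunk in parallel
LIST=$1; N=$2
mkdir -p list hadd
split -n l/$N $LIST list/chunk_          # N line-based chunks (GNU split)
for f in list/chunk_*; do
    hadd hadd/$(basename $f).root $(cat $f) &
done
wait   # block until all hadd jobs finish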
Feldman-Cousins with DIY¶
"DIY is a block-parallel library for implementing scalable algorithms that can execute both in-core and out-of-core. The same program can be executed with one or more threads per MPI process, seamlessly combining distributed-memory message passing with shared-memory thread parallelism. The abstraction enabling these capabilities is block parallelism; blocks and their message queues are mapped onto processing elements (MPI processes or threads) and are migrated between memory and storage by the DIY runtime. Complex communication patterns, including neighbor exchange, merge reduction, swap reduction, and all-to-all exchange, are possible in- and out-of-core in DIY." - https://github.com/diatomic/diy
With a proper implementation, DIY would let us unlock the full potential of the machines at NERSC and other supercomputing facilities. Our analysis is entirely ROOT based, so a proper implementation is difficult in practice (ROOT doesn't like to be multithreaded). For now, we use DIY simply for labor distribution and job submission, i.e., each DIY block manages the execution of our CAFAna macro and determines the correct arguments based on block-space topology and the parameter space of interest; a sketch of the idea follows.
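Stripped of DIY, the labor-distribution idea is roughly this (illustration only; in the real code a DIY block performs the mapping in C++, and NBINS/NPTS here are hypothetical placeholders):
# each MPI rank derives its own point in the (bin, id) parameter space from
# its rank number and runs the macro for just that point
BIN=$(( SLURM_PROCID % NBINS ))
ID=$((  SLURM_PROCID / NBINS ))
cafe -bq joint_fit_2017_make_fc_slice.C $NPTS $BIN true $ID ssth23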
Setup¶
1. Make image layer containing scidac4 repo, DIY, and MPICH install
2. Build CAFAna image against this layer
3. Pull to NERSC
4. Make a job script
5. Profit
1. Make layer containing scidac4, DIY, MPICH¶
On docker-bd.fnal.gov, I have a directory containing everything you need to build this image; you can use it as an example or copy directly from it:
/home/ddoyle/cvmfs_dev_diy
The important subdirectory is scidac4-hep-on-hpc. This is the cloned git repo containing the code that will go into the image. You can clone this repo with:
git clone git@bitbucket.org:jkowalkowski/scidac4-hep-on-hpc.git
You'll also need the mpich tarball, which can be found at http://www.mpich.org/
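For example (the version below is a placeholder; grab whichever release the Dockerfile expects):
# download the MPICH source tarball next to the Dockerfile
wget http://www.mpich.org/static/downloads/3.2/mpich-3.2.tar.gz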
Make sure the scidac repo and the mpich tarball are at the same level as the Dockerfile, then run:
./build Dockerfile_nova-fc-2017\:nova-event-selection\:R16-07-15-secondana.a-maxopt-v0.1
The build script will print the command you can use to push the image to NERSC. Before doing so, it may be a good idea to test the image locally.
You'll notice a few extra things in the Dockerfile: devtoolset-2 and updated glibc libraries. Our image is built from CentOS 6, which is pretty outdated; these extras update the C/C++ compiler so that we can use Cray's much newer MPI.
Also notice that mpich is installed in a non-standard location. This is important because NERSC machines will want to run their own version of MPI, so we don't want to install our libraries in the same place as theirs; the sketch below illustrates the idea.
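A hypothetical fragment illustrating this (paths and version are placeholders, not copied from the actual Dockerfile):
# inside the image build (e.g. a Dockerfile RUN step): install MPICH into a
# private prefix so it cannot shadow the MPI that NERSC provides at runtime
tar zxf mpich-3.2.tar.gz && cd mpich-3.2
./configure --prefix=/usr/local/mpich
make -j4 && make install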
2. Build CAFAna image against this layer¶
Follow the steps above to create an image containing cvmfs, fcinput, and CAFAna, but with one change to the Dockerfile:
#FROM registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:fcinput-cvmfs ## delete this line
FROM registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:diy-mpich-scidac4
then
./build.sh Dockerfile_nova-fc-2018\:fc-on-nersc\:development-maxopt-v1.0
and use the command printed at the end of the build script's execution to push the image to NERSC.
3. Pull image onto NERSC¶
Instructions for pulling images onto NERSC machines are at http://www.nersc.gov/users/software/using-shifter-and-docker/using-shifter-at-nersc/
On Cori, you can pull images under an existing name to update them without changing the version number; Edison, however, does not allow this, so every image pulled to Edison needs a unique name.
To test that everything is working correctly and you're using Cray's MPI, request an interactive session:
salloc -N 1 -C haswell -q interactive -t 01:00:00 --image=registry.services.nersc.gov/nova-fc-2018/fc-on-nersc:development-maxopt-v1.0
and run the following
ddoyle@nid00184:~> export MPICH_VERSION_DISPLAY=1
ddoyle@nid00184:~> srun -n 4 shifter /fc/fc
MPI VERSION    : CRAY MPICH version 7.6.2 (ANL base 3.2)
MPI BUILD INFO : Built Wed Aug 23 17:23:37 2017 (git hash 34d433786) MT-G
If your build is correct, you will see the Cray MPICH version displayed.
4. Make a job script¶
Take a look at my example script at
/global/cscratch1/sd/ddoyle/diy-fc/sample_sub.sbatch
Developing¶
You can clone the scidac repository into your NERSC directories and mount the fc code into the image to test and build updates to the code. Take a look at the README at https://bitbucket.org/jkowalkowski/scidac4-hep-on-hpc/src/master/fc/ to learn how to build the fc code from scratch.
Once you have cloned the scidac repo, request an interactive session. The repo can be mounted into the image with:
shifter --volume=/path/to/scidac4-hep-on-hpc/fc:/fc /bin/bash
Once in the image, you'll need to enable the correct compiler with
source /opt/rh/devtoolset-2/enable
Then you'll be able to compile the code as is done in the README mentioned above. You can then try running your compiled code with
srun shifter --volume=/path/to/scidac4-hep-on-hpc/fc:/fc /fc/fc -args -args -args
Once you have something you're happy with, commit your changes to the scidac repo and rebuild the image. If you update the scidac repo on docker-bd.fnal.gov before building, the new image will pick up your recent changes.