Bug #4605

FarDet DATA reconstruction processing: memory leak

Added by Gavin Davies over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Immediate
Assignee: -
Category: -
Start date: 08/29/2013
Due date:
% Done: 0%
Estimated time:
Duration:

Description

Here I provide an example of an aborted reconstruction job that exceeded 4GB memory usage during GRID processing.

The log file below shows how the memory usage changes over time (a limited view only):

/nova/data/condor-tmp/gsdavies/grid_run_caf.sh_20130826_175108_14324_0_1.log

The job can be reproduced interactively in development. I've added a toggle, UseMCcalib (use MC calibration constants), to Calibrator.fcl, so reproducing it only requires a fcl change:

setup_nova -b maxopt
cd <test_release>;
nova -c recoproductionjob.fcl -s /nova/data/novaroot/FarDet/S13-07-22/000109/10976/cosmic/fardet_r00010976_s00_t02_cosmic_S13-07-22_v1.data.daq.root

Before running the nova command, copy the .fcl to your test release and add the following line:

services.user.Calibrator.UseMCcalib: true
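
The fcl edit described above can be scripted. The following is only a minimal sketch: the helper name add_override is hypothetical, and it assumes the override belongs at the end of the local copy of the job fcl, so that it is read after the tables it overrides.

```python
from pathlib import Path

# The override line quoted in this ticket.
OVERRIDE = "services.user.Calibrator.UseMCcalib: true"


def add_override(fcl_path, line=OVERRIDE):
    """Append `line` to the fcl file unless it is already present.

    Hypothetical helper, not part of any NOvA or art tooling.
    Returns True if the file was modified, False otherwise.
    """
    path = Path(fcl_path)
    text = path.read_text()
    # Skip the edit if an identical line is already in the file.
    if any(existing.strip() == line for existing in text.splitlines()):
        return False
    # Make sure the override starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True
```

For example, add_override("recoproductionjob.fcl") run inside the test release would add the line once and do nothing on later calls (assuming that is the .fcl that was copied).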

Someone will need to profile the job with google-perftools. There are some instructions here:
https://cdcvs.fnal.gov/redmine/projects/novaart/wiki/Performance_Checking

Another useful utility is the SimpleMemoryCheck service: https://cdcvs.fnal.gov/redmine/projects/art/wiki/SimpleMemoryCheck_service
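
As a quick cross-check before setting up the tools above, the peak memory of an interactive run can be read back from the operating system. This is a rough sketch using only the Python standard library; the helper name run_and_report_peak_rss is hypothetical, and it reports only a peak value, not where the memory is allocated, so it does not replace google-perftools or SimpleMemoryCheck.

```python
import resource
import subprocess


def run_and_report_peak_rss(cmd):
    """Run `cmd` (a list of arguments) and return (exit code, peak RSS in MB).

    Hypothetical helper. ru_maxrss for RUSAGE_CHILDREN is the largest
    resident set size of any child process that has been waited for,
    so call this once per Python session for a clean number.
    Assumes Linux, where ru_maxrss is reported in kilobytes.
    """
    proc = subprocess.run(cmd)
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak_kb / 1024.0
```

A call such as run_and_report_peak_rss(["nova", "-c", "recoproductionjob.fcl", "-s", input_file]), with input_file set to the path given in this ticket, would return the exit code and the peak resident memory to compare against the 4GB limit.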

History

#1 Updated by Gavin Davies about 6 years ago

Use the following input file instead (unpacked with a new tag, but with no changes to daq2rawdigit):

/nova/data/novaroot/FarDet/S13-09-04/000109/10976/cosmic/fardet_r00010976_s00_t02_cosmic_S13-09-04_v1.data.daq.root

#2 Updated by Gavin Davies about 6 years ago

The recoproductionjob runs through to completion if I remove the KalmanTrack and KalmanTrackMerge modules (tested prior to Nick's commit 6700).

CPU time is 4.2 s/event, with discretetrack taking the longest at 2.87 s/event.
Total time is 38571 s, i.e. about 10.7 hours!

The log/err/out files are here:
/nova/data/condor-tmp/gsdavies/data_10976/success/grid_run_reco_data.sh_20130904_122704_8936_0_1.*

Memory usage topped out at 1.71GB, well below the 4GB limit that we have been hitting.
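
The figures quoted in this update can be cross-checked with a few lines of arithmetic. The inputs are the numbers above; the implied event count is an assumption that holds only if the total time is the per-event CPU time summed over all events.

```python
# Numbers quoted in this update.
cpu_per_event_s = 4.2             # CPU time per event
discretetrack_per_event_s = 2.87  # slowest module, per event
total_time_s = 38571              # total job time
peak_mem_gb = 1.71                # peak memory usage
limit_gb = 4.0                    # GRID memory limit

total_hours = total_time_s / 3600
discretetrack_fraction = discretetrack_per_event_s / cpu_per_event_s
mem_fraction = peak_mem_gb / limit_gb
# Assumption: total time = per-event CPU time x number of events.
implied_events = total_time_s / cpu_per_event_s

print(f"total time: {total_hours:.2f} h")                      # 10.71 h
print(f"discretetrack share: {discretetrack_fraction:.1%}")    # 68.3%
print(f"peak memory vs limit: {mem_fraction:.1%}")             # 42.8%
print(f"implied events (assumption): {implied_events:.0f}")    # 9184
```

So discretetrack accounts for roughly two thirds of the per-event CPU time, and the peak memory is under half of the limit.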

Testing now with the latest Kalman changes.

#3 Updated by Gavin Davies about 6 years ago

  • Status changed from New to Resolved

After recent KalmanTrack changes/face-lifts this "leak" no longer exists, so I will mark this as resolved/closed.

#4 Updated by Gavin Davies about 6 years ago

  • Status changed from Resolved to Closed

