Feature #9277

better pre-staging in SAM DAG jobs

Added by Dennis Box almost 6 years ago. Updated about 5 years ago.

Currently, when a SAM dataset name is given as input, a diamond-shaped DAG is created with 3 steps:

1) ifdh startProject sam-dataset-name
2) N jobs consume files from project
3) ifdh endProject
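
The diamond shape described above can be sketched as a small generator for the DAG description. This is only an illustration: it assumes the DAG is written in HTCondor DAGMan syntax (`JOB` and `PARENT ... CHILD` lines), which this ticket does not state, and the node names and `.sub` submit-file names are hypothetical.

```python
def diamond_dag(n_workers):
    """Return DAGMan-style text for: start project -> N workers -> end project."""
    if n_workers < 1:
        raise ValueError("need at least one worker job")

    # Hypothetical node names for the N consumer jobs of step 2).
    workers = [f"worker_{i}" for i in range(n_workers)]

    lines = ["JOB start_project start_project.sub"]      # step 1)
    lines += [f"JOB {w} worker.sub" for w in workers]     # step 2)
    lines.append("JOB end_project end_project.sub")       # step 3)

    # Every worker waits for the start node; the end node waits for every worker.
    lines.append(f"PARENT start_project CHILD {' '.join(workers)}")
    lines.append(f"PARENT {' '.join(workers)} CHILD end_project")
    return "\n".join(lines)


print(diamond_dag(3))
```

Because all N workers depend only on the start node, anything that delays the exit of step 1) delays the release of all N worker slots at once.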

If most of the files in the SAM dataset are on tape, step 2) can end up running N jobs at load 0, all waiting for dCache to move the files to disk so that they can then be transferred. Step 1) could be modified to poll how many of the dataset's files are staged to disk, and to exit only once a set number or percentage of them are staged. This would hold one node in a load 0 state instead of N.
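
A minimal sketch of the polling idea for step 1), under stated assumptions: `is_staged` is a hypothetical callable that reports whether a single file is on disk (the ticket does not say which dCache or SAM query would be used), and the threshold, poll interval, and `max_polls` cap are illustrative parameters, not values taken from the discussion.

```python
import time


def wait_until_staged(files, is_staged, min_fraction=0.5, poll_seconds=300,
                      max_polls=None, sleep=time.sleep):
    """Block until at least min_fraction of files are staged to disk.

    is_staged is a hypothetical per-file check supplied by the caller.
    Returns the staged fraction observed on the last poll.
    """
    if not files:
        return 1.0  # nothing to wait for

    polls = 0
    while True:
        staged = sum(1 for f in files if is_staged(f))
        fraction = staged / len(files)
        polls += 1

        if fraction >= min_fraction:
            return fraction  # enough files on disk, let step 2) start
        if max_polls is not None and polls >= max_polls:
            return fraction  # give up waiting rather than hold the node forever

        sleep(poll_seconds)
```

The same loop would work with an absolute file count in place of `min_fraction`. The `max_polls` cap is a design choice added here so that the single waiting node cannot block indefinitely if the threshold is never reached.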

Details still being discussed are:
  • where step 1) should run (currently it runs on the same site as step 2))
  • the optimum percentage of staged files


#1 Updated by Dennis Box almost 6 years ago

I started 3 projects yesterday, all on the same dataset of 33746 files, to get a feel for how long it takes to stage to disk. There were 0 files staged when I started. This morning, 10 had been transferred to disk, which appears to be the limit for the cdf-caf station. This afternoon, only 8 remained staged.
(about 10 lines down)
Per-project prefetched files: 10

#2 Updated by Dennis Box over 5 years ago

  • Assignee set to Dennis Box
  • Target version set to v1.1.9

#3 Updated by Dennis Box over 5 years ago

  • Target version changed from v1.1.9 to v1.2

#4 Updated by Dennis Box about 5 years ago

  • Status changed from New to Resolved

#5 Updated by Dennis Box about 5 years ago

  • Status changed from Resolved to Closed
