Feature #20020

generate a command or simple set of instructions to report how much of a dataset is staged to disk from enstore

Added by Michael Kirby almost 3 years ago. Updated almost 3 years ago.

Is it possible to create a samweb or fife_util command that determines what fraction of a SAM dataset is staged to disk from enstore? On several occasions, MicroBooNE analyzers have followed best practices and staged their dataset 24 hours before submitting processing jobs, but have still suffered poor grid job efficiency and/or job timeouts while waiting for data delivery. It would be excellent to have a command or simple instructions that tell analyzers and production groups what percentage of a dataset is staged to disk and what fraction is located only on tape. (The assumption is that the two sum to 100%, but who knows...)

There is no need to estimate when a dataset will be fully staged; that is beyond the scope of any functional command. It would be good to list the fraction staged, the number of files and bytes staged, and the number of files and bytes not staged. Please let me know if additional clarification or details are needed.

- Kirby


#1 Updated by Robert Illingworth almost 3 years ago

Here's an example of counting the files on disk in a SAM query. If you try it, you'll see it's quite a slow process.

$ samweb -e uboone list-file-locations --dimensions='snapshot_id 15594539' --filter-path=enstore | awk '{sub(/^enstore:/, "", $1); print ($1 "/.(get)(" $2 ")(locality)")}' | while read -r f; do cat "$f"; done | grep -c "ONLINE"
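
To also get the file and byte totals asked for in the description, the per-file locality could be paired with the file size and summed. The following is only a rough sketch, not an existing samweb or fife_util feature. `summarize_staging` is a hypothetical helper name, and the commented-out pipeline assumes that the third column of the `list-file-locations` output is the file size in bytes, which should be checked against real output first. A file is counted as staged when dCache reports its locality as ONLINE or ONLINE_AND_NEARLINE, the same match as the grep above.

```shell
#!/bin/sh
# Hypothetical helper: reads "<locality> <size_in_bytes>" lines on stdin
# and prints staged / not-staged file counts, byte totals, and the
# percentage of files that have a disk copy.
summarize_staging() {
  awk '
    # ONLINE and ONLINE_AND_NEARLINE both mean a copy is on disk
    $1 ~ /^ONLINE/ { staged_n++; staged_b += $2; next }
    # anything else (NEARLINE, UNAVAILABLE, ...) counts as not staged
    NF >= 2        { tape_n++;   tape_b   += $2 }
    END {
      total = staged_n + tape_n
      pct = (total > 0) ? 100 * staged_n / total : 0
      printf "staged: %d files, %.0f bytes\n", staged_n, staged_b
      printf "not staged: %d files, %.0f bytes\n", tape_n, tape_b
      printf "fraction staged: %.1f%%\n", pct
    }'
}

# Possible use with the query above (ASSUMPTION: column 3 is the size):
#
# samweb -e uboone list-file-locations \
#     --dimensions='snapshot_id 15594539' --filter-path=enstore |
# while read -r loc name size _; do
#   dir=${loc#enstore:}
#   printf '%s %s\n' "$(cat "$dir/.(get)($name)(locality)")" "$size"
# done | summarize_staging
```

This still reads one locality pseudo-file per dataset file, so it will be just as slow as the one-liner; it only adds the bookkeeping.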
