Create a tool to mv files on dCache and update SAM
DUNE asked us to create a tool that will move files around and update SAM at the same time, similar to SAM4Users.
#3 Updated by Vito Di Benedetto 8 months ago
Pengfei, we got this request from a meeting we had with you.
The SAM4Users tool already exists as part of fife_utils and can help users create SAM datasets and move/copy files on dCache.
Maybe I'm missing something here.
Is there a use case that needs similar functionality in Project-py, rather than using the SAM4Users tool directly?
#4 Updated by Pengfei Ding 8 months ago
The use case I was thinking about was:
The user declared the previous stage's files to SAM so that the next stage could run, but the files were in the scratch area, so their scratch location is what was originally added to the SAM database. If the user wants to move the files to another place, they need to update the SAM location by hand to keep all the SAM info up to date.
Unlike SAM4Users, this tool deals with files already declared to SAM. It would run as:
$ move_files_command source_dir destination_dir
The script moves all files from source_dir to destination_dir, as you would expect from "mv".
Additionally, while doing the "mv", the script checks whether any of the files under source_dir are tracked by SAM and, if so, updates the corresponding SAM locations.
You could write a similar script for doing copies.
I talked with Brandon about SAM4Users, and this tool can do exactly what we want: move a dataset that is already registered in SAM from one location to another and update the related SAM metadata.
Project-py has an "await_approval" status between two sequential stages, for when the user would like to examine data before the next stage starts. A user can use SAM4Users to move data around while in this status and then "approve" to go on to the next stage. We added this function as you requested. Please let us know whether this is the desired process.
#6 Updated by Pengfei Ding 8 months ago
Actually, SAM4Users cannot do exactly what most people want to do here. The SAM4Users feature you proposed takes a SAM dataset name as its argument.
However, most of the time users work with project.py's output directory in scratch interactively (via the NFS mount of dCache). They very often move the whole output directory from scratch to the persistent area. The output directory contains not only files declared to SAM but also log files. It makes sense that users want to preserve the whole output directory structure and all the log files.
The SAM4Users tool only deals with a defined SAM dataset, and only with files already declared to SAM. The use case above needs a tool similar to "mv" that does the following:
1. move all files from source to destination
2. if a file has been declared to SAM, update its SAM location.
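To make the two steps above concrete, here is a rough Python sketch of what such a tool could look like. It is only an illustration under stated assumptions: move_tree and InMemoryCatalog are hypothetical names, and the catalog's is_declared / add_location / remove_location methods are placeholders for whatever SAM client calls the real tool would use, not the actual samweb API. The sketch keeps the directory layout and moves log files along with the SAM files.

```python
import os
import shutil


class InMemoryCatalog:
    """Hypothetical stand-in for a SAM client (NOT the real samweb API)."""

    def __init__(self):
        self.locations = {}  # file name -> set of directory locations

    def is_declared(self, name):
        return name in self.locations

    def add_location(self, name, location):
        self.locations[name].add(location)

    def remove_location(self, name, location):
        self.locations[name].discard(location)


def move_tree(source_dir, destination_dir, catalog):
    """Move every file under source_dir to destination_dir, keeping the
    relative directory layout, and update the catalog for declared files."""
    source_dir = os.path.abspath(source_dir)
    destination_dir = os.path.abspath(destination_dir)
    moved = []
    for root, _dirs, files in os.walk(source_dir):
        # Recreate the same sub-directory under the destination.
        rel = os.path.relpath(root, source_dir)
        target_root = os.path.normpath(os.path.join(destination_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # Step 1: move the file (log files included).
            shutil.move(os.path.join(root, name),
                        os.path.join(target_root, name))
            # Step 2: only files known to the catalog get a location update.
            # The new location is added before the old one is removed, so a
            # declared file is never left without any location.
            if catalog.is_declared(name):
                catalog.add_location(name, target_root)
                catalog.remove_location(name, root)
            moved.append(os.path.join(target_root, name))
    return moved
```

The sketch leaves the now-empty source directories in place and assumes destination_dir is not inside source_dir. A copy variant would use shutil.copy2 instead of shutil.move and skip the remove_location call.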