Support #8935

How to prevent the number of files in chunk changing when submit jobs

Added by Qiulan Huang almost 6 years ago. Updated almost 6 years ago.


There are a couple of options for processing this huge number of files:

1) start one project to process the whole big dataset;
2) split the big dataset into pieces and process them one by one.

I prefer the second option. However, the number of files in each chunk keeps changing.
The question is: how can I prevent the number of files in each chunk from changing?


#1 Updated by Paola Buitrago almost 6 years ago

  • Assignee set to Paola Buitrago

#2 Updated by Paola Buitrago almost 6 years ago

  • Status changed from New to Resolved

When processing a large number of files, data handling experts recommend splitting the big dataset into small chunks of around ~20K files. To keep the initial big DS from changing its size (as the processing advances and output files reach SAM), there are two options:

1) Define the big dataset not as a draining DS. Create the subsets from the big (non dynamic) definition.
2) Define the big DS as a draining DS and take a snapshot. Create the subsets from the snapshot.
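The snapshot route in option 2 might look like the following sketch. The definition name `big_dataset_def` is a placeholder, and the commands are echoed rather than executed, since samweb needs a configured SAM station; the exact dimension syntax for selecting from a snapshot should be checked against `samweb list-files --help-dimensions` on your instance:

```shell
#!/bin/sh
# Option 2 sketch: freeze a draining dataset definition with a snapshot
# before cutting it into subsets. "big_dataset_def" is a placeholder name.
BIGDEF="big_dataset_def"

# Taking a snapshot records the definition's current file list in SAM,
# so files arriving later cannot change what the subsets will contain.
SNAP_CMD="samweb take-snapshot $BIGDEF"
echo "$SNAP_CMD"

# Subset definitions are then cut from the snapshot rather than from the
# live (draining) definition. The <snapshot dimensions> placeholder stands
# for whatever snapshot-selection syntax your SAM instance provides.
echo "samweb create-definition ${BIGDEF}_part0 \"<snapshot dimensions> with limit 20000 with offset 0\""
```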

The dimensions used to create small definitions from a predefined big definition are listed by:

  • samweb list-files --help-dimensions

which includes:

with limit n     Limit the number of results to n
with offset n    Skip the first n results
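Combining the limit and offset dimensions, one can cut a big definition into fixed-size subsets. Here is a minimal sketch that prints the `samweb create-definition` command for each ~20K-file chunk; `big_dataset_def`, the total file count, and the chunk size are placeholders, and the commands are echoed as a dry run rather than executed:

```shell
#!/bin/sh
BIGDEF="big_dataset_def"   # placeholder parent definition name
TOTAL=65000                # e.g. from: samweb count-definition-files $BIGDEF
CHUNK=20000                # recommended chunk size (~20K files)

i=0
OFFSET=0
while [ "$OFFSET" -lt "$TOTAL" ]; do
    # Each subset selects a fixed window of the parent definition,
    # so its file count cannot change as processing advances.
    echo "samweb create-definition ${BIGDEF}_part$i \\"
    echo "  \"defname: $BIGDEF with limit $CHUNK with offset $OFFSET\""
    i=$((i + 1))
    OFFSET=$((OFFSET + CHUNK))
done
```

For 65000 files this prints four definitions, at offsets 0, 20000, 40000, and 60000 (the last chunk simply comes up short).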
