
Support #8935

How to prevent the number of files in a chunk from changing when submitting jobs

Added by Qiulan Huang over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
Start date: 05/26/2015
Due date: -
% Done: 0%
Estimated time: -
Duration: -

Description

To process the huge number of files, there are two options:

1) start one project to process the big dataset;
2) split the big dataset into pieces and process them one by one.

I prefer the second option. However, the number of files in each chunk keeps changing.
The question is how to keep the number of files in each chunk fixed.

History

#1 Updated by Paola Buitrago over 4 years ago

  • Assignee set to Paola Buitrago

#2 Updated by Paola Buitrago over 4 years ago

  • Status changed from New to Resolved

When processing a large number of files, data handling experts recommend splitting the big dataset into small chunks of around 20K files each. To keep the initial big dataset (DS) from changing its size as the processing advances and output files reach SAM, there are two options:

1) Define the big dataset not as a draining DS. Create the subsets from the big (non dynamic) definition.
2) Define the big DS as a draining DS and take a snapshot. Create the subsets from the snapshot.
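
A minimal sketch of option 2, assuming a draining definition named big_dataset (a placeholder) and assuming the SAM dimension snapshot_id can be used to build subsets on a frozen snapshot. The snapshot id here is a placeholder; the real one would come from `samweb take-snapshot`. The command is only echoed (dry run) so it can be inspected before anything is created:

```shell
#!/bin/sh
# Option 2 sketch: snapshot the draining definition, then carve a fixed
# window out of the snapshot. "big_dataset" and SNAPID are placeholders.
# In practice: SNAPID=$(samweb take-snapshot big_dataset)
SNAPID=12345

# Subsets built against snapshot_id stay fixed even as the parent drains.
DIM="snapshot_id ${SNAPID} with limit 20000 with offset 0"

# Dry run: echo the command instead of running it.
echo samweb create-definition big_dataset_part1 "$DIM"
```

Removing the `echo` would actually create the definition against the snapshot.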

The commands to create small definitions from a predefined big definition use the dimension clauses:

  • with limit n: limit the number of results to n
  • with offset n: skip the first n results

The full list of dimensions is documented by:

  • samweb list-files --help-dimensions
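
Putting the limit/offset clauses together, a chunking loop could look like the sketch below. The parent definition name, the chunk size, and the total file count are placeholders; in practice the total would come from `samweb count-files`. The commands are only echoed (dry run) so the generated names and windows can be checked first:

```shell
#!/bin/sh
# Sketch: split a parent SAM definition into fixed-size chunk definitions.
# "big_dataset", CHUNK, and TOTAL are placeholders; the real total would
# come from: samweb count-files "defname: big_dataset"
PARENT=big_dataset
CHUNK=20000
TOTAL=65000

i=0
OFFSET=0
while [ "$OFFSET" -lt "$TOTAL" ]; do
    # Each chunk is a plain definition over a fixed limit/offset window,
    # so its file count never changes as processing advances.
    echo samweb create-definition "${PARENT}_chunk${i}" \
        "defname: ${PARENT} with limit ${CHUNK} with offset ${OFFSET}"
    i=$((i + 1))
    OFFSET=$((OFFSET + CHUNK))
done
```

With these placeholder numbers the loop prints four create-definition commands (offsets 0, 20000, 40000, 60000); removing the `echo` would actually create the chunk definitions.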


Also available in: Atom PDF