Coordination for large dataset pre-staging
MicroBooNE requires that pre-staging of any dataset larger than a "dev" sample (more than 4000 files or more than 10 TB in total size) be coordinated with the production team.
- Before making a pre-staging request, check the pre-staging dataset status table to see whether your requested sample has already been pre-staged. How long a dataset stays pre-staged depends on various factors. As a rule of thumb, if the dataset was pre-staged within the last couple of weeks, you most likely don't need to pre-stage it again.
- Check the dataset information with samweb commands (learn more about SAM) (More SAM commands):
$ samweb list-files --summary "defname:Your_Dataset_Name"
Here is an example:
$ samweb list-files --summary "defname:prodgenie_bnb_intrinsic_nue_cosmic_uboone_mcc8.7_reco2"
File count: 12287 (number of files)
Total size: 34235530158207 (total size in bytes, about 34 TB)
Event count: 614350 (number of available events)
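To decide quickly whether a dataset crosses the "dev" sample limits, the summary output can be piped into a small filter. This is a minimal sketch, not an official SAM tool: `needs_coordination` is a hypothetical function name, it assumes the "File count:" / "Total size:" line format shown in the example above, and it treats 10 TB as 10^13 bytes (decimal), which is an assumption since the page does not say whether TB or TiB is meant. The thresholds (more than 4000 files or more than 10 TB) are the ones stated on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samweb): reads the output of
#   samweb list-files --summary "defname:..."
# on stdin and reports whether the dataset exceeds the "dev" sample limits.
# Assumption: 10 TB = 10^13 bytes (decimal TB, not TiB).
needs_coordination() {
  awk -v max_files=4000 -v max_bytes=10000000000000 '
    # Default whitespace splitting: "File count: 12287" -> $3 is the number
    /^File count:/ { files = $3 }
    /^Total size:/ { bytes = $3 }
    END {
      # Print a short summary, converting bytes to decimal TB
      printf "files=%d size=%.1f TB\n", files, bytes / 1e12
      # Coordination is needed if EITHER limit is exceeded
      if (files > max_files || bytes > max_bytes) print "coordinate"
      else print "ok"
    }'
}
```

Usage: `samweb list-files --summary "defname:Your_Dataset_Name" | needs_coordination`. For the example dataset above it prints `files=12287 size=34.2 TB` followed by `coordinate`, because it exceeds both limits. The pre-staging status table still has to be checked separately, since this filter only looks at file count and size.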
Recommended Data and MC Samples for analysis (as of Aug. 20th, 2018)
Data:
Status of Dataset Pre-staging
| Dataset | # of files | Size | Status | Requested by | Approved | Notes |
Data and MC samples
Find the full list of samples at MicroBooNE at Work.
List of Production Requests
Find the current list of production requests submitted to the team at List of Production Requests.
For production team members
For users
Before submitting a large number of jobs to the grid, first test your workflow and make sure it is correct. Also make sure your resource requests follow uB's Grid Best Practices:
- Herb's Grid Best Practice
- Kirby and Wei's Best Practice
- Herb's real-life example of how to use a recursive dataset
- You can also check Matt's summary of grid best practices
- Want to pre-stage and process a dataset with more than 4000 files or larger than 10 TB
- Want to cancel a large number (>100) of jobs on the grid
- Have questions about running jobs on the grid