As far as possible, analysers should utilise official samples.
Sometimes, however, you will need to generate your own special samples (for a high-statistics signal sample or a model variation, for example), and these will need to be stored somewhere.
There are some spaces mounted on the uboonegpvms that you can consider using for storage of samples (for these, you should make your own folder under the users/ area):
- Firstly - never store files on /uboone/app. This space is for software builds and will fill up fast if you put data there.
- /uboone/data has some space for storage of data files, but will not hold a large sample of full artroot files.
- /pnfs/uboone/scratch is ideal for samples you are actively using on a short-term basis.
- /pnfs/uboone/persistent is an option for longer-term storage of files that are accessed less frequently.
For any sample that you want to keep using long term and that is larger than around 50-100 GB, you should consider using SAM4Users to move it onto tape-backed storage. This provides you with faster, easier access, and no quotas (although tape costs money, so don't go too crazy). For information on how to use this service, see https://microboone-docdb.fnal.gov/cgi-bin/private/ShowDocument?docid=6896.
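As a rough way to apply the size guideline above, the sketch below measures a sample directory and suggests whether tape-backed storage is worth considering. The function names and the default threshold of 50 GB (the lower edge of the 50-100 GB range) are assumptions made for illustration and are not part of any MicroBooNE tooling. Note that du may be slow or report sizes differently on /pnfs areas.

```shell
# Hypothetical helpers, not part of any official tooling.

# sample_size_gb DIR: print the size of DIR in whole GB (rounded down).
sample_size_gb() {
    local dir="$1"
    local kb
    # du -sk prints the total size in 1 KiB blocks, followed by the path.
    kb=$(du -sk "$dir" | awk '{print $1}')
    echo $(( kb / 1024 / 1024 ))
}

# suggest_storage DIR [THRESHOLD_GB]: print "tape" if DIR is at least
# THRESHOLD_GB in size, otherwise "disk".
suggest_storage() {
    local dir="$1"
    local threshold_gb="${2:-50}"   # assumed default, see the note above
    local size_gb
    size_gb=$(sample_size_gb "$dir")
    if [ "$size_gb" -ge "$threshold_gb" ]; then
        echo "tape"
    else
        echo "disk"
    fi
}

# Example (path is a placeholder):
# suggest_storage /path/to/some/files 100
```

A result of "tape" only means the sample is in the range where SAM4Users is worth a look; whether you intend to keep using it long term is still your call.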
I found the above a little hard to follow. After discussions with Kirby, these are the commands that I used to successfully pre-stage a dataset --Adam:
Declaring a dataset to SAM
- start a screen session
- setup uboonecode in the usual way.
- setup fife utilities
- Make sure your token won't expire:
export KRB5CCNAME=FILE:/tmp/$USER`date +%s`
setup kx509
## Gets a full-week renewable token
kinit -r7d
## setup background loop to refresh kerberos cred every 12 hours for a week
(for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do sleep 43200; kinit -R; done)&
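The background renewal loop can also be written as a small function, which makes the count and interval explicit and reports a failed renewal instead of failing silently. This is only a sketch: refresh_every is a made-up name, and the only Kerberos command assumed is kinit -R as used above.

```shell
# refresh_every COUNT INTERVAL_SECONDS CMD...
# Hypothetical helper: sleep INTERVAL_SECONDS, then run CMD, COUNT times over.
refresh_every() {
    local count="$1"
    local interval="$2"
    shift 2
    local i=1
    while [ "$i" -le "$count" ]; do
        sleep "$interval"
        # Keep looping even if one renewal fails, but say so on stderr.
        "$@" || echo "refresh $i failed: $*" >&2
        i=$((i + 1))
    done
}

# Same schedule as the loop above: 14 renewals, 12 hours (43200 s) apart.
# refresh_every 14 43200 kinit -R &
```

Running it inside the screen session keeps the loop alive after you log out.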
- Make list of input files:
ls -d /path/to/some/files/*.root > input.list
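If the directory is very large, or you want to catch an empty list before handing it to SAM, something like the following may help. It is a sketch, not an official recipe: make_input_list is a hypothetical name, and it uses find (with -maxdepth, available in GNU and BSD find) instead of the ls wildcard. Unlike ls, it lists regular files only, so symlinks are skipped.

```shell
# make_input_list DIR OUTFILE
# Hypothetical helper: write the absolute paths of the .root files directly
# inside DIR to OUTFILE, one per line, and fail if none are found.
make_input_list() {
    local dir="$1"
    local out="$2"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -maxdepth 1 mirrors the non-recursive wildcard; -type f keeps regular
    # files only. sort gives a stable, reproducible order.
    find "$(cd "$dir" && pwd)" -maxdepth 1 -type f -name '*.root' | sort > "$out"
    if [ ! -s "$out" ]; then
        echo "no .root files found in $dir" >&2
        return 1
    fi
    echo "$(wc -l < "$out" | tr -d ' ') files listed in $out"
}

# Example (path is a placeholder):
# make_input_list /path/to/some/files input.list
```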
- add dataset to SAM:
-f option defines file list, can also just pass files by wildcard
sam_add_dataset --name=alister1_diffusion_30k_cosmics_DT0_DL6-36 -f input.list
- archive dataset --- this means that when you pre-stage the dataset it gets put in to the "read/write pool", which basically looks like scratch to the analyser.
-N option defines number of copy threads.
sam_archive_dataset --name="alister1_diffusion_30k_cosmics_DT0_DL6-36" -N 4
I had some difficulty here. I assume dCache was being a little bit dodgy. Just keep an eye on it; if it fails, resubmit the command above. SAM is smart enough to know what's been sent to archive and what hasn't. I had 600 files: the first 250 took 3 days and the last 350 took several hours, so the time seems to be pretty variable.
You can check the status of the jobs by going to http://samweb.fnal.gov:8480/station_monitor/ and clicking through to uboone/uboone and looking for your name.
- You can detach your screen session with Ctrl-a d
- Come back later and use
screen -r
to re-attach the session
- ...and now check that your files have only one location in the archive:
samweb list-files "defname:alister1_diffusion_30k_cosmics_DT0_DL6-36"
samweb locate-file <some_file_name>
should output a single location, i.e.:
enstore:/pnfs/uboone/archive/sam_managed_users/alister1/data/0/9/e/e
- In the screen session, before exiting, delete the Kerberos ticket (for example, with kdestroy).
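Checking locations one file at a time gets tedious for a few hundred files, so the single-location check can be looped over the whole dataset. The sketch below assumes that samweb locate-file prints one location per non-empty line, which matches the example output above but is not something I have verified in general. count_locations and check_single_location are hypothetical names; the locate command can be swapped out, which is handy for trying the logic without SAM access.

```shell
# count_locations FILE [LOCATE_CMD...]
# Print how many locations LOCATE_CMD reports for FILE.
# LOCATE_CMD defaults to "samweb locate-file" (assumption: one location per line).
count_locations() {
    local file="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- samweb locate-file
    fi
    # </dev/null stops the command from eating a caller's stdin.
    "$@" "$file" < /dev/null | grep -c .
}

# check_single_location [LOCATE_CMD...]
# Read file names on stdin; print every file that does not have exactly one
# location. Returns 1 if any such file was found, 0 otherwise.
check_single_location() {
    local bad=0
    local name n
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        n=$(count_locations "$name" "$@")
        if [ "$n" -ne 1 ]; then
            echo "$name: $n locations"
            bad=$((bad + 1))
        fi
    done
    return "$(( bad > 0 ))"
}

# Example, using the dataset definition from above:
# samweb list-files "defname:alister1_diffusion_30k_cosmics_DT0_DL6-36" | check_single_location
```

No output and a zero exit status means every file reported exactly one location.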