Running out of Disk Space on ubdaq-prod-evb?

If this is the case there are several things one should do:

0) Make sure that dCache is operating normally:
--> Take a look at the log file for prod_transfer_binary_evb2dropbox_evb:
/home/uboonepro/pubs/log/ubdaq-prod-near1.fnal.gov/prod_transfer_binary_evb2dropbox_evb.log
(Historical note: this project is listed as running on evb, but it is actually run on near1.)
Search for errors / transfer failures like those below:
[ ERROR ] transfer (L: 234) >> {transfer_file} Issuing deletion command: ifdh rm /pnfs/uboone/scratch/uboonepro/dropbox/data/uboone/raw/PhysicsRun-2018_12_28_6_39_57-0020483-00002.ubdaq.json
[ ERROR ] transfer (L: 238) >> {transfer_file} TRIED TO DELETE /pnfs/uboone/scratch/uboonepro/dropbox/data/uboone/raw/PhysicsRun-2018_12_28_6_39_57-0020483-00002.ubdaq.json but got an error from ifdhc
If that is the case and the errors persist, activate the bypass by following the instructions in "What to do if dCache/enstore go down (no access to pnfs area)".
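
The log check in step 0 can be sketched in Python. This is a minimal sketch, not part of PUBS: the helper names find_error_lines and scan_log are hypothetical, and it assumes that error lines contain the literal marker "[ ERROR ]" as in the sample lines above.

```python
# Hypothetical helpers for scanning a PUBS project log for error lines.
# Assumption: error lines contain the literal marker "[ ERROR ]",
# as in the sample log lines quoted in step 0.

ERROR_MARKER = "[ ERROR ]"


def find_error_lines(lines):
    """Return (line_number, text) pairs for every line containing the error marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path, last=20):
    """Print the most recent `last` error lines of the log at `path` and return all hits."""
    # errors="replace" avoids a crash if the log contains undecodable bytes.
    with open(path, errors="replace") as log_file:
        hits = find_error_lines(log_file)
    for number, text in hits[-last:]:
        print(f"{number}: {text}")
    return hits
```

For example, calling scan_log("/home/uboonepro/pubs/log/ubdaq-prod-near1.fnal.gov/prod_transfer_binary_evb2dropbox_evb.log") prints the latest error lines with their line numbers, which makes it easier to judge whether the ifdhc failures are still occurring or are old.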

1) Identify who is using up the disk space. Options:
--> a) /data/uboonedaq/rawdata/ is where data from "official" runs goes. Files here are seen (and should eventually be removed) by PUBS.
--> b) /data/uboonedaq/TestRuns/ is disk space that DAQ people use to test things. It is not seen by PUBS and must be cleared by hand.

Useful info: there is ~33 TB of disk space in /data/ on the evb machine. PUBS will try to clear data in /data/uboonedaq/rawdata/ once it has been transferred to the FTS dropbox and verified. Note that files in other directories (e.g. /data/uboonedaq/TestRuns/) will not be deleted.
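
To see which of the two directories is using the space, something like the following sketch may help. The function names dir_size_bytes and report are hypothetical; the paths are the ones listed above, and walking a multi-TB directory tree can take a while.

```python
import os
import shutil


def dir_size_bytes(root):
    """Sum the sizes of all files under `root` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished or is unreadable (e.g. PUBS removed it); skip it.
                pass
    return total


def report(mount="/data",
           dirs=("/data/uboonedaq/rawdata", "/data/uboonedaq/TestRuns")):
    """Print overall usage of `mount` and the size of each directory in `dirs`."""
    usage = shutil.disk_usage(mount)
    print(f"{mount}: {usage.used / 1e12:.2f} TB used of {usage.total / 1e12:.2f} TB")
    for directory in dirs:
        print(f"{directory}: {dir_size_bytes(directory) / 1e12:.2f} TB")
```

Calling report() on the evb machine prints one line for /data and one per directory. Other users' folders under /data/ can be passed through the dirs argument.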

If most of the space is not being used by /data/uboonedaq/rawdata/ we need to free space manually. If it is urgent to free up space (i.e. data-taking should not be interrupted and the disk will fill up rather soon) you are authorized to clear /data/uboonedaq/TestRuns/. Contact any other person who is using up a considerable amount of space and ask them to quickly remove contents in their /data/ folder.
If /data/uboonedaq/rawdata/ is using up a significant amount of space, the problem is probably PUBS' fault.

2) Identify the cause of the problem. Why is disk space not being freed? Possible causes:
--> a) prod_clear_binary_evb is having issues. Look in this log for errors: /home/uboonepro/pubs/log/ubdaq-prod-evb.fnal.gov/prod_clean_evb_binary_evb.log
--> b) prod_clear_binary_evb does not find any new files to clear. This indicates a possible problem with one of the projects that prod_clear_binary_evb depends on. One possible cause is a network too slow to drain data off the evb machine. Look at the logs of projects upstream of prod_clear_binary_evb (e.g. /home/uboonepro/pubs/log/ubdaq-prod-near1.fnal.gov/prod_verify_binary_evb2dropbox_near1.log)
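
One way to judge whether clearing has stalled is to look at the age of the oldest files still sitting in /data/uboonedaq/rawdata/. The sketch below is an assumption about a useful diagnostic, not an existing PUBS tool. The function name oldest_files is hypothetical, and what counts as "too old" depends on the normal transfer and verification latency, which this page does not state.

```python
import os
import time


def oldest_files(directory, n=10):
    """Return up to `n` (age_in_hours, path) pairs for the oldest files directly in `directory`."""
    now = time.time()
    entries = []
    with os.scandir(directory) as listing:
        for entry in listing:
            # Only look at plain files; subdirectories and symlinks are ignored.
            if entry.is_file(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                entries.append(((now - mtime) / 3600.0, entry.path))
    # Largest age first, i.e. oldest files at the top.
    entries.sort(reverse=True)
    return entries[:n]
```

For example, oldest_files("/data/uboonedaq/rawdata") returns the ten oldest files with their ages in hours. If these are much older than the files from the current run, the upstream transfer or verification projects are a likely place to look.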