Disk Usage

A simple page to produce a snapshot of what we have stored where on the disks in no specific order.
If you have a directory on any of the disks and know what's in it please add it to the list.
We need to keep a better audit of disk usage so that we do not come unstuck when we hit quotas, as well as to keep better track of where everything is located.

Usage as of June 18th, 2017:

Filesystem            Size  Used Avail Use% Mounted on
                       12T  8.6T  3.5T  72% /nova/app
blue3:/nova/ana       130T   97T   34T  75% /nova/ana
blue3:/nova/data      105T   90T   16T  86% /nova/data
blue3:/nova/prod      100T  2.2T   98T   3% /nova/prod

/nova/prod (Updated August 12th, 2016)

Space used Area Lifetime
49617 G mc
33550 G data
1722 G concat
977 G FTS_DropBoxes
344 G reco_validation_Oct2014_tmpdir

mc/: Various MC tagged releases from S12-12-12 through to S14-02-05 - needs a full audit.
Clearly the biggest hog here!

Here is the breakdown of the MC directory:

480K    /nova/prod/mc/development
9.1G    /nova/prod/mc/fcl
1.1G    /nova/prod/mc/None
1.6T    /nova/prod/mc/S12-12-12
32K    /nova/prod/mc/S13-01-15
1.6G    /nova/prod/mc/S13-02-03
13T    /nova/prod/mc/S13-02-26
5.1M    /nova/prod/mc/S13-06-05
1.4T    /nova/prod/mc/S13-06-13
7.9T    /nova/prod/mc/S13-06-18
18G    /nova/prod/mc/S13-06-26
441G    /nova/prod/mc/S13-12-13
531G    /nova/prod/mc/s14-01-20
338G    /nova/prod/mc/S14-01-20
985G    /nova/prod/mc/S14-02-05
4.3G    /nova/prod/mc/S14-02-05a
102G    /nova/prod/mc/S14-03-06
25G    /nova/prod/mc/S14-03-25
96K    /nova/prod/mc/S14-05-05
11T    /nova/prod/mc/S14-05-08
12T    /nova/prod/mc/S14-05-12
178G    /nova/prod/mc/S14-07-03
40G    /nova/prod/mc/S14-07-11
100G    /nova/prod/mc/S14-08-15
312G    /nova/prod/mc/S14-08-19

data/: 5.7TB - Tagged versions of reconstructed FarDet "numi" data

/nova/data (Updated August 12th, 2016)

Space used Area Lifetime
31562 G mc
21025 G novaroot
12379 G rawdata
10813 G nearline-OnMon
7918 G flux
4988 G nearline
3443 G nearline-Ana
2747 G spillserver_logs
2018 G pedestal_data
795 G pidlibs

Here is the breakdown of the MC directory:

609M    /nova/data/mc/daq_simulated_data
6.1M    /nova/data/mc/fcl
290M    /nova/data/mc/fclfiles
32K    /nova/data/mc/in_progress
152G    /nova/data/mc/in_progress_old
179G    /nova/data/mc/logfiles
86G    /nova/data/mc/S12.06.17_MDC_reco   Accessed 2014/03/31
6.5M    /nova/data/mc/S12-10-04_to_FTS     Not accessed this year, except metadata files
8.7T    /nova/data/mc/S12-11-16            Accessed 2014/04/14
32K    /nova/data/mc/S12-12-12
992G    /nova/data/mc/S13-01-15
32K    /nova/data/mc/S13-02-03
310G    /nova/data/mc/S13-02-26
25G    /nova/data/mc/S13-02-26a
551G    /nova/data/mc/S13-04-09
17T    /nova/data/mc/S13-06-05
3.5T    /nova/data/mc/S13-06-13
7.5T    /nova/data/mc/S13-06-18
386G    /nova/data/mc/S13-07-22
908M    /nova/data/mc/S13-09-17
205G    /nova/data/mc/S13-10-11
203G    /nova/data/mc/S13-12-13

/nova/ana (Updated August 12th, 2016)

Space used Area Lifetime
67844 G users
3524 G nu_e_ana
2812 G calibration
2221 G nu_int_ana
2008 G assembly_ana
758 G nova_cvmfs
745 G trigger
323 G exotics_ana
270 G steriles
2 G nu_mu_ana

/nova/ana/users (Updated August 12th, 2016)

Total Size: 65.477 T

Largest space users:

User Space Used Expected Space Need Reason for Files Expected Lifetime
bianjm 4420 G
radovic 2838 G
nsmayer 2078 G
crisprin 2047 G
ksachdev 1953 G
edniner 1932 G
tamsett 1894 G
psihas 1859 G
rschroet 1839 G
rhatcher 1762 G

No longer large space users:

User Space Used Expected Space Need Reason for Files Expected Lifetime
brunetti 184 G

/nova/app/users (Updated August 12th, 2016)

5.79 T

Largest space users:

User Space Used Expected Space Need Reason for Files Expected Lifetime
blinehan 200 G
prabhjot 195 G
bckhouse 192 G
crisprin 185 G
denis 182 G
timkudt 178 G
radovic 172 G
psail 156 G
bays 139 G
mstrait 137 G

No longer large space users:

User Space Used Expected Space Need Reason for Files Expected Lifetime
arrieta1 47 G Can remove more after graduating in October

novapro Quota

Usage as of June 18th, 2017:

           Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
                  3386G       0   4608G          35209k       0       0
                  1272G       0   1536G          17969k       0       0
                  4034M       0  51200M          20458k       0       0
blue3:/nova/ana   2353M       0   2048G          24701k       0       0
                  1875G       0    100T           29834       0       0
                     32       0    500G           5974k       0       0
                      0       0    300G               3       0       0
                 11247M       0    200G           20795       0       0
                 26437G       0  30720G            553k       0       0

If you have a bunch of important analysis files, but there just isn't room in /nova/ana/users for what you need to do ... welcome to dCache! 

Please see the pnfs tutorial in DocDB 13747.

Temporarily, you can put your files at (great for files returning from grid jobs):


The best-practice method for moving your files there is to use ifdh cp. The first thing you need to do is make the directory under /pnfs/nova/scratch/users that you want to write to group-writable. That is, if I wanted to move something to my area, I would (first time only) do:

chmod g+w /pnfs/nova/scratch/users/lein

Then, the command information for ifdh cp is:

ifdh cp args 

The very simple example is:

ifdh cp test.txt /pnfs/nova/scratch/users/lein/                                                     

For more advanced use, ifdh cp is a general file copy (using cpn locks dd, gridftp, or srmcp) that supports:

         * basic source/dest filenames: cp src1 dest1 [';' src2 dest2 [';'...]]                    
         * recursive directory copies: cp -r src1 dest1 [';' src2 dest2 [';'...]]                 
         * copies to dest. directory: cp -D src1 src2 destdir1 [';' src3 src4 destdir2 [';'...]]  
         * copies to a list file: cp -f file_with_src_space_dest_lines                       
         * any of the above can take --force={cpn,gridftp,srmcp,expgridftp}                          
         * any of the file/dest arguments can be URIs
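
As an illustration of the list-file form above, here is a minimal sketch that writes one "source destination" pair per line, which is the layout the `cp -f` option is described as taking. The helper name `make_copy_list` and the file names are hypothetical, and the sketch assumes file names without spaces. The actual `ifdh cp -f` call is left as a comment because it needs ifdh and a mounted /pnfs area.

```shell
# Sketch: build a "source destination" list file for `ifdh cp -f`.
# make_copy_list is a hypothetical helper, not part of ifdh.
make_copy_list() {
    dest_dir=$1
    shift
    for src in "$@"; do
        # One pair per line: the source path, a space, then the destination path.
        printf '%s %s/%s\n' "$src" "${dest_dir%/}" "$(basename "$src")"
    done
}

# Example (not run here; file names are placeholders):
#   make_copy_list /pnfs/nova/scratch/users/lein out1.root out2.root > copy_list.txt
#   ifdh cp -f copy_list.txt
```

Writing the list first makes it easy to inspect exactly what will be copied where before any transfer starts.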

Note that this is a scratch area where files have a limited lifetime. Least recently accessed files are deleted first. The lifetime is typically a few months, and if you need the file set more permanently, you can copy it to the tape-backed area (use sam_clone_dataset from the Sam4users tools below).

More pnfs resources:
  Basic dCache documentation:
  lifetime plots:

h2. Disk Usage Management

These are notes written by Susan on how she manages disk space usage. 

The goal is to never let any of the areas get filled up.

Usually, it is relatively safe if all the areas are less than 80% full. If /nova/ana is 85% or more full, you should probably panic.
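
The thresholds above could be checked automatically along the following lines. This is only a sketch: it reads `df -P`-style lines on standard input, the names `check_usage`, `WARN_PCT`, and `PANIC_PCT` are made up for illustration, and it applies the 85% level to every area, whereas the note above applies it to /nova/ana specifically.

```shell
# Sketch: flag mount points whose usage crosses the levels quoted above.
# WARN_PCT and PANIC_PCT are illustrative names; adjust to taste.
WARN_PCT=80
PANIC_PCT=85

check_usage() {
    # Expects `df -P` output on stdin: one header line, then one line per
    # file system with "Use%" in field 5 and the mount point in field 6.
    awk -v warn="$WARN_PCT" -v panic="$PANIC_PCT" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)   # strip the percent sign
            pct += 0            # force a numeric value
            if (pct >= panic)     print "PANIC " $6 " " pct "%"
            else if (pct >= warn) print "WARN " $6 " " pct "%"
        }'
}

# Example (not run here, since it needs the NOvA mounts):
#   df -P /nova/app /nova/ana /nova/data /nova/prod | check_usage
```

The `-P` option keeps each file system on a single line, which avoids the wrapped rows that plain `df -h` can produce for long file system names.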

Fairly frequently (a few times a week, whenever I feel a prickle on the back of my neck that someone, somewhere is doing something horrible) I look at the "BlueArc for NOvA" plot at the bottom of this page:
(log in with services account)

If any of the areas are using more space than I expect, I investigate to figure out why. Sharp upward changes in size are particularly worrisome.

Also, I update this wiki page with the information from weekly space usage emails. Updating the wiki page gives me a chance to meditate on the size of each area and think about who is using too much space. 

To update the wiki page, I also run the following each week:

df -h (for the top section)

ksu novapro
quota -u novapro -s (for section on novapro Quota - make sure nothing is changing too much, getting close to filled)

/nova/ana is the area that gets filled up quickly and unexpectedly, usually due to new/rogue users. Usually, between the weekly emails and snooping on grid usage, followed by du -hs in individual user areas, I can find out who is mostly responsible. Then I send emails alerting them to the situation and urging alternative action. Feel free to send a lot of emails nagging about space - I certainly do. 
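
The du step can be sketched as a small ranking helper. This is an illustration rather than the procedure actually used: `top_users` is a hypothetical name, and it uses `du -sk` (sizes in KiB) instead of `du -hs` so that the output sorts numerically.

```shell
# Sketch: list the largest immediate subdirectories of an area, biggest first.
# top_users is a hypothetical helper name.
top_users() {
    area=$1
    count=${2:-10}
    # du -sk prints "<KiB><TAB><path>" for each subdirectory; unreadable
    # directories are skipped silently.
    du -sk "$area"/*/ 2>/dev/null | sort -rn | head -n "$count"
}

# Example (not run here, since it needs the NOvA mounts):
#   top_users /nova/ana/users 10
```

On a large area this can take a long time and puts load on the file server, so it is probably best pointed at a handful of suspect directories rather than run routinely.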

/nova/prod and a few other areas just slowly grow over time - I watch this and periodically make a fuss about needing things archived.

The weekly emails also list biggest space users of condor-tmp - I usually bug people who use more than production. This is a relatively low priority issue.