Jagjeet Singh, 07/28/2016 04:02 AM

Disk Usage

A simple page giving a snapshot of what we have stored where on the disks, in no specific order.
If you have a directory on any of the disks and know what's in it, please add it to the list.
We need to keep a better audit of the disk usage so as not to come unstuck when we hit quotas, as well as to keep better track of where everything is located.

Usage as of July 28th, 2016:

Filesystem            Size  Used Avail Use% Mounted on
blue3:/nova/data      140T   98T   43T  70% /nova/data
blue3:/nova/ana        95T   80T   16T  84% /nova/ana
blue3:/nova/prod      100T   87T   14T  87% /nova/prod
                       10T  9.4T  703G  94% /nova/app

/nova/prod (Updated July 28th, 2016)

Space used   Area                             Lifetime
49617 G      mc
33550 G      data
1722 G       concat
977 G        FTS_DropBoxes
344 G        reco_validation_Oct2014_tmpdir

mc/: Various MC tagged releases from S12-12-12 through to S14-08-19 - needs a full audit.
Clearly the biggest hog here!

Here is the breakdown of the MC directory:

480K    /nova/prod/mc/development
9.1G    /nova/prod/mc/fcl
1.1G    /nova/prod/mc/None
1.6T    /nova/prod/mc/S12-12-12
32K    /nova/prod/mc/S13-01-15
1.6G    /nova/prod/mc/S13-02-03
13T    /nova/prod/mc/S13-02-26
5.1M    /nova/prod/mc/S13-06-05
1.4T    /nova/prod/mc/S13-06-13
7.9T    /nova/prod/mc/S13-06-18
18G    /nova/prod/mc/S13-06-26
441G    /nova/prod/mc/S13-12-13
531G    /nova/prod/mc/s14-01-20
338G    /nova/prod/mc/S14-01-20
985G    /nova/prod/mc/S14-02-05
4.3G    /nova/prod/mc/S14-02-05a
102G    /nova/prod/mc/S14-03-06
25G    /nova/prod/mc/S14-03-25
96K    /nova/prod/mc/S14-05-05
11T    /nova/prod/mc/S14-05-08
12T    /nova/prod/mc/S14-05-12
178G    /nova/prod/mc/S14-07-03
40G    /nova/prod/mc/S14-07-11
100G    /nova/prod/mc/S14-08-15
312G    /nova/prod/mc/S14-08-19

data/: 5.7TB - Tagged versions of reconstructed FarDet "numi" data
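The per-area tables on this page are copied from the weekly space usage emails. If you want a similar listing for a single area by hand, here is a minimal sketch. area_usage is a hypothetical helper name, sizes are rounded down to whole GiB (which may not match the units the emails use), and du over a large BlueArc area can take a long time.

```shell
# Hypothetical helper: list each top-level directory of an area with its size
# in whole GiB, largest first, in the same "<size> G <area>" shape as the
# tables on this page.
area_usage() {
  base=$1
  for d in "$base"/*/; do
    [ -d "$d" ] || continue                   # glob matched nothing
    kb=$(du -sk "$d" 2>/dev/null | cut -f1)   # size in KiB
    printf '%d G %s\n' $(( ${kb:-0} / 1048576 )) "$(basename "$d")"
  done | sort -rn
}

# Example: area_usage /nova/prod
```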

/nova/data (Updated July 28th, 2016)

Space used   Area              Lifetime
31562 G      mc
21025 G      novaroot
12379 G      rawdata
10617 G      nearline-OnMon
7918 G       flux
4988 G       nearline
3405 G       nearline-Ana
2695 G       spillserver_logs
2018 G       pedestal_data
795 G        pidlibs

Here is the breakdown of the MC directory:

609M    /nova/data/mc/daq_simulated_data
6.1M    /nova/data/mc/fcl
290M    /nova/data/mc/fclfiles
32K    /nova/data/mc/in_progress
152G    /nova/data/mc/in_progress_old
179G    /nova/data/mc/logfiles
86G    /nova/data/mc/S12.06.17_MDC_reco   Accessed 2014/03/31
6.5M    /nova/data/mc/S12-10-04_to_FTS     Not accessed this year, except metadata files
8.7T    /nova/data/mc/S12-11-16            Accessed 2014/04/14
32K    /nova/data/mc/S12-12-12
992G    /nova/data/mc/S13-01-15
32K    /nova/data/mc/S13-02-03
310G    /nova/data/mc/S13-02-26
25G    /nova/data/mc/S13-02-26a
551G    /nova/data/mc/S13-04-09
17T    /nova/data/mc/S13-06-05
3.5T    /nova/data/mc/S13-06-13
7.5T    /nova/data/mc/S13-06-18
386G    /nova/data/mc/S13-07-22
908M    /nova/data/mc/S13-09-17
205G    /nova/data/mc/S13-10-11
203G    /nova/data/mc/S13-12-13

/nova/ana (Updated June 30th, 2016)

Space used   Area           Lifetime
67363 G      users
3523 G       nu_e_ana
2812 G       calibration
2200 G       nu_int_ana
2008 G       assembly_ana
758 G        nova_cvmfs
745 G        trigger
323 G        exotics_ana
206 G        steriles
2 G          nu_mu_ana

/nova/ana/users (Updated June 9th, 2016)

Total Size: 65.477 T

Largest space users:

User       Space Used   Expected Space Need   Reason for Files   Expected Lifetime
bianjm     4406 G
radovic    2838 G
nsmayer    2078 G
crisprin   2047 G
ksachdev   1942 G
edniner    1912 G
tamsett    1894 G
psihas     1848 G
rschroet   1839 G
barnali    1632 G

No longer large space users:

User       Space Used   Expected Space Need   Reason for Files   Expected Lifetime
brunetti   184 G

/nova/app/users (Updated June 9th, 2016)

Total Size: 5.79 T

Largest space users:

User       Space Used   Expected Space Need   Reason for Files   Expected Lifetime
blinehan   200 G
rhatcher   194 G
prabhjot   194 G
bckhouse   184 G
denis      182 G
timkudt    178 G
radovic    169 G
crisprin   158 G
tianxc     138 G
bays       133 G

No longer large space users:

User       Space Used   Expected Space Need   Reason for Files   Expected Lifetime
arrieta1   47 G         Can remove more after graduating in October

novapro Quota

Usage as of July 28th, 2016:

       Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
                 81705M       0    300G            638k       0       0
                  11144       0   5120M          14600k       0       0
                  1326G       0   1536G          25964k       0       0        
                 24156G       0  28672G            528k       0       0        
blue3:/nova/ana   6575G       0   8192G          25718k       0       0        
                 86203G       0    100T           4461k       0       0
                 37047M       0    200G          31563k       0       0        
                 99434G       0    128T           5460k       0       0        
                 12773G       0  27648G          12043k       0       0        
                  2305G       0   3994G          28815k       0       0        

Users files on dCache

If you have a bunch of important analysis files, but there just isn't room in /nova/ana/users for what you need to do ... welcome to dCache!

Please see the pnfs tutorial here: DocDB 13747

For temporary storage (great for files returning from grid jobs), you can put your files at:


The best-practices method for moving your files there is to use ifdh cp. First, make the /pnfs/nova/scratch/users directory you want to write to group-writable. For example, to move something to my own area, I would (first time only) run:

chmod g+w /pnfs/nova/scratch/users/lein

The general form of the ifdh cp command is:

ifdh cp args

The simplest example is:

ifdh cp test.txt /pnfs/nova/scratch/users/lein/

For more advanced use: ifdh cp is a general file copy that uses dd (with cpn locks), gridftp, or srmcp, and it supports:

  • basic source/dest filenames: cp src1 dest1 [';' src2 dest2 [';'...]]
  • recursive directory copies: cp -r src1 dest1 [';' src2 dest2 [';'...]]
  • copies to dest. directory: cp -D src1 src2 destdir1 [';' src3 src4 destdir2 [';'...]]
  • copies to a list file: cp -f file_with_src_space_dest_lines
  • any of the above can take --force={cpn,gridftp,srmcp,expgridftp}
  • any of the file/dest arguments can be URIs
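The cp -f form above reads a file with one "source destination" pair per line. A minimal sketch for building such a list from a local directory of job outputs follows. make_copy_list is a hypothetical helper name; it assumes file names contain no spaces and writes each destination as a full file path.

```shell
# Hypothetical helper: print one "source destination" line per regular file in
# a local directory, for use with "ifdh cp -f <listfile>".
make_copy_list() {
  srcdir=$1
  destdir=$2
  for f in "$srcdir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and empty globs
    printf '%s %s/%s\n' "$f" "$destdir" "$(basename "$f")"
  done
}

# Example (not run here):
#   make_copy_list ./outputs /pnfs/nova/scratch/users/lein > copylist.txt
#   ifdh cp -f copylist.txt
```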

Note that this is a scratch area where files have a limited lifetime: the least recently accessed files are deleted first. The lifetime is typically a few months; if you need a file set kept more permanently, you can copy it to the tape-backed area (use sam_clone_dataset from the Sam4users tools).
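Since the least recently accessed files go first, it can help to see which of your files are oldest by access time. A minimal sketch, using the hypothetical helper name oldest_accessed; it assumes the access times visible through the /pnfs mount track dCache's own access bookkeeping, which may not be true, so treat the result only as a rough guide.

```shell
# Hypothetical helper: list the N least recently accessed entries in a
# directory (non-recursive), oldest first.
oldest_accessed() {
  dir=$1
  n=${2:-10}
  # -t sorts by time, -u makes that the access time, -r puts the oldest first.
  ls -1tur "$dir" | head -n "$n"
}

# Example: oldest_accessed /pnfs/nova/scratch/users/lein 20
```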

More pnfs resources:
Basic dCache documentation:
Lifetime plots:

Disk Usage Management

These are notes written by Susan on how she manages disk space usage.

The goal is to never let any of the areas get filled up.

Usually, it is relatively safe if all the areas are less than 80% full. If /nova/ana is 85% or more full, you should probably panic.
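The rule of thumb above can be checked mechanically. A minimal sketch that reads df -h style output on stdin; check_df is a hypothetical helper name, and it applies the 85% "panic" level to every mount, which is stricter than the notes (they single out /nova/ana). Running df with -P keeps each filesystem on one line; a wrapped name-only line is skipped anyway.

```shell
# Hypothetical helper: flag mounts at or above the warn/panic thresholds.
# Defaults mirror the notes: 80% worth a look, 85% panic.
check_df() {
  warn=${1:-80}
  panic=${2:-85}
  awk -v w="$warn" -v p="$panic" '
    NR == 1 { next }                  # skip the header line
    NF >= 5 {
      pct = $(NF - 1); sub(/%/, "", pct)
      mount = $NF
      if (pct + 0 >= p)      print "PANIC " pct "% " mount
      else if (pct + 0 >= w) print "WARN  " pct "% " mount
    }'
}

# Example: df -hP /nova/data /nova/ana /nova/prod /nova/app | check_df
```

Fed the July 28th, 2016 df output at the top of this page, it reports /nova/ana as WARN and /nova/prod and /nova/app as PANIC.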

Fairly frequently (a few times a week, or whenever I feel a prickle on the back of my neck that someone, somewhere is doing something horrible), I look at the "BlueArc for NOvA" plot at the bottom of this page:
(log in with your services account)

If any of the areas are using more space than I expect, I investigate to figure out why. Sharp upward changes in size are particularly worrisome.
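One way to spot sharp upward changes is to compare two saved copies of a "Space used / Area" table (lines like "49617 G mc"). A minimal sketch; growth is a hypothetical helper name, the 500 G default threshold is an arbitrary assumption, and areas that appear only in the newer snapshot are treated as growing from 0.

```shell
# Hypothetical helper: report areas that grew by at least LIMIT G between an
# older and a newer snapshot file.
growth() {
  old=$1
  new=$2
  limit=${3:-500}
  awk -v lim="$limit" '
    NR == FNR { if ($2 == "G") before[$3] = $1; next }   # older snapshot
    $2 == "G" {
      delta = $1 - before[$3]
      if (delta >= lim)
        printf "%s grew by %d G (%d -> %d G)\n", $3, delta, before[$3], $1
    }' "$old" "$new"
}

# Example: growth prod_last_week.txt prod_this_week.txt 1000
```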

Also, I update this wiki page with the information from weekly space usage emails. Updating the wiki page gives me a chance to meditate on the size of each area and think about who is using too much space.

To update the wiki page, I also run the following each week:

df -h                  (for the top section)

ksu novapro
quota -u novapro -s    (for the novapro Quota section; make sure nothing is changing too much or getting close to full)
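Checking the quota output for entries close to their limit can also be scripted. A minimal sketch that reads quota -u novapro -s output on stdin; quota_near_limit is a hypothetical helper name. Assumptions: unsuffixed values are 1 KiB blocks, a trailing "*" (over quota) is ignored, a line starting with "host:/path" has the filesystem name first, and a line holding only a filesystem name (quota may print long names on their own line) is skipped.

```shell
# Hypothetical helper: print entries whose block usage is at or above PCT
# percent of the hard limit.
quota_near_limit() {
  pct=${1:-80}
  awk -v t="$pct" '
    function kib(v,   n, u) {
      sub(/\*$/, "", v)               # drop the over-quota marker
      n = v + 0                       # numeric prefix, e.g. 1326 from "1326G"
      u = substr(v, length(v))
      if (u == "M") return n * 1024
      if (u == "G") return n * 1024 * 1024
      if (u == "T") return n * 1024 * 1024 * 1024
      return n                        # "K" or no suffix: already KiB
    }
    {
      off = ($1 ~ /:/) ? 1 : 0        # skip the filesystem column if present
      used = $(1 + off); limit = $(3 + off)
      lim = kib(limit)
      if (lim > 0 && 100 * kib(used) / lim >= t)
        printf "%s of %s (%.0f%%)\n", used, limit, 100 * kib(used) / lim
    }'
}

# Example: quota -u novapro -s | quota_near_limit 80
```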

/nova/ana is the area that gets filled up quickly and unexpectedly, usually due to new or rogue users. Between the weekly emails and snooping on grid usage, followed by du -hs in individual user areas, I can usually find out who is most responsible. Then I send emails alerting them to the situation and urging alternative action. Feel free to send a lot of emails nagging about space - I certainly do.
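The du -hs step can be run over all user areas at once and ranked. A minimal sketch; top_users is a hypothetical helper name, and it uses du -sk rather than du -hs so that plain sort -n can order the sizes with both GNU and BSD tools. On a large area this can take a long time.

```shell
# Hypothetical helper: rank per-user directories under an area by size
# (in KiB), largest first, keeping the top N.
top_users() {
  base=${1:-/nova/ana/users}
  n=${2:-10}
  du -sk "$base"/*/ 2>/dev/null | sort -rn | head -n "$n"
}

# Example: top_users /nova/ana/users 10
```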

/nova/prod and a few other areas just slowly grow over time - I watch this and periodically make a fuss about needing things archived.

The weekly emails also list the biggest space users of condor-tmp; I usually bug people who use more space there than production does. This is a relatively low-priority issue.