Support #22412

Duplicate files on tape.

Added by Arthur Kreymer almost 2 years ago. Updated 5 months ago.

Start date:
Due date:
% Done:


Estimated time:
4.00 h
Duration: 13


In migrating from LTO4 to LTO8,
a small fraction of duplicate files were detected.
We should investigate the origins of this.


#1 Updated by Arthur Kreymer almost 2 years ago

Date: Tue, 16 Apr 2019 16:16:45 -0500
From: Bo Jayatilaka <>
To: Jiyeon Han <>, Jorge Chaves <>, Arthur E Kreymer <>
Subject: Multiple copy files on tape for MINOS

In migrating MINOS data off of LTO4 tapes we've encountered files that have multiple copies on tape.
If you look at the summary table here:

there are 6 MINOS families where a small fraction of files have multiple copies on tape. As none of these are vault data, is it
possible these second copies were made by mistake?

media_type | storage_group | file_family                    | original_files | duplicated_files | non_duplicated_files
LTO4       | minos         | fardet_data                    |         135961 |                2 |               135959
LTO4       | minos         | reco_far_cedar_phy_bhcurv_sntp |           5556 |              293 |                 5263
LTO4       | minos         | reco_far_R1_18_4               |          28917 |             7971 |                20946
LTO4       | minos         | reco_far_R1_24b                |          16102 |               48 |                16054
LTO4       | minos         | reco_mc_near_cedar             |          20279 |              998 |                19281
LTO4       | minos         | reco_near_cedar_sntp           |          19218 |               99 |                19119
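
For scale, the duplicated fraction per file family can be worked out from the table above. This is only a minimal Python sketch: the row values are copied from the table, the function and variable names are illustrative, and it assumes duplicated_files counts original files that have more than one copy on tape.

```python
# Rows copied from the summary table:
# (file_family, original_files, duplicated_files, non_duplicated_files)
ROWS = [
    ("fardet_data", 135961, 2, 135959),
    ("reco_far_cedar_phy_bhcurv_sntp", 5556, 293, 5263),
    ("reco_far_R1_18_4", 28917, 7971, 20946),
    ("reco_far_R1_24b", 16102, 48, 16054),
    ("reco_mc_near_cedar", 20279, 998, 19281),
    ("reco_near_cedar_sntp", 19218, 99, 19119),
]


def duplicated_fraction(original, duplicated):
    """Return duplicated files as a fraction of the original file count."""
    return duplicated / original


for family, original, duplicated, non_duplicated in ROWS:
    # Consistency check: the two sub-counts should add up to the total.
    assert duplicated + non_duplicated == original
    percent = 100 * duplicated_fraction(original, duplicated)
    print(f"{family:32s} {percent:7.3f} %")
```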

#2 Updated by Arthur Kreymer almost 2 years ago

Defective files were sometimes moved to /pnfs/minos/BAD,
and replaced with good versions of files of the same name.

But I see no such fardet_data files under /pnfs/minos/BAD.

#3 Updated by Arthur Kreymer almost 2 years ago

  • % Done changed from 10 to 20

I have scanned for fardet_data duplicates, using the Complete File Listing
obtained from
and cached under /minos/data/web/computing/dh/dcache/CFL/

The latest listing, filtered to remove MIGRATION files, is CFLnom.

FARDET-det is a sorted list of files under /pnfs/minos/fardet_data
FARDET-mcout is a sorted list of 204 files under /pnfs/minos/mcout_data
which were assigned to the fardet_data family in error.
I see no duplicate file names.

I have checked for duplicate content by sorting file sizes and crc's in
Unique sizes/crcs are in FARDET-SIZCRCsu

Ten of the FARDET-SIZCRCs entries are not unique.

The CFL.nom entries for these are listed in FARDET-DUPS.

In all cases it seems a file has been written to the same volume twice,
with the same PNFS path. This is not something Minos controls.

There are only three volumes involved: VON475, PSF328, VP6336.
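
The size/CRC scan described above can be sketched roughly as follows. This is not the actual procedure used on the CFL files: the record layout (path, volume, size, crc), the sample paths, and the volume labels are all made up for illustration, since the real listing format is not shown here.

```python
from collections import defaultdict


def find_duplicate_content(records):
    """Group records by (size, crc); keep only groups with more than one entry."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["size"], rec["crc"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


def same_volume_and_path(recs):
    """True if every record in the group shares one volume and one path."""
    return len({(r["volume"], r["path"]) for r in recs}) == 1


# Hypothetical sample records, not taken from the real listing.
records = [
    {"path": "/pnfs/example/file_a", "volume": "VOL001", "size": 100, "crc": 111},
    {"path": "/pnfs/example/file_a", "volume": "VOL001", "size": 100, "crc": 111},
    {"path": "/pnfs/example/file_b", "volume": "VOL002", "size": 200, "crc": 222},
]

for (size, crc), recs in find_duplicate_content(records).items():
    note = "same volume and path" if same_volume_and_path(recs) else "differs"
    print(f"size={size} crc={crc}: {len(recs)} copies ({note})")
```

Matching on size and CRC together only flags candidates with identical content; the volume and path comparison then separates a file written twice from two differently named files that happen to hold the same data.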

Bo -
is this the issue at hand, or should we be looking for some other
sort of duplicate files?

#4 Updated by Bo Jayatilaka almost 2 years ago

Hi Art,

Ok, so it seems like none of these duplicates were intentional. Is it ok that we mark all second copies of files as deleted before migrating the tapes?


#5 Updated by Arthur Kreymer almost 2 years ago

  • % Done changed from 20 to 90
  • Status changed from Assigned to Work in progress

Because the copies on tape are identical, and were not produced intentionally,
it should be fine to pick the copy you prefer for migration to new media.

We have extra copies of fardet_data files,
and the other file families are not active.

#6 Updated by Arthur Kreymer over 1 year ago

  • % Done changed from 90 to 100
  • Status changed from Work in progress to Resolved

#7 Updated by Arthur Kreymer 5 months ago

  • Status changed from Resolved to Closed
