Duplicate files on tape.
While migrating from LTO4 to LTO8,
a small fraction of files were found to have duplicate copies on tape.
We should investigate the origins of this.
#1 Updated by Arthur Kreymer 4 months ago
Date: Tue, 16 Apr 2019 16:16:45 -0500
From: Bo Jayatilaka <email@example.com>
To: Jiyeon Han <firstname.lastname@example.org>, Jorge Chaves <email@example.com>, Arthur E Kreymer <firstname.lastname@example.org>
Subject: Multiple copy files on tape for MINOS
In migrating MINOS data off of LTO4 tapes we've encountered files that have multiple copies on tape.
If you look at the summary table here:
there are 6 MINOS families where a small fraction of files have multiple copies on tape. As none of these are vault data, is it
possible these second copies were made by mistake?
media_type | storage_group | file_family                    | original_files | duplicated_files | non_duplicated_files
LTO4       | minos         | fardet_data                    |         135961 |                2 |               135959
LTO4       | minos         | reco_far_cedar_phy_bhcurv_sntp |           5556 |              293 |                 5263
LTO4       | minos         | reco_far_R1_18_4               |          28917 |             7971 |                20946
LTO4       | minos         | reco_far_R1_24b                |          16102 |               48 |                16054
LTO4       | minos         | reco_mc_near_cedar             |          20279 |              998 |                19281
LTO4       | minos         | reco_near_cedar_sntp           |          19218 |               99 |                19119
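To put the counts in the table above in proportion, the duplicated fraction per file family can be computed directly. This is only a sketch: the numbers are copied from the table, while the `rows` layout and the `duplicate_fraction` helper are names made up for this illustration.

```python
# (file_family, original_files, duplicated_files, non_duplicated_files),
# copied from the summary table in the email.
rows = [
    ("fardet_data", 135961, 2, 135959),
    ("reco_far_cedar_phy_bhcurv_sntp", 5556, 293, 5263),
    ("reco_far_R1_18_4", 28917, 7971, 20946),
    ("reco_far_R1_24b", 16102, 48, 16054),
    ("reco_mc_near_cedar", 20279, 998, 19281),
    ("reco_near_cedar_sntp", 19218, 99, 19119),
]


def duplicate_fraction(original, duplicated):
    """Fraction of a family's original files that have more than one copy."""
    return duplicated / original


for family, original, duplicated, unique in rows:
    # Sanity check: the two sub-counts should add up to the total.
    assert duplicated + unique == original
    pct = 100 * duplicate_fraction(original, duplicated)
    print(f"{family:32s} {pct:8.4f}%")
```

By this measure the fraction varies widely between families: it is negligible for fardet_data but roughly 28% for reco_far_R1_18_4.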
#3 Updated by Arthur Kreymer 4 months ago
- % Done changed from 10 to 20
I have scanned for fardet_data duplicates, using the Complete File Listing
and cached under /minos/data/web/computing/dh/dcache/CFL/
The latest listing, filtered to remove MIGRATION files, is CFLnom.
FARDET-det is a sorted list of files under /pnfs/minos/fardet_data
FARDET-mcout is a sorted list of 204 files under /pnfs/minos/mcout_data
which were assigned to the fardet_data family in error.
I see no duplicate file names.
I have checked for duplicate content by sorting file sizes and crc's in
Unique sizes/crcs are in FARDET-SIZCRCsu.
Ten of the FARDET-SIZCRCs entries are not unique.
The CFL.nom entries for these are listed in FARDET-DUPS.
In all cases it seems a file has been written to the same Volume twice,
with the same PNFS path. This is not something Minos controls.
There are only three volumes involved: VON475, PSF328, VP6336.
Is this the issue at hand, or should we be looking for some other
sort of duplicate files?
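The size/CRC check described in this update can be sketched roughly as follows. This is not the script that was actually used. The record layout `(path, volume, size, crc)` is an assumption for illustration and is not the real Complete File Listing format, and the function names and sample data are made up.

```python
from collections import defaultdict


def find_duplicate_content(records):
    """Group records by (size, crc) and keep groups with more than one entry.

    `records` is an iterable of (path, volume, size, crc) tuples. This layout
    is hypothetical, not the actual CFL format.
    """
    groups = defaultdict(list)
    for path, volume, size, crc in records:
        groups[(size, crc)].append((path, volume))
    return {key: entries for key, entries in groups.items() if len(entries) > 1}


def same_path_same_volume(entries):
    """True when every entry in a duplicate group has one path and one volume."""
    return len(set(entries)) == 1


# Made-up sample records, not real MINOS data.
sample = [
    ("/pnfs/example/a.dat", "VOL001", 1024, "0x1a2b"),
    ("/pnfs/example/a.dat", "VOL001", 1024, "0x1a2b"),
    ("/pnfs/example/b.dat", "VOL002", 2048, "0x3c4d"),
]

dups = find_duplicate_content(sample)
for (size, crc), entries in sorted(dups.items()):
    print(size, crc, len(entries), same_path_same_volume(entries))
```

Matching on size and CRC together only flags candidates for identical content. The second helper then tests the pattern reported here, where the same path was written twice to the same volume.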
#5 Updated by Arthur Kreymer 4 months ago
- % Done changed from 20 to 90
- Status changed from Assigned to Work in progress
Because the copies on tape are identical, and were not produced intentionally,
it should be fine to pick the copy you prefer for migration to new media.
We have extra copies of fardet_data files,
and the other file families are not active.