Bug #24641
FindMatchingFiles not handling old upper case directories
0%
Description
vito:house_with_garden: 12:04 PM
Hi Marc,
it looks like after dCache migration, the command
ifdh findMatchingFiles
doesn't behave well in the CI context.
With ifdhc > v2_5_2 a command like:
ifdh findMatchingFiles /pnfs/dune/persistent/stash/ContinuousIntegration/ \*202007170646\*.root
seems to go in an infinite(?) loop,
files of interest are two folders down, i.e. in /pnfs/dune/persistent/stash/ContinuousIntegration/protoDUNEsp/{list of few folders here}
If I use ifdhc v2_5_2, the command
ifdh findMatchingFiles /pnfs/dune/persistent/stash/ContinuousIntegration/* \*202007170646\*.root
(note the * after the ContinuousIntegration folder)
gives me the files I need,
while if I use a newer version of ifdh, that same command gives me each file six times.
From the log I see
running: uberftp -ls gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/dune/persistent/stash/ContinuousIntegration/*/datareco/
the * is used as it is, while with ifdhc v2_5_2 the * is expanded to existing folders
Is there something you can do?
History
#1 Updated by Marc Mengel 6 months ago
- Subject changed from FindMatchingFiles not handling * properly to FindMatchingFiles not handling old upper case directories
So it turns out the regex in the ifdh.cfg for gsiftp could misparse files/directories that
both:
- are old enough to have a year in the date
- start with a capital letter
The patch is:
457,458c457,458
< lss_re1 = ([-dl])[-rwxst]{9}\s*[^A-Z]*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$
< lss_re2 = ([-dl])[-rwxst]{9}\s*[^A-Z]*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*(/.*/)([^/ ]*)\s*$
---
> lss_re1 = ([-dl])[-rwxst]{9}\s*.*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$
> lss_re2 = ([-dl])[-rwxst]{9}\s*.*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*(/.*/)([^/ ]*)\s*$
So the failure is that if you have lines like:
drwxrwxr-x 37 14191 9010 512 Dec 10 2019 old
drwxrwxr-x 7 14191 9010 512 Dec 10 2019 DUNE35T
we would match the drwx... bits correctly; then we are supposed to skip the
intervening digits and/or usernames (with .*), then match the size digit
sequence, then the date string (a leading capital letter plus 11 characters),
and finally the filename after that. BUT when the filename starts with
a capital and is preceded by the year, as it is here, our .* is too
greedy: we match the 2019 as the size and the filename as the date, then look for
the filename off at the end of the string, where it comes up empty.
This is easily fixed, once understood, by making the wildcard that skips the stuff
we aren't interested in in the middle be "[^A-Z]*" instead of ".*", so it cannot
inadvertently include the actual date.
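As a rough check of the tightened pattern, the sketch below runs the "[^A-Z]*" form of lss_re1 over the two sample lines using Python's re module. It is only an illustration under stated assumptions: ifdh uses its own regex engine, which may not behave identically to Python's; the column padding in the sample lines (two spaces before the year, as ls usually prints it) is assumed, since whitespace is collapsed in the ticket text; and parse_listing_line is a hypothetical helper name, not part of ifdh.

```python
import re

# The "[^A-Z]*" form of lss_re1 described above. Python's re module is a
# stand-in here; ifdh's own regex engine may differ in details.
LSS_RE1 = re.compile(
    r"([-dl])[-rwxst]{9}\s*[^A-Z]*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$"
)

# Sample listing lines from the ticket. The exact padding is an assumption:
# the year is written ls-style, with two spaces in front of it.
SAMPLE_LINES = [
    "drwxrwxr-x  37 14191 9010 512 Dec 10  2019 old",
    "drwxrwxr-x   7 14191 9010 512 Dec 10  2019 DUNE35T",
]


def parse_listing_line(line):
    """Hypothetical helper: return (type, size, directory, name) or None."""
    m = LSS_RE1.search(line)
    if m is None:
        return None
    entry_type, size, directory, name = m.groups()
    return entry_type, int(size), directory, name


for line in SAMPLE_LINES:
    print(parse_listing_line(line))
```

The four captures appear to be the entry type, the size, a directory prefix (always empty for lss_re1), and the name. Because "[^A-Z]*" cannot cross the capital letter that starts the month, the size capture has to end before the date field.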
#2 Updated by Lynn Garren 6 months ago
FYI: ifdhc_config v2_5_10 is now installed on the larsoft cvmfs.
#3 Updated by Vito Di Benedetto 6 months ago
Just tested the ifdh findMatchingFiles
command mentioned in this ticket using ifdhc_config v2_5_10,
and all works as expected now.
Thanks for the fix!