Bug #24641

FindMatchingFiles not handling old upper case directories

Added by Marc Mengel 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date: 07/20/2020
Due date:
% Done: 0%
Estimated time:
Duration:

Description

vito 12:04 PM
Hi Marc,
it looks like, after the dCache migration, the command
ifdh findMatchingFiles
doesn't behave well in the CI context.
With ifdhc > v2_5_2 a command like:
ifdh findMatchingFiles /pnfs/dune/persistent/stash/ContinuousIntegration/ \*202007170646\*.root
seems to go into an infinite(?) loop;
the files of interest are two folders down, i.e. in /pnfs/dune/persistent/stash/ContinuousIntegration/protoDUNEsp/{list of few folders here}
If I use ifdhc v2_5_2, the command
ifdh findMatchingFiles /pnfs/dune/persistent/stash/ContinuousIntegration/* \*202007170646\*.root
(note the * after the ContinuousIntegration folder)
gives me the files I need,
while with a newer version of ifdh that same command gives me each file six times.
From the log I see
running: uberftp -ls gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/dune/persistent/stash/ContinuousIntegration/*/datareco/
Here the * is passed through as is, while with ifdhc v2_5_2 the * is expanded to the existing folders.
Is there something you can do?

History

#1 Updated by Marc Mengel 3 months ago

  • Subject changed from FindMatchingFiles not handling * properly to FindMatchingFiles not handling old upper case directories

So it turns out the regex in the ifdh.cfg for gsiftp could misparse listing lines for
files/directories that both:

  • are old enough to have a year in the date
  • start with a capital letter

the patch is:

457,458c457,458
< lss_re1 = ([-dl])[-rwxst]{9}\s*.*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$
< lss_re2 = ([-dl])[-rwxst]{9}\s*.*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*(/.*/)([^/ ]*)\s*$
---
> lss_re1 = ([-dl])[-rwxst]{9}\s*[^A-Z]*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$
> lss_re2 = ([-dl])[-rwxst]{9}\s*[^A-Z]*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*(/.*/)([^/ ]*)\s*$

So the failure is that if you have lines like:

drwxrwxr-x  37 14191 9010 512 Dec 10  2019 old       
drwxrwxr-x   7 14191 9010 512 Dec 10  2019 DUNE35T  

we would match the drwx... bits right, then we are supposed to skip the
intervening digits and/or usernames (with .*), then match the size digit
sequence, then the date string by its leading capital letter and 11 more chars,
and finally the filename after that. BUT when the filename starts with
a capital and is preceded by the year, as it is here, our .* is too
greedy: we match the 2019 as the size and the filename as the date, and look for
the filename off at the end of the string, where it comes up empty.

This is easily fixed, once understood, by making the wildcard that skips the stuff
in the middle we aren't interested in "[^A-Z]*" instead of ".*", so it cannot
inadvertently include the actual date.
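
The misparse described above can be reproduced outside ifdh with a small sketch. It takes the ".*" form of lss_re1 as the pre-fix pattern and the "[^A-Z]*" form as the fix, as the explanation describes, and runs both through Python's re module, which may not behave exactly like the regex engine ifdh uses. The helper name parse_listing_line and the trailing blank padding after the directory name are assumptions made for illustration; they are not taken from ifdh or from real uberftp output.

```python
import re

# Pre-fix pattern: ".*" skips the fields between the permissions and the size.
OLD_LSS_RE1 = r"([-dl])[-rwxst]{9}\s*.*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$"
# Fixed pattern: "[^A-Z]*" cannot run past the capital letter that starts the date.
NEW_LSS_RE1 = r"([-dl])[-rwxst]{9}\s*[^A-Z]*\s([0-9][0-9]*)\s\s*[A-Z].{11}\s\s*()([^/ ]*)\s*$"


def parse_listing_line(pattern, line):
    """Hypothetical helper: return (type, size, name) or None if no match."""
    m = re.search(pattern, line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(4)


# ASSUMPTION: the name column is blank-padded to 14 characters.
upper_line = "drwxrwxr-x   7 14191 9010 512 Dec 10  2019 " + "DUNE35T".ljust(14)
lower_line = "drwxrwxr-x  37 14191 9010 512 Dec 10  2019 " + "old".ljust(14)

for label, pattern in (("old", OLD_LSS_RE1), ("new", NEW_LSS_RE1)):
    for line in (upper_line, lower_line):
        print(label, repr(line.split()[-1]), parse_listing_line(pattern, line))
```

In this sketch the old pattern returns ('d', '2019', '') for the padded DUNE35T line, i.e. the year as the size and an empty name, while the new pattern returns ('d', '512', 'DUNE35T'). The lowercase "old" line parses as ('d', '512', 'old') with both patterns. With Python's engine the misparse of this 7-character name only appears when at least six blanks follow it, because the date part needs 12 characters plus one separating blank.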

#2 Updated by Lynn Garren 3 months ago

FYI: ifdhc_config v2_5_10 is now installed on the larsoft cvmfs.

#3 Updated by Vito Di Benedetto 3 months ago

Just tested the ifdh findMatchingFiles command mentioned in this ticket using ifdhc_config v2_5_10, and all works as expected now.
Thanks for the fix!
