Bug #23167

Feature #23741: Improvements and clarifications for using XRootD

match art/root behavior to dCache

Added by Raymond Culbertson over 1 year ago. Updated over 1 year ago.

We are having trouble with xrootd, and the solution may be
partially in art. From log files and discussions with
dCache experts, we have identified three cases:
  1. The file is in tape-backed dCache and not on disk at the moment
    of the request. In this case, dCache returns via xrootd a code
    that indicates this state. The dCache experts say a user should wait
    and retry, but we saw that root/art aborts immediately. (ifdh and nfs
    block, so it isn't an issue there.)
  2. A server is overloaded and the request goes into a dCache
    queue. After 30s, dCache returns an error (see below). In this case
    the right thing to do is retry for a while.
  3. There are transient errors (we've seen mysterious DNS errors),
    and there should be retries.

In previous discussions with Kyle and Philippe, we had concluded
that root, as currently configured, should retry many times
for perhaps an hour. If I understood correctly, this does not happen
because art catches the error and treats all non-info messages as fatal.
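
To make the "retry many times for perhaps an hour" behavior concrete, here is a minimal sketch of a retry loop with an overall deadline and exponential backoff. It is not art or ROOT code: open_with_retry, open_file, is_retryable, and all the default delays are hypothetical names and values chosen for illustration.

```python
import time


def open_with_retry(open_file, is_retryable, max_wait_s=3600.0,
                    first_delay_s=5.0, max_delay_s=300.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call open_file() until it succeeds or max_wait_s has elapsed.

    open_file:    hypothetical callable that opens the file or raises.
    is_retryable: hypothetical predicate deciding whether an error
                  is worth waiting on (e.g. file not yet staged).
    Returns (result, attempts). Re-raises the last error when the
    error is not retryable or the deadline has passed.
    """
    deadline = clock() + max_wait_s
    delay = first_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return open_file(), attempts
        except Exception as err:
            remaining = deadline - clock()
            if not is_retryable(err) or remaining <= 0:
                raise
            # Never sleep past the deadline; double the delay up to a cap.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

The clock and sleep arguments are injectable only so the loop can be exercised without real waiting. The key point is that the decision to abort is made by the caller-supplied predicate, not by treating every error as fatal.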

As I recall, when the file is on tape and not staged, the return
code is special, "resource not available", so it could be recognized
and treated properly: wait and retry. Ideally
xrootd would just block.

The error returned in the overloaded case is essentially "file not found"
(see below), so it is not distinguishable from an actual missing file.
Ideally we could get those separated so we can treat them differently;
better still, the request would block for longer than 30s.

The transient error case should be handled with at least a few retries.
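
As a rough illustration of treating the three cases differently, the sketch below maps an error message to a retry budget. Everything here is an assumption for discussion: the RetryPolicy type, the classify function, the matched substrings (other than the "resource not available" and "not found" wording quoted in this ticket), and the time budgets are hypothetical, and real code would presumably key on the xrootd status code rather than on message text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    name: str
    max_wait_s: float  # total time to keep retrying; 0 means fail now


# Hypothetical budgets, not values agreed with the dCache experts.
STAGING = RetryPolicy("staging", 3600.0)             # case 1: file on tape
NOT_FOUND_OR_BUSY = RetryPolicy("not_found_or_busy", 300.0)  # case 2
TRANSIENT = RetryPolicy("transient", 60.0)           # case 3: e.g. DNS
FATAL = RetryPolicy("fatal", 0.0)


def classify(message: str) -> RetryPolicy:
    """Pick a retry policy from an error message (illustrative only)."""
    text = message.lower()
    if "resource not available" in text:
        return STAGING
    if "not found" in text:
        # Overload and a truly missing file look the same today, so a
        # bounded retry is the most this branch can safely do.
        return NOT_FOUND_OR_BUSY
    if "dns" in text:
        return TRANSIENT
    return FATAL
```

The middle branch shows why separating the overloaded case from a real missing file matters: until the two are distinguishable, a job with a genuinely wrong path would wait out the full budget before failing.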

22-Aug-2019 18:04:35 UTC  Initiating request to open input file 

%MSG-s ArtException:  PostEndJob 22-Aug-2019 18:05:01 UTC ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
---- FileOpenError BEGIN
RootInputFileSequence::initFile(): Input file 
workflow/MDC2018_DS-cosmic-mix_i_0/good/22585566.00/00/00145/ was not found or could not be opened.


#1 Updated by Kyle Knoepfel over 1 year ago

  • Status changed from New to Feedback

This will take some investigation. How does this relate to issue #21638?

#2 Updated by Kyle Knoepfel over 1 year ago

  • Scope deleted (Internal)
  • Parent task set to #23741
  • Project changed from cet-is to art_root_io
  • Tracker changed from Support to Bug
  • SSI Package deleted (art)
