Smarter data routing

Currently ifdh cp and ifdh mv use a very simple algorithm to guess how they should
reach a given plain path (i.e. a path without a protocol://host on the front); to wit:

  • if the path, or the directory part of the path, can be looked up with statfs() and
    statfs() doesn't say it's NFS, it's local.
  • Otherwise, we assume it's on the Bluearc, and use per-experiment gridftp servers, or
    srm: via our bestman server, to reach it.
  • in either case, we always take out a CPN lock if the lock area is visible.
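
The current guess can be sketched roughly as below. This is a minimal illustration, not the ifdh source: guess_route, the fs_type_of probe (standing in for a statfs() lookup, returning None when the lookup fails) and the "local"/"bluearc" labels are all hypothetical names.

```python
import posixpath
from typing import Callable, Optional


def guess_route(path: str,
                fs_type_of: Callable[[str], Optional[str]],
                lock_area_visible: bool) -> dict:
    """Mimic the simple guess for a plain path (no protocol://host prefix).

    fs_type_of is a hypothetical stand-in for statfs(): it returns a
    filesystem type name such as "nfs" or "ext4", or None if the lookup fails.
    """
    # Try the path itself, then fall back to its directory part.
    fs_type = fs_type_of(path)
    if fs_type is None:
        fs_type = fs_type_of(posixpath.dirname(path))

    if fs_type is not None and fs_type != "nfs":
        route = "local"
    else:
        # Everything else is assumed to live on the Bluearc, including
        # paths that do not exist anywhere.
        route = "bluearc"

    # The lock decision ignores the route entirely.
    return {"route": route, "take_cpn_lock": lock_area_visible}
```

Note that a path which exists nowhere falls through to the Bluearc branch, so a transfer tool gets invoked just to report that it is missing.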

This does not really cover the categories of storage we actually deal with. The behavior we
Really Want is:

  • we shouldn't take CPN locks unless we are using NFS/gridftp to the Bluearc
  • /pnfs areas use the appropriate dCache srm: endpoint, or dccp (or possibly NFS4) if local.
  • /{$experiment,grid}/{data,data2,app,prod,fermiapp}/user areas use per-experiment gridftp on output to get
    proper ownership, even if visible via nfs
  • /{$experiment,grid}/{data,data2,app,prod,fermiapp}/ areas otherwise use srm if offsite or cpn if onsite.
  • other paths are assumed local
  • at some remote sites, but not others, we should stage data through a local SRM.
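
As a rough sketch of how those rules might combine, assuming the caller already knows whether we are onsite, whether this is an output copy, and whether /pnfs is locally reachable: choose_route and the method labels it returns are hypothetical, and the path patterns are read literally from the list above. Staging through a site-local SRM is left out here.

```python
import re

# Second path component of the Bluearc-style areas:
# /{$experiment,grid}/{data,data2,app,prod,fermiapp}/...
AREAS = ("data", "data2", "app", "prod", "fermiapp")


def choose_route(path, experiment, onsite, is_output=False, pnfs_local=False):
    """Return (method, take_cpn_lock) for a plain path.

    The method names are illustrative labels only. A CPN lock is taken
    only when the copy goes to the Bluearc via NFS (cpn) or gridftp.
    """
    if path.startswith("/pnfs/"):
        return ("dccp" if pnfs_local else "dcache-srm", False)

    pattern = r"^/(%s|grid)/(%s)/(.*)$" % (re.escape(experiment), "|".join(AREAS))
    area = re.match(pattern, path)
    if area:
        rest = area.group(3)
        in_user_area = rest == "user" or rest.startswith("user/")
        if is_output and in_user_area:
            # gridftp gives proper file ownership, even if NFS is visible.
            return ("gridftp", True)
        if onsite:
            return ("cpn", True)
        return ("srm", False)

    # Anything else is assumed local, so a misspelled path fails fast.
    return ("local", False)
```

For example, choose_route("/misspelled/directory/borked", "nova", onsite=True) comes back as ("local", False), so the error surfaces from a plain local copy rather than from gridftp.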

This is important for several reasons:

  • remote sites may have a /grid or /pnfs mount which is not ours; we probably shouldn't copy data
    there.
  • it is really silly to invoke gridftp to tell us that /misspelled/directory/borked does not exist.

So it seems to me we need a mapping in ifdh, and of course, this mapping should be override-able.
At the first level, the mapping can match against the hostname of the system we're on and
the prefix of the path, and map to a destination prefix.
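
One possible shape for such a mapping, purely as an illustration: a first-match-wins table of (host glob, path prefix, destination prefix), with override entries consulted first. The table contents, the example.edu/example.org hosts and the map_path name are all made up for this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical default table: (host glob, path prefix, destination prefix).
ROUTES = [
    ("*.example.edu", "/grid/data/", "srm://se.example.edu/grid/data/"),
    ("*", "/pnfs/", "srm://dcache.example.org/pnfs/"),
]


def map_path(hostname, path, routes=ROUTES, override=None):
    """Rewrite path using the first entry whose host glob and prefix match.

    Entries in override (a list in the same format) are tried before the
    defaults. Paths that match nothing are returned unchanged.
    """
    host = hostname.lower()  # hostnames are case-insensitive
    for host_glob, prefix, dest in list(override or []) + list(routes):
        if fnmatchcase(host, host_glob.lower()) and path.startswith(prefix):
            return dest + path[len(prefix):]
    return path
```

Putting override entries ahead of the defaults means a user or site can redirect a single prefix without having to restate the whole table.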

New Proposal

A fancier IFDH_STAGE_VIA: the value becomes a sort of logical expression. (To redo: make it
look like a shell case statement?)

 *.smu.edu=>srm://smuosgse.hpc.smu.edu:8443/srm/v2/server?SFN=/data/srm;;*=>;;

i.e. hostglob=>location;;hostglob=>location;;...
or it can be a plain location for backwards compatibility.
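
A minimal sketch of how a value in that form could be evaluated, assuming first match wins, that an empty location means "do not stage", and that a value with no "=>" in it is an old-style plain location. The stage_via_for name is hypothetical; this is not the ifdh parser.

```python
from fnmatch import fnmatchcase


def stage_via_for(value, hostname):
    """Return the staging location for hostname, or "" for no staging."""
    if "=>" not in value:
        # Old-style value: a single location that applies everywhere.
        return value.strip()

    host = hostname.lower()
    for clause in value.split(";;"):
        clause = clause.strip()
        if not clause:
            continue  # tolerate the trailing ";;"
        host_glob, sep, location = clause.partition("=>")
        if not sep:
            continue  # skip a malformed clause with no "=>"
        if fnmatchcase(host, host_glob.strip().lower()):
            return location.strip()
    return ""
```

With the example value above, a host such as node7.hpc.smu.edu would get the smuosgse SRM location, while any other host falls through to the `*=>` clause and gets an empty location.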