Tips and Tricks

Remember to clean up

ifdh has an "ifdh cleanup" call to get rid of cached certificates, files pulled in
with fetchInput, etc. If you use ifdh regularly, put an "ifdh cleanup" in your
~/.bash_logout.
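
For example, a minimal ~/.bash_logout along these lines (the guard is just a sketch, assuming ifdh is set up in your login environment) runs the cleanup every time you log out:

   # ~/.bash_logout -- clean up ifdh's cached certificates, fetched files, etc. on logout
   if command -v ifdh > /dev/null 2>&1
   then
       ifdh cleanup
   fi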

set TMPDIR

On interactive nodes, ifdh will default to putting its working directory in /var/tmp, which is
often not big enough to hold physics files (e.g. for fetchInput calls). So, where possible, do

  export TMPDIR=/some/local/disk/path

where the path in question is on a noticeably large chunk of local (non-NFS-mounted) disk.
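
As a rough sketch (the /scratch path below is only a placeholder for whatever local disk your node actually has), you might check the free space first and then point TMPDIR at a directory of your own there:

   df -hl /scratch              # -l lists local (non-NFS) filesystems; /scratch is a placeholder path
   export TMPDIR=/scratch/$USER
   mkdir -p $TMPDIR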

Use file lists

You can use lists of source/destination pairs in a file to set up copies. This can be
done in a shell script like:

echo file1 dest1 > list
echo file2 dest2 >> list

ifdh cp -f list

Or you can get a list of files with ifdh ls, and then convert that to a list of source/destination pairs:

ifdh ls /pnfs/scratch/users/mengel/stuff > list
dest=/path/to/destination/directory   # set this to wherever the copies should go
while read file
do
    echo $file $dest/`basename $file`
done < list > pairlist
ifdh cp -f pairlist

To see what's going on...

When something is not working, there are lots of ways to see what's going on.

IFDH_DEBUG

First, you can set IFDH_DEBUG to 1 in the environment and re-run the command:

export IFDH_DEBUG=1
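
For example, re-running a failing copy with debugging turned on might look like this (the source path below is just a placeholder):

   export IFDH_DEBUG=1
   ifdh cp /pnfs/fnal.gov/usr/$EXPERIMENT/scratch/users/$USER/input.root ./input.root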

IFDH_XXX_EXTRA

You can turn on debugging in the underlying tool (e.g. globus-url-copy) by using the appropriate
environment variable; so for example

export IFDH_GRIDFTP_EXTRA="-dbg"

will turn on globus-url-copy's debug flag. There are several such environment variables:

  • IFDH_DD_EXTRA
  • IFDH_GRIDFTP_EXTRA
  • IFDH_IRODS_EXTRA
  • IFDH_S3_EXTRA
  • IFDH_SRM_EXTRA

You generally have to check the documentation for the underlying tool to see what debug/verbose
flags are available.
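
As a sketch, combining ifdh's own debugging with the globus-url-copy debug flag shown above might look like this (the gsiftp path and file name are placeholders):

   export IFDH_DEBUG=1
   export IFDH_GRIDFTP_EXTRA="-dbg"
   ifdh cp gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/$EXPERIMENT/scratch/users/$USER/file.root ./file.root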

Check the other end

Frequently, we are copying to or from dCache here on site, and if you get an error on a copy,
you can look at http://fndca.fnal.gov/cgi-bin/dcache_files.py, find your copy there, and get
whatever error the server side saw. However, you need to look fairly quickly, as the "recent" copies
roll off after about 15 minutes or so.

Staging output

When doing computing at some remote sites, the network pipe between there and here is narrow, and
having lots of jobs copying back files at once really bogs down dCache: you end up with, say, 50 files
all open and being written at 10 kB/s each, instead of one file being written at 500 kB/s. The best approach
is to identify a local storage element there at the site and use the IFDH_STAGE_VIA environment
variable to request staging the output. Identify a place on that storage element where you are
allowed to keep files (at least temporarily) and set IFDH_STAGE_VIA to that URL when submitting to
that site:

   jobsub_submit --site=remote-site -e IFDH_STAGE_VIA=gsiftp://server.remote.site:2811/place/to/stage ... 

Or you can set IFDH_STAGE_VIA to a conditional, hostname-based syntax, so that a given
staging location is used only when running at that site:
   IFDH_STAGE_VIA="*.foo.edu=>gsiftp://server1:2811/place1;;*.bar.xz=>gsiftp://server2:2811/place2;;" 

although you pretty much have to set this in your script, since you currently cannot reliably pass values
containing equals and greater-than signs through jobsub_submit's -e argument.
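
So in practice a job script might set it directly before doing its output copies; a rough sketch reusing the example values above (my_output.root is a placeholder file name):

   # set the hostname-conditional staging map inside the job script itself,
   # since values with '=' and '>' don't pass reliably through jobsub_submit -e
   export IFDH_STAGE_VIA="*.foo.edu=>gsiftp://server1:2811/place1;;*.bar.xz=>gsiftp://server2:2811/place2;;"
   ifdh cp my_output.root /pnfs/fnal.gov/usr/$EXPERIMENT/scratch/users/$USER/my_output.root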

Similarly, you can use IFDH_STAGE_VIA to stage output destined for BlueArc via dCache, so as not to
overload the gridftp servers serving BlueArc from remote sites:

   IFDH_STAGE_VIA=gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/$EXPERIMENT/scratch/users/$USER

NOTE that for older versions of ifdh, you need to be sure to run

   ifdh mkdir gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/$EXPERIMENT/scratch/users/$USER/ifdh_stage

(i.e. whatever you set IFDH_STAGE_VIA to point to plus "/ifdh_stage") if the directory doesn't already exist.
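
A defensive sketch for a job script, assuming that "ifdh ls" exits non-zero when the directory does not exist, is to create the staging area only when it is not already there:

   stage_base=gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/$EXPERIMENT/scratch/users/$USER
   export IFDH_STAGE_VIA=$stage_base
   # for older ifdh versions, make sure the ifdh_stage directory exists before copying output
   ifdh ls $stage_base/ifdh_stage > /dev/null 2>&1 || ifdh mkdir $stage_base/ifdh_stage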