GRID access to large supporting data files

> Here is some info on how these files are used.
> This set of files is approximately 3 GB and is a static "library" of
> events that are used in our primary electron neutrino selection algorithm.
> These library events are looped over sequentially for the duration of each
> grid job.  We have on the order of 200k jobs to run over the coming weeks
> and months.  Individual jobs range from roughly 2 to 20 hours, with a
> typical job around 6 hours, and we typically run 500 to 1000 of them
> simultaneously.
> The job scripts operate as follows:
> (1) Check if a complete local copy of the 3 GB "library" is already
> present on the worker node.
> (2) If not, copy the full library over (From: MINOS Bluearc space.  To:
> /local/stage1/minosgli/ .  The latter could be anywhere on a local
> disk.)
> (3) Begin physics processing.
> Copying the library over for each individual job was untenable due to
> Bluearc throughput limitations.  With the local cache, only the first job
> to hit each worker node has to do any copying.  Then, the next week or
> more of additional grid jobs go immediately into event processing.
> Historically, the cache seemed to get cleaned out on some interval, but
> the scripts handled that fine by simply copying over a fresh version when
> required.
> In the upcoming running, we'll actually have two classes of jobs, each
> requiring its own library, so the cache would become 6 GB instead of 3 GB.
> Let me know if I can provide any additional info.
> Ryan
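The check-then-copy logic in steps (1)–(3) above can be sketched as a small shell function. This is a minimal illustration, not the actual MINOS job script: the paths, the `.copy_complete` marker file, and the `mktemp` demo setup (standing in for the Bluearc source and the worker node's local disk) are all assumptions.

```shell
#!/bin/sh
# Demo setup: temporary directories stand in for the Bluearc library
# area and the worker node's local cache (e.g. /local/stage1/minosgli/).
LIBRARY_SRC=$(mktemp -d)             # assumed stand-in for the Bluearc copy
CACHE_DIR=$(mktemp -d)/nue_library   # assumed stand-in for local cache dir
echo "event data" > "$LIBRARY_SRC/events.dat"

# Marker file written only after a full copy finishes, so a partially
# copied library is never mistaken for a complete one (an assumption;
# the real script may check completeness differently).
DONE_MARKER="$CACHE_DIR/.copy_complete"

ensure_library() {
    # (1) Check whether a complete local copy already exists.
    if [ -f "$DONE_MARKER" ]; then
        echo "cache hit: skipping copy"
        return 0
    fi
    # (2) Otherwise copy the full library to local disk, then mark it done.
    echo "cache miss: copying library"
    mkdir -p "$CACHE_DIR"
    cp -R "$LIBRARY_SRC"/. "$CACHE_DIR"/
    touch "$DONE_MARKER"
}

ensure_library   # first job to land on this worker node does the copy
ensure_library   # every later job sees the marker and skips straight ahead
# (3) Begin physics processing against $CACHE_DIR here.
```

Only the first invocation pays the copy cost; subsequent jobs on the same node hit the marker and proceed directly, which is why only one job per worker node ever touches Bluearc. If the cache is cleaned out on some interval, the marker disappears with it and the next job simply re-copies.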