HISTORY

Date        Event
2009-04-30  cp1 deployed
2009-08-21  cpn deployed
2009-08-25  d0ora2 retired
2009-09-22  D0 project disks being isolated
2009-09-24  D0 project disks isolated
2009-11-19  Minos files cut over to new disks
2009-11-22  Minos files completely up to date on new disks

In April 2009, Minos needed to regulate Bluearc data movement
due to /grid/data overloads associated with Minos jobs.

None of the usual tools seemed appropriate (fcp, srmcp, SAM, ...).

The simplest route was to use the intrinsic lock management
of the Bluearc NFS server.

The initial cp1 script invoked a lock1 script,
which wrote to a single lock file, /minos/data2/LOCK1.

This limited us to one action at a time, was very robust,
and provided stable if suboptimal operation from April 30 through August 21, 2009.

The primary cause of the overloads turned out to be Oracle RMAN backups
running from node d0ora2 with a misconfigured Fibre Channel interface.
d0ora2 was shut down on August 25, 2009, removing that problem.

A secondary cause was unregulated access to the D0 project areas.

But regulating Bluearc access was still a good idea.

In August 2009, cp1 was migrated to cpn, allowing more concurrent activity and better control.

Locks are taken by touching files in a LOCKS directory.

By default, this is under /grid/data/${GROUP}/LOCK, where GROUP is the gid.
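
Gathering the names used in this section, the layout under that
base directory is plausibly something like the following (the exact
tree, and which of the tunables are plain files, are assumptions):

    /grid/data/${GROUP}/LOCK
    |-- limit          upper limit on concurrently held locks
    |-- wait           minimum poll interval, seconds
    |-- rate           target global polling rate
    |-- stale          age limit for locks, minutes
    |-- staleq         age limit for queue entries, minutes
    |-- LOCKS/         one touched file per held lock
    |-- QUEUE/         one touched file per queued client
    |-- LOG/           per-client history, written at unlock time
    |-- LOGS/          monthly archives appended by 'lock clean'
    `-- STALE/
        |-- LOCKS/     stale locks parked by 'lock clean'
        `-- QUEUE/     stale queue entries parked by 'lock clean'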

The 'limit' file sets the upper limit on concurrently held locks.
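
As a concrete illustration, taking a lock might look roughly like
the sketch below; the lock-file naming (hostname.pid), the numeric
gid in the path, and the race handling are assumptions, not the
actual cpn code:

    #!/bin/sh
    # Hypothetical lock acquisition sketch; not the actual cpn script.
    GROUP=`id -g`                            # the gid, per the default path above
    LOCKDIR=/grid/data/${GROUP}/LOCK
    LIMIT=`cat $LOCKDIR/limit`               # upper limit on held locks
    HELD=`ls $LOCKDIR/LOCKS | wc -l`         # locks currently held
    if [ $HELD -lt $LIMIT ] ; then
        touch $LOCKDIR/LOCKS/`hostname`.$$   # take a lock
    else
        touch $LOCKDIR/QUEUE/`hostname`.$$   # too many locks, join the queue
    fi
    # A real script must cope with the race between counting and touching.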

When there are too many locks,
clients put themselves in a queue by touching a file in QUEUE.

Queued clients poll no more often than every 'wait' seconds,
with an interval calculated so that all queued clients together
poll at roughly the global 'rate'.
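
For example, with N clients queued, an interval of N/rate seconds
per client yields about 'rate' polls per second overall. A sketch of
that calculation (the exact formula used by cpn is an assumption):

    # Hypothetical poll-interval calculation for one queued client.
    WAIT=`cat $LOCKDIR/wait`                 # minimum interval, seconds
    RATE=`cat $LOCKDIR/rate`                 # target global polls per second
    N=`ls $LOCKDIR/QUEUE | wc -l`            # current queue depth
    INTERVAL=`expr $N / $RATE`               # N clients, each every N/rate seconds
    if [ $INTERVAL -lt $WAIT ] ; then
        INTERVAL=$WAIT                       # never poll faster than 'wait'
    fi
    sleep $INTERVAL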

Queued clients also post a udp read, so they can be woken before their next poll.

Unlocking clients send a udp message to the process at the head of the queue.
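
Both halves of that hand-off can be sketched with nc(1). That each
QUEUE file name encodes the waiting client's host and udp port is an
assumption (the real encoding is not documented here), and nc option
syntax varies between versions:

    # Queued client: block on a udp read, falling back to the next
    # poll if nothing arrives within the poll interval.
    # $MYPORT: this client's udp port (assumed).
    nc -l -u -w $INTERVAL $MYPORT >/dev/null 2>&1

    # Unlocking client: wake the process at the head of the queue.
    HEAD=`ls -tr $LOCKDIR/QUEUE | head -1`   # oldest queue entry
    HOST=`echo $HEAD | cut -d. -f1`          # assumed host.port naming
    PORT=`echo $HEAD | cut -d. -f2`
    echo GO | nc -u -w 1 $HOST $PORT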

Unlocking clients move their history information under LOG.

Locks older than 'stale' minutes are ignored.

Queue entries older than 'staleq' minutes are ignored.
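
With GNU find(1), the entries to ignore fall out of one command per
directory (a sketch; -mmin is a GNU extension):

    STALE=`cat $LOCKDIR/stale`                   # lock age limit, minutes
    STALEQ=`cat $LOCKDIR/staleq`                 # queue age limit, minutes
    find $LOCKDIR/LOCKS -type f -mmin +$STALE    # locks to ignore
    find $LOCKDIR/QUEUE -type f -mmin +$STALEQ   # queue entries to ignore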

'lock clean' appends LOG entries to monthly files under LOGS,
and similarly moves stale locks and queue entries to STALE/LOCKS and STALE/QUEUE.
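
A sketch of that housekeeping, per the description above; the
monthly file naming under LOGS is an assumption:

    # Hypothetical 'lock clean' sketch; not the actual script.
    MONTH=`date +%Y-%m`
    cat $LOCKDIR/LOG/* >> $LOCKDIR/LOGS/$MONTH && rm $LOCKDIR/LOG/*
    find $LOCKDIR/LOCKS -type f -mmin +$STALE \
        -exec mv {} $LOCKDIR/STALE/LOCKS/ \;
    find $LOCKDIR/QUEUE -type f -mmin +$STALEQ \
        -exec mv {} $LOCKDIR/STALE/QUEUE/ \;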