BLUELOAD
   2010 04 30  kreymer@fnal.gov

   Preliminary Bluearc load testing

    I have performed some preliminary Bluearc load testing of
    /minos/data2, which is exported from blue2:/minos/data.

    This is about 67 TBytes of HDS disk.

    The initial goal is to determine the level at which clients
    can read data without creating global slowdowns.
    We are concerned about slowdowns to under 1 MByte/sec.

    Global slowdowns are measured by a script running on minos25
    reading a fresh 100 MB file every couple of minutes,
    logging data rates in files like
http://www-numi.fnal.gov/computing/dh/bluwatch/rate/2010/04/30/minos25.txt

    Rates are plotted under 
http://www-numi.fnal.gov/computing/dh/bluearc/rates.html
    Data rates for /minos/app vary widely, probably due to file fragmentation.
    Data rates for /minos/data2 are consistently around 60 MBytes/sec.
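
    The rate measurement described above can be sketched roughly as
    follows. This is not the script running on minos25. The function
    names and the log line layout are assumptions made for illustration.
    The sketch reads one file end to end, discards the data, and
    appends a timestamped rate in MB/sec to a log file.

```python
import time


def read_rate_mb_per_sec(path, chunk_size=1024 * 1024):
    """Read the whole file, discard the data, return (bytes, MB/sec)."""
    start = time.monotonic()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    # Guard against a zero elapsed time on very small files.
    elapsed = max(time.monotonic() - start, 1e-9)
    return total, total / 1e6 / elapsed


def log_rate(path, log_path):
    """Measure one read and append 'date time bytes rate' to log_path."""
    total, rate = read_rate_mb_per_sec(path)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"{stamp} {total} {rate:.1f}\n")
    return rate
```

    A monitor would call log_rate on a different file every couple of
    minutes. The file needs to be fresh, since a file already held in
    the client cache would report the cache speed and not the server
    speed.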

    Major slowdowns correlate with Bluearc internal latency increases.
    Normal 5 msec latencies increase to over 10 msec during slowdowns.
    http://www-numi.fnal.gov/computing/dh/bluearc/2009060313.Latency.png

    Unscheduled user loads of 50 heavy reads running on GPFarm nodes
    have produced slowdowns to around 20 MB/sec, which is not severe.

    Minos uses the cpn script to regulate Bluearc traffic.
    The access limit is set to 20 ( April 30 2010 ),
    and has been set as high as 40 for weeks, with no ill effect.
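
    To illustrate what an access limit means in practice, here is a
    minimal sketch of a slot-based limiter. It is not the cpn
    implementation, which is not shown in this note. The name
    access_slot, the slot directory layout, and the polling interval
    are all hypothetical. The sketch relies on os.mkdir failing when
    the directory already exists, so at most `limit` holders can be
    active at once.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def access_slot(lock_dir, limit, poll_seconds=1.0):
    """Hold one of `limit` slots under lock_dir while the body runs."""
    os.makedirs(lock_dir, exist_ok=True)
    slot = None
    while slot is None:
        for i in range(limit):
            candidate = os.path.join(lock_dir, f"slot{i}")
            try:
                # mkdir either creates the slot or raises if it is taken.
                os.mkdir(candidate)
                slot = candidate
                break
            except FileExistsError:
                continue
        if slot is None:
            # All slots are busy: wait and try again.
            time.sleep(poll_seconds)
    try:
        yield slot
    finally:
        # Release the slot even if the copy fails.
        os.rmdir(slot)
```

    A client would wrap each copy in `with access_slot(lock_dir, 20):`
    so that no more than 20 copies run at the same time.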

                T E S T S

    In recent tests, a controlled number of clients have read files
    from /minos/data2, while monitoring latencies and minos25 rates.
    The cpn locking script was used to control the data rates.
    Reads were performed from /minos/data2 to /dev/null.
    The testing script is /grid/fermiapp/minos/scripts/munch

    I list the number of clients reading data files from /minos/data2,
    the net rate seen by those clients, and the minos25 read rate (rates in MB/sec).

    As requested by Fermigrid/FEF,
    we have not run more than 200 clients on Fermigrid nodes.
    That test will be scheduled during a Fermigrid downtime.

    All tests involved reading 1000 files of 100 MBytes, net 100 GB.    
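
    As a quick check of the arithmetic, 1000 files of 100 MBytes is
    100,000 MBytes, or 100 GB. The sketch below also estimates how
    long one test takes at a given aggregate rate. The function name
    is hypothetical, and the estimate assumes the rate stays constant
    for the whole test.

```python
def transfer_seconds(n_files, file_mbytes, aggregate_mb_per_sec):
    """Seconds needed to read n_files at a steady aggregate rate."""
    total_mbytes = n_files * file_mbytes
    return total_mbytes / aggregate_mb_per_sec


# 1000 files of 100 MBytes at 200 MB/sec aggregate
print(transfer_seconds(1000, 100, 200))  # 500.0 seconds, a bit over 8 minutes
```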

Limit  Rate  Minos25  Notes

  60          30    Apr  1 18:00 Fermigrid - unscheduled user load test

  10   160    30    Apr  9 22:08 Fermigrid
   5   200    45 ?  Apr  9 10:53 Fermigrid
  20   180    55 ?  Apr  9 14:40 Fermigrid

 100   200    22    Apr 17 15:45 minos50-53
 200   200    20    Apr 17 16:21 minos50-53 and minos17-22, 60 open files seen
 200   200    24    Apr 17 16:41 minos50-53 and minos17-22, 60 open files seen
 200   160    21    Apr 17 17:18 minos50-53, 17-22, 200 open files, improved munch
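
    The table above gives the net rate over all clients. A rough
    per-client figure can be derived from it, assuming that Limit is
    the number of clients actually reading at once and that they share
    the net rate evenly. Neither assumption was verified in these
    tests. The rows below are copied from the table, leaving out the
    Apr 1 row, which has no net rate.

```python
# (limit, net rate in MB/sec) pairs copied from the table
rows = [
    (10, 160),
    (5, 200),
    (20, 180),
    (100, 200),
    (200, 200),
    (200, 160),
]


def per_client_rate(limit, net_rate):
    """Average MB/sec per client, assuming an even share of the net rate."""
    return net_rate / limit


for limit, net_rate in rows:
    print(f"{limit:5d} {net_rate:5d} {per_client_rate(limit, net_rate):6.1f}")
```

    Under these assumptions the last Apr 17 row works out to 0.8 MB/sec
    per client, and the 10-client row to 16 MB/sec per client.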

   Tests from Fermigrid worker nodes ( D0 ), to /grid/data,
   monitored by minos27

 Limit  Rate  Minos27  Notes

  20     555       40  May  4 14:40  Fermigrid D0
 200     360        4  May  4 14:50  Fermigrid D0  some clients hung 40 minutes