DarkSide-50 disk writing lessons

09-Apr-2014, Ron Rechenmacher and Kurt Biery

Over the last 4-6 weeks, there have been occasional problems in the DS-50 DAQ in which it takes a long time to close one file and open a new one. The symptom was a long period of backpressure throughout the DAQ system, caused by the disk-writing Aggregator temporarily being unable to write data to disk. The problem most often occurred on Saturday mornings between 03:00 and 12:00 local time.

The problem was traced to a combination of several factors:
  • the disk-writing buffer cache was configured to be 10% of the available system memory. Since there is a considerable amount of memory on that machine (64 GB), this corresponded to a large amount of data that could be buffered in memory without actually being written to disk (a sketch for inspecting these settings follows this list).
  • one or more calls to fsync (a Linux system call) each time a file was closed. These calls were being made from the ROOT file-writing code. The fsync calls, coupled with the large number of dirty buffers allowed by the cache configuration, were causing the system to actually write most, if not all, of the file to disk at file close time.
  • weekly RAID array Consistency Checks that were configured to start at 03:00 on Saturday mornings; they would take several hours to complete.
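
For reference, here is a quick way to see the writeback thresholds currently in force and how much dirty data is waiting to be flushed. The vm.dirty_* sysctl names are the standard Linux knobs and are assumed to be the relevant ones here:

# Show the percentage-of-memory thresholds for background and blocking writeback
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Watch the current dirty-page backlog grow and shrink
while true; do grep -E '^(Dirty|Writeback):' /proc/meminfo; sleep 10; done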

We are currently discussing the possibility of making the fsync call from ROOT user-selectable so that we can avoid it in our DAQ systems. We presume that this feature will be added.

Based on these findings, our recommendation(s) for computers that are used to write ROOT disk files from artdaq are the following:
  • reduce the disk cache dirty fraction from 10% to 1%. For a data acquisition system with 64 GB of memory, 1% is roughly 650 MB, or several seconds of disk writing. (A sketch of how this can be set follows these recommendations.)
In addition, our recommendation(s) for artdaq are the following:
  • if and when ROOT allows us to disable fsync system calls when files are closed, make the appropriate modifications to artdaq so that we always disable fsync calls
Lastly, our recommendation(s) for reading from the disk(s) that are used to write the ROOT files from artdaq are the following:
  • as much as possible, minimize the number of processes that are reading from the disk during the time that the weekly Consistency Checks are scheduled
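
A minimal sketch of how the dirty fraction could be lowered, assuming (as suggested by the "dirty_background percentage" mentioned below) that the knob in question is the standard vm.dirty_background_ratio sysctl:

# Lower the background writeback threshold from the default 10% to 1% immediately
sysctl -w vm.dirty_background_ratio=1

# Make the setting persistent across reboots
echo 'vm.dirty_background_ratio = 1' >> /etc/sysctl.conf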

Using iotop (and top)

The following commands were run (as root, in the case of iotop) to collect disk read/write performance data and supporting system CPU usage data:

day=`date +%a`
while true;do
  # wait for the day (abbreviated weekday name) to change, then pause briefly
  while [ $day = `date +%a` ];do sleep 10;done;sleep 5
  day=`date +%a`
  # batch-mode iotop: timestamps, only active processes, KB units, 10 s samples
  # for just under 24 hours; log to a per-day file
  iotop -btok -d10 -n`expr \( 24 \* 3600 - 300 \) / 10`|tee ~/iotop.$day.out
done

day=`date +%a`
while true;do
  while [ $day = `date +%a` ];do sleep 10;done;sleep 5
  day=`date +%a`
  # batch-mode top, ignoring idle processes, 10 s samples for just under 24 hours
  top -ib -d10 -n`expr \( 24 \* 3600 - 300 \) / 10`|tee ~/topib.$day.out
done
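
The plots discussed below were presumably produced from these logs; one possible way to pull the per-process write figures out of a saved iotop file is sketched here. The process names and the column positions are assumptions based on the default iotop -btok output layout:

# Keep the lines for the disk-writing Aggregator plus the kernel flush/journal
# threads (see the ++ note below), then print the timestamp and DISK WRITE columns.
# Adjust the field numbers if the iotop layout differs on the machine in question.
egrep -i 'aggregator|flush|jbd2' ~/iotop.Sat.out | awk '{print $1, $7, $8}'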

Note: the Sunday data presented was collected beginning at about 4 AM.
Note: the RAID system log says that the Consistency Check for 4/5 ended at 11:13:43 (the one for 3/29 ended at 09:22:49); it appears that the performance degradation can persist for several minutes after completion.


Before the RAID system logs confirmed that the Consistency Check was active, the top logs were studied, along with the system log on dsag and the system log on the NFS server, looking for any possible reason that the system should experience higher than usual iowait. After nothing was found, Don Holmgren was consulted; he confirmed that the RAID system can experience several hours of performance degradation on a weekly basis, and the logs then just needed to confirm that the weekly occurrence was on Saturday morning.
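
For reference, the iowait ("wa") figure can be scanned out of the saved batch-mode top logs with something like the following; the header text varies between top versions, so the patterns are an assumption:

# Print each sample's time of day together with its CPU summary line (which
# contains the iowait "wa" percentage)
awk '/^top -/ {t=$3} /Cpu/ {print t, $0}' ~/topib.Sat.out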

After removing fsync and decreasing the dirty_background percentage, the graphs are as follows (for just the first 16 hours of Thursday):

The IO wait for the Aggregator process is very low. Additionally, the IO wait for the bbcp process reads, and the data rates for both the bbcp reads and the Aggregator++ writes, are as expected.

++ writes by the system flush task and journal daemon (as well as those made directly by the Aggregator) are included

The fact that the iowait increased and the read rate decreased at 5:00 AM suggests that the Consistency Check started at 5:00 AM. This is what the log shows. It's a bit of a mystery how the start time got changed from 3 AM to 5 AM, but this now seems to be the new time. The log is attached as dsag_4_12_2014.txt.

Additional information from 17-19-May-2014

The RAID Consistency Check was particularly disruptive on 17-May. It ran from 05:00 to 21:32 local time (much longer than normal). The DAQ had problems writing data to disk from 10:36 to 23:12. The symptom in the DAQ was that the maximum data rate to disk was ~8 MB/s (approximately 1/3 of the needed rate, given the current trigger rate).

Here are useful commands from Ron for checking Consistency Check history (a combined sketch follows this list):
  • RAID command line utility: /opt/MegaRAID\ Storage\ Manager/StorCLI/storcli64
  • useful shell function: storcli() { /opt/MegaRAID\ Storage\ Manager/StorCLI/storcli64 "$@"; }
  • search: storcli /c0 show termlog | egrep 'Consistency Check done|Consistency Check started'
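
Putting those pieces together, a small helper for listing recent Consistency Check activity could look like the following (the checkcc function name is just for illustration; the commands themselves are the ones listed above):

# Wrap the MegaRAID command-line utility in a shell function
storcli() { /opt/MegaRAID\ Storage\ Manager/StorCLI/storcli64 "$@"; }

# Print the Consistency Check start/finish entries from the controller terminal log
checkcc() {
  storcli /c0 show termlog | egrep 'Consistency Check done|Consistency Check started'
}

checkcc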

Here are the steps that Ron used to disable the Consistency Check. (Note that the CC is a data consistency check. Since we don't leave the data files on the dsag disk for very long (they are copied to the ds50fs computer at LNGS), the CC doesn't seem to have a high value for ds50daq.)

storcli() { /opt/MegaRAID\ Storage\ Manager/StorCLI/storcli64 "$@"; }

/root
dsag :^| storcli /c0 set cc=off
Controller = 0
Status = Success
Description = None

Controller Properties :
=====================

----------------
Ctrl_Prop Value
----------------
CC Mode   OFF
----------------

--05/19_18:53:24--
/root
dsag :^| storcli /c0 show cc
Controller = 0
Status = Success
Description = None

Controller Properties :
=====================

-----------------------------------------------
Ctrl_Prop                 Value
-----------------------------------------------
CC Operation Mode         Disabled
CC Execution Delay        0
CC Next Starttime         07/28/2135, 02:00:00
CC Current State          Stopped
CC Number of iterations   49
CC Number of VD completed 1
CC Excluded VDs           None
-----------------------------------------------

--05/19_18:53:32--
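
The transcript above turns the Consistency Check off entirely. If it is ever wanted again at a less disruptive time, something like the following should work; the delay/starttime syntax is taken from the StorCLI documentation rather than from this system, so verify the result with storcli /c0 show cc afterwards:

# Re-enable concurrent Consistency Checks, one run per week (delay is in hours),
# starting at a quieter time of day (the date/hour below are only an example)
storcli /c0 set cc=conc delay=168 starttime=2014/05/24 22

# Confirm the new schedule
storcli /c0 show cc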