SFA Appliance Hardware and Software Configuration

This document describes the configuration of the first SFA Appliance, pagg04.

Hardware.

This section is skipped here; the hardware details can be looked up in miser.

Software.

  1. ZFS on Linux. Details are provided in [[https://cdcvs.fnal.gov/redmine/projects/small-files-asggregation-zfs-server/wiki/ZFS_installation]]
  2. Enstore standard server installation.

Enstore configuration.
10 disk movers (this may change depending on performance observations).
1 Migrator with max_process = 20 (this may change depending on performance observations).

Test results.
Below are test results for I/O rates and Enstore performance.

I/O tests.

From Alex's e-mail:
Configuration.
There are two enclosures, each served by a RAID card. I use JBOD mode.
Six SSDs are in one enclosure together with 16 HDDs.
12 HDDs are in the other enclosure.
I use 12+12 HDDs (12 from each enclosure) for the tests, to have a symmetric configuration.
We need to move two HDDs and three SSDs to the other enclosure.

SSD:
I set up raidz2 with 4 data + 2 parity drives. Below I will use the notation raidz2 (4+2).
A single zpool with default ZFS settings (128 KB recordsize).
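
For illustration, such a pool could be created roughly as follows; the pool name ssdpool and the device names are placeholders, not the actual pagg04 devices.

<pre>
# Hypothetical sketch: one raidz2 (4 data + 2 parity) vdev over the six SSDs.
# Pool and device names are examples only (real names would be /dev/disk/...
# paths or vdev aliases).
zpool create ssdpool raidz2 ssd0 ssd1 ssd2 ssd3 ssd4 ssd5

# Default settings are kept; recordsize is 128K by default.
zfs get recordsize ssdpool
</pre>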

HDD:
I configured a single ZFS pool from 24 drives (12+12).
A single pool may contain several ZFS filesystems.

I checked the following three configurations:
a) 4x(4+2) = one zpool consisting of 4 raidz2 arrays, (4d+2p) each.
b) 2x(10+2)
c) 1x(22+2)

Drives are named like e2s11 for enclosure 2, slot 11.
For the HDDs I alternate enclosures/LSI cards, e.g. e2s1 e3s1 e2s2 e3s2 ...
This is an example; the actual slot locations are different.
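
As an illustration, the 2x(10+2) variant with the alternating drive order might be created roughly as below. This assumes the e2sN/e3sN names are defined as aliases in /etc/zfs/vdev_id.conf; the pool name, filesystem name and slot numbers are made up for the example.

<pre>
# Hypothetical sketch of 2x(10+2): one zpool built from two raidz2 vdevs of
# 12 drives each, alternating enclosure 2 and enclosure 3 slots.
zpool create hddpool \
    raidz2 e2s1 e3s1 e2s2 e3s2 e2s3 e3s3 e2s4 e3s4 e2s5 e3s5 e2s6 e3s6 \
    raidz2 e2s7 e3s7 e2s8 e3s8 e2s9 e3s9 e2s10 e3s10 e2s11 e3s11 e2s12 e3s12

# A single pool can hold several filesystems; "data" is a placeholder name.
zfs create hddpool/data
zpool status hddpool
</pre>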

SSD:
I ran iozone over a range of record sizes and file sizes, for different operations (write, rewrite, read, re-read, etc.).
For a 128 KB record:
writes are 560 MB/sec to 1.8 GB/sec
reads are 2.3 to 5.9 GB/sec
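
The exact iozone options are not recorded here; a sweep of this kind could look roughly like the following (the working directory and size limits are examples only).

<pre>
# Hypothetical iozone sweep of record size vs. file size on the SSD pool.
#   -a      automatic mode (matrix of record sizes vs. file sizes)
#   -i 0/1  write/rewrite and read/re-read tests
#   -y/-q   minimum/maximum record size
#   -n/-g   minimum/maximum file size
#   -e -c   include fsync() and close() in the timings
#   -b      write a summary spreadsheet
cd /ssdpool/iozone-work
iozone -a -i 0 -i 1 -y 4k -q 1m -n 1g -g 64g -e -c -b ssd_sweep.xls
</pre>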

HDD:
I ran throughput tests with record sizes of 4 KB, 16 KB and 128 KB.
The number of concurrent processes is 1 to 256, with N_proc = 2^n, plus a few more values.
The aggregate file size is somewhat more than 256 GB, which is twice the memory on pagg04.
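
A throughput run of this kind could be scripted roughly as below. The pool path, aggregate size and process counts are illustrative; the per-process file size is chosen so that the total stays around twice the RAM, as in the description above.

<pre>
# Hypothetical iozone throughput run on the HDD pool.
#   -t  number of concurrent iozone processes
#   -F  explicit per-process file names on the filesystem under test
#   -s  file size per process
aggregate_gb=300   # keep the total around ~256 GB (about twice the RAM)
for nproc in 1 2 4 8 16 32 64 128 256; do
    size_gb=$(( aggregate_gb / nproc ))
    for rec in 4k 16k 128k; do
        files=$(seq -f "/hddpool/data/ioz.%g" 1 $nproc)
        iozone -i 0 -i 1 -t $nproc -r $rec -s ${size_gb}g -e -c -F $files
    done
done
</pre>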

Writes.
Rates are very similar for each of the three RAID configurations for the 4 KB and 16 KB record sizes.
Rates are similar for the 128 KB record for 1-6 processes. For larger numbers of processes the rates vary; I believe this is an indication that the iozone rate measurement is not reliable here.
Roughly, the rates are:
4 KB: 350-600 MB/sec for 4 KB IO transfers with N_proc > 4; single process 150 MB/sec.
16 KB: 1.5-2 GB/sec for 16 KB IO transfers (single process: 450 MB/sec).
128 KB: 3 to 6 GB/sec for 128 KB IO transfers (single process: 1.2 GB/sec).

I believe there are a few factors affecting the consistency of the measurements:
1) We may need to re-run the tests with a larger file size. It may or may not help iozone.
At present: a) there can be re-reads from cache; b) IO processes may be starved for IO.
2) iozone uses temporary files in the working directory.
For the SSD tests I ran in a working directory on the system disk; this slowed the tests down substantially - most of the time iozone spent generating temp files. And the system disk is connected to the same RAID card (it sits in the same enclosure).
For the HDD tests I used a runtime directory on the SSD disks. This helped with the runtime, but it could have affected the consistency of the data, as the SSD IO is on the same SAS bus.
I cannot do anything differently at this time.

Reads.
Read rates vary a lot.
Rates range roughly from 450 MB/sec (single process) to 4 GB/sec for the 128 KB record with 2x(10+2).
I do not know the cause; it could be in-memory caching effects or concurrency on the SAS bus.

My preference for the HDD configuration at this time is the 2x(10+2) raidz2 layout.
For the production configuration we need to use the other 4 drives and move two of them to the other enclosure; we will then have 14+14, and I will do 2x(12+2) raidz2.

We may take a look at larger IO record sizes (in case they can be used for data IO by Enstore and/or tar). An IO transfer size of 1-4 MB could change the picture for the wide raid.
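
As an illustration of what a larger record would involve on the ZFS side: the pool and filesystem names below are placeholders, and this assumes a ZFS on Linux version that supports the large_blocks pool feature.

<pre>
# Hypothetical sketch: raise the ZFS recordsize for large sequential IO tests.
# recordsize up to 1M needs the large_blocks feature; going beyond 1M would
# additionally require raising the zfs_max_recordsize module parameter, so 1M
# is the practical value to try first.
zpool set feature@large_blocks=enabled hddpool
zfs set recordsize=1M hddpool/data
zfs get recordsize hddpool/data
</pre>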

Alex.

Enstore tests.

Write tests.

There were 383738 small files written to the SFA appliance during write testing on 12/08-12/10 and 12/15-12/17.
packaging rates:
pagg04 - ~ 1700 MB/s
disk mover rates:
pagg04 - ~ 550 MB/s

The tests with CRC checking ran 12/15-12/17 without a single failure.

Read tests.

There were 384586 small files read from the SFA appliance during read testing on 12/08-12/10 and 12/15-12/17.
packaging rates:
pagg04 - ~ 1500 MB/s
disk mover rates:
pagg04 - ~ 550 MB/s