Work progress Oct 4 2012

  • a reboot was required before ibstat would work (possibly because a second subnet manager had been added to the InfiniBand fabric)

Single-node artdaq tests

To test artdaq on dsfr5 and dseb7, I (KAB) used the following steps:

  1. source /products/setup
  2. export FHICL_FILE_PATH=.
  3. mkdir work
  4. cd work
  5. setup artdaq v0_02_04 -q debug:e2
  6. daqrate 1 1 10 101 -- -c ../daqrate_gen_test.fcl

where daqrate_gen_test.fcl contained the following:

BEGIN_PROLOG
payload_size: 524288
det:
{
  generator: GenericFragmentSimulator
  payload_size: @local::payload_size
  run_number: 101
  fragments_per_event: 1
  events_to_generate: 2500
}
END_PROLOG

daq:
{
  use_art: false
  max_payload_size: @local::payload_size
  detectors: [ det0, det1, det2, det3, det4 ]
  fragments_per_source: 1000
  det0: @local::det
  det1: @local::det
  det2: @local::det
  det3: @local::det
  det4: @local::det
}

det0.starting_fragment_id: 0
det1.starting_fragment_id: 1
det2.starting_fragment_id: 2
det3.starting_fragment_id: 3
det4.starting_fragment_id: 4
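As a sanity check on the rates logged further down this page, the ratio of data rate to event rate gives the volume moved per event. The sketch below assumes that `payload_size` is counted in 8-byte words and that the logs' "MB" means 2^20 bytes. Neither unit is stated in the config or in the output, so treat the agreement as suggestive rather than confirmed. Under those assumptions one 524288-word fragment is 4 MiB, which matches the roughly 4 MB per event implied by the one-detector summary lines from dsfr5 and dseb7.

```python
# Sanity check: does payload_size match the per-event volume in the logs?
PAYLOAD_SIZE_WORDS = 524288  # payload_size from daqrate_gen_test.fcl
WORD_BYTES = 8               # ASSUMPTION: payload_size counts 8-byte words
MIB = 2**20                  # ASSUMPTION: the logs' "MB" means 2**20 bytes


def payload_mib(words: int = PAYLOAD_SIZE_WORDS, word_bytes: int = WORD_BYTES) -> float:
    """Size of one fragment payload in MiB under the word-size assumption."""
    return words * word_bytes / MIB


def implied_mb_per_event(data_rate_mb_s: float, event_rate_hz: float) -> float:
    """Per-event volume implied by a logged (data rate, event rate) pair."""
    return data_rate_mb_s / event_rate_hz


print(payload_mib())                                     # 4.0
print(round(implied_mb_per_event(671.487, 167.871), 2))  # 4.0 (dsfr5 summary line)
print(round(implied_mb_per_event(449.386, 112.346), 2))  # 4.0 (dseb7 summary line)
```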

These single-node tests highlighted a couple of improvements that we should make:
  • provide some sample fcl files in the distributed artdaq product (I had to dig out/create daqrate_gen_test.fcl)
  • make more of the debug timing printouts user-configurable

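The only surprising result from the single-node tests was that the throughput of a one-detector, one-source, one-sink test was lower on dseb7 than on dsfr5: about 112 versus 168 events/sec overall, or roughly 118 versus 197 events/sec in the steady-state one-second bins: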

[biery@dsfr5 build]$ cat EventStoreEventRate_0101_0000.txt
EventStore rank 0: events processed = 1000 at 167.871 events/sec, date rate = 671.487 MB/sec, duration = 5.95695 sec
  1349374678.845: 383 events at 195.741 events/sec, data rate = 782.966 MB/sec, bin size = 1.957 sec
  1349374679.845: 198 events at 197.986 events/sec, data rate = 791.948 MB/sec, bin size = 1.000 sec
  1349374680.846: 197 events at 196.986 events/sec, data rate = 787.946 MB/sec, bin size = 1.000 sec
  1349374681.846: 198 events at 197.986 events/sec, data rate = 791.946 MB/sec, bin size = 1.000 sec
  1349374682.846: 24 events at 23.998 events/sec, data rate = 95.994 MB/sec, bin size = 1.000 sec

[biery@dseb7 build]$ cat EventStoreEventRate_0201_0000.txt
EventStore rank 0: events processed = 1000 at 112.346 events/sec, date rate = 449.386 MB/sec, duration = 8.90107 sec
  1349374737.213: 222 events at 116.815 events/sec, data rate = 467.262 MB/sec, bin size = 1.900 sec
  1349374738.213: 118 events at 117.991 events/sec, data rate = 471.966 MB/sec, bin size = 1.000 sec
  1349374739.213: 119 events at 118.991 events/sec, data rate = 475.966 MB/sec, bin size = 1.000 sec
  1349374740.213: 117 events at 116.991 events/sec, data rate = 467.968 MB/sec, bin size = 1.000 sec
  1349374741.214: 120 events at 119.991 events/sec, data rate = 479.965 MB/sec, bin size = 1.000 sec
  1349374742.214: 118 events at 117.991 events/sec, data rate = 471.966 MB/sec, bin size = 1.000 sec
  1349374743.214: 119 events at 118.991 events/sec, data rate = 475.966 MB/sec, bin size = 1.000 sec
  1349374744.214: 67 events at 66.988 events/sec, data rate = 267.953 MB/sec, bin size = 1.000 sec
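To compare runs without eyeballing the bins, a small Python parser can pull the overall and per-bin event rates out of an EventStoreEventRate_*.txt file. The line format is inferred from the two files shown above, the function names are hypothetical, and treating the first bin (longer than one second) and the last bin (a partial one) as warm-up and tail is a heuristic choice, not something artdaq defines.

```python
import re
from statistics import mean

# Formats inferred from the EventStoreEventRate_*.txt files shown above.
SUMMARY_RE = re.compile(r"events processed = (?P<n>\d+) at (?P<rate>[\d.]+) events/sec")
BIN_RE = re.compile(
    r"^\s*(?P<ts>\d+\.\d+): (?P<n>\d+) events at (?P<rate>[\d.]+) events/sec"
    r".*bin size = (?P<bin>[\d.]+) sec"
)


def parse_rate_file(text: str) -> tuple[float, list[float]]:
    """Return (overall events/sec, per-bin events/sec) from a rate file's contents."""
    summary = SUMMARY_RE.search(text)
    if summary is None:
        raise ValueError("no EventStore summary line found")
    bins = [float(m.group("rate")) for m in map(BIN_RE.match, text.splitlines()) if m]
    return float(summary.group("rate")), bins


def steady_state_rate(bin_rates: list[float]) -> float:
    """Mean of the interior bins; drops the first (warm-up) and last (partial) bins."""
    interior = bin_rates[1:-1]
    if not interior:
        raise ValueError("need at least three bins to drop warm-up and tail")
    return mean(interior)


DSFR5_LOG = """\
EventStore rank 0: events processed = 1000 at 167.871 events/sec, date rate = 671.487 MB/sec, duration = 5.95695 sec
  1349374678.845: 383 events at 195.741 events/sec, data rate = 782.966 MB/sec, bin size = 1.957 sec
  1349374679.845: 198 events at 197.986 events/sec, data rate = 791.948 MB/sec, bin size = 1.000 sec
  1349374680.846: 197 events at 196.986 events/sec, data rate = 787.946 MB/sec, bin size = 1.000 sec
  1349374681.846: 198 events at 197.986 events/sec, data rate = 791.946 MB/sec, bin size = 1.000 sec
  1349374682.846: 24 events at 23.998 events/sec, data rate = 95.994 MB/sec, bin size = 1.000 sec
"""

DSEB7_LOG = """\
EventStore rank 0: events processed = 1000 at 112.346 events/sec, date rate = 449.386 MB/sec, duration = 8.90107 sec
  1349374737.213: 222 events at 116.815 events/sec, data rate = 467.262 MB/sec, bin size = 1.900 sec
  1349374738.213: 118 events at 117.991 events/sec, data rate = 471.966 MB/sec, bin size = 1.000 sec
  1349374739.213: 119 events at 118.991 events/sec, data rate = 475.966 MB/sec, bin size = 1.000 sec
  1349374740.213: 117 events at 116.991 events/sec, data rate = 467.968 MB/sec, bin size = 1.000 sec
  1349374741.214: 120 events at 119.991 events/sec, data rate = 479.965 MB/sec, bin size = 1.000 sec
  1349374742.214: 118 events at 117.991 events/sec, data rate = 471.966 MB/sec, bin size = 1.000 sec
  1349374743.214: 119 events at 118.991 events/sec, data rate = 475.966 MB/sec, bin size = 1.000 sec
  1349374744.214: 67 events at 66.988 events/sec, data rate = 267.953 MB/sec, bin size = 1.000 sec
"""

for name, log in (("dsfr5", DSFR5_LOG), ("dseb7", DSEB7_LOG)):
    overall, bins = parse_rate_file(log)
    print(f"{name}: overall {overall:.1f}/s, steady state {steady_state_rate(bins):.1f}/s")
# dsfr5: overall 167.9/s, steady state 197.7/s
# dseb7: overall 112.3/s, steady state 118.5/s
```

Dropping only the edge bins keeps the comparison simple; with longer runs, a trimmed mean or median over the bins would be a reasonable alternative.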

Two-node artdaq testing

When I (KAB) tried a two-node test, I initially saw errors in MPI_Init.

Here are the steps that I used:

  1. source /products/setup
  2. export FHICL_FILE_PATH=.
  3. setup artdaq v0_02_04 -q debug:e2
  4. cd work
  5. daqrate 2 2 10 302 --nodes=dsfr5,dseb7 -- -c ../daqrate_gen_test.fcl
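The single-node and two-node procedures differ mainly in their daqrate arguments, and every step has to run in one shell because the `setup` command is only available after `/products/setup` has been sourced. A minimal Python sketch, assuming bash is available on the node, joins the steps into one `bash -c` command. `daqrate_script` and `run_daqrate` are hypothetical helpers; the product version, qualifiers, and daqrate arguments are copied verbatim from the steps on this page, and `mkdir -p` replaces plain `mkdir` so the script can be rerun.

```python
import shlex
import subprocess

# Values copied from the wiki steps; nothing here is an artdaq API.
PRODUCTS_SETUP = "/products/setup"
ARTDAQ_SETUP = "setup artdaq v0_02_04 -q debug:e2"


def daqrate_script(daqrate_args: str, fcl: str = "../daqrate_gen_test.fcl",
                   workdir: str = "work") -> str:
    """Join the setup steps and one daqrate call into a single bash command string.

    Everything must run in one shell: `setup` only exists after
    /products/setup has been sourced in that same shell.
    """
    steps = [
        f"source {PRODUCTS_SETUP}",
        "export FHICL_FILE_PATH=.",
        f"mkdir -p {shlex.quote(workdir)}",  # -p: harmless if work/ already exists
        f"cd {shlex.quote(workdir)}",
        ARTDAQ_SETUP,
        # daqrate_args holds several arguments, so it is inserted unquoted.
        f"daqrate {daqrate_args} -- -c {shlex.quote(fcl)}",
    ]
    # '&&' stops at the first failing step instead of launching daqrate anyway.
    return " && ".join(steps)


def run_daqrate(daqrate_args: str) -> int:
    """Hypothetical wrapper: run the script with bash and return its exit status."""
    return subprocess.run(["bash", "-c", daqrate_script(daqrate_args)]).returncode


SINGLE_NODE = daqrate_script("1 1 10 101")
TWO_NODE = daqrate_script("2 2 10 302 --nodes=dsfr5,dseb7")
```

Because `daqrate_args` is passed through unquoted, it should only ever contain trusted text such as the argument strings used above.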

Network configuration

  • turn off NetworkManager:
    /etc/rc.d/init.d/NetworkManager stop
    chkconfig NetworkManager off
  • edit /etc/sysconfig/network-scripts/ifcfg-ib0 so that it contains:
    BOOTPROTO="static" 
    IPADDR=192.168.176.xx
    ...
    NM_CONTROLLED="no" 
    ONBOOT="yes" 
    
  • /etc/rc.d/init.d/network restart should apply the new settings, but it seems to hang the machines, so reboot instead
  • touch /etc/hushlogins to silence login messages
  • add these lines to /etc/security/limits.conf so that processes can lock (pin) the memory that InfiniBand transfers require:
    * hard memlock unlimited
    * soft memlock unlimited
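After the next login, a quick Python check can confirm that these settings took effect. `resource.getrlimit(resource.RLIMIT_MEMLOCK)` reports the current process's locked-memory limits, which pam_limits normally sets from /etc/security/limits.conf when a session starts; the two text checks mirror the ifcfg-ib0 keys and limits.conf lines listed above. The function names are hypothetical, and the ifcfg parser only handles simple KEY=value lines like the ones shown.

```python
import resource

# Values this page asks for in /etc/sysconfig/network-scripts/ifcfg-ib0.
REQUIRED_IFCFG = {"BOOTPROTO": "static", "NM_CONTROLLED": "no", "ONBOOT": "yes"}


def memlock_is_unlimited() -> bool:
    """True if this process's soft and hard RLIMIT_MEMLOCK are both unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


def limits_conf_has_memlock(text: str) -> bool:
    """Check limits.conf text for unlimited memlock for '*' (soft and hard)."""
    found = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if (len(fields) == 4 and fields[0] == "*"
                and fields[2] == "memlock" and fields[3] == "unlimited"):
            # A type of '-' sets the soft and hard limits together.
            found.update({"soft", "hard"} if fields[1] == "-" else {fields[1]})
    return {"soft", "hard"} <= found


def ifcfg_problems(text: str) -> list[str]:
    """List REQUIRED_IFCFG keys that are missing or set to another value."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip('"')
    return [
        f"{key} should be {want!r}, found {settings.get(key)!r}"
        for key, want in REQUIRED_IFCFG.items()
        if settings.get(key) != want
    ]


def check_node() -> None:
    """Hypothetical on-node report; reads the real config files, so run it on dsfr5/dseb7."""
    print("memlock unlimited:", memlock_is_unlimited())
    with open("/etc/security/limits.conf") as f:
        print("limits.conf ok:", limits_conf_has_memlock(f.read()))
    with open("/etc/sysconfig/network-scripts/ifcfg-ib0") as f:
        print("ifcfg-ib0 problems:", ifcfg_problems(f.read()) or "none")
```

Calling `check_node()` from a fresh login on each node reports all three checks at once; `memlock_is_unlimited()` only reflects the new limits in sessions started after limits.conf was edited.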