Initial daq cluster setup checklist » History » Version 9

Geoff Savage, 01/27/2020 01:05 PM

Initial DAQ cluster setup checklist.

Objective: To reduce the number of service desk tickets during the initial setup of DAQ development / production clusters.


  1. define subnets for IPMI, fnal/public and data interfaces
  2. define host names for all network interfaces and make them consistent
    • mydaq-br01, mydaq-eb01, mydaq-ipmi-br01, mydaq-data-br01
    • the list of host names should be complete, as if all hardware were already available
    • put all host names into /etc/hosts and distribute it across all servers
  3. assign IP addresses consistently across all subnets
    • use address blocks for the same server roles
    • make the last octet of the IP address the same across all NICs of the same host
  4. configure authentication
    • Kerberos for the public interface
    • publickey for the data interface
  5. create instructions for rebooting servers using IPMI
  6. enable 9000-byte MTU (jumbo) frames on all interfaces and networking equipment by default
  7. configure multicasting and verify that it is enabled and working on all networking equipment
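
The naming and addressing rules above can be sketched as a small generator for /etc/hosts entries. This is only an illustration: the subnets, the per-role address blocks and the function name hosts_entries are assumptions made for the example, not values from this checklist. Only the host name pattern (mydaq-br01, mydaq-ipmi-br01, mydaq-data-br01) comes from the list above.

```python
import ipaddress

# Hypothetical subnets, keyed by the host name prefix of each interface.
# Substitute the real allocations for the cluster.
SUBNETS = {
    "": ipaddress.ip_network("192.168.1.0/24"),       # fnal/public interface
    "ipmi-": ipaddress.ip_network("192.168.2.0/24"),  # IPMI interface
    "data-": ipaddress.ip_network("192.168.3.0/24"),  # data interface
}

# Hypothetical address blocks: the last octet of host N of a role is block + N.
ROLE_BLOCKS = {"br": 10, "eb": 30}


def hosts_entries(cluster, role_counts):
    """Return /etc/hosts lines; a host keeps the same last octet on every subnet."""
    lines = []
    for role, count in role_counts.items():
        for index in range(1, count + 1):
            octet = ROLE_BLOCKS[role] + index
            for prefix, network in SUBNETS.items():
                address = network.network_address + octet
                name = f"{cluster}-{prefix}{role}{index:02d}"
                lines.append(f"{address}\t{name}")
    return lines


if __name__ == "__main__":
    # List every planned host, even if the hardware has not arrived yet.
    print("\n".join(hosts_entries("mydaq", {"br": 2, "eb": 1})))
```

Generating the file from one table of roles and counts keeps the names and addresses consistent, and the same output can be distributed to every server.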


  1. define a shared user for
    • managing UPS products
    • running daq, dcs, databases
  2. add all people from the RSI group to /root/.k5login
  3. add all known daq users to the daq and dcs shared accounts
  4. shared user profiles are not expected to have any customizations
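
A .k5login file lists one Kerberos principal per line, so the file for root or for a shared account can be generated from a list of user names. The sketch below assumes the realm FNAL.GOV and uses made-up user names; render_k5login is a hypothetical helper, not an existing tool.

```python
def render_k5login(usernames, realm="FNAL.GOV"):
    """Return .k5login content: one principal per line, sorted, no duplicates.

    The default realm is an assumption; pass the realm the cluster really uses.
    """
    principals = sorted(
        {f"{name.strip()}@{realm}" for name in usernames if name.strip()}
    )
    return "".join(f"{principal}\n" for principal in principals)


if __name__ == "__main__":
    # Hypothetical user names, for illustration only.
    print(render_k5login(["bob", "alice", "bob"]), end="")
```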

Storage areas

  1. set up a reliable NFS server for /home, /daq/products, /daq/database, /daq/log, /daq/tmp, ..., /data, /scratch, /daq/backup
  2. reserve adequate disk space for each area
  3. create a designated scratch area for doing builds on a local NVMe drive, preferably on the fastest server
    • a fast NVMe drive, such as a Samsung 970 Pro or faster, is preferred
  4. set up a nightly backup for /home and a weekly backup for the /daq/backup area
  5. monitor the performance of the NFS server
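
One simple way to watch NFS performance is to time a write of known size into each exported area at regular intervals and record the result. The sketch below is a rough probe, not a benchmark: the function name, the 16 MiB default size and the idea of feeding the number into the monitoring system are assumptions for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=16):
    """Write size_mb MiB to a temporary file in directory and return MiB/s."""
    chunk = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        for _ in range(size_mb):
            handle.write(chunk)
        handle.flush()
        # fsync so the timing covers the transfer, not just the client cache.
        os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    # The temporary file is removed automatically when the block exits.
    return size_mb / elapsed


if __name__ == "__main__":
    # Point this at an NFS-mounted area such as /daq/tmp; the local temporary
    # directory is used here only so that the example runs anywhere.
    print(f"{write_throughput_mb_s(tempfile.gettempdir()):.1f} MiB/s")
```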


  1. any base software such as the OS and productivity RPMs should be identical on all servers
  2. the default set of installed software packages should not impede development/testing work: e.g. emacs, vim, mc, tmux, perf, iperf, strace, dstat, ... and VNC/MATE should be installed by default
  3. implement system monitoring using Ganglia or similar software
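
Whether the installed software really is identical on all servers can be checked by comparing package lists, for example the output of rpm -qa collected from each host. The sketch below assumes those lists have already been gathered into a dictionary; the function name and the package strings are hypothetical.

```python
def package_differences(inventory):
    """Map each host to the packages it lacks relative to the union of all hosts.

    inventory maps a host name to the set of package strings installed on it.
    Hosts that have every package are left out of the result.
    """
    all_packages = set().union(*inventory.values())
    return {
        host: sorted(all_packages - packages)
        for host, packages in inventory.items()
        if all_packages - packages
    }


if __name__ == "__main__":
    # Hypothetical package lists, for illustration only.
    inventory = {
        "mydaq-br01": {"vim-8.0", "tmux-2.7"},
        "mydaq-eb01": {"vim-8.0"},
    }
    print(package_differences(inventory))
```

Because the package strings include versions, a host with a different version of a package shows up as missing the version found elsewhere.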

System Services

  1. Optional: DNS, Kerberos, NIS, Supervisord, InfluxDB, Prometheus.
  2. Ganglia, Graphite.


  • Turn off checking of RAID arrays.
  • Must RAID arrays be RAID 10? Does that lose half of the disk capacity?
  • Do we really need a hosts file?
    • If we use a hosts file, we should use a script to create it.
  • NTP from Fermilab servers works well. No need for an experiment NTP server.
  • Who is in the RSI group?
  • Use Ansible to verify that the settings from Puppet are correct?
  • Buffer sizes in network switches
  • Database computer specs
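
On the RAID 10 question above: a classic RAID 10 array stripes across two-way mirrored pairs, so the usable capacity is half of the raw capacity. A small worked calculation follows; the function name and the disk sizes are chosen only for illustration.

```python
def raid10_usable_tb(disk_count, disk_size_tb):
    """Usable capacity in TB of a RAID 10 array built from two-way mirrors."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("classic RAID 10 needs an even number of disks, at least 4")
    # Every block is stored twice, so only half of the raw capacity is usable.
    return disk_count * disk_size_tb / 2


if __name__ == "__main__":
    # Eight hypothetical 4 TB disks: 32 TB raw, 16 TB usable.
    print(raid10_usable_tb(8, 4))
```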