Initial DAQ cluster setup checklist.
Geoff Savage, 01/27/2020 01:13 PM

Objective: To reduce the number of service desk tickets during the initial setup of DAQ development/production clusters.

Networking

  1. define subnets for IPMI, fnal/public and data interfaces
    • How many cables are needed? Shared IPMI/public?
    • Name interfaces by function.
  2. define host names for all network interfaces and make them consistent
    • mydaq-br01, mydaq-eb01, mydaq-ipmi-br01, mydaq-data-br01
    • the list of host names should be complete, as if all hardware were already available
    • put all host names into /etc/hosts and distribute it across all servers
  3. make a consistent IP address assignment across all subnets
    • use address blocks for the same server roles
    • keep the last octet of the IP address the same across all NICs of the same host
  4. configure authentication
    • Kerberos for the public interface
    • publickey for the data interface
  5. create instructions for rebooting servers using IPMI
  6. enable 9000-byte MTU (jumbo) frames on all DAQ interfaces and networking equipment by default
    • Switch configuration by networking.
    • Just on DAQ network, not all interfaces.
    • NFS on public network - jumbo frames for performance?
  7. configure and verify that multicasting is enabled and working on all networking equipment; a sketch for checking MTU and multicast follows this list
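
A minimal verification sketch for items 6 and 7, in Python, assuming hypothetical interface names (data0, data1) and a hypothetical multicast group/port. It reads each interface MTU from sysfs and does a single-host multicast loopback test; a real check should also send between two hosts so the switches are exercised.

    #!/usr/bin/env python3
    """Verify jumbo frames and multicast on the DAQ network (sketch)."""
    import socket
    import struct
    from pathlib import Path

    DAQ_INTERFACES = ["data0", "data1"]             # hypothetical interface names
    MCAST_GROUP, MCAST_PORT = "239.192.0.1", 30001  # hypothetical group/port

    def check_mtu(interface, expected=9000):
        """Read the interface MTU from sysfs and compare to the expected value."""
        mtu = int(Path(f"/sys/class/net/{interface}/mtu").read_text())
        print(f"{interface}: MTU {mtu} ({'ok' if mtu == expected else 'WRONG'})")

    def multicast_self_test():
        """Join a multicast group, send one datagram, and expect it back."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", MCAST_PORT))
        mreq = struct.pack("4s4s", socket.inet_aton(MCAST_GROUP),
                           socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        sock.settimeout(2.0)
        sock.sendto(b"daq-mcast-test", (MCAST_GROUP, MCAST_PORT))
        data, addr = sock.recvfrom(1024)
        print(f"multicast ok: received {data!r} from {addr}")

    if __name__ == "__main__":
        for ifname in DAQ_INTERFACES:
            check_mtu(ifname)
        multicast_self_test()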

Users

  1. define a shared user for
    • managing UPS products
    • running daq, dcs, databases
  2. add all people from the RSI group to /root/.k5login (a generation sketch follows this list)
  3. add all known daq users to the daq and dcs shared accounts
  4. shared user profiles are not expected to have any customizations
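
A minimal sketch for item 2, assuming hypothetical user names; /root/.k5login simply holds one Kerberos principal per line:

    # Sketch: write /root/.k5login from a list of RSI user names (placeholders).
    RSI_USERS = ["user1", "user2"]   # hypothetical user names
    REALM = "FNAL.GOV"               # the Fermilab Kerberos realm
    with open("/root/.k5login", "w") as f:
        for user in sorted(RSI_USERS):
            f.write(f"{user}@{REALM}\n")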

Storage areas

  1. set up a reliable NFS server for /home, /daq/products, /daq/database, /daq/log, /daq/tmp, ..., /data, /scratch, /daq/backup
  2. reserve adequate disk space for each area
  3. create a designated scratch area for doing builds on a local NVMe drive, preferably on the fastest server
    • a fast NVMe drive, such as a Samsung 970 Pro or better, is preferred
  4. set up a nightly backup for /home and a weekly backup for the /daq/backup area
  5. monitor NFS server performance (a crude throughput probe is sketched after this list)
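
A crude throughput probe for item 5, a sketch assuming /daq/tmp is NFS-backed; run it periodically (e.g. from cron) and trend the numbers:

    #!/usr/bin/env python3
    """Time a streaming write to an NFS area and report MiB/s (sketch)."""
    import os
    import time

    TEST_FILE = "/daq/tmp/nfs_probe.bin"   # hypothetical NFS-backed path
    BLOCK = b"\0" * (1 << 20)              # 1 MiB write blocks
    TOTAL_MB = 256

    start = time.monotonic()
    with open(TEST_FILE, "wb") as f:
        for _ in range(TOTAL_MB):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())               # force the data out to the server
    elapsed = time.monotonic() - start
    print(f"wrote {TOTAL_MB} MiB in {elapsed:.2f} s -> {TOTAL_MB/elapsed:.1f} MiB/s")
    os.remove(TEST_FILE)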

Software

  1. any base software, such as the OS and productivity RPMs, should be identical on all servers (a consistency check is sketched after this list)
  2. the default set of installed software packages should not impede development/testing work, e.g. emacs, vim, mc, tmux, perf, iperf, strace, dstat, ...; VNC/MATE should be installed by default
  3. implement system monitoring using ganglia or similar software
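
A quick consistency check for item 1, a sketch assuming Kerberos ssh access and the host names used above:

    #!/usr/bin/env python3
    """Compare the installed RPM set across servers (sketch)."""
    import subprocess

    HOSTS = ["mydaq-br01", "mydaq-eb01"]   # hypothetical host names from this page

    def rpm_set(host):
        """Return the set of installed packages on a host, via ssh."""
        out = subprocess.run(["ssh", host, "rpm", "-qa"],
                             capture_output=True, text=True, check=True).stdout
        return set(out.split())

    baseline_host, *others = HOSTS
    baseline = rpm_set(baseline_host)
    for host in others:
        diff = baseline ^ rpm_set(host)    # packages present on only one host
        status = "identical" if not diff else f"{len(diff)} differing packages"
        print(f"{baseline_host} vs {host}: {status}")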

System Services

  1. Optional: DNS, Kerberos, NIS, Supervisord, InfluxDB, Prometheus.
  2. Ganglia, Graphite.

Geoff

  • Turn off periodic checking of RAID arrays.
  • Must RAID arrays be RAID 10? You lose half the disk capacity?
  • Do we really need a hosts file?
    • If we use a hosts file, we should use a script to create it (see the sketch at the end of this page).
  • NTP from the Fermilab servers works well; no need for an experiment NTP server.
  • Who is in the RSI group?
  • Use Ansible to verify that the settings from Puppet are correct?
  • Buffer sizes in network switches
  • Database computer specs
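
On the hosts-file question above: a sketch of the kind of generation script suggested, assuming placeholder subnets and following the naming and last-octet conventions from the Networking section. Generating /etc/hosts from one table keeps every server's copy identical.

    #!/usr/bin/env python3
    """Generate /etc/hosts from a single host table (sketch)."""

    PREFIX = "mydaq"
    NETS = {                     # net tag -> subnet prefix (placeholders)
        None:   "131.225.100.",  # public/fnal: mydaq-br01
        "data": "192.168.100.",  # data:        mydaq-data-br01
        "ipmi": "192.168.200.",  # IPMI:        mydaq-ipmi-br01
    }
    NODES = {"br01": 11, "eb01": 21}   # role -> last octet, shared across subnets

    def hostname(role, net):
        """Build names like mydaq-br01 and mydaq-data-br01."""
        return f"{PREFIX}-{role}" if net is None else f"{PREFIX}-{net}-{role}"

    def render():
        lines = ["127.0.0.1\tlocalhost"]
        for role, octet in sorted(NODES.items(), key=lambda kv: kv[1]):
            for net, prefix in NETS.items():
                lines.append(f"{prefix}{octet}\t{hostname(role, net)}")
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        print(render(), end="")   # redirect to /etc/hosts and distribute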