Initial daq cluster setup checklist » History » Version 5
Pengfei Ding, 01/13/2020 10:28 AM
1 | 1 | Gennadiy Lukhanin | h1. Initial DAQ cluster setup checklist. |
---|---|---|---|
2 | 1 | Gennadiy Lukhanin | |
3 | 1 | Gennadiy Lukhanin | _Objective: To reduce the number of service desk tickets during the initial setup of DAQ development / production clusters._ |
4 | 1 | Gennadiy Lukhanin | |
5 | 1 | Gennadiy Lukhanin | h2. Networking |
6 | 1 | Gennadiy Lukhanin | |
7 | 1 | Gennadiy Lukhanin | # define subnets for IPMI, fnal/public and data interfaces |
8 | 1 | Gennadiy Lukhanin | # define host names for all network interfaces and make them consistent |
9 | 1 | Gennadiy Lukhanin | ** mydaq-br01, mydaq-eb01, mydaq-ipmi-br01, mydaq-data-br01 |
10 | 1 | Gennadiy Lukhanin | ** the list of host names should be complete, as if all hardware were already available |
11 | 3 | Ron Rechenmacher | ** put all host names into /etc/hosts and distribute it across all servers |
12 | 1 | Gennadiy Lukhanin | # make a consistent IP address assignment across all subnets |
13 | 1 | Gennadiy Lukhanin | ** use address blocks for the same server roles |
14 | 1 | Gennadiy Lukhanin | ** make the last octet of the IP address the same across all NICs of the same host |
15 | 1 | Gennadiy Lukhanin | # configure authentication |
16 | 1 | Gennadiy Lukhanin | ** Kerberos for the public interface |
17 | 1 | Gennadiy Lukhanin | ** publickey for the data interface |
18 | 1 | Gennadiy Lukhanin | # create instructions for rebooting servers using IPMI |
19 | 1 | Gennadiy Lukhanin | # enable jumbo frames (MTU 9000) on all interfaces and networking equipment by default |
20 | 1 | Gennadiy Lukhanin | # configure multicasting and verify that it is enabled and working on all networking equipment |
21 | 1 | Gennadiy Lukhanin | |
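The naming and addressing rules above can be sketched as a small generator for @/etc/hosts@. This is only an illustration: the three subnets, the per-role address blocks in @ROLE_BASE@, and the function names are assumptions, not values from an actual cluster. Only the host name pattern (mydaq-br01, mydaq-ipmi-br01, mydaq-data-br01) comes from the checklist.

```python
import ipaddress

# Hypothetical subnets; substitute the ones allocated for your cluster.
SUBNETS = {
    "": ipaddress.ip_network("192.168.1.0/24"),      # fnal/public interface
    "ipmi": ipaddress.ip_network("192.168.2.0/24"),  # IPMI interface
    "data": ipaddress.ip_network("192.168.3.0/24"),  # data interface
}

# Hypothetical address blocks: starting last-octet value for each server role.
ROLE_BASE = {"br": 10, "eb": 30}


def host_entries(cluster, role, index):
    """Return (ip, name) pairs for every NIC of one host.

    The last octet is identical on all subnets for the same host.
    """
    last_octet = ROLE_BASE[role] + index
    entries = []
    for kind, net in SUBNETS.items():
        prefix = f"{cluster}-{kind}-" if kind else f"{cluster}-"
        name = f"{prefix}{role}{index:02d}"
        entries.append((str(net.network_address + last_octet), name))
    return entries


def hosts_file(cluster, counts):
    """Render /etc/hosts lines for all planned hosts, installed or not."""
    lines = []
    for role, count in counts.items():
        for index in range(1, count + 1):
            for ip, name in host_entries(cluster, role, index):
                lines.append(f"{ip}\t{name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hosts_file("mydaq", {"br": 2, "eb": 1}))
```

Generating the file from the planned host counts, rather than from the hardware currently installed, keeps the list complete from the start.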
22 | 1 | Gennadiy Lukhanin | h2. Users |
23 | 1 | Gennadiy Lukhanin | |
24 | 1 | Gennadiy Lukhanin | # define a shared user for |
25 | 1 | Gennadiy Lukhanin | ** managing UPS products |
26 | 1 | Gennadiy Lukhanin | ** running daq, dcs, databases |
27 | 1 | Gennadiy Lukhanin | # add all people from the RSI group to /root/.k5login |
28 | 1 | Gennadiy Lukhanin | # add all known daq users to the daq and dcs shared accounts |
29 | 1 | Gennadiy Lukhanin | # shared user profiles are not expected to have any customizations |
30 | 1 | Gennadiy Lukhanin | |
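A @.k5login@ file lists one Kerberos principal per line, so the same generator can serve @/root/.k5login@ and the shared daq and dcs accounts. The sketch below is a minimal illustration: the @FNAL.GOV@ default realm and the account names are assumptions to be replaced with the real group membership.

```python
def k5login_lines(usernames, realm="FNAL.GOV"):
    """Return sorted, de-duplicated Kerberos principals, one per .k5login line.

    The default realm is an assumption; pass the realm your site uses.
    """
    principals = {f"{name.strip()}@{realm}" for name in usernames if name.strip()}
    return sorted(principals)


def render_k5login(usernames, realm="FNAL.GOV"):
    """Render the full file content with a trailing newline."""
    return "\n".join(k5login_lines(usernames, realm)) + "\n"


if __name__ == "__main__":
    # Hypothetical account names standing in for the real group members.
    print(render_k5login(["alice", "bob", "alice"]), end="")
```

Keeping the output sorted and free of duplicates makes the file easy to compare between servers.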
31 | 1 | Gennadiy Lukhanin | h2. Storage areas |
32 | 1 | Gennadiy Lukhanin | |
33 | 1 | Gennadiy Lukhanin | # set up a reliable NFS server for /home, /daq/products, /daq/database, /daq/log, /daq/tmp,.... /data /scratch, /daq/backup |
34 | 1 | Gennadiy Lukhanin | # reserve adequate disk space for each area |
35 | 1 | Gennadiy Lukhanin | # create a designated scratch area for doing builds on a local NVMe drive, preferably on the fastest server |
36 | 1 | Gennadiy Lukhanin | ** a fast NVMe drive, such as the Samsung 970 Pro or faster, is preferred |
37 | 1 | Gennadiy Lukhanin | # set up a nightly backup for /home and a weekly backup for /daq/backup areas |
38 | 1 | Gennadiy Lukhanin | # the performance of the NFS server should be monitored |
39 | 1 | Gennadiy Lukhanin | |
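One way to confirm that adequate space remains reserved for each area is a periodic free-space check. The sketch below uses @shutil.disk_usage@ from the Python standard library. The thresholds in @MIN_FREE_GIB@ are placeholders, not recommendations, and the function name is hypothetical.

```python
import shutil

GIB = 1024 ** 3

# Hypothetical minimum free space per area, in GiB; adjust to your own reservations.
MIN_FREE_GIB = {
    "/home": 100,
    "/daq/products": 200,
    "/daq/log": 50,
    "/daq/backup": 500,
}


def areas_below_minimum(min_free_gib, usage=shutil.disk_usage):
    """Return {path: free_gib} for areas with less free space than required.

    Areas that cannot be read (for example, not mounted) are reported as None.
    """
    problems = {}
    for path, required in min_free_gib.items():
        try:
            free_gib = usage(path).free / GIB
        except OSError:
            problems[path] = None
            continue
        if free_gib < required:
            problems[path] = round(free_gib, 1)
    return problems


if __name__ == "__main__":
    for path, free in areas_below_minimum(MIN_FREE_GIB).items():
        status = "unavailable" if free is None else f"{free} GiB free"
        print(f"{path}: {status}")
```

The @usage@ parameter exists only so that the check can be exercised without real mount points.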
40 | 1 | Gennadiy Lukhanin | h2. Software |
41 | 1 | Gennadiy Lukhanin | |
42 | 1 | Gennadiy Lukhanin | # any base software such as the OS and productivity RPMs should be identical on all servers |
43 | 1 | Gennadiy Lukhanin | # the default list of installed software packages should not impede development / testing work; it should include tools such as emacs, vim, mc, tmux, perf, iperf, strace, dstat,..... VNC/MATE should also be installed by default |
44 | 2 | Gennadiy Lukhanin | # implement system monitoring using ganglia |
45 | 2 | Gennadiy Lukhanin | or similar software |
46 | 4 | Pengfei Ding | |
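To check that the base software really is identical on all servers, the package lists can be compared host by host. The sketch below assumes the lists have already been collected, for example from @rpm -qa@ output on each server. The collection step is not shown, and the function name and host inventory are hypothetical.

```python
def package_differences(inventory):
    """Map each host to the sorted packages it lacks relative to the union of all hosts.

    Hosts that already have every package are omitted from the result.
    """
    all_packages = set().union(*inventory.values()) if inventory else set()
    differences = {}
    for host, packages in inventory.items():
        missing = all_packages - set(packages)
        if missing:
            differences[host] = sorted(missing)
    return differences


if __name__ == "__main__":
    # Hypothetical inventory; in practice, fill it from each server's package list.
    inventory = {
        "mydaq-br01": {"emacs", "tmux", "strace"},
        "mydaq-eb01": {"emacs", "tmux"},
    }
    for host, missing in package_differences(inventory).items():
        print(f"{host} is missing: {', '.join(missing)}")
```

If the entries include version and release strings, a version mismatch shows up as a missing package on both hosts, which is usually the desired behavior for this check.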
47 | 5 | Pengfei Ding | h2. System Services |
48 | 4 | Pengfei Ding | |
49 | 4 | Pengfei Ding | # DNS, Kerberos, NIS? |
50 | 4 | Pengfei Ding | # Supervisord? |
51 | 4 | Pengfei Ding | # Ganglia, graphite, influxdb, prometheus, mongoDB? |