Mu2e Pilot System » History » Version 40
Glenn Horton-Smith, 08/10/2020 11:36 AM
Add new mu2edaqxx hosts, a found TDS3054B, and proposed rackmon box
Mu2e Pilot System¶
Connecting to the Pilot System worker nodes¶
The Mu2e Pilot System worker nodes (mu2edaq04-08) are behind a NAT firewall managed by mu2edaq01. mu2edaq01 is the "gateway" node for the rest of the cluster. Access to mu2edaq01 can be granted through a Service Desk ticket with approval by the registered managers of the system.
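Because the worker nodes are only reachable through mu2edaq01, one common way to reach them in a single step is OpenSSH's jump-host option (`-J`). The sketch below only prints the command rather than running it. The helper name `pilot_ssh_cmd` and the `GATEWAY` variable are hypothetical, and it assumes that your account has been granted access, that the gateway permits SSH forwarding, and that the short hostnames resolve (a fully qualified gateway name may be needed at your site).

```shell
# Hypothetical helper, not part of the Pilot System tooling.
# GATEWAY defaults to the short name used on this page; override it if your
# site needs a fully qualified name (that name is not given here).
GATEWAY="${GATEWAY:-mu2edaq01}"

pilot_ssh_cmd() {
    # $1 = worker node name, e.g. mu2edaq05
    # -J tells ssh to connect to the gateway first and hop to the target from there.
    printf 'ssh -J %s %s\n' "$GATEWAY" "$1"
}

pilot_ssh_cmd mu2edaq05
```

Printing the command first makes it easy to check the hop before pasting it into a terminal.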
Rebooting the Pilot System worker nodes¶
The Pilot System nodes have IPMI interfaces which are available through the cluster-internal network. The following commands will power on each of the Pilot System nodes:
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.132 chassis power on # mu2edaq04
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.133 chassis power on # mu2edaq05
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.134 chassis power on # mu2edaq06
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.135 chassis power on # mu2edaq07
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.136 chassis power on # mu2edaq08
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.138 chassis power on # mu2edaq10
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.139 chassis power on # mu2edaq11
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.140 chassis power on # mu2edaq12
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.141 chassis power on # mu2edaq13
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.142 chassis power on # mu2edaq14
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.143 chassis power on # mu2edaq15
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.144 chassis power on # mu2edaq16
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.145 chassis power on # mu2edaq17
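The addresses in the commands above follow a pattern: for mu2edaq04 through mu2edaq17 the IPMI address is 192.168.157.(128 + node number). A minimal sketch of a helper built on that observation is shown here. The function names `ipmi_addr` and `ipmi_cmd` are hypothetical, the pattern is inferred from the list rather than documented, and it does not hold for mu2edaq01. The helper prints the command instead of running it, so other `chassis power` actions that ipmitool provides (such as `status` or `cycle`) can be reviewed first.

```shell
# Hypothetical helpers, not part of the Pilot System tooling.

# ipmi_addr: map mu2edaqNN (NN = 04-17) to 192.168.157.(128 + NN).
# This rule is inferred from the command list; it is not valid for mu2edaq01.
ipmi_addr() {
    n=${1#mu2edaq}   # strip the hostname prefix, leaving e.g. "04"
    n=${n#0}         # drop a leading zero so the shell does not treat it as octal
    echo "192.168.157.$((128 + n))"
}

# ipmi_cmd: print the ipmitool command for a node and a chassis power action.
ipmi_cmd() {
    echo "ipmitool -I lanplus -UADMIN -PADMIN -H $(ipmi_addr "$1") chassis power $2"
}

# Dry run: show the status query for a few nodes.
for node in mu2edaq04 mu2edaq10 mu2edaq17; do
    ipmi_cmd "$node" status
done
```

To act on a node, copy the printed line into a shell on the cluster-internal network, or pipe the output to `sh` once it looks right.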
Network Description¶
There are three[1] networks present on the Pilot System cluster.
Name | IP | Netmask | Interface on mu2edaq01 | Interface on mu2edaq{04-12} | Notes |
---|---|---|---|---|---|
lab | 131.225.80.173 | 255.255.248.0 | eth0 | N/A | main network path to Pilot System cluster. |
ctrl | 192.168.157.0 | 255.255.255.128 | eth1 | eth0 | Slow Controls, General networking (NFS, HTTP, etc) |
mgmt/ipmi | 192.168.157.128 | 255.255.255.128 | eth1:0, IPMI | IPMI Only | IPMI interfaces (designated -ipmi to avoid confusion with other hostnames), Switch/PDU management |
data | 10.226.9.0 | 255.255.255.0 | bond0 (eth2,3) | eth1 | ARTDAQ data transmission |
The data network consists of 10 Gbps interlinks between the farm nodes (mu2edaq04-08) and a 20 Gbps link-aggregated connection to mu2edaq01. The ctrl network is 10 Gbps to the farm nodes and 1 Gbps to mu2edaq01 and the BeagleBone DCS data collectors. The mgmt network is 100 Mbps to the IPMI BMCs and to the switch and PDU management interfaces.
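The ctrl and mgmt networks share the 192.168.157 prefix and are separated only by the 255.255.255.128 netmask, so the last octet decides which network an address belongs to (0-127 is ctrl, 128-255 is mgmt). The following sketch makes that split explicit. The function name `pilot_network` is hypothetical, and the helper only knows the ctrl, mgmt, and data ranges from the table above.

```shell
# Hypothetical helper: name the Pilot System network an IPv4 address falls in,
# using the netmasks listed in the network table.
pilot_network() {
    prefix=${1%.*}    # first three octets
    last=${1##*.}     # fourth octet
    case "$prefix" in
        192.168.157)
            # 255.255.255.128 splits this range in half: 0-127 ctrl, 128-255 mgmt
            if [ "$last" -lt 128 ]; then echo ctrl; else echo mgmt; fi ;;
        10.226.9)
            echo data ;;
        *)
            echo other ;;
    esac
}

for ip in 192.168.157.4 192.168.157.132 10.226.9.18; do
    echo "$ip $(pilot_network "$ip")"
done
```

A check like this is handy when assigning an address to a new device, since an address of 128 or above on the 192.168.157 prefix lands on the management side.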
mu2edaqXX-data¶
This is the primary network used for transmission of data between the BoardReader processes and the EventBuilder processes, then from the EventBuilders to the Aggregator.
If the total link bandwidth between the Aggregator and the EventBuilders is T, then the maximum rate for 0-bias data between the BoardReaders and the EventBuilders is T/N per node, which is 0.2T in a 5-node system. The maximum link bandwidth per node, covering both event building and transmission to the Aggregator, is then 0.2T in and 0.4T out. In our current data network, T = 20 Gbps, so each node should produce at most 4 Gbps of inbound and 8 Gbps of outbound network traffic.
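The arithmetic can be checked with a short calculation. This sketch assumes the outbound figure generalizes as 2T/N (event-building traffic plus built events sent on to the Aggregator), which matches the 0.4T quoted for five nodes but is not stated explicitly in the text. The function name `per_node_rates` is hypothetical.

```shell
# Hypothetical helper: per-node maximum rates for total bandwidth T (Gbps)
# shared by N nodes. Inbound is T/N; outbound is assumed to be 2T/N.
per_node_rates() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "in=%g out=%g\n", t / n, 2 * t / n }'
}

per_node_rates 20 5   # T = 20 Gbps, N = 5, as in the text: prints "in=4 out=8"
```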
mu2edaqXX-ctrl¶
This network is currently mostly unused. It serves as a "catch-all" for any data that must be collected from each node and sent elsewhere (Ganglia metrics, DCS data from the ROCs, etc.). It also hosts the BeagleBone rack monitor interfaces. This network is cross-connected between both switches present on the test stand.
mu2edaqXX-mgmt¶
This is the management network. It provides remote management of the switches and PDUs, as well as remote consoles and power control for the individual nodes through IPMI.
Other hosts in the hosts file¶
IP | Name/Notes | Description |
---|---|---|
192.168.157.201 | mlnx40g-mgmt | Mellanox 10 Gbps switch |
192.168.157.238 | tripplite-mgmt | Tripp-Lite PDU |
192.168.157.239 | netg1g-r-mgmt | Netgear 1 Gbps switch (in rack) |
192.168.1.239 | netg1g-bb-mgmt | Netgear 1 Gbps switch (BBs) |
192.168.1.20x | mu2edcsXX | DCS BeagleBone 1 |
[1] Four if you count the general lab network on mu2edaq01.
Network Connection Map¶
The following table summarizes all of the network connections in place at the Mu2e Pilot System test stand.
Host | Port | Switch | Switch Port | Network | IP Address |
---|---|---|---|---|---|
netg1g-r | 49F | mlnx40g | 1/1 | ctrl | 192.168.157.239 |
mu2edaq01 | eth1 | netg1g-r | 43 | ctrl (mgmt) | 192.168.157.1, 192.168.1.1, 172.24.20.1 |
mu2edaq01 | eth2 | mlnx40g | 7/1 | data | 10.226.9.16 |
mu2edaq01 | eth3 | mlnx40g | 7/2 | data | |
mu2edaq01 | IPMI | netg1g-r | 2 | mgmt | 192.168.157.131 |
mu2edaq03 | eth1 | netg1g-r | 37 | ctrl | |
mu2edaq03 | eth3 | mlnx40g | 11 | WC-test | |
mu2edaq03 | IPMI | netg1g-r | 41 | mgmt | |
mu2edaq04 | eth0 | mlnx40g | 1/3 | ctrl | 192.168.157.4 |
mu2edaq04 | eth1 | mlnx40g | 7/3 | data | 10.226.9.18 |
mu2edaq04 | eth2 | mlnx40g | 12 | WC-test | |
mu2edaq04 | IPMI | netg1g-r | 33 | mgmt | 192.168.157.132 |
mu2edaq05 | eth0 | mlnx40g | 1/4 | ctrl | 192.168.157.5 |
mu2edaq05 | eth1 | mlnx40g | 7/4 | data | 10.226.9.19, 10.226.19.19 |
mu2edaq05 | IPMI | netg1g-r | 31 | mgmt | 192.168.157.133 |
mu2edaq06 | eth0 | mlnx40g | 2/1 | ctrl | 192.168.157.6 |
mu2edaq06 | eth1 | mlnx40g | 8/1 | data | 10.226.9.20, 10.226.19.20 |
mu2edaq06 | IPMI | netg1g-r | 29 | mgmt | 192.168.157.134 |
mu2edaq07 | eth0 | mlnx40g | 2/2 | ctrl | 192.168.157.7 |
mu2edaq07 | eth1 | mlnx40g | 8/2 | data | 10.226.9.21 |
mu2edaq07 | IPMI | netg1g-r | 27 | mgmt | 192.168.157.135 |
mu2edaq08 | eth0 | mlnx40g | 2/3 | ctrl | 192.168.157.8 |
mu2edaq08 | eth1 | mlnx40g | 8/3 | data | 10.226.9.22 |
mu2edaq08 | IPMI | netg1g-r | 25 | mgmt | 192.168.157.136 |
mu2edaq10 | eth0 | mlnx40g | 3/1 | ctrl | 192.168.157.10 |
mu2edaq10 | eth1 | mlnx40g | N/C | data | 10.226.9.24 |
mu2edaq10 | IPMI | netg1g-r | 26 | mgmt | 192.168.157.138 |
mu2edaq11 | eth0 | mlnx40g | 3/2 | ctrl | 192.168.157.11 |
mu2edaq11 | eth1 | mlnx40g | N/C | data | 10.226.9.25 |
mu2edaq11 | IPMI | netg1g-r | 28 | mgmt | 192.168.157.139 |
mu2edaq12 | eth0 | mlnx40g | 3/3 | ctrl | 192.168.157.12 |
mu2edaq12 | eth1 | mlnx40g | N/C | data | 10.226.9.26 |
mu2edaq12 | IPMI | netg1g-r | 30 | mgmt | 192.168.157.140 |
mu2edaq13-17 | eth0? | mlnx40g? | ?/? | ctrl | 192.168.157.(13-17) |
mu2edaq13-17 | eth1 | mlnx40g? | N/C? | data | 10.226.9.(27-31) |
mu2edaq13-17 | IPMI | netg1g-r | ? | mgmt | 192.168.157.(141-145) |
(Tektronix TDS 3054B) | n/a | ? | ? | ctrl | 192.168.157.90 |
mu2erackmondev01 | n/a | ? | ? | ctrl | 192.168.157.101 |
N/C = Not Connected
Link Aggregation/Channel Bonding¶
The Mellanox 10 Gbps switch supports link aggregation. The Pilot System nodes have been connected in a logical fashion to expedite this configuration.
Currently, only mu2edaq01 is configured for link aggregation. The table below describes the settings for the case where all nodes are link-aggregated.
Host | Port 1 | Port 2 | Link Aggregation Port |
---|---|---|---|
mu2edaq01 | 1/7/1 | 1/7/2 | Po1 |
Mellanox Switch Configuration¶
The Mellanox switch is connected to /dev/ttyS0 on mu2edaq04. To enter its configuration interface, as root, type:
screen /dev/ttyS0 9600
Configuration of the switch is achieved through the "configure terminal" command. For example, to enable flow control on Port 1, branch A:
mlnx40g-mgmt [standalone: master] >
mlnx40g-mgmt [standalone: master] > enable
mlnx40g-mgmt [standalone: master] # configure terminal
mlnx40g-mgmt [standalone: master] (config) # interface ethernet 1/1/1
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # flowcontrol ?
receive <receive(on/off)>
send <send(on/off)>
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # shutdown
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # flowcontrol receive on
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # flowcontrol send on
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # no shutdown
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # exit
The switch console supports "tab-completion", and typing ? at any time will give a list of commands from that point.