Mu2e Pilot System » History » Version 41

Glenn Horton-Smith, 08/10/2020 03:22 PM
Add IP camera, small update to rackmon interface info


Mu2e Pilot System

Connecting to the Pilot System worker nodes

The Mu2e Pilot System worker nodes (mu2edaq04-08) are behind a NAT firewall managed by mu2edaq01. mu2edaq01 is the "gateway" node for the rest of the cluster. Access to mu2edaq01 can be granted through a Service Desk ticket with approval by the registered managers of the system.
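
The text above does not spell out the login path, so the following is only a sketch. It assumes that access is by SSH, that your OpenSSH client supports ProxyJump (-J), and that mu2edaq01 resolves from your machine (a fully qualified name may be needed). The user name jdoe and the helper name worker_ssh_cmd are hypothetical.

```shell
# worker_ssh_cmd USER NODE -> prints an ssh command that hops through the gateway.
# Assumptions: SSH access, OpenSSH with -J (ProxyJump), hypothetical user name.
worker_ssh_cmd() {
    echo "ssh -J $1@mu2edaq01 $1@$2"
}

# Dry run: show the command for reaching mu2edaq04 through mu2edaq01.
worker_ssh_cmd jdoe mu2edaq04
```

With -J, the connection to the worker node is opened from the gateway, so the worker's hostname only needs to resolve on mu2edaq01.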

Rebooting the Pilot System worker nodes

The Pilot System nodes have IPMI interfaces which are available through the cluster-internal network. The following commands will power on each of the Pilot System nodes:

ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.132 chassis power on # mu2edaq04
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.133 chassis power on # mu2edaq05
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.134 chassis power on # mu2edaq06
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.135 chassis power on # mu2edaq07
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.136 chassis power on # mu2edaq08
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.138 chassis power on # mu2edaq10
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.139 chassis power on # mu2edaq11
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.140 chassis power on # mu2edaq12
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.141 chassis power on # mu2edaq13
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.142 chassis power on # mu2edaq14
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.143 chassis power on # mu2edaq15
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.144 chassis power on # mu2edaq16
ipmitool -I lanplus -UADMIN -PADMIN -H 192.168.157.145 chassis power on # mu2edaq17
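
The same commands can be generated in a loop. The sketch below only prints the commands (a dry run) and queries power status instead of powering on. It relies on the observation that, for the nodes listed above, the IPMI address is 192.168.157.(128 + node number); that pattern is an assumption and does not hold for mu2edaq01. The helper names ipmi_addr and ipmi_cmd are hypothetical.

```shell
# ipmi_addr NODE_NUMBER -> prints the IPMI address of a worker node.
# Assumption: address = 192.168.157.(128 + node number), which matches the
# list above for mu2edaq04-08 and mu2edaq10-17 only.
ipmi_addr() {
    # ${1#0} strips one leading zero so "08" is not read as an octal number
    echo "192.168.157.$((128 + ${1#0}))"
}

# ipmi_cmd NODE_NUMBER ACTION -> prints the ipmitool command line.
# ACTION is a "chassis power" subcommand such as status, on, off or cycle.
ipmi_cmd() {
    echo "ipmitool -I lanplus -UADMIN -PADMIN -H $(ipmi_addr "$1") chassis power $2"
}

# Dry run: print the status query for every worker node.
for n in 04 05 06 07 08 10 11 12 13 14 15 16 17; do
    ipmi_cmd "$n" status
done
```

Replacing status with on reproduces the list above; the printed lines can be reviewed first and then run by hand.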

Network Description

There are three [1] networks present on the Pilot System cluster.

| Name | IP | Netmask | Interface on mu2edaq01 | Interface on mu2edaq{04-12} | Notes |
| lab | 131.225.80.173 | 255.255.248.0 | eth0 | N/A | Main network path to Pilot System cluster |
| ctrl | 192.168.157.0 | 255.255.255.128 | eth1 | eth0 | Slow Controls, general networking (NFS, HTTP, etc.) |
| mgmt/ipmi | 192.168.157.128 | 255.255.255.128 | eth1:0, IPMI | IPMI | Only IPMI interfaces (designated -ipmi to avoid confusion with other hostnames), switch/PDU management |
| data | 10.226.9.0 | 255.255.255.0 | bond0 (eth2,3) | eth1 | ARTDAQ data transmission |

The data network consists of 10 Gbps links between the farm nodes (daq04-08) and a 20 Gbps link-aggregated connection to mu2edaq01. The ctrl network runs at 10 Gbps to the farm nodes and at 1 Gbps to mu2edaq01 and the BeagleBone DCS data collectors. The mgmt network runs at 100 Mbps to the IPMI BMCs and to the switch and PDU management interfaces.
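
Because ctrl and mgmt both use the netmask 255.255.255.128, they split 192.168.157.0/24 into two halves: last octet 0-127 is ctrl and 128-255 is mgmt. A minimal sketch of that rule follows; the function name classify_ip is hypothetical, and the input is assumed to be a well-formed dotted-quad address.

```shell
# classify_ip ADDR -> prints ctrl, mgmt, data, or unknown.
# Assumes ADDR is a well-formed IPv4 address; no validation is done.
classify_ip() {
    case "$1" in
        10.226.9.*)
            echo data ;;
        192.168.157.*)
            last=${1##*.}            # keep only the last octet
            if [ "$last" -lt 128 ]; then
                echo ctrl            # 192.168.157.0/25
            else
                echo mgmt            # 192.168.157.128/25
            fi ;;
        *)
            echo unknown ;;
    esac
}

classify_ip 192.168.157.4      # ctrl address of mu2edaq04
classify_ip 192.168.157.132    # IPMI address of mu2edaq04
classify_ip 10.226.9.18        # data address of mu2edaq04
```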

mu2edaqXX-data

This is the primary network used for transmission of data between the BoardReader processes and the EventBuilder processes, and then from the EventBuilders to the Aggregator.
If the total link bandwidth between the Aggregator and the EventBuilders is T, then the maximum rate for 0-bias data between the BoardReaders and the EventBuilders is 0.2T in a 5-node system (T/N for N nodes). The maximum link bandwidth for both event building and transmission to the Aggregator is then 0.2T in and 0.4T out per node. In our current data network, T = 20 Gbps, so each node should produce at most 4 Gbps of inbound and 8 Gbps of outbound network traffic.
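
The arithmetic above can be checked with a short sketch. It takes the per-node figures as stated (T/N in, 2T/N out) without deriving them; the function name per_node_rates is hypothetical.

```shell
# per_node_rates T_GBPS N_NODES -> prints "<in> <out>" in Gbps per node,
# using the figures quoted in the text: in = T/N, out = 2T/N.
per_node_rates() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%g %g\n", t / n, 2 * t / n }'
}

# Current data network: T = 20 Gbps, N = 5 nodes.
per_node_rates 20 5
```

For T = 20 and N = 5 this prints 4 8, matching the 4 Gbps in and 8 Gbps out quoted above.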

mu2edaqXX-ctrl

This network is currently mostly unused. It serves as a "catch-all" for any data that must be collected from each node and sent elsewhere (Ganglia metrics, DCS data from the ROCs, etc.). It also hosts the BeagleBone rack monitor interfaces. This network is cross-connected between both switches present on the test stand.

mu2edaqXX-mgmt

This is the management network. It provides remote management of the switches and PDUs, as well as remote consoles and power control for the individual nodes through IPMI.

Other hosts in the hosts file

| IP | Name/Notes | Description |
| 192.168.157.201 | mlnx40g-mgmt | Mellanox 10 Gbps switch |
| 192.168.157.238 | tripplite-mgmt | Tripp-Lite PDU |
| 192.168.157.239 | netg1g-r-mgmt | Netgear 1 Gbps switch (in rack) |
| 192.168.1.239 | netg1g-bb-mgmt | Netgear 1 Gbps switch (BBs) |
| 192.168.1.20x | mu2edcsXX | DCS BeagleBone 1 |

[1] Four if you count the general lab network on mu2edaq01.

Network Connection Map

The following table summarizes all of the network connections in place at the Mu2e Pilot System test stand.

| Host | Port | Switch | Switch Port | Network | IP Address |
| netg1g-r | 49F | mlnx40g | 1/1 | ctrl | 192.168.157.239 |
| mu2edaq01 | eth1 | netg1g-r | 43 | ctrl (mgmt) | 192.168.157.1, 192.168.1.1, 172.24.20.1 |
|  | eth2 | mlnx40g | 7/1 | data | 10.226.9.16 |
|  | eth3 | mlnx40g | 7/2 | data |  |
|  | IPMI | netg1g-r | 2 | mgmt | 192.168.157.131 |
| mu2edaq03 | eth1 | netg1g-r | 37 | ctrl |  |
|  | eth3 | mlnx40g | 11 | WC-test |  |
|  | IPMI | netg1g-r | 41 | mgmt |  |
| mu2edaq04 | eth0 | mlnx40g | 1/3 | ctrl | 192.168.157.4 |
|  | eth1 | mlnx40g | 7/3 | data | 10.226.9.18 |
|  | eth2 | mlnx40g | 12 | WC-test |  |
|  | IPMI | netg1g-r | 33 | mgmt | 192.168.157.132 |
| mu2edaq05 | eth0 | mlnx40g | 1/4 | ctrl | 192.168.157.5 |
|  | eth1 | mlnx40g | 7/4 | data | 10.226.9.19, 10.226.19.19 |
|  | IPMI | netg1g-r | 31 | mgmt | 192.168.157.133 |
| mu2edaq06 | eth0 | mlnx40g | 2/1 | ctrl | 192.168.157.6 |
|  | eth1 | mlnx40g | 8/1 | data | 10.226.9.20, 10.226.19.20 |
|  | IPMI | netg1g-r | 29 | mgmt | 192.168.157.134 |
| mu2edaq07 | eth0 | mlnx40g | 2/2 | ctrl | 192.168.157.7 |
|  | eth1 | mlnx40g | 8/2 | data | 10.226.9.21 |
|  | IPMI | netg1g-r | 27 | mgmt | 192.168.157.135 |
| mu2edaq08 | eth0 | mlnx40g | 2/3 | ctrl | 192.168.157.8 |
|  | eth1 | mlnx40g | 8/3 | data | 10.226.9.22 |
|  | IPMI | netg1g-r | 25 | mgmt | 192.168.157.136 |
| mu2edaq10 | eth0 | mlnx40g | 3/1 | ctrl | 192.168.157.10 |
|  | eth1 | mlnx40g | N/C | data | 10.226.9.24 |
|  | IPMI | netg1g-r | 26 | mgmt | 192.168.157.138 |
| mu2edaq11 | eth0 | mlnx40g | 3/2 | ctrl | 192.168.157.11 |
|  | eth1 | mlnx40g | N/C | data | 10.226.9.25 |
|  | IPMI | netg1g-r | 28 | mgmt | 192.168.157.139 |
| mu2edaq12 | eth0 | mlnx40g | 3/3 | ctrl | 192.168.157.12 |
|  | eth1 | mlnx40g | N/C | data | 10.226.9.26 |
|  | IPMI | netg1g-r | 30 | mgmt | 192.168.157.140 |
| mu2edaq13-17 | eth0? | mlnx40g? | ?/? | ctrl | 192.168.157.(13-17) |
|  | eth1 | mlnx40g? | N/C? | data | 10.226.9.(27-31) |
|  | IPMI | netg1g-r | ? | mgmt | 192.168.157.(141-145) |
| (Tektronix TDS 3054B) | n/a | ? | ? | ctrl | 192.168.157.90 |
| (IP camera) | n/a | ? | ? | ctrl | 192.168.157.91 |
| mu2erackmondev01 | eth0 | ? | ? | ctrl | 192.168.157.101 |

N/C = Not Connected

Link Aggregation/Channel Bonding

The Mellanox 10 Gbps switch supports link aggregation. The Pilot System nodes have been connected in a logical fashion to expedite this configuration.
Currently, only mu2edaq01 is configured for link aggregation. The table below describes the settings for the case where all nodes are link-aggregated.

| Host | Port 1 | Port 2 | Link Aggregation Port |
| mu2edaq01 | 1/7/1 | 1/7/2 | Po1 |

Mellanox Switch Configuration

The Mellanox switch is connected to /dev/ttyS0 on mu2edaq04. To enter its configuration interface, as root, type:

screen /dev/ttyS0 9600

Configuration of the switch is achieved through the "configure terminal" command. For example, to enable flow control on Port 1, branch A:

mlnx40g-mgmt [standalone: master] >
mlnx40g-mgmt [standalone: master] > enable
mlnx40g-mgmt [standalone: master] # configure terminal
mlnx40g-mgmt [standalone: master] (config) # interface ethernet 1/1/1
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # flowcontrol ?
receive                        <receive(on/off)>
send                           <send(on/off)>
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # shutdown
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # flowcontrol receive on
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # flowcontrol send on
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # no shutdown
mlnx40g-mgmt [standalone: master] (config interface ethernet 1/1/1) # exit

The switch console supports "tab-completion", and typing ? at any time will give a list of commands from that point.