10 GigE switch

Out of the box, all the switch ports were disabled.
In addition to connecting the 10 GigE ports:
  • port 1 (of 8) to grunt1
  • port 2 (of 8) to grunt2
  • port 3 (of 8) to grunt3
  • port 4 (of 8) to grunt4
  • port 5 (of 8) to grunt5
  • port 8 (of 8) to cluck
    (leaving ports 6 and 7 unconnected),

I also connected the serial console line to cluck's ttyS0 (115200 baud, 8N1,
no flow control) and the 10/100/1000 "management" port to the IPMI switch.
The IPMI switch allows access to both the IPMI subnet (192.168.77.0/24)
and the "private net" subnet (192.168.76.0/24).

The switch came with a 4-page "Quick Start Guide" which
contained, among other things, instructions for retrieving an electronic
(PDF) version of itself. I've retrieved this document and others and have
attached them to this wiki page.

The most significant commands I learned (from the quick start guide and from an email
exchange with Interface Masters tech support) are the following.
To set the IP address for the management interface:

imt# configure terminal
imt(config)# interface cpu0
imt(config-if)# ip address <ip_addr> <subnet_mask>

and to save the current config for the next power cycle/reboot:

imt(config-if)# end
imt# write startup-config

As of 2012.09.05, the IP address for the management port is 192.168.76.100.
The management web interface can be accessed from a browser on cluck.
User: root, password: admin123.
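
A quick illustrative check that the management interface is reachable from
cluck, using the address above:

ping -c 3 192.168.76.100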

I've enabled all ports and configured them to handle 9000-byte MTUs. In order
to do this, I had to configure the switch for an MTU of 9017 bytes.
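
For the host side, a sketch of how an interface is set to a 9000-byte MTU and
the path verified end to end (the interface name eth0 and the peer address are
my assumptions; 8972 = 9000 minus the 20-byte IP and 8-byte ICMP headers):

ip link set dev eth0 mtu 9000
ping -M do -s 8972 -c 3 192.168.76.1    # DF set: fails if any hop's MTU < 9000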

Additional strangeness:
I wanted to verify that the Ethernet frame size is larger than the configured MTU
(i.e., 1514 bytes, 14-byte Ethernet header included, when the interface is
configured for a 1500-byte MTU). I saw that this was true, i.e. the host's MTU
excludes the Ethernet header. If the switch's MTU setting followed the same
convention, 9000 should have sufficed, so it is surprising that the switch
needs to be configured for more than 9000 bytes.
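
Frame sizes can be observed directly with tcpdump; the -e flag prints the
link-level header along with the full frame length (eth0 again an assumption):

tcpdump -e -n -i eth0 icmp    # full-size pings show "length 1514" at MTU 1500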

While looking at tcpdump traces, I noticed packets which seemed to be combined, and
eventually found out that TCP segmentation offload (TSO) is enabled in the interface
driver (which is mainly a good thing).
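
The offload state can be checked, and toggled if it makes traces confusing,
with ethtool (eth0 assumed):

ethtool -k eth0 | grep -i segmentation    # show the current TSO setting
ethtool -K eth0 tso off                   # disable TSO (capital -K changes settings)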

CLUCK (or grunt) dual IB/Ethernet (10 GigE) interface issue

After the series of cluck reboots (because of motherboard issues), the
10 GigE port on the Mellanox card (as shown by the "ibstat" command) was in State: Down:

/etc/sysconfig/network-scripts
11/09 11:09 cluck :^| ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.700
        Hardware version: b0
        Node GUID: 0x0002c903004cada4
        System image GUID: 0x0002c903004cada7
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 12
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c903004cada5
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 2
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0x0202c9fffe4cada5
                Link layer: Ethernet

and the associated port on the 10 GigE switch indicated this by displaying RED.
To get the port into the "Active" state (GREEN), I did:
/etc/sysconfig/network-scripts
11/09 11:13 cluck :^| rmmod mlx4_en
/etc/sysconfig/network-scripts
11/09 11:14 cluck :^| modprobe mlx4_en
/etc/sysconfig/network-scripts
11/09 11:14 cluck :^| ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.700
        Hardware version: b0
        Node GUID: 0x0002c903004cada4
        System image GUID: 0x0002c903004cada7
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 12
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c903004cada5
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 2
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0x0202c9fffe4cada5
                Link layer: Ethernet
/etc/sysconfig/network-scripts
11/09 11:14 cluck :^| 
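
Note that reloading mlx4_en also tears down the interface configuration, so the
address may need to be reapplied afterwards. A sketch of the full sequence,
assuming the usual ifcfg-eth0 setup in this directory:

ifdown eth0
rmmod mlx4_en
modprobe mlx4_en
ifup eth0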

The last part of dmesg showed:
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.3 (Jan 2011)
mlx4_en 0000:41:00.0: Using 5 tx rings for port:2
mlx4_en 0000:41:00.0: Defaulting to 16 rx rings for port:2
mlx4_en 0000:41:00.0: Activating port:2
mlx4_en: 0000:41:00.0: Port 2: Using 5 TX rings
mlx4_en: 0000:41:00.0: Port 2: Using 16 RX rings
ADDRCONF(NETDEV_UP): eth0: link is not ready
mlx4_en: eth0: Link Up
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
------------[ cut here ]------------
WARNING: at net/core/dev.c:2243 get_rps_cpu+0x140/0x390() (Tainted: P           ----------------  )
Hardware name: H8QGL
eth0 received packet on queue 6, but number of RX queues is 5
Modules linked in: mlx4_en(U) ipmi_devintf ipmi_si ipmi_msghandler fuse nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) iw_nes(U) libcrc32c iw_cxgb4(U) cxgb4(U) iw_cxgb3(U) cxgb3(U) ib_qib(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) dm_mirror dm_region_hash dm_log uinput igb dca nvidia(P)(U) sg microcode serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 shpchp ext4 mbcache jbd2 sd_mod crc_t10dif nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video output mpt2sas scsi_transport_sas raid_class ata_generic pata_acpi pata_atiixp ahci dm_mod [last unloaded: mlx4_en]
Pid: 0, comm: swapper Tainted: P           ----------------   2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81069c97>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff8142bfc0>] ? get_rps_cpu+0x140/0x390
 [<ffffffff8142e739>] ? netif_receive_skb+0x29/0x60
 [<ffffffffa11df959>] ? mlx4_en_process_rx_cq+0x429/0x910 [mlx4_en]
 [<ffffffff8105ec2a>] ? rebalance_domains+0x19a/0x5b0
 [<ffffffffa11dfe9b>] ? mlx4_en_poll_rx_cq+0x5b/0xe0 [mlx4_en]
 [<ffffffffa0e065f2>] ? mlx4_cq_completion+0x42/0x90 [mlx4_core]
 [<ffffffff81431013>] ? net_rx_action+0x103/0x2f0
 [<ffffffff81072291>] ? __do_softirq+0xc1/0x1d0
 [<ffffffff810d9740>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
 [<ffffffff81072075>] ? irq_exit+0x85/0x90
 [<ffffffff814f5515>] ? do_IRQ+0x75/0xf0
 [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff810375eb>] ? native_safe_halt+0xb/0x10
 [<ffffffff810147dd>] ? default_idle+0x4d/0xb0
 [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
 [<ffffffff814d492a>] ? rest_init+0x7a/0x80
 [<ffffffff81c1ff7b>] ? start_kernel+0x424/0x430
 [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
---[ end trace fc2ee1629c717944 ]---
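
The queue mismatch in the warning can be inspected via sysfs, which lists the
RX/TX queues the kernel actually exposes for the device (eth0 assumed):

ls /sys/class/net/eth0/queues/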

There seems to be an issue with configuring the interface on cluck.
Replies to pings sent to grunt1 are not received unless tcpdump is active
on cluck (tcpdump puts the interface into promiscuous mode).
Perhaps yet another reboot of cluck will fix the problem?
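
In the meantime, promiscuous mode can be forced without keeping tcpdump
running (a workaround sketch, not a fix):

ip link set dev eth0 promisc on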