Intel X520 Dual SFP+ Port - 1 port missing from ESXi

I came across a strange one today with 2 x Dell PowerEdge R730 servers running VMware ESXi 5.5, each with an Intel X520 dual-port adapter.

I ordered these two R730 hosts with the X520 PCI NIC some time ago in preparation for eventually installing an iSCSI SAN (HPE Nimble AF1000) in place of internal storage.

When the servers were first deployed in production they were using internal storage and the Intel X520 SFP+ ports had no transceivers installed as they were not in use.

Several months down the track we invested in a Nimble SAN. Knowing I had the X520 cards ready to go, I ordered 4 x HPE 10GB SFP+ SR transceivers for our HPE switches and 4 x Intel 10GB SR transceivers, with the plan of connecting port 1 on each X520 to switch 1 and port 2 on each X520 to switch 2.

The HPE 10GB transceivers were delayed, and as I was keen to start using the Nimble SAN storage, I put 2 x Intel 10GB SFP+ transceivers in the HPE switch and 2 in the X520 SFP+ ports.

All was working well - just on a single 10GB data path for each ESXi server.

A couple of weeks or so later the HPE 10GB SFP+ transceivers arrived, so I installed them into the second port on the Intel X520 NICs and into the HPE switch while the servers were still running. All appeared to be working well, with two data paths between each of the R730 hosts and the Nimble SAN.

Several months later, there was a major unplanned power outage at the datacentre affecting all customers. After our servers booted back up they were missing vmnic4, though there was no issue with vmnic5.

After looking into this, it turns out that if a non-Intel SFP+ transceiver is installed in an X520 port at boot time, the ixgbe driver fails to load for that port, so the port does not show up in VMware ESXi at all. I resolved this by moving the Intel transceivers that were in the switch end of the first data path into the second port on each X520, and moving the HPE transceivers from the second X520 port into the HPE switches.

Below are some troubleshooting commands that helped confirm this was the cause:

1. The command below confirms that vmnic4 is not visible to ESXi after booting with a non-Intel 10GB transceiver installed


# esxcli network nic list

Name    PCI Device     Driver  Link  Speed  Duplex  MAC Address         MTU  Description
------  -------------  ------  ----  -----  ------  -----------------  ----  -------------------------------------------------------
vmnic0  0000:001:00.0  tg3     Up     1000  Full    18:66:da:11:00:00  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic1  0000:001:00.1  tg3     Up     1000  Full    18:66:da:10:00:01  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2  0000:002:00.0  tg3     Up     1000  Full    18:66:da:01:00:10  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic3  0000:002:00.1  tg3     Up     1000  Full    18:66:da:00:00:11  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic5  0000:084:00.1  ixgbe   Up    10000  Full    a0:36:9f:00:01:00  9000  Intel(R) Ethernet 10G 2P X520 Adapter

2. This next command looks at the PCI bus directly and confirms that vmnic4 is physically present, despite not showing up in the ESXi NIC list

# lspci | grep -i netw

0000:01:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic0]
0000:01:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic1]
0000:02:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic2]
0000:02:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic3]
0000:84:00.0 Network controller: Intel Corporation Ethernet 10G 2P X520 Adapter [vmnic4]
0000:84:00.1 Network controller: Intel Corporation Ethernet 10G 2P X520 Adapter [vmnic5]
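The comparison in steps 1 and 2 can also be scripted. Below is a minimal sketch, assuming the output of both commands has been saved to files first. The function name missing_vmnics and the file paths are my own placeholders, not part of ESXi, and I have not confirmed that the ESXi 5.5 busybox grep supports the -o and -w options used here, so treat it as a starting point rather than a tested tool.

```shell
# Print vmnic names that lspci reports but "esxcli network nic list" does not.
# $1: file holding saved lspci output
# $2: file holding saved "esxcli network nic list" output
# (missing_vmnics is a hypothetical helper name.)
missing_vmnics() {
    # Pull every vmnicN label out of the lspci output, de-duplicated.
    for nic in $(grep -o 'vmnic[0-9][0-9]*' "$1" | sort -u); do
        # -w matches whole words only, so vmnic1 does not match vmnic10.
        grep -qw "$nic" "$2" || echo "$nic"
    done
}

# Example usage on a host (paths are placeholders):
#   lspci > /tmp/lspci.txt
#   esxcli network nic list > /tmp/nics.txt
#   missing_vmnics /tmp/lspci.txt /tmp/nics.txt
```

Run against the two outputs shown above, this should print vmnic4, the port that is on the PCI bus but has no driver attached.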

3. The final command searches /var/log/vmkernel.log for the string SFP. It confirms that the ixgbe driver did not load for the X520 port at 0000:84:00.0 (vmnic4) at boot because of an unsupported SFP+ module (i.e. the HPE 10GB SFP+ transceiver)


~ # grep -i 'SFP' /var/log/vmkernel.log
2018-08-02T00:44:12.036Z cpu7:33589)<3>ixgbe 0000:84:00.0: failed to load because an unsupported SFP+ or QSFP module type was detected.
2018-08-02T00:44:12.175Z cpu7:33589)<6>ixgbe 0000:84:00.1: 0000:84:00.1: MAC: 2, PHY: 20, SFP+: 6, PBA No: G73131-006
2018-08-02T00:44:13.723Z cpu35:33550)<6>ixgbe 0000:84:00.1: vmnic5: detected SFP+: 6
2018-08-02T00:44:25.056Z cpu15:33549)<6>ixgbe 0000:84:00.1: vmnic5: detected SFP+: 6
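If you have several hosts to check, the log search can be narrowed to just the PCI addresses of the rejected ports. This is a rough sketch that assumes the log line has the same wording as the "failed to load" message above. The function name rejected_sfp_ports is a placeholder of mine, and other ixgbe driver versions may word the message differently.

```shell
# Print the PCI address of each port where ixgbe reported an unsupported module.
# $1: path to a vmkernel log file
# (rejected_sfp_ports is a hypothetical helper name; the pattern assumes the
# message format seen in the log excerpt above.)
rejected_sfp_ports() {
    grep -i 'unsupported SFP' "$1" \
        | sed -n 's/.*ixgbe \([0-9a-fA-F:.]*\): failed to load.*/\1/p' \
        | sort -u
}

# Example usage on a host:
#   rejected_sfp_ports /var/log/vmkernel.log
```

The address it prints can then be matched against the lspci output from step 2 to see which vmnic is affected.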



