VCDX #200 The Ultimate Way to Virtualize
Blog of one VMware Infrastructure Designer: March 2018

Friday, March 23, 2018

Deploying vCenter High Availability with network addresses in separate subnets

VMware vCenter High Availability is a very interesting feature included in vSphere 6.5. In general, it increases the availability of the vCenter service by running three vCenter nodes (active, passive, and witness) that together provide a single vCenter service.

This is what the official vCenter HA documentation says:

vCenter High Availability (vCenter HA) protects vCenter Server Appliance against host and hardware failures. The active-passive architecture of the solution can also help you reduce downtime significantly when you patch vCenter Server Appliance.
After some network configuration, you create a three-node cluster that contains Active, Passive, and Witness nodes. Different configuration paths are available.

The last sentence is very true. The simplest VCHA deployment sits within one SSO domain and one datacenter, with two Layer 2 networks: one for management and one for the heartbeat. Such a design can be deployed in a fully automated manner; you only need to provide a dedicated network (portgroup/VLAN) for the heartbeat and three IP addresses from a separate heartbeat subnet. Easy. But is that what you expect from vCenter HA? To be honest, the much more attractive use case is to spread the vCenter HA nodes across three datacenters so that vSphere management stays up and running even if one of the two datacenters experiences an issue. Conceptually, it is depicted in the figure below.

Conceptual vCenter HA Design

In this particular concept, I use embedded PSCs for simplicity, and vCenter HA can then increase the availability of the PSC services as well. The most interesting challenge in this concept is networking, so let's look at the intended logical network design.

vCenter HA - networking logical design

Networking logical design:

Each vCenter Server Appliance node has two NICs.
One NIC is connected to the management network and the second NIC to the heartbeat network.
The Layer 2 management network (VLAN 4) is stretched across datacenters A and B because the vCenter IP address must work in datacenter B without human intervention after a VCHA failover.
Each datacenter has an independent heartbeat network (VCHA-HB-A, VCHA-HB-B, VCHA-HB-C) with its own IP subnet, so Layer 2 is not stretched across datacenters, especially not to datacenter C, where the witness resides. This requires specific static routes in each vCenter Server Appliance node to provide IP reachability over the heartbeat network.
The specific VCHA TCP/UDP ports must be allowed among the VCHA nodes across the heartbeat network.
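To make the static-route requirement more concrete, here is a minimal sketch that prints systemd-networkd [Route] stanzas for the node in datacenter A. All subnets and the gateway address are hypothetical placeholders, and route_stanza is a helper name made up for this example. I assume the stanzas end up in the heartbeat NIC's network file as described in KB 2148442, so check the KB for the exact file and procedure before changing anything.

```shell
#!/bin/sh
# Sketch only: every subnet and gateway below is a hypothetical placeholder.

# Print one systemd-networkd [Route] stanza for a remote heartbeat subnet.
route_stanza() {
    # $1 = destination subnet in CIDR notation, $2 = local heartbeat gateway
    printf '[Route]\nDestination=%s\nGateway=%s\n' "$1" "$2"
}

# Node in datacenter A: reach the heartbeat subnets of B and C
# through the gateway of its own heartbeat network (VCHA-HB-A).
LOCAL_GW="192.168.10.1"   # hypothetical gateway on VCHA-HB-A

# Hypothetical subnets for VCHA-HB-B and VCHA-HB-C.
for remote in 192.168.20.0/24 192.168.30.0/24; do
    route_stanza "$remote" "$LOCAL_GW"
done
```

The nodes in datacenters B and C would need the mirror image: routes to the two other heartbeat subnets via their own local heartbeat gateway.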

Helpful documents:

FAQ: vCenter High Availability (2148003) - https://kb.vmware.com/s/article/2148003
Deploying vCenter High Availability with network addresses in separate subnets (2148442) - https://kb.vmware.com/kb/2148442
Required ports for VCHA 6.5 (52835) - https://kb.vmware.com/kb/52835
Errors "Failed to ssh connect peer node x.x.x.x" and "sshConnect Authentication (publickey) failed" while configuring vCenter HA (2150715) - https://kb.vmware.com/kb/2150715

Implementation Notes:

Note 1:

VMware KB 2148442 (Deploying vCenter High Availability with network addresses in separate subnets) is essential for deploying such a design, but one piece of information is missing there. After cloning the vCenter Server Appliance, you have to go to the passive node and configure eth0 with the same IP address that is used on the active node. The configuration is in the file /etc/systemd/network/10-eth0.network.manual
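A quick way to compare the eth0 address on the active and passive nodes is to print the Address= lines of that file on each of them. The sketch below assumes the file uses the usual systemd-networkd key=value layout. The sample content and the helper name networkd_addresses are hypothetical and not copied from a real appliance.

```shell
#!/bin/sh
# Sketch only: the file content below is a hypothetical sample.

# Print every Address= value found in a systemd-networkd style file.
networkd_addresses() {
    # $1 = path to the network file
    sed -n 's/^[[:space:]]*Address=//p' "$1"
}

# Build a throwaway sample file so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Match]
Name=eth0

[Network]
Address=10.0.4.10/24
Gateway=10.0.4.1
EOF

networkd_addresses "$sample"
rm -f "$sample"
```

On an appliance you would pass /etc/systemd/network/10-eth0.network.manual instead of the sample file and compare the result from both nodes.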

Note 2:

If the VCHA cluster is badly broken, use the following commands to destroy VCHA from the command line:

cd /etc/systemd/network

mv 10-eth0.network.manual 20-eth0.network

destroy-vcha

reboot

The solution was found at https://communities.vmware.com/thread/552084

Link to the official documentation (Resolving Failover Failures) - https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.avail.doc/GUID-FE5106A8-5FE7-4C38-91AA-D7140944002D.html

Note 3:

If you see the error message MethodFault.summary during the finalization process, a hostname mismatch has been detected. The hostname assigned to the Passive node must be the same as the hostname of the Active node. The solution was found at https://www.altaro.com/vmware/how-to-deploy-a-vcenter-ha-cluster-part-2/ and is also described in KB https://kb.vmware.com/kb/2148442

Friday, March 09, 2018

How to check I/O device on VMware HCL

The VMware Hardware Compatibility List (HCL) of supported I/O devices is available here:
https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io

VMware HCL for I/O devices

The best identification of an I/O device is the combination of VID (Vendor ID), DID (Device ID), SVID (Sub-Vendor ID), and SSID (Sub-Device ID). These four values can simply be entered into the VMware HCL to find out whether the device is supported and which capabilities have been tested. You can also find the supported firmware and driver.

To get these identifiers, log in to ESXi via SSH and use the command "vmkchdev -l". This command shows VID:DID SVID:SSID for PCI devices, and you can use grep to filter just the VMware NICs (aka vmnic):

vmkchdev -l | grep vmnic

You should get output similar to this:

[dpasek@esx01:~] vmkchdev -l | grep vmnic

0000:02:00.0 14e4:1657 103c:22be vmkernel vmnic0

0000:02:00.1 14e4:1657 103c:22be vmkernel vmnic1

0000:02:00.2 14e4:1657 103c:22be vmkernel vmnic2

0000:02:00.3 14e4:1657 103c:22be vmkernel vmnic3

0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4

0000:05:00.1 14e4:168e 103c:339d vmkernel vmnic5

0000:88:00.0 14e4:168e 103c:339d vmkernel vmnic6

0000:88:00.1 14e4:168e 103c:339d vmkernel vmnic7

So, in the case of vmnic4, the identifiers are
· VID:DID SVID:SSID
· 14e4:168e 103c:339d
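If you look up many devices, splitting the two ID pairs by hand gets tedious. Here is a minimal sketch that does it with awk, assuming the five-column layout shown in the output above. The function name hcl_ids is made up for this example, and the sample lines are copied from that output so the snippet runs without an ESXi host.

```shell
#!/bin/sh
# Print the four HCL identifiers for one device alias.
# Reads `vmkchdev -l` style lines on stdin; $1 = alias such as vmnic4.
hcl_ids() {
    awk -v dev="$1" '
        NF >= 5 && $5 == dev {
            split($2, a, ":")   # a[1] = VID,  a[2] = DID
            split($3, b, ":")   # b[1] = SVID, b[2] = SSID
            printf "VID=%s DID=%s SVID=%s SSID=%s\n", a[1], a[2], b[1], b[2]
        }'
}

# Sample lines from the output above.
hcl_ids vmnic4 <<'EOF'
0000:02:00.0 14e4:1657 103c:22be vmkernel vmnic0
0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4
EOF
```

On a real host the same function would be fed directly from the command: vmkchdev -l | hcl_ids vmnic4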

The same applies to HBAs and disk controllers. For HBA and local disk controllers use

vmkchdev -l | grep vmhba

This is the output from the Intel NUC in my home lab:

[root@esx02:~] vmkchdev -l | more

0000:00:00.0 8086:0a04 8086:2054 vmkernel

0000:00:02.0 8086:0a26 8086:2054 vmkernel

0000:00:03.0 8086:0a0c 8086:2054 vmkernel

0000:00:14.0 8086:9c31 8086:2054 vmkernel vmhba32

0000:00:16.0 8086:9c3a 8086:2054 vmkernel

0000:00:19.0 8086:1559 8086:2054 vmkernel vmnic0

0000:00:1b.0 8086:9c20 8086:2054 vmkernel

0000:00:1d.0 8086:9c26 8086:2054 vmkernel

0000:00:1f.0 8086:9c43 8086:2054 vmkernel

0000:00:1f.2 8086:9c03 8086:2054 vmkernel vmhba0

0000:00:1f.3 8086:9c22 8086:2054 vmkernel

vmhba0 is local disk controller
vmhba32 is USB storage controller
vmnic0 is network interface
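Most PCI devices in such a listing carry no alias at all. A small awk filter, sketched below under the same five-column assumption, keeps only the devices that have one (vmnicN, vmhbaN) and prints the alias together with its two ID pairs. The function name aliased_devices is made up for this example, and the sample lines are taken from the NUC output above.

```shell
#!/bin/sh
# List only devices that carry an alias, with their VID:DID and SVID:SSID pairs.
# Reads `vmkchdev -l` style lines on stdin.
aliased_devices() {
    awk 'NF >= 5 { print $5, $2, $3 }'
}

# Sample lines from the NUC output above.
aliased_devices <<'EOF'
0000:00:00.0 8086:0a04 8086:2054 vmkernel
0000:00:14.0 8086:9c31 8086:2054 vmkernel vmhba32
0000:00:19.0 8086:1559 8086:2054 vmkernel vmnic0
0000:00:1f.2 8086:9c03 8086:2054 vmkernel vmhba0
EOF
```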

Hope this helps.
