VMware vCenter High Availability (VCHA) is a very interesting feature included in vSphere 6.5. In general, it increases the availability of the vCenter service by running three vCenter nodes (Active, Passive, and Witness) that together serve a single vCenter service.
The official vCenter HA documentation says:
"vCenter High Availability (vCenter HA) protects vCenter Server Appliance against host and hardware failures. The active-passive architecture of the solution can also help you reduce downtime significantly when you patch vCenter Server Appliance. After some network configuration, you create a three-node cluster that contains Active, Passive, and Witness nodes. Different configuration paths are available."
The point about reducing downtime during patching is very true. The simplest VCHA deployment stays within a single SSO domain and a single datacenter, with two Layer 2 networks: one for management and a second one for the heartbeat. Such a design can be deployed in a fully automated manner; you just need to provide a dedicated network (portgroup/VLAN) for the heartbeat and three IP addresses from a separate heartbeat subnet. Easy. But is that what you expect from vCenter HA? To be honest, the much more attractive use case is to spread the vCenter HA nodes across three datacenters, so that vSphere management stays up and running even if one of the two main datacenters experiences a problem. Conceptually, it is depicted in the figure below.
Figure: Conceptual vCenter HA Design
In this particular concept, I use embedded PSCs (Platform Services Controllers) for simplicity, and because vCenter HA can then increase the availability of the PSC services as well. The most interesting challenge in this concept is networking, so let's look at the intended logical network design.
Networking logical design:
Figure: vCenter HA - networking logical design
- Each vCenter Server Appliance node has two NICs
- One NIC is connected to the management network and the second NIC to the heartbeat network
- The Layer 2 management network (VLAN 4) is stretched across datacenters A and B because the vCenter IP address must keep working in datacenter B, without human intervention, after a VCHA failover.
- Each datacenter has an independent heartbeat network (VCHA-HB-A, VCHA-HB-B, VCHA-HB-C) with its own IP subnet, so Layer 2 is not stretched across datacenters, and especially not to datacenter C, where the witness resides. This requires specific static routes on each vCenter Server Appliance node to provide IP reachability over the heartbeat networks.
- The VCHA-specific TCP/UDP ports must be allowed between the VCHA nodes across the heartbeat networks (see KB 52835 below).
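As a minimal sketch of those static routes, the hypothetical shell function below prints systemd-networkd [Route] sections (Destination= and Gateway= keys) that make the two remote heartbeat subnets reachable from one node via its local heartbeat gateway. It assumes the appliance reads its network configuration from /etc/systemd/network (as the eth0 file mentioned in Note 1 suggests); the function name, subnets, and gateway are illustrative values, not part of the original design. Check KB 2148442 for where the routes belong on each node before applying anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print systemd-networkd [Route] sections that make the
# remote heartbeat subnets reachable from one VCHA node via its local gateway.
# All addresses used below are example values, not taken from a real design.

# Usage: vcha_hb_routes <local-heartbeat-gateway> <remote-heartbeat-subnet>...
vcha_hb_routes() {
  local gateway="$1"
  shift
  local subnet
  for subnet in "$@"; do
    # One [Route] section per remote heartbeat subnet
    printf '[Route]\nDestination=%s\nGateway=%s\n\n' "$subnet" "$gateway"
  done
}

# Example: a node in datacenter A (assumed VCHA-HB-A = 192.168.10.0/24,
# gateway 192.168.10.1) needs routes to VCHA-HB-B and VCHA-HB-C.
vcha_hb_routes 192.168.10.1 192.168.20.0/24 192.168.30.0/24
```

Each node gets the same treatment with its own gateway and the two subnets it does not sit in; printing rather than editing files keeps the sketch safe to run and easy to review.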
Helpful documents:
- FAQ: vCenter High Availability (2148003) - https://kb.vmware.com/s/article/2148003
- Deploying vCenter High Availability with network addresses in separate subnets (2148442) - https://kb.vmware.com/kb/2148442
- Required ports for VCHA 6.5 (52835) - https://kb.vmware.com/kb/52835
- Errors "Failed to ssh connect peer node x.x.x.x" and "sshConnect Authentication (publickey) failed" while configuring vCenter HA (2150715) - https://kb.vmware.com/kb/2150715
Implementation Notes:
Note 1:
VMware KB 2148442 (Deploying vCenter High Availability with network addresses in separate subnets) is very important for deploying such a design, but one piece of information is missing from it. After cloning the vCenter Server Appliances, you have to go to the passive node and configure eth0 with the same IP address that you use on the active node. The configuration is in the file /etc/systemd/network/10-eth0.network.manual
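A minimal sketch of that change, assuming the file uses a standard systemd-networkd Address= line and that GNU sed is available on the appliance: the hypothetical function below backs up the file and rewrites Address= with the active node's management IP. The function name and the example address are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the Passive node: set the eth0 address in the
# systemd-networkd file to the Active node's management IP, keeping a backup.
set -euo pipefail

# Usage: set_eth0_address <network-file> <address/prefix>
set_eth0_address() {
  local file="$1" addr="$2"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  cp -p "$file" "$file.bak"                         # keep the original as .bak
  sed -i "s|^Address=.*|Address=${addr}|" "$file"   # GNU sed: replace Address= line(s)
  grep -q "^Address=${addr}\$" "$file"              # non-zero status if nothing was set
}

# Example (the address is a placeholder for the Active node's management IP):
# set_eth0_address /etc/systemd/network/10-eth0.network.manual 10.0.4.10/24
```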
Note 2:
If the VCHA cluster is badly broken, use the following commands to destroy VCHA from the command line:
cd /etc/systemd/network
mv 10-eth0.network.manual 20-eth0.network
destroy-vcha
reboot
Note 3:
If you see the error message "MethodFault.summary" during the finalization process, it is because a hostname mismatch was detected. The hostname assigned to the Passive node must be the same as the hostname of the Active node. I found the solution at https://www.altaro.com/vmware/how-to-deploy-a-vcenter-ha-cluster-part-2/, but it is also described in KB https://kb.vmware.com/kb/2148442
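To catch this before finalization, a quick check on the Passive node can compare its hostname with the Active node's. The hypothetical function below is a minimal sketch, assuming bash 4 or later (for the lowercase expansion) and that `hostname -f` returns the node's FQDN; the comparison is case-insensitive because DNS names are.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: does this node's hostname match the Active node's?
# A mismatch is what triggers "MethodFault.summary" during finalization.

# Usage: hostname_matches <active-node-fqdn> [<this-node-fqdn>]
# The second argument defaults to this machine's FQDN (hostname -f).
hostname_matches() {
  local active="${1,,}"                 # lowercase: DNS names are case-insensitive
  local current="${2:-$(hostname -f)}"
  current="${current,,}"
  if [ "$active" = "$current" ]; then
    echo "OK: hostname '$current' matches the Active node"
  else
    echo "MISMATCH: this node is '$current', Active node is '$active'" >&2
    return 1
  fi
}

# Example (placeholder FQDN): hostname_matches vcenter.example.com
```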