I have received the following message in my mailbox ...
Hi.
I have a customer that has been testing Force10 VLT with peer routing and VMware, and has encountered the following warning message on all hosts during failover of the switches (S4810s), but only when the primary VLT node fails:
“vSphere HA agent on this host could not reach isolation address 10.100.0.1”
Does this impact HA at all? Is there a solution?
Thanks
Paul
Force10 is the legacy product name of Dell's S-series datacenter networking. Force10 S4810s are datacenter L3 switches. If you don't know what Force10 VLT is, look here. Generally, it is something like Cisco virtual Port Channel (vPC), Juniper MC-LAG, Arista MLAG, etc.
I think my answer can be valuable for the broader networking and virtualization community, so here it is ...
First of all, let's make some assumptions:
- Force10 VLT is used for multi-chassis LAG capability
- Force10 VLT peer routing is enabled in the VLT domain to achieve L3 routing redundancy
- 10.100.0.1 is the IP address of the VLAN interface on the Force10 S4810 primary VLT node, and this particular VLAN is used for vSphere management
- 10.100.0.2 is the IP address of the same VLAN interface on the Force10 S4810 secondary VLT node
- vSphere 5.x or above is used
Root cause with explanation:
When the primary Force10 VLT node is down, ping to 10.100.0.1 doesn't work because peer routing is an ARP proxy at L2. The secondary node forwards and routes traffic on behalf of the primary node at L2, but 10.100.0.1 itself no longer answers at L3, therefore ICMP to that address doesn't work.
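You can verify this behaviour with a quick reachability check like the sketch below (a hypothetical helper script, assuming Linux-style ping options and the IP addresses from the assumptions above). Run it before and during the primary VLT node failover: 10.100.0.1 stops answering ICMP while 10.100.0.2 keeps answering.

```python
#!/usr/bin/env python3
# Quick ICMP reachability check of both VLT node VLAN IP addresses.
# Assumes Linux-style ping options (-c count, -W timeout in seconds).
import subprocess

VLT_NODE_IPS = ["10.100.0.1", "10.100.0.2"]  # primary / secondary VLT node VLAN interfaces

def is_pingable(ip, count=2, timeout_s=2):
    """Return True when the address answers ICMP echo requests."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    for ip in VLT_NODE_IPS:
        state = "answers ICMP" if is_pingable(ip) else "does NOT answer ICMP"
        print(f"{ip}: {state}")
```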
A VMware HA cluster (vSphere 5 and above) uses network and storage heartbeat mechanisms. The network mechanism uses the two probe algorithms listed below.
1. ESXi hosts in the cluster send heartbeats to each other. This should keep working during a primary VLT node failure.
2. ESXi hosts also ping the HA isolation addresses (the default HA isolation address is the default gateway, therefore 10.100.0.1 in your particular case). This doesn't work during a primary VLT node failure.
That's the reason the VMware HA cluster logs this message.
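To illustrate why this is only informative, here is a simplified model of how the two probes combine (just a Python illustration, not VMware's actual implementation): a host declares itself isolated only when both probes fail at the same time.

```python
# Simplified model of the two network probes described above (illustration only,
# not VMware's actual implementation).

def host_is_isolated(heartbeats_received, isolation_addresses_reachable):
    """A host declares itself isolated only when BOTH probes fail:
    no heartbeats from other cluster members AND no isolation address answers."""
    return (not heartbeats_received) and (not any(isolation_addresses_reachable))

# Primary VLT node failed: heartbeats between ESXi hosts still work (probe 1),
# but the default isolation address 10.100.0.1 does not answer (probe 2).
print(host_is_isolated(True, [False]))    # False -> only a warning is logged

# Real isolation would require both probes to fail at the same time.
print(host_is_isolated(False, [False]))   # True -> isolation response would kick in
```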
Is there any impact?
There is no impact on the HA cluster because:
- It is just an informative message; algorithm (1) works correctly and there is still network visibility among the ESXi hosts in the cluster.
- From vSphere 5 and above there is also a storage (datastore) heartbeat mechanism which can compensate for lost network visibility among the ESXi hosts in the cluster.
Are there any potential improvements?
Yes, there are. You can configure multiple HA isolation addresses to mitigate default gateway unavailability. In your particular case I would recommend using two IP addresses (10.100.0.1 and 10.100.0.2), because at least one VLT node will always be available.
For more information on how to configure multiple HA isolation addresses, look at http://kb.vmware.com/kb/1002117
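If you prefer to script the change, the sketch below sets the relevant advanced options (das.isolationaddress0, das.isolationaddress1 and das.usedefaultisolationaddress) with pyVmomi. The vCenter address, credentials and cluster name are placeholders; the same settings can of course be made manually in the vSphere Client as described in the KB article.

```python
#!/usr/bin/env python3
# Sketch: configure multiple HA isolation addresses on a cluster via pyVmomi.
# vCenter address, credentials and the cluster name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_cluster(content, name):
    """Walk the inventory and return the cluster object with the given name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    try:
        for cluster in view.view:
            if cluster.name == name:
                return cluster
    finally:
        view.Destroy()
    return None

ctx = ssl._create_unverified_context()  # lab only; verify certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
try:
    cluster = find_cluster(si.RetrieveContent(), "Cluster01")
    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo()
    spec.dasConfig.option = [
        # don't rely only on the default gateway; use the two addresses listed below
        vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
        vim.option.OptionValue(key="das.isolationaddress0", value="10.100.0.1"),  # primary VLT node
        vim.option.OptionValue(key="das.isolationaddress1", value="10.100.0.2"),  # secondary VLT node
    ]
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    # In a real script you would wait for the returned task to complete.
finally:
    Disconnect(si)
```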