I work as a VMware HCI Specialist, so I do a lot of vSAN testing and demonstrations in my home lab. The only reasonable way to test and demonstrate different vSAN configurations and topologies effectively is to run vSAN in a nested environment. Thanks to nested virtualization, I can build any type of vSAN cluster quickly and easily.
Recently I ran into an issue in a 3-node (nested) vSAN cluster. The vSAN datastore showed the capacity of only a single node instead of three, and the hosts reported the error message "Host cannot communicate with one or more other nodes in the vSAN enabled cluster".
My first thought was a networking issue, but ping between the nodes worked fine, so it was not a physical network problem. This is a lab environment, so all services (mgmt, vMotion, vSAN) are enabled on a single VMKNIC (vmknic0), which keeps everything pretty straightforward.
So what's the problem?
I did some Google searching and found that other people had seen the same error message when they had problems with vSAN unicast agents.
Here is the command to list the unicast agents on a vSAN node:
esxcli vsan cluster unicastagent list
I ran it in my environment.
Grrrr. The list is empty!!!! It is empty on all ESXi hosts in my 3-node vSAN cluster.
Let's try to configure it manually.
Each vSAN node should have a connection to the agents on all other vSAN nodes in the cluster.
For example, a vSAN node in a 4-node vSAN cluster should have 3 connections:
[root@n-esx04:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port   Iface Name  Cert Thumbprint
------------------------------------  ---------  ----------------  --------------  -----  ----------  -----------------------------------------------------------
5e3ec640-c033-7c7d-888f-00505692f54d  0          true              192.168.11.105  12321              18:F3:B7:9F:66:C4:C4:3E:0F:7D:69:BB:55:92:BC:A3:AC:E4:DD:5F
5df792b0-f49f-6d76-45af-005056a89963  0          true              192.168.11.107  12321              20:4C:C1:48:F5:2D:04:16:55:F1:D3:F1:4C:26:B5:C4:23:E5:B4:12
5e3e467a-1c1b-f803-3d0f-00505692ddc7  0          true              192.168.11.106  12321              53:99:00:B8:9D:1A:97:42:C0:10:C0:AF:8C:AD:91:59:22:8E:C9:79
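The "N-1 connections" rule can be checked mechanically. What follows is only a minimal sketch, not an official tool: the function names count_unicast_agents and check_unicast_agents are made up for this example, and it assumes awk is available in the shell where you run it. It counts the rows of the unicastagent list output that start with a node UUID and compares that count with the number of expected peers.

```shell
#!/bin/sh
# Count peer entries in `esxcli vsan cluster unicastagent list` output (stdin).
# Data rows are recognised by a leading 36-character UUID, so the header and
# the dashed separator line are skipped.
count_unicast_agents() {
    awk '$1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ && length($1) == 36 { n++ }
         END { print n + 0 }'
}

# Compare the number of listed peers with the expected N-1.
# $1 = total number of nodes in the cluster (hypothetical parameter).
check_unicast_agents() {
    expected=$(( $1 - 1 ))
    actual=$(count_unicast_agents)
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual of $expected peers listed"
    else
        echo "MISMATCH: $actual of $expected peers listed"
        return 1
    fi
}
```

On a node of a 4-node cluster it could be used as: esxcli vsan cluster unicastagent list | check_unicast_agents 4. With an empty list, as in my broken 3-node cluster, it reports a mismatch.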
We need to get the local UUID of each cluster node.
[root@n-esx08:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-02-11T08:32:55Z
Local Node UUID: 5df792b0-f49f-6d76-45af-005056a89963
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5df792b0-f49f-6d76-45af-005056a89963
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 52c99c6b-6b7a-3e67-4430-4c0aeb96f3f4
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 5df792b0-f49f-6d76-45af-005056a89963
Sub-Cluster Member HostNames: n-esx08.home.uw.cz
Sub-Cluster Membership UUID: f8d4415e-aca5-a597-636d-005056997c1d
Unicast Mode Enabled: true
Maintenance Mode State: ON
Config Generation: 7ef88f9d-a402-48e3-8d3f-2c33f951fce1 6 2020-02-10T21:58:16.349
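If you only care about one or two values from this output, they can be extracted with a small filter. This is a sketch under assumptions: vsan_cluster_field is a hypothetical helper name, and it expects the "Label: value" layout shown in the output above.

```shell
#!/bin/sh
# Print the value of one field from `esxcli vsan cluster get` output (stdin).
# $1 = exact field label, for example "Local Node UUID".
vsan_cluster_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            prefix = key ": "
            if (!found && index(line, prefix) == 1) {
                print substr(line, length(prefix) + 1)
                found = 1                     # report the first match only
            }
        }'
}
```

For example: esxcli vsan cluster get | vsan_cluster_field "Sub-Cluster Member Count". A member count of 1 on a host that should be part of a 3-node cluster means that the host sits in a partition of its own.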
So here are my nodes
n-esx08 - 192.168.11.108 - 5df792b0-f49f-6d76-45af-005056a89963
n-esx09 - 192.168.11.109 - 5df792b0-f49f-6d76-45af-005056a89963
n-esx10 - 192.168.11.110 - 5df792b0-f49f-6d76-45af-005056a89963
And now the problem is clear. All vSAN nodes have the same UUID.
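With three hosts a duplicate is easy to spot by eye, but with more hosts a small filter helps. The following is a minimal sketch: find_duplicate_uuids is a hypothetical name, and the "hostname uuid" input format is an assumption of this example, not something esxcli produces by itself.

```shell
#!/bin/sh
# Read "hostname uuid" pairs on stdin and print every UUID that is used by
# more than one host, followed by a comma-separated list of those hosts.
find_duplicate_uuids() {
    awk '
        NF >= 2 {
            count[$2]++
            if ($2 in hosts)
                hosts[$2] = hosts[$2] "," $1
            else
                hosts[$2] = $1
        }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u, hosts[u]
        }'
}
```

Assuming SSH is enabled on the hosts, the input could be collected from an admin workstation like this: for h in n-esx08 n-esx09 n-esx10; do echo "$h $(ssh root@$h esxcli system uuid get)"; done | find_duplicate_uuids. No output means that all UUIDs are unique.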
Why? Let's check ESXi system UUIDs on each ESXi host.
[root@n-esx08:~] esxcli system uuid get
5df792b0-f49f-6d76-45af-005056a89963
[root@n-esx08:~]
[root@n-esx09:~] esxcli system uuid get
5df792b0-f49f-6d76-45af-005056a89963
[root@n-esx09:~]
[root@n-esx10:~] esxcli system uuid get
5df792b0-f49f-6d76-45af-005056a89963
[root@n-esx10:~]
Note: if you want to check the UUIDs of all ESXi hosts at once, use the following PowerCLI:
Get-VMHost | Select Name,
  @{N='HW BIOS Uuid';E={$_.Extensiondata.Hardware.SystemInfo.Uuid}},
  @{N='ESXi System UUid';E={(Get-Esxcli -VMHost $_).system.uuid.get()}}
So the root cause is obvious.
I use nested ESXi hosts to test vSAN, and I forgot to regenerate the system UUID after cloning them. The solution is easy: just delete the UUID from /etc/vmware/esx.conf and restart the ESXi hosts.
Figure: ESXi system UUID in /etc/vmware/esx.conf
You can do it from the command line as well:
sed -i 's#/system/uuid.*##' /etc/vmware/esx.conf
reboot
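Before editing the real file, it may be worth looking at the entry first and keeping a backup. This is a sketch under assumptions: preview_system_uuid and strip_system_uuid are hypothetical helper names, the .bak suffix is arbitrary, and the expression matches the whole /system/uuid line including its leading slash, so the entry is blanked out completely.

```shell
#!/bin/sh
# Show the /system/uuid entry (with its line number) without changing anything.
# $1 = path to an esx.conf-style file.
preview_system_uuid() {
    grep -n '^/system/uuid' "$1"
}

# Blank out the /system/uuid entry in the given file.
# A backup copy is written next to it first.
strip_system_uuid() {
    conf="$1"
    cp "$conf" "$conf.bak"
    sed -i 's#^/system/uuid.*##' "$conf"
}
```

On a cloned host the sequence would be preview_system_uuid /etc/vmware/esx.conf, then strip_system_uuid /etc/vmware/esx.conf, and finally the reboot. Trying it on a scratch copy of the file first costs nothing.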
So we have identified the problem and we are done. After the ESXi hosts restart, the UUIDs of the vSAN cluster nodes are regenerated automatically, and the vSAN unicast agents are configured automatically on the vSAN nodes as well.
However, if you are interested in how to manually add a connection to a unicast agent on a particular node, you would execute the following command:
esxcli vsan cluster unicastagent add -a [ip address unicast agent] -U [supports unicast] -u [node UUID of the unicast agent] -t [type]
Anyway, such manual configuration should not be necessary, and you should do it only when instructed by VMware support.
I hope this helps someone else in the VMware community.
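For illustration only, here is how the add commands for all peers of one node could be generated. This is a hedged sketch: print_unicastagent_add_cmds is a hypothetical name, the "ip uuid" input format is my own choice, the values "-t node" and "-U true" are assumptions that fit regular (non-witness) data nodes, and port 12321 is taken from the list output shown earlier. The function only prints the commands and never runs them.

```shell
#!/bin/sh
# Read "ip uuid" pairs (one peer per line) on stdin and print, without
# executing anything, one `unicastagent add` command per peer.
print_unicastagent_add_cmds() {
    while read -r ip uuid; do
        # Skip blank or incomplete lines.
        [ -n "$ip" ] && [ -n "$uuid" ] || continue
        echo "esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
    done
}
```

Feeding it the line "192.168.11.105 5e3ec640-c033-7c7d-888f-00505692f54d" prints a single command for that peer. Review the printed commands before pasting anything into an ESXi shell.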