Thursday, July 03, 2025

VMwareOpsGuide.com has been retired

I'm an architect and designer, not involved in day-to-day operations, but I firmly believe that any system architecture must be thoughtfully designed for efficient operations; otherwise, the Ops team will go mad in no time.

Over the years, I’ve been learning a lot from the book VMware Operations Management by Iwan E1 Rahabok, which covers everything related to vROps, Aria Operations, and now VCF Operations.

Sunday, June 15, 2025

Veeam Backup & Replication on Linux v13 [Beta]

I have finally found some spare time and decided to test Veeam Backup & Replication on Linux v13 [Beta] in my home lab. It is a BETA, so it is good to test it and be prepared for the final release, even though anything can still change before it is available.

It is clearly stated that updating or upgrading the beta to newer versions will not be possible, but I'm really curious how Veeam's transition from Windows to Linux is going.

Anyway, let's test it and get a feeling for Veeam's future with Linux-based systems.

Saturday, June 14, 2025

PureStorage has 150TB DirectFlash Modules

I have just realized that PureStorage has 150TB DirectFlash Modules.

That got me thinking. 

Flash capacity is increasing year by year. What are performance/capacity ratios?

The reason I'm thinking about it is that a poor Tech Designer (like me) needs some rule-of-thumb numbers for capacity/performance planning and sizing.
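
As a back-of-the-envelope sketch of what I mean by such a ratio (the IOPS figure below is purely a made-up assumption for illustration, not a Pure Storage spec), the number I'm after is simply device performance divided by device capacity:

# Hypothetical rule-of-thumb calculation - assume a single 150 TB module delivered ~200,000 IOPS
IOPS=200000
CAPACITY_TB=150
echo "$((IOPS / CAPACITY_TB)) IOPS per TB"   # ~1333 IOPS/TB with these assumed numbers

If capacity keeps growing faster than per-module performance, this IOPS-per-TB ratio shrinks, and that is exactly the kind of rule-of-thumb number a designer has to watch during sizing.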

Virtual NIC Link Speed - is it really speed?

This will be a quick blog post, prompted by another question I received about VMware virtual NIC link speed. I’d like to demonstrate that the virtual link speed shown in the guest operating system is merely a reported value, not an actual limit on throughput.

I have two Linux Mint (Debian-based) systems, mlin01 and mlin02, virtualized in VMware vSphere 8.0.3. Each system has a VMXNET3 NIC. Both virtual machines are hosted on the same ESXi host, so they are not constrained by the physical network. Let's test the network bandwidth between these two systems with iperf.
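
Here is a minimal sketch of the test, using iperf3 and assuming the guest NIC shows up as eth0 and that mlin02 is reachable from mlin01 by hostname (adjust the interface name and address to your lab):

# On mlin01 - check the link speed the guest OS reports for the VMXNET3 NIC
ethtool eth0 | grep Speed

# On mlin02 - start the iperf3 server
iperf3 -s

# On mlin01 - run the iperf3 client against mlin02 for 30 seconds
iperf3 -c mlin02 -t 30

With both VMs on the same ESXi host, the measured throughput can easily exceed the link speed reported by the guest, which is exactly the point of this post.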

Tuesday, June 03, 2025

How to troubleshoot virtual disk high latencies in VMware Virtual Machine

In VMware vSphere environments, even the most critical business applications are often virtualized. Occasionally, application owners may report high disk latency issues. However, disk I/O latency can be a complex topic because it depends on several factors, such as the size of the I/O operations, whether the I/O is a read or a write and in which ratio, and of course, the performance of the underlying storage subsystem. 

One of the most challenging aspects of any storage troubleshooting is understanding what size of I/O workload is being generated by the virtual machine. Storage workload I/O size is a significant factor in response time; a 4 KB I/O and a 1 MB I/O have very different response times. Here are examples from my vSAN ESA performance testing.

  • 32k IO, 100% read, 100% random - Read Latency: 2.03 ms Write Latency: 0.00 ms
  • 32k IO, 100% write, 100% random - Read Latency: 0.00 ms Write Latency: 1.74 ms
  • 32k IO, 70% read - 30% write, 100% random - Read Latency: 1.55 ms Write Latency: 1.99 ms
  • 1024k IO, 100% read, 100% sequential - Read Latency: 6.38 ms Write Latency: 0.00 ms
  • 1024k IO, 100% write, 100% sequential - Read Latency: 0.00 ms Write Latency: 8.30 ms
  • 1024k IO, 70% read - 30% write, 100% sequential - Read Latency: 5.38 ms Write Latency: 8.68 ms

You can see that response times vary based on the storage workload profile. However, application owners very often do not know what the storage profile of their application workload is, and they just complain that the storage is slow.

As one storage expert (I think it was Howard Marks [1] [2]) once said, there are only two types of storage performance - good enough and not good enough.

Fortunately, on an ESXi host, we have a useful tool called vscsiStats. We have to know on which ESXi host the VM is running and SSH into that particular ESXi host.

The vSCSI monitoring procedure is

  1. List all running virtual machines on the particular ESXi host and identify our Virtual Machine and its identifiers (worldGroupID and Virtual SCSI Disk handleID)
  2. Start vSCSI statistics collection on the ESXi host
  3. Collect vSCSI statistics histogram data
  4. Stop vSCSI statistics collection

The procedure is documented in VMware KB - Using vscsiStats to collect IO and Latency stats on Virtual Disks 

Let's test it in the lab.
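
Here is a sketch of the commands from the KB as I use them; the worldGroupID (335998) and handleID (8199) below are just placeholder values for illustration:

# 1. List running virtual machines and their virtual SCSI disk handles (note worldGroupID and handleID)
vscsiStats -l

# 2. Start collection for the VM's world group, optionally limited to a single virtual disk handle
vscsiStats -s -w 335998 -i 8199

# 3. After letting the workload run for a while, print the I/O length and latency histograms
vscsiStats -p ioLength -w 335998 -i 8199
vscsiStats -p latency -w 335998 -i 8199

# 4. Stop the collection
vscsiStats -x -w 335998

The ioLength histogram answers the key question above (what I/O sizes the virtual machine is actually generating), and the latency histogram shows how the underlying storage responds to them.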

Monday, May 19, 2025

Are you looking for VMware SRM and cannot find it?

Here is what happened with VMware Site Recovery Manager. It was repackaged into VMware Live Recovery.

What is VMware Live Recovery?

VMware Live Recovery is the latest version of disaster and ransomware recovery from VMware. It combines VMware Live Site Recovery (previously Site Recovery Manager) with VMware Live Cyber Recovery (previously VMware Cloud Disaster Recovery) under a single shared management console and a single license. Customers can protect applications and data from modern ransomware and other disasters across VMware Cloud Foundation environments on-premises and in public clouds with flexible licensing for changing business needs and threats. 

For more details see the VMware Live Recovery FAQ and the VMware Live Recovery resource page.

In this blog post I will just copy information from the Site Recovery Manager FAQ PDF, because that's what good old on-prem SRM is, and it is good to have it in HTML form in case the Broadcom/VMware PDF disappears for whatever reason.

Here you have it ...

Thursday, May 15, 2025

How to run IPERF on ESXi host?

iperf is a great tool to test network throughput. There is an iperf3 binary on the ESXi host, but there are restrictions and you cannot run it directly.

Here is the trick.

First of all, you have to set the ESXi advanced option execInstalledOnly to 0. This enables you to run executable binaries which were not preinstalled by VMware.

The second step is to make a copy of the iperf binary, because the installed version is restricted and cannot be run.

The third step is to disable the ESXi firewall to allow cross-ESXi communication between the iperf client and the iperf server.

After finishing the performance testing, you should clean up the ESXi environment:

  • delete your copy of iperf
  • re-enable ESXi firewall to allow only required tcp/udp ports for ESXi services
  • re-enable ESXi advanced option (execInstalledOnly=1) to keep ESXi hypervisor secure by default

ESXi Commands 

# Allow executing binaries which are not part of the base installation
localcli system settings advanced set -o /User/execInstalledOnly -i 0
 
# Make a copy of iperf
cp /usr/lib/vmware/vsan/bin/iperf3 /usr/lib/vmware/vsan/bin/iperf3.copy
 
# Disable firewall
esxcli network firewall set --enabled false

# Run iperf server
./iperf3.copy -s -B 192.168.123.22

# Run iperf client (typically on another ESXi host than the one where the iperf server is running)
./iperf3.copy -c 192.168.123.22

After iperf benchmarking, you should re-enable the firewall and disallow execution of binaries which are not part of the base installation.
 
# Cleaning
rm /usr/lib/vmware/vsan/bin/iperf3.copy
esxcli network firewall set --enabled true
localcli system settings advanced set -o /User/execInstalledOnly -i 1
 

Wednesday, May 14, 2025

Test Jumbo Frames (MTU 9000) between ESXi hosts

When we want to enable Jumbo Frames on VMware vSphere, they must be enabled on all of the following layers (a quick ESXi-side check is sketched after the list):

  • physical switches
  • virtual switches - VMware Distributed Switch (VDS) or VMware Standard Switch (VSS)
  • VMkernel interfaces where you would like to use Jumbo-Frames (typically NFS, iSCSI, NVMeoF, vSAN, vMotion)
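
A minimal way to verify the ESXi side of this from the command line (a sketch; your vmk and switch names will differ) is:

# Check MTU on VMkernel interfaces (the vMotion vmk should show MTU 9000)
esxcli network ip interface list

# Check MTU on a VMware Standard Switch
esxcli network vswitch standard list

# Check MTU on a VMware Distributed Switch
esxcli network vswitch dvs vmware list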

Let's assume it is configured by the network and vSphere administrators, and we want to validate that the vMotion network between two ESXi hosts supports Jumbo Frames. Let's say we have these two ESXi hosts:

  • ESX11 has IP address 10.160.22.111 on vMotion vmk interface within vMotion TCP/IP stack. 
  • ESX12 has IP address 10.160.22.112 on vMotion vmk interface within vMotion TCP/IP stack.

Ping is a good network diagnostic tool for this purpose. It uses the ICMP protocol within the IP protocol. So, what is the maximum size of an IP/ICMP packet? With Jumbo Frames we have 9000 bytes of Layer 2 (Ethernet frame) payload, and within the Ethernet frame is an IP packet carrying an ICMP Echo Request. So, here is the calculation.
9000 (MTU) - 20 (IP header) - 8 (ICMP header) = 8972 bytes
 
Let's do a ping with an 8972-byte payload and with the flag -d (fragmentation disabled).

[root@esx11:~] ping -I vmk1 -S vmotion -s 8972 -d 10.160.22.112
PING 10.160.22.112 (10.160.22.112): 8972 data bytes
8980 bytes from 10.160.22.112: icmp_seq=0 ttl=64 time=0.770 ms
8980 bytes from 10.160.22.112: icmp_seq=1 ttl=64 time=0.637 ms
8980 bytes from 10.160.22.112: icmp_seq=2 ttl=64 time=0.719 ms

We can see a successful test of large ICMP packets without fragmentation. We validated that ICMP packets with a size of 8972 bytes can be transferred over the network without fragmentation. That is an indication that Jumbo Frames (MTU 9000) are enabled end-to-end.

Now let's try to carry ICMP packets with a size of 8973 bytes.

[root@esx11:~] ping -I vmk1 -S vmotion -s 8973 -d 10.160.22.112
PING 10.160.22.112 (10.160.22.112): 8973 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)

We can see that ICMP packets with a size of 8973 bytes cannot be transferred over the network without fragmentation. This is the expected behavior, and it proves that we know what we are doing.