Tuesday, June 03, 2025

How to troubleshoot high virtual disk latency in a VMware virtual machine

In VMware vSphere environments, even the most critical business applications are often virtualized, and application owners occasionally report high disk latency. Disk I/O latency is a complex topic, however, because it depends on several factors: the size of the I/O operations, the ratio of reads to writes, and, of course, the performance of the underlying storage subsystem.

One of the most challenging aspects of any storage troubleshooting is understanding what I/O size the virtual machine generates. I/O size is a significant factor in response time: a 4 KB I/O and a 1 MB I/O have very different response times. Here are examples from my vSAN ESA performance testing:

  • 32 KB I/O, 100% read, 100% random: read latency 2.03 ms, write latency 0.00 ms
  • 32 KB I/O, 100% write, 100% random: read latency 0.00 ms, write latency 1.74 ms
  • 32 KB I/O, 70% read / 30% write, 100% random: read latency 1.55 ms, write latency 1.99 ms
  • 1024 KB I/O, 100% read, 100% sequential: read latency 6.38 ms, write latency 0.00 ms
  • 1024 KB I/O, 100% write, 100% sequential: read latency 0.00 ms, write latency 8.30 ms
  • 1024 KB I/O, 70% read / 30% write, 100% sequential: read latency 5.38 ms, write latency 8.68 ms
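The results are easier to compare once each test is reduced to one number. The Python sketch below is my own illustration, not part of the test tooling: it weights read and write latency by the read/write ratio to get a blended latency per I/O, then divides by the I/O size to get latency per KB. Latency per KB is only a normalization for comparison here, not a standard vSAN metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestResult:
    label: str
    io_kb: int            # I/O size in KB
    read_ms: float        # average read latency in ms
    write_ms: float       # average write latency in ms
    read_fraction: float  # share of I/Os that are reads (0.0 to 1.0)

    def blended_ms(self) -> float:
        """Average latency per I/O, weighted by the read/write ratio."""
        return self.read_fraction * self.read_ms + (1.0 - self.read_fraction) * self.write_ms

    def ms_per_kb(self) -> float:
        """Blended latency divided by I/O size (a normalization for comparison only)."""
        return self.blended_ms() / self.io_kb


# Values copied from the vSAN ESA test results listed above.
RESULTS: List[TestResult] = [
    TestResult("32 KB, 100% read, random", 32, 2.03, 0.00, 1.0),
    TestResult("32 KB, 100% write, random", 32, 0.00, 1.74, 0.0),
    TestResult("32 KB, 70/30 read/write, random", 32, 1.55, 1.99, 0.7),
    TestResult("1024 KB, 100% read, sequential", 1024, 6.38, 0.00, 1.0),
    TestResult("1024 KB, 100% write, sequential", 1024, 0.00, 8.30, 0.0),
    TestResult("1024 KB, 70/30 read/write, sequential", 1024, 5.38, 8.68, 0.7),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.label:<40} {r.blended_ms():5.2f} ms  {r.ms_per_kb():.4f} ms/KB")
```

For the 32 KB 70/30 mix, the blended latency is 0.7 * 1.55 + 0.3 * 1.99, about 1.68 ms. The 1 MB tests show roughly three to five times higher latency per I/O than the 32 KB tests, but roughly seven to ten times lower latency per KB, which is why a higher latency number on its own does not prove that the storage is slower.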

As you can see, response times vary with the storage profile. However, application owners often do not know the storage profile of their application workload and simply complain that the storage is slow.

As one storage expert (I think it was Howard Marks [1] [2]) once said, there are only two types of storage performance: good enough and not good enough.

Fortunately, ESXi includes a useful tool called vscsiStats. To use it, we need to know which ESXi host the VM is running on and SSH into that host.

The vSCSI monitoring procedure is:

  1. List all running virtual machines on the ESXi host and identify our virtual machine and its identifiers (worldGroupID and Virtual SCSI Disk handleID)
  2. Start vSCSI statistics collection on the ESXi host
  3. Collect vSCSI statistics histogram data
  4. Stop vSCSI statistics collection
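The four steps map onto a handful of vscsiStats options: `-l` lists VMs, `-s` starts collection, `-p` prints histograms (with `-c` for comma-separated output) and `-x` stops collection, while `-w` selects the worldGroupID and `-i` the disk handleID. As a minimal sketch, the Python helpers below only build and print those command lines; they do not execute anything. The helper names are my own, the IDs in the example are hypothetical, and it is worth confirming the flags with `vscsiStats -h` on your ESXi build.

```python
import shlex
from typing import List, Optional

VSCSISTATS = "vscsiStats"


def _target(world_group_id: int, handle_id: Optional[int]) -> List[str]:
    """Select a VM by worldGroupID and, optionally, one virtual disk by handleID."""
    args = ["-w", str(world_group_id)]
    if handle_id is not None:
        args += ["-i", str(handle_id)]
    return args


def list_vms() -> List[str]:
    """Step 1: list running VMs with their worldGroupIDs and disk handleIDs."""
    return [VSCSISTATS, "-l"]


def start(world_group_id: int, handle_id: Optional[int] = None) -> List[str]:
    """Step 2: start vSCSI statistics collection."""
    return [VSCSISTATS, "-s"] + _target(world_group_id, handle_id)


def print_histograms(world_group_id: int, handle_id: Optional[int] = None,
                     histo: str = "all", csv: bool = False) -> List[str]:
    """Step 3: print histograms such as "all", "ioLength" or "latency"; -c gives CSV."""
    cmd = [VSCSISTATS, "-p", histo] + _target(world_group_id, handle_id)
    if csv:
        cmd.append("-c")
    return cmd


def stop(world_group_id: int, handle_id: Optional[int] = None) -> List[str]:
    """Step 4: stop vSCSI statistics collection."""
    return [VSCSISTATS, "-x"] + _target(world_group_id, handle_id)


if __name__ == "__main__":
    # Hypothetical IDs: replace them with the values reported by "vscsiStats -l".
    wgid, handle = 12345, 8193
    for cmd in (list_vms(), start(wgid, handle),
                print_histograms(wgid, handle), stop(wgid, handle)):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch usable from any machine; you can paste the output into the SSH session on the ESXi host.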

The procedure is documented in the VMware KB article "Using vscsiStats to collect IO and Latency stats on Virtual Disks".
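Step 3 is where the question "what I/O size does this VM generate?" gets answered, because vscsiStats can print an I/O length histogram (`-p ioLength`). The Python sketch below summarizes such a histogram once the counts have been extracted. It assumes the counts are already in dictionaries keyed by bucket upper bound in bytes, with reads and writes kept separate; parsing the real vscsiStats output is left out, and the bucket boundaries and counts shown are hypothetical.

```python
from typing import Dict, NamedTuple

# Hypothetical I/O length histograms: bucket upper bound in bytes -> I/O count.
# Boundaries and counts are illustrative only, not real vscsiStats output.
READS: Dict[int, int] = {4096: 1000, 8192: 300, 32768: 5200, 65536: 400, 524288: 100}
WRITES: Dict[int, int] = {4096: 700, 8192: 200, 32768: 1900, 65536: 150, 524288: 50}


class Profile(NamedTuple):
    read_pct: float             # share of all I/Os that are reads, in percent
    dominant_bucket_bytes: int  # upper bound of the most common size bucket
    dominant_pct: float         # share of all I/Os falling in that bucket, in percent


def summarize(reads: Dict[int, int], writes: Dict[int, int]) -> Profile:
    """Return the read share and the most common I/O size bucket."""
    total_reads = sum(reads.values())
    total_writes = sum(writes.values())
    total = total_reads + total_writes
    if total == 0:
        raise ValueError("histograms contain no I/Os")

    # Combine read and write counts per bucket.
    combined: Dict[int, int] = {}
    for hist in (reads, writes):
        for bucket, count in hist.items():
            combined[bucket] = combined.get(bucket, 0) + count

    bucket, count = max(combined.items(), key=lambda kv: kv[1])
    return Profile(100.0 * total_reads / total, bucket, 100.0 * count / total)


if __name__ == "__main__":
    p = summarize(READS, WRITES)
    print(f"reads: {p.read_pct:.1f}%, most common I/O size: "
          f"<= {p.dominant_bucket_bytes // 1024} KB ({p.dominant_pct:.1f}% of I/Os)")
```

In this made-up example the VM mostly issues I/Os of up to 32 KB with about 70% reads, which resembles the 32 KB 70/30 profile tested at the start of this post. A bucket covers a range of sizes, so the summary gives a size class rather than an exact I/O size, and whether the workload is random or sequential would have to come from the seekDistance histogram instead.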

Let's test it in the lab.