Friday, December 20, 2024

CPU cycles required for general storage workload

I recently published a blog post about the CPU cycles required for network and VMware vSAN ESA storage workloads. I realized it would be nice to test and quantify the CPU cycles needed for a general storage workload without vSAN ESA backend operations such as RAID/RAIN and compression.

Performance testing is always tricky because results depend on the guest OS, firmware, drivers, and application. However, we are not looking for exact numbers; approximations are good enough for a general rule of thumb that helps a designer during capacity planning.

My test environment was an old Dell PowerEdge R620 (Intel Xeon CPU E5-2620 @ 2.00GHz) with ESXi 8.0.3 and Windows Server 2025 in a virtual machine (2 vCPU @ 2 GHz, 1x paravirtualized SCSI controller/PVSCSI, 1x vDisk). The storage subsystem was a VMware VMFS datastore on a local consumer-grade NVMe disk (Kingston SNVS1000GB flash).

Storage tests were done using good old Iometer.

The test VM had a total CPU capacity of 4 GHz (4,000,000,000 CPU clock cycles per second).

Below are some test results to help me define another rule of thumb.

TEST - 512 B, 100% read, 100% random - 4,040 IOPS @ 2.07 MB/s @ avg response time 0.25 ms

  • 15.49% CPU = 619.6 MHz
  • 619.6 MHz  (619,600,000 CPU cycles) is required to deliver 2.07 MB/s (16,560,000 b/s)
    • 37.42 Hz to read 1 b/s
    • 153.4 KHz for reading 1 IOPS (512 B, random)
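
The arithmetic behind each result block can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the throughput is in decimal megabytes (1 MB = 1,000,000 B) and uses the 4 GHz VM capacity stated above. The function name cpu_cost and its parameters are hypothetical, chosen purely for illustration.

```python
VM_CAPACITY_HZ = 2 * 2_000_000_000  # 2 vCPU x 2 GHz, as stated for the test VM


def cpu_cost(cpu_percent, iops, mb_per_s, capacity_hz=VM_CAPACITY_HZ):
    """Return (cycles/s used, cycles per bit, cycles per I/O) for one test run.

    Hypothetical helper; assumes mb_per_s is in decimal megabytes per second.
    """
    used_hz = capacity_hz * cpu_percent / 100   # CPU cycles consumed per second
    bits_per_s = mb_per_s * 1_000_000 * 8       # throughput in bits per second
    return used_hz, used_hz / bits_per_s, used_hz / iops


# 512 B, 100% read, 100% random: 15.49% CPU, 4,040 IOPS, 2.07 MB/s
used, per_bit, per_io = cpu_cost(15.49, 4040, 2.07)
print(f"{used / 1e6:.1f} MHz, {per_bit:.2f} Hz per b/s, {per_io / 1e3:.1f} KHz per IOPS")
```

Fed with the CPU utilization, IOPS, and MB/s of any other test, the same function should reproduce the per-bit and per-IOPS figures listed for that test, give or take rounding.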

TEST - 512 B, 100% write, 100% random - 4,874 IOPS @ 2.50 MB/s @ avg response time 0.2 ms

  • 19.45% CPU = 778 MHz
  • 778 MHz  (778,000,000 CPU cycles) is required to deliver 2.50 MB/s (20,000,000 b/s)
    • 38.9 Hz to write 1 b/s
    • 159.6 KHz for writing 1 IOPS (512 B, random)

TEST - 4 KiB, 100% read, 100% random - 3,813 IOPS @ 15.62 MB/s @ avg response time 0.26 ms

  • 13.85% CPU = 554.0 MHz
  • 554.0 MHz  (554,000,000 CPU cycles) is required to deliver 15.62 MB/s (124,960,000 b/s)
    • 4.43 Hz to read 1 b/s
    • 145.3 KHz for reading 1 IOPS (4 KiB, random)

TEST - 4 KiB, 100% write, 100% random - 4,413 IOPS @ 18.08 MB/s @ avg response time 0.23 ms

  • 21.84% CPU = 873.6 MHz
  • 873.6 MHz  (873,600,000 CPU cycles) is required to deliver 18.08 MB/s (144,640,000 b/s)
    • 6.04 Hz to write 1 b/s
    • 198.0 KHz for writing 1 IOPS (4 KiB, random)

TEST - 32 KiB, 100% read, 100% random - 2,568 IOPS @ 84.16 MB/s @ avg response time 0.39 ms

  • 10.9% CPU = 436 MHz
  • 436 MHz  (436,000,000 CPU cycles) is required to deliver 84.16 MB/s (673,280,000 b/s)
    • 0.648 Hz to read 1 b/s
    • 169.8 KHz for reading 1 IOPS (32 KiB, random)

TEST - 32 KiB, 100% write, 100% random - 2,873 IOPS @ 94.16 MB/s @ avg response time 0.35 ms

  • 14.16% CPU = 566.4 MHz
  • 566.4 MHz  (566,400,000 CPU cycles) is required to deliver 94.16 MB/s (753,280,000 b/s)
    • 0.752 Hz to write 1 b/s
    • 197.1 KHz for writing 1 IOPS (32 KiB, random)

TEST - 64 KiB, 100% read, 100% random - 1,826 IOPS @ 119.68 MB/s @ avg response time 0.55 ms

  • 9.06% CPU = 362.4 MHz
  • 362.4 MHz  (362,400,000 CPU cycles) is required to deliver 119.68 MB/s (957,440,000 b/s)
    • 0.38 Hz to read 1 b/s
    • 198.5 KHz for reading 1 IOPS (64 KiB, random)

TEST - 64 KiB, 100% write, 100% random - 2,242 IOPS @ 146.93 MB/s @ avg response time 0.45 ms

  • 12.15% CPU = 486.0 MHz
  • 486.0 MHz  (486,000,000 CPU cycles) is required to deliver 146.93 MB/s (1,175,440,000 b/s)
    • 0.41 Hz to write 1 b/s
    • 216.8 KHz for writing 1 IOPS (64 KiB, random)

TEST - 256 KiB, 100% read, 100% random - 735 IOPS @ 192.78 MB/s @ avg response time 1.36 ms

  • 6.66% CPU = 266.4 MHz
  • 266.4 MHz  (266,400,000 CPU cycles) is required to deliver 192.78 MB/s (1,542,240,000 b/s)
    • 0.17 Hz to read 1 b/s
    • 362.4 KHz for reading 1 IOPS (256 KiB, random)

TEST - 256 KiB, 100% write, 100% random - 703 IOPS @ 184.49 MB/s @ avg response time 1.41 ms

  • 7.73% CPU = 309.2 MHz
  • 309.2 MHz  (309,200,000 CPU cycles) is required to deliver 184.49 MB/s (1,475,920,000 b/s)
    • 0.21 Hz to write 1 b/s
    • 439.8 KHz for writing 1 IOPS (256 KiB, random)

TEST - 256 KiB, 100% read, 100% seq - 2,784 IOPS @ 730.03 MB/s @ avg response time 0.36 ms

  • 15.26% CPU = 610.4 MHz
  • 610.4 MHz  (610,400,000 CPU cycles) is required to deliver 730.03 MB/s (5,840,240,000 b/s)
    • 0.1 Hz to read 1 b/s
    • 219.25 KHz for reading 1 IOPS (256 KiB, sequential)

TEST - 256 KiB, 100% write, 100% seq - 1,042 IOPS @ 273.16 MB/s @ avg response time 0.96 ms

  • 9.09% CPU = 363.6 MHz
  • 363.6 MHz  (363,600,000 CPU cycles) is required to deliver 273.16 MB/s (2,185,280,000 b/s)
    • 0.17 Hz to write 1 b/s
    • 348.9 KHz for writing 1 IOPS (256 KiB, sequential)

TEST - 1 MiB, 100% read, 100% seq - 966 IOPS @ 1013.3 MB/s @ avg response time 1 ms

  • 9.93% CPU = 397.2 MHz
  • 397.2 MHz  (397,200,000 CPU cycles) is required to deliver 1013.3 MB/s (8,106,400,000 b/s)
    • 0.05 Hz to read 1 b/s
    • 411.18 KHz for reading 1 IOPS (1 MiB, sequential)

TEST - 1 MiB, 100% write, 100% seq - 286 IOPS @ 300.73 MB/s @ avg response time 3.49 ms

  • 10.38% CPU = 415.2 MHz
  • 415.2 MHz  (415,200,000 CPU cycles) is required to deliver 300.73 MB/s (2,405,840,000 b/s)
    • 0.17 Hz to write 1 b/s
    • 1.452 MHz for writing 1 IOPS (1 MiB, sequential)

Observations

We can see that the CPU cycles required to read or write 1 b/s vary with I/O size, read/write ratio, and random/sequential access pattern.

  • Small I/O (512 B, random) can consume almost 40 Hz to read or write 1 b/s. 
  • Normalized I/O (32 KiB, random) can consume around 0.7 Hz to read or write 1 b/s
  • Large I/O (1 MiB, sequential) can consume around 0.1 Hz to read or write 1 b/s

If we use the same approach as for vSAN, average the 32 KiB (random) and 1 MiB (sequential) results, and round up, we can define the following rule of thumb:

"0.5 Hz of general purpose x86-64 CPU (Intel Sandy Bridge) is required to read or write 1 bit/s from local NVMe flash disk"
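
As a rough check of that averaging, the per-bit costs measured above for 32 KiB random and 1 MiB sequential I/O can be averaged directly. The sketch below uses only the four figures from the test results; the variable names are illustrative. The mean comes out near 0.4 Hz, so 0.5 Hz looks like a slightly conservative rounding.

```python
from statistics import mean

# Cycles per bit (Hz per b/s) taken from the test results above
cost_per_bit = {
    "32 KiB random read": 0.648,
    "32 KiB random write": 0.752,
    "1 MiB sequential read": 0.05,
    "1 MiB sequential write": 0.17,
}

average = mean(cost_per_bit.values())
print(f"average: {average:.3f} Hz per b/s")
```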

If we compare it with the 3.5 Hz rule of thumb for vSAN ESA RAID-5 with compression, we can see that vSAN ESA requires 7x the CPU cycles. That makes perfect sense, because vSAN ESA does a lot of additional processing on the backend, mainly data protection (RAID-5/RAIN-5) and compression.
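
To show how such rules of thumb might be applied during capacity planning, the sketch below estimates the CPU needed for a target throughput under both rules. The function name cpu_hz_for_throughput and the 500 MB/s target are hypothetical, picked only for illustration; the 0.5 Hz and 3.5 Hz coefficients are the two rules of thumb quoted above.

```python
LOCAL_NVME_HZ_PER_BIT = 0.5   # rule of thumb from this post
VSAN_ESA_HZ_PER_BIT = 3.5     # rule of thumb for vSAN ESA RAID-5 with compression


def cpu_hz_for_throughput(mb_per_s, hz_per_bit):
    """Estimate CPU cycles per second needed to sustain mb_per_s (decimal MB/s)."""
    return mb_per_s * 1_000_000 * 8 * hz_per_bit


target_mb_per_s = 500  # hypothetical workload, not a measured value
local = cpu_hz_for_throughput(target_mb_per_s, LOCAL_NVME_HZ_PER_BIT)
vsan = cpu_hz_for_throughput(target_mb_per_s, VSAN_ESA_HZ_PER_BIT)
print(f"local NVMe: {local / 1e9:.1f} GHz, vSAN ESA: {vsan / 1e9:.1f} GHz, ratio: {vsan / local:.0f}x")
```

These are coarse estimates only; as the test results show, the real cost per bit depends heavily on I/O size and access pattern.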

I was curious how many CPU cycles a non-redundant storage workload requires, and the observed numbers IMHO make sense.

Hope this helps others during infrastructure design exercises. 
