Wednesday, July 30, 2025

vSAN ESA RAID5 issue?

I observed unexpected behavior in my vSAN ESA cluster. I have a 6-node vSAN ESA cluster and a VM with a Storage Policy configured for RAID-5 (Erasure Coding). Based on the cluster size, I would expect a 4+1 stripe configuration. However, the system was using 2+1 striping, which normally applies to clusters with only 3 to 5 nodes.

RAID-5 (2+1) striping consumes 150% of the written data in raw storage (1.5x overhead)

RAID-5 (4+1) striping consumes 125% of the written data in raw storage (1.25x overhead)

A difference of 25 percentage points is worth investigating.
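To sanity-check these numbers, here is a minimal sketch in plain Python (nothing vSAN-specific) that derives the raw-capacity overhead of each erasure coding scheme from its data/parity layout:

# Raw-capacity overhead of an erasure coding scheme:
# (data + parity) / data = raw storage consumed per unit of data written.

def ec_overhead(data: int, parity: int) -> float:
    """Return the raw/usable capacity factor for a data+parity stripe."""
    return (data + parity) / data

for name, data, parity in [("RAID-5 2+1", 2, 1),
                           ("RAID-5 4+1", 4, 1),
                           ("RAID-6 4+2", 4, 2)]:
    print(f"{name}: {ec_overhead(data, parity):.0%} of written data in raw storage")

# Output:
# RAID-5 2+1: 150% of written data in raw storage
# RAID-5 4+1: 125% of written data in raw storage
# RAID-6 4+2: 150% of written data in raw storage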

Screenshots from my environment are attached below:

6-Node vSAN ESA Cluster
 

Storage Policy with RAID-5 Erasure Coding
 

VM with RAID-5 Storage Policy on a 6-node vSAN Cluster. Why 2+1 and not 4+1?

Is there something I’m misunderstanding or doing incorrectly?

No, I wasn't doing anything incorrectly, and here is the explanation of what happened.

I had one host in long-term maintenance mode, and the RAID-5 objects were proactively rebuilt from 4+1 to 2+1, because that is how vSAN ESA works on a 5-node cluster. This is expected behavior, and I was fine with it.
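As a simplified mental model of this adaptive RAID-5 behavior, the stripe width is derived from the number of available hosts. The sketch below is my own illustration of the documented rule, not VMware's actual code; the roughly 24-hour re-evaluation interval matches what I observed:

def esa_raid5_layout(available_hosts: int) -> str:
    # Simplified model of vSAN ESA adaptive RAID-5 stripe selection.
    # Documented rule: 3-5 hosts -> 2+1, 6 or more hosts -> 4+1.
    # vSAN re-evaluates the cluster size periodically (roughly every
    # 24 hours), which explains the delay described below.
    if available_hosts >= 6:
        return "4+1"
    if available_hosts >= 3:
        return "2+1"
    raise ValueError("RAID-5 requires at least 3 available hosts")

print(esa_raid5_layout(5))  # 2+1 (one host in maintenance)
print(esa_raid5_layout(6))  # 4+1 (after the re-evaluation)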

When the 6th host was added back to the cluster, it took 24 hours for vSAN to rebuild the objects back to 4+1. I just did not wait long enough :-) I expected it would take a while and checked the status after 22 hours, but that was not enough.

After 24 hours, the vSAN object striping was 4+1, as depicted in the screenshot below.

The problem solved itself after 24 hours

Conclusion

The issue resolved itself after approximately 24 hours. It's important to be aware of this behavior, as it can impact capacity planning and design decisions for a 6-node vSAN ESA cluster.

I’m planning to scale out to a 7-node vSAN cluster soon, which will enable the use of RAID-6 with a consistent 4+2 erasure coding scheme. Even so, documenting this kind of adaptive RAID-5 behavior in 6-node configurations could be valuable for other VMware users relying on vSAN ESA and similar storage policies.
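For capacity planning, the overhead factor translates directly into usable capacity. A back-of-the-envelope sketch, assuming a hypothetical 10 TB of raw capacity per node (the per-node figure is invented for illustration, and slack space, metadata, and other real-world reservations are ignored):

RAW_TB_PER_NODE = 10  # hypothetical per-node raw capacity, for illustration only

def usable_tb(nodes: int, data: int, parity: int) -> float:
    # Usable capacity if all raw space were consumed by one erasure
    # coding scheme (ignores slack space, metadata, and reservations).
    return nodes * RAW_TB_PER_NODE * data / (data + parity)

print(f"6 nodes, RAID-5 2+1: {usable_tb(6, 2, 1):.0f} TB usable")   # 40 TB
print(f"6 nodes, RAID-5 4+1: {usable_tb(6, 4, 1):.0f} TB usable")   # 48 TB
print(f"7 nodes, RAID-6 4+2: {usable_tb(7, 4, 2):.1f} TB usable")   # 46.7 TB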

What still confuses me is that new vSAN objects created during those 24 hours were also provisioned with the 2+1 layout, even though the host had already exited long-term maintenance mode.
