
Friday, April 10, 2026

MS-SQL Windows Server Failover Clustering on VCF - Best Practices

Windows Server Failover Clustering (WSFC) is used to deploy highly available MS-SQL on VMware Cloud Foundation (VCF).

The traditional (historical) WSFC deployment model is the Always On Failover Cluster Instance (FCI). An Always On Failover Cluster Instance is a Microsoft SQL Server high-availability technology that provides instance-level protection. This means that the entire SQL Server instance, including system databases (like master and msdb), user databases, logins, and SQL Server Agent jobs, is protected and fails over as a single cohesive unit to another node in the cluster if a failure occurs. The SQL Server binaries themselves are installed locally on every node.

An FCI uses a virtual identity (virtual network name and IP address) that is independent of the underlying physical or virtual node names, allowing applications to connect seamlessly regardless of which node is active.

An FCI requires shared storage that is accessible by all nodes in the cluster and supports SCSI-3 Persistent Reservations (PR). vSAN ESA meets both requirements, which makes it a very good fit for such shared storage.

Let's walk through typical topics and best practices for WSFC/FCI.

Usage of vSAN Express Storage Architecture (ESA) 

vSAN ESA (Express Storage Architecture) leverages NVMe devices, which has a positive impact on storage response times, typically keeping them below 1 ms.

vSAN ESA single-tier storage has more predictable performance than the two-tier (cache tier, capacity tier) vSAN OSA (Original Storage Architecture). vSAN OSA is treated as a legacy vSAN architecture; therefore, ESA is highly recommended.

vSAN ESA provides native SCSI-3 PR support.

VM Storage Controller and Shared Virtual Disks Considerations 

Let's clarify the VMware terminology for shared virtual disks used across various documents, because this area is a little bit confusing.

  • Multi-writer = low-level disk capability (concurrent writes enabled)
    • used for Oracle RAC, but not for WSFC/FCI Microsoft Clustering
  • Clustered VMDK = supported shared-disk clustering pattern (single-writer with arbitration)
    • used for WSFC/FCI Microsoft Clustering on VMFS-based datastores, but not available on vSAN datastores
    • shared and non-shared virtual disks cannot be placed on the same VMFS-based datastore when the Clustered VMDK feature is enabled on it

For a fully supported Microsoft WSFC/FCI deployment on vSAN, the multi-writer flag must not be used. The virtual disk must remain in the default ‘No sharing / Unspecified’ sharing mode, while access coordination is handled by SCSI-3 Persistent Reservations.

The Clustered VMDK feature is not available on vSAN datastores, but shared virtual disks are supported on vSAN ESA for WSFC/FCI Microsoft Clustering out of the box.

All shared virtual disks must be connected via a VMware Paravirtual SCSI (PVSCSI) controller with SCSI Bus Sharing set to "Physical".

Figure: Virtual SCSI Controller

If you want to improve parallel performance, distribute the shared disks across multiple PVSCSI controllers (up to four per VM) to maximize I/O parallelism and the aggregate queue depth.
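As a rough illustration of the distribution idea, here is a minimal Python sketch that assigns shared disks to PVSCSI controllers in round-robin order. The disk names and the `distribute_disks` helper are hypothetical, and the default of using controllers 1 to 3 assumes that controller 0 carries the non-shared OS disk; adjust it to your own layout.

```python
from collections import defaultdict

MAX_PVSCSI_CONTROLLERS = 4  # "up to four per VM", as stated in the text


def distribute_disks(disk_names, controller_ids=(1, 2, 3)):
    """Assign shared disks to PVSCSI controllers in round-robin order.

    controller_ids defaults to 1..3 on the ASSUMPTION that controller 0
    hosts the non-shared OS disk. This is an illustrative layout only.
    """
    if not 1 <= len(controller_ids) <= MAX_PVSCSI_CONTROLLERS:
        raise ValueError("expected between 1 and 4 PVSCSI controllers")
    layout = defaultdict(list)
    for index, disk in enumerate(disk_names):
        controller = controller_ids[index % len(controller_ids)]
        layout[controller].append(disk)
    return dict(layout)


if __name__ == "__main__":
    # Hypothetical disk names for a single FCI
    disks = ["sql-data1", "sql-data2", "sql-log", "sql-tempdb", "quorum"]
    for controller, assigned in sorted(distribute_disks(disks).items()):
        print(f"SCSI controller {controller}: {', '.join(assigned)}")
```

Round-robin is only the simplest possible policy. In practice you may prefer to group disks by workload, for example keeping data and log disks on different controllers.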

The disk mode of every shared virtual disk must be set to Independent - Persistent, which excludes the disk from VMware snapshots. The multi-writer flag in the sharing mode must not be used.

Figure: Virtual Disk Mode
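The controller and disk rules described above can be captured in a small pre-deployment check. The following Python sketch is not tied to any VMware API: the `SharedDisk` class and its string values are a hypothetical data model that you would fill from your own inventory export.

```python
from dataclasses import dataclass


@dataclass
class SharedDisk:
    # Hypothetical, simplified view of one shared virtual disk
    name: str
    controller_type: str  # e.g. "pvscsi", "lsilogic-sas"
    bus_sharing: str      # e.g. "physical", "virtual", "none"
    disk_mode: str        # e.g. "independent_persistent", "persistent"
    sharing: str          # "none" (default) or "multi-writer"


def check_shared_disk(disk):
    """Return a list of rule violations for one shared disk."""
    problems = []
    if disk.controller_type != "pvscsi":
        problems.append("controller must be VMware Paravirtual (PVSCSI)")
    if disk.bus_sharing != "physical":
        problems.append("SCSI bus sharing must be Physical")
    if disk.disk_mode != "independent_persistent":
        problems.append("disk mode must be Independent - Persistent")
    if disk.sharing == "multi-writer":
        problems.append("multi-writer flag must not be set")
    return problems


if __name__ == "__main__":
    disk = SharedDisk("sql-data1", "pvscsi", "physical",
                      "persistent", "multi-writer")
    for problem in check_shared_disk(disk):
        print(f"{disk.name}: {problem}")
```

An empty list means the disk satisfies all four rules. Anything else is a finding to fix before the cluster validation is run.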

Shared virtual disks in independent mode used for WSFC/FCI shared storage come with the following restrictions and requirements:

  • No snapshots and no Changed Block Tracking (CBT), therefore, limited backup options (no standard image-level backup)
  • No Storage vMotion
  • Disk hot-extend restrictions
  • All shared VMDKs must be Eager Zeroed Thick (EZT).   

Storage Policy-Based Management (SPBM) 

Use RAID 5/6 (erasure coding) as the FTT (Failures to Tolerate) method, because on vSAN ESA it delivers RAID-1 performance with significantly better space efficiency.
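To put the space-efficiency argument into numbers, the following Python sketch compares the raw capacity consumed by 1 TB of usable data under different protection schemes. The stripe layouts (RAID-5 as 2+1 or 4+1, RAID-6 as 4+2) are my assumptions about vSAN ESA and should be verified against the current vSAN documentation. The calculation also ignores metadata and other overheads.

```python
def raw_capacity_gb(usable_gb, data_parts, redundancy_parts):
    """Raw capacity needed to store usable_gb with the given layout.

    data_parts / redundancy_parts describe the layout, e.g. 4 data + 1
    parity for RAID-5 (4+1), or 1 data + 1 mirror copy for RAID-1 FTT=1.
    """
    return usable_gb * (data_parts + redundancy_parts) / data_parts


# ASSUMED layouts, not taken from the text above; verify before relying on them
SCHEMES = {
    "RAID-1, FTT=1 (mirror)": (1, 1),
    "RAID-5, FTT=1 (2+1)": (2, 1),
    "RAID-5, FTT=1 (4+1)": (4, 1),
    "RAID-6, FTT=2 (4+2)": (4, 2),
}

if __name__ == "__main__":
    usable_gb = 1000
    for name, (data, redundancy) in SCHEMES.items():
        raw = raw_capacity_gb(usable_gb, data, redundancy)
        print(f"{name:24s} {raw:7.0f} GB raw ({raw / usable_gb:.2f}x)")
```

Under these assumptions a mirrored 1000 GB disk consumes 2000 GB of raw capacity, while RAID-5 (4+1) consumes 1250 GB for the same failure tolerance.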

Optionally, use vSAN compression and global deduplication, as their performance impact is negligible.

Other VM Configuration Considerations

The virtual machine must use Virtual Hardware version 15 or higher to support the sharing of VMDKs.

WSFC/FCI virtual machines must be deployed using the CAB (Cluster Across Boxes) model, where each cluster node resides on a different physical ESXi host. DRS anti-affinity rules should be used to strictly enforce physical separation and prevent a single host failure from taking down multiple DB nodes.

Assign full memory reservations (reserved RAM should equal provisioned RAM) to all VMs participating in the cluster. This prevents memory ballooning and swapping, which can cause high latency and disrupt critical cluster heartbeats.
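The three VM-level rules in this section (virtual hardware version, host separation, full memory reservation) lend themselves to a simple automated check. The Python sketch below uses a hypothetical `ClusterNodeVM` record rather than a real vSphere API, so the field names are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

MIN_HARDWARE_VERSION = 15  # minimum stated in the text for shared VMDKs


@dataclass
class ClusterNodeVM:
    # Hypothetical inventory record for one WSFC node VM
    name: str
    esxi_host: str
    hardware_version: int
    provisioned_ram_gb: int
    reserved_ram_gb: int


def check_nodes(nodes):
    """Return a list of findings for a set of WSFC node VMs."""
    problems = []
    for vm in nodes:
        if vm.hardware_version < MIN_HARDWARE_VERSION:
            problems.append(f"{vm.name}: hardware version "
                            f"{vm.hardware_version} is below 15")
        if vm.reserved_ram_gb != vm.provisioned_ram_gb:
            problems.append(f"{vm.name}: memory reservation "
                            f"{vm.reserved_ram_gb} GB does not equal "
                            f"provisioned {vm.provisioned_ram_gb} GB")
    # CAB model: every node must run on its own ESXi host
    per_host = Counter(vm.esxi_host for vm in nodes)
    for host, count in sorted(per_host.items()):
        if count > 1:
            problems.append(f"{host}: runs {count} cluster nodes")
    return problems


if __name__ == "__main__":
    nodes = [
        ClusterNodeVM("sql-node1", "esx01", 21, 64, 64),
        ClusterNodeVM("sql-node2", "esx01", 14, 64, 32),
    ]
    for finding in check_nodes(nodes):
        print(finding)
```

A check like this only reports the current placement. The DRS anti-affinity rule is still what keeps the nodes apart after vMotion or an HA restart.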

Conclusion 

This blog post covers the traditional MS-SQL Windows Server Failover Clustering (WSFC) deployment using shared disks.

WSFC/FCI on vSAN represents a legacy shared-disk clustering model, whereas modern architectures (Microsoft SQL Server Always On Availability Groups) prefer replication-based approaches (real-time transaction log streaming) that avoid shared storage dependencies.
