Friday, April 10, 2026

MS-SQL Windows Server Failover Clustering on VCF - Best Practices

Windows Server Failover Clustering (WSFC) is used for highly available MS-SQL deployments on VMware Cloud Foundation (VCF).

The traditional (historical) WSFC deployment model for SQL Server is the Always On Failover Cluster Instance (FCI). An FCI is a Microsoft SQL Server high-availability technology that provides instance-level protection. This means that the entire SQL Server instance, including system databases (such as master and msdb), user databases, logins, and SQL Server Agent jobs, is protected and fails over as a single cohesive unit to another node in the cluster if a failure occurs.

An FCI uses a virtual identity (virtual network name and IP address) that is independent of the underlying physical or virtual node names, allowing applications to connect seamlessly regardless of which node is active.

An FCI requires shared storage that is accessible by all nodes in the cluster and supports SCSI-3 Persistent Reservations (PR). vSAN ESA is a very good fit for such shared storage.

WSFC/FCI vSAN best practices

Here are typical topics discussed during consulting engagements.

Usage of vSAN Express Storage Architecture (ESA) 

vSAN ESA (Express Storage Architecture) leverages NVMe devices, which has a positive impact on storage response times, typically keeping them below 1 ms.

The single-tier storage design of vSAN ESA delivers more predictable performance than the two-tier design (cache tier and capacity tier) of vSAN OSA (Original Storage Architecture). vSAN OSA is treated as the legacy vSAN architecture, therefore ESA is highly recommended.

vSAN ESA provides native SCSI-3 PR support.

Shared virtual disks for WSFC/FCI clustering are supported on vSAN ESA out of the box.

According to KB Article 327037, VMware vSAN supports multi-writer virtual disks, which are used to implement Clustered VMDK scenarios (shared-disk clustering).

Clustered VMDK is a specific Multi-writer use case.

Multi-writer allows admins to make a virtual disk accessible for more than one VM simultaneously. However, it does not provide any reservations or locking mechanism for coordinating access to the virtual disk for consistency.  

Clustered VMDK uses a shared virtual SCSI bus. With Clustered VMDKs, vSAN implements SCSI3-PR for Microsoft Windows Server Failover Clusters (WSFC). Not all of the SCSI3-PR reservation types are supported, only those needed for MS WSFC. If you are trying to use any other application with SCSI3-PR on vSAN, it is not currently supported by VMware. 
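
The difference between plain multi-writer access and a reservation-protected disk can be illustrated with a toy model. The sketch below is not how vSAN implements SCSI3-PR; it only mimics the semantics of a registrants-only, write-exclusive reservation (one of the SCSI-3 PR reservation types). The class SharedDisk and the node names are hypothetical.

```python
class SharedDisk:
    """Toy model of a shared disk (illustration only, not vSAN code)."""

    def __init__(self):
        self.registered = set()  # nodes that hold a registration key
        self.reserved = False    # True once a registrant places the reservation

    def register(self, node):
        self.registered.add(node)

    def reserve(self, node):
        # Only a registered node may place the reservation.
        if node not in self.registered:
            raise PermissionError(f"{node} is not registered")
        self.reserved = True

    def preempt(self, node, victim):
        # A registered node removes another node's registration (fencing).
        if node not in self.registered:
            raise PermissionError(f"{node} is not registered")
        self.registered.discard(victim)

    def can_write(self, node):
        # Without a reservation the disk behaves like plain multi-writer:
        # every attached node may write and nothing coordinates them.
        if not self.reserved:
            return True
        return node in self.registered


disk = SharedDisk()
disk.register("sql-node-1")
disk.register("sql-node-2")
disk.reserve("sql-node-1")
disk.preempt("sql-node-1", "sql-node-2")  # node 2 is fenced off
print(disk.can_write("sql-node-1"), disk.can_write("sql-node-2"))
```

In this model the fenced node loses write access as soon as its registration is removed, which is the coordination that multi-writer alone does not provide.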

Clustered VMDK is a slightly misleading term in the vSAN context because vSAN does not store virtual disks as VMDK files. VMDK files are used on a VMFS filesystem backed by a LUN on block storage, whereas vSAN is internally an object-based storage system.

Shared (multi-writer, clustered VMDK) virtual disks on vSAN come with significant feature restrictions:

  • No snapshots
  • No Storage vMotion for shared disks
  • No Changed Block Tracking (CBT)
  • Limited backup options (no standard image-level backup)
  • Hot-extend restrictions

VM Storage Controller Considerations

For all shared disks, use the VMware Paravirtual SCSI (PVSCSI) controller with SCSI Bus Sharing set to "Physical".

Figure: Virtual SCSI Controller

If you want to improve parallel performance, distribute shared disks across multiple PVSCSI controllers (up to four per VM) to maximize I/O parallelism and increase the aggregate queue depth.
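
As a rough illustration of such a distribution, the following Python sketch assigns shared disks round-robin to SCSI buses. It assumes that controller 0 is kept for the non-shared OS disk, so shared disks go to buses 1 to 3, and it skips unit number 7, which is reserved for the controller itself. The function name and the disk names are hypothetical.

```python
def plan_shared_disk_layout(disk_names, shared_buses=(1, 2, 3)):
    """Return {disk_name: (bus, unit)} with disks spread round-robin over buses."""
    next_unit = {bus: 0 for bus in shared_buses}
    layout = {}
    for index, name in enumerate(disk_names):
        bus = shared_buses[index % len(shared_buses)]
        unit = next_unit[bus]
        if unit == 7:  # unit 7 is reserved for the SCSI controller
            unit += 1
        next_unit[bus] = unit + 1
        layout[name] = (bus, unit)
    return layout


for disk, (bus, unit) in plan_shared_disk_layout(
    ["data", "log", "tempdb", "backup"]
).items():
    print(f"{disk}: SCSI({bus}:{unit})")
```

With four disks and three shared controllers, the fourth disk lands on the first shared controller again, so the busiest disks should be listed first.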

The disk mode of shared (multi-writer) virtual disks must be set to Independent - Persistent, which excludes the disks from VMware snapshots.

Figure: Virtual Disk Mode

In this mode, image-based VMware backup is not available because snapshots and CBT (Changed Block Tracking) are not available.
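
The controller and disk settings described in this section can be checked mechanically. The sketch below is a minimal example that works on a hypothetical inventory export (a list of dictionaries); the key names and values are assumptions made for illustration and do not correspond to a specific VMware API.

```python
# Settings recommended in this section for every shared disk.
REQUIRED = {
    "controller_type": "pvscsi",
    "bus_sharing": "physical",
    "disk_mode": "independent_persistent",
}


def find_violations(shared_disks):
    """Return a human-readable message for every setting that deviates."""
    problems = []
    for disk in shared_disks:
        for key, expected in REQUIRED.items():
            actual = disk.get(key)
            if actual != expected:
                problems.append(
                    f"{disk['name']}: {key} is {actual!r}, expected {expected!r}"
                )
    return problems


inventory = [
    {"name": "sql-data", "controller_type": "pvscsi",
     "bus_sharing": "physical", "disk_mode": "independent_persistent"},
    {"name": "sql-log", "controller_type": "lsilogicsas",
     "bus_sharing": "physical", "disk_mode": "persistent"},
]
for message in find_violations(inventory):
    print(message)
```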

Storage Policy-Based Management (SPBM) 

Use RAID 5/6 (Erasure Coding) as the FTT (Failures to Tolerate) method, because on vSAN ESA it delivers RAID-1 performance with significantly better space efficiency.
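
The space-efficiency difference is simple arithmetic. The sketch below assumes that RAID-1 with FTT=1 stores two full copies, that RAID-5 on ESA uses a 2+1 or 4+1 layout depending on cluster size, and that RAID-6 uses a 4+2 layout. It ignores any other capacity overhead, so treat the numbers as approximations.

```python
def raw_capacity_gb(usable_gb, data_parts, redundancy_parts):
    """Raw capacity consumed for a given usable capacity and stripe layout."""
    return usable_gb * (data_parts + redundancy_parts) / data_parts


# (data parts, redundancy parts) per layout; these layouts are assumptions.
LAYOUTS = {
    "RAID-1 (FTT=1)": (1, 1),
    "RAID-5 (2+1)": (2, 1),
    "RAID-5 (4+1)": (4, 1),
    "RAID-6 (4+2)": (4, 2),
}

for name, (data, redundancy) in LAYOUTS.items():
    raw = raw_capacity_gb(1000, data, redundancy)
    print(f"{name}: {raw:.0f} GB raw per 1000 GB usable")
```

Under these assumptions, 1000 GB of database data consumes 2000 GB with RAID-1 mirroring but only 1250 GB with RAID-5 in a 4+1 layout.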

Optionally use vSAN compression and global deduplication as the performance impact is negligible. 

Other VM Configuration Considerations

The virtual machine must use Virtual Hardware version 15 or higher to support the sharing of VMDKs.

Virtual Machines (WSFC/FCIs) must be deployed using a CAB (Cluster Across Boxes) model, where each cluster node resides on a different physical ESXi host. DRS Anti-Affinity Rules should be used to strictly enforce physical separation and prevent a single host failure from taking down multiple DB nodes.
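
A quick way to audit the CAB requirement is to group cluster nodes by the host they currently run on. The following sketch takes a hypothetical VM-to-host mapping (for example, exported from an inventory report) and reports every host that carries more than one node of the same cluster; all names are illustrative.

```python
from collections import defaultdict


def hosts_with_multiple_nodes(placement):
    """Return {host: [vms]} for hosts running more than one cluster node."""
    by_host = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


placement = {
    "sql-node-1": "esx01",
    "sql-node-2": "esx02",
    "sql-node-3": "esx02",  # violates the anti-affinity requirement
}
print(hosts_with_multiple_nodes(placement))
```

An empty result means every node sits on its own ESXi host at the time of the check; the DRS rule is still needed to keep it that way.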

Assign full memory reservations (reserved RAM should equal provisioned RAM) to all VMs participating in the cluster. This prevents memory ballooning and swapping, which can cause high latency and disrupt critical cluster heartbeats.
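
The reservation rule is equally easy to verify. This minimal sketch assumes a hypothetical list of VM records with provisioned and reserved memory in MB and returns the VMs whose reservation is smaller than their provisioned memory.

```python
def under_reserved(vms):
    """Return names of VMs whose memory reservation is below provisioned RAM."""
    return [vm["name"] for vm in vms if vm["reserved_mb"] < vm["provisioned_mb"]]


cluster_vms = [
    {"name": "sql-node-1", "provisioned_mb": 65536, "reserved_mb": 65536},
    {"name": "sql-node-2", "provisioned_mb": 65536, "reserved_mb": 32768},
]
print(under_reserved(cluster_vms))
```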

Conclusion 

This blog post covers MS-SQL Windows Server Failover Clustering using shared disks.

The more modern clustering alternative is Microsoft SQL Server Always On Availability Groups (AG), where shared disks are not used at all and synchronous database replication (real-time transaction log streaming) is leveraged instead. In this alternative architecture, each DB node has its own disks, so it does not need any storage-specific capabilities.
