Windows Server Failover Clustering (WSFC) is used for high-availability deployments of Microsoft SQL Server (MS-SQL) on VMware Cloud Foundation (VCF).
The traditional (historical) WSFC deployment model for SQL Server is the Always On Failover Cluster Instance (FCI). An FCI is a Microsoft SQL Server high-availability technology that provides instance-level protection. This means that the entire SQL Server installation, including binaries, system databases (such as master and msdb), user databases, logins, and SQL Server Agent jobs, is protected and fails over as a single cohesive unit to another node in the cluster if a failure occurs.
An FCI uses a virtual identity (virtual network name and IP address) that is independent of the underlying physical or virtual node names, allowing applications to connect seamlessly regardless of which node is active.
An FCI requires shared storage that is accessible by all nodes in the cluster and supports SCSI-3 Persistent Reservations (PR). vSAN ESA is a perfect fit for such shared storage.
Let's document the typical topics and best practices for WSFC/FCI.
Usage of vSAN Express Storage Architecture (ESA)
vSAN ESA (Express Storage Architecture) is built on NVMe devices, which has a positive impact on storage response times, typically keeping them below 1 ms.
vSAN ESA single-tier storage delivers more predictable performance than the two-tier (cache tier and capacity tier) vSAN OSA (Original Storage Architecture). vSAN OSA is considered the legacy vSAN architecture; therefore, ESA is highly recommended.
vSAN ESA provides native SCSI-3 PR support.
VM Storage Controller and Shared Virtual Disks Considerations
Let's clarify the VMware terminology around shared virtual disks used in various documents, because this area is a little confusing.
- Multi-writer = low-level disk capability (concurrent writes enabled)
  - used for Oracle RAC, but not for WSFC/FCI Microsoft clustering
- Clustered VMDK = supported shared-disk clustering pattern (single writer with arbitration)
  - used for WSFC/FCI Microsoft clustering on VMFS-based datastores, but not available on vSAN datastores
  - shared and non-shared virtual disks cannot be placed on the same VMFS-based datastore when the Clustered VMDK feature is enabled
For a fully supported Microsoft WSFC/FCI deployment on vSAN, the multi-writer flag must not be used. The virtual disk must therefore remain in the default ‘No sharing / Unspecified’ mode, while access coordination is handled by SCSI-3 Persistent Reservations.
Although the Clustered VMDK feature is not available on vSAN datastores, shared virtual disks are supported out of the box on vSAN ESA for WSFC/FCI Microsoft clustering.
All shared virtual disks must be connected to a VMware Paravirtual SCSI (PVSCSI) controller with SCSI Bus Sharing set to "Physical".
[Figure: Virtual SCSI Controller]
To improve parallel performance, distribute the shared disks across multiple PVSCSI controllers (up to four per VM) to maximize I/O parallelism and queue depth.
The shared virtual disk mode must be set to Independent - Persistent, which excludes the disks from VMware snapshots. The multi-writer flag in the sharing mode must not be used.
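The round-robin spreading of shared disks over controllers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a VMware tool: it assumes (hypothetically) that SCSI controller 0 carries the OS disk without bus sharing, so shared disks go to controllers 1 to 3, and the disk names are made up. Unit number 7 is skipped because vSphere reserves it for the controller itself.

```python
from itertools import cycle

# Hypothetical layout: SCSI controller 0 holds the OS disk (no bus sharing),
# controllers 1-3 are PVSCSI with SCSI Bus Sharing = Physical for shared disks.
SHARED_CONTROLLERS = (1, 2, 3)
RESERVED_UNIT = 7  # unit 7 on each virtual SCSI bus is used by the controller


def assign_shared_disks(disks, controllers=SHARED_CONTROLLERS):
    """Spread disks round-robin over controllers; return {disk: (controller, unit)}."""
    next_unit = {ctrl: 0 for ctrl in controllers}
    placement = {}
    for disk, ctrl in zip(disks, cycle(controllers)):
        unit = next_unit[ctrl]
        if unit == RESERVED_UNIT:
            unit += 1  # skip the controller's own unit number
        placement[disk] = (ctrl, unit)
        next_unit[ctrl] = unit + 1
    return placement


if __name__ == "__main__":
    layout = assign_shared_disks(["Data1", "Data2", "Log1", "Log2", "Quorum"])
    for disk, (ctrl, unit) in layout.items():
        print(f"{disk}: SCSI({ctrl}:{unit})")
```

With the five example disks this prints Data1 on SCSI(1:0), Data2 on SCSI(2:0), Log1 on SCSI(3:0), Log2 on SCSI(1:1) and Quorum on SCSI(2:1), so data and log disks land on different controllers and do not compete for one adapter queue.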
[Figure: Virtual Disk Mode]
Shared virtual disks in independent mode used for WSFC/FCI shared storage come with the following feature restrictions:
- No snapshots and no Changed Block Tracking (CBT), therefore, limited backup options (no standard image-level backup)
- No Storage vMotion
- Disk hot-extend restrictions
- All shared VMDKs must be Eager Zeroed Thick (EZT).
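The shared-disk requirements above (PVSCSI controller, physical bus sharing, Independent - Persistent mode, no multi-writer flag, Eager Zeroed Thick) can be expressed as a small checklist. This Python sketch uses a hypothetical `SharedDiskConfig` record; its field names and string values are illustrative only and are not a vSphere API, so in practice they would be filled from whatever inventory export you use.

```python
from dataclasses import dataclass


@dataclass
class SharedDiskConfig:
    # Hypothetical, simplified view of one shared VMDK's settings.
    name: str
    controller_type: str  # e.g. "pvscsi"
    bus_sharing: str      # "physical", "virtual" or "none"
    disk_mode: str        # e.g. "independent_persistent" or "persistent"
    sharing: str          # "none" (No sharing / Unspecified) or "multi-writer"
    provisioning: str     # "eager_zeroed_thick", "lazy_zeroed_thick" or "thin"


def check_shared_disk(cfg: SharedDiskConfig) -> list[str]:
    """Return human-readable problems; an empty list means the disk passes."""
    problems = []
    if cfg.controller_type != "pvscsi":
        problems.append(f"{cfg.name}: must be attached to a PVSCSI controller")
    if cfg.bus_sharing != "physical":
        problems.append(f"{cfg.name}: SCSI Bus Sharing must be Physical")
    if cfg.disk_mode != "independent_persistent":
        problems.append(f"{cfg.name}: disk mode must be Independent - Persistent")
    if cfg.sharing != "none":
        problems.append(f"{cfg.name}: multi-writer flag must not be used")
    if cfg.provisioning != "eager_zeroed_thick":
        problems.append(f"{cfg.name}: shared VMDK must be Eager Zeroed Thick")
    return problems


if __name__ == "__main__":
    disk = SharedDiskConfig("Log1", "pvscsi", "physical", "persistent",
                            "multi-writer", "eager_zeroed_thick")
    for problem in check_shared_disk(disk):
        print(problem)
```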
Storage Policy-Based Management (SPBM)
Use RAID-5/6 (erasure coding) as the failure tolerance method for the chosen FTT (Failures to Tolerate) level, because on vSAN ESA it delivers RAID-1 performance with significantly better space efficiency.
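The space-efficiency side of this recommendation is simple arithmetic. The Python sketch below assumes the commonly cited vSAN ESA layouts (RAID-5 as 2+1 on smaller clusters or 4+1 on larger ones, RAID-6 as 4+2) and deliberately ignores slack space, metadata overhead, and compression or deduplication savings, so treat the results as rough multipliers rather than sizing output.

```python
# Approximate raw-to-usable multipliers per failure tolerance method on vSAN ESA.
# Assumption: ignores slack space, metadata, compression and deduplication.
RAW_MULTIPLIER = {
    "RAID-1, FTT=1": 2.0,       # two full copies
    "RAID-5 2+1, FTT=1": 1.5,   # 2 data + 1 parity
    "RAID-5 4+1, FTT=1": 1.25,  # 4 data + 1 parity
    "RAID-1, FTT=2": 3.0,       # three full copies
    "RAID-6 4+2, FTT=2": 1.5,   # 4 data + 2 parity
}


def raw_capacity_tb(usable_tb: float, method: str) -> float:
    """Raw vSAN capacity consumed by `usable_tb` of VMDK data under `method`."""
    return usable_tb * RAW_MULTIPLIER[method]


if __name__ == "__main__":
    for method in RAW_MULTIPLIER:
        print(f"{method}: {raw_capacity_tb(10, method)} TB raw for 10 TB usable")
```

For 10 TB of SQL Server data, FTT=1 costs 20 TB raw with RAID-1 versus 12.5 TB with RAID-5 4+1; at FTT=2 the gap is 30 TB versus 15 TB.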
Optionally use vSAN compression and global deduplication as the performance impact is negligible.
vMotion
vMotion is another interesting topic. Using vMotion for Microsoft SQL Server virtual machines, especially in clustered solutions (FCI or AG), requires specific configuration to ensure stability and avoid unwanted cluster crashes during migration.
Here are the main recommendations and ways to use vMotion correctly:
- Minimize the impact of stun time: During the final phase of a vMotion migration, there is a so-called "stun time" (a short pause) while the last part of the memory is transferred. During this state, the VM cannot send or receive cluster heartbeats.
  - Heartbeat limits: In modern versions of Windows Server (2016 and later), the limit for declaring a node unavailable has been increased to 10 seconds within the same subnet and 20 seconds across subnets. A well-configured vMotion should complete the migration well below these limits.
- vMotion Application Notification (recommended new feature): For SQL Server 2025 and VCF 9 environments, it is highly recommended to enable vMotion Application Notification.
  - How it works: SQL Server receives a notification via VMware Tools before the migration begins.
  - Benefit: SQL Server can temporarily suspend latency-sensitive operations, stabilize transactions, or prepare Always On replica synchronization for a short interruption, minimizing the risk of an unwanted failover.
  - Requirements: vSphere 8 or later, VMware Tools 11.0 or later, and virtual hardware version 20 or later.
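The heartbeat limits are the product of two WSFC cluster properties: the heartbeat interval (SameSubnetDelay / CrossSubnetDelay, 1000 ms by default) and the number of missed heartbeats tolerated (SameSubnetThreshold = 10 and CrossSubnetThreshold = 20 by default on Windows Server 2016 and later). The Python sketch below reproduces that arithmetic and compares a measured stun time against it; the 2x `safety_factor` is a hypothetical margin, not a Microsoft or VMware guideline. The values actually configured on a cluster can be inspected in PowerShell, for example with `Get-Cluster | Format-List *Subnet*`.

```python
# Windows Server 2016+ defaults for the WSFC heartbeat cluster properties
# SameSubnetDelay/SameSubnetThreshold and CrossSubnetDelay/CrossSubnetThreshold.
HEARTBEAT_DEFAULTS = {
    "same_subnet": {"delay_ms": 1000, "threshold": 10},
    "cross_subnet": {"delay_ms": 1000, "threshold": 20},
}


def heartbeat_timeout_s(delay_ms: int, threshold: int) -> float:
    """Seconds of missed heartbeats (one every delay_ms) before a node is declared down."""
    return delay_ms * threshold / 1000


def stun_time_ok(stun_s: float, scope: str = "same_subnet",
                 safety_factor: float = 2.0) -> bool:
    """True if stun time times a hypothetical safety factor stays below the timeout."""
    settings = HEARTBEAT_DEFAULTS[scope]
    timeout = heartbeat_timeout_s(settings["delay_ms"], settings["threshold"])
    return stun_s * safety_factor < timeout


if __name__ == "__main__":
    for stun in (0.5, 3.0, 6.0):
        print(f"stun {stun}s within same-subnet margin: {stun_time_ok(stun)}")
```

A stun time of a fraction of a second, which is what a healthy vMotion network should deliver, leaves a wide margin even under the same-subnet limit.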
Memory (RAM) Reservations
All SQL Server VMs participating in the cluster should be set to full memory reservation (Reserved = Provisioned). This prevents swapping to disk, which could dramatically increase the stun time during vMotion and cause the cluster to break.
Note: When shared disks are used for FCI, both Clustered VMDK (on VMFS) and vSAN ESA shared disks are compatible with vMotion of the cluster nodes. Storage vMotion of the shared disks, however, is not supported, as listed in the shared virtual disk restrictions above.
A full RAM reservation also prevents memory ballooning, which can cause high latency and disrupt critical cluster heartbeats.
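A quick audit for the "Reserved = Provisioned" rule can be sketched as below. The `VmMemory` record and the node names are hypothetical stand-ins for data exported from vCenter; in the vSphere Client the same outcome can also be achieved per VM with the "Reserve all guest memory (All locked)" option, which keeps the reservation equal to the configured memory.

```python
from dataclasses import dataclass


@dataclass
class VmMemory:
    # Hypothetical inventory record for one WSFC cluster node.
    name: str
    provisioned_mb: int
    reserved_mb: int


def nodes_without_full_reservation(vms: list[VmMemory]) -> list[str]:
    """Names of cluster nodes whose memory reservation is below provisioned memory."""
    return [vm.name for vm in vms if vm.reserved_mb < vm.provisioned_mb]


if __name__ == "__main__":
    cluster = [
        VmMemory("sql-node-01", 131072, 131072),
        VmMemory("sql-node-02", 131072, 65536),
    ]
    print(nodes_without_full_reservation(cluster))
```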
Virtual Machine Hardware
The virtual machine must use virtual hardware version 15 or higher to support sharing of VMDKs; however, vMotion Application Notification requires virtual hardware version 20 or newer.
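The version thresholds stated in this post can be turned into a small eligibility check. This Python sketch encodes only the numbers given here (hardware version 15 for shared VMDKs; hardware version 20 plus VMware Tools 11.0 for vMotion Application Notification); the function name and the tuple format for the Tools version are illustrative assumptions, and the host-side vSphere 8 requirement is not modeled.

```python
def supported_features(hw_version: int, tools_version: tuple[int, int]) -> list[str]:
    """Map VM hardware and VMware Tools versions to the features discussed in this post."""
    features = []
    if hw_version >= 15:
        features.append("shared VMDK for WSFC/FCI")
    # vMotion Application Notification additionally needs vSphere 8+ on the hosts,
    # which this sketch does not check.
    if hw_version >= 20 and tools_version >= (11, 0):
        features.append("vMotion Application Notification")
    return features


if __name__ == "__main__":
    print(supported_features(19, (12, 3)))
    print(supported_features(21, (12, 3)))
```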
Cluster Nodes (VMs) must be deployed across different physical ESXi hosts
The WSFC/FCI virtual machines must be deployed using a CAB (Cluster Across Boxes) model, in which each cluster node resides on a different physical ESXi host. DRS anti-affinity rules should be used to strictly enforce this physical separation and prevent a single host failure from taking down multiple database nodes.
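DRS enforces the separation through a VM/VM "Separate Virtual Machines" rule; an independent post-check of the resulting placement can still be useful after maintenance windows. This Python sketch takes a hypothetical mapping of cluster node VMs to ESXi hosts (names are illustrative) and reports any host that runs more than one node, i.e. a violation of the CAB model.

```python
from collections import defaultdict


def anti_affinity_violations(placement: dict[str, str]) -> dict[str, list[str]]:
    """Given {cluster_node_vm: esxi_host}, return hosts that run more than one node."""
    nodes_by_host = defaultdict(list)
    for vm, host in placement.items():
        nodes_by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in nodes_by_host.items() if len(vms) > 1}


if __name__ == "__main__":
    current = {"sql-node-01": "esx01", "sql-node-02": "esx01", "sql-node-03": "esx02"}
    print(anti_affinity_violations(current))
```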
Conclusion
This blog post covers the traditional MS-SQL Windows Server Failover Clustering (WSFC) deployment using shared disks.
WSFC/FCI on vSAN represents a legacy shared-disk clustering model, whereas modern architectures (Microsoft SQL Server Always On Availability Groups) prefer replication-based approaches (real-time transaction log streaming) that avoid shared storage dependencies.

