Tuesday, July 02, 2019

vSAN logical design and SSD versus NVMe considerations

I'm just preparing vSAN capacity planning for a PoC for one of my customers. Capacity planning for traditional and hyper-converged infrastructure is in principle the same. You have to understand the TOTAL REQUIRED CAPACITY of your workloads and the USABLE CAPACITY of the vSphere cluster you are designing. Of course, you need to understand how a vSAN hyper-converged system works conceptually and logically, but it is not rocket science. vSAN is conceptually very straightforward, and you can design very different storage systems from a performance and capacity point of view. It is just a matter of the components you use. You probably understand that performance characteristics differ if you use rotational SATA disks, SSD, or NVMe. With NVMe, a 10Gb network can become the bottleneck, so you should consider a 25Gb network or faster. The figure below shows an example of my particular vSAN capacity planning and proposed logical specifications.
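The comparison of required versus usable capacity can be sketched in a few lines of Python. This is a minimal sketch, not the full calculation from the figure: the host count, disk count, and required capacity are hypothetical values, the protection multiplier of 2.0 assumes RAID-1 mirroring with one failure to tolerate, and the 25% slack space is the value used in my capacity planning drawing. Cache devices, deduplication, and filesystem overhead are deliberately ignored.

```python
def usable_capacity_tb(hosts, capacity_disks_per_host, disk_tb,
                       protection_multiplier=2.0, slack_fraction=0.25):
    """Rough usable capacity of a cluster in TB.

    protection_multiplier: raw TB consumed per TB of data
        (assumption: 2.0 for RAID-1 mirroring with one failure to tolerate).
    slack_fraction: share of raw capacity kept free for rebalance and
        resync operations (assumption: 25%).
    """
    raw_tb = hosts * capacity_disks_per_host * disk_tb
    after_slack_tb = raw_tb * (1.0 - slack_fraction)
    return after_slack_tb / protection_multiplier


# Hypothetical cluster: 4 hosts, each with 4 capacity disks of 3.84 TB.
usable_tb = usable_capacity_tb(hosts=4, capacity_disks_per_host=4, disk_tb=3.84)

# Hypothetical total required capacity of the workloads.
required_tb = 20.0

print(f"Usable capacity:   {usable_tb:.2f} TB")
print(f"Required capacity: {required_tb:.2f} TB")
print("Design fits" if usable_tb >= required_tb else "Design does not fit")
```

The point of the sketch is that raw capacity is never what you can offer to workloads: slack space and data protection both have to be subtracted before comparing against the total required capacity.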


Capacity planning is part of the logical design phase, therefore physical specifications and details should be avoided. However, within the logical design, you should compare multiple options that have an impact on infrastructure design qualities such as

  • availability, 
  • manageability, 
  • scalability, 
  • performance, 
  • security, 
  • recoverability,
  • and last but not least, cost.

For such considerations, you have to understand the characteristics of the different "materials" your system will eventually be built from. When we talk about magnetic disks, SSD, NVMe, NICs, etc., we are thinking about logical components. So I was considering the difference between SAS SSD and NVMe flash for the intended storage system. Of course, different physical models will behave differently, but hey, we are in the logical design phase, so we need at least some theoretical estimates. We will see the real behavior and performance characteristics after the system is built and tested before production use, or we can invest some time in a PoC and validate our expectations.

Nevertheless, cost and performance are always hot topics when talking with technical architects. Of course, higher performance costs more. However, I was curious about the current situation on the market, so I quickly checked the prices of SSD and NVMe in the DELL.com e-shop.

Note that these are just indicative street prices, but they have some informational value.

This is what I found there today:

  • Dell 6.4TB, NVMe, Mixed Use Express Flash, HHHL AIC, PM1725b, DIB - 213,150 CZK
  • Dell 3.84TB SSD vSAS Mixed Use 12Gbps 512e 2.5in Hot-Plug AG drive,3 DWPD 21024 TBW - 105,878 CZK

1 TB of NVMe storage costs about 33,305 CZK
1 TB of SAS SSD storage costs about 27,572 CZK
This is approximately a 20% price difference in favor of SAS SSD.
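The per-TB comparison is simple arithmetic on the two list prices above. A minimal Python sketch, using only the prices and capacities quoted from the e-shop (the variable names are mine):

```python
# List prices (CZK) and capacities (TB) as quoted above.
nvme_price_czk, nvme_capacity_tb = 213_150, 6.4
sas_price_czk, sas_capacity_tb = 105_878, 3.84

nvme_czk_per_tb = nvme_price_czk / nvme_capacity_tb
sas_czk_per_tb = sas_price_czk / sas_capacity_tb

# The percentage depends on which device is taken as the baseline.
nvme_premium = nvme_czk_per_tb / sas_czk_per_tb - 1.0  # NVMe relative to SAS SSD
sas_saving = 1.0 - sas_czk_per_tb / nvme_czk_per_tb    # SAS SSD relative to NVMe

print(f"NVMe:    {nvme_czk_per_tb:,.0f} CZK per TB")
print(f"SAS SSD: {sas_czk_per_tb:,.0f} CZK per TB")
print(f"NVMe premium over SAS SSD: {nvme_premium:.1%}")
print(f"SAS SSD saving vs. NVMe:   {sas_saving:.1%}")
```

Depending on the baseline, NVMe is roughly 21% more expensive per TB, or SAS SSD is roughly 17% cheaper, which is where the "approximately 20%" comes from.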

So here are the SAS SSD advantages:

  • ~20% less expensive material
  • scalability, because you can put 24 or more SSD disks into a 2U rack server, but the same server usually supports fewer than 8 PCIe slots
  • manageability, as you can replace disks more easily than PCIe cards

The NVMe advantage is performance, with a positive impact on storage latency: SAS SSD has ~250 μs latency and NVMe ~80 μs, so you can expect to improve latency and storage service quality by a factor of about 3.
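The factor of 3 follows directly from the two latency figures. A small Python sketch of the arithmetic; the queue-depth-1 IOPS numbers are a theoretical upper bound for one outstanding I/O served strictly one after another, not a measured result:

```python
# Approximate device latencies in microseconds, as quoted above.
sas_ssd_latency_us = 250
nvme_latency_us = 80

latency_factor = sas_ssd_latency_us / nvme_latency_us

# With a single outstanding I/O, each operation must finish before the
# next one starts, so the ceiling is 1 second divided by the latency.
sas_ssd_qd1_iops = 1_000_000 / sas_ssd_latency_us
nvme_qd1_iops = 1_000_000 / nvme_latency_us

print(f"Latency improvement factor: {latency_factor:.2f}x")
print(f"SAS SSD ceiling at queue depth 1: {sas_ssd_qd1_iops:,.0f} IOPS")
print(f"NVMe ceiling at queue depth 1:    {nvme_qd1_iops:,.0f} IOPS")
```

Real workloads issue many I/Os in parallel and add network and software latency on top, so this only illustrates the relative difference between the two device classes.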

So, as always, you have to consider which infrastructure design qualities matter for your particular use case and non-functional requirements, and make the right design decision(s) with justification(s).

Any comments? Real-world experience? Please leave a comment below the article.

2 comments:

Anonymous said...

Really nice post, David. One unfortunate issue I have faced was doing capacity planning without considering the 30% slack space needed for rebalance and resync operations in vSAN.

It is important to let customers know about that space; having a lot of free space on your vSAN datastore does not always mean you can fill it up.

David Pasek said...

Hi Leonardo. You are absolutely correct. Slack space is very often forgotten, and that leads to some bad surprises during vSAN operation. Btw, vSAN slack space (25%) is included in my capacity planning drawing, but it can be overlooked, so thanks a lot for highlighting this important fact.