Wednesday, May 25, 2016

ESXi: How to mask a storage device causing issues

I have heard about an issue with ESXi 6 Update 2 and HP 3PAR storage where VVols are enabled. I have been told the issue is caused by an unsupported SCSI command being issued to the PE LUN (256). PE stands for Protocol Endpoint; it is the VVol technical LUN providing the data path between the ESXi host and the remote storage system.

Observed symptoms:
  • ESXi 6 Update 2 – issues (ESXi disconnects from vCenter, the console is very slow)
  • Hosts may take a long time to reconnect to vCenter after reboot or hosts may enter a "Not Responding" state in vCenter Server
  • Storage-related tasks such as HBA rescan may take a very long time to complete
  • I have been told that ESXi 6 Update 1 doesn't experience these issues (the same entries appear in the log file but no other symptoms occur)
Below is a snippet from the log file:

 2016-05-18T11:31:27.319Z cpu1:242967)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac0150c3" - issuing command 0x43a657470fc0  
 2016-05-18T11:31:27.320Z cpu31:33602)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a657470fc0) to dev "naa.2ff70002ac0150c3" failed on path "vmhba0:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.  
 2016-05-18T11:31:27.320Z cpu31:33602)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba0:C0:T2:L256 device naa.2ff70002ac0150c3 - triggering path failover  
 2016-05-18T11:31:27.320Z cpu31:33602)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac0150c3": awaiting fast path state update before retrying failed command again.  

Possible workarounds
  • Restarting hostd on the ESXi host helps, therefore SSH to the ESXi hosts was enabled for quick resolution in case of problems
  • LUN masking of LUN 256
UPDATE 2016-09-30: There is most probably another workaround: changing the Disk.MaxLUN parameter on the ESXi hosts as described in VMware KB 1998.
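The Disk.MaxLUN workaround could be scripted across hosts; the sketch below only prints the commands instead of executing them. The host names are hypothetical, and the value 256 assumes Disk.MaxLUN is an exclusive upper bound on scanned LUN IDs (so the PE LUN 256 would no longer be scanned) - verify the exact semantics in KB 1998 before applying.

```shell
# Dry run: print, rather than execute, the Disk.MaxLUN change per host.
# Host names are hypothetical; the value 256 assumes LUN IDs 0..MaxLUN-1
# are scanned (so LUN 256 is skipped) - check VMware KB 1998 first.
for host in esx01 esx02; do
  echo "ssh root@${host} esxcli system settings advanced set -o /Disk/MaxLUN -i 256"
done
```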

Final solution
  • Application of an HP 3PAR firmware patch (unfortunately the patch is not available for the current firmware, so a firmware upgrade has to be planned and executed)
  • Investigation of the root cause why ESXi 6 Update 2 is more sensitive than ESXi 6 Update 1
Immediate steps
  • Application of workarounds mentioned above  

HOME LAB EXERCISE
I have tested in my home lab how to mask a particular LUN on an ESXi host, just to be sure I know how to do it.

Below is the quick solution for impatient readers.

Let's say we have the following device with the following path.
  • Device: naa.6589cfc000000bf5e731ffc99ec35186
  • Path: vmhba36:C0:T0:L1
LUN Masking
esxcli storage core claimrule add -P MASK_PATH -r 500 -t location -A vmhba36 -C 0 -T 0 -L 1
esxcli storage core claimrule load
esxcli storage core claiming reclaim -d naa.6589cfc000000bf5e731ffc99ec35186

LUN Unmasking
esxcli storage core claimrule remove --rule 500
esxcli storage core claimrule load
esxcli storage core claiming unclaim --type=path --path=vmhba36:C0:T0:L1
esxcli storage core claimrule run

... continue reading for details.

LUN MASKING DETAILS
The exact LUN masking procedure is documented in the vSphere 6 documentation here. It is also documented in KB articles 1009449 and 1014953.

List storage devices

 [root@esx02:~] esxcli storage core device list  
 naa.6589cfc000000bf5e731ffc99ec35186  
   Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000bf5e731ffc99ec35186)  
   Has Settable Display Name: true  
   Size: 10240  
   Device Type: Direct-Access  
   Multipath Plugin: NMP  
   Devfs Path: /vmfs/devices/disks/naa.6589cfc000000bf5e731ffc99ec35186  
   Vendor: FreeNAS  
   Model: iSCSI Disk  
   Revision: 0123  
   SCSI Level: 6  
   Is Pseudo: false  
   Status: degraded  
   Is RDM Capable: true  
   Is Local: false  
   Is Removable: false  
   Is SSD: true  
   Is VVOL PE: false  
   Is Offline: false  
   Is Perennially Reserved: false  
   Queue Full Sample Size: 0  
   Queue Full Threshold: 0  
   Thin Provisioning Status: yes  
   Attached Filters:  
   VAAI Status: supported  
   Other UIDs: vml.010001000030303530353661386131633830300000695343534920  
   Is Shared Clusterwide: true  
   Is Local SAS Device: false  
   Is SAS: false  
   Is USB: false  
   Is Boot USB Device: false  
   Is Boot Device: false  
   Device Max Queue Depth: 128  
   No of outstanding IOs with competing worlds: 32  
   Drive Type: unknown  
   RAID Level: unknown  
   Number of Physical Drives: unknown  
   Protection Enabled: false  
   PI Activated: false  
   PI Type: 0  
   PI Protection Mask: NO PROTECTION  
   Supported Guard Types: NO GUARD SUPPORT  
   DIX Enabled: false  
   DIX Guard Type: NO GUARD SUPPORT  
   Emulated DIX/DIF Enabled: false
  
 naa.6589cfc000000ac12355fe604028bf21  
   Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000ac12355fe604028bf21)  
   Has Settable Display Name: true  
   Size: 10240  
   Device Type: Direct-Access  
   Multipath Plugin: NMP  
   Devfs Path: /vmfs/devices/disks/naa.6589cfc000000ac12355fe604028bf21  
   Vendor: FreeNAS  
   Model: iSCSI Disk  
   Revision: 0123  
   SCSI Level: 6  
   Is Pseudo: false  
   Status: degraded  
   Is RDM Capable: true  
   Is Local: false  
   Is Removable: false  
   Is SSD: true  
   Is VVOL PE: false  
   Is Offline: false  
   Is Perennially Reserved: false  
   Queue Full Sample Size: 0  
   Queue Full Threshold: 0  
   Thin Provisioning Status: yes  
   Attached Filters:  
   VAAI Status: supported  
   Other UIDs: vml.010002000030303530353661386131633830310000695343534920  
   Is Shared Clusterwide: true  
   Is Local SAS Device: false  
   Is SAS: false  
   Is USB: false  
   Is Boot USB Device: false  
   Is Boot Device: false  
   Device Max Queue Depth: 128  
   No of outstanding IOs with competing worlds: 32  
   Drive Type: unknown  
   RAID Level: unknown  
   Number of Physical Drives: unknown  
   Protection Enabled: false  
   PI Activated: false  
   PI Type: 0  
   PI Protection Mask: NO PROTECTION  
   Supported Guard Types: NO GUARD SUPPORT  
   DIX Enabled: false  
   DIX Guard Type: NO GUARD SUPPORT  
   Emulated DIX/DIF Enabled: false  

So we have two devices with the following NAA IDs:
  • naa.6589cfc000000bf5e731ffc99ec35186
  • naa.6589cfc000000ac12355fe604028bf21
Now let's list the paths of both iSCSI devices.

[root@esx02:~] esxcli storage nmp path list
iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000bf5e731ffc99ec35186
   Runtime Name: vmhba36:C0:T0:L1
   Device: naa.6589cfc000000bf5e731ffc99ec35186
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000bf5e731ffc99ec35186)
   Group State: active
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}

iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000ac12355fe604028bf21
   Runtime Name: vmhba36:C0:T0:L2
   Device: naa.6589cfc000000ac12355fe604028bf21
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000ac12355fe604028bf21)
   Group State: active
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}

Let's mask the iSCSI device exposed as LUN 1.
So the path we want to mask is vmhba36:C0:T0:L1 and the device UID is naa.6589cfc000000bf5e731ffc99ec35186.

So let's create a masking rule for the path above. In this particular lab case there is just a single path to the device. In a real environment there are usually multiple paths and every path should be masked.

 esxcli storage core claimrule add -P MASK_PATH -r 500 -t location -A vmhba36 -C 0 -T 0 -L 1
 esxcli storage core claimrule load  
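In a multipath environment the same MASK_PATH rule must be added once per path to the device. A sketch that only prints the commands for review - the adapters and targets below are hypothetical examples, and the rule IDs 5xx are taken from the user-assignable range:

```shell
# Print (do not execute) one MASK_PATH claim rule per path.
# Adapters and targets are hypothetical examples.
rule=500
for adapter in vmhba1 vmhba2; do
  for target in 0 1; do
    echo "esxcli storage core claimrule add -P MASK_PATH -r ${rule} -t location -A ${adapter} -C 0 -T ${target} -L 1"
    rule=$((rule + 1))
  done
done
echo "esxcli storage core claimrule load"
```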

We can list our claim rules to see the result

 [root@esx02:~] esxcli storage core claimrule list  
 Rule Class  Rule  Class    Type    Plugin   Matches                  XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size  
 ---------- ----- ------- --------- --------- ---------------------------------------- ------------------------------- --------------------------- -----------------------  
 MP       0 runtime transport NMP    transport=usb                            false            false            0  
 MP       1 runtime transport NMP    transport=sata                           false            false            0  
 MP       2 runtime transport NMP    transport=ide                            false            false            0  
 MP       3 runtime transport NMP    transport=block                           false            false            0  
 MP       4 runtime transport NMP    transport=unknown                          false            false            0  
 MP      101 runtime vendor   MASK_PATH vendor=DELL model=Universal Xport                  false            false            0  
 MP      101 file   vendor   MASK_PATH vendor=DELL model=Universal Xport                  false            false            0  
 MP      500 runtime location  MASK_PATH adapter=vmhba36 channel=0 target=0 lun=1              false            false            0  
 MP      500 file   location  MASK_PATH adapter=vmhba36 channel=0 target=0 lun=1              false            false            0  
 MP     65535 runtime vendor   NMP    vendor=* model=*                          false            false            0  

We can see that the new claim rule (500) is in the configuration file (/etc/vmware/esx.conf) and also loaded in runtime.

However, to really mask our particular device without an ESXi host reboot we have to reclaim the device.

 [root@esx02:~] esxcli storage core claiming reclaim -d naa.6589cfc000000bf5e731ffc99ec35186  

The particular device disappears from the ESXi host immediately; a host reboot is not needed.
So we are done. The device is not visible to the ESXi host anymore.

Note: I was unsuccessful when testing LUN masking with a local device. Therefore I assume that LUN masking works only with remote disks (iSCSI, Fibre Channel).

LUN UNMASKING
Just in case you would like to unmask the device and use it again, here is the procedure.

Let's start by removing the claim rule for our previously masked path.

 [root@esx02:~] esxcli storage core claimrule remove --rule 500  
 [root@esx02:~] esxcli storage core claimrule list  
 Rule Class  Rule  Class    Type    Plugin   Matches                  XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size  
 ---------- ----- ------- --------- --------- ---------------------------------------- ------------------------------- --------------------------- -----------------------  
 MP       0 runtime transport NMP    transport=usb                            false            false            0  
 MP       1 runtime transport NMP    transport=sata                           false            false            0  
 MP       2 runtime transport NMP    transport=ide                            false            false            0  
 MP       3 runtime transport NMP    transport=block                           false            false            0  
 MP       4 runtime transport NMP    transport=unknown                          false            false            0  
 MP      101 runtime vendor   MASK_PATH vendor=DELL model=Universal Xport                  false            false            0  
 MP      101 file   vendor   MASK_PATH vendor=DELL model=Universal Xport                  false            false            0  
 MP      500 runtime location  MASK_PATH adapter=vmhba36 channel=0 target=0 lun=1              false            false            0  
 MP     65535 runtime vendor   NMP    vendor=* model=*                          false            false            0  
 [root@esx02:~]   

You can see that the rule has been removed from the file configuration, but it is still in runtime. We have to reload the claim rules from file into runtime.

 [root@esx02:~] esxcli storage core claimrule load  
 [root@esx02:~] esxcli storage core claimrule list  
 Rule Class  Rule  Class    Type    Plugin   Matches              XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size  
 ---------- ----- ------- --------- --------- --------------------------------- ------------------------------- --------------------------- -----------------------  
 MP       0 runtime transport NMP    transport=usb                        false            false            0  
 MP       1 runtime transport NMP    transport=sata                        false            false            0  
 MP       2 runtime transport NMP    transport=ide                        false            false            0  
 MP       3 runtime transport NMP    transport=block                       false            false            0  
 MP       4 runtime transport NMP    transport=unknown                      false            false            0  
 MP      101 runtime vendor   MASK_PATH vendor=DELL model=Universal Xport              false            false            0  
 MP      101 file   vendor   MASK_PATH vendor=DELL model=Universal Xport              false            false            0  
 MP     65535 runtime vendor   NMP    vendor=* model=*                       false            false            0  
 [root@esx02:~]   

Here we go. Now there is no rule with ID 500.

But the device is still not visible, and we cannot execute the command
esxcli storage core claiming reclaim -d naa.6589cfc000000bf5e731ffc99ec35186
because the device is not visible to the ESXi host. We masked it, right? So this is exactly how it should behave.

An ESXi host reboot would probably help, but can we do it without rebooting?
The answer is yes, we can.
We have to unclaim the path to our device and re-run the claim rules.

 esxcli storage core claiming unclaim --type=path --path=vmhba36:C0:T0:L1  
 esxcli storage core claimrule run  

And now we can see both paths to the iSCSI LUNs again.

 [root@esx02:~] esxcli storage nmp path list  
 iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000bf5e731ffc99ec35186  
   Runtime Name: vmhba36:C0:T0:L1  
   Device: naa.6589cfc000000bf5e731ffc99ec35186  
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000bf5e731ffc99ec35186)  
   Group State: active  
   Array Priority: 0  
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}  
   Path Selection Policy Path Config: {current path; rank: 0}  
 iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000ac12355fe604028bf21  
   Runtime Name: vmhba36:C0:T0:L2  
   Device: naa.6589cfc000000ac12355fe604028bf21  
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000ac12355fe604028bf21)  
   Group State: active  
   Array Priority: 0  
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}  
   Path Selection Policy Path Config: {current path; rank: 0}  

Hope this helps other VMware users who need LUN masking / unmasking.

Monday, May 23, 2016

Storage DRS Design Considerations

This blog post follows the blog post "VMware vSphere SDRS - test plan of SDRS initial placement" and summarizes several facts that impact SDRS design decisions. If you want to see the results of the SDRS tests I did in my home lab, read that previous post.

SDRS design considerations:
  • The SDRS initial placement algorithm does NOT take VM swap file capacity into account. However, subsequent rebalance calculations are based on the free space of particular datastores, so if virtual machines are powered on and swap files exist, the swap files are effectively considered: their usage decreases the datastore's total free space and therefore affects the space load. The space load formula is [space load] = [total consumed space on the datastore] / [datastore capacity]
  • The storage space threshold is just a threshold (soft limit) used by SDRS for balancing and defragmentation. It is not a hard limit. SDRS tries to keep free space on datastores according to the space threshold, but it doesn't guarantee you will always have that amount of free space on the datastores. See Test 3 in my SDRS test plan [6].
  • SDRS defragmentation works, but there can be cases when initial placement fails even though storage was freed up and there is contiguous free space on some datastore after defragmentation. See Test 2 in my SDRS test plan [6]. That's up to the component which does the provisioning. It is important to understand how provisioning to a datastore cluster really works. A lot of people think that a datastore cluster behaves like one giant datastore. That is true from a high-level (abstracted) view, but in reality a datastore cluster is nothing other than a group of individual datastores, where SDRS is "just" a scheduler on top of them. You can imagine the scheduler as a placement engine which prepares placement recommendations for initial placement and continuous balancing. That means other software components (C# Client, Web Client, PowerCLI, vRealize Automation, vCloud Director, etc.) are responsible for initial placement provisioning, and SDRS gives them recommendations on the best place to put a new storage object (a vmdk file or the VM config file). Here [8] is the proof of this statement.
  • When SDRS is configured to consider I/O metrics for load balancing, they are also considered during initial placement. Please do not mix up SDRS I/O metrics and SIOC; these are two different things, even though SDRS I/O metrics leverage the normalized latency calculated by SIOC.
  • Q: Do I need to use SDRS I/O metrics for load balancing? A: It depends on your physical storage system (disk array). If you have a disk array with a modern storage architecture, then most probably all datastores (LUNs, volumes) sit on a single physical disk pool. In that case it doesn't make sense to load balance (do Storage vMotion on I/O contention) between datastores, because the I/O always ends up on the same physical spindles anyway, and on top of that it generates additional storage workload. The same is true for initial placement. If your datastores are on different physical spindles then it can help. This is typical for storage systems using RAID groups, which is not very common nowadays.
  • The SDRS calculation is done per vmdk, not per whole virtual machine. But affinity rules (keep together) tend to keep the vmdks together, making the behavior similar to whole-VM balancing. By default, virtual machine files are kept together in the working directory of the virtual machine. If the virtual machine needs to be migrated, all the files inside the virtual machine's working directory are moved. However, if the default affinity rule is disabled (see Screenshot 1), Storage DRS will move the working directory and virtual disks separately, allowing Storage DRS to distribute the virtual disk files at a more granular level.
  • Q: When and how often does SDRS rebalancing kick in? A: Rebalancing happens 1) at a regular interval (default 8 hours; it can be changed, see Screenshot 2); 2) when a threshold violation is detected; 3) when a user requests a configuration change; 4) on an API call, such as clicking Run Storage DRS in a client.
  • Provisioning multiple VMs can behave differently - less deterministically - because of other SDRS calculation factors (I/O load, capacity usage trend) and also because of the particular provisioning workflow: the exact timing of when the SDRS recommendation is requested and when the datastore space is really consumed. Recall that a datastore's reported free capacity is one of the main factors for the next SDRS recommendations.
  • Since vSphere 6.0, SDRS is integrated with other relevant technologies: VASA awareness (array-based thin provisioning, deduplication, auto-tiering, snapshots, replication), Site Recovery Manager (consistency group and protection group considerations), vSphere Replication (replica placement recommendations), and Storage Policy Based Management (moves only between compliant VM storage policies). For further details read reference [9].
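To illustrate the space load formula from the first bullet, a minimal sketch with made-up numbers:

```shell
# Space load = consumed space / datastore capacity (values are made up).
consumed_gb=680
capacity_gb=1000
# awk handles the floating-point division; prints the load as a fraction.
awk -v c="$consumed_gb" -v t="$capacity_gb" 'BEGIN { printf "space load = %.2f\n", c / t }'
# -> space load = 0.68
```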
The general solution to the challenges highlighted above (architects call it risk mitigation) is to have "enough" free space on each datastore. Enough free storage space per datastore gives more flexibility to the SDRS algorithm. What is "enough" and how to achieve it is out of scope for this blog post, but think about the following hints:
  • Thin provisioning on the physical storage. See Frank Denneman's note [2] about the thin provisioning alarm, which should be considered by the SDRS algorithm. I write "should" because I had no chance to test it.
  • VVols, VSAN - I believe that one big VVols datastore (storage container) eliminates the need for datastore free space considerations.
But don't forget that there are always other considerations with potential impact on your specific design, so think holistically and apply critical thinking to all design decisions.

Last but not least: I highly encourage you to study the book "VMware vSphere Clustering Deepdive 5.1" carefully to become familiar with the basic SDRS algorithm and terminology.

Screenshots:
Screenshot 1: SDRS - Default VM affinity 
Screenshot 2: SDRS - Default imbalance check is 8 hours. It can be changed to X minutes, hours, days 
Videos:
Video 1: VMware vSphere SDRS VM provisioning process 

Relevant resources:
  1. Frank Denneman : Storage DRS initial placement and datastore cluster defragmentation
  2. Frank Denneman : Storage DRS Initial placement workflow
  3. Frank Denneman : Impact of Intra VM affinity rules on Storage DRS
  4. Frank Denneman : SDRS out of space avoidance
  5. Duncan Epping, Frank Denneman : VMware vSphere Clustering Deepdive 5.1
  6. David Pasek : VMware vSphere SDRS - test plan of SDRS initial placement
  7. VMware : Ignore Storage DRS Affinity Rules for Maintenance Mode
  8. David Pasek : VMware vSphere SDRS VM provisioning process 
  9. Duncan Epping : What is new for Storage DRS in vSphere 6.0?
  10. VMware KB : Storage DRS FAQ 

Thursday, May 19, 2016

VMware vSphere SDRS - test plan of SDRS initial placement

VMware vSphere Storage DRS (aka SDRS) stands for Storage Distributed Resource Scheduler. It continuously balances storage space usage and storage I/O load while avoiding resource bottlenecks to meet application service levels.

Lab environment:
5 x 10 GB datastores formed into a datastore cluster with SDRS enabled.
It is configured to balance based on storage space usage and I/O load.

  • Storage Space threshold is 1 GB
  • I/O latency threshold is kept on default 15 ms.
  • Each "empty" datastore has a real capacity of 9.75 GB, of which 8.89 GB is free because 882 MB is used.

You can see the configuration details in the screenshot below.


The capacity of one particular 10 GB datastore is depicted below.
The used space (882 MB) is occupied by the following system files (.sf) ...


In VMFS 5, every datastore gets its own hidden files to store the file system structure.

Test 1

Test description: Does the SDRS initial placement algorithm take VM swap file capacity into account?

Test prerequisites:
  • All 5 datastores in the datastore cluster are empty
  • That means each datastore has 8.89 GB of free capacity
  • The provisioned VM doesn't have any RAM reservation
Test steps:
  • Deploy a virtual machine with 4 GB RAM and an 8 GB disk manually (through the Web Client)
  • Start the deployed virtual machine
  • Observe the behavior
Test expectations:

  • I want to test whether the swap file is considered during SDRS initial placement
  • We have only 8.89 GB of free space per datastore, therefore if the VM swap file is considered, a new VM with an 8 GB disk and 4 GB RAM won't be provisioned, because we would need 12 GB of space on a single datastore, which is not our case.
  • In other words, if provisioning succeeds (and only the later power-on fails), we will prove that SDRS doesn't take the VM swap file into account.
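A quick shell sketch of the arithmetic behind this expectation (the VM swap file size equals the configured RAM minus the memory reservation):

```shell
# Test 1 arithmetic: space required if the swap file were considered.
disk_mb=8192        # 8 GB virtual disk
ram_mb=4096         # 4 GB configured RAM
reservation_mb=0    # no memory reservation in this test
swap_mb=$((ram_mb - reservation_mb))
required_mb=$((disk_mb + swap_mb))
free_mb=9103        # 8.89 GB free per datastore
echo "required=${required_mb} MB, free=${free_mb} MB"
if [ "$required_mb" -gt "$free_mb" ]; then
  echo "power-on must fail: swap file does not fit"
fi
```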

Test screenshots:

Deployed Virtual Machine.
VM PowerOn Failure 
Test Result:

  • The virtual machine was successfully provisioned and 8 GB was deducted from Datastore5's available space.
  • The virtual machine power-on failed because there was not enough storage space for the 4 GB swap file. This is the expected behavior if SDRS doesn't take the VM swap into account.

Test Summary:

  • We have verified that the SDRS initial placement algorithm does NOT take VM swap file capacity into account.
  • A virtual machine memory (RAM) reservation would impact this test: if a VM has, for example, a 100% memory reservation, it doesn't need any disk space for the VM swap file.

Test 2

Test description: How efficient is SDRS defragmentation when the datastore cluster is running out of storage space?

Test prerequisites:
  • 4 datastores in the datastore cluster are almost full
  • 1 datastore (Datastore4) has 7.58 GB of free space
  • On one datastore (Datastore5) we have a virtual machine (test1_big) with an 8 GB disk
Test steps:
  • Clone the virtual machine (test1_big) into the datastore cluster (through the Web Client)
  • Observe the behavior
Test expectations:
  • SDRS will free up Datastore4 to make enough space for the clone of the virtual machine (test1_big)
  • Provisioning of the virtual machine clone will be successful

Test screenshots:
Before SDRS defragmentation
After SDRS defragmentation and clone provisioning

Test Result:
  • SDRS freed up Datastore4 as expected
  • Provisioning of the virtual machine clone FAILED because of insufficient disk space on Datastore4.
  • That's unexpected behavior, because Datastore4 was empty (thanks to SDRS defragmentation) and another machine with the same configuration had been successfully provisioned on Datastore5.
Test Summary:

  • SDRS successfully freed up the only datastore where the virtual machine clone could be placed, but the clone deployment started before the Storage vMotion finished, therefore the clone provisioning failed.
  • SDRS defragmentation works, but there can be cases when initial placement fails even though storage was freed up and there is contiguous free space on some datastore after defragmentation.
  • It is important to understand how VM provisioning to a datastore cluster really works. A datastore cluster is nothing other than a group of individual datastores, where SDRS is "just" a scheduler on top of them. You can imagine the scheduler as a placement engine which prepares placement recommendations for initial placement and continuous balancing. That means other software components (C# Client, Web Client, PowerCLI, vRealize Automation, vCloud Director, etc.) are responsible for initial placement provisioning, and SDRS gives them recommendations on the best place to put new storage objects (vmdk files or the VM config file).
  • In other words, initial VM provisioning is not driven by SDRS itself. The VM provisioning process is managed by the vSphere Client, vRA, vRO, PowerCLI or another software component over the vSphere API. SDRS is just a placement engine which gives a recommendation on the best placement at the moment it is asked. The provisioning process selects one particular SDRS recommendation and continues with provisioning (API method ApplyStorageDrsRecommendation_Task). However, in the meantime some other software can be provisioning VMs, and the selected datastore can be filled by somebody else. There is always some probability of VM provisioning failure, and this is exactly where good vSphere / storage design plays a crucial role in decreasing that probability.

Test 3

Test description: How does SDRS initial placement balance among different datastores?

Test prerequisites:
  • Storage Space threshold is 1 GB
  • I/O latency threshold is kept on default 15 ms.
  • Each "empty" datastore has a real capacity of 9.75 GB, of which 8.89 GB is free because 882 MB is used.
  • Use of a PowerCLI script to provision multiple VMs. The PowerCLI script is available here.
Test steps:
  • Run the PowerCLI script to generate 50 virtual machines (1 vCPU, 512 MB RAM, 1 GB thick-provisioned disk each) into the datastore cluster with SDRS enabled.
  • Observe the behavior
Test expectations:
  • We have a datastore cluster with 5 datastores, each having 8.89 GB (9,103 MB) of available storage.
  • We are deploying VMs consuming 1,000 MB each.
  • The VMs are deployed in a powered-off state, so swap files don't need to be considered.
  • Therefore we expect to end up with 45 VMs balanced in round-robin fashion across the 5 datastores.
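The expected count of 45 VMs follows from simple arithmetic; a quick sketch (each VM consumed roughly 1,000 MB in this test, which also matches the 103 MB left per datastore in the results below):

```shell
# Test 3 arithmetic: how many 1,000 MB VMs fit per datastore and in total.
free_mb=9103
vm_mb=1000
datastores=5
per_ds=$((free_mb / vm_mb))        # integer division -> 9 VMs per datastore
total=$((per_ds * datastores))     # 45 VMs across the cluster
left=$((free_mb - per_ds * vm_mb)) # 103 MB left per datastore
echo "${per_ds} VMs per datastore, ${total} VMs total, ${left} MB left"
```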
Test screenshots:
Single datastore capacity
PowerCLI Automated Provisioning.
Datastore free space after automatic sequential provisioning

Test Result:
  • 45 VMs were successfully provisioned; the 46th-50th VMs failed because of "Insufficient disk space on datastore 'Datastore1'." This was the expected behavior.
  • The VMs were provisioned to the following datastores:
  • Datastore 1: TEST-05, TEST-06, TEST-15, TEST-20, TEST-21, TEST-30, TEST-31, TEST-40, TEST-45 
  • Datastore 2: TEST-04, TEST-10, TEST-11, TEST-19, TEST-25, TEST-26, TEST-35, TEST-36, TEST-44
  • Datastore 3: TEST-03, TEST-09, TEST-14, TEST-18, TEST-24, TEST-29, TEST-34, TEST-39, TEST-43
  • Datastore 4: TEST-02, TEST-08, TEST-13, TEST-17, TEST-23, TEST-28, TEST-33, TEST-38, TEST-42
  • Datastore 5: TEST-01, TEST-07, TEST-12, TEST-16, TEST-22, TEST-27, TEST-32, TEST-37, TEST-41
Test Summary: The test passed as expected. Only a few details are worth mentioning.
  • I would expect the VMs to be distributed across the datastores strictly in provisioning order. Recall that we used artificial sequential provisioning of 1 GB vDisks per VM, so I would expect VMs TEST-01, TEST-06, TEST-11, TEST-16, TEST-21, TEST-26, TEST-31, TEST-36, TEST-41 on Datastore 5, and similar VM numbering on the other datastores. But at the end of the day it doesn't seem to be a big deal.
  • Please note that I observed that different provisioning runs can end up with slightly different machine placement. I suspect this is because other factors (I/O load, storage usage trend) are also considered by the SDRS algorithm.
  • Datastore free space is 103 MB on all datastores, even though the storage space threshold is set to 1 GB. That's expected behavior: the storage space threshold is just a soft limit used by SDRS for balancing and defragmentation.
Test 4

Test description: Will a new VM be provisioned to the datastore with the biggest free space?

Test prerequisites:
  • Storage Space threshold is 1 GB
  • I/O latency threshold is kept on default 15 ms.
  • One datastore (Datastore1) is "empty": it has a real capacity of 9.5 GB, of which 8.64 GB is free because 882 MB is used.
  • One datastore (Datastore2) has 5.71 GB of free capacity.
  • All other datastores (Datastore3, Datastore4, Datastore5) are almost full, having only 848 MB free.
Test steps:
  • Use the vSphere Web Client to provision one VM with a 2 GB disk into the datastore cluster.
  • Observe the behavior. We are interested in where the new VM will be placed.
Test expectations:
  • We expect the new VM disk to be placed on Datastore1 because it has the most free (available) space.

Test screenshots:
Datastore cluster capacity before VM provisioning.
Datastore cluster capacity after VM provisioning.
Test Result:
  • The new virtual machine was provisioned onto Datastore1, which had the biggest available storage capacity.
Test Summary: Initial placement behaves as expected: the new VM is placed on the datastore with the least used space. However, we should be aware that this test covered only a single VM provisioning. Provisioning multiple VMs can behave differently because of other SDRS calculation factors (I/O load, capacity usage trend) and also because of the particular provisioning workflow: the exact timing of when the SDRS recommendation is requested and when the datastore space is really consumed for the next SDRS recommendations.

Next steps

See blog post "Storage DRS Design Considerations".

And as always, any comment is appreciated.

ESXi 6 - manual partitioning for multiple VMFS filesystems on single disk device

I had to test the exact behavior of SDRS initial placement (blog post here), therefore I needed multiple VMFS datastores to form a datastore cluster with SDRS. Unfortunately, I'm constrained by storage resources in my home lab, so I used one local 220 GB SSD to simulate multiple VMFS datastores.

Warning: This is not a recommended practice for production systems. It is recommended to have a single partition (LUN) per device.

I did not find a way to create multiple partitions on a single disk via the GUI, therefore I used the CLI.

You cannot use good old fdisk for VMFS partitions. Instead, partedUtil has to be used because of GPT. partedUtil is included in ESXi, so you can log in to ESXi over SSH and use it.

Note: It is worth mentioning that you can have at most 16 partitions in a GUID Partition Table (GPT) in ESXi 6. This is what I have tested; when I tried to create 17 partitions, partedUtil failed with the message "Too many partitions (17)". 

In my lab I have Intel NUC with local SSD identified as "t10.ATA_____INTEL_SSDMCEAW240A4_____________________CVDA4426003E240M____".

To get the partition table, use the following command:

 partedUtil getptbl "t10.ATA_____INTEL_SSDMCEAW240A4_____________________CVDA4426003E240M____"  

If the disk is empty you should see the following output:
msdos
29185 255 63 468862128
I would like to have 5 x 10 GB LUNs.

10 GB is 20,971,520 sectors (a sector is 512 bytes).

The first 2048 sectors are skipped to keep space for the GPT and to keep partitions aligned (2048 sectors x 512 B = 1 MB). Below is my partition plan in the format [Partition number, Start sector, End sector]:
P1 2048-20973567
P2 20973568-41945087
P3 41945088-62916607
P4 62916608-83888127
P5 83888128-104859647
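The plan above is plain arithmetic; here is a small shell sketch that reproduces it (sector size and start sector as described above):

```shell
# Partition plan arithmetic: 512-byte sectors, 10 GB partitions,
# first partition starts at sector 2048 (1 MB alignment).
size=$((10 * 1024 * 1024 * 1024 / 512))   # 20971520 sectors per partition
start=2048
for p in 1 2 3 4 5; do
  end=$((start + size - 1))               # end sector is inclusive
  echo "P$p $start-$end"
  start=$((end + 1))                      # next partition starts right after
done
```

The output matches the five [start, end] pairs listed above.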
The following command creates these 5 partitions.

 partedUtil setptbl "t10.ATA_____INTEL_SSDMCEAW240A4_____________________CVDA4426003E240M____" gpt "1 2048 20973567 AA31E02A400F11DB9590000C2911D1B8 0" "2 20973568 41945087 AA31E02A400F11DB9590000C2911D1B8 0" "3 41945088 62916607 AA31E02A400F11DB9590000C2911D1B8 0" "4 62916608 83888127 AA31E02A400F11DB9590000C2911D1B8 0" "5 83888128 104859647 AA31E02A400F11DB9590000C2911D1B8 0"  

Now you can list the partitions again and you should see the following output:
gpt
29185 255 63 468862128
1 2048 20973567 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
2 20973568 41945087 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
3 41945088 62916607 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
4 62916608 83888127 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
5 83888128 104859647 AA31E02A400F11DB9590000C2911D1B8 vmfs 0 

So we have 5 partitions, and the last step is to format them with the VMFS5 file system. We have to leverage vmkfstools, which is also included in the ESXi 6 system.

Let's start with the first partition; we will use the datastore name Datastore1.

 vmkfstools -C vmfs5 -S Datastore1 t10.ATA_____INTEL_SSDMCEAW240A4_____________________CVDA4426003E240M____:1  

The same procedure has to be repeated for each partition.

Here is a Perl script which generates the commands for partedUtil and vmkfstools based on several variables.
 #!/usr/bin/perl  
 #  
 my $disk_device = "t10.ATA_____INTEL_SSDMCEAW240A4_____________________CVDA4426003E240M____";  
 my $starting_sector = 2048;  
 my $partition_size_in_sectors = 4000000;  
 my $datastore_prefix = "Datastore";  
 my $num_of_partitions = 16; # Maximum is 16 partitions  
 my $partitions;  
 my $start = $starting_sector;  
 my $end;  
 my $i;  
 # Generate command for GPT partitions  
 for ($i=1; $i<=$num_of_partitions; $i=$i+1) {  
     $end = $start + $partition_size_in_sectors;  
     $partitions .= "\"$i $start $end AA31E02A400F11DB9590000C2911D1B8 0\" ";  
     $start = $end + 1;  
 }  
 print "partedUtil setptbl \"$disk_device\" gpt $partitions\n";  
 # Generate commands to format partitions with VMFS5 file system  
 #  
 for ($i=1; $i<=$num_of_partitions; $i=$i+1) {  
     print "vmkfstools -C vmfs5 -S $datastore_prefix$i $disk_device:$i\n";  
 }  

If you later want to unmount the 16 datastores, the following command line unmounts one of them:

 esxcli storage filesystem unmount -l Datastore1   
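To unmount all 16 at once, the single command can be wrapped in a loop. A sketch that just prints the commands (run them, or pipe the output to sh, on the ESXi host):

```shell
# Print an unmount command for each of the 16 datastores created earlier.
for i in $(seq 1 16); do
  echo "esxcli storage filesystem unmount -l Datastore$i"
done
```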

Hope this helps at least one other vSphere dude in his lab exercises. 

Sunday, May 08, 2016

How to manage VCSA services via CLI

VMware vCenter Server Appliance (aka VCSA) is composed of several services. These services are manageable through the Web Client, but in case you need or want to use the CLI, here are some tips.

First of all you have to connect to VCSA via ssh and enable shell.
shell.set --enabled True
shell
Run the below command to list the services currently present on the VCSA.
service-control --list
If you want to check the status of the services, then run the below command.
service-control --status
The command above will list all the services that are present on the VCSA; the ones that are not running are listed at the end.

If you want to start a particular service, use the following syntax with the service name as the argument:
service-control --start <service-name>
To stop a service:
service-control --stop <service-name>
If you want to start or stop all services, use the following commands:
service-control --start --all
service-control --stop --all
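A restart of a single service is just the stop/start pair in sequence. A sketch that prints the two commands (the service name vmware-vpxd is only an example):

```shell
# Print the stop/start pair for restarting one VCSA service.
svc="vmware-vpxd"   # example service name - substitute your own
for action in stop start; do
  echo "service-control --$action $svc"
done
```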
Not very difficult, right?

VCSA is the only VMware vSphere management platform of the future, so don't be afraid and go VCSA!

Friday, April 29, 2016

NSX Edge Services Gateway Form Factors

NSX ESGs are automatically deployed from NSX Manager and are available in the following form factors:

Compact
  • 1 vCPU
  • 512 MB RAM
  • 4.5 GB vDisk + 4 GB swap vDisk
  • 64K Connections
  • 2K Firewall rules
  • 50 concurrent sessions
  • Up to 50 users can be authenticated/login via SSL VPN Plus
Large
  • 2 vCPU
  • 1 GB RAM
  • 1M Connections
  • 2K Firewall rules
  • Up to 100 users can be authenticated/login via SSL VPN Plus
Quad Large
  • 4 vCPU
  • 1 GB RAM
  • 1M Connections
  • 2K Firewall rules
  • Up to 100 users can be authenticated/login via SSL VPN Plus
Extra Large
  • 6 vCPU
  • 8 GB RAM
  • 1M Connections
  • 2K Firewall rules
  • Up to 1000 users can be authenticated/login via SSL VPN Plus
For further details read this - NSX EDGE FEATURE AND PERFORMANCE MATRIX

Friday, April 22, 2016

VMware Tools 10.0.8 is now GA

VMware Tools 10.0.8 is now GA and live on www.vmware.com and available to all customers.

Resolved Issues
Virtual machine performance issues after upgrading VMware tools version to 10.0.x in NSX and VMware vCloud Networking and Security 5.5.x

While upgrading VMware Tools to version 10.0.x in an NSX 6.x and VMware vCloud Networking and Security 5.5.x environment, the performance of the guest operating system in the virtual machine becomes slow and unresponsive. A number of operations, like logging in and logging off through an RDP session, responses from an IIS website, and launching applications, become slow or unresponsive.
This issue occurred due to a known issue with VMware Tools version 10.0.x. It is resolved in this release. For more information see KB 2144236.

Full Release Notes are available here.

Broader context ...
In the past, some customers who did not need the vShield components did not install them, to mitigate these performance and availability risks. It is possible to remove the vShield component from the installation process.

 VMware-tools-9.x.x-yyyy.exe /v /qb-! REINSTALLMODE=vomus ADDLOCAL=All REMOVE=Hgfs,WYSE,Audio,BootCamp,Unity,VShield REBOOT=ReallySuppress  

Please be aware that in newer VMware Tools versions the vShield component was split into two more specific components - FileIntrospection and NetworkIntrospection.

 VMware-tools-9.4.12-2627939-x86_64.exe /v /qb-! REINSTALLMODE=vomus ADDLOCAL=All REMOVE=Audio,BootCamp,FileIntrospection,Hgfs,NetworkIntrospection,Unity REBOOT=R  

For further information on all the names of VMware Tools components used in silent installations, see the VMware vSphere documentation here.

Keywords:
vmtools, vshield,  fileintrospection, networkintrospection

Friday, April 15, 2016

PowerCLI - Recent servers file is corrupt

This is just a short post because I have experienced the PowerCLI warning "Recent servers file is corrupt" depicted below.

 PS C:\Users\Administrator> C:\Users\Administrator\Documents\scripts\Cluster_hosts_vCPU_pCPU_report.ps1  
 WARNING: Recent servers file is corrupt: C:\Users\Administrator\AppData\Roaming\VMware\PowerCLI\RecentServerList.xml  
 UTC date time: 04/15/2016 12:32:52 Cluster: Cluster ESX name: esx01.home.uw.cz.Name pCPUs: 2 vCPUs: 19 vCPU/pCPU ratio: 9.5  
 UTC date time: 04/15/2016 12:32:52 Cluster: Cluster ESX name: esx02.home.uw.cz.Name pCPUs: 2 vCPUs: 12 vCPU/pCPU ratio: 6  

Google returns just one result here.

The solution to get rid of the warning message is fairly simple.

Just remove the corrupted file C:\Users\Administrator\AppData\Roaming\VMware\PowerCLI\RecentServerList.xml; it will be recreated during the next PowerCLI run.
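If you prefer to script the cleanup, a POSIX shell sketch is below; $APPDATA is assumed to point at the roaming profile directory (on Windows itself the same path is %APPDATA%\VMware\PowerCLI\RecentServerList.xml, removable with del).

```shell
# Remove the corrupt recent-servers file; PowerCLI recreates it on next run.
f="$APPDATA/VMware/PowerCLI/RecentServerList.xml"
if [ -f "$f" ]; then rm "$f"; fi
```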

Hope this helps some other folks.

Wednesday, April 06, 2016

ESXi host vCPU/pCPU reporting via PowerCLI to LogInsight

Some time ago I had a discussion with one of my customers about how to achieve a 1:1 vCPU/pCPU ratio on their Tier 1 cluster. Unfortunately, there is no out-of-the-box vSphere policy to achieve it. You can try to use vSphere HA cluster admission control with advanced settings, but it is based on CPU reservations in MHz, so it would be a tricky setting anyway, with some additional risks, for example after a physical server hardware replacement.

At the end of the day we agreed that the goal can be achieved by a monitoring and capacity planning process. One could probably leverage VMware vRealize Operations Manager (aka vROps) or a similar monitoring platform, but because my customer does not have vROps and I'm not a vROps expert, I realized there is a very simple alternative.

Let's leverage PowerShell/PowerCLI to report vCPU/pCPU ratio of ESXi hosts. 

As you can see in the script below, preparing the PowerCLI report is a pretty easy task; the question is how to visualize it and send alerts when the threshold is exceeded.

And that's another simple idea. Why not leverage vRealize LogInsight?

All my readers most probably know what VMware LogInsight (LI) is, but just in case - LI is a highly available and scalable syslog server appliance whose main business value is excellent reporting on unstructured data. I don't want to describe LogInsight in this blog post, but one more interesting feature of LI is that besides syslog messages it also accepts JSON messages sent via an API. For more details look here.

So the whole solution is conceptually pretty easy. Below is the high-level process.

  1. PowerCLI : Go through each ESXi host and calculate the vCPU/pCPU ratio
  2. PowerCLI : Compose a message including the vCPU/pCPU ratio together with additional context information like timestamp, cluster name, ESXi name, number of vCPUs and pCPUs
  3. PowerCLI : Send the message to LogInsight via REST API
  4. LogInsight : Prepare custom analytics and create Dashboard
  5. LogInsight: Create alert to send e-mail message or trigger web hook when threshold is exceeded    
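Step 3 is a single HTTP POST of a JSON document to the LogInsight ingest endpoint (http://<loginsight>:9000/api/v1/messages/ingest/1, as used in the script below). The body shape can be sketched in shell:

```shell
# Build the JSON body the LogInsight ingest API expects:
# a "messages" array of objects with a "text" field.
msg="vCPU/pCPU ratio: 9"
body='{"messages":[{"text":"'"$msg"'"}]}'
echo "$body"
```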
The latest script version is at GITHUB. Below is complete PowerCLI script ...

 #################################  
 # vCenter Server configuration  
 #  
 $vcenter = “vc01.home.uw.cz“  
 $vcenteruser = “readonly“  
 $vcenterpw = “readonly“  
 $loginsight = "192.168.4.51"  
 #################################  
   
 $o = Add-PSSnapin VMware.VimAutomation.Core  
 $o = Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false  
   
 #################################  
 # Connect to vCenter Server  
 $vc = connect-viserver $vcenter -User $vcenteruser -Password $vcenterpw  
   
 #################################  
 # Send Message to LogInsight  
 function Send-LogInsightMessage ([string]$ip, [string]$message)  
 {  
  $uri = "http://" + $ip + ":9000/api/v1/messages/ingest/1"  
  $content_type = "application/json"  
  $body = '{"messages":[{"text":"'+ $message +' "}]}'  
  $r = Invoke-RestMethod -Uri $uri -ContentType $content_type -Method Post -Body $body  
 }  
   
 #################################  
 # Count vCPU/pCPU Ratio  
 foreach ($esx in (Get-VMHost | Sort-Object Name)) {  
  $pCPUs = $esx.NumCpu  
  $vCPUs = ($esx | get-vm | Measure-Object -Sum NumCPU).Sum  
  $CPU_ratio = $vCPUs / $pCPUs  
  $date = (Get-Date).ToUniversalTime()  
  $cluster_name = get-cluster -VMHost $esx  
   
  $message = "UTC date time: $date Cluster: $cluster_name ESX name: $($esx.Name) pCPUs: $pCPUs vCPUs: $vCPUs vCPU/pCPU ratio: $CPU_ratio"  
  Write-Output $message  
  Send-LogInsightMessage $loginsight $message  
 }  

 disconnect-viserver -Server $vc -Force -Confirm:$false  

The PowerCLI script running in my home lab generates the messages depicted below ...

 PS C:\Users\Administrator\Documents\scripts> .\Cluster_hosts_vCPU_pCPU_report.ps1  
 UTC date time: 04/06/2016 12:49:32 Cluster: Cluster ESX name: esx01.home.uw.cz.Name pCPUs: 2 vCPUs: 18 vCPU/pCPU ratio: 9  
 UTC date time: 04/06/2016 12:49:33 Cluster: Cluster ESX name: esx02.home.uw.cz.Name pCPUs: 2 vCPUs: 12 vCPU/pCPU ratio: 6  

I use scheduled tasks to send these messages periodically to LogInsight. You can see LogInsight messages in screenshot below ...

LogInsight Interactive Analysis
It is very simple to create a dashboard from the analysis ...

LogInsight vCPU/pCPU Dashboard

And the last task is to create an alert in LogInsight when the vCPU/pCPU ratio is higher than 1, or, to be informed a little bit earlier, you can set the alert to trigger when the ratio is higher than 0.8 ...


Pretty easy, right?
Hope this helps broader VMware community.

And as always, any comments and thoughts are very welcome.

Monday, March 21, 2016

How to update ESXi via CLI

If you don't want to use VMware Update Manager (VUM) you can leverage several CLI update alternatives.

First of all, you should download a patch bundle from the VMware Product Patches page available at http://www.vmware.com/go/downloadpatches. It is important to know that patch bundles are cumulative; you need to download and install only the latest patch bundle to make ESXi fully patched.

ESXCLI
You can use esxcli command on each ESXi host.

To list image profiles that are provided by the patch bundle, use the following command:
esxcli software sources profile list -d /path/to/.zip
The output will look like this:
[root@esx01:~] esxcli software sources profile list -d /vmfs/volumes/NFS-SYNOLOGY-SATA/ISO/update-from-esxi6.0-6.0_update02.zip
Name                              Vendor        Acceptance Level
--------------------------------  ------------  ----------------
ESXi-6.0.0-20160301001s-no-tools  VMware, Inc.  PartnerSupported
ESXi-6.0.0-20160302001-standard   VMware, Inc.  PartnerSupported
ESXi-6.0.0-20160301001s-standard  VMware, Inc.  PartnerSupported
ESXi-6.0.0-20160302001-no-tools   VMware, Inc.  PartnerSupported
Now you can update the system with a specific profile:
esxcli software profile update -d /vmfs/volumes/NFS-SYNOLOGY-SATA/ISO/update-from-esxi6.0-6.0_update02.zip -p ESXi-6.0.0-20160302001-no-tools 
The output will look like this:
[root@esx01:~] esxcli software profile update -d /vmfs/volumes/NFS-SYNOLOGY-SATA/ISO/update-from-esxi6.0-6.0_update02.zip -p ESXi-6.0.0-20160302001-no-tools 
Update Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true

The last task is to reboot ESXi host as seen in the output above.
[root@esx01:~] reboot 
After reboot, you can ssh to ESXi host and verify current version.
[root@esx01:~] esxcli system version get
   Product: VMware ESXi
   Version: 6.0.0
   Build: Releasebuild-3620759
   Update: 2
   Patch: 34
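The whole flow - list the profiles, apply one, reboot, verify - can be scripted. A sketch that prints the exact command sequence used above (bundle path and profile name are the examples from this post):

```shell
# Print the end-to-end patch sequence for the bundle used in this post.
depot="/vmfs/volumes/NFS-SYNOLOGY-SATA/ISO/update-from-esxi6.0-6.0_update02.zip"
profile="ESXi-6.0.0-20160302001-no-tools"
echo "esxcli software sources profile list -d $depot"
echo "esxcli software profile update -d $depot -p $profile"
echo "reboot"
echo "esxcli system version get"
```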

Note 1: The VMware online software depot is located at https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml, so you can use this online depot instead of a local depot downloaded manually from the VMware download site. To allow outgoing HTTP (TCP ports 80, 443), you have to enable the httpClient rule in the ESXi firewall:
esxcli network firewall ruleset set -e true -r httpClient

To list profiles ...
esxcli software sources profile list -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

To update ESXi host into a particular profile ...
esxcli software profile update -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml -p ESXi-6.0.0-20160302001-no-tools 

You can disable the rule after the update:
esxcli network firewall ruleset set -e false -r httpClient

Note 2: You can run an ESXCLI vCLI command remotely against a specific host or against a vCenter Server system.


ESXCLI over PowerCLI
The same can be done via PowerCLI. The code below uses the ESXCLI Version 2 interface (-V2) introduced in PowerCLI 6.3 R1.

#get esxcli V2 object for a particular host (substitute the <> placeholders)
$esxcli2 = Get-EsxCli -VMHost <host> -V2

#list profiles in patch bundle
$arguments = $esxcli2.software.sources.profile.list.CreateArgs()
$arguments.depot = "/vmfs/volumes/<datastore>/update-from-esxi6.0-6.0_update02.zip"
$esxcli2.software.sources.profile.list.Invoke($arguments)

#update to patch bundle profile
$arguments = $esxcli2.software.profile.update.CreateArgs()
$arguments.depot = "/vmfs/volumes/<datastore>/update-from-esxi6.0-6.0_update02.zip"
$arguments.profile = "ESXi-6.0.0-20160302001-standard"
$esxcli2.software.profile.update.Invoke($arguments)

PowerCLI Install-VMHostPatch
You can also use special PowerCLI cmdlet Install-VMHostPatch

  1. Download the Update file “ESXi Offline Bundle” update-from-esxi6.0-6.0_update02.zip
  2. Extract the ZIP file and upload the resulting folder to a datastore on the Virtual Host.
  3. Put the host into maintenance mode
  4. Open PowerCLI
  5. Connect-VIServer
  6. Install-VMHostPatch -HostPath /vmfs/volumes/Datastore/update-from-esxi6.0-6.0_update02/metadata.zip
Note: For the Install-VMHostPatch method, the patch bundle must be explicitly unzipped. 

References:
  • VMware Product Patches
  • VMware : Are ESXi Patches Cumulative 
  • Andreas Peetz : Are ESXi 5.x patches cumulative?
  • Quickest Way to Patch an ESX/ESXi Using the Command-line
  • Install-VMHostPatch
  • Home Lab Upgrade to 6.0u2
    Friday, March 18, 2016

    What's new in PowerCLI 6.3 R1?

    PowerCLI 6.3 R1 introduces the following new features and improvements:

    Get-VM is now faster than ever!
    The Get-VM Cmdlet has been optimized and refactored to ensure maximum speed when returning larger numbers of virtual machine information. This was a request which we heard time and time again, when you start working in larger environments with thousands of VMs the most used cmdlet is Get-VM so making this faster means this will increase the speed of reporting and automation for all scripts using Get-VM. Stay tuned for a future post where we will be showing some figures from our test environment but believe me, it’s fast!


    New-ContentLibrary access
    New in this release we have introduced a new cmdlet for working with Content Library items, the Get-ContentLibraryItem cmdlet will list all content library items from all content libraries available to the connection. This will give you details and set you up for deploying in our next new feature…. 
    The New-VM Cmdlet has been updated to allow for the deployment of items located in a Content Library. Use the new –ContentLibrary parameter with a content library item to deploy these from local and subscribed library items, a quick sample of this can be seen below:

    $CLItem = Get-ContentLibraryItem TTYLinux
    New-VM -Name "NewCLItem" -ContentLibraryItem $CLItem -Datastore datastore1 -VMHost 10.160.74.38
    Or even simpler….
    Get-ContentLibraryItem -Name TTYLinux | New-VM -Datastore datastore1 -VMHost 10.160.74.38

    ESXCLI is now easier to use
    Another great feature which has been added has again come from our community and users who have told us what is hard about our current version, the Get-Esxcli cmdlet has now been updated with a –V2 parameter which supports specifying method arguments by name.
    The original Get-ESXCLI cmdlet (without -v2) passes arguments by position and can cause scripts to not work when working with multiple ESXi versions or using scripts written against specific ESXi versions.

    A simple example of using the previous version is as follows:
    $esxcli = Get-ESXCLI -VMHost (Get-VMhost | Select -first 1)
    $esxcli.network.diag.ping(2,$null,$null,“10.0.0.8”,$null,$null,$null,$null,$null,$null,$null,$null,$null)

    Notice all the $nulls ?  Now check out the V2 version:

    $esxcli2 = Get-ESXCLI -VMHost (Get-VMhost | Select -first 1) -V2
    $arguments = $esxcli2.network.diag.ping.CreateArgs()
    $arguments.count = 2
    $arguments.host = "10.0.0.8"
    $esxcli2.network.diag.ping.Invoke($arguments)

    Get-View, better than ever
    For the more advanced users out there who constantly use the Get-View cmdlet: you will be pleased to know that a small but handy change has been made to the cmdlet to enable auto-completion of all available view objects in the Get-View -ViewType parameter. This will ease the use of this cmdlet and enable even faster creation of scripts.

    Updated Support
    As well as the great enhancements to the product listed above, we have also updated the product to make sure it has been fully tested and works with Windows 10 and PowerShell v5; this enables the latest versions and features of PowerShell to be used with PowerCLI.
    PowerCLI has also been updated to now support vCloud Director 8.0 and vRealize Operations Manager 6.2 ensuring you can also work with the latest VMware products.

    More Information and Download
    For more information on changes made in vSphere PowerCLI 6.3 Release 1, including improvements, security enhancements, and deprecated features, see the vSphere PowerCLI Change Log. For more information on specific product features, see the VMware vSphere PowerCLI 6.3 Release 1 User’s Guide. For more information on specific cmdlets, see the VMware vSphere PowerCLI 6.3 Release 1 Cmdlet Reference.

    You can find the PowerCLI 6.3 Release 1 download HERE. Get it today!

    Wednesday, March 16, 2016

    General recommendations for stretched vSphere HA cluster aka vSphere Metro Storage Cluster (vMSC)

    This is just a brief blog post with general recommendations for VMware vSphere Metro Storage Cluster (aka vMSC). For a more holistic view, please read the white paper "VMware vSphere Metro Storage Cluster Recommended Practices".

    vSphere HA Cluster Recommended Configuration Settings:
    • Set Admission Control - Failover capacity by defining percentage of the cluster (50% for CPU and Memory)
    • Set Host Isolation Response - Power Off and Restart VMs
    • Specify multiple host isolation addresses - Advanced configuration option das.isolationaddressX
    • Disable default gateway as host isolation address - Advanced configuration option das.useDefaultIsolationAddress=false
    • Change the default settings of vSphere HA and configure it to Respect VM to Host affinity rules during failover - Advanced configuration option das.respectVmHostSoftAffinityRules=true
    • The minimum number of heartbeat datastores is two and the maximum is five. VMware recommends increasing the number of heartbeat datastores from two to four in a stretched cluster environment - Advanced configuration option das.heartbeatDsPerHost=4
    • VMware recommends using "Select any of the cluster datastores taking into account my preferences" for heartbeat datastores and choose two datastores (active distributed volumes/LUNs) on each site
    • PDL and APD considerations depend on the stretched cluster mode (uniform/non-uniform). However, VMware recommends configuring PDL/APD responses, therefore VM Component Protection (VMCP) must be enabled and the response should be set to "Power Off and Restart VMs - Conservative". Detailed configuration should be discussed with the particular storage vendor. 
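    The advanced options from the list above can be summarized as a single set of key/value pairs (the isolation addresses are site-specific placeholders):

```
das.useDefaultIsolationAddress = false
das.isolationaddress0 = <pingable address in site A>
das.isolationaddress1 = <pingable address in site B>
das.respectVmHostSoftAffinityRules = true
das.heartbeatDsPerHost = 4
```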
    vSphere DRS Recommended Configuration Settings:
    • DRS mode - Fully automated
    • Use DRS VM/Host rules to set VM per site locality
    • Use DRS "Should Rules" and avoid the use of "Must Rules"
    SIOC/SDRS

    • Based on KB 2042596 SIOC is not supported
    • Based on KB 2042596 SDRS is only supported when the IO Metric function is disabled.

    Distributed (stretched) Storage Recommendations:
    • Always consult your configuration with your storage vendor
    • VMware highly recommends using a storage witness (aka arbitrator, tie-breaker, etc.) in a third site.
    Custom automation for compliance check and / or operational procedures Recommendations:
    • VMware recommends manually defining “sites” by creating a group of hosts that belong to a site and then adding VMs to these sites based on the affinity of the datastore on which they are provisioned. 
    • VMware recommends automating the process of defining site affinity by using tools such as VMware vCenter Orchestrator or VMware vSphere PowerCLI. 
    • If automating the process is not an option, use of a generic naming convention is recommended to simplify the creation of these groups. 
    • VMware recommends that these groups be validated on a regular basis to ensure that all VMs belong to the group with the correct site affinity.

    Friday, March 04, 2016

    How to show vCenter Instance configuration?

    Login to vCenter Server Appliance (VCSA) via ssh.

    Enable BASH access: "shell.set --enabled True"
    Launch BASH: "shell"

    Run the following command to list the vCenter instance configuration.

    vc01:/etc/vmware-vpx # cat /etc/vmware-vpx/instance.cfg 
    applicationDN=dc\=virtualcenter,dc\=vmware,dc\=int
    instanceUuid=b7cc1468-6d27-4117-943f-7b1b4485028b
    ldapPort=389
    ldapInstanceName=VMwareVCMSDS
    ldapStoragePath=/etc/vmware-vpx/

    The vCenter UUID is a very important identifier which uniquely identifies a particular instance in external systems like VMware Platform Services Controller (PSC), vROps, SRM, etc.

    In our example the UUID is b7cc1468-6d27-4117-943f-7b1b4485028b

    Cisco Virtual Switch Update Manager

    Do you have Cisco Nexus 1000V in your vSphere environment? Then VSUM can be a pretty handy tool for you.

    VSUM is a free virtual appliance from Cisco that integrates into the vSphere Web Client. Once deployed, VSUM allows you to do the following actions from the web client:

    • Deploy Nexus 1000v and Application Virtual Switch (AVS)
    • Upgrade the 1000v and AVS
    • Migrate virtual networking from vSwitch/VDS
    • Monitor your 1000v/AVS environment                              

    In other words, Cisco VSUM is a virtual appliance that is registered as a plug-in to VMware vCenter Server. The Cisco VSUM user interface is an integral part of VMware vSphere Web Client. The Cisco VSUM enables you to install, migrate, monitor, and upgrade the VSMs in high availability (HA) or standalone mode and the VEMs on ESX/ESXi hosts.



    Wednesday, February 17, 2016

    How to identify from the guest OS on which vCenter is virtual machine registered?

    One of my customers asked me how to identify - from the VM guest operating system - in which vCenter Server that particular virtual machine is registered.

    They use VM deployment from VM Templates with Customization Specifications and they would like to use vCenter locality information for additional tasks during VM deployment process.

    I was thinking about several possibilities. Considered options are listed below.

    Considered options:

    • OPTION 1: Define specific customization profile for each vCenter and have a special guest OS specific command in Customization Specification to run after sysprep and save vCenter identification somewhere to the guest file system.
    • OPTION 2: Use VM mac address for identification of vCenter Server Instance.
    • OPTION 3: Leverage PowerCLI or vCLI running in the guest OS to communicate with vCenter.
    • OPTION 4: Leverage custom VM guestinfo properties which can be read inside the guest OS.

    Option 3 is not a good option at all because you would need network connectivity from production VMs to vCenter (the management network), which has a negative impact on overall security.

    Option 4 is described by William Lam here. It would need special VM templates with a custom VM property like guestinfo.vcenter=VC01, which is visible in the guest info through VMware Tools. The command in the guest would look like
    vmtoolsd --cmd "info-get guestinfo.vcenter"
    Option 1 is relatively easy; it leverages the fact that a Customization Specification for the deployment of VM templates can run a script in the guest after template deployment. I think Option 1 is a relatively good option. The only drawback is that the vSphere admin would need to manage more customization specifications and the specific scripts that store the vCenter identification somewhere in the guest OS filesystem, which introduces some additional management overhead, but that is acceptable if you ask me.

    Option 2 intrigued me technically, so let's elaborate on it. Option 2 leverages the fact that the "vCenter Server instance ID" is used for generating virtual machine MAC addresses, and a MAC address is a well-known digital identifier which can be read relatively simply in any operating system. So what is this "vCenter Server instance ID"? Each vCenter Server system has one. This ID is a number between 0 and 63 which is randomly generated at installation time but can be reconfigured after installation. The vSphere 6.0 documentation here states that, according to this scheme, a MAC address has the following format:
    00:50:56:XX:YY:ZZ
    where 00:50:56 represents the VMware OUI,
    XX is calculated as (80 + vCenter Server Instance ID),
    and YY:ZZ is a random number.
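    As a worked example of the formula, a shell sketch deriving the instance ID from a MAC address (the MAC below is made up):

```shell
# Extract the XX byte from a generated MAC and undo the +0x80 (=128) offset.
mac="00:50:56:a3:12:34"              # example VMware-generated MAC
xx=$(echo "$mac" | cut -d: -f4)      # XX byte, hex (here: a3)
id=$(( 16#$xx - 128 ))               # 0xa3 = 163; 163 - 128 = 35
echo "vCenter Server instance ID: $id"
```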
    Note 1:
    The formula above (80 + vCenter Server Instance ID) is in hexadecimal format therefore in decimal format it is 128 + vCenter Server Instance ID.
    Note 2:
    The vCenter Server unique ID is generated randomly during vCenter installation. It can be changed after installation in the Runtime Settings section of the General settings of the vCenter Server instance, followed by a restart. Please be aware that existing virtual machines' MAC addresses are not changed automatically after the ID reconfiguration, therefore it is a good idea to change the vCenter Server unique ID immediately after vCenter Server installation. There are methods to regenerate VM MAC addresses, but they require VM downtime. For more information look at VMware KB 1024025
    Below is PowerShell script example of in-guest calculation of vCenter Server Instance ID.
    $mac_str = Get-CimInstance win32_networkadapterconfiguration | where {$_.ServiceName -eq "vmxnet3ndis6"} | select macaddress | Out-String
    $mac_arr = $mac_str.split(':')
    $XX_hex = $mac_arr[3];
    $XX_dec = [Convert]::ToInt32($XX_hex, 16)
    $VC_instance_ID = $XX_dec - 128
    $VC_instance_ID
    The script above is just an example written for Windows OS (Win2012R2) and PowerShell (4.0) to show how to automate the trick described in this blog post. Similar scripts can be prepared for other guest operating systems.

    Disclaimer:
    The script above is just an example and it works in my lab environment. You should carefully test whether a script inspired by this blog post works correctly in your particular environment. I don't take any responsibility for the script, and you use it at your own risk. I spent just a few minutes writing this script, and I would definitely recommend investing some more time in development and testing if you want to use such a script in a production environment.
    Known caveats of option 2 (vCenter identification based on VM MAC address):

    • This solution will only work for MAC addresses dynamically assigned by vCenter, not for MAC addresses statically configured by an administrator
    • This solution will not work correctly for cross-vCenter vMotioned VMs because they keep the MAC address from the original vCenter
    • I didn't test how VMs recovered by VMware SRM (Site Recovery Manager) behave. If recovered VMs keep the original MAC address, this solution will not work for them. Unfortunately, I don't have access to an SRM lab to verify SRM behavior. 

    I would recommend that my customer choose between options (1) and (2).

    Hope this helps the broader IT community, and as always ... your feedback is very welcome, so don't hesitate to use comments, Twitter, or e-mail to share your opinions and other solution alternatives.