Friday, September 01, 2017

ESXi Physical NIC Capabilities for NSX VTEP

NSX VTEP encapsulation significantly benefits from physical NIC offload capabilities. In this blog post, I will show how to identify these NIC capabilities.

Check NIC type and driver

esxcli network nic get -n vmnic4
[dpasek@esx01:~] esxcli network nic get -n vmnic4
   Advertised Auto Negotiation: false
   Advertised Link Modes: 10000BaseT/Full
   Auto Negotiation: false
   Cable Type: FIBRE
   Current Message Level: 0
   Driver Info: 
         Bus Info: 0000:05:00.0
         Driver: bnx2x
         Firmware Version: bc 7.13.75
         Version: 2.713.10.v60.4
   Link Detected: true
   Link Status: Up 
   Name: vmnic4
   PHYAddress: 1
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports: FIBRE
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: false
   Transceiver: internal
   Virtual Address: 00:50:56:59:d8:8c
   Wakeon: None

esxcli software vib list | grep bnx2x
[dpasek@esx01:~] esxcli software vib list | grep bnx2x
net-bnx2x                      2.713.10.v60.4-1OEM.600.0.0.2494585   QLogic     VMwareCertified   2017-05-10  

Driver parameters can be listed by the following command …
esxcli system module parameters list -m bnx2x
[dpasek@esx01:~] esxcli system module parameters list -m bnx2x
Name                                  Type          Value  Description                                                                                                                                                                                                                                                                                    
------------------------------------  ------------  -----  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RSS                                   int                  Controls the number of queues in an RSS pool. Supported Values 2-4.                                                                                                                                                                                                                            
autogreeen                            uint                  Set autoGrEEEn (0:HW default; 1:force on; 2:force off)                                                                                                                                                                                                                                        
bnx2x_vf_passthru_wait_event_timeout  uint                 For debug purposes, set the value timeout value on VF OP to complete in ms                                                                                                                                                                                                                     
debug                                 uint                  Default debug msglevel                                                                                                                                                                                                                                                                        
debug_unhide_nics                     int                  Force the exposure of the vmnic interface for debugging purposes[Default is to hide the nics]1.  In SRIOV mode expose the PF                                                                                                                                                                   
disable_feat_preemptible              int                  For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1                                                                                                                                                                                                                            
disable_fw_dmp                        int                  For debug purposes, disable firmware dump  feature when set to value of 1                                                                                                                                                                                                                      
disable_iscsi_ooo                     uint                  Disable iSCSI OOO support                                                                                                                                                                                                                                                                     
disable_rss_dyn                       int                  For debug purposes, disable RSS_DYN feature when set to value of 1                                                                                                                                                                                                                             
disable_tpa                           uint                  Disable the TPA (LRO) feature                                                                                                                                                                                                                                                                 
disable_vxlan_filter                  int                  Enable/disable vxlan filtering feature. Default:1, Enable:0, Disable:1                                                                                                                                                                                                                         
dropless_fc                           uint                  Pause on exhausted host ring                                                                                                                                                                                                                                                                  
eee                                                        set EEE Tx LPI timer with this value; 0: HW default; -1: Force disable EEE.                                                                                                                                                                                                                    
enable_default_queue_filters          int                  Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]                                                                                                                                                                                   
enable_geneve_ofld                    int                  Enable/Disable GENEVE offloads. 1: [Default] Enable GENEVE Offloads. 0: Disable GENEVE Offloads.                                                                                                                                                                                               
enable_live_grcdump                   int                  Enable live GRC dump 0x0: Disable live GRC dump, 0x1: Enable Parity/Live GRC dump [Enabled by default], 0x2: Enable Tx timeout GRC dump, 0x4: Enable Stats timeout GRC dump                                                                                                                    
enable_vxlan_ofld                     int                  Allow vxlan TSO/CSO offload support.[Default is enabled, 1: enable vxlan offload, 0: disable vxlan offload]                                                                                                                                                                                    
heap_initial                          int                  Initial heap size allocated for the driver.                                                                                                                                                                                                                                                    
heap_max                              int                  Maximum attainable heap size for the driver.                                                                                                                                                                                                                                                   
int_mode                              uint                  Force interrupt mode other than MSI-X (1 INT#x; 2 MSI)                                                                                                                                                                                                                                        
max_agg_size_param                    uint                 max aggregation size                                                                                                                                                                                                                                                                           
max_vfs                               array of int         Number of Virtual Functions: 0 = disable (default), 1-64 = enable this many VFs                                                                                                                                                                                                                
mrrs                                  int                   Force Max Read Req Size (0..3) (for debug)                                                                                                                                                                                                                                                    
multi_rx_filters                      int                  Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: Disable use of multiple RX filters; 1..Max # the number of RX filters per NetQueue: will force the number of RX filters to use for NetQueue
native_eee                            int                                                                                                                                                                                                                                                                                                                 
num_queues                            int                   Set number of queues (default is as a number of CPUs)                                                                                                                                                                                                                                         
num_queues_on_default_queue           int                  Controls the number of RSS queues ( 1 or more) enabled on the default queue. Supported Values 1-7, Default=4                                                                                                                                                                                   
num_rss_pools                         int                  Control the existance of an RSS pool. When 0,RSS pool is disabled. When 1, there will be an RSS pool (given that RSS>0).                                                                                                                                                                       
poll                                  uint                  Use polling (for debug)                                                                                                                                                                                                                                                                       
pri_map                               uint                  Priority to HW queue mapping                                                                                                                                                                                                                                                                  
psod_on_panic                         int                   PSOD on panic                                                                                                                                                                                                                                                                                 
rss_on_default_queue                  int                  RSS feature on default queue on eachphysical function that is an L2 function. Enable=1, Disable=0. Default=0                                                                                                                                                                                   
skb_mpool_initial                     int                  Driver's minimum private socket buffer memory pool size.                                                                                                                                                                                                                                       
skb_mpool_max                         int                  Maximum attainable private socket buffer memory pool size for the driver.                                                                                                                                                                                                                      

To get current driver parameter values …
esxcfg-module --get-options bnx2x 
[dpasek@esx01:~] esxcfg-module -g bnx2x
bnx2x enabled = 1 options = ''


HCL device identifiers

vmkchdev -l | grep vmnic
[dpasek@esx01:~] vmkchdev -l | grep vmnic
0000:02:00.0 14e4:1657 103c:22be vmkernel vmnic0
0000:02:00.1 14e4:1657 103c:22be vmkernel vmnic1
0000:02:00.2 14e4:1657 103c:22be vmkernel vmnic2
0000:02:00.3 14e4:1657 103c:22be vmkernel vmnic3
0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4
0000:05:00.1 14e4:168e 103c:339d vmkernel vmnic5
0000:88:00.0 14e4:168e 103c:339d vmkernel vmnic6
0000:88:00.1 14e4:168e 103c:339d vmkernel vmnic7
So, in the case of vmnic4, the HCL identifiers are
·       VID:DID SVID:SSID
·       14e4:168e 103c:339d 
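To pull the HCL identifiers for several NICs at once, the vmkchdev output can be parsed; a minimal sketch with sample lines from above embedded (on a live host, pipe `vmkchdev -l | grep vmnic` into the awk instead):

```shell
# Print HCL identifiers (VID:DID SVID:SSID) per vmnic from
# `vmkchdev -l` output. Sample lines embedded so this runs anywhere.
sample='0000:05:00.0 14e4:168e 103c:339d vmkernel vmnic4
0000:05:00.1 14e4:168e 103c:339d vmkernel vmnic5'
printf '%s\n' "$sample" | awk '{print $5": VID:DID="$2" SVID:SSID="$3}'
```

The resulting VID:DID SVID:SSID pairs can be searched directly on the VMware HCL.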

Check TSO configuration

esxcli network nic tso get
[dpasek@esx01:~] esxcli network nic tso get
NIC     Value
------  -----
vmnic0  on   
vmnic1  on   
vmnic2  on   
vmnic3  on   
vmnic4  on   
vmnic5  on   
vmnic6  on   
vmnic7  on   

esxcli system settings advanced list -o /Net/UseHwTSO
[dpasek@esx01:~] esxcli system settings advanced list -o /Net/UseHwTSO
   Path: /Net/UseHwTSO
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value: 
   Default String Value: 
   Valid Characters: 
   Description: When non-zero, use pNIC HW TSO offload if available

If you want to disable TSO, use the following commands …
esxcli network nic software set --ipv4tso=0 -n vmnicX
esxcli network nic software set --ipv6tso=0 -n vmnicX
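To verify the change, you can filter the `esxcli network nic tso get` output for NICs where TSO is off; a minimal sketch using a hypothetical sample (the "off" value for vmnic4 is made up for illustration — all NICs show "on" in the output above):

```shell
# List NICs with TSO disabled from `esxcli network nic tso get` output.
# Hypothetical sample; on a live host, replace the printf with the
# real esxcli command.
sample='NIC     Value
------  -----
vmnic0  on
vmnic4  off'
printf '%s\n' "$sample" | awk 'NR > 2 && $2 == "off" {print $1}'
```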

Guest OS TSO settings in Linux can be changed by the command …
ethtool -K ethX tso [on|off]

Check LRO configuration

esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
[dpasek@esx01:~] esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
   Path: /Net/TcpipDefLROEnabled
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value: 
   Default String Value: 
   Valid Characters: 
   Description: LRO enabled for TCP/IP

VMXNET3 hardware LRO settings can be validated by the command …
esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
[dpasek@esx01:~] esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
   Path: /Net/Vmxnet3HwLRO
   Type: integer
   Int Value: 1
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value: 
   Default String Value: 
   Valid Characters: 
   Description: Whether to enable HW LRO on pkts going to a LPD capable vmxnet3

Guest OS LRO settings in Linux can be changed by the command …
ethtool -K ethX lro [on|off]

Check Checksum offload configuration

ESXi settings
esxcli network nic cso get
[dpasek@esx01:~] esxcli network nic cso get
NIC     RX Checksum Offload  TX Checksum Offload
------  -------------------  -------------------
vmnic0  on                   on                 
vmnic1  on                   on                 
vmnic2  on                   on                 
vmnic3  on                   on                 
vmnic4  on                   on                 
vmnic5  on                   on                 
vmnic6  on                   on                 
vmnic7  on                   on                 

The following command can be used for disabling CSO for a specific pNIC:
esxcli network nic cso set -n vmnicX



Check VXLAN offloading

vsish -e get /net/pNics/vmnic4/properties
[dpasek@esx01:~] vsish -e get /net/pNics/vmnic4/properties
properties {
   Driver Name:bnx2x
   Driver Version:2.713.10.v60.4
   Driver Firmware Version:bc 7.13.75
   System Device Name:vmnic4
   Module Interface Used By The Driver:vmklinux
   Device Hardware Cap Supported:: 0x493c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_ENCAP VMNET_CAP_GENEVE_OFFLOAD VMNET_CAP_SCHED
   Device Hardware Cap Activated:: 0x403c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_SCHED
   Device Software Cap Activated:: 0x30800000 -> VMNET_CAP_RDONLY_INETHDRS VMNET_CAP_IP6_CSUM_EXT_HDRS VMNET_CAP_TSO6_EXT_HDRS
   Device Software Assistance Activated:: 0 -> No matching defined enum value found.
   PCI Segment:0
   PCI Bus:5
   PCI Slot:0
   PCI Fn:0
   Device NUMA Node:0
   PCI Vendor:0x14e4
   PCI Device ID:0x168e
   Link Up:1
   Operational Status:1
   Administrative Status:1
   Full Duplex:1
   Auto Negotiation:0
   Speed (Mb/s):10000
   Uplink Port ID:0x0400000a
   Flags:: 0x41e0e -> DEVICE_PRESENT DEVICE_OPENED DEVICE_EVENT_NOTIFIED DEVICE_SCHED_CONNECTED DEVICE_USE_RESPOOLS_CFG DEVICE_RESPOOLS_SCHED_ALLOWED DEVICE_RESPOOLS_SCHED_SUPPORTED DEIVCE_ASSOCIATED
   Network Hint:
   MAC address:9c:dc:71:db:d0:38
   VLanHwTxAccel:1
   VLanHwRxAccel:1
   States:: 0xff -> DEVICE_PRESENT DEVICE_READY DEVICE_RUNNING DEVICE_QUEUE_OK DEVICE_LINK_OK DEVICE_PROMISC DEVICE_BROADCAST DEVICE_MULTICAST
   Pseudo Device:0
   Legacy vmklinux device:1
   Respools sched allowed:1
   Respools sched supported:1
}

VXLAN offload capability is called 'VMNET_CAP_ENCAP'. That's what you need to look for.
vsish -e get /net/pNics/vmnic4/properties | grep VMNET_CAP_ENCAP
[dpasek@esx01:~] vsish -e get /net/pNics/vmnic4/properties | grep VMNET_CAP_ENCAP
   Device Hardware Cap Supported:: 0x493c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_ENCAP VMNET_CAP_GENEVE_OFFLOAD VMNET_CAP_SCHED

Check VMDq (NetQueue)

esxcli network nic queue filterclass list
This esxcli command shows information about the filters supported per vmnic and used by NetQueue.
[dpasek@esx01:~] esxcli network nic queue filterclass list
NIC     MacOnly  VlanOnly  VlanMac  Vxlan  Geneve  GenericEncap
------  -------  --------  -------  -----  ------  ------------
vmnic0    false     false    false  false   false         false
vmnic1    false     false    false  false   false         false
vmnic2    false     false    false  false   false         false
vmnic3    false     false    false  false   false         false
vmnic4     true     false    false  false   false         false
vmnic5     true     false    false  false   false         false
vmnic6     true     false    false  false   false         false
vmnic7     true     false    false  false   false         false

Dynamic NetQ

The following command will output the queues for all vmnics in your ESXi host.
esxcli network nic queue count get
[dpasek@esx01:~] esxcli network nic queue count get 
NIC     Tx netqueue count  Rx netqueue count
------  -----------------  -----------------
vmnic0                  1                  1
vmnic1                  1                  1
vmnic2                  1                  1
vmnic3                  1                  1
vmnic4                  8                  5
vmnic5                  8                  5
vmnic6                  8                  5
vmnic7                  8                  5

It is possible to disable NetQueue at the ESXi host level using the following command:
esxcli system settings kernel set --setting="netNetqueueEnabled" --value="false"

VXLAN PERFORMANCE

RSS can help when VXLAN is used because VXLAN traffic can be distributed among multiple hardware queues. NICs that offer RSS can achieve a VXLAN throughput of around 9 Gbps, while NICs that do not reach only around 6 Gbps. Therefore, the right choice of physical NIC is critical.
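If RSS is not enabled by default, it can typically be configured through the driver module parameters listed earlier in this post; a hypothetical sketch for bnx2x (parameter names are taken from the listing above, the values are illustrative — check your NIC vendor's guidance, and note that module parameter changes require a host reboot to take effect):

```shell
# Hypothetical example: enable one RSS pool with 4 queues on the bnx2x
# driver, then verify the configured options. Values are illustrative.
esxcli system module parameters set -m bnx2x -p "RSS=4 num_rss_pools=1"
esxcli system module parameters list -m bnx2x | grep -E "RSS|num_rss_pools"
```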

VMworld 2017 interesting sessions available online

This week, VMworld 2017 took place in Las Vegas, US. For those who were not able to attend, several sessions were recorded and published on YouTube.

Here is the list of sessions covering topics I'm interested in ...

COMPUTE

vSphere 6.5 Host Resources Deep Dive: Part 2 (SER1872BU)
available here 

STORAGE

VMworld 2017 STO1264BU - The Top 10 Things to Know About vSAN
https://www.youtube.com/watch?v=UpUDGVS3KlU

VMworld 2017 SER1143BU A Deep Dive into vSphere 6.5 Core Storage Features and Functionality
https://www.youtube.com/watch?v=6o0YDSrm9B0

VMworld 2017 SER2355BU Best Practices for All-Flash Arrays with VMware vSphere
https://www.youtube.com/watch?v=9uxHSGcQ9o8

VMworld 2017 PBO3367BUS - VMware vSphere Virtual Volumes made easy with Pure Storage
https://www.youtube.com/watch?v=Mx1tDoQlSzA

VMworld 2017 ADV3368BUS - Find performance bottlenecks. Understanding ESXi Storage Queueing
https://www.youtube.com/watch?v=RlDo4VtDeow

VMworld 2017 STO2446BE - Virtual Volumes Technical Deep Dive
https://www.youtube.com/watch?v=39VKlETQsXU&feature=youtu.be

VMworld 2017 STO3305BES - Replicating VMware VVols: A technical deep dive into VVol array based
https://www.youtube.com/watch?v=iGMCmyP-5I8&feature=youtu.be

VMworld 2017 STO2115BE - vSphere Storage Best Practices
https://www.youtube.com/watch?v=4TCuttQbSFE&feature=youtu.be

NETWORK

VMworld 2017 NET1345BU VMware NSX in Small Data Centers for Small and Medium Businesses
https://www.youtube.com/watch?v=a6SssLRIRNo

vSPHERE MANAGEMENT

VMworld 2017 SER2958BU Migrate to the VMware vCenter Server Appliance You Should
https://www.youtube.com/watch?v=q_75UkOsUYk

vCenter Performance Deep Dive (SER1504BU)
available here 

VMworld 2017 SER1411BU VMware vSphere Clients Roadmap: HTML5 Client, Host Client, and Web Client
https://www.youtube.com/watch?v=m8H0kkM5svs

AUTOMATION

VMworld 2017 SER2480BU - vSphere PowerCLI 101: Becoming Your Organization's Superhero (SER2480BU)
https://www.youtube.com/watch?v=klfVZdJ7CYM

VMworld 2017 SER2077BU Achieve Maximum vSphere Stability with PowerCLI Assisted Documentation
https://www.youtube.com/watch?v=-KK0ih8tuTo

KUBERNETES

VMworld 2017 - CNA2080BE - Basics of Kubernetes on BOSH: Run Production-grade Kubernetes on the SDDC
https://www.youtube.com/watch?v=x3M-C84L2as&feature=youtu.be

PERFORMANCE

VMworld 2017 SER2724BU Extreme Performance Series: Performance Best Practices
https://www.youtube.com/watch?v=EYggYAwjz3g

VMworld 2017 VIRT1430BU Performance Tuning and Monitoring for Virtualized Database Servers
https://www.youtube.com/watch?v=5EJu2ER-aLI

APPLICATIONS

VMworld 2017 VIRT1374BU Virtualize Active Directory, the Right Way!
https://www.youtube.com/watch?v=pf0o5yyMmGU

VMworld 2017 VIRT1309BU Monster VMs (Database Virtualization) with VMware vSphere 6.5
https://www.youtube.com/watch?v=sXbOoRo_Wn4

VMworld 2017 SER2933BU - Defend Your vSphere Infrastructure from Evil with vCenter High Availability
https://www.youtube.com/watch?v=qPWWIQwrLRU


Note: all sessions above should be available here or on William Lam's github page here.


Tuesday, August 15, 2017

NSX Basic Concepts, Tips and Tricks

NSX and Network Teaming

There are multiple options for achieving network teaming from ESXi to the physical network. For more information, see my other blog post "Back to the basics - VMware vSphere networking".

In a nutshell, there are generally three supported methods to connect NSX VTEP(s) to the physical network
  1. Explicit failover - only a single physical NIC is active at any given time, therefore no load balancing at all
  2. LACP - a single aggregated virtual interface where load balancing is done based on a hashing algorithm
  3. Switch independent teaming achieved by multiple VTEPs where each VTEP is bound to a different ESXi pNIC.
Let's assume we have switch independent teaming with multiple independent uplinks to the physical network. Now the question is how to check the VM vNIC to ESXi host pNIC mapping. I'm aware of at least four methods to check this mapping
  1. ESXTOP
  2. ESXCLI
  3. NSX Controller
  4. NSX Manager
1/ ESXTOP method
  • ssh to ESXi
  • run esxtop
  • Press key [n] to switch to network view
  • Check the TEAM-PNIC column – it should be a different vmnic (ESXi pNIC) for each VM
2/ ESXCLI method
  • ssh to ESXi
  • Use the command “esxcli network vm list” and locate the World ID of the VM
  • Use “esxcli network vm port list -w <WorldID>” and check the “Team Uplink” value. It should be a different vmnic (ESXi pNIC) for each VM
3/ NSX Controller method
  • Identify the MAC address of the VM
  • Log in to the NSX Controller nodes (ssh or console) one by one
  • Use the command “show control-cluster logical-switches mac-table <VNI>” to show MAC address to VTEP mappings. I assume a multi-VTEP configuration where each VTEP is statically bound to a particular ESXi pNIC (vmnic)
4/ NSX Manager method
  • Identify the MAC address of the VM
  • Log in to NSX Manager (ssh or console)
  • Go through all controllers and show the MAC address table, which also shows behind which VTEP a particular MAC address sits
  • i) show controller list all
  • ii) show logical-switch controller controller-1 vni 10001 mac
  • iii) show logical-switch controller controller-2 vni 10001 mac
  • iv) show logical-switch controller controller-3 vni 10001 mac
The appropriate method is typically chosen based on the administrator role and Role Based Access Control. A vSphere administrator will probably use esxtop or esxcli, while a network administrator will use NSX Manager or Controller.
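The ESXCLI method (2) can be scripted by parsing the port list output for the Team Uplink field; a minimal sketch using hypothetical sample output (the field names match the real `esxcli network vm port list -w <WorldID>` output, the values are made up):

```shell
# Extract the "Team Uplink" (pNIC) value from `esxcli network vm port
# list` output. Hypothetical sample; on a live host, pipe the real
# esxcli command output into the awk instead.
sample='   Port ID: 50331663
   vSwitch: vDS01
   Portgroup: vxw-dvs-21-virtualwire-1-sid-10001
   MAC Address: 00:50:56:a1:b2:c3
   Team Uplink: vmnic4'
printf '%s\n' "$sample" | awk -F': ' '/Team Uplink/ {print $2}'
```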

Distributed Logical Router (DLR)

DLR is a virtual router distributed across multiple ESXi hosts. You can imagine it as a chassis with multiple line cards. The chassis is virtual (software based) and the line cards are software modules spread across multiple ESXi hosts (physical x86 servers).

The basic concept of DLR is that every routing decision is made locally. In other words, NSX DLR always performs routing on the DLR instance running in the kernel of the ESXi host where the workload that initiates the communication runs. When VM traffic needs to be routed to another logical switch, it first comes to the DLR on the same ESXi host where the VM is running. Each DLR "line card" module (ESXi host) has all logical switches (VXLANs) connected locally, so the DLR forwards the packet to the appropriate destination logical switch; if the target VM runs on another ESXi host, the packet is encapsulated on the local ESXi host and decapsulated on the target ESXi host.

It is good to know that the DLR always uses the same MAC address as the default gateway address for all logical switches. This MAC address is called the VMAC. It is the MAC address used on DLR logical L3 interfaces (LIFs) connected to logical switches (VXLANs).

However, there must be some coordination between the multiple DLR "line card" modules (ESXi hosts), therefore each DLR module must also have a physical MAC address. This MAC address is called the PMAC.

To show the DLR PMAC and VMAC, run the following command on an ESXi host
net-vdr -l -C

Distributed Logical Firewall (DFW) - firewall rules

NSX Distributed Firewall applies firewall rules directly to VM vNICs. The vNIC has a concept of slots where different services are bound and chained together. NSX DFW sits in slot 2 and, for example, a third-party firewall sits in slot 4.

DFW firewall rules are automatically applied to each vNIC, so the question is how to double-check which rules are applied at the vNIC level.

There are two methods to check it
  1. ESXi commands
  2. NSX Manager commands
1/ ESXi method
  • ssh to ESXi
  • Use the command “summarize-dvfilter” and locate the VM of interest and the name of its vNIC filter in slot 2 used by the vmware-sfw agent
  • grep can help here ... "summarize-dvfilter | grep -A 10 <VM name>"
  • The vNIC filter name should look similar to nic-24565940-eth0-vmware-sfw.2
  • Now you can list the firewall rules with the command "vsipioctl getfwrules -f nic-24565940-eth0-vmware-sfw.2"
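The ESXi steps above can be partially automated by extracting the vmware-sfw filter name from the summarize-dvfilter output; a minimal sketch using a trimmed, hypothetical sample (the name/agentName layout follows real output, the VM and IDs are made up):

```shell
# Pull the vmware-sfw filter name for a vNIC out of summarize-dvfilter
# output; the extracted name can then be fed to
# "vsipioctl getfwrules -f <name>". Trimmed hypothetical sample:
sample='world 24565940 vmm0:myvm01
 port 33554443 myvm01.eth0
  vNic slot 2
   name: nic-24565940-eth0-vmware-sfw.2
   agentName: vmware-sfw'
printf '%s\n' "$sample" | awk -F'name: ' '/ name: .*vmware-sfw/ {print $2}'
```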

2/ NSX Manager method (https://kb.vmware.com/kb/2125482)
  • Log in to the NSX Manager with the admin credentials
  • To display a summary of DVFilter information, run the command "show dfw host host-id summarize-dvfilter"
  • To display detailed information about a vnic, run the command "show dfw host host-id vnic"
  • To display the rules configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name rules"
  • To display the addrsets configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name addrsets"
And again, the appropriate method is typically chosen based on the administrator role and Role Based Access Control. 

Distributed Logical Firewall (DFW) - third party integration and availability considerations

NSX Distributed Firewall supports integration with third-party solutions. This integration is also called service chaining. The third-party solution is hooked to a particular vNIC slot and, usually, some selected traffic (or potentially all traffic, which is not recommended) is redirected to a third-party agent running on each ESXi host as a special virtual machine. The third-party solution can inspect the traffic and allow or deny it. However, what happens when the agent VM is not available? It is easy to test: you can power off the agent VM and see what happens. The behavior depends on the service failOpen/failClosed policy. You can check the policy setting as depicted in the screenshot below ...

Service failOpen/failClosed policy
If failOpen is set to false, virtual machine traffic will be dropped when the agent is unavailable. This has a negative impact on availability but a positive impact on security. If failOpen is set to true, VM traffic will be allowed and everything works even when the agent is not available. In such a situation, the security policy cannot be enforced and there is a potential security risk. So this is a typical design decision point, where the decision depends on customer-specific requirements.

Now the question is how the failOpen setting can be changed. Well, my understanding is that it depends on the third-party solution. Here is a link to a TrendMicro how-to - "Set vNetwork behavior when appliances shut down"

Monday, August 14, 2017

Remote text based console to ESXi over IPMI SOL

I have just bought another server for my home lab. I already have 6 Intel NUCs, but a lot of RAM is needed for a full VMware SDDC with all products like Log Insight, vROps, vRNI, vRA, vRO, ... but that's another story.

Anyway, I have decided to buy a used Dell rack server (PowerEdge R810) with 256 GB RAM, mainly because of the amount of RAM but also because all Dell servers from the 9th generation onward support IPMI, which is very useful. The server can be remotely managed (power on, power off, etc.) over IPMI, and it also supports SOL, which stands for Serial-over-LAN, for server consoles. IPMI SOL is an inexpensive alternative to the iDRAC Enterprise virtual console.

You can read more about IPMI on links below

So, if you follow the instructions on the links above, you will be able to use IPMI SOL to see and manage the server during the boot process and, for example, change BIOS settings. I have tested it and it works like a charm. You see the boot progress, you can go into the BIOS and change anything you want. Console redirection works and the keyboard can be used to control the server during POST. However, after the server POST phase and the boot loading of ESXi, the ESXi console was unfortunately not redirected to SOL. I think this is because the ESXi DCUI is not a pure text-based console. Instead, it is a graphics mode simulating text mode, and a graphics-mode console cannot, for obvious reasons, be transferred over IPMI SOL.

So there is another possibility. The ESXi Direct Console (aka DCUI) can be redirected to a serial port. The setup procedure is nicely described in the documentation here. It is done via the ESXi host advanced setting "VMkernel.Boot.tty2Port" set to the value "com2". It is worth mentioning that server console redirection and ESXi DCUI redirection cannot be configured at the same time, for obvious reasons. So I unconfigured server console redirection and configured ESXi DCUI redirection. It worked great, but the keyboard was not working. It is pretty useless to see the ESXi DCUI without the possibility to use it, right? To be honest, I do not know why my keyboard did not work over IPMI SOL.

So what is the conclusion? Unfortunately,  I have hit another AHA effect ...
"Aha, IPMI SOL will not help me too much with remote access to ESXi DCUI console."
And as always, any feedback or tips and tricks are more than welcome as comments to this blog post.

Update: I have just found and bought a very cheap iDRAC Enterprise Remote Access Card on eBay, which supports remote virtual console and media. So, it is a hardware workaround to my software problem :-)

iDRAC6 on eBay





Sunday, June 25, 2017

Start order of software services in VMware vCenter Server Appliance 6.0 U2

vCenter Server Appliance 6.0 U2 services are started in the following order ...

  1. vmafdd (VMware Authentication Framework)
  2. vmware-rhttpproxy (VMware HTTP Reverse Proxy)
  3. vmdird (VMware Directory Service)
  4. vmcad (VMware Certificate Service)
  5. vmware-sts-idmd (VMware Identity Management Service)
  6. vmware-stsd (VMware Security Token Service)
  7. vmware-cm (VMware Component Manager)
  8. vmware-cis-license (VMware License Service)
  9. vmware-psc-client (VMware Platform Services Controller Client)
  10. vmware-sca (VMware Service Control Agent)
  11. applmgmt (VMware Appliance Management Service)
  12. vmware-netdumper (VMware vSphere ESXi Dump Collector)
  13. vmware-syslog (VMware Common Logging Service)
  14. vmware-syslog-health (VMware Syslog Health Service)
  15. vmware-vapi-endpoint (VMware vAPI Endpoint)
  16. vmware-vpostgres (VMware Postgres)
  17. vmware-invsvc (VMware Inventory Service)
  18. vmware-mbcs (VMware Message Bus Configuration Service)
  19. vmware-vpxd (VMware vCenter Server)
  20. vmware-eam (VMware ESX Agent Manager)
  21. vmware-rbd-watchdog (VMware vSphere Auto Deploy Waiter)
  22. vmware-sps (VMware vSphere Profile-Driven Storage Service)
  23. vmware-vdcs (VMware Content Library Service)
  24. vmware-vpx-workflow (VMware vCenter Workflow Manager)
  25. vmware-vsan-health (VMware VSAN Health Service)
  26. vmware-vsm (VMware vService Manager)
  27. vsphere-client ()
  28. vmware-perfcharts (VMware Performance Charts)
  29. vmware-vws (VMware System and Hardware Health Manager) 


Thursday, June 22, 2017

CLI for VMware Virtual Distributed Switch

A few weeks ago I was asked by one of my customers whether VMware Virtual Distributed Switch (aka VDS) supports a Cisco-like command line interface. The key idea was to integrate the vSphere switch with the open-source tool Network Tracking Database (NetDB), which they use for tracking MAC addresses within their network. I was told by the customer that NetDB can telnet/ssh to Cisco switches and do screen scraping, so wouldn't it be cool to have the most popular switch CLI commands for VDS? These commands are

  • show mac-address-table
  • show interface status
The official answer is NO, but wait a minute. Almost anything is possible with the VMware API. So my solution leverages the VMware vSphere Perl SDK to pull information out of Distributed Virtual Switches. I have prepared a Perl script, vdscli.pl, which currently supports the two commands mentioned above. It goes through all VMware Distributed Switches on a single vCenter.

The script along with shell wrappers is available on GitHub here: https://github.com/davidpasek/vdscli
See the screenshots below to get an idea of what the script does.
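The real script talks to the vSphere API via the Perl SDK, but the Cisco-like rendering itself is simple. Here is a Python sketch of just the output-formatting part (the record structure is hypothetical; the actual script pulls these fields from vSphere API objects of each distributed virtual port):

```python
def show_mac_address_table(ports):
    """Render DVS port records in a Cisco-like 'show mac-address-table' layout.

    'ports' is a list of dicts with keys 'vlan', 'mac' and 'port'
    (a made-up structure for illustration only).
    """
    lines = ["Vlan    Mac Address       Ports",
             "----    -----------       -----"]
    for p in ports:
        lines.append(f"{p['vlan']:<8}{p['mac']:<18}{p['port']}")
    return "\n".join(lines)

print(show_mac_address_table([
    {"vlan": 100, "mac": "00:50:56:9a:00:01", "port": "dvPort 12"},
    {"vlan": 200, "mac": "00:50:56:9a:00:02", "port": "dvPort 13"},
]))
```

A fixed-width layout like this is exactly what screen-scraping tools such as NetDB expect from a switch CLI.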

The output of the command
vdscli.pl --server=vc01.home.uw.cz --username readonly --password readonly --cmd show-port-status
looks as depicted in the screenshot below.


and output of the command
vdscli.pl --server=vc01.home.uw.cz --username readonly --password readonly --cmd show-mac-address-table

So now we have a Perl script to get information from a VMware Distributed Virtual Switch, which is nice. However, we would like an interactive CLI with the same user experience as a physical switch CLI, right? For the interactive CLI I have decided to use Python ishell (https://github.com/italorossi/ishell) to emulate a Cisco-like CLI. To start the interactive VDSCLI shell you must have Python with iShell installed, and then you can simply run the script

./vdscli-ishell.py

which is just a wrapper around vdscli.pl. A screenshot of the VDSCLI shell is in the figure below.

VDSCLI Interactive Shell
And the last step is to allow SSH or Telnet access to the VDSCLI shell. It can be done very easily via the standard Linux ability to change the login shell for a particular user. VDSCLI over SSH is depicted on the screenshot below.

VDSCLI Interactive Shell over SSH
To operationalize all these scripts, I would highly encourage you to read another of my blog posts ...
"CLI for VMware Virtual Distributed Switch - implementation procedure".

Hope somebody else in the VMware community will find it useful.

Tuesday, June 13, 2017

Storage DRS integration with storage profiles

This is a very quick blog post. In vSphere 6.0, VMware introduced Storage DRS integration with storage profiles (aka SPBM - Storage Policy Based Management).

Here is the link to official documentation.

Generally, it is about the SDRS advanced option EnforceStorageProfiles, which takes one of the integer values 0, 1 or 2, where the default value is 0.

  • When the option is set to 0, there is NO storage profile or policy enforcement on the SDRS cluster.
  • When the option is set to 1, there is SOFT storage profile or policy enforcement on the SDRS cluster. It is analogous to DRS soft rules. SDRS complies with the storage profile/policy as far as possible; however, if required, SDRS will violate storage profile compliance.
  • When the option is set to 2, there is HARD storage profile or policy enforcement on the SDRS cluster. It is analogous to DRS hard rules. SDRS will not violate storage profile or policy compliance in any case.
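The three modes can be sketched as a simple placement filter (a hypothetical Python model of the semantics, not VMware code):

```python
def sdrs_placement(enforce: int, compliant_ds: list, all_ds: list) -> list:
    """Candidate datastores for initial placement under EnforceStorageProfiles.

    0: no enforcement - any datastore in the cluster is a candidate.
    1: soft - prefer profile-compliant datastores, fall back to the rest.
    2: hard - only compliant datastores; an empty result means placement fails.
    """
    if enforce == 0:
        return all_ds
    if enforce == 1:
        return compliant_ds if compliant_ds else all_ds
    if enforce == 2:
        return compliant_ds
    raise ValueError("EnforceStorageProfiles must be 0, 1 or 2")

# Soft enforcement with no compliant datastore still places the VM:
print(sdrs_placement(1, [], ["ds1", "ds2"]))  # ['ds1', 'ds2']
# Hard enforcement with no compliant datastore fails placement:
print(sdrs_placement(2, [], ["ds1", "ds2"]))  # []
```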

Please note that at the time of writing this post, SDRS storage profile enforcement works only during initial placement and NOT for already provisioned VMs during load balancing. Therefore, when the VM Storage Policy is changed for a particular VM, SDRS will not make it compliant automatically, nor raise any recommendation.

Another limitation is that vCloud Director (vCD) backed by an SDRS cluster does NOT support Soft (1) or Hard (2) storage profile enforcement. vCloud Director works well with the Default (0) option.

Relevant references to other resources:

Wednesday, June 07, 2017

VMware Photon OS with PowerCLI

Photon OS is a Linux distribution maintained by VMware with multiple benefits for the virtualized form factor, therefore any virtual appliance should ideally be based on Photon OS.

I have recently played with Photon OS and here are some of my notes.

IP Settings

Network configuration files are in the directory
/etc/systemd/network/
IP settings are leased from DHCP by default. This is configured in the file /etc/systemd/network/10-dhcp-en.network

The file contains the following config:
[Match]
Name=e*
[Network]
DHCP=yes
To use static IP settings, it is good to move the DHCP config file down in alphabetical order and create a config file with static IP settings:
mv 10-dhcp-en.network 99-dhcp-en.network
cp 99-dhcp-en.network 10-static-en.network
The file /etc/systemd/network/10-static-en.network should look similar to

[Match]
Name=eth0
[Network] 
Address=192.168.4.56/24 
Gateway=192.168.4.254 
DNS=192.168.4.4

The network can be restarted by the command
systemctl restart systemd-networkd
and network settings can be checked by the command

networkctl

Package management

Photon OS uses the TDNF (Tiny DNF) package manager, based on Fedora's DNF. It is developed by VMware and comes with compatible repository and package management capabilities. Note that not every dnf command is available, but the basic ones are there.

Examples:
  • tdnf install libxml2
  • tdnf install openssl-devel
  • tdnf install binutils
  • tdnf install pkg-config
  • tdnf install perl-Crypt-SSLeay
  • tdnf install cpan
  • tdnf install libuuid-devel
  • tdnf install make
An update of the whole operating system can be done with the command
tdnf update

Log Management

You will not find the typical Linux /var/log/messages.
Instead, journald is used and you have to use the command journalctl

The equivalent of tail -f /var/log/messages is
journalctl -f 

System services

System services are controlled by the command systemctl

To check service status use
systemctl status docker
To start service use
systemctl start docker
To enable service after system start use
systemctl enable docker

Docker and containerized PowerCLI

One of the key use cases for Photon OS is to be a Docker host, therefore Docker is preinstalled in Photon OS. You can see further Docker information with the command
docker info
If Docker is running on your system, you can very quickly spin up a Docker container. Let's use the example of containerized PowerCLI. To download the container image from Docker Hub use the command
docker pull vmware/powerclicore
to check all downloaded images use the command
docker images -a   
 root@photon-machine [ ~ ]# docker images -a    
 REPOSITORY      TAG         IMAGE ID      CREATED       SIZE  
 vmware/powerclicore  latest       a8e3349371c5    6 weeks ago     610 MB  
 root@photon-machine [ ~ ]#   

Now you can run the PowerCLI container interactively (-i) with an allocated pseudo-TTY (-t). The option --rm stands for "Automatically remove the container when it exits".
docker run --rm -it vmware/powerclicore 
 root@photon-machine [ ~ ]# docker run --rm -it --name powercli vmware/powerclicore         
 PowerShell   
 Copyright (C) Microsoft Corporation. All rights reserved.  
      Welcome to VMware vSphere PowerCLI!  
 Log in to a vCenter Server or ESX host:       Connect-VIServer  
 To find out what commands are available, type:    Get-VICommand  
 Once you've connected, display all virtual machines: Get-VM  
     Copyright (C) VMware, Inc. All rights reserved.  
 Loading personal and system profiles took 3083ms.  
 PS /powershell#   

Now you can use PowerCLI running in a Linux container. The very first PowerCLI command is usually Connect-VIServer, but you may get the following warning and error messages

 PS /powershell> Connect-VIServer                                                                         
 cmdlet Connect-VIServer at command pipeline position 1  
 Supply values for the following parameters:  
 Server: vc01.home.uw.cz  
 Specify Credential  
 Please specify server credential  
 User: cdave  
 Password for user cdave: *********  
 WARNING: Invalid server certificate. Use Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Prompt if you'd like to connect once or to add  
  a permanent exception for this server.  
 Connect-VIServer : 06/07/2017 19:25:44     Connect-VIServer          An error occurred while sending the request.       
 At line:1 char:1  
 + Connect-VIServer  
 + ~~~~~~~~~~~~~~~~  
   + CategoryInfo     : NotSpecified: (:) [Connect-VIServer], ViError  
   + FullyQualifiedErrorId : Client20_ConnectivityServiceImpl_Reconnect_Exception,VMware.VimAutomation.ViCore.Cmdlets.Commands.ConnectVIServer  
 PS /powershell>   

To solve the problem you have to adjust the PowerCLI configuration with
Set-PowerCLIConfiguration -InvalidCertificateAction ignore -confirm:$false -scope All
The command above changes PowerCLI configuration for all users.

To use other docker commands you can open another SSH session and, for example, list running containers

 root@photon-machine [ ~ ]# docker ps -a     
 CONTAINER ID    IMAGE         COMMAND       CREATED       STATUS       PORTS        NAMES  
 6ecccf77891e    vmware/powerclicore  "powershell"    7 minutes ago    Up 7 minutes              powercli  
 root@photon-machine [ ~ ]#   

... or issue any other docker command.

That's cool, isn't it?

Tuesday, June 06, 2017

VMware VVOLs scalability

I'm personally a big fan of the VMware Virtual Volumes concept. If you are not familiar with VVols, check this blog post with the recording of a VMworld session and read the VMware KB Understanding Virtual Volumes (VVols) in VMware vSphere 6.0.

We all know that the devil is always in the details. The same is true with VVols. VMware prepared the conceptual framework, but the implementation always depends on the storage vendor, thus it varies across storage products.

Recently, I had a VVols discussion with one of my customers who claimed that their particular storage vendor supports a very small number of VVols. That discussion inspired me to do some research.

Please note that the numbers below were valid at the moment of writing this article. You should always check the current status with your particular storage vendor.

Vendor / Storage Array: Maximum VVols / Snapshots or Clones
DELL / Compellent SC 8000: 2,000 / TBD
EMC / Unity 300: 9,000 / TBD
EMC / Unity 400: 9,000 / TBD
EMC / Unity 500: 13,500 / TBD
EMC / Unity 600: 30,000 / TBD
EMC / VMAX 3: 64,000 / TBD
HPE / 3PAR: I have been told that it depends on the specific model and firmware version, but it is roughly about 10,000 (info gathered in November 2018)
Hitachi / VSP G200: 2,000 / 100,000
Hitachi / VSP G400: 4,000 / 100,000
Hitachi / VSP G600: 4,000 / 100,000
Hitachi / VSP G800: 16,000 / 100,000
Hitachi / VSP G1000: 64,000 / 1,000,000

The numbers above are very important because a single VM has at least 3 VVols (home, data, swap) and usually more (snapshots, additional data disks). If you assume 10 VVols per VM, you will end up with just 200 VMs on a Dell Compellent or Hitachi VSP G200. On the other hand, an EMC Unity 600 would give you up to 3,000 VMs, which is not bad, and enterprise storage systems (EMC VMAX 3 and Hitachi G1000) would give you up to 6,400 VMs, which is IMHO very good scalability.
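The arithmetic above is trivial but worth making explicit; a few lines of Python using the table's limits:

```python
ASSUMED_VVOLS_PER_VM = 10  # conservative sizing assumption used in the text

# VVol limits taken from the table above
array_limits = {
    "Dell Compellent SC 8000": 2_000,
    "EMC Unity 600": 30_000,
    "EMC VMAX 3": 64_000,
    "Hitachi VSP G1000": 64_000,
}

# Maximum number of VMs an array can host is simply limit // VVols-per-VM
for array, max_vvols in array_limits.items():
    print(f"{array}: up to {max_vvols // ASSUMED_VVOLS_PER_VM} VMs")
```

This prints 200 VMs for the Compellent, 3,000 for the Unity 600 and 6,400 for the VMAX 3 and VSP G1000, matching the figures above.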

So as always, it really depends on what storage system you have or are planning to buy.

If you know the numbers for other storage systems, please share them in the comments below this blog post.

Keywords: vvol, vvols, virtual volumes 

Wednesday, May 31, 2017

vROps & vSphere Tags, Custom Attributes

As many of my customers have recently started to customize their vROps and together we are working on various use cases, I find it useful to summarize my notes here and possibly help others during their investigation and customization.

This time I will focus on custom descriptions for objects in vROps. When you provide access to vRealize Operations to your company management, they are often not familiar with the IT naming convention and it is very hard for them to analyze why some object is marked red and whether it is important at all.

We've been thinking this through with David for a bit and there are two very easy alternatives to tackle this use case: vSphere Tags and Custom Attributes in vSphere. In the following lines I will explain a step-by-step procedure to use them and to tackle possible problems you might hit on the way.

1) Create the preferred description in vSphere. For Custom Attributes you can use either local (object-based) or global definitions; both work fine. At the end of this article you can see what vSphere Tags and Custom Attributes look like and which is better for your specific use case.


2) Afterwards switch to vROps and check whether the metric is being propagated to the object. Bear in mind that it might take a couple of minutes for the metric to be collected.


3) Once the metric is available you can start working with it, for example in your Views. For this post I've created a couple of Tags on my vCenter appliance called APPL_vCenter; therefore selecting Virtual Machine as the subject of the view creation is the logical choice.


4) Now the tricky part where I personally had a problem (I would like to thank our great vROps consultant Oleg Ulyanov for helping me out): the metric was simply not available in a view. The thing here is that if you have a big environment with hundreds of VMs, vROps will randomly choose a few (I think the number was 5) and, based on those 5, show a merge of the available metrics. If you are as lucky as me and APPL_vCenter is not among them, the Tags will not be available. To force vROps to use a specific machine, you can use the square next to the Metrics/Properties button.


In the newly opened window you can filter for the VM you want.


5) Afterwards just choose the VM you've created the Tag on (in my case again APPL_vCenter) and the metric should now be visible.


6) In the final screenshot I would like to compare both solutions - vSphere Tags and Custom Attributes (for some reason marked in vROps as Custom Tag).


vSphere Tags are consolidated into one field. I've created the Tag "Purpose" and the Tag "OS" for the vCenter appliance. On the other hand, Custom Attributes are always separated, so doing the same would create two Custom Tags with just a value in each. In case you need, for example, filtering or any other logic behind the Tags, Custom Attributes seem to be the better choice.


Sunday, May 14, 2017

VM Snapshots Deep-Dive

Author: Stan Jurena

A while ago I received an interesting question regarding snapshot consolidation from one of my customers, and as I was not 100% sure about the particular details (file naming, consolidation, pointers, etc.) I went to do some testing in a lab. The scenario was pretty simple: create a virtual machine with a non-linear snapshot tree and start removing the snapshots.

Lessons learned: when doing such tests, it is always good to add some files or something a bit more sizable into each snapshot. My initial work started with just creating folders named snap[1-7], which during consolidation was really not helpful in identifying where the data from a snapshot actually went.

The non-linear snapshot tree I mentioned earlier looks like this:


The first confusion, which was the most important and took me a while to get my head around, was the file naming convention. The file SnapTest-flat.vmdk is the main data file of the server, in this case the C: drive of the Microsoft Windows server with a size of around 26 GB. This file is not visible in the Web Client, as only the descriptor <VM name>.vmdk (in our case SnapTest.vmdk) is directly visible. When you create the first snapshot, this is the file it uses, as you can see in the following image:


The command grep -E 'displayName|fileName' SnapTest.vmsd lists all lines containing displayName and/or fileName from the file SnapTest.vmsd. Going through the vSphere documentation you will find:
A .vmsd file that contains the virtual machine's snapshot information and is the primary source of information for the Snapshot Manager. This file contains line entries, which define the relationships between snapshots and between child disks for each snapshot.

With that said, the above output of the command lists our predefined snapshot names (I used the number of the snapshot and the size of the file I've added) and their respective files. So the first created snapshot is named Snap1+342MB and uses the file SnapTest.vmdk.


The second useful command during this test, grep parentFileNameHint SnapTest-00000[0-9].vmdk, goes through all the snapshot files and lists parentFileNameHint. As you probably guessed, it is the snapshot it depends on (the parent file).


List of tests I performed:
1) Remove Snapshot 5 (Snap5+366MB)
2) Remove Snapshot 4 (Snap4+356MB)
3) Remove Snapshot 3 (Snap3+337MB)
4) Remove Snapshot 2 (Snap2+348MB)
5) Move You Are Here
6) Remove Snapshot 6 (Snap6+168MB)
7) Remove Snapshot 7 (Snap7+348MB)

Now In more details per every case.

1) Remove Snapshot 5 (Snap5+366MB)
The result can be seen in this visualisation. After removing Snapshot 5 within the Web Client, the Snapshot 5 and Snapshot 6 vmdk files were consolidated and the sizes updated accordingly.


As for the first example, I will also add the command exports here for illustration. The following scenarios should be understandable even without them.



2) Remove Snapshot 4 (Snap4+356MB)
I did this test just to prove the proper functionality to myself, so it is very similar to the previous part.

3) Remove Snapshot 3 (Snap3+337MB)
Now, with removing Snapshot 3, things become a bit more challenging. Three more snapshots currently depend on Snapshot 3 (Snap6, Snap7 and You Are Here). As the consolidation in this case would need to be performed with each of them, it would be a very "costly" operation. The result was that the snapshot was removed from the GUI, but the files remained on the disk and all the dependencies were preserved.


4) Remove Snapshot 2 (Snap2+348MB)
Although it might seem complicated on paper, the removal process for Snapshot 2 was very similar to every other snapshot removal; only in this case Snapshot 2 was consolidated with the temporary file preserved from the previous step.


5) Move "You Are Here"
Moving the active state of the virtual machine, shown as "You Are Here", is also quite a simple operation. I performed this test more or less to validate how many snapshots can depend on a parent snapshot before the snapshots are consolidated. To spoil the surprise, it has to be just one, as in this case only Snapshot 6 and Snapshot 7 depend on the temporary file.


6) Remove Snapshot 6 (Snap6+168MB)
As mentioned in the previous step, if there is only one child snapshot of the parent snapshot and the parent snapshot is removed, the data are consolidated. Otherwise a temporary parent file is preserved for the child snapshots to work with.
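The consolidation rule observed in these tests can be modeled in a few lines of Python (my own sketch of the behavior, not VMware code; snapshot names follow the tree above):

```python
def remove_snapshot(name: str, children: dict) -> str:
    """Model the observed rule when a snapshot is removed from the tree.

    'children' maps snapshot name -> list of direct dependents (including
    the "You Are Here" state). With at most one dependent, the two delta
    files can be consolidated; with more than one, the file must stay
    behind on disk as a temporary parent.
    """
    if len(children.get(name, [])) <= 1:
        return "consolidated"
    return "kept as temporary parent"

# Snapshot 3 had three dependents, Snapshot 5 only one:
tree = {"Snap3": ["Snap6", "Snap7", "You Are Here"], "Snap5": ["Snap6"]}
print(remove_snapshot("Snap3", tree))  # kept as temporary parent
print(remove_snapshot("Snap5", tree))  # consolidated
```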


7) Remove Snapshot 7 (Snap7+348MB)
The final step was to remove the last Snapshot 7 and be left with just one snapshot, Snap1+342MB, and the main file. If this snapshot were removed, all the data would be consolidated into the main VMDK and there would be no delta file for the "You Are Here" state, and therefore no point to get back to.


Overall, working with snapshots is not rocket science, but my test today showed me in a bit more detail what is happening in the background with the file names, the snapshot IDs in the vmdk files, and data consolidation. It also showed that temporary parent files are left behind if more than one direct child snapshot depends on them. It also forced me to refresh my knowledge about Space Efficient Sparse Virtual Disks (SE Sparse Disks for short), which were well explained by my colleague Cormac Hogan in late 2012.

Thursday, April 20, 2017

Back to the basics - VMware vSphere networking

As software-defined networking (VMware NSX) is getting more and more traction, I have recently often been asked to explain the basics of VMware vSphere networking to networking experts who do not have experience with the VMware vSphere platform. First of all, the networking team should familiarize themselves with the vSphere platform at least from a high level. The following two videos can help them understand what the vSphere platform is.

vSphere Overview Video
https://youtu.be/EvXn2QiL3gs

What is vCenter (Watch the first two minutes)
https://youtu.be/J0pQ2dKFLbg

When they understand basic vSphere terms like vCenter and ESXi, we can start talking about virtual networking.

First things first: VMware vSwitch is not a switch. Let me repeat it again ...
VMware vSwitch is not a typical ethernet switch.
It is not a typical network (ethernet) switch because not all switch ports are equal. In a VMware vSwitch you have to configure switch uplinks (physical NICs) and internal switch ports (software constructs). If an ethernet frame comes from the physical network via an uplink, the vSwitch will never forward such a frame to any other uplink, only to internal switch ports where virtual machines are connected. This behavior guarantees that a vSwitch can never cause an L2 loop. It also means that a vSwitch does not need to implement or participate in the spanning tree protocol (STP) usually running in your physical network. Another way vSwitch behavior differs from a traditional ethernet switch is that a vSwitch does not learn external MAC addresses. It only knows the MAC addresses of virtual machines running on the particular ESXi host (hypervisor). Such devices are often called port extenders. For example, Cisco FEX (fabric extender) is a physical device with the same behavior.
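The forwarding rule can be illustrated with a tiny Python model (my own sketch, not ESXi code):

```python
def vswitch_forward(ingress_port: str, ports: dict) -> list:
    """Return the ports eligible to receive a frame flooded from 'ingress_port'.

    'ports' maps port name -> 'uplink' or 'internal'. A frame entering on an
    uplink is never forwarded to another uplink, which is why a vSwitch
    cannot create an L2 loop and does not need to run STP.
    """
    candidates = [p for p in ports if p != ingress_port]
    if ports[ingress_port] == "uplink":
        return [p for p in candidates if ports[p] == "internal"]
    # Frames originated by VMs may go to uplinks as well as to other VMs
    return candidates

ports = {"vmnic0": "uplink", "vmnic1": "uplink", "vm-a": "internal", "vm-b": "internal"}
print(vswitch_forward("vmnic0", ports))  # ['vm-a', 'vm-b'] - never vmnic1
```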

Now let's talk about network redundancy. In production environments, we usually have a redundant network where multiple NICs are connected to different physical switches.

Each NIC connected to different physical switch
vSwitch network redundancy is achieved by NIC teaming, also known as link aggregation, link bundling, port channeling or ethernet bonding. VMware uses the term network teaming or NIC teaming. So what teaming options do we have in the VMware vSphere platform? It depends on what edition (license) you have and which vSwitch you want to use. VMware offers two types of vSwitches.
  • VMware vSphere standard switch (aka vSwitch or vSS)
  • VMware vSphere distributed virtual switch (aka dvSwitch or vDS)
Let's start with VMware's standard switch available on all editions.

VMware vSphere standard switch (vSS)

VMware vSphere standard switch supports multiple switch independent active/active and active/standby teaming methods and also one switch dependent active/active teaming method.

The standard switch can use following switch independent load balancing algorithms:
  • Route based on originating virtual port - (default) switch independent active/active teaming where virtual switch ports are distributed across all active network adapters (NICs) based on the internal vSwitch port ID to which a virtual machine vNIC or ESXi VMkernel port is connected.
  • Route based on source MAC hash - switch independent active/active teaming where traffic is distributed across all active network adapters (NICs) based on a hash of the source MAC address seen by the standard vSwitch.
  • Use explicit failover order - another switch independent teaming option, but active/passive. Only one of the active adapters is used, and if it fails, the next one takes over. In other words, it always uses the highest-order uplink from the list of active adapters which passes failover detection criteria.
and only one switch dependent load balancing algorithm
  • Route based on IP hash - switch dependent active/active teaming where traffic is load balanced based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash. Because this is switch dependent teaming, a static port-channel (aka EtherChannel) has to be configured on the physical switch side, otherwise it will not work.
It is worth mentioning that for all active/active teaming methods you can add standby adapters, which are used only when an active adapter fails, and you can also define unused adapters which you do not want to use at all. For further information, see the VMware vSphere documentation.
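To make the switch independent algorithms concrete, here is a rough Python sketch of how an uplink could be selected per vSwitch port or per source MAC (the hash below is illustrative only, not the actual ESXi implementation):

```python
def uplink_by_port_id(port_id: int, active_uplinks: list) -> str:
    """'Route based on originating virtual port': the internal vSwitch port ID
    deterministically maps to one active uplink."""
    return active_uplinks[port_id % len(active_uplinks)]

def uplink_by_src_mac(mac: str, active_uplinks: list) -> str:
    """'Route based on source MAC hash': a hash of the source MAC address
    picks the uplink (toy hash for illustration)."""
    h = sum(int(octet, 16) for octet in mac.split(":"))
    return active_uplinks[h % len(active_uplinks)]

uplinks = ["vmnic0", "vmnic1"]
print(uplink_by_port_id(7, uplinks))                        # vmnic1
print(uplink_by_src_mac("00:50:56:9a:00:01", uplinks))      # vmnic1
```

Note that in both cases the mapping is deterministic: a given VM port or MAC always lands on the same uplink, so the "balancing" only evens out across many VMs, not per packet.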

VMware vSphere distributed switch (vDS)

If you have a vSphere Enterprise Plus license or a VSAN license, you are eligible to use the VMware vSphere distributed switch. The VMware distributed switch's key advantages are
  • centralized management
  • advanced enterprise functionality
When you use a virtual distributed switch, you do not need to configure each vSwitch individually; instead, you have a single distributed vSwitch across multiple ESXi hosts and you can manage it centrally. On top of centralized management you get the following advanced enterprise functionalities:
  • NIOC (Network I/O Control) which allows QoS and marking (802.1p tagging, DSCP)
  • PVLAN
  • LBT (Load Based Teaming) - switch independent teaming with optimized load balancing; the official name is "Route based on physical NIC load"
  • LACP - dynamic switch dependent teaming
  • ACLs - Access Control Lists
  • LLDP
  • Port mirroring
  • NetFlow
  • Configuration backup and restore
  • and more
So in vDS you have two additional teaming options

  1. Switch Independent - LBT
  2. Switch Dependent - LACP

LBT is a proprietary VMware mechanism where the VMkernel checks the stats from the relevant pNICs every 30 seconds; the calculation uses the stats over the 30-second interval and takes the average (to normalize it and eliminate spikes). If the bandwidth is above 75%, the pNIC is marked as saturated. Once a pNIC has been marked as saturated, the VMkernel will not move any more traffic onto it. Read the blog post "LBT (Load Based Teaming) explained" for more details (see the links to references below this article).
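The saturation check as described can be sketched roughly like this (my own simplified Python model of the stated 30-second / 75% rule, not VMware code):

```python
def lbt_saturated(samples_mbps: list, link_speed_mbps: float, threshold: float = 0.75) -> bool:
    """Mark a pNIC as saturated when its average utilization over the
    sampling window exceeds 75% of link speed. Averaging over the window
    normalizes the value and eliminates short spikes."""
    avg = sum(samples_mbps) / len(samples_mbps)
    return avg / link_speed_mbps > threshold

# A 10 GbE uplink averaging 8 Gbps over the window is saturated (80% > 75%),
# so LBT stops placing new port-to-uplink mappings on it.
print(lbt_saturated([7000, 8000, 9000], 10000))  # True
print(lbt_saturated([1000, 2000, 3000], 10000))  # False
```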

LACP is a standardized link aggregation protocol. It is worth mentioning that when LACP is used, you can leverage significantly enhanced load balancing algorithms for more optimal bandwidth usage of the physical NICs. Theoretically, a single VM can use more bandwidth than a single pNIC provides. But to see this in the real world, multiple flows must be initiated from that single VM, and the result of the LACP hash algorithm must route them across multiple links bundled in the LAG.

vSphere 6.0 LACP supports the following twenty (20) hash algorithms:
  1. Destination IP address
  2. Destination IP address and TCP/UDP port
  3. Destination IP address and VLAN
  4. Destination IP address, TCP/UDP port and VLAN
  5. Destination MAC address
  6. Destination TCP/UDP port
  7. Source IP address
  8. Source IP address and TCP/UDP port
  9. Source IP address and VLAN
  10. Source IP address, TCP/UDP port and VLAN
  11. Source MAC address
  12. Source TCP/UDP port
  13. Source and destination IP address
  14. Source and destination IP address and TCP/UDP port
  15. Source and destination IP address and VLAN
  16. Source and destination IP address, TCP/UDP port and VLAN
  17. Source and destination MAC address
  18. Source and destination TCP/UDP port
  19. Source port ID
  20. VLAN
Note: Advanced LACP settings are available via esxcli commands.  
esxcli network vswitch dvs vmware lacp
esxcli network vswitch dvs vmware lacp config get
esxcli network vswitch dvs vmware lacp status get
esxcli network vswitch dvs vmware lacp timeout set
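To illustrate why a single VM only exceeds one pNIC's bandwidth when it drives multiple distinct flows, here is a rough Python model of hash-based LAG member selection (the hash function is illustrative only; ESXi's actual implementation is not reproduced here):

```python
import hashlib

def lag_member(src_ip: str, dst_ip: str, src_port: int, dst_port: int, links: list) -> str:
    """Pick a LAG member link from a hash of the flow's addresses and ports.

    Every packet of one flow produces the same hash and therefore uses the
    same link; only distinct flows can spread across multiple links.
    """
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    h = int(hashlib.md5(key).hexdigest(), 16)
    return links[h % len(links)]

lag = ["vmnic0", "vmnic1"]
flow1 = lag_member("10.0.0.1", "10.0.0.2", 49152, 443, lag)
flow2 = lag_member("10.0.0.1", "10.0.0.2", 49153, 443, lag)
print(flow1, flow2)  # two flows may (or may not) land on different links
```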

Hope this was informative and useful.

References to other useful resources

Sunday, April 02, 2017

ESXi Host Power Management

I have just listened to Qasim Ali's  VMworld session "INF8465 - Extreme Performance Series: Power Management's Impact on Performance" about ESXi Host Power Management (P-States, C-States, TurboMode and more) and here are his general recommendations
  • Configure BIOS to allow ESXi host the most flexibility in using power management features offered by the hardware
  • Select "OS Control mode", "Performance per Watt", or equivalent 
  • Enable everything P-States, C-States and Turbo mode
  • To achieve the best performance per watt for most workloads, leave the power policy at default which is "Balanced"
  • For applications that require maximum performance, switch to "High Performance" from within ESXi host
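The last recommendation can also be done from the ESXi shell. The `/Power/CpuPolicy` advanced option used below is my assumption of the relevant knob (the internal value names such as "static" for High Performance may differ between releases), so verify against the VMware documentation for your version before using it:

```shell
# Show the current CPU power policy (assumption: /Power/CpuPolicy advanced option)
esxcli system settings advanced list -o /Power/CpuPolicy

# Switch to "High Performance" (internally "static" in the releases I have seen)
esxcli system settings advanced set -o /Power/CpuPolicy -s "static"
```

These are host-configuration commands, so they only make sense when run on an ESXi host; the same change can be made in the vSphere client under Host > Configure > Power Management.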
Ali's VMworld session mentioned above is really worth watching. I encourage you to watch it yourself.

Saturday, March 18, 2017

VMware vSphere 6.5 product enhancements and the basic concepts behind them

VMware Tech Marketing has produced a bunch of cool vSphere 6.5-related whiteboard videos. Great stuff to review to understand VMware product enhancements and the basic concepts behind them.


They are definitely worth watching, but please keep in mind that the devil is in the details, so be prepared for further planning, designing, and testing before you implement anything into production.

Friday, March 10, 2017

High level introduction to VMware products

My blog posts usually go into low-level technical details and are targeted at VMware subject matter experts. However, sometimes it is good to step back and look at things from a high-level perspective. This can be especially helpful when you need to explain VMware products to somebody who is not an expert in VMware technologies.

vSphere Overview Video
https://youtu.be/EvXn2QiL3gs

What is vCenter (Watch the first two minutes)
https://youtu.be/J0pQ2dKFLbg

HTML5 Web Client (This is how vSphere is managed now - no more client. Minute 3 shows you how to create a virtual machine)
https://youtu.be/sLveCbyqrvE

vR Ops Overview
https://youtu.be/0o_xfw4C_bo

Troubleshooting VM Performance in vR Ops
https://youtu.be/ZpidQUZ_8J8

How to Build Blueprints in vRA - Single Machine, Application, and with AWS
https://youtu.be/94ZJqfBIXWI

NSX - Network Concepts Overview (Watch up until minute 4)
https://youtu.be/RQzCEC4ieZE

NSX - Microsegmentation (Watch 2:50 to 4:40)
https://youtu.be/2Un-SrEF5is

vSAN Overview
https://youtu.be/pjU4EnG1mWc

Hope you find it useful! Either way, sharing is welcome!

Sunday, March 05, 2017

ESXi localcli

I have just read a very informative blog post, "Adding new vNICs in UCS changes vmnic order in ESXi". The author (Michael Rudloff) uses localcli with undocumented functions to achieve the correct NIC order. So what is this localcli? All vSphere admins probably know the esxcli command for ESXi configuration; esxcli manages many aspects of an ESXi host. You can run ESXCLI commands remotely or in the ESXi Shell.

You can use esxcli in the following three ways:
  • vCLI package. Install the vCLI package on the server of your choice, or deploy a vMA virtual machine and target the ESXi system that you want to manipulate. You can run ESXCLI commands against a vCenter Server system and target the host indirectly. Running against vCenter Server systems by using the --vihost parameter is required if the host is in lockdown mode.
  • ESXi Shell. Run ESXCLI commands in the local ESXi Shell to manage that host.
  • vSphere PowerCLI. You can also run ESXCLI commands from the vSphere PowerCLI prompt by using the Get-EsxCli cmdlet.
So esxcli is well known, but what about localcli? Based on the VMware documentation, it is a set of commands for use with VMware Technical Support. localcli commands are equivalent to ESXCLI commands, but bypass hostd. The localcli commands are intended only for situations when hostd is unavailable and cannot be restarted. After you run a localcli command, you must restart hostd. Run ESXCLI commands after the restart.

Warning: If you use a localcli command in other situations, an inconsistent system state and potential failure can result.
So it is obvious that the usage of localcli is unsupported and that it should be used only when instructed by VMware Support.
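As a hypothetical sketch of that emergency-only workflow (the network commands are standard esxcli namespaces; run something like this only when hostd is really broken):

```shell
# hostd is unresponsive, so esxcli (which goes through hostd) fails.
# localcli accepts the same namespace syntax but bypasses hostd:
localcli network ip interface list

# After the emergency task is done, restart hostd ...
/etc/init.d/hostd restart

# ... and go back to the supported esxcli from then on:
esxcli network ip interface list
```

These commands only exist on an ESXi host, so treat this purely as an illustration of the "localcli, then restart hostd, then esxcli" sequence described above.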
However, the command is very interesting, because when you point it at the special internal plugin directory, some undocumented namespaces appear. You can browse these namespaces and discover some cool functionality. Just log in to your ESXi host and use the command localcli --plugin-dir /usr/lib/vmware/esxcli/int/

 [root@esx11:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/   
 Usage: localcli [disp options]    
 For esxcli help please run localcli --help  
 Available Namespaces:   
 boot       operations for system bootstrapping                                          
 debug       Options related to VMkernel debugging. These commands should be used at the direction of VMware Support Engineers.   
 device      Device manager commands                                                
 deviceInternal  Device layer internal commands                                             
 elxnet      elxnet esxcli functionality                                              
 esxcli      Commands that operate on the esxcli system itself allowing users to get additional information.            
 fcoe       VMware FCOE commands.                                                 
 graphics     VMware graphics commands.                                               
 hardware     VMKernel hardware properties and commands for configuring hardware.                          
 hardwareinternal VMKernel hardware properties and commands for configuring hardware, which are not exposed to end users.        
 iscsi       VMware iSCSI commands.                                                 
 network      Operations that pertain to the maintenance of networking on an ESX host. This includes a wide variety of commands   
          to manipulate virtual networking components (vswitch, portgroup, etc) as well as local host IP, DNS and general   
          host networking settings.  
 networkinternal  Operations used by partner software, but are not exposed to the end user. These operations must be kept compatible   
          across releases.  
 rdma       Operations that pertain to remote direct memory access (RDMA) protocol stack on an ESX host.              
 rdmainternal   Operations that pertain to the remote direct memory access (RDMA) protocol stack on an ESX host, but are not   
          exposed to the end user. These operations must be kept compatible across releases.  
 sched       VMKernel system properties and commands for configuring scheduling related functionality.               
 software     Manage the ESXi software image and packages                                      
 storage      VMware storage commands.                                                
 system      VMKernel system properties and commands for configuring properties of the kernel core system and related system   
          services.  
 systemInternal  Internal VMKernel system properties and commands for configuring properties of the kernel core system.         
 user       VMKernel properties and commands for configuring user level functionality.                       
 vm        A small number of operations that allow a user to Control Virtual Machine operations.                 
 vsan       VMware Virtual SAN commands                                              
 Available Commands:   

Let me tell you again that this command is unsupported; therefore, do not use it in production. On the other hand, it is very cool to test it in our labs ...