Wednesday, September 14, 2016

VMworld 2016 US sessions worth watching

Here is the list of VMworld 2016 US sessions I have watched or still have to watch over the next days and weeks. After watching a session I categorize it and write a brief description. I also assign category labels and a technical level to each session.

Category labels:
  • Strategy
  • Architecture
  • Operations
  • High Level Product Overview
  • Deep Dive Product Overview
  • Technology Preview
  • Idea for Improvement

Technical levels:
  • Basic
  • Middle
  • Advanced

Note: If you have trouble replaying a particular session from my links below, go to the OFFICIAL VMWORLD SITE and search for it by session code. You will need to register there, but it should still be free of charge.

Already watched and categorized sessions

Session code: INF9044
Speaker(s): Emad Younis
Category labels: Technology Preview
Technical level: Basic-Middle
Brief session summary: Introduction and demo of the vCenter migration tool for easy migration from a Windows-based vCenter to the vCenter Server Appliance (VCSA).

Session code: INF8260
Speaker(s): William Lam, Alan Renouf
Category labels: Technology Preview
Technical level: Middle-Advanced
Brief session summary: William explains the current possibilities for automated VCSA deployment and Alan presents what is coming soon. In a nutshell, a REST API is coming, and it is really great if you ask me.

Session code: INF8108
Speaker(s): Ravi Soundararajan, Priya Sethuraman
Category labels: Deep Dive Product Overview
Technical level: Advanced
Brief session summary: At the beginning, some numbers are presented explaining the vCenter Server Appliance performance benefits over a Windows-based vCenter. At 00:07:00 the vCenter deep dive starts with an explanation of vCenter internal architecture (vsphere-client, sso, directory-service, vpxd, vpxd-svcs, vmware-sps [aka storage profile services], perfcharts, eam [ESX agent manager]). New info for me was that vCenter does not use the Inventory Service anymore; since 2015 it has been replaced by vpxd-svcs. Then the vsphere-client plugin architecture is explained.
Single vCenter Internal Architecture
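If you run the VCSA, you can see most of these components as individual services directly from the appliance shell; a quick sketch, assuming a VCSA 6.x with shell access (exact service names differ slightly between builds):

 # List vCenter services and their current state from the VCSA appliance shell
 service-control --status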
Later search use cases and consequences for different vCenter/SSO topologies are explained.
Search in Single Site multi vCenter Topology
Search in Multi Site multi vCenter Topology
In the next section, other PSC and vCenter performance considerations are discussed. For the PSC, 2 vCPUs and 4 GB RAM are sufficient for any environment.

vCenter 6 has several hard limits that are good to know:
  • 640 concurrent operations before incoming requests are queued
  • 2000 concurrent sessions (user sessions + incoming requests + remote console sessions)
ESXi host limits
  • A host can perform up to 8 provisioning operations at once (provisioning = clone, vMotion, relocate, snapshot, etc.)
  • If a host is both source and destination, then it can only do 4 operations at once
Datastore limits
  • A datastore can perform up to 128 vMotions at once
  • A datastore can perform up to 8 Storage vMotions at once
The presentation highlights the fact that vSphere 6 supports a dedicated vmknic with a special Provisioning TCP/IP stack for cold migration, cloning, and snapshots. I did not know that! It is a pretty cool management and potential performance optimization trick.
Dedicated vmknic configuration for provisioning
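If you want to try the dedicated Provisioning stack yourself, a minimal esxcli sketch could look like the following; the vmk number, portgroup name and IP address are made-up examples, and the same can be done in the vSphere Web Client when adding a VMkernel adapter:

 # Create the provisioning TCP/IP stack (if it does not exist yet) and a vmknic on it
 esxcli network ip netstack add -N "vSphereProvisioning"
 esxcli network ip interface add -i vmk2 -p "Provisioning-PG" -N "vSphereProvisioning"
 esxcli network ip interface ipv4 set -i vmk2 -I 192.168.50.11 -N 255.255.255.0 -t static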
Higher latency between vCenter and ESXi does NOT have such a big impact on vCenter operations (up to 100 ms is fine), but latency between vCenter and its database is critical. Therefore the embedded database is the recommended and preferred configuration.

vCenter performance statistics levels have a big impact on vCenter performance. The biggest performance drop (4x) is between level 1 and level 2. Therefore, the recommendation is to keep stats at level 1 and use an external monitoring solution (like vROps) for historical performance monitoring and capacity planning, or size your database performance appropriately.

If you have VCSA you can use tools like vimtop and cloudvm-ram-size for performance troubleshooting and tuning.

At the end of the presentation, general conclusions are presented. Generally, more hardware resources improve vCenter 6 performance significantly more than they did for vCenter 5. Additional vCenter plugins and extensions can require more hardware resources. Future releases of vCenter will provide better scale and performance. Use the appliance (VCSA) to achieve better performance and simplified management.

vSPHERE HA


Session code: INF8045
Speaker(s): Manoj Krishnan, Matthew Meyer
Category labels: Technology Preview
Technical level: Middle-Advanced
Brief session summary: At the beginning of the presentation, vSphere HA basics (master/slaves, heartbeating, host failure scenarios, partitioning, host isolation) are explained by Matthew and Manoj. After the basic HA intro, Matthew explains VM Component Protection (HA APD and PDL responses). Matthew's explanation of PDL (00:12:20) is a little bit strange, because he explains it using an example of LUN masking on Fibre Channel switches, which is not the right case. LUN masking is done on storage arrays; on Fibre Channel switches we usually do zoning, which is a different technique. Matthew claims an ESXi host will get PDL after a wrong SAN fabric reconfiguration, which is IMHO not the case, because PDL has to be sent from the storage through each particular path. Therefore, if there is no path between the ESXi host (initiator) and the storage front-end port (target), ESXi cannot get PDL from the storage array. Yes, you can get PDL from the storage array when you make a LUN masking mistake on the storage array. The slides are correct; just the explanation example is not.

More interesting is Manoj's explanation of APD timeouts (00:13:45). APD VM recovery (VMCP APD response) is executed after the ESXi APD timeout (140 seconds) + the VMCP APD timeout (default is 3 minutes). The VMCP APD timeout can be configured through the vSphere Web Client. Another option is "Response for APD recovery after APD timeout". This response (reset or none) is applied in case the datastore comes back to the ESXi host during the "VMCP APD Timeout" period. In such a case the affected VMs are not restarted on a different host in the cluster, but they are restarted on the same ESXi host. This prevents a potential panic state inside the VM guest OSes caused by storage unavailability.

The next presented topics are networking and storage recommendations. The presentation continues with an explanation of HA differences when using conventional storage or VSAN. When conventional storage is used, the management network is used for network heartbeating, any datastore connected to more than one host can be used for storage heartbeating, and a host is declared isolated when the isolation address cannot be pinged. When VSAN is used, the VSAN network is used for network heartbeating, storage heartbeating can be used only if traditional datastores exist as well, and host isolation is declared when the host cannot ping the isolation addresses over the VSAN network. At 00:28:43 the presentation covers Admission Control details and HA integration with DRS. HA can ask DRS for cluster defragmentation in case there is no available space for failed VMs. At 00:34:31 starts a technology preview of new things VMware is working on: vSphere HA priorities (5 priorities instead of 3), HA orchestrated restart (dependencies among VMs), FTT-based Admission Control (percentage based but calculated automatically), and Proactive HA (vMotion in case of hardware health degradation).
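For reference, the ESXi-side APD timeout mentioned above (140 seconds) is exposed as a host advanced setting, so you can verify it from the ESXi shell; a quick sketch:

 # Show the host-level APD timeout (default 140 seconds)
 esxcli system settings advanced list -o /Misc/APDTimeout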

ESXi

Session name: vSphere 6.x Host Resource Deep Dive
Session code: INF8430
Speaker(s): Frank Denneman, Niels Hagoort
Category labels: Architecture
Technical level: Advanced
Brief session summary: Frank presents his findings about NUMA and the implications for vSphere architecture. Frank also explains the difference in proximity between RAM, local NVMe and external storage and the impact on access times. Niels very nicely explains how hardware features like VXLAN offload, RSS and VMDq, together with device drivers and driver parameters, impact the performance of virtual overlay networking (VXLAN).

VIRTUAL SAN

Session name: VSAN Networking Deep Dive and Best Practices
Session code: STO8165R
Speaker(s): Ankur Pai, John Nicholson
Category labels: Architecture, Deep Dive Product Overview
Technical level: Advanced
Brief session summary: John starts the presentation and discusses what kind of virtual switch to use. You can use standard or distributed. The distributed virtual switch is highly recommended because of configuration consistency and advanced features like NIOC, LACP, etc. Next there is a discussion about multicast used for VSAN. Multicast is used just for state and metadata; VSAN data traffic is transferred via unicast. IGMP snooping for L2 should always be configured to optimize metadata traffic, and PIM only in case the VSAN traffic has to go over L3. VSAN uses two multicast addresses - 224.2.3.4 port 23451 (Agent Group Multicast Address) and 224.1.2.3 port 12345 (Master Group Multicast Address). These default multicast addresses have to be changed in case two VSAN clusters are running in the same broadcast domain. John shares esxcli commands to change the default multicast addresses, but generally it is recommended to keep each VSAN cluster in a dedicated non-routable VLAN (single broadcast domain). John explains that VSAN is supported on both L2 and L3 topologies, but keeping it in an L2 topology is significantly simpler. Physical network equipment, cabling and oversubscription ratios matter. John warns against Cisco FEX (port extenders) and blade chassis switches where oversubscription is usually expected. The next topic is troubleshooting, and the presenters introduce the VSAN health check plugin, which removes the need for CLI (Ruby vSphere Console aka RVC) troubleshooting commands. VSAN does not need jumbo frames, but you can use them if dictated by some other requirement. LLDP/CDP should be enabled for better visibility and easier troubleshooting. After minute 24, John passes the presentation to Ankur. Ankur presents Virtual SAN stretched cluster networking considerations. He starts with a high-level overview of the stretched VSAN topology and the witness requirement in a third location. A stretched L2 network is recommended; L3 is supported but more complex. Cross-site round trip has to be less than 5 ms and the recommended bandwidth is 10 Gbps. The witness can be connected over L3 with a round trip of less than 200 ms and bandwidth of 100 Mbps (2 Mbps per 1000 VSAN components). Communication between the witness and the main site is L3 unicast - TCP 2233 (I/O) and UDP 23451 (cluster), both directions. Periodic heartbeating between the witness and the main site occurs every second; failure is declared after 5 consecutive failures. Static routes have to be configured between the data hosts and the witness host, because a custom ESXi TCP/IP stack for VSAN is not supported at the moment. Ankur presents three form factors of the witness appliance (Large, Medium, Tiny). Later Ankur explains bandwidth calculations for the cross-site interconnect and another calculation for witness network bandwidth requirements. At the end there is Q&A for about 5 minutes.
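For illustration only, the default multicast addresses can be changed per VSAN vmknic with esxcli; the vmk name and addresses below are examples, and the exact option names should be verified with esxcli vsan network ipv4 set --help on your ESXi build:

 # Change the agent and master group multicast addresses for a VSAN vmknic
 esxcli vsan network ipv4 set -i vmk2 --agent-mc-addr=224.2.3.5 --master-mc-addr=224.1.2.4
 # Verify the current VSAN network configuration
 esxcli vsan network list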

Session name: Extreme Performance Series: Virtual SAN Performance Troubleshooting
Session code: STO8743
Speaker(s): Zach Shen, Ruijin Zhou
Category labels: Deep Dive Product Overview
Technical level: Advanced
Brief session summary: Zach and Ruijin are from the VMware Performance Engineering team. Zach starts the presentation with a quick VSAN architecture overview, depicted below.
VSAN Architecture
VSAN Architecture - LSOM detail
At 7:10 Ruijin takes the stage and explains what VSAN Observer is. VSAN Observer has to be initiated from RVC (Ruby vSphere Console) and then you can access live data via web URL: https://vc-hostname:8010. Ruijin goes through the VSAN Observer GUI and explains what is what. At 25:25 the presentation continues with troubleshooting tool #2, the VSAN Performance Service. The VSAN Performance Service is integrated in the vSphere Web Client. One can monitor common metrics (IOPS, latency, throughput) at 5-minute granularity. A single plot covers a 24-hour range, and 90 days of data are available before roll-over.
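For context, VSAN Observer is started from RVC against the cluster object and then serves the live data on port 8010 as mentioned above; a typical invocation looks roughly like this (the cluster path is an example from my lab):

 /localhost/DC> vsan.observer ~/computers/VSAN-Cluster --run-webserver --force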


VIRTUAL VOLUMES (VVOLs)

Session name: Top 10 Things You MUST Know Before Implementing Virtual Volumes
Session code: STO5888
Speaker(s): Eric Siebert
Category labels: Architecture, Deep Dive Product Overview
Technical level: Middle
Brief session summary: Eric Siebert from HPE explains the VVOLs 1.0 architecture in pretty nice detail, with some HP 3PAR-specific implementation details, which is very good for understanding how a particular VVOL implementation fits into the general VMware VVOL framework.

BUSINESS CRITICAL APPLICATIONS

Session name: Performance Tuning and Monitoring for Virtualized  Database Servers
Session code: VIRT7511
Speaker(s): David Klee, Thomas LaRock
Category labels: Architecture, Operations
Technical level: Middle
Brief session summary: TBD

vSPHERE CONFIGURATION MANAGEMENT

Session name: Enforcing a vSphere Cluster Design with PowerCLI Automation
Session code: INF8036
Speaker(s): Duncan Epping, Chris Wahl
Category labels: Idea for Improvement
Technical level: Middle-Advanced
Brief session summary: TBD


SITE RECOVERY MANAGER

Session name: SRM with NSX: Simplifying Disaster Recovery Operations & Reducing RTO
Session code: STO7977
Speaker(s): Rumen Colov, Stefan Tsonev
Category labels: High Level Product Overview
Technical level: Basic-Middle
Brief session summary: Rumen Colov is the SRM Product Manager, so his part of the session is more of a high-level introduction to SRM and NSX integration. The session really starts at 8:30; before that it is just introduction. Then Rumen explains the main SRM use cases - Disaster Recovery, Datacenter Migration, Disaster Avoidance - and the differentiation among them. The latest SRM versions support zero-downtime VM mobility in an Active/Active datacenter topology. Then Storage Policy-Based Management integration with SRM is explained, as well as NSX 6.2 interoperability with the NSX stretched logical wire (NSX Universal Logical Switch) across two vCenters. Rumen also covers the product licensing editions required for the integration. At 0:22:53 Rumen says that network inventory auto-mapping for NSX stretched networking works only with storage-based replication (SBR) and not with host-based replication (aka HBR or vSphere Replication). It seems to me the reason is that for SBR the information about network connectivity is saved in the VMX file, which is replicated by the storage, while for HBR we would need manual inventory mappings. At 0:24:00 the presentation is handed over to Stefan Tsonev (Director R&D). Stefan shows several screenshots of the SRM and NSX integration. It seems that Stefan is deeply SRM-oriented with basic NSX knowledge, but it is a very nice introduction to SRM and NSX integration basics.

LOG INSIGHT

Session name: Insight into the World of Logs with VMware vRealize Log Insight
Session code: MGT7685R
Speaker(s): Iwan Rahabok, Karl Fultz, Manny Sidhu
Category labels: Operations, Deep Dive Product Overview
Technical level: Middle
Brief session summary: A very nice walkthrough of VMware Log Insight use cases with real examples of how to analyze logs. At the end of the presentation, some architecture topologies for log management in a financial institution are explained.

Session name: vSphere Logs Grow Up! Tech Preview of Actionable Logging with vRealize Log Insight
Session code: INF8845
Speaker(s): Mike Foley, Antoan Arnaudov
Category labels: Technology Preview
Technical level: Basic
Brief session summary: In this session, Antoan and Mike show you what is coming in the next release of vSphere from a logging perspective. We can expect significantly improved details in vCenter events sent to the syslog server, and if you use something like Log Insight you can do a lot of magic with these details. It can significantly help with security audits, configuration management, etc.

PSO and CoE

Session name: How to Relocate Your Physical Data Center Without Downtime in a Few Clicks, Thanks to Automation
Session code: INF8837
Speaker(s): Rene-Francois Mennecier (PSO), Constantin Natchev (CoE)
Category labels: Operations, Idea for Improvement
Technical level: Middle
Brief session summary: Rene-Francois and Constantin present the datacenter migration project they delivered for one of the largest banks in Europe. The project was fully automated with vRealize Orchestrator, and enhanced vMotion (shared-nothing vMotion) was leveraged for workload migrations.

Sessions I still have to watch and categorize

STO7650 - Duncan Epping, Lee Dilworth : Software-Defined Storage at VMware Primer
INF8858 - vSphere Identity: Multifactor Authentication Deep Dive
INF9089 - Managing vCenter Server at Scale? Here's What You Need to Know
INF8553 - The Nuts and Bolts of vSphere Resource Management
INF8959 - Extreme Performance Series: DRS Performance Deep Dive—Bigger Clusters, Better Balancing, Lower Overhead
VIRT8530R - Deep Dive on pNUMA & vNUMA - Save Your SQL VMs from Certain DoomA!
INF8780R - vSphere Core 4 Performance Troubleshooting and Root Cause Analysis, Part 1: CPU and RAM
INF8701R - vSphere Core 4 Performance Troubleshooting and Root Cause Analysis, Part 2: Disk and Network
INF9205R - Troubleshooting vSphere 6 Made Easy: Expert Talk
INF8755R - Troubleshooting vSphere 6: Tips and Tricks for the Real World
INF8850 - vSphere Platform Security
VIRT8290R - Monster VMs (Database Virtualization) Doing IT Right
VIRT7654 - SQL Server on vSphere: A Panel with Some of the World's Most Renowned Experts
VIRT7621 - Virtualize Active Directory, the Right Way!
INF8469 - iSCSI/iSER: HW SAN Performance Over the Converged Data Center
SDDC7808-S - How I Learned to Stop Worrying and Love Consistency: Standardizing Datacenter Designs
INF9048 - An Architect's Guide to Designing Risk: The VCDX Methodology
INF8644 - Getting the Most out of vMotion: Architecture, Features, Performance and Debugging
INF8856 - vSphere Encryption Deep Dive: Technology Preview
INF8914 - Mastering the VM Tools Lifecycle in your vSphere Data Center
INF7825 - vSphere DRS and vRealize Operations: Better Together
INF7827 - vSphere DRS Deep Dive: Understanding the Best Practices, Advanced Concepts, and Future Direction of DRS
INF8275R - How to Manage Health, Performance, and Capacity of Your Virtualized Data Center Using vSphere with Operations Management
INF9047 - Managing vSphere 6.0 Deployments and Upgrades
STO7965 - VMware Site Recovery Manager: Technical Walkthrough and Best Practices
STO7973 - Architecting Site Recovery Manager to Meet Your Recovery Goals
STO8344 - SRM with vRA 7: Automating Disaster Recovery Operations
STO8246R - Virtual SAN Technical Deep Dive and What’s New
STO8750 - Troubleshooting Virtual SAN 6.2: Tips & Tricks for the Real World
STO7904 - Virtual SAN Management Current & Future
STO8179R - Understanding the Availability Features of Virtual SAN
STO7557 - Successful Virtual SAN 6 Stretched Clusters
STO7645r - Virtual Volumes Technical Deep Dive
INF8038r - Getting Started with PowerShell and PowerCLI for Your VMware Environment
NET7858R - Reference Design for SDDC with NSX and vSphere: Part 2
CNA9993-S - Cloud Native Applications, What it means & Why it matters
CNA7741 - From Zero to VMware Photon Platform
CNA7524 - Photon Platform, vSphere, or Both?
SDDC7502 - On the Front Line: A VCDX Perspective Working in VMware Global Support Services
VIRT9034 - Oracle Databases Licensing on a Hyper-Converged Platform
VIRT9009 - Licensing SQL Server and Oracle on vSphere
INF9083 - Ask the vCenter Server Experts Panel
INF9151 - Getting to Zero: Zero Downtime, Zero Data loss with vSphere Fault Tolerance
INF9119 - How To manage PSC like Batman
INF8225 - The vCenter Server and PSC guide to the Galaxy
INF9144 - An Overview of vCenter Server Appliance management interface
INF9128 - Day 2 operations: A vCenter Server Administrator's Diary
INF8172 - vSphere Client Roadmap: Host Client, HTML 5 Client, and web Client
INF8631 - VMware Certificate Management for Mere Mortals
INF8465 - Power Management's Impact on Performance
INF8089 - vSphere Compute and Memory
VIRT7598 - Monster VM Database Performance

All VMworld US 2016 Breakout Sessions

All VMworld US 2016 sessions are listed here.

If you cannot play a session from the list above, go to the OFFICIAL SITE of VMWORLD 2016 US and search for the particular session. You will need to register there, but it is free of charge.

And here is another repository of all videos for VMworld 2016 US.

Wednesday, September 07, 2016

VMware Virtual Machine Hardware Version and CPU Features

I always thought that the only device not virtualized by VMware ESXi is the CPU. That is generally true, but I have just been informed by someone that the available CPU instruction sets (features) depend on the VM hardware version. CPU features are generally enhanced CPU instruction sets for special purposes. For more information about CPUID and features read this.

My regular readers know that I don't believe anything unless I test it. Therefore I did a simple test. I provisioned a new VM with hardware version 4, installed FreeBSD as the guest OS and identified the CPU features. You can see the screenshot below.

VM with hardware version 4 and FreeBSD Guest OS
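If you want to repeat the test, the CPU feature flags can be read from the FreeBSD boot messages; a simple sketch:

 # Show the CPU feature flags reported by the FreeBSD kernel at boot
 grep -i features /var/run/dmesg.boot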
Next, I provisioned another VM with hardware version 10 and FreeBSD guest OS on the same ESXi host and listed the CPU features. See the screenshot below.

VM with hardware version 10 and FreeBSD Guest OS
Now, if you compare the CPU features, you can see the differences. The following CPU features are added in VM hardware version 10:

  • FMA
  • PCID
  • X2APIC
  • XSAVE
  • OSXSAVE
  • AVX
  • F16C
  • RDRAND
  • HV
Does it matter? Well, it depends on whether your particular application really needs advanced CPU features.

The vCPU in VMware virtualization is really not virtualized, but some CPU features are masked in older VM hardware versions because the VM hardware emulates a particular chipset.

I did not know that! I have never thought about it! My bad.

Anyway, this is just another proof that every day brings some surprise and you can always learn something new, even in an area you believe you are good at. We never know everything.

Monday, July 25, 2016

How to read BIOS settings from an HP server

Sometimes it is pretty handy to be able to read BIOS settings from a modern HP server. Let's assume your server has an out-of-band remote management card (aka HP iLO).

HP iLO 4 and above support a RESTful API. Here is a snippet from the "HPE iLO 4 User Guide":
iLO RESTful API 
iLO 4 2.00 and later includes the iLO RESTful API. The iLO RESTful API is a management interface that server management tools can use to perform server configuration, inventory, and monitoring via iLO. A REST client, such as the RESTful Interface Tool, sends HTTPS operations to the iLO web server to GET and PATCH JSON-formatted data, and to configure supported iLO and server settings, such as the UEFI BIOS settings.
So you can leverage REST API calls, or if you like PowerShell, you can simplify it with pre-cooked HP cmdlets.
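As an illustration of the raw REST approach, a simple GET like the one below should return the current BIOS settings from iLO 4; the exact resource path can differ between iLO firmware generations (newer firmware also exposes Redfish under /redfish/v1), so treat this only as a sketch:

 # Read BIOS settings via the iLO 4 RESTful API (self-signed certificate, hence -k)
 curl -k -u username:password https://192.168.0.100/rest/v1/Systems/1/Bios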

The following PowerShell code should show the power versus performance level for the system.

 # Requires the HP(E) Scripting Tools HPBIOSCmdlets PowerShell module
 $ilo = "192.168.0.100"
 $bios = Connect-HPBIOS $ilo -Username "username" -Password "password"
 # Query the power profile using the connection object returned by Connect-HPBIOS
 Get-HPBIOSPowerProfile $bios
 Disconnect-HPBIOS $bios

Other sources:

Sunday, July 24, 2016

ESXi PSOD and HeartbeatPanicTimeout

A Purple Screen of Death (PSOD) is a diagnostic screen with white type on a purple background that is displayed when the VMkernel of an ESX/ESXi host experiences a critical error, becomes inoperative and terminates any virtual machines that are running.  For more info look here.

Nobody is happy to see a PSOD on an ESXi host, but it is important to say that it is just another safety mechanism to protect your server workloads, because a PSOD is intentionally initiated by ESXi's VMkernel in situations when something really bad happens at a low level. It is usually related to a hardware, firmware or driver issue. You can find further information in the VMware KB article - Interpreting an ESX/ESXi host purple diagnostic screen (1004250).

The main purpose of this blog post is to explain the timing of the PSOD for just a single type of error message - "Lost heartbeat". If no heartbeat is received within a certain time interval, the PSOD looks like the screenshot below.

no heartbeat
There is no doubt that something serious has happened in the ESXi VMkernel; however, regardless of what exactly happened, the following two vSphere advanced settings control the time interval in which a heartbeat must be received, otherwise the PSOD is executed.
  • ESXi - Misc.HeartbeatPanicTimeout
  • VPXD (aka vCenter) - vpxd.das.heartbeatPanicMaxTimeout
Let's start with the ESXi advanced setting Misc.HeartbeatPanicTimeout. It defines the interval in seconds after which the VMkernel goes to panic if no heartbeat is received. Please don't mix this "Panic Heartbeat" up with the "HA network heartbeat". These two heartbeats are very different. The "HA network heartbeat" is the heartbeating mechanism between HA cluster members (master <-> slaves) over the Ethernet network, whereas the "Panic Heartbeat" is a heartbeat inside a single ESXi host between the VMkernel and the host management (COS) software components. You can see the "Panic Heartbeat" setting by issuing the following esxcli command:
esxcli system settings advanced list | grep -A10 /Misc/HeartbeatPanicTimeout
 [root@esx01:~] esxcli system settings advanced list | grep -A10 /Misc/HeartbeatPanicTimeout  
   Path: /Misc/HeartbeatPanicTimeout  
   Type: integer  
   Int Value: 14  
   Default Int Value: 14  
   Min Value: 1  
   Max Value: 86400  
   String Value:  
   Default String Value:  
   Valid Characters:  
   Description: Interval in seconds after which to panic if no heartbeats received  

I have tested that Misc.HeartbeatPanicTimeout has different values in different situations. The default value is always 14 seconds, but:
  1. if you have a single standalone ESXi host not connected to an HA cluster, the effective value is 900 seconds
  2. if the ESXi host is a member of a vSphere HA cluster, then the value is 14 seconds
So now we know that the value on an HA-enabled ESXi host is 14 seconds (panicTimeoutMS = 14000), and it usually works without any problem. However, if you decide, for whatever reason, to change this value, it is worth knowing that on an HA-enabled ESXi host there is a hardcoded cap of 60 seconds in the HA code. It is a cap, so it does not change the value if it is already less than 60. However, if you use for example the value 900, it will be capped to 60 seconds anyway. I did a test in vSphere 6/ESXi 6 and it works exactly like that, and I assume it works the same way in vSphere 5/ESXi 5.
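If you really need to change the value on a particular host, it is just another advanced setting; a sketch from the ESXi shell (keep in mind HA will cap it at 60 seconds anyway on a cluster member):

 # Set the Panic Heartbeat timeout to 60 seconds on this host
 esxcli system settings advanced set -o /Misc/HeartbeatPanicTimeout -i 60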

Side note: It was very different in vSphere 4/ESXi 4, because the HA cluster code was rewritten from scratch in vSphere 5, but that is already history and I hope nobody uses vSphere 4 anymore.

Behavior justification:
The behavior described in the paragraph above makes perfect sense if you ask me. If you have a standalone ESXi host and you are experiencing some hardware issue, it is better to wait 900 seconds (15 minutes) before ESXi goes to the PSOD state, because the virtual machines running on top of this ESXi host cannot be automatically restarted on other ESXi hosts anyway. And guess what, if the ESXi host has some significant hardware failure, it most probably has a negative impact on the virtual machines running on top of this particular ESXi host, right? Unfortunately, if you have just a single ESXi host, vSphere cannot do anything for you.

On the other hand, if the affected ESXi host is a member of a vSphere HA cluster, then it is better to wait only 14 seconds (by default), or at most 60 seconds, and put the ESXi host into PSOD sooner, because the HA cluster will restart the affected virtual machines automatically and help mitigate the risk of unavailable virtual machines and, with that, the application services running inside these virtual machines.

So that's the explanation of how the ESXi setting /Misc/HeartbeatPanicTimeout behaves. Now we can look at what the vpxd.das.heartbeatPanicMaxTimeout setting is. My understanding is that vpxd.das.heartbeatPanicMaxTimeout is the vCenter (VPXD) global configuration for the ESXi advanced setting Misc.HeartbeatPanicTimeout. But don't forget that the HA cluster caps the Misc.HeartbeatPanicTimeout value on ESXi hosts as described above.

You can read further details about vpxd.das.heartbeatPanicMaxTimeout in VMware KB 2033250, but I think the following description is a little bit misleading.
"This option impacts how long it takes for a host impacted by a PSOD to release file locks and hence allow HA to restart virtual machines that were running on it. If not specified, 60s is used. HA sets the host Misc.HeartbeatPanicTimeout advanced option to the value of this HA option. The HA option is in seconds."
My understanding is that the description should be reworded to something like this...
"This option is in seconds and impacts how long it takes for ESXi host experiencing some critical issue to go into a PSOD. Setting vpxd.das.heartbeatPanicMaxTimeout is a global setting used for vCenter managed ESXi advanced option Misc.HeartbeatPanicTimeout however Misc.HeartbeatPanicTimeout is adjusted automatically in certain situations. 
In standalone ESXi host 900s is used. In vSphere HA Cluster ESXi host it is automatically changed to 14s and capped to maximum of 60s. This setting have indirect impact on time when file locks are released and hence allow HA cluster to restart virtual machines that were running on affected ESXi host."
Potential side effects and impacts
  • ESXi HA cluster restart of virtual machines - if your Misc.HeartbeatPanicTimeout is set to 60 seconds, then the HA cluster will most probably try to restart VMs on other ESXi hosts, because the network heartbeat (also 14 seconds) will not be received. However, because the host is not yet in PSOD, the file locks still exist and the VM restart will be unsuccessful.
  • ESXi Host Profiles - if you use the same host profile for HA-protected and non-protected ESXi hosts, it can report a Misc.HeartbeatPanicTimeout difference against compliance.
Blog posts in blogosphere covering "no heartbeat" issues:

Friday, July 15, 2016

DELL Force10 : DNS, Time and Syslog server configuration

It is generally good practice to have time synchronized on all network devices and to configure remote logging (syslog) to a centralized syslog server for proper troubleshooting and problem management. Force10 switches are no exception, so let's configure time synchronization and remote logging to my central syslog server - VMware Log Insight in my case.

I would like to use hostnames instead of IP addresses so let's start with DNS resolution, continue with time settings and finalize the mission with remote syslog configuration.

Below are my environment details:

  • My DNS server is 192.168.4.21
  • DNS domain name is home.uw.cz
  • I will use the following three internet NTP servers/pools - ntp.cesnet.cz, ntp.gts.cz and cz.pool.ntp.org
  • My syslog server is at syslog.home.uw.cz 

Step 1/ DNS resolution configuration
f10-s60#conf
f10-s60(conf)#ip name-server 192.168.4.21
f10-s60(conf)#ip domain-name home.uw.cz
f10-s60(conf)#ip domain-lookup
f10-s60(conf)#exit
Don't forget to configure "ip domain-lookup" because it is the command which enables domain name resolution.

Now let's test name resolution by ping www.google.com
f10-s60#ping www.google.com
Translating "www.google.com"
...domain server (192.168.4.21) [OK]
Type Ctrl-C to abort.
Sending 5, 100-byte ICMP Echos to 172.217.16.164, timeout is 2 seconds:
!!!!!
Success rate is 100.0 percent (5/5), round-trip min/avg/max = 40/44/60 (ms)
We should also test some local hostname in long format
f10-s60#ping esx01.home.uw.cz        
Translating "esx01.home.uw.cz"
...domain server (192.168.4.21) [OK]
Type Ctrl-C to abort.
Sending 5, 100-byte ICMP Echos to 192.168.4.101, timeout is 2 seconds:
!!!!!
Success rate is 100.0 percent (5/5), round-trip min/avg/max = 0/0/0 (ms)
and short format
f10-s60#ping esx01
Translating "esx01"
...domain server (192.168.4.21) [OK]
Type Ctrl-C to abort.
Sending 5, 100-byte ICMP Echos to 192.168.4.101, timeout is 2 seconds:
!!!!!
Success rate is 100.0 percent (5/5), round-trip min/avg/max = 0/0/0 (ms)
Step 2/ Set current date, time and NTP synchronization

You have to decide if you want to use GMT or local time. The hardware time should always be set to GMT, and you can configure the timezone and summer time if you wish. So let's configure GMT time first.
f10-s60#calendar set 15:12:46 july 15 2016
and test it
f10-s60#sho calendar
15:12:39  Fri Jul 15 2016
Ok, so hardware time is set correctly to GMT.
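The NTP servers/pools listed in the environment details can then be configured in conf mode; a minimal sketch (verify the exact syntax on your FTOS version):

f10-s60#conf
f10-s60(conf)#ntp server ntp.cesnet.cz
f10-s60(conf)#ntp server ntp.gts.cz
f10-s60(conf)#ntp server cz.pool.ntp.org
f10-s60(conf)#exit
f10-s60#show ntp associations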

If you really want to play with timezone and summer-time, you can do it in conf mode with the following commands.

f10-s60(conf)#clock ?
summer-time         Configure summer (daylight savings) time
timezone             Configure time zone   
I prefer to keep GMT time everywhere because it, in my opinion, simplifies troubleshooting, problem management and capacity planning.

Step 3/ Configuration of remote logging

FTOS by default doesn't use date and time for log messages. It uses uptime (time since the last boot), so you can see when something happened relative to the last system boot. However, because we already have time configured properly, it is a good idea to change this default behavior and use date and time.
f10-s60(conf)#service timestamps log datetime
To be honest, you generally don't need date and time on log messages because the remote syslog server will add its own timestamp to each message, but I prefer to have both times - the time from the device and the time when the message arrived at the syslog server. If you want to disable timestamping on syslog messages, use no service timestamps [log | debug].

And now, finally, let's configure the remote syslog server with a single configuration command:
f10-s60(conf)#logging syslog.home.uw.cz
Translating "syslog.home.uw.cz"
Translating "syslog.home.uw.cz"
...domain server (192.168.4.21) [OK]
And we are done. Now you can see incoming log messages in your syslog server. See screenshot of my VMware Log Insight syslog server.

VMware Log Insight with Force10 log messages.
Hope you find it useful, and as always - any comment is very much appreciated.

Monday, June 27, 2016

ESXi boot mode - UEFI or BIOS

Legacy BIOS bootstrapping along with a master boot record (MBR) has been used with x86-compatible systems for ages. The concept of MBRs was publicly introduced in 1983 with PC DOS 2.0. It is unbelievable that we are still using the same concept after more than 30 years.

However, there must be some limitations in 30-year-old technology, mustn't there?

BIOS limitations (such as 16-bit processor mode, 1 MB addressable space and PC AT hardware) had become too restrictive for the larger server platforms. The effort to address these concerns began in 1998 and was initially called the Intel Boot Initiative, later renamed to EFI. In July 2005, Intel ceased its development of the EFI specification at version 1.10, and contributed it to the Unified EFI Forum, which has evolved the specification as the Unified Extensible Firmware Interface (UEFI).

What is EFI (or UEFI) firmware?

UEFI replaces the old Basic Input/Output System (BIOS). UEFI can be used on

  • a physical server booting the ESXi hypervisor, or 
  • a Virtual Machine running on top of the ESXi hypervisor. 
This blog post is about the ESXi boot mode; however, just for completeness, I would like to mention that a VMware Virtual Machine with at least hardware version 7 supports UEFI as well. For further information about VM UEFI look at [1].
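Just as a side note, the VM firmware type is exposed in the vSphere API, so you can quickly check it with PowerCLI; a one-liner sketch where the VM name is only an example:

 # Returns "bios" or "efi" for the given virtual machine
 (Get-VM -Name "vm01").ExtensionData.Config.Firmware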

Originally called Extensible Firmware Interface (EFI), the more recent specification is known as Unified Extensible Firmware Interface (UEFI), and the two names are used interchangeably.

EFI (Extensible Firmware Interface) is a specification for a new generation of system firmware. An implementation of EFI, stored in ROM or Flash RAM, provides the first instructions used by the CPU to initialize hardware and pass control to an operating system or bootloader. It is intended as an extensible successor to the PC BIOS, which has been extended and enhanced in a relatively unstructured way since its introduction. The EFI specification is portable, and implementations may be capable of running on platforms other than PCs.
 
For more information, see  the Wikipedia page for Unified Extensible Firmware Interface.

General UEFI Advantages

UEFI firmware provides several technical advantages over a traditional BIOS system:

  • Ability to boot from large disks (over 2 TB) with a GUID Partition Table (GPT)
  • CPU-independent architecture
  • CPU-independent drivers
  • Flexible pre-OS environment, including network capability
  • Modular design
  • Since UEFI is platform independent, it may be able to enhance the boot time and speed of the computer. This is especially the case when large hard drives are in use. 
  • UEFI can perform better while initializing the hardware devices.
  • UEFI can work alongside BIOS. It can sit on top of BIOS and work independently.
  • It supports MBR and GPT partition types.

Note: Modern systems are only emulating the legacy BIOS. They are EFI native.

UEFI on ESXi

vSphere 5.0 and above supports booting ESXi hosts from the Unified Extensible Firmware Interface (UEFI). With UEFI, you can boot systems from hard drives, CD-ROM drives, USB media, or network.

UEFI benefits

  • ESXi can boot from a disk larger than 2 TB provided that the system firmware and the firmware on any add-in card that you are using supports it. 

UEFI drawbacks

  • Provisioning with VMware Auto Deploy requires the legacy BIOS firmware and is not available with UEFI BIOS configurations. I hope that this limitation will be lifted soon.

Notes: Changing the host boot type between legacy BIOS and UEFI is not supported after you install ESXi 6.0. Changing the boot type from legacy BIOS to UEFI after you install ESXi 6.0 might cause the host to fail to boot.

Conclusion

UEFI is meant to completely replace BIOS in the future and bring in many new features and enhancements that can’t be implemented through BIOS. BIOS can be used in servers that do not require large storage for boot. To be honest, even though you can, it is not very common to use boot disks greater than 2 TB for ESXi hosts, so you may be using BIOS at the moment, but I would recommend shifting to UEFI, as it is the future, while BIOS will slowly fade away.

The ESXi hypervisor supports both boot modes; therefore, if you have modern server hardware and don't use VMware Auto Deploy, UEFI should be your preferred choice.

References:
[1] VMware : Using EFI/UEFI firmware in a VMware Virtual Machine 
[2] VMware : Best practices for installing ESXi 5.0 (VMware KB 2005099)
[3] VMware : Best practices to install or upgrade to VMware ESXi 6.0 (VMware KB 2109712)
[4] Usman Khurshid : [MTE Explains] Differences Between UEFI and BIOS
[5] Wikipedia : Unified Extensible Firmware Interface

Thursday, June 16, 2016

Role and responsibility of IT Infrastructure Technical Architect

In this article, I would like to describe the infrastructure architect role and his responsibilities.

Any architect generally leads the design process with the goal of building the product. The product can be anything the investor would like to build and use. The architect is responsible for gathering all the investor's goals, requirements and constraints, and for trying to understand all use cases of the final product.

The product of an IT infrastructure technical architect is an IT infrastructure system, also known as a computer system, running IT applications that support business services. That's a very important statement. The designed IT infrastructure system is usually not built just for the sake of the infrastructure itself but to support business services.

There is no doubt that the technical architect must be a subject matter expert in several technical areas, including computing, storage, networking, operating systems and applications, but that is just the technical foundation required to fulfill all technical requirements. However, systems are not shaped just by technology but also by other external non-technical factors like business requirements, operational requirements and human factors. It is obvious that the architect's main responsibility is to fulfill all these requirements of the final product, an IT infrastructure system in this particular case, although the last mentioned factor, the human factor, usually has the biggest impact on any system design, because we usually build systems for human usage and these systems have to be maintained and operated by other humans as well.

Now that we know what the IT Infrastructure Technical Architect does, let's describe his typical tasks and activities.

The architect has to communicate with the investor's stakeholders to gather all design factors, including requirements, constraints and use cases. Unfortunately, there are usually also some design factors for which nobody has a specific requirement. These factors have to be documented as assumptions. When all relevant design factors are collected and revalidated with the requestors and investor authorities, the architect starts the design analysis and prepares the conceptual design. The conceptual design is a high-level design which helps to understand the overall concept of the proposed product. Such a conceptual design has to be reviewed by all design stakeholders, and when everybody feels comfortable with the concept, the architect can start the low-level design.

The low-level design is usually prepared as a decomposition of the conceptual design. The low-level design should be decomposed into several design areas, because it is almost always beneficial to divide a complex system into sub-systems until these become simple enough to be solved directly. This decomposition approach is also known as the "divide and conquer" method. The main purpose of the low-level design is to document all details important for successful implementation and operation of the product. Therefore it must be reviewed and validated by subject matter experts - other architects, operators, and implementers - for each particular area. The low-level design is usually divided into a logical and a physical design. The logical design is a detailed technical design, but general logical components are used without referring to a particular supplier's physical product models, materials, configuration details or other physical specifications. The purpose of the logical design is to document the general principles of the overall design or of a particular decomposed, and thus simplified, design area. The logical design is also used for proper product sizing and capacity planning. The physical design, on the other hand, is a detailed technical design with specific products, materials and implementation details. The physical design is primarily intended for product builders and implementers, because the product is built or implemented based on the physical design.

It is good to mention that there is no product or system without risk. That's another responsibility of the architect. He should identify and document all risks and design limitations associated with the proposed product. The biggest threats are not risks in general but unknown risks. Therefore, documenting potential risks and risk mitigation options is a very important responsibility of the architect. A risk mitigation plan, or at least a contingency plan, should be part of the product design.

At the end of the day, the design should be implemented; therefore, the implementation plan is just another activity and document the architect must prepare to make the product real, even though the implementation itself is usually out of the architect's scope.

It is worth mentioning that there is no proven design without design tests. Therefore the architect should also prepare and perform a test plan. The test plan has to include a validation and a verification part. The validation part validates the design requirements after the product is built or implemented. Only after validation can the architect honestly prove that the product really fulfills all requirements holistically. The verification part verifies that everything was implemented as designed and that operational personnel know how to operate and maintain the system.

There is no perfect design or product; therefore, the architect should continually improve even an already built product by communicating with end users, operators and other investor stakeholders and taking their feedback into account for future improvements. After some period of time, the architect should initiate a design review and incorporate all gathered feedback into the next design version.

Now that we know what the architect is responsible for, let's summarize which skills are important for any good architect. The architect must have decent skills and expertise in the following areas:

  • communication skills
  • presentation skills
  • consulting skills
  • cross check validation skills
  • documentation skills
  • systematic, analytical, logical and critical thinking
  • technical expertise
  • ability to think and work in different levels of detail
  • ability to see a big picture but also have attention for detail because the devil is in the details
Even if you have read this article up to this point, you may still ask what the architect's main responsibility is. That's a fair question. Here is the short answer.

The architect's main responsibility is the happiness of all users and actors using the designed product during the whole lifecycle of the product.




Wednesday, June 01, 2016

Force10 Operating System 9.10 changes maximum MTU size

The Force10 operating system (aka FTOS, DNOS) always had a maximum configurable MTU size of 12000 bytes per port. I have just been informed by a former colleague of mine that this is no longer the case in FTOS 9.10 and above. Since FTOS 9.10 the maximum MTU size per switch port is 9216. If you used MTU 12000, then after the upgrade to firmware 9.10 the MTU should be adjusted automatically. But I have been told that it is automatically adjusted to the standard MTU of 1500; therefore, if you use jumbo frames (9000-byte payload), it is necessary to change the configuration from 12000 to 9216 before the upgrade.
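If you rely on jumbo frames and prefer to adjust the configured MTU explicitly before the upgrade, a sketch could look like this; the interface range is just an example and the prompt format may differ slightly on your platform:

f10-s60#conf
f10-s60(conf)#interface range tengigabitethernet 1/1 - 48
f10-s60(conf-if-range-te-1/1-48)#mtu 9216
f10-s60(conf-if-range-te-1/1-48)#exit
f10-s60(conf)#exit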

Disclaimer: I had no chance to test it, so I cannot guarantee that all information in this post is correct. 

UPDATE: Please read the comments below this article for further information and Martin's great explanation of the real MTU behavior. Thanks Martin and Kaloyan for your comments.

Martin's comment:
MTU 12000 in configuration was not reflecting real hardware MTU of underlaying chipset, after upgrade to 9.10 it's just adjusted to reflect real hardware MTU. Tested on S4048 9.10(0.1). When you boot into 9.10 you can see log messages saying that configuration is adjusted to reflect real maximum hardware MTU.
Also in configuration

 ethswitch1(conf-if-te-1/47)#mtu ?  
 <594-12000> Interface MTU (default = 1554, hardware supported maximum = 9216)  
 ethswitch1(conf-if-te-1/47)#mtu