Friday, September 13, 2013

Troubleshooting Storage Performance in vSphere

A very good introductory blog post series on storage performance troubleshooting in VMware vSphere infrastructures.

Part 1 - The Basics
Part 2 - Troubleshooting Storage Performance in vSphere
Part 3 - SSD Performance

Everybody should read these storage basics before deep diving into storage performance in shared infrastructures.

Wednesday, September 11, 2013

NFS or Fibre Channel Storage for VMware vSphere?

The final decision depends on what you want to get from your storage. Check out my newly uploaded presentation on SlideShare: http://www.slideshare.net/davidpasek/design-decision-nfsversusfcstorage-v03 where I try to compare both options against specific requirements from a real customer engagement.

If you have any storage preference, experience or question please feel free to speak up in the comments.

What type of NIC teaming, loadbalancing and physical switch configuration to use for VMware's VXLAN?

As a former CISCO UCS architect I have been observing the VXLAN initiative for almost two years, so I was looking forward to doing a real customer project. Finally it is here. I'm working on a vSphere design for vCloud Director (vCD). To be honest, I'm responsible just for the vSphere design and someone else is doing the vCD design, because I'm not a vCD expert and have only conceptual, high-level vCD knowledge. I'm not planning to change that in the near future because I'm more focused on next-generation infrastructure, and vCD is, in my opinion, just another software product for selling IaaS. I'm not saying it is not important. It is actually very important, because IaaS is not just technology but a business process. However, nobody knows everything, and I leave some work for other architects :-)

We all know that vCD sits on top of vSphere providing multi-tenancy and other IaaS constructs, and since vCD 5.1 the network multi-tenancy segmentation is done by the VXLAN network overlay. Therefore I finally have the opportunity to plan, design, and implement VXLAN for a real customer.

Right now I'm designing the network part of the vSphere architecture, and I describe the VXLAN-oriented design decision point below.

VMware VXLAN Information sources:
I would like to thank Duncan for his blog post back in October 2012, right before VMworld 2012 Barcelona, where VXLAN was officially introduced by VMware. Even though it is an unofficial information source, it is very informative, and I'm verifying it against official VMware documentation and white papers. Unfortunately, I have realized that there is a lack of trustworthy, publicly available technical information even today, and some of the information is contradictory. See below what confusion I'm facing; I would be very happy if someone helped me break out of the circle.

Design decision point:
What type of NIC teaming, loadbalancing and physical switch configuration to use for VMware's VXLAN?

Requirements:
  • R1: Fully supported solution
  • R2: vSphere 5.1 and vCloud Director 5.1
  • R3: VMware vCloud Network & Security (aka vCNS or vShield) with VMware distributed virtual switch
  • R4: Network Virtualization and multi-tenant segmentation with VXLAN network overlay 
  • R5: Leverage standard access datacenter switches like CISCO Nexus 5000, Force10 S4810, etc.
Constraints:
  • C1: LACP 5-tuple hash algorithm is not available on current standard access datacenter physical switches mentioned in requirement R5
  • C2: VMware Virtual Port ID loadbalancing is not supported with VXLAN Source: S3
  • C3: VMware LBT loadbalancing is not supported with VXLAN Source: S3
  • C4: LACP must be used with the 5-tuple hash algorithm. Source: S3, S2, S1 on page 48. [THIS IS A STRANGE CONSTRAINT. WHY IS IT HASH DEPENDENT?] Updated 2013-09-11: It looks like there is a bug in the VMware documentation and KB article. Thanks @DuncanYB and @fojta for confirmation and internal VMware escalations.
Available Options:
  • Option 1: Virtual Port ID
  • Option 2: Load based Teaming
  • Option 3: LACP
  • Option 4: Explicit fail-over

Option comparison:
  • Option 1: not supported because of C1
  • Option 2: not supported because of C2
  • Option 3: supported
  • Option 4: supported but not optimal because only one NIC is used for network traffic. 
Design decision and justification:
Based on the available information, options 3 and 4 comply with the requirements and constraints. Option 3 is better because network traffic is load balanced across physical NICs, which is not the case for option 4.

Other alternatives not compliant with all requirements:
  • Alt 1: Use physical switches with 5-tuple hash loadbalancing. That means high-end switch models like Nexus 7000, Force10 E Series, etc.
  • Alt 2: Use CISCO Nexus 1000V with VXLAN. They support LACP with any hash algorithm. 5-tuple hash is also recommended but not strictly required.
Conclusion:
I hope some of the information in constraints C2, C3, and C4 is wrong and will be clarified by VMware. I'll tweet this blog post to some VMware experts and hope someone will help me break out of the decision circle.
If you have any official/unofficial topic related information or you see anything where I'm wrong, please feel free to speak up in the comments.
Updated 2013-09-11: Constraint C4 doesn't exist and the VMware documentation will be updated.
Based on the updated information, both LACP and "Explicit fail-over" teaming/load balancing are supported for VXLAN. LACP is the better way to go; "Explicit fail-over" is the alternative in case LACP is not achievable in your environment.

Tuesday, September 10, 2013

Storage System Performance Analysis with Iometer

An excellent write-up about Iometer usage is here.

Quick troubleshooting of ESX and 10Gb Broadcom NeXtreme II negotiated only to 1Gb

I have just realized that the vmnics in one DELL M620 blade server (let's call it BLADE1) are connected only at 1Gb speed, even though they are 10Gb NICs connected to Force10 IOA blade modules. They should be connected at 10Gb, and another blade with the same config (let's call it BLADE2) really is connected at 10Gb.

So, quick troubleshooting ... we have to find where the difference is.

Let's go step by step ...

  1. NIC ports on the ESX vSwitch in BLADE1 are configured to use auto-negotiation, so no problem here.
  2. Ports on the Force10 IOA are also configured for auto-negotiation, and the configuration is consistent across all ports in the switch modules, so that's not the problem.
  3. ESX builds are the same on both blade servers.
  4. What about NIC firmware versions? BLADE1 has 7.2.14 and BLADE2 has 7.6.15.

Bingo!!! Let's upgrade the NIC firmware on BLADE1 and check whether this was the root cause of the problem ...
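By the way, both symptoms can be checked directly with esxcli. Note that vmnic0 below is just an example name; use your actual uplinks.

To list all NICs with their negotiated link speeds (check the Speed column for 1000 vs 10000):
esxcli network nic list
To show driver and firmware details of a particular NIC:
esxcli network nic get -n vmnic0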

Monday, September 09, 2013

Using SSL certificates for VMware vSphere Components

Streamlining the certificate replacement and management process in a VMware environment can be challenging at times. For instance, changing certificates for vCenter 5.1 is a hugely laborious process. And in a typical environment with a large number of hosts running, tracking and managing their certificates is difficult and time consuming. More importantly, security breaches due to lapsed certificates can prove very expensive to the organization. vCert Manager from VSS Labs provides fully automated management of SSL certificates in a VMware environment across the entire lifecycle.

VSS Labs has a solution to simplify SSL management. For more info look at http://vsslabs.com/vCert.html

To be honest, I have had no chance to test it because I avoid signed SSL certificates when possible. However, when I have a customer who requires SSL, I will definitely evaluate the VSS Labs solution.

Wednesday, September 04, 2013

OpenManage Integration for VMware vCenter 2.0

OpenManage Integration for VMware vCenter 2.0 is the new generation of the DELL vCenter Management Plugin, targeted as a plugin for the vSphere 5.5 Web Client.



Looking forward to testing it with vSphere 5.5 in my lab.

Monday, September 02, 2013

Configure Force10 S4810 for SNMP

Enabling SNMP on Force10 S4810 switches is straightforward. Below is a configuration sample.

conf

! Enable SNMP for read only access
snmp-server community public ro

! Enable SNMP traps and send it to SNMP receiver 192.168.12.70
snmp-server host 192.168.12.70 version 1
snmp-server enable traps
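Once configured, SNMP read access can be verified from any management station with the net-snmp tools. The switch management IP below (192.168.12.10) is just an example:

snmpwalk -v 2c -c public 192.168.12.10 system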

Configuring Dell EqualLogic management interface

All credits go to Mike Poulson because he published this procedure back in 2011.
[Source: http://www.mikepoulson.com/2011/06/configuring-dell-equallogic-management.html]

I have just rewritten, formatted, and slightly changed the most important steps for EqualLogic out-of-band interface IP configuration.

The Dell EqualLogic iSCSI SAN supports an out-of-band management network interface, for managing the device from a network separate from the iSCSI traffic. So here is a quick set of commands used to configure the management interface (in this case eth2) on the device.

The web interface is nice and all, but you have to have your 10Gig network set up before you can access it. Also, the "setup" wizard does not really give you an easy option to configure the management interface.

Steps:
Log in to the console port with the grpadmin username and grpadmin password.

After you run setup you will need to know the "member name". You can get your member name by running the command
member show
This lists the name, status, version, size, and other information for each member configured on the array. Here is an example:

grpname> member show
Name Status Version Disks Capacity FreeSpace Connections
---------- ------- ---------- ----- ---------- ---------- -----------
member01 online V4.3.6 (R1 16 621.53GB 0MB 0
grpname>


The member name for my device is member01.

Once you know the member name, you will need to set the IP address for your management interface. This IP address needs to be one that you can access from your management network. The port is an untagged port, similar to other out-of-band management ports on devices such as network switches.

To configure the IP use steps described below.

  • First set the interface to be management ONLY. Use the member command again.
member select member01 eth select 2 mgmt-only enable
  • Set the IP address and Network Mask
member select member01 eth select 2 ipaddress xxx.xxx.xxx.xxx netmask 255.255.255.0
  • Enable the interface (by default the MGMT (eth2) interface is disabled and will not provide a LINK).
member select member01 eth select 2 up
  • Then you will be asked to confirm that you wish to enable the Management port
This port is for group management only. If enabling, make sure it is connected to a dedicated management network. If disabling, make sure you can access the group through another Ethernet port.
 

Do you really want to enable the management interface? (y/n) [n] y
  • To view current IP and state of an Eth interface use
member select member01 show eths

Once that is complete you can use the management IP address to establish an HTTP or HTTPS connection to the array.

Veeam Backup Components Requirements

Veeam is excellent backup software for virtualized environments, and it is relatively easy to install and use. However, when you have a bigger environment and are looking for better backup performance, it is really important to know the infrastructure requirements and size your backup infrastructure appropriately.

Here are the hardware requirements for the particular Veeam components.
 
Veeam Console
  Windows Server 2008 R2
  4GB RAM + 0.5GB per concurrent backup job

Veeam Proxy
  Windows Server 2008 R2
  2GB RAM + 0.2GB per concurrent task

Veeam WAN Accelerator
  Windows Server 2008 R2
  8GB RAM
  Disk for cache

Veeam Repository
  Windows Server 2008 R2, Linux or NAS (CIFS) 
  4GB RAM + 2GB per each concurrent ingress backup job
 

Saturday, August 31, 2013

DELL Force10 S6000 as a physical switch for VMware NSX

Based on this document http://www.vmware.com/files/pdf/products/nsx/vmw-nsx-dell-systems.pdf
the DELL Force10 S6000 is going to be fully integrated with VMware NSX (VMware's software-defined networking platform).

Dell Networking provides:
  • Data center switches for robust underlays for L2 overlays
  • CLI for virtual and physical networks
  • Network management and automation with Active Fabric Manager
  • S6000 Data Center Switch Gateway for physical workloads to connect to virtual networks
  • Complete end-to-end solutions that include server, storage, network, security, management and services with world wide support
Dell S6000 use cases:
  • Extend virtual networks to physical servers -  S6000 works as VXLAN gateway to VLANs on physical network (VXLAN VTEP).
  • Connect physical workloads reachable on a specific VLAN to logical networks via an L2 service
  • Connect physical workloads reachable on a specific port to logical networks via an L2 service
  • Connect to physical workloads in a Physical to virtual migration
  • Migration from existing virtualized environments to public clouds, creating hybrid clouds
  • Access physical router, firewall, load balancer, WAN optimization and other network resources

I cannot wait to test it in my lab or in a customer PoC engagement. After hands-on experience I'll share it on this blog.

Tuesday, August 27, 2013

What’s New in vSphere 5.5

In this article I'll try to collect all the important (at least for me) vSphere 5.5 news and improvements announced at VMworld 2013. I wasn't there, so I rely on other blog posts and VMware materials.

Julian Wood reported about vCloud Suite 5.5 news announced at VMworld 2013 at
http://www.wooditwork.com/2013/08/26/whats-new-vcloud-suite-5-5-introduction/

Chris Wahl wrote deep dive blog posts into vSphere 5.5 improvements at
http://wahlnetwork.com/category/deep-dives/5-5-vsphere-improvements/

Cormac Hogan listed storage improvements in vSphere 5.5 at
http://blogs.vmware.com/vsphere/2013/08/whats-new-in-vsphere-5-5-storage.html?utm_source=twitterfeed&utm_medium=linkedin&goback=%2Egde_3217230_member_269944857#%21 

Thanks Julian, Chris, and Cormac for excellent blog posts, and for keeping those of us who were not able to attend VMworld 2013 informed.

BTW: Official VMware What's New paper is at http://www.vmware.com/files/pdf/vsphere/VMware-vSphere-Platform-Whats-New.pdf 

Here are a few citations from the blog posts above, with my comments. I'll mention just the improvements which are important and/or interesting for me. I'll concentrate on these topics, and in the near future I have to find and test more hidden details.
  1. Management: VMware is strongly recommending using a single VM for all vCenter Server core components (SSO, Web Client, Inventory Service and vCenter Server), or using the appliance, rather than splitting things out, which just adds complexity and makes it harder to upgrade in the future. << "This is an excellent approach and I really like it."
     
  2. Management: The vCenter Appliance has also been beefed up and with its embedded database supports 300 hosts and 3000 VMs or if you use an external Oracle DB the supported hosts and VMs are the same as for Windows. << "Finally"
     
  3. Storage: vSphere 5.5 now supports VMDK disks larger than 2TB. Disks can be created up to 63.36TB in size on both VMFS and NFS. The max disk size needs to be about 1% less than the datastore file size limit. << "The last vSphere storage limit disappeared; however, how big datastores will we create?"
     
  4. Storage: vSphere Flash Read Cache leveraging local SSDs to eliminate read IO operations from datastores and save storage performance (IOPSes) for other purposes (writes, other workloads, etc.)  For more info look at http://wahlnetwork.com/2013/08/26/vsphere-5-5-improvements-part-5-vsphere-flash-read-cache-vflash/ or http://www.yellow-bricks.com/2013/08/26/introduction-to-vsphere-flash-read-cache-aka-vflash/ << "Sounds good but pernixdata.com looks better."
     
  5. Storage: vSphere VSAN leverages SSD and SATA server internal disks and forms them into a shared storage pool. VMware promised it is much better than VSA (VMware Storage Appliance). For more info look at http://wahlnetwork.com/2013/08/26/vsphere-5-5-improvements-part-4-virtual-san-vsan/ << "We will see. Have you tested VSA? I still believe real storage is real storage, at least for now. However, if someone considers VSAN I would recommend investing in really good server disks and SSDs."
     
  6. Storage: PDL AutoRemove in vSphere 5.5 automatically removes a device in PDL state from the host. PDL stands for Permanent Device Lost, and the host receives it from the storage array as a SCSI sense code. << "It would be beneficial when a storage admin removes an empty LUN; then nothing needs to be done on vSphere, provided the storage sends the appropriate SCSI sense code. MUST BE CAREFULLY TESTED!!!"
     
  7. Networking: LACP in 5.5 gives you 22 load balancing algorithms and you are now able to create 32 LAGs per host, so you can bond together all those physical NICs. << "Finally; Nexus 1000V had it from the beginning."
     
  8. Networking: Flow based marking and filtering provides granular traffic marking and filtering capabilities from a simple UI integrated with VDS UI. You can provide stateless filtering to secure or control VM or Hypervisor traffic. Any traffic that requires specific QoS treatment on physical networks can now be granularly marked with COS and DSCP marking at the vNIC or Port group level. << "Nice improvement, but I have never had such requirement so far."
     
  9. High Availability: Someone mentioned to me that VMware announced vSphere 5.5 multi-processor Fault Tolerance (FT) at VMworld 2013. << "This would be interesting but must be validated, as I cannot find any official statement or blog post about it. It seems to me it was a Fault Tolerance tech preview, like the VMworld 2012 session I attended last year."
     
  10. Authentication:  SSO 2.0 is now a multi-master model. Replication between SSO servers is automatic and built-in. SSO is now site aware. The SSO database is completely removed. For more info look at http://wahlnetwork.com/2013/08/26/vsphere-5-5-improvements-part-7-single-sign-on-completely-redesigned/  << "Finally, previous SSO 1.0 was a nightmare!!!"
     
  11. Disaster Recovery: VMware Replication (VR) now supports more VR Server Appliances responsible for replication, more point in time instances (aka snapshots), the ability to use Storage vMotion on protected VMs, and vSphere Web Client will show you details on your vSphere Replication status when you click on the vCenter object. For more info look at http://wahlnetwork.com/2013/08/26/vsphere-5-5-improvements-part-6-site-recovery-manager-srm-and-vsphere-replication/ << "Cool. Good evolution."

Sunday, August 25, 2013

DELL OpenManage Essentials (OME)

OpenManage Essentials (OME) is a systems management console that provides simple, basic Dell hardware management and is available as a free download.

DELL OME can be downloaded at https://marketing.dell.com/dtc/ome-software?dgc=SM&cid=259733&lid=4682968

Patch 1.2.1 downloadable at
http://www.dell.com/support/drivers/us/en/555/DriverDetails?driverId=P1D4C

For more information look at DELL Tech Center.

Data Center Bridging

DCB 4 key protocols:
  •  Priority-based Flow Control (PFC): IEEE 802.1Qbb
  •  Enhanced Transmission Selection (ETS): IEEE 802.1Qaz
  •  Congestion Notification (CN or QCN): IEEE 802.1Qau
  •  Data Center Bridging Capabilities Exchange Protocol (DCBx)
PFC - provides a link level flow control mechanism that can be controlled independently for each frame priority. The goal of this mechanism is to ensure zero loss under congestion in DCB networks. PFC is independent traffic priority pausing and enablement of lossless packet buffers/queuing for a particular 802.1p CoS.

ETS - provides a common management framework for assignment of bandwidth to frame priorities. Bandwidth can be dynamic based on congestion and relative ratios between defined flows. ETS provides minimum, guaranteed bandwidth allocation per traffic class/priority group during congestion and permits additional bandwidth allocation during non-congestion.

CN - provides end to end congestion management for protocols that are capable of transmission rate limiting to avoid frame loss. It is expected to benefit protocols such as TCP that do have native congestion management as it reacts to congestion in a more timely manner. Excellent blog post about CN is here.

DCBX - a discovery and capability exchange protocol that is used for conveying capabilities and configuration of the above features between neighbors to ensure consistent configuration across the network. Performs discovery, configuration, and mismatch resolution using Link Layer Discovery Protocol (IEEE 802.1AB - LLDP).

DCBX can be leveraged for many applications.
One DCBX application example is iSCSI application priority - Support for the iSCSI protocol in the application priority DCBX Type Length Value (TLV). Advertises the priority value (IEEE 802.1p CoS, PCP field in VLAN tag) for iSCSI protocol. End devices identify and tag Ethernet frames containing iSCSI data with this priority value.

Friday, August 23, 2013

DELL Force10 I/O Aggregator 40Gb Port Question

Today I received a question about how to interconnect a DELL Force10 IOA 40Gb uplink with DELL Force10 S4810 top-of-rack switches.

I assume the reader is familiar with DELL Force10 datacenter networking portfolio.

Even if you have a 40Gb<->40Gb twinax cable with QSFPs between the IOA and the Force10 S4810 switch, on the IOA side it is configured by default as 4x10Gb links grouped in Port-Channel 128.

If you connect it directly into a 40Gb port on the Force10 S4810 switch, the 40Gb port is by default configured as a 1x40Gb interface.

That’s the reason why it doesn’t work out of the box: the port speeds are simply mismatched.

To make it work you have to change the 40Gb switch port to 4x10Gb ports. Here is the S4810 command to change the switch port from 1x40Gb to 4x10Gb:
stack-unit 0 port 48 portmode quad

Here is a snippet from the S4810 configuration where 40Gb port 0/48 is configured as 4x10Gb ports in port-channel 128:
interface TenGigabitEthernet 0/48
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown
!
interface TenGigabitEthernet 0/49
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown
!
interface TenGigabitEthernet 0/50
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown
!
interface TenGigabitEthernet 0/51
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown

interface Port-channel 128
no ip address
portmode hybrid
switchport
no shutdown
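After the reconfiguration you can verify on the S4810 that all four 10Gb members are bundled and LACP is up. Exact output differs between FTOS versions, but these show commands should do the job:

show interfaces port-channel 128 brief
show lacp 128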

Tuesday, August 20, 2013

Best Practices for Faster vSphere SDK Scripts

Reuben Stump published an excellent blog post at http://www.virtuin.com/2012/11/best-practices-for-faster-vsphere-sdk.html about performance optimization of Perl SDK scripts.

The main takeaway is to minimize the ManagedEntity's Property Set.

So instead of

my $vm_views = Vim::find_entity_views(view_type => "VirtualMachine") ||
  die "Failed to get VirtualMachines: $!";

you have to use

# Fetch all VirtualMachines from SDK, limiting the property set
my $vm_views = Vim::find_entity_views(view_type => "VirtualMachine",
          properties => ['name', 'runtime.host', 'datastore']) ||
  die "Failed to get VirtualMachines: $!";

This small improvement has a significant impact on performance because it eliminates the generation and transfer of big data (SOAP/XML) between the vCenter service and the SDK script.

It helped me improve the performance of my script from 25 seconds to just 1 second, and the impact is even bigger in larger vSphere environments. My old version of the script was almost useless, and this simple improvement helped me a lot.

Thanks Reuben for sharing this information.
 

Monday, August 19, 2013

DELL Blade Chassis power consumption analytics in vCenter Log Insight

DELL blade chassis have the capability to send power consumption information via syslog messages. I never understood how to practically leverage this capability. When VMware released vCenter Log Insight I immediately realized how to use this tool to visualize blade chassis power consumption.

I prepared a short video showing how to create a blade chassis power consumption graph in vCenter Log Insight. The video is located at http://youtu.be/fda1cLW8enA



Wednesday, August 14, 2013

ESXi Advanced Settings for NetApp NFS

Here are the NetApp recommended ESXi advanced settings for NFS:
Net.TcpipHeapSize=30
Net.TcpipHeapMax=120
NFS.MaxVolumes=64
NFS.HeartbeatMaxFailures=10
NFS.HeartbeatFrequency=12
NFS.HeartbeatTimeout=5

Enable SIOC, or if you don't have an Enterprise Plus license, set NFS.MaxQueueDepth=64, 32, or 16 based on storage workload and utilization.
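These settings can be applied on each ESXi 5.x host from the command line as well as via the GUI. A sketch using esxcli:

esxcli system settings advanced set -o /Net/TcpipHeapSize -i 30
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 120
esxcli system settings advanced set -o /NFS/MaxVolumes -i 64
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5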

Sunday, August 11, 2013

Unified Network, DCB and iSCSI challenges

An iSCSI SAN is a Storage Area Network, and storage needs a lossless fabric. If, for any reason, a unified fabric must be used, then the quality of the Ethernet/IP network is crucial for problem-free storage operation.

For example, DELL EqualLogic supports and leverages DCB (PFC, ETS, and DCBX).
iSCSI-TLV is a part of DCBX. However, the DCB protocol primitives must be supported end to end, so if one member of the chain doesn't support them, the whole thing is useless.

How DCB makes iSCSI better is explained in depth here.

So think twice about whether you really want a converged network (aka unified fabric) or whether a dedicated iSCSI network is the better option for you.

Tuesday, August 06, 2013

DELL EqualLogic general recommendations for VMware vSphere ESXi




Below are eight major recommendations for a DELL EqualLogic implementation with vSphere ESXi:

  1. Delayed ACK disabled
  2. LRO disabled
  3. If using Round Robin, set IOs to 3
  4. If you have an Enterprise or Enterprise Plus license (and only then), install MEM 1.1.2
  5. Extend login_timeout to 60 seconds.
  6. Don’t have multiple VMDKs on a single virtual SCSI controller (a major cause of latency alerts)
  7. Align partitions on 64K boundary
  8. Format with 64K cluster (allocation unit) size with Windows
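Some of the recommendations above can be applied from the ESXi 5.x command line. A sketch; naa.xxxx and vmhba37 below are placeholders for your real EqualLogic device ID and software iSCSI adapter.

Disable LRO in the VMkernel TCP/IP stack (recommendation 2):
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0
Set Round Robin with an IO operation limit of 3 on a particular device (recommendation 3):
esxcli storage nmp psp roundrobin deviceconfig set -d naa.xxxx -t iops -I 3
Extend the iSCSI login timeout to 60 seconds (recommendation 5, ESXi 5.1 and later):
esxcli iscsi adapter param set -A vmhba37 -k LoginTimeout -v 60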

Monday, August 05, 2013

CISCO Nexus 1000v - Quality Of Service configuration


class-map type queuing match-any n1kv_control_packet_mgmt_class
 match protocol n1k_control
 match protocol n1k_packet
 match protocol n1k_mgmt

class-map type queuing match-all vmotion_class
 match protocol vmw_vmotion

class-map type queuing match-all vmw_mgmt_class
 match protocol vmw_mgmt

class-map type queuing match-any vm_production
 match cos 0

policy-map type queuing uplink_queue_policy
 class type queuing n1kv_control_packet_mgmt_class
   bandwidth percent 10
 class type queuing vmotion_class
   bandwidth percent 30
 class type queuing vmw_mgmt_class
   bandwidth percent 10
 class type queuing vm_production
   bandwidth percent 40

port-profile type ethernet uplink
 service-policy type queuing output uplink_queue_policy

Wednesday, July 24, 2013

How to downgrade IBM V7000 (Storwize) firmware

Sometimes, especially when you do problem management, you need to downgrade firmware on some system components. I had such a need for an IBM V7000 storage array. The downgrade process is not documented in IBM's official documentation, so here it is step by step:
  1. Double check you have IP addresses on the management interfaces of both canisters (controllers).
  2. Log in to the management interface of one particular canister over HTTPS: https://[ip_mgmt_canister]. You have to use superuser credentials. The default IBM Storwize superuser password is passw0rd.
  3. Switch the node to service state. You should wait 15-20 minutes.
  4. Log in to the second node's management interface.
  5. Switch the second node to service state. You should wait another 15-20 minutes.
  6. Double check that both nodes are in service state.
  7. Log in to one node and choose the action "Reinstall software". Browse and upload the firmware image via the web browser. The software reinstallation takes a while; you have to wait approximately one or two hours. In the meantime you can ping the canister management IP addresses to check when the nodes are coming back.
  8. Repeat the software reinstallation for the second node.
  9. Please be aware that the storage configuration is lost after software reinstallation, so you have to use the default password for superuser. Recall it is passw0rd.
  10. When both nodes are up and running, log in to one canister node's management interface and exit both nodes from service state. It can take another 15-20 minutes.
  11. When the nodes are active you have to regenerate the Cluster ID: go to "Configure Enclosure" and enable the checkbox "Reset System ID".
  12. After all these actions the Storwize is ready to form a new cluster. So create the cluster and assign the cluster virtual IP address you will use for standard storage management.

Sunday, July 14, 2013

ESX host remote syslog configuration

For remote CLI you can use vMA or vCLI. Here is an example of how to configure an ESX host (10.10.1.71) to send logs remotely to a syslog server listening on IP address 10.10.4.72, TCP port 514.

First of all, we have to tell ESX where the syslog server is.
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog config set --loghost='tcp://10.10.4.72:514'
Then the syslog service on the ESX host must be reloaded to accept the configuration change.
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog reload

The ESX firewall must be reconfigured to allow syslog traffic:
esxcli -s 10.10.1.71 -u root -p Passw0rd. network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli -s 10.10.1.71 -u root -p Passw0rd. network firewall refresh

If you want to test or troubleshoot syslog logging, you can log in to the ESX host and use the logger command to send a test message to syslog.
logger "Test syslog over network"
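You can also double check what the host is actually configured with:
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog config get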

Tuesday, July 09, 2013

Excellent article: "Anatomy of an Ethernet Frame"

Trey Layton (aka EthernetStorageGuy) wrote an excellent article about MTU sizes and jumbo frame settings. The article is here. In the article you will learn which MTU size parameters you have to configure along the path among server, network gear, and storage. It is crucial to understand the difference between the payload (usually 1500 or 9000) and the different frame sizes (usually 1522, 9018, 9022, or 9216) on networking equipment.

Here is a summation of Trey's deep Ethernet frame anatomy into a simple best practice: "If you want to implement jumbo frames, use pure datacenter networking equipment and set the MTU size to the device maximum, which is usually 9216."
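On the vSphere side, the jumbo frame path can be configured and tested like this (a sketch for ESXi 5.x; vSwitch0, vmk1, and the target IP below are just examples).

Set MTU 9000 on a standard vSwitch and on a VMkernel interface:
esxcli network vswitch standard set -v vSwitch0 -m 9000
esxcli network ip interface set -i vmk1 -m 9000
Test the whole path with a do-not-fragment ping; 8972 is the 9000 byte MTU minus 28 bytes of IP and ICMP headers:
vmkping -d -s 8972 192.168.1.100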

DELL Open Manage Essentials 1.2 has been released

Dell OpenManage Essentials is a 'one to many' console used to monitor Dell Enterprise hardware. It can discover, inventory, and monitor the health of Dell Servers, Storage, and network devices. Essentials can also update the drivers and BIOS of your Dell PowerEdge Servers and allow you to run remote tasks. OME can increase system uptime, automate repetitive tasks, and prevent interruption in critical business operations.

It can be downloaded here.

Fixes & Enhancements
Fixes:
  1. Multiple defect fixes and performance improvements
Enhancements:
  1. Support for Discovery, Inventory and Map View for Dell PowerEdge VRTX devices. 
  2. Addition of Microsoft Windows Server 2012 as a supported operating system for the management station.
  3. Context sensitive Search functionality. 
  4. Ability to configure OpenManage Essentials to send the warranty status of your devices through email at periodic intervals. 
  5. Ability to configure OpenManage Essentials to generate a warranty scoreboard based on your preference and display a notification icon in the heading banner when the warranty scoreboard is available.
  6. Enhanced support for Dell Compellent, Dell Force10 E-Series and C-Series, Dell PowerConnect 8100 series, Dell PowerVault FS7500, and PowerVault NX3500 devices. 
  7. Support for installing OpenManage Essentials on the domain controller.
  8. Device Group Permissions portal. 
  9. Additional reports: Asset Acquisition Information, Asset Maintenance Information, Asset Support Information, and Licensing Information. 
  10. Addition of a device group for Citrix XenServers and Dell PowerEdge C servers in the device tree. 
  11. Availability of storage and controller information in the device inventory for the following client systems: Dell OptiPlex, Dell Latitude, and Dell Precision.
  12. CLI support for discovery, inventory, status polling, and removal of devices from the device tree. 
  13. Availability of sample command line remote tasks for uninstalling OpenManage Server Administrator and applying a server configuration on multiple managed nodes. 
  14. Support for SUDO users in Linux for system updates and OMSA deploy tasks. 
  15. Display of a notification icon in the heading banner to indicate the availability of a newer version of OpenManage Essentials. 
  16. Support for enabling or disabling rebooting after system update for out-of band (iDRAC) system updates.
  17. Support for re-running system update and OpenManage Server Administrator (OMSA) deployment tasks.
  18. Support for Single Sign-On (SSO) for iDRAC and CMC devices. 
  19. Ability to log on as a different user.


Tuesday, July 02, 2013

How to change default path selection policy for particular storage array?

Sometimes the firmware in a storage array has problems and you have to "downgrade" functionality to achieve an operable system. That sometimes happens with some ALUA storage systems where the Round Robin path policy or the Fixed path policy (aka FIXED) should work but doesn't because of a firmware issue.

So a relatively simple solution is to switch back from the more advanced Round Robin policy to the legacy - but properly functioning - Most Recently Used path policy (aka MRU), normally used for active/passive storage systems.

Note: Please be aware that some storage vendors say they have active/active storage even when they don't. Usually, and probably more precisely, they call it "dual-active storage", which is not the same as active/active. Maybe I should write another post about this topic.

You can change the Path Selection Policy in several ways, and as always, the best option depends on your specific requirements and constraints.

However, if you have only one instance of a particular storage type connected to your ESX hosts, you can simply change the default path selection policy for that SATP type. Let's assume you have some LSI storage.

Below is a simple esxcli command to do it ...

esxcli storage nmp satp set --default-psp=VMW_PSP_MRU --satp=VMW_SATP_LSI

... and then your default PSP for VMW_SATP_LSI is VMW_PSP_MRU.
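A quick sketch of how to verify the change took effect (commands run on the ESXi host; output format may vary slightly between ESXi versions):

```shell
# Show the default PSP now associated with the LSI SATP.
esxcli storage nmp satp list | grep VMW_SATP_LSI

# Check which PSP each device actually uses.
esxcli storage nmp device list | grep "Path Selection Policy"
```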

One thing you must be aware of … if in the past you explicitly changed any devices (disks) to another path selection policy, then those devices will keep their explicit PSP even after you change the default path policy. There is no esxcli mechanism to change devices back to accepting the default PSP for a particular SATP type. The only solution is to edit /etc/vmware/esx.conf.

All previous explicit changes are written in /etc/vmware/esx.conf, so it is pretty simple to find and remove these lines from the config file. I silently assume you do such operations in maintenance mode, so after an ESX reboot all paths for your devices will follow the default SATP path selection policy.
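A minimal sketch for locating the explicit per-device overrides before removing them (the exact key names in esx.conf may differ between ESXi versions, so inspect the matches before deleting anything):

```shell
# List esx.conf entries related to path selection policy overrides.
grep -i "psp" /etc/vmware/esx.conf
```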

BTW: That’s why I generally don’t recommend changing the PSP for a particular device when it is not necessary. Sometimes it is necessary, for example for RDMs participating in an MSCS cluster. But usually it is abused by admins and implementation engineers. I strongly believe it is always better to set the default PSP to behave as required.

Do you want to test Heavy Load? Try Heavy Load tool.

Bring your PC to its limits with the freeware stress test tool HeavyLoad. HeavyLoad puts your workstation or server PC under a heavy load and lets you test whether it will still run reliably.

Look at http://www.jam-software.com/heavyload/

Sunday, June 30, 2013

Simple UNIX Shell Script for generating disk IO traffic

Here is a pretty easy UNIX shell script for disk I/O generation.
#!/bin/sh
# Generate disk I/O by running 10 parallel dd writers in an endless loop.
# Press CTRL-C to stop; the trap cleans up processes and test files.
dd_threads="0 1 2 3 4 5 6 7 8 9"
finish () {
  killall dd
  for i in $dd_threads
  do
    rm /var/tmp/dd.$i.test
  done
  exit 0
}
trap 'finish' INT
while true
do
  for i in $dd_threads
  do
    dd if=/dev/random of=/var/tmp/dd.$i.test bs=512 count=100000 &
  done
  # Wait for all background dd processes before starting the next round,
  # otherwise the loop would spawn an unbounded number of processes.
  wait
done
Generated IOs (aka TPS - transactions per second) can be watched with the following command:
iostat -d -c 100000
The script can be terminated by pressing CTRL-C.

Thursday, June 27, 2013

Calculating optimal segment size and stripe size for storage LUN backing vSphere VMFS Datastore

A colleague of mine (BTW a very good Storage Expert) asked me what the best segment size is for a storage LUN used as a VMware vSphere Datastore (VMFS). Recommendations vary among storage vendors and models, but I think the basic principles are the same for any storage.

I found the IBM Redbook [SOURCE: IBM Redbook redp-4609-01] explanation the most descriptive, so here it is.
The term segment size refers to the amount of data that is written to one disk drive in an array before writing to the next disk drive in the array. For example, in a RAID5, 4+1 array with a segment size of 128 KB, the first 128 KB of the LUN storage capacity is written to the first disk drive and the next 128 KB to the second disk drive. For a RAID1, 2+2 array, 128 KB of an I/O is written to each of the two data disk drives and to the mirrors. If the I/O size is larger than the number of disk drives times 128 KB, this pattern repeats until the entire I/O is completed. For very large I/O requests, the optimal segment size for a RAID array is one that distributes a single host I/O across all data disk drives. 
The formula for optimal segment size is:
LUN segment size = LUN stripe width ÷ number of data disk drives 
For RAID 5, the number of data disk drives is equal to the number of disk drives in the array minus 1, for example:
RAID5, 4+1 with a 64 KB segment size = (5-1) * 64KB = 256 KB stripe width 
For RAID 1, the number of data disk drives is equal to the number of disk drives divided by 2, for example:
RAID 10, 2+2 with a 64 KB segment size = (2) * 64 KB = 128 KB stripe width 
For small I/O requests, the segment size must be large enough to minimize the number of segments (disk drives in the LUN) that must be accessed to satisfy the I/O request, that is, to minimize segment boundary crossings. 
For IOPS environments, set the segment size to 256KB or larger, so that the stripe width is at least as large as the median I/O size. 
IBM Best practice: For most implementations set the segment size of VMware data partitions to 256KB.

Note: If I'm decoding IBM terminology correctly, then IBM's term "stripe width" is actually "data stripe size". We need to clarify the terminology, because the term "stripe width" is normally used for the number of disks in a RAID group. "Data stripe size" is the payload without the parity. The parity is stored on other segment(s) depending on the selected RAID level.

For a clear understanding of the terminology I've created the RAID 5 (4+1) segment/stripe visualization depicted below.

RAID 5 (4+1) striping example

Even though I found this IBM description very informative, I'm not sure why they recommend a 256KB segment size for VMware. It is true that the biggest IO size issued from ESX is by default 32MB, because ESX splits bigger IOs issued from the guest OS into more IOs (for more information about the big IO split see this blog post). However, the most important factor is the IO size issued from the guest OSes. If you want to monitor the max/average/median IO size from ESX, you can use the vscsiStats tool already included in ESXi for this purpose. It can show a histogram, which is really cool (for more information about vscsiStats read this excellent blog post). Based on all these assumptions, and also my own IO size monitoring in the field, it seems that the average IO size issued from ESX is usually somewhere between 32 and 64KB. So let's use 64KB as the average data stripe (IO size issued from the OS). Then for RAID 5 (4+1) the data stripe is composed of 4 segments, and the optimal segment size in this particular case should be 16KB (64/4).
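The calculation above can be sketched in a few lines of shell arithmetic; the IO size and disk counts are the example values from this post, not universal constants:

```shell
# Optimal segment size = average IO size / number of data disk drives.
io_size_kb=64     # average IO size issued by guest OSes (example value)
raid_disks=5      # RAID 5 (4+1)
parity_disks=1
data_disks=$((raid_disks - parity_disks))
segment_kb=$((io_size_kb / data_disks))
echo "optimal segment size: ${segment_kb}KB"
```

For RAID 10 you would set data_disks to half the total disk count instead of subtracting the parity disks.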

Am I right, or have I missed something? Any comments are welcome and highly appreciated.

Update 2014/01/31:
We discuss this topic very frequently with my colleague who works as a DELL storage specialist. The theory is nice, but only a real test can prove any theory. Recently he performed a set of IOmeter tests against a DELL PV MD3600f, which is actually the same array as the IBM DS3500. He found that optimal performance (# of IOPS versus response times) is achieved when the segment size is as close as possible to the IO size issued from the operating system. So the key takeaway from this exercise is that the optimal segment size for the example above is not 16KB but 64KB. Now I understand IBM's general recommendation (best practice) to use a 256KB segment size for VMware workloads, as this is the biggest segment size which can be chosen.

Update 2014/07/23:
After more thinking about this topic I've realized that the idea to use a segment size bigger than your biggest IO size can make sense for several reasons:

  • each IO will get a single spindle (disk) to handle it, which will use queues down the route and will be served within single-spindle latency, which is the minimum for this single IO, right?
  • a typical virtual infrastructure environment runs several VMs generating several IOs, based on the queues available in the guest OS and the ESX-layer disk scheduler settings (see more here on Duncan Epping's blog), so at the end of the day you are able to generate lots of IOPS from different threads and the load is evenly distributed across the RAID group
However, please note that all this discussion was related to legacy (traditional) storage architectures. Some modern (virtualized) storages do some magic on their controllers, like I/O coalescing. I/O coalescing is an IO optimization that reorders smaller IO writes into a bigger IO in the controller cache and sends this bigger IO down to the disks. This can significantly change segment size recommendations, so please try to understand the particular storage architecture, or follow the storage vendor's best practices and try to understand the reason for these recommendations in your particular use case. I remember EMC CLARiiONs used IO coalescing into 64KB IO blocks. 


Wednesday, June 26, 2013

IOBlazer

IOBlazer is a multi-platform storage stack micro-benchmark. IOBlazer runs on Linux, Windows and OSX and it is capable of generating a highly customizable workload. Parameters like IO size and pattern, burstiness (number of outstanding IOs), burst interarrival time, read vs. write mix, buffered vs. direct IO, etc., can be configured independently. IOBlazer is also capable of playing back VSCSI traces captured using vscsiStats. The performance metrics reported are throughput (in terms of both IOPS and bytes/s) and IO latency.
IOBlazer evolved from a minimalist MS SQL Server emulator which focused solely on the IO component of said workload. The original tool had limited capabilities, as it was able to generate only a very specific workload based on the MS SQL Server IO model (Asynchronous, Un-buffered, Gather/Scatter). IOBlazer now has a far more generic IO model, but two limitations still remain:
  1. The alignment of memory accesses on 4 KB boundaries (i.e., a memory page)
  2. The alignment of disk accesses on 512 B boundaries (i.e., a disk sector).
Both limitations are required by the gather/scatter and un-buffered IO models.
A very useful new feature is the capability to playback VSCSI traces captured on VMware ESX through the vscsiStats utility. This allows IOBlazer to generate a synthetic workload absolutely identical to the disk activity of a Virtual Machine, ensuring 100% experiment repeatability.

TBD - TEST & WRITE REVIEW

PXE Manager for vCenter

PXE Manager for vCenter enables ESXi host state (firmware) management and provisioning. Specifically, it allows:
  • Automated provisioning of new ESXi hosts stateless and stateful (no ESX)
  • ESXi host state (firmware) backup, restore, and archiving with retention
  • ESXi builds repository management (stateless and stateful)
  • ESXi Patch management
  • Multi vCenter support
  • Multi network support with agents (Linux CentOS virtual appliance will be available later)
  • Wake on Lan
  • Hosts memtest
  • vCenter plugin
  • Deploy directly to VMware Cloud Director
  • Deploy to Cisco UCS blades
TBD - TEST & WRITE REVIEW

vBenchmark

vBenchmark provides a succinct set of metrics for your VMware virtualized private cloud. Additionally, if you choose to contribute your metrics to the community repository, vBenchmark also allows you to compare your metrics against those of comparable companies in your peer group. The data you submit is anonymized and encrypted for secure transmission.

Key Features:

  • Retrieves metrics across one or multiple vCenter servers
  • Allows inclusion or exclusion of hosts at the cluster level
  • Allows you to save queries and compare over time to measure changes as your environment evolves
  • Allows you to define your peer group by geographic region, industry and company size, to see how you stack up
TBD - TEST & WRITE REVIEW

Tuesday, June 25, 2013

How to create your own vSphere Performance Statistics Collector

StatsFeeder is a tool that enables performance metrics to be retrieved from vCenter and sent to multiple destinations, including 3rd party systems. The goal of StatsFeeder is to make it easier to collect statistics in a scalable manner. The user specifies the statistics to be collected in an XML file, and StatsFeeder will collect and persist these stats. The default persistence mechanism is comma-separated values, but the user can extend it to persist the data in a variety of formats, including a standard relational database or key-value store. StatsFeeder is written leveraging significant experience with the performance APIs, allowing the metrics to be retrieved in the most efficient manner possible.
The white paper StatsFeeder: An Extensible Statistics Collection Framework for Virtualized Environments can give you a better understanding of how it works and how to leverage it.




Monday, June 24, 2013

vCenter Single Sign-On Design Decision Point

When you design vSphere 5.1 you have to implement vCenter SSO. Therefore you have to make a design decision about which SSO mode to choose.

There are actually three available options:

  1. Basic
  2. HA (don't mix with vSphere HA)
  3. Multisite
Justin King wrote an excellent blog post about SSO here, and it is a worthy source of information for making the right design decision. I fully agree with Justin and recommend Basic SSO to my customers when possible. SSO Server protection can be achieved by standard backup/restore methods, and SSO availability can be increased by vSphere HA. All these methods are well known and have been used for a long time.

You have to use Multisite SSO when vCenter Linked Mode is required, but think twice about whether you really need it and whether the benefits outweigh the drawbacks.

Thursday, June 20, 2013

Force10 Open Automation Guide - Configuration and Command Line Reference

This document describes the components and uses of the Open Automation Framework designed to run on the Force10 Operating System (FTOS), including:
• Smart Scripting
• Virtual Server Networking (VSN)
• Programmatic Management
• Web graphic user interface (GUI) and HTTP Server

http://www.force10networks.com/CSPortal20/KnowledgeBase/DOCUMENTATION/CLIConfig/FTOS/Automation_2.2.0_4-Mar-2013.pdf

Tuesday, June 18, 2013

How to – use vmkping to verify Jumbo Frames

Here is a nice blog post about Jumbo Frame configuration on vSphere and how to test that it works as expected. This is, BTW, an excellent test for Operational Verification (aka Test Plan).
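The core of the test is a single command run from the ESXi shell; the IP address below is a placeholder for your storage or VMkernel target:

```shell
# Send an 8972-byte payload (9000 minus 28 bytes of IP + ICMP headers)
# with the "do not fragment" flag set. If Jumbo Frames are misconfigured
# anywhere in the path, this ping fails while a standard-size ping still works.
vmkping -d -s 8972 192.168.1.100
```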

Architectural Decisions

Josh Odgers – VMware Certified Design Expert (VCDX) #90 is continuously building database of architectural decisions available at  http://www.joshodgers.com/architectural-decisions/

It is a very nice example of one architecture approach.
 

Monday, June 17, 2013

PowerCLI One-Liners to make your VMware environment rock out!

Christopher Kusek wrote an excellent blog post about useful PowerCLI scripts that fit on a single line. He calls them one-liners. These one-liners can significantly help you with daily vSphere administration. On top of that, you can very easily learn PowerCLI constructs just from reading them.

http://www.pkguild.com/2013/06/powercli-one-liners-to-make-your-vmware-environment-rock-out/

Tuesday, June 04, 2013

Software Defined Networking - SDN

SDN is another big topic in the modern virtualized datacenter, so it is worth understanding what it is and how it can help us solve real datacenter challenges.

Brad Hedlund's explanation "What is Network Virtualization"
http://bradhedlund.com/2013/05/28/what-is-network-virtualization/
Brad Hedlund is a very well known networking expert. He now works for VMware | Nicira, participating on the VMware NSX product, which should be the next network virtualisation platform (aka network hypervisor). He is ex-CISCO and ex-DELL | Force10, so there is a big probability he fully understands what is going on.

It is obvious that "dynamic service insertion" is the most important thing in SDN. OpenFlow and CISCO vPath are both trying to do it, but each in a different way. Same goal, different approach. Which is better? Who knows? The future and real experience will show us. Jason Edelman's blog post very nicely and clearly compares both approaches.
http://www.jedelman.com/1/post/2013/04/openflow-vpath-and-sdn.html

CISCO, as a long-term networking leader and pioneer, of course has its own vision of SDN. The Nexus 1000V and virtual network overlays play a pivotal role in CISCO's approach to Software Defined Networks. A very nice explanation of the CISCO approach is available at
http://blogs.cisco.com/datacenter/nexus-1000v-and-virtual-network-overlays-play-pivotal-role-in-software-defined-networks/


Saturday, May 25, 2013

PernixData: New storage status quo is coming

Storage SMEs have known for ages that storage design begins with performance. Storage performance is usually much more important than capacity. One IOPS costs more money than one GB of storage. Flash disks, EFDs and SSDs have changed the storage industry already. But the magic and the future is in software. PernixData FVP (Flash Virtualization Platform) looks like a very intelligent, fully redundant and reliable cluster-aware software storage acceleration platform. It leverages any local flash devices to accelerate any back-end storage used for server virtualization. Right now only VMware vSphere is supported, but the solution is hypervisor agnostic and it is just a matter of time before it is ported to other server virtualization platforms like Hyper-V, Xen, or KVM.

PernixData sets an absolutely new storage quality standard in the virtualized datacenter. If you have an issue with storage response time (latency), then look at PernixData FVP. But what impressed me is the future, because I believe the platform can be improved significantly and new functionality will come soon. I can imagine data compression and deduplication, data encryption, vendor-independent replication, cloning, snapshotting, etc.

So software defined storage virtualization has just begun.

Happy journey PernixData.

For more information look at
http://www.pernixdata.com/
http://www.pernixdata.com/SFD3/

Wednesday, May 22, 2013

Magic Quadrant for General-Purpose Disk Arrays

http://www.gartner.com/technology/reprints.do?id=1-1ENAPKJ&ct=130325&st=sg

A pretty nice overview and comparison among storage vendors. Because I have the privilege to practically design, implement and work with many storage arrays, I can't agree with IBM's positioning and description. In the past I was also impressed by IBM storage products, but reality is a little bit different. I was troubleshooting several big issues with the IBM mid-range storage array IBM V7000 (Storwize) and also with the high-end IBM DS8700 (Shark).
  

Monday, May 20, 2013

Difference between SCSI-2 and SCSI-3 reservation

SCSI-3 reservations are persistent across SCSI bus resets and support multiple paths from a host to a disk. In contrast, only one host can use SCSI-2 reservations with one path. If the need arises to block access to a device because of data integrity concerns, only one host and one path remain active. The requirements for larger clusters, with multiple nodes reading and writing to storage in a controlled manner, make SCSI-2 reservations obsolete.

Info retrieve from:
http://sfdoccentral.symantec.com/sf/5.0/hpux/html/vcs_install/ch_vcs_install_iofence4.html

Thursday, May 16, 2013

Reduce vCenter DB size by deleting old events and tasks from the vCenter database


The vCenter MS-SQL database contains a stored procedure called cleanup_events_tasks_proc which deletes old data based on the event and task retention settings. vCenter retention settings can be set up in vCenter Settings through the vSphere Client or changed directly in the database. Using the vSphere Client is recommended.


c:> "C:\Program Files\Microsoft SQL Server\90\Tools\Binn\OSQL.EXE" -S \SQLEXP_VIM -E
1> use VIM_VCDB
2> go
1> update vpx_parameter set value='' where name='event.maxAge'
2> update vpx_parameter set value='' where name='task.maxAge'
3> update vpx_parameter set value='true' where name='event.maxAgeEnabled'
4> update vpx_parameter set value='true' where name='task.maxAgeEnabled'
5> go
(1 row affected)
(1 row affected)
(1 row affected)
(1 row affected)
1> exec cleanup_events_tasks_proc
2> go
1> dbcc shrinkdatabase ('VIM_VCDB')
2> go
DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
      5           1       81080         280       78776          78776
      5           2         128         128         128            128

(2 rows affected)
DBCC execution completed. If DBCC printed error messages, contact your system
administrator.
1> quit
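
After the cleanup, you may want to confirm how many rows remain. A hedged sketch of the verification, assuming VPX_EVENT and VPX_TASK are the tables trimmed by the procedure (run it in the same osql session before quitting):

```sql
-- Check remaining row counts after cleanup_events_tasks_proc has run.
USE VIM_VCDB
GO
SELECT COUNT(*) AS remaining_events FROM VPX_EVENT
SELECT COUNT(*) AS remaining_tasks  FROM VPX_TASK
GO
```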