Sunday, August 17, 2014

Network communications between virtual machines

I was contacted by a colleague of mine who pointed to a frequently cited statement about network communication between virtual machines on the same ESXi host. One such statement is quoted below.
"Network communications between virtual machines that are connected to the same virtual switch on the same ESXi host will not use the physical network. All the network traffic between the virtual machines will remain on the host."
He was discussing this topic within his team and, even though they are very skilled virtualization administrators, they had doubts about the real behavior. I generally agree with the statement above, but it is only correct in the specific situation when the virtual machines are in the same L2 segment (the same broadcast domain, usually a VLAN).

Figure 1 - L3 routing on physical network
I've prepared the drawing above to explain the real behavior clearly. Network communication between VM1 and VM2 will stay on the same ESXi host because they are in the same L2 segment; however, communication between VM1 and VM3 has to go to the physical switch (pSwitch) to be routed between VLAN 100 and VLAN 200, and then return back to the ESXi host and VM3.

The statement above can be slightly reformulated to be always correct.
"Network communications between virtual machines that are connected to the same virtual switch portgroup on the same ESXi host will not use the physical network. All the network traffic between the virtual machines will remain on the host."
Both the VMware standard and distributed vSwitch are dumb L2 switches, so L3 routing must be done somewhere else, typically on physical switches. However, there are two scenarios in which even L3 traffic between virtual machines on the same ESXi host can stay there and avoid the physical network.
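Whether traffic between two VMs stays on the host or must be routed comes down to whether their IPs share an L2 segment. A minimal sketch using Python's standard `ipaddress` module (the addresses and the /24 prefix are hypothetical, chosen to mirror VLAN 100 and VLAN 200 in Figure 1):

```python
import ipaddress

def needs_routing(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """Return True when traffic between the two IPs must cross an L3 router."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix}").network
    return net_a != net_b

# VM1 and VM2 share a subnet (VLAN 100) -> traffic stays on the vSwitch
print(needs_routing("10.0.100.11", "10.0.100.12"))  # False
# VM1 and VM3 are in different subnets (VLAN 100 vs. 200) -> routed externally
print(needs_routing("10.0.100.11", "10.0.200.13"))  # True
```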

The first scenario is when L3 routing is done by a virtual machine running on top of the same ESXi host. Examples of such virtual routers are VMware's vShield Edge, Brocade's Vyatta, Cisco's CSR, the open source router pfSense, or some other general-purpose OS with routing services. This scenario, also known as network function virtualization, is depicted in Figure 2.

Figure 2 - L3 routing on virtual machine (Network Function Virtualization)
It is worth mentioning that L3 traffic between VM5 and VM6 will go through the physical network because the L3 router is on another ESXi host.

The second scenario is when a distributed virtual router such as VMware NSX is used. This scenario is depicted in Figure 3. In this scenario, L2 and L3 traffic of all virtual machines running on the same ESXi host is optimized and will remain on the host without any physical network usage.

Figure 3 - Distributed Virtual Routing (VMware NSX)

So in our particular scenario, L2 and L3 network communication among VM1, VM2, VM3 and VM4 will stay on the same ESXi host. The same applies to VM5 and VM6.

I hope I've covered all possible scenarios and that this blog post will be helpful to others during similar discussions in virtualization teams. And as always, comments are very welcome.

Tuesday, July 29, 2014

Force10 VLT - Design Verification Test Plan


One of my philosophical rules is "Trust, but verify". A design verification test plan is a good way to be sure how the system you have designed behaves. A typical design verification test plan contains usability, performance and reliability tests.

A Force10 VLT domain configuration is actually a two-node cluster (the system) providing L2/L3 network services. Which network services your VLT domain should provide depends on customer requirements. However, a typical VLT customer requirement is high availability: eliminating network downtime when some system component fails or is maintained by an administrator. Planning and executing reliability tests is a good way to verify that the customer's high availability requirements have been achieved.

Below are some reliability tests I think are worth executing. When my gear is back in my lab I'll try to find some time to run the tests described below and publish real test results.
 
If you know about some other tests which make sense to perform, please don't be shy, leave a comment and I'll do it for you.

Test #1
Description: 
Simulate VLT Domain secondary node failure impact on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Power Off secondary VLT node
Measure network disruption
Expected Results:
Failover should take less than one second.
Test Result:
TBD

Test #2
Description: 
Simulate VLT Domain primary node failure impact on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Power Off primary VLT node
Measure network disruption
Expected Results:
Failover should take less than one second.
Test Result:
TBD

Test #3
Description: 
Simulate one link from VLTi (ISL) port-channel failure.
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out one cable participating in VLTi static port-channel
Measure network disruption
Expected Results:
The VLT Domain should keep working without traffic impact.
Test Result:
TBD
Test #4
Description: 
Simulate all links from VLTi (ISL) port-channel failure.
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out all cables participating in VLTi static port-channel
Measure network disruption
Expected Results:
The backup link should act as arbiter. The VLT Domain should still work, but in split-brain mode, and only the primary VLT node should handle the traffic.
Test Result:
TBD

Test #5
Description: 
Simulate VLT Domain backup link failure. Backup link configured as IP heartbeat over out-of-band management.
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out cable participating in backup link
Measure network disruption
Expected Results:
All traffic should work correctly but VLT should report backup link failure.
Test Result:
TBD 
Test #6
Description: 
Simulate one link failure on some virtual link trunk (aka VLT or virtual port-channel).
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out cable participating in VLT
Measure network disruption
Expected Results:
Port channel should survive this failure.
Test Result:
TBD

These six tests should verify the basic high availability and resiliency of a Force10 VLT cluster.
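Each test above measures disruption by pinging at least 10 times per second and counting consecutive lost probes. That measurement can be reduced to a simple calculation over per-probe success flags; a minimal sketch (the probe results are hypothetical, illustrating a node failover):

```python
def longest_outage_ms(replies, interval_ms=100):
    """Given per-probe success flags sampled every interval_ms (10 probes/s),
    return the longest run of lost probes expressed in milliseconds."""
    longest = run = 0
    for ok in replies:
        run = 0 if ok else run + interval_ms  # extend or reset the loss run
        longest = max(longest, run)
    return longest

# Example: 4 probes lost in a row during a node failover -> ~400 ms disruption,
# which would satisfy the sub-second expectation of Tests #1 and #2.
probes = [True] * 10 + [False] * 4 + [True] * 10
print(longest_outage_ms(probes))  # 400
```

Feeding this function the ping log from any of the six tests gives a comparable millisecond figure for the Test Result field.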

All failures should be reported via SNMP and/or syslog to a central monitoring system, assuming it is configured properly. That moves us to usability tests ... but that's another set of tests ...

And please remember that TOO MUCH TESTING WOULD NEVER BE ENOUGH :-)

Wednesday, July 23, 2014

CISCO UDLD alternative on Force10

I've been asked by a DELL System Engineer whether we support Cisco's UDLD feature, because it was required in some RFI. Well, the DELL Force10 Operating System has a similar feature solving the same problem, and it is called FEFD.

Here is the explanation from FTOS 9.4 Configuration Guide ...

FEFD (Far-end failure detection) is supported on the Force10 S4810 platform. FEFD is a protocol that senses remote data link errors in a network. FEFD responds by sending a unidirectional report that triggers an echoed response after a specified time interval. You can enable FEFD globally or locally on an interface basis. Disabling the global FEFD configuration does not disable the interface configuration.

Figure caption: Configuring Far-End Failure Detection

The report consists of several packets in SNAP format that are sent to the nearest known MAC address. In the event of a far-end failure, the device stops receiving frames and, after the specified time interval, assumes that the far-end is not available. The connecting line protocol is brought down so that upper layer protocols can detect the neighbor unavailability faster.

Update 2015-05-20:
If I understand it correctly, the main purpose of Cisco's UDLD is to detect potential unidirectional links and mitigate the risk of a loop in the network, because STP cannot help in this scenario. Force10 has another feature to prevent a loop in such a situation: STP loop guard.

The STP loop guard feature provides protection against Layer 2 forwarding loops (STP loops) caused by a hardware failure, such as a cable failure or an interface fault. When a cable or interface fails, a participating STP link may become unidirectional (STP requires links to be bidirectional) and an STP port does not receive BPDUs. When an STP blocking port does not receive BPDUs, it transitions to a Forwarding state. This condition can create a loop in the network.

Sunday, July 13, 2014

Heads Up! VMware virtual disk IOPS limit bad behavior in VMware ESX 5.5

I've been informed about strange behavior of VM virtual disk IOPS limits by a customer for whom I did a vSphere design recently. If you don't know how VM vDisk IOPS limits can be useful in some scenarios, read my other blog post, "Why use VMware VM virtual disk IOPS limit?". And because I designed this technology for some of my customers, they are significantly impacted by the bad vDisk IOPS limit behavior in ESX 5.5.

I've tested VM IOPS limits in my lab to see it for myself. Fortunately I have two labs: an older vSphere 5.0 lab with Fibre Channel Compellent storage and a newer vSphere 5.5 lab with EqualLogic iSCSI storage. First of all, let's look at how it works in ESX 5.0. The same behavior occurs in ESX 5.1, and this behavior makes perfect sense.

By default, VM vDisks don't have limits, as seen in the next screenshot.


When I run IOmeter with a single worker (thread) on an unlimited vDisk I can achieve 4,846 IOPS. That's what the datastore (physical storage) is able to give to a single thread.


When I run IOmeter with two workers (threads) on an unlimited vDisk I can achieve 7,107 IOPS. That's OK, because all shared storage arrays implement algorithms to limit per-thread performance. That's actually protection against a single thread consuming all the storage performance.


Now let's set SIOC limits of 200 IOPS on both vDisks of the VM, as depicted in the picture below.


Due to the settings above, the IOmeter single-worker workload is limited to 400 IOPS (2 x 200) per whole VM, because all limit values are consolidated per virtual machine per LUN. For more info see http://kb.vmware.com/kb/1038241. So it behaves as expected: IOmeter IOPS oscillated between 330 and 400, as you can see in the picture below.


We can observe similar behavior with two workers.


So in the ESX 5.0 lab everything works as expected. Now let's move to the other lab, where I have vSphere 5.5 with iSCSI storage. First of all, we will run IOmeter without vDisk IOPS limits to see the maximum performance we can get. In the picture below we can see that a single thread is able to get 1,741 IOPS.


... and two workers can get 3,329 IOPS.


So let's set vDisk IOPS limits of 200 IOPS on both vDisks of the VM, as in the ESX 5.0 test. I also have two disks on this VM. Due to these settings, the IOmeter single-worker workload should again be limited to 400 IOPS (2 x 200) per whole VM. But unfortunately it is not limited and can get 2,000 IOPS. That is strange and, in my opinion, bad behavior.

 

But even worse behavior can be observed when there are more threads. In the examples below you can see the behavior with two and four workers (threads). The VM gets really slow performance.



The ESX 5.5 VM vDisk behavior is really strange, and because all typical OS storage workloads (even OS booting) are multi-threaded, the VM vDisk IOPS limit technology is unusable. My customer has opened a support request, so I believe it is a bug and VMware Support will help escalate it into VMware engineering.

UPDATE 2014-07-14 (Workaround): 
I tweeted about this issue and Duncan immediately pointed me in the right direction. He revealed the secret to me ... ESXi has two disk schedulers, an old one and a new one (aka mClock). ESXi 5.5 uses the new one (mClock) by default. If you switch back to the old one, the disk scheduler behaves as expected. Below is the setting to switch to the old one.

Go to ESX Host Advanced Settings and set Disk.SchedulerWithReservation=0

This will switch back to the old disk scheduler.

Kudos to Duncan.

Switching back to the old scheduler is a good workaround which will probably appear in a VMware KB, but there is definitely some reason why VMware introduced the new disk scheduler in 5.5. I hope we will get more information from VMware engineering, so stay tuned for more details ...

UPDATE 2015-12-3:
Here are some references to more information about disk schedulers in ESXi 5.5 and above ...

ESXi 5.5
http://www.yellow-bricks.com/2014/07/14/new-disk-io-scheduler-used-vsphere-5-5/
http://cormachogan.com/2014/09/16/new-mclock-io-scheduler-in-vsphere-5-5-some-details/
http://anthonyspiteri.net/esxi-5-5-iops-limit-mclock-scheduler/

ESXi 6
http://www.cloudfix.nl/2015/02/02/vsphere-6-mclock-scheduler-reservations/

Saturday, July 12, 2014

Why use VMware VM virtual disk IOPS limit?

What is VM IOPS limit? Here is explanation from VMware documentation ....
When you allocate storage I/O resources, you can limit the IOPS that are allowed for a virtual machine. By default, these are unlimited. If a virtual machine has more than one virtual disk, you must set the limit on all of its virtual disks. Otherwise, the limit will not be enforced for the virtual machine. In this case, the limit on the virtual machine is the aggregation of the limits for all virtual disks.
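The aggregation rule quoted above can be captured in a few lines; a sketch of the documented semantics (function and variable names are my own, not a VMware API):

```python
UNLIMITED = None  # marker for a vDisk with no IOPS limit set

def effective_vm_iops_limit(vdisk_limits):
    """Per the documentation quoted above: if any vDisk is left unlimited,
    no limit is enforced for the VM at all; otherwise the enforced VM limit
    is the aggregation (sum) of all its vDisk limits."""
    if any(limit is UNLIMITED for limit in vdisk_limits):
        return UNLIMITED
    return sum(vdisk_limits)

print(effective_vm_iops_limit([200, 200]))        # 400 -> enforced per VM
print(effective_vm_iops_limit([200, UNLIMITED]))  # None -> not enforced
```

This matches the lab result further down, where two 200 IOPS vDisk limits are consolidated into a 400 IOPS limit for the whole VM.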
I really like this feature, because the VM vDisk IOPS limit is an excellent mechanism to protect the physical storage back-end against overloading by disk-intensive VMs, and it allows you to set up a fair usage policy for storage performance. Somebody may argue for using the VM disk share mechanism instead. Yes, that's of course possible as well, and it can be complementary. However, with a share-based fair usage policy your users will get high performance at the beginning, when the back-end storage has lots of available performance, but their performance will decrease over time as more VMs use the particular datastore. That means the performance is not predictable and users may complain.

Let's do a simple IOPS limit example. You have datastores provisioned on a storage pool with automated storage tiering which can serve up to 25,000 IOPS, and you have 100 virtual disks (vDisks) there. Setting a 250 IOPS limit on each virtual disk ensures that even if all VMs use all their IOPS, the back-end datastores will not be overloaded. I agree it is a very strict limitation, and VMs cannot use more IOPS even when performance is available in the physical storage. But this is a business problem, and the best vDisk limiting policy depends on your business model and company strategy. Below are two business models for virtual disk performance limits I've already used on some of my vSphere projects:

  • Service catalog strategy
  • Capacity/performance ratio strategy

The service catalog strategy allows customers (internal or external) to increase or decrease vDisk IOPS as needed and, of course, pay for it appropriately.

The capacity/performance ratio strategy is to calculate the ratio between physical storage capacity and performance and use the same ratio for vDisks. So if you have storage with 50 TB and 25,000 front-end IOPS, you have 0.5 IOPS per 1 GB. You should define and apply some overbooking ratio because you use shared storage. Let's use a ratio of 3:1, and we get 150 IOPS for a 100 GB vDisk.
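The ratio strategy arithmetic can be sketched as a small helper (the function name and the 3:1 default are my own choices; 1 TB is counted as 1,000 GB, matching the example above):

```python
def vdisk_iops_limit(vdisk_gb, capacity_gb, iops_total, overbooking=3.0):
    """Per-vDisk IOPS limit under the capacity/performance ratio strategy."""
    iops_per_gb = iops_total / capacity_gb       # raw back-end ratio
    return vdisk_gb * iops_per_gb * overbooking  # overbooked share

# 50 TB (50,000 GB) backing 25,000 IOPS -> 0.5 IOPS/GB raw, 1.5 IOPS/GB at 3:1,
# so a 100 GB vDisk gets a 150 IOPS limit, matching the example above.
print(vdisk_iops_limit(100, 50_000, 25_000))  # 150.0
```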

To be honest, I prefer the service catalog strategy, as it is what the real world needs: each workload is different, and a service catalog gives you a better way to define vDisks to match the workloads in your particular environment.

Summary
The VM vDisk IOPS limit approach is useful in environments where you want guaranteed and long-term predictable storage performance (response time) for VM vDisks. Please be aware that even this approach is not totally fair, because the IOPS reality is much more complex and the total number of IOPS on the back-end storage is not a static number as we used in our example. In a real physical storage array, the number of front-end IOPS you can get is a function of several parameters such as IO size, read/write ratio, RAID type, workload type (sequential or random), cache hits, automated storage tiering algorithms, etc.

I hope VMware VVOLs will move this approach to the next level in the future. However, the vDisk IOPS limit is technology we can use today.

Monday, July 07, 2014

How social media and community sharing help enterprise customers


I'm always happy when someone finds my blog article or shared document useful. Here is one example of recent email communication from a DELL customer who Googled my DELL OME (OpenManage Essentials, basic systems management for DELL hardware infrastructure) document explaining network communication flows among DELL OpenManage components.

All personal information is anonymized, so the customer's real name is changed to Mr. Customer.


VCDX Defense Timer

If you are preparing for the VCDX and you want to do a VCDX mock defense, you can use the exact timer which is used during the real VCDX defense.

The timer is available online at https://vcdx.vmware.com/vcdx-timer

Good luck with your VCDX journey!!!

Wednesday, May 28, 2014

DELL Force10 : VLT - Virtual Link Trunking

Do you know Cisco's Virtual Port Channel? Do you want the same with DELL datacenter switches? Here we go.

General VLT overview

Virtual Link Trunking or VLT is a proprietary aggregation protocol developed by Force10 and available in their datacenter-class or enterprise-class network switches. VLT is implemented in the latest firmware releases (FTOS from 8.3.10.2) for their high-end switches like the S4810, S6000 and Z9000 10/40 Gb datacenter switches. Although VLT is a proprietary protocol from Force10, other vendors offer similar features to allow users to set up an aggregated link towards two (logical) different switches, where a standard aggregated link can only terminate on a single logical switch (thus either a single physical switch or on different members in a stacked switch setup).  For example CISCO's similar proprietary protocol is called Virtual Port Channel (aka vPC) and Juniper has another one called Multichassis LAG (MC-LAG).

VLT is a layer-2 link aggregation protocol between end-devices (servers) connected to (different) access-switches, offering these servers a redundant, load-balancing connection to the core-network in a loop-free environment, eliminating the requirement for the use of a spanning-tree protocol.[2] Where existing link aggregation protocols like (static) LAG (IEEE 802.3ad) or LACP (IEEE 802.1ax) require the different (physical) links to be connected to the same (logical) switch (such as stacked switches), the VLT, for example, allows link connectivity between a server and the network via two different switches.

Instead of using VLT between end-devices like servers it can also be used for uplinks between (access/distribution) switches and the core switches.[3]

The general VLT description above is from Wikipedia. For more information about VLT see http://en.wikipedia.org/wiki/Virtual_Link_Trunking

DELL published the Force10 VLT Reference Architecture (PDF, link cached by Google) where VLT is explained in detail, so it is highly recommended to read it together with all product documentation and release notes before any real plan, design and implementation.

VLT Basic concept and terminology

The VLT peers exchange and synchronize Layer2-related tables to achieve harmonious Layer2 forwarding among the whole VLT domain, but the mechanism involved is transparent.

VLT is a trunk (as per its name) attaching remote hosts or switches.
VLTi is the interconnect link between the VLT peers. For historical reasons that is also called ICL (InterConnect Link) in the command outputs.

All the following rules apply to VLT topologies:
  • 2 units per domain (as of FTOS 8.3.10.2)
  • 8 links per port-channel or fewer
  • Units should run the same FTOS version
  • The backup link should employ a different link than the VLTi, and preferably a diverse path

Simple implementation plan

Below is a simplified implementation plan for VLT configuration, which should be handy for any lab or proof-of-concept deployment.

The implementation plan is divided into 6 steps.
  1. Check or configure spanning tree protocol
  2. Check or configure LLDP
  3. Check or configure out of band management leveraged for VLT backup link
  4. Configure VLTi link (VLT inter connect)
  5. Configure VLT domain
  6. Configure VLT port-channel

Step 1 - Check or configure spanning tree protocol
Rapid Spanning Tree should be enabled to protect against configuration and patching mistakes. The STP configuration depends on the customer environment and spanning tree topology preferences. The parameters below are just examples.

Switch A - configured to become RSTP root
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 4096 (if you want to have this switch as STP root)

Switch B - configured as backup root.
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 8192
Step 2 - LLDP configuration
LLDP must be enabled so that devices advertise their configuration and receive configuration information from adjacent LLDP-enabled devices.

Switch A
protocol lldp
  advertise management-tlv system-description system-name
  no disable

Switch B
protocol lldp
  advertise management-tlv system-description system-name
  no disable
Step 3 - VLT backup link
The VLT backup link is used to exchange heartbeat messages between the two VLT peers. Configure the management interface on both VLT peers to activate the backup link.

Switch A
interface management 0/0
  ip address switch-A-IP/switch-A-mask
  no shutdown
Switch B
interface management 0/0
  ip address switch-B-IP/switch-B-mask
  no shutdown

Step 4 - VLTi (interconnect) link
Now we configure the VLTi, the connection between both VLT peers. It is recommended to use a static port-channel for redundancy reasons. Two 40GbE interfaces are enough, and we bind them to port-channel 127. No special configuration is required at the interface or port-channel configuration level. To become a VLTi (automatically managed by the system), the port-channel should be in default mode (no switchport).

Switch A
interface port-channel 127
  description "VLTi - interconnect link"
  channel-member VLTi_INTERFACE1
  channel-member VLTi_INTERFACE2
  no ip address 
  mtu 12000
  no shutdown

Switch B
interface port-channel 127
  description "VLTi - interconnect link"
  channel-member VLTi_INTERFACE1
  channel-member VLTi_INTERFACE2
  no ip address 
  mtu 12000
  no shutdown

Note 1: Don't forget to do no shutdown on the physical interfaces acting as port-channel members. Your port-channel stays down unless you bring them up.
Note 2: Neither the port-channel nor the physical ports may be in switchport mode when used for VLTi.
Note 3: If you are planning to use jumbo frames (a bigger MTU size) then you have to use them also on the VLTi links (the max MTU on Force10 is 12000, so it is a good idea to set it to the max).

Use the following configuration for all VLTi interfaces
interface VLTi_INTERFACEx
  no shutdown
  no switchport

Verify port-channel status on both switches
show int po 127 brief

The port-channel should be up and composed of 2 ports.

Step 5 - VLT domain configuration
We have to configure the domain number and the VLT domain options described below.
  • We use the peer-link command to select the VLTi interface.
  • To select the interface for heartbeat message exchange, we use the back-up destination command with the IP address of the other VLT peer.
  • We should set the primary-priority command to configure the VLT role (primary or secondary). The primary VLT node will be the switch with the lower priority.
  • The system-mac mac-address command must match on both peers in the VLT domain.
  • Setting the unit ID number (0 or 1) with the unit-id command minimizes the time required for the VLT system to determine the unit ID assigned to each peer switch when one peer switch reboots.

Switch A (primary)
vlt domain 1
  peer-link port-channel 127
  back-up destination switch-B-IP
  primary-priority 1
  system-mac mac-address 02:00:00:00:00:01
  unit-id 0
Switch B (secondary)
vlt domain 1
  peer-link port-channel 127
  back-up destination switch-A-IP
  primary-priority 8192
  system-mac mac-address 02:00:00:00:00:01
  unit-id 1
For verification we can use commands below
sh vlt brief
sh vlt statistics
sh vlt backup-link

Step 6 - VLT Port Channel
It is recommended that VLTs facing hosts/switches be built preferably with LACP, to benefit from the protocol negotiations. However, static port-channels are also supported.

It is also recommended to configure dampening (or an equivalent) on the interfaces of connected hosts/switches (access switches, not VLT peers). The reason is that at start-up time, once the physical ports are active, a newly started VLT peer takes several seconds to fully negotiate protocols and synchronize (VLT peering, RSTP, VLT backup links, LACP, VLT LAG sync, etc.). The attached devices are not aware of that activity, and upon activation of a physical interface the connected device will start forwarding traffic on the restored link even though the VLT peer unit is still unprepared. It will black-hole traffic. Dampening on connected devices (access switches) holds an interface down temporarily after a VLT peer device reload. A reload is detected as a flap: the link goes down and then up. Dampening acts as a cold-start delay, ensuring that the VLT peers are up and ready to forward before the physical interface is activated, avoiding temporary black holes. Suggested dampening time: 30 seconds to 1 minute. We use 60 seconds in our example.

So let's finally configure the port-channel (dynamic LAG) that interconnects the S4810s (VLT Domain) to the upstream S60, which is our hypothetical L3 switch (router).

Switch A
interface port-channel 1
  description "Uplink to S60"
  no ip address
  switchport
  vlt-peer-lag port-channel 1
  no shutdown

interface tengigabit 0/PO1-INTERFACE
  port-channel-protocol lacp
    port-channel 1 mode active
  dampening 10 100 1000 60
  no shutdown
Switch B
interface port-channel 1
  description "Uplink to S60"
  no ip address
  switchport
  vlt-peer-lag port-channel 1
  no shutdown

interface tengigabit 0/PO1-INTERFACE
  port-channel-protocol lacp
    port-channel 1 mode active
  dampening 10 100 1000 60
  no shutdown
I hope this is helpful not only for me but also for someone else. Any comments are welcome.

Locally Administered Address Ranges

MAC Addresses
There are 4 sets of locally administered address ranges that can be used on your network without fear of conflict, assuming no one else has assigned these on your network:
 
x2-xx-xx-xx-xx-xx
x6-xx-xx-xx-xx-xx
xA-xx-xx-xx-xx-xx
xE-xx-xx-xx-xx-xx

Replacing x with any hex value.

See http://en.wikipedia.org/wiki/MAC_address for more information.
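The four ranges above are exactly the addresses with the locally administered (U/L) bit set, i.e. bit 1 of the first octet. A quick sketch to check a given MAC (the helper name is my own):

```python
def is_locally_administered(mac: str) -> bool:
    """True when the U/L bit (bit 1 of the first octet) is set, i.e. the
    address falls in the x2/x6/xA/xE ranges listed above."""
    first_octet = int(mac.replace(":", "-").split("-")[0], 16)
    return bool(first_octet & 0b10)

print(is_locally_administered("02-00-00-AA-BB-CC"))  # True
print(is_locally_administered("00-50-56-AA-BB-CC"))  # False (VMware OUI)
```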

Update 2014-10-27: one of my readers notified me that some MAC OUIs complying with the rules above are used by some vendors. See the examples below:
  02-07-01   (hex) RACAL-DATACOM
  02-1C-7C   (hex) PERQ SYSTEMS CORPORATION
  02-60-86   (hex) LOGIC REPLACEMENT TECH. LTD.
  02-60-8C   (hex) 3COM CORPORATION
  02-70-01   (hex) RACAL-DATACOM
  02-70-B0   (hex) M/A-COM INC. COMPANIES
  02-70-B3   (hex) DATA RECALL LTD
  02-9D-8E   (hex) CARDIAC RECORDERS INC.
  02-AA-3C   (hex) OLIVETTI TELECOMM SPA (OLTECO)
  02-BB-01   (hex) OCTOTHORPE CORP.
  02-C0-8C   (hex) 3COM CORPORATION
  02-CF-1C   (hex) COMMUNICATION MACHINERY CORP.
  02-E6-D3   (hex) NIXDORF COMPUTER CORPORATION

Therefore I would always recommend validating it at
http://en.wikipedia.org/wiki/MAC_address#Bit-reversed_notation
For example, 02-00-00 is not used by anybody, so you can most probably use it for internal purposes.

IP Addresses
Private network (internal)
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16

Private network (service provider - subscriber)
100.64.0.0/10
See http://en.wikipedia.org/wiki/Reserved_IP_addresses for more information.
 

Sunday, May 18, 2014

How to convert thick zeroed virtual disk to thin and save storage space

Last week I was notified by my colleague about a long-standing VMware vSphere issue described in VMware KB 2048016. The issue is that vSphere Data Protection restores a thin-provisioned disk as a thick-provisioned disk. This sounds like a relatively big operational impact. However, after reading the VMware KB I explained to my colleague that this is not a typical issue or bug, but rather expected behavior of VMware's CBT (Changed Block Tracking) technology and the VADP (VMware API for Data Protection) framework. It's important to mention that it should behave like that only when you do an initial full backup of a thin-provisioned VM which has never been powered on. In other words, if the VM was ever powered on before the initial backup, you shouldn't experience this issue.

After the explanation above, another logical question appeared:
"How can you convert a thick zeroed virtual disk to thin?" ... when you experience the weird behavior explained above and restore your originally thin-provisioned VM as a thick VM. The obvious objective is to save storage space again by leveraging VM thin provisioning.
My answer was to use storage vMotion, which allows changing the vDisk type during migration. But just after my quick answer I realized there can be another potential issue with storage vMotion. If you use VAAI-capable storage, then storage vMotion is offloaded to the storage and it may not reclaim even zeroed vDisk space. This behavior and its resolution are described in VMware KB 2004155, "Storage vMotion to thin disk does not reclaim null blocks". The workaround mentioned in the KB is to use an offline method leveraging vmkfstools. If you want live storage migration (conversion) without downtime, you would need another datastore with a different block size. You can do that with the legacy VMFS3 filesystem.

I decided to do a test to prove the real storage vMotion behavior and know the truth; everything else would be just speculation. Therefore I tested the storage vMotion behavior of thick-to-thin migration and space reclamation in my lab, where I have vSphere/ESX 5.5 and EqualLogic storage with VAAI support. To be honest, the result surprised me in a positive way. It seems that svMotion can save the space even when I svMotion between datastores with the same block size and VAAI is enabled.

You can see the thick eager-zeroed 40 GB disk in the screenshot below ...
The provisioned size is 44 GB because the VM has 4 GB RAM and therefore a 4 GB swap file on the VMFS.
Used storage is 40 GB.

After a live storage vMotion with conversion to thin, it saved the space.
Used storage is just 22 GB. You can see the result in the screenshot below ...

So I have just verified that svMotion can do what you need without downtime. And I don't even need to migrate between datastores with different block sizes.

It was tested on ESX 5.5, EqualLogic firmware 6.x, and VMFS5 datastores created on thin-provisioned LUNs by EqualLogic. Storage thin provisioning is absolutely transparent to vSphere, so this should not have an impact on vSphere thin provisioning.


I know that this is just a workaround to the problem of VADP restores of never-powered-on VMDKs, but it works. It converts thick to thin and is able to reclaim unused (zeroed) space inside virtual disks.

Conclusion:
vSphere 5.5 storage vMotion can convert a thick VM to thin even between datastores having the same block size, at least in the tested configuration. Good to know. If someone else can do the test in your environment, just leave a comment. It can be beneficial for others.

A/C Controller

A/C Controller is a FreeBSD based appliance which monitors environmental temperature and automatically powers Air Conditioning units on and off to achieve the required temperature. It's distributed as a 2 GB (204 MB zipped) pre-installed FreeBSD image.

Project page: https://sourceforge.net/projects/accontrol/
Author: David Pasek

Hardware Infrastructure Monitoring Proxy

Every enterprise infrastructure product like a server, blade system, storage array, fibre-channel or ethernet switch has some kind of CLI or API management. Many products support SNMP, but it usually doesn't return everything the CLI/API offers. This project is a set of connectors to different enterprise systems like DELL iDRAC and blade Chassis Management Controller, VMware vCenter and/or ESX, and DELL Compellent. The framework is universal, and connectors to other systems can be easily developed.

Available connectors:
DELL CMC (racadm)
DELL DRAC (racadm)
General IPMI (ipmitools)
VMware vCLI (vcli)
Compellent Enterprise manager (odbc to MS-SQL)
Compellent Storage Center (CompCU.jar)
Brocade FC Switch (cli over telnet)

... other connectors and sensors can be easily developed, so if you have any need don't hesitate to contact me.

See video introduction at  https://www.youtube.com/watch?v=JRomfnfymlY
Project page: https://sourceforge.net/projects/monitoringproxy/ 
Author: David Pasek

Friday, May 16, 2014

Unable to unmount ESX datastore

I've just been notified about an annoying problem by a customer for whom I did a vSphere 5.5 design. A datastore was impossible to unmount. The ESX logs contained something similar to the message below.
Cannot unmount volume 'Datastore Name: vm3:xxx VMFS uuid: 517c9950-10f30962-931f-00304830a1ea' because file system is busy. Correct the problem and retry the operation.
There is a KB article about this symptom. The VSAN trace component (vsantraced) was using the datastore; that was the reason for the busy file system. It was a pretty annoying issue because VSAN was neither used nor enabled.

The solution is to disable the vsantraced service, so it is necessary to issue the following command on every ESX host ...
chkconfig vsantraced off 

Not so nice, right? That's the downside of VSAN software being fully integrated into the general ESX hypervisor. I'm not happy with this approach. In my opinion, it would be much better to distribute VSAN as additional software installed as a regular VIB (vSphere Installation Bundle).
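The complete fix on each host can be sketched as follows; the datastore name vm3 comes from the log message above, and the init script stop step is my assumption based on standard ESXi service scripts:

```
chkconfig vsantraced off                    # do not start VSAN tracing on boot
/etc/init.d/vsantraced stop                 # stop the currently running trace daemon
esxcli storage filesystem unmount -l vm3    # retry the datastore unmount
```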

Friday, May 09, 2014

Understanding Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) Terminology

To understand Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) capabilities, you should become familiar with some basic terminology. I have just found an excellent single page explaining all the important terms from the FC and FCoE worlds. It is here.

Thanks, Juniper, for preparing it. I'm sure I will come back later for some abbreviation explanations.

 

Tuesday, May 06, 2014

Recovering from a Forgotten Password on the Force10 S series switch

I've just spent several hours finding the recovery procedure for a forgotten password. Google returned just one relevant result, the Force10 tech tip page "How Do I Reset the S-Series to Factory Defaults?". However, that procedure doesn't work because there is no "Option menu" during system boot. It is most probably an old and deprecated procedure.

Here is the new procedure, so I hope Google will return it for other people looking for the correct one.

Procedure to recover from a forgotten password on Force10 S-series switches:
  1. Use serial console
  2. Power off and then Power on all of the power modules
  3. Wait for message similar to "Hit Esc key to interrupt autoboot: 5" and press Esc key to go to Boot Loader (uBoot) interactive shell
  4. Change environment variable – setenv stconfigignore true - (uBoot - boot loader interactive shell)
  5. Save the changes - saveenv - (uBoot - boot loader interactive shell)
  6. Continue to boot the system – boot  - (uBoot - boot loader interactive shell)
  7. Default configuration is loaded so console login authentication is disabled by default
  8. Go to EXEC mode - en - (FTOS command line)
  9. Load startup configuration -  copy startup-config running-config - (FTOS command line)
  10. Now you can reconfigure the switch to change or add user login credentials
  11. Save configuration -  copy running-config startup-config - (FTOS command line)
  12. Reload the switch just for verification -  reload - (FTOS command line)

This procedure was tested on Force10 S4810 and S60.

Simple TFTP server for windows

Anybody working with networking equipment needs a simple TFTP server. The typical use case is to download and/or upload switch configurations and to perform firmware upgrades.

I generally like simple tools which allow me to do my work quickly and efficiently.  That's the reason I really like portable version of TFTP32.

For more information about TFTP32 go here.


Monday, May 05, 2014

Microsoft Cluster Service (MSCS) support on VMware vSphere

Microsoft Cluster Service (MSCS) is a Microsoft cluster technology that requires shared storage supporting a SCSI reservation mechanism. Microsoft has introduced a new - perhaps more modern and more descriptive - name for the same technology. The new name is "Microsoft Failover Cluster", so don't be confused by the different names.

VMware has supplementary documentation called "Setup for Failover Clustering and Microsoft Cluster Service" covering the subject. Here is the online documentation for vSphere 5.5 and here for vSphere 5.1. This documentation is a must-read to fully understand what is and what is not possible.

However, the documentation is relatively old, so if you want up-to-date information you have to leverage the VMware KB. Here are two KB articles related to the topic
  • Microsoft Cluster Service (MSCS) support on ESXi/ESX (1004617)
  • Microsoft Clustering on VMware vSphere: Guidelines for supported configurations (1037959)
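To illustrate what the shared storage requirement means at the virtual hardware level, a "cluster across boxes" configuration places the shared disks on a dedicated SCSI controller with physical bus sharing. Below is a hedged .vmx sketch; the controller number, disk name and exact settings are illustrative only, so always follow the official setup guide:

```
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogicsas"
scsi1.sharedBus = "physical"        # SCSI bus sharing across ESXi hosts
scsi1:0.present = "TRUE"
scsi1:0.fileName = "quorum.vmdk"    # in practice a physical mode RDM pointer
```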
And here is general advice.
Plan, plan, plan ... design ... review ... implement ... verify.
I hope you know what I mean.

DELL Force10 : Initial switch configuration

[ Previous | DELL Force10 : Series Introduction ]

I assume you have serial console access to the switch unit to perform the initial switch configuration. I guess it will not impress you that to switch from read mode to configuration mode you have to use the command
conf
... before continuing I would like to recap some important basic FTOS commands we will use later in this blog post. If you want to exit from configuration mode or even from a deeper configuration hierarchy you can do it with one or several
exit 
commands, each of which jumps to the upper level of the configuration hierarchy and eventually exits conf mode. However, the easiest way to leave configuration mode is to use the command
end
which will exit configuration mode immediately.

The last but very important and very often used command is
write mem
which writes your running switch configuration to flash, so the configuration will survive a switch reload. You can do the same with the more general command
copy running-config startup-config
If you want to display running configuration you can use command
show running-config
The whole configuration can be pretty long, so if you are interested only in some part of the running configuration you can use the following commands
show running-config interface managementethernet 0/0
show running-config interface gigabitethernet 0/2
show running-config spanning-tree rstp
show running-config boot
As you can see, the FTOS command line interface (CLI) is very similar to CISCO's.

Ok, so after the basics let's start with the initial configuration. Switch configuration usually begins with host name configuration. It is generally good practice to use unique host names so that you always know which system you are logged in to.
hostname f10-s60
As a next step I usually configure management IP settings and enable remote access. You have to decide whether you will use in-band management, leveraging normal IP settings usually configured on a dedicated VLAN interface just for system management, or whether you will leverage the dedicated out-of-band management port. In the example below you can see
  • out-of-band management port for system management IP settings
  • how to create admin user
  • how to enable ssh to allow remote system management
interface ManagementEthernet 0/0
  ip address 192.168.42.101/24
  no shutdown
exit
management route 0.0.0.0/0 192.168.42.1

username admin password YourPassword privilege 15

ip ssh server enable
Now you have to decide whether you want to enforce login for users connected via the local console. By default no login is required, which can be a security risk, especially in environments without strict physical security rules. Below is a configuration which enforces local login credentials when using the serial console.
 aaa authentication login default local
At this point I would like to note that a Force10 switch has all capabilities and features disabled in the default factory configuration. That's the reason why, for example, each switch interface must be explicitly enabled before use, because all interfaces are in shutdown state by default.
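For example, bringing up a server-facing port as a plain L2 port requires an explicit no shutdown; the interface name below is just an example:

```
interface GigabitEthernet 0/2
  switchport
  no shutdown
exit
```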

Before you enable any switch interface it is good practice to enable a spanning tree protocol as a safety mechanism against potential loops in the network. Once again, the spanning tree feature is not enabled by default, so you have to enable it explicitly. Force10 FTOS implements all the standard and even some non-standard (CISCO proprietary) spanning tree protocols like PVST+. The latest FTOS versions support the following spanning tree protocols:

  • STP (Spanning Tree Protocol)
  • RSTP (Rapid Spanning Tree Protocol)
  • MSTP (Multiple Spanning Tree Protocol)
  • PVST+ (Per-VLAN Spanning Tree Plus)
Below is a configuration example which enables the standard rapid spanning tree protocol (aka RSTP) ...
protocol spanning-tree rstp
  no disable 
Another decision you have to make before implementation is the location from which you want to boot your switch operating system. On some Force10 models (for example the S60) the default primary boot location is a TFTP server ...
boot system stack-unit 0 primary tftp://192.168.128.1/FTOS-SC-8.3.3.7.bin
boot system stack-unit 0 secondary system: A:
boot system stack-unit 0 default system: B:
boot system gateway 192.168.128.1
You can see that the primary boot location is a TFTP server. If you don't have tens or hundreds of switches you usually don't want to load FTOS remotely from a TFTP server but from the internal flash in the switch. Although the default switch configuration would work (if the TFTP server boot fails, the boot sequence continues with the secondary location), it's better to configure the switch boot sequence explicitly based on your requirements. Below is a typical boot sequence configuration.
boot system stack-unit 0 primary system: A:
boot system stack-unit 0 secondary system: B:
boot system stack-unit 0 default system: A:
no boot system gateway
The next thing you should check is which FTOS version you have. Below is the command to check it ...
f10-s60#show version
Dell Force10 Networks Real Time Operating System Software
Dell Force10 Operating System Version: 1.0
Dell Force10 Application Software Version: 8.3.3.7
Copyright (c) 1999-2011 by Dell Inc.
Build Time: Sat Nov 26 01:23:50 2011
Build Path: /sites/sjc/work/build/buildSpaces/build20/E8-3-3/SW/SRC
f10-s60 uptime is 4 minute(s)
System image file is "system://A"
System Type: S60
Control Processor: Freescale MPC8536E with 2147483648 bytes of memory.
128M bytes of boot flash memory.
  1 48-port E/FE/GE (SC)
 48 GigabitEthernet/IEEE 802.3 interface(s)
  2 Ten GigabitEthernet/IEEE 802.3 interface(s)
You can see FTOS version 8.3.3.7, which is not the latest one, as the latest FTOS version at the time of writing this article is 8.3.3.9 with boot loader 1.0.0.5. It is generally good practice to upgrade FTOS to the latest version before performing verification tests and going into production. For the latest version you have to go to http://www.force10networks.com and sign in. If you don't have a Force10 account you can register there. Please note that each Force10 switch model uses different FTOS versions, so there can be FTOS 9.4.x for the S4810 model and 8.3.x for the S60.

Now I'll show you how to do FTOS and boot loader upgrade.
FTOS should be upgraded first and the boot loader later ...
upgrade system tftp: A:
upgrade system stack-unit all A:
(applicable only if you have a stack configured)
upgrade boot ftp: (applicable only if a new boot loader compatible with the FTOS code exists)
reload
You can check the current FTOS version with
show version
and if you want to know which FTOS version you have in which boot bank you can run
show boot system stack-unit 0
By the way, have I told you there are two boot banks? Boot bank A: and boot bank B:, so you can choose the primary and secondary boot locations. We have already covered boot configuration but here it is again ...
conf
  boot system stack-unit 0 primary system: A:
  boot system stack-unit 0 secondary system: B:
FTOS is loaded by the boot loader, and the current boot loader version can be displayed by the command below
show system stack-unit 0
I hope this post is helpful for the IT community. In case you have any questions, suggestions or ideas for improvement, please share your thoughts in the comments.

Stay tuned and wait for next article ...

[ Next | DELL Force10 : Interface configuration and VLANs ]

DELL Force10 : Series Introduction

I have just decided to write dedicated blog post series about DELL Force10 networking. Why?

Those who know me in person are most probably aware that my primary professional focus is VMware vSphere infrastructure and datacenter enterprise hardware. Sometimes I discuss with infrastructure experts, managers and other IT folks which vSphere component - compute (servers), storage or networking - is the most important/complex/critical/expensive. I think it is a needless discussion because all components are important and have to be integrated into a single system fulfilling all requirements, dealing with known constraints and mitigating all potential risks. Such integrated infrastructures are very often called PODs, which stands, as far as I know, for Performance Optimized Datacenter. These integrated systems are, from my point of view, new datacenter computers with dedicated but distributed computing, storage and networking components. I would prefer to call such equipment an "Optimized Infrastructure Block" or "Datacenter Computer" because it is not only about performance but also about reliability, capacity, availability, manageability, recoverability and security. We call these attributes infrastructure qualities, and the whole infrastructure block inherits its qualities from its sub-components. Older IT folks often compare this concept with mainframe architectures; however, nowadays we usually use commodity x86 hardware "a little bit" optimized for enterprise workloads.

By the way, that's one of the reasons I like the current DELL datacenter product portfolio: DELL has everything I need to build a POD - servers, storage systems and now also networking - so I'm able to design a single-vendor infrastructure block with unified support, warranty, etc. Maybe someone doesn't know, but DELL acquired the storage vendors EqualLogic and Compellent some time ago and, more importantly for this blog post, DELL also acquired the well-known (at least in the US) datacenter networking producer Force10. For official acquisition details look here.

But back to networking. Everybody would probably agree that networking is a very important part of vSphere infrastructure for several reasons. It provides the interconnect between clustered components - think about vSphere networks like Management, vMotion, Fault Tolerance, VSAN, etc. It also routes network traffic to the outside world. And sometimes it even provides the storage fabric (iSCSI, FCoE, NFS). That's actually the reason why I'm going to write this series of blog posts about DELL Force10 networking - because of the importance of networking. However, I don't want to write about legacy networking but about a modern networking approach for next-generation virtualized and software-defined datacenters.

Modern physical networking is not only about hardware (intelligence burned into ASICs with high-bandwidth, fast, low-latency interfaces) but also about software. The main software sits inside the DELL Force10 switches: the switch firmware called FTOS - Force10 Operating System (for more general information about FTOS look here). However, today it is not only about embedded switch firmware but also about the whole software ecosystem - management tools, centralized control planes, virtual distributed switches, network overlays, etc.
 
In future articles I would like to deep dive into FTOS features, configuration examples and virtualization related integrations.

The next, actually the first, technical article in this series will be about the typical initial configuration of a Force10 switch. I know it is not rocket science but we have to know the basics before taking off. In the future I would like to write about more complex designs, capabilities and configurations like
  • Multi-Chassis Link Aggregation (aka MC-LAG). In Force10 terminology we call it VLT - Virtual Link Trunking.
  • Virtual Routing and Forwarding (aka VRF). Some S-series Force10 models with FTOS 9.4 support VRF-lite.
  • some Software Define Networking (aka SDN) capabilities like python/perl scripting inside the switch, REST API, VXLAN hardware VTEP, Integration with VMware Distributed Virtual Switch, Integration with VMware NSX, OpenFlow, etc.
If you prefer some topics please let me know in the comments and I'll try to prioritize them. Otherwise I'll write future posts based on my preferences.
So let's finish the series introduction and start with some technical stuff, beginning with a deep dive into the initial switch configuration! Just click next below ...

[ Next | DELL Force10 : Initial switch configuration ]

Sunday, April 20, 2014

Potential Network Black Hole Issue

When I do vSphere and hardware infrastructure health checks I very often find misconfigured networks, usually but not only in blade server environments. That's the reason I've decided to write a blog post about this issue. The issue is general and should be considered and checked for any vendor's solution, but because I'm very familiar with DELL products I'll use the DELL blade system and I/O modules to show you the deeper specifics and configurations.

Blade server chassis typically have switch modules, as depicted in the figure below.


When blade chassis switch modules are connected to another network layer (aggregation or core), there is a possibility of a network black hole, which I would like to discuss in depth in this post.

Let's assume you lose a single uplink from I/O module A. This situation is depicted below.


In this situation there is no availability problem because network traffic can flow via the second I/O module uplink port. Indeed, there is only half of the uplink bandwidth, so there is potential throughput degradation and congestion can occur, but everything works and it is not an availability issue.

But what happens when the second I/O switch module uplink port fails? Look at the figure below.

   
If the I/O switch module is in normal switch mode then the uplink ports are in link-down state but the downlink server ports are still in link-up state. Therefore the ESX host NIC ports are also up, the ESX teaming driver doesn't know that something is wrong down the path, and traffic is sent to both NIC uplinks. We call this situation a "black hole" because traffic routed via NIC1 will never reach its destination and your infrastructure is in trouble.

To overcome this issue, some I/O modules in blade systems can be configured as I/O Aggregators. Some other modules are designed as I/O Aggregators by default and this cannot be changed.

Here are examples of DELL blade switch modules which are switches by default but can be configured to work as I/O Aggregators (aka Simple Switch Mode):
  • DELL PowerConnect M6220
  • DELL PowerConnect M6348
  • DELL PowerConnect M8024-k
An example of an implicit I/O Aggregator is the DELL Force10 IOA.

Another I/O Aggregator-like option is to use the Fabric Extender architecture, implemented in the DELL blade system as the CISCO Nexus B22. CISCO FEX is a little bit different topic, but it also helps you effectively avoid our black hole issue.

When you use "simple switch mode" you have limited configuration possibilities. For example, you can use the module just for L2 and you cannot use advanced features like access control lists (ACLs). That can be a reason to leave the I/O module in normal switch mode. But even with I/O modules in normal switch mode you can configure your switches to overcome the potential "black hole" issue. Here are examples of DELL blade switches and the technologies to overcome it:
  • DELL PowerConnect M6220 (Link Dependency)
  • DELL PowerConnect M6348 (Link Dependency)
  • DELL PowerConnect M8024-k (Link Dependency)
  • DELL Force10 MXL (Uplink Failure Detection)
  • CISCO 3130X (Link State Tracking)
  • CISCO 3130G (Link State Tracking)
  • CISCO 3032 (Link State Tracking)
  • CISCO Nexus B22 (Fabric Extender)
If you leverage any of the technologies listed above, the link states of the I/O module uplink ports are synchronized to the configured downlink ports, and the ESX teaming driver can effectively provide ESX uplink high availability. This situation is depicted in the figure below.


Below are examples of detailed CLI configurations of some of the port tracking technologies described above.


DELL PowerConnect Link Dependency

Link dependency configuration on both blade access switch modules can solve "Network Black Hole" issue.

 ! Server port configuration  
 interface Gi1/0/1  
 switchport mode general  
 switchport general pvid 201  
 switchport general allowed vlan add 201  
 switchport general allowed vlan add 500-999 tagged  
 ! Physical Uplink port configuration   
 interface Gi1/0/47  
 channel-group 1 mode auto  
 exit  
 ! Physical Uplink port configuration   
 interface Gi1/0/48  
 channel-group 1 mode auto  
 exit  
 ! Logical Uplink port configuration (LACP Port Channel)  
 interface port-channel 1  
 switchport mode trunk  
 exit   
 ! Link dependency configuration  
 link-dependency group 1  
 add Gi1/0/1-16  
 depends-on port-channel 1  

Force10 Uplink Failure Detection (UFD)

Force10 calls the link dependency feature UFD; here is a configuration example

 FTOS#show running-config uplink-state-group  
 !  
 uplink-state-group 1  
 downstream TenGigabitEthernet 0/0  
 upstream TenGigabitEthernet 0/1  
 FTOS#  

The status of UFD can be displayed with the "show configuration" command

 FTOS(conf-uplink-state-group-16)# show configuration  
 !  
 uplink-state-group 16  
 description test  
 downstream disable links all  
 downstream TengigabitEthernet 0/40  
 upstream TengigabitEthernet 0/41  
 upstream Port-channel 8  

CISCO Link State Tracking

Link State Tracking is a feature available on CISCO switches to manage the link state of downstream ports (ports connected to servers) based on the status of upstream ports (ports connected to aggregation/core switches).
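Conceptually it is the same mechanism as Link Dependency and UFD above. A configuration sketch for CISCO IOS follows; the interface names are illustrative only:

```
! Define link state group 1 globally
link state track 1
! Uplink towards aggregation/core switches
interface TenGigabitEthernet 1/0/1
 link state group 1 upstream
! Downlink towards a blade server
interface GigabitEthernet 1/0/10
 link state group 1 downstream
```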

Saturday, April 19, 2014

How to fix broken VMFS partition?

I needed more storage space in my lab. Redundancy was not important, so I changed the RAID configuration of the local disk from RAID 1 to RAID 0. After this change the old VMFS partition was left on the disk volume. That was the reason I saw just half of the disk space when I was trying to create a new datastore; the other half was still used by the old VMFS partition. You can ssh to the ESXi host and check the disk partitions with the partedUtil command.

 ~ # partedUtil get /dev/disks/naa.690b11c0034974001ae597300bc5b3aa  
 Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? This will also fix the last usable sector as per the new size. diskSize (1169686528) AlternateLBA (584843263) LastUsableLBA (584843230)  
 Warning: Not all of the space available to /dev/disks/naa.690b11c0034974001ae597300bc5b3aa appears to be used, you can fix the GPT to use all of the space (an extra 584843264 blocks) or continue with the current setting? This will also move the backup table at the end if is is not at the end already. diskSize (1169686528) AlternateLBA (584843263) LastUsableLBA (584843230) NewLastUsableLBA (1169686494)  
 72809 255 63 1169686528  
 1 2048 584843230 0 0  

The old datastore didn't contain any data, so I had no problem trying the fix option with the following command.

 ~ # partedUtil fix /dev/disks/naa.690b11c0034974001ae597300bc5b3aa  

When I checked the disk partitions on the disk device again, I got the following output

 ~ # partedUtil get /dev/disks/naa.690b11c0034974001ae597300bc5b3aa  
 72809 255 63 1169686528  

... and then I was able to create a datastore on the whole disk.




DELL recommended BIOS Settings for VMware vSphere Hypervisor

Here is a list of BIOS settings specifically regarding Dell PowerEdge servers:
  • Hardware-Assisted Virtualization: As the VMware best practices state, this technology provides hardware-assisted CPU and MMU virtualization.
    In the Dell PowerEdge BIOS, this is known as “Virtualization Technology” under the “Processor Settings” screen. Depending upon server model, this may be Disabled by default. In order to utilize these technologies, Dell recommends setting this to Enabled.
  • Intel® Turbo Boost Technology and Hyper-Threading Technology: These technologies, known as "Turbo Mode" and "Logical Processor" respectively in the Dell BIOS under the "Processor Settings" screen, are recommended by VMware to be Enabled for applicable processors; this is the Dell factory default setting.
  • Non-Uniform Memory Access (NUMA): VMware states that in most cases, disabling "Node Interleaving" (which enables NUMA) provides the best performance, as the VMware kernel scheduler is NUMA-aware and optimizes memory accesses to the processor it belongs to. This is the Dell factory default.
  • Power Management: VMware states “For the highest performance, potentially at the expense of higher power consumption, set any BIOS power-saving options to high-performance mode.” In the Dell BIOS, this is accomplished by setting "Power Management" to Maximum Performance.
  • Integrated Devices: VMware states “Disable from within the BIOS any unneeded devices, such as serial and USB ports.” These devices can be turned off under the “Integrated Devices” screen within the Dell BIOS.
  • C1E: VMware recommends disabling the C1E halt state for multi-threaded, I/O latency sensitive workloads. This option is Enabled by default, and may be set to Disabled under the “Processor Settings” screen of the Dell BIOS.
  • Processor Prefetchers: Certain processor architectures may have additional options under the “Processor Settings” screen, such as Hardware Prefetcher, Adjacent Cache Line Prefetch, DCU Streamer Prefetcher, Data Reuse, DRAM Prefetcher, etc. The default settings for these options is Enabled, and in general, Dell does not recommend disabling them, as they typically improve performance. However, for very random, memory-intensive workloads, you can try disabling these settings to evaluate whether that may increase performance of your virtualized workloads.
 Source: http://en.community.dell.com/techcenter/virtualization/w/wiki/dell-vmware-vsphere-performance-best-practices.aspx
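On 12th generation PowerEdge servers with iDRAC7, most of these settings can also be checked and changed out-of-band with racadm. A hedged sketch follows; the attribute names can differ between server generations, so verify them with racadm get BIOS before use:

```
racadm get BIOS.ProcSettings.ProcVirtualization       # expect: Enabled
racadm set BIOS.SysProfileSettings.SysProfile PerfOptimized
racadm jobqueue create BIOS.Setup.1-1                 # schedule the BIOS config job
# ... then reboot the server to apply the pending BIOS configuration
```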

Tuesday, April 15, 2014

Long network lost during vMotion

We observed strange vMotion behavior during vSphere design verification tests after a successful vSphere implementation. By the way, that's the reason why design verification tests are very important before putting infrastructure into production. But back to the problem. When a VM was migrated between ESXi hosts leveraging VMware vMotion, we saw a long VM network outage of somewhere between 5 and 20 seconds. That's really strange, because usually you don't observe any network loss, or at most the loss of a single ping, from the VM perspective.

My first question to the implementation team was whether the virtual switch option "Notify Switches" was enabled as documented in the vSphere design. The importance of this option is explained in relative depth, for example, here and here. The answer from the implementation team was positive, so the problem had to be somewhere else, most probably on the physical switches. The main purpose of the "Notify Switches" option is to quickly update the physical switch MAC address table (aka CAM) and help the physical switch learn which physical port the VM vNIC sits behind. So the physical switch became the next suspected culprit; in our case it was a virtual stack of two HP blade modules, 6125XLG.
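As a side note, whether "Notify Switches" is really active can be verified directly on the ESXi host. For a standard vSwitch, the following command (vSwitch0 is an example name) should report Notify Switches: true:

```
esxcli network vswitch standard policy failover get -v vSwitch0
```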

In the end the implementation team proved that the HP switch was the culprit. Here is the very important HP switch global configuration setting that eliminates this issue.

 mac-address mac-move fast-update  

I didn't find any similar blog post or KB article about this issue, so I hope it will be helpful for the VMware community.

DELL official response to OpenSSL Heartbleed

The official Dell response to OpenSSL Heartbleed for our entire portfolio of products is listed here.

Monday, April 14, 2014

Code Formatter

From time to time I publish programming source code or configurations on my blog, which runs on Google's blogging platform blogger.com. I'm always struggling with formatting the code.

I've just found http://codeformatter.blogspot.com/ and I'll try it next time I need it.

Tuesday, April 08, 2014

PRTG alerts phone call notifications

I have been asked by someone how to do phone call notifications of critical alerts in the PRTG monitoring system. The advantage of a phone call notification over Email or SMS is that it can wake up a sleeping administrator at night when he is on call and a critical alert appears in the central monitoring system.

My conceptual answer was ... use PRTG API to monitor alerts and make a phone call when critical alerts exist.

The new generation of admins doesn't have a problem with APIs but doesn't know how to dial a voice call. That's because they are the IP-based generation and don't have experience with the modems we played with extensively back in the 80's and 90's ;-)

In the end I promised them I would prepare an embedded system integrated with PRTG over the API, dialing a configured phone number in case of critical alerts.
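The alerting decision itself is simple and can be sketched in a few lines of shell. Everything below is a hypothetical illustration: the JSON sample mimics a PRTG /api/table.json response, status_raw 5 is assumed to mean Down, and the real appliance fetches the response from the PRTG server and writes the ATD dial string to the modem's serial device instead of echoing:

```shell
#!/bin/sh
# Count sensors reported as Down (status_raw == 5) in a PRTG-style response.
# SAMPLE stands in for something like:
#   fetch -qo - "https://prtg.example.com/api/table.json?content=sensors&filter_status=5&..."
SAMPLE='{"sensors":[{"status_raw":3},{"status_raw":5},{"status_raw":5}]}'
DOWN=$(printf '%s' "$SAMPLE" | grep -o '"status_raw":5' | wc -l)
if [ "$DOWN" -gt 0 ]; then
  # the real appliance would do: printf 'ATD<number>;\r' > /dev/cuau1
  echo "critical alerts: $DOWN - dialing out via GSM modem"
fi
```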

Here is a picture of the hardware prototype leveraging the Soekris computing platform, running FreeBSD from a RAM disk and making phone calls via an RS-232 GSM modem.
Soekris computing platform and RS-232 GSM modem.
Here are relevant blog posts describing some of the technical details a little more deeply