Tuesday, July 28, 2015

How do you understand documenting Conceptual, Logical, Physical?

I have just read the following question in the Google+ "VCDX Study Group 2015":
As a fellow writer (we architects are not readers, but writers! :) ) I wanted to ask you how you understand documenting Conceptual, Logical, Physical.
Can you put all of these into a single architecture design document, with the 3 parts as 3 sections, or are you better off creating 3 separate documents, one for each type of design?
I hear similar questions about how to approach good design documentation very often. My answer was the following ...

As a writer you have to decide what is best for your readers :-)

When I'm engaged to write an architecture document, I use different approaches for different design engagements. It really depends on project size, scope, audience, architecture team, etc. For example, right now I'm working on a project where six architects are working on a single High Level Design covering the big picture, with each also preparing a Low Level Design. At the end there is a single HLD document and five separate LLD documents covering
  • Compute, 
  • Storage, 
  • Networking,
  • vSphere and
  • Backup.
I had other projects where the whole architecture was in a single document and each section targeted a different audience. That was the case with my VCDX design documentation.

Generally, I believe the High Level Design (HLD) is for a broader technical audience but also for business owners. Therefore, physical design is not required at this stage, and only the Conceptual and a brief Logical design for each area should be in the HLD. The Low Level Design (LLD) is for technical implementers and technical operations personnel, therefore less writing creativity and more deep technical language for the specific area should be used there, with references to the HLD. I recommend reading Greg Ferro's "Eleven Rules of Design Documentation", which IMHO applies very well to LLDs.

The HLD Conceptual Design should include business and technical requirements, constraints, assumptions, key design decisions, the overall high-level concept, and risk analysis.

The HLD Logical Design should include basic logical constructs for the different design areas, together with capacity planning.

The LLD should include the Conceptual, Logical, and Physical design for the specific area(s) or the designed system/subsystem. In the LLD conceptual design there should be a subset of the HLD technical requirements, constraints, and assumptions, and perhaps some other specific requirements irrelevant in the HLD. They can even be discovered after HLD and LLD design reviews and additional technical workshops. The logical design can be the same as in the HLD, or you can go a level deeper, but still stay in the logical layer, without product physical specifications, cabling port IDs, VLAN IDs, IP addressing, etc. These physical details belong in the physical design and, if needed, can be referenced in attachments, Excel workbooks, or similar implementation/configuration management documents.

The LLD Physical design is usually leveraged by the implementer to prepare As-Built documentation.

That's just my $0.02 and your mileage may vary.

In the end I have to repeat ... you, as a writer (architect), have to decide on the appropriate documentation format for your target audience.

Don't hesitate to share your thoughts in comments.

Tuesday, July 07, 2015

DELL Force10 : Interface Configuration and VLANs

Physical interface configuration

Physical switch interface configuration is a basic operation on any switch device, and the DELL Force10 switch is no exception. However, one thing is very unique about Force10 switches: everything, including the physical interfaces, is disabled by default. Interfaces are therefore in a down state and must be configured before any use. Some say this is strange behavior, but in my opinion it is pretty good behavior, because it is a much more secure approach. You will not disrupt the whole network by connecting and cabling a new switch into your enterprise network until you configure something. If you apply a bad configuration, it is your fault and not the device's fault.

OK, so when you want to use a switch interface, you have to enable it explicitly. Before that, you should be absolutely sure your new Force10 switch is ready to be connected to the network; think, for example, about the spanning tree protocol configuration. Let's assume you know what you are doing and you want to enable a particular physical interface. It is easy; I think the example below is self-explanatory.

conf
  interface tengigabit 0/1
  no shutdown

So your interface is up. Another important note is that all physical interfaces are Layer 3 by default. You can assign an IP address to a Layer 3 (routed) interface, and your L3 switch then acts as a router. IP address assignment is shown below.

conf
  interface tengigabit 0/1
  ip address 192.168.1.11/24
  no shutdown

Cool, but there is a chance you want to configure a Layer 2 interface to work as a switch port and not a routed port. It is pretty easy: you have to tell the interface not to have an IP address and to behave as a switch port.

conf
  interface tengigabit 0/1
  no ip address
  switchport
  no shutdown

Physical Interface Numbering

So far we have used interface identification similar to the following

interface tengigabit 0/13

The general interface identification convention has the following format

interface "Interface Type" "Stack Unit Number"/"Interface Number"
where
  • Interface Type - can be gigabit (gi), tengigabit (te), or fortygigabit (fo)
  • Stack Unit Number - the stack ID number if classic stacking is configured; otherwise 0, as it is a single-unit switch
  • Interface Number - the sequential port number on the particular stack unit
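As an illustration only, the convention above can be expressed in a few lines of Python. The helper below is hypothetical (it is not part of FTOS or any tooling mentioned here) and only mirrors the naming parts just described:

```python
# Hypothetical helper (not part of FTOS) illustrating the interface naming
# convention: "Interface Type" "Stack Unit Number"/"Interface Number".
TYPES = {"gi": "gigabit", "te": "tengigabit", "fo": "fortygigabit"}

def parse_interface(name: str):
    """Split e.g. 'te 0/13' into (interface type, stack unit, port number)."""
    if_type, numbering = name.split()
    stack_unit, port = numbering.split("/")
    return TYPES.get(if_type, if_type), int(stack_unit), int(port)

print(parse_interface("te 0/13"))  # ('tengigabit', 0, 13)
```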


Interface ranges

You can leverage interface ranges to simplify interface and VLAN configurations.

conf
interface range  te 0/1-3, te 0/5-7
interface range vlan 100-110
interface range vlan 4, vlan 11, vlan 22-26, vlan 100

I think you can see the benefit. All configurations are applied to all interfaces in the range.

VLANs

In the configurations above, everything happens in the default VLAN, which is VLAN 1 by default. This is a single broadcast domain. "In computer networking, a single layer-2 network may be partitioned to create multiple distinct broadcast domains, which are mutually isolated so that packets can only pass between them via one or more routers; such a domain is referred to as a virtual local area network, virtual LAN or VLAN." (Source: Wikipedia.) Nowadays, VLANs are used very often for network separation (security) and for splitting broadcast domains (availability, performance). If you are familiar with Cisco VLAN configuration, you know you have to create the VLAN ID in the VLAN database and then you can assign the VLAN(s) to a particular interface. In Force10 it is a little bit different. You still have to create the VLAN ID, that's the same. However, you do not assign VLANs per interface; instead, you assign interfaces to the VLAN. See the example below.

conf
  interface vlan 100
  description "VLAN for mission critical servers"
  untagged TenGigabitEthernet 0/1-3
  tagged TenGigabitEthernet 0/0

In the example above we have created VLAN 100 for three mission-critical servers. The servers are connected to ports Te 0/1, Te 0/2, and Te 0/3 without VLAN tagging (aka access mode in Cisco terminology). A particular switch port in "access" mode is configured in Force10 in the following way

interface TenGigabitEthernet 0/1
  description "Mission critical server 1" 
  no ip address
  switchport
  spanning-tree rstp edge-port bpduguard 
  no shutdown

Switch interface Te 0/0 is the uplink to the rest of the network; therefore, more VLANs have to be configured on this particular port, so it is configured as a trunk port with multiple tagged VLANs. Switch uplinks are usually configured redundantly for high availability, so there is a big chance you would like to use a port-channel (aka LAG) as the switch uplink. LAGs are explained in a later section. A port-channel is nothing else than a special virtual interface; therefore, port-channel VLAN configuration is very similar to that of physical interfaces.

conf
interface vlan 100
  description "VLAN for mission critical servers"
  untagged TenGigabitEthernet 0/1-3
  tagged Port-channel 1

There is another switch port mode which is typical for ESXi hosts. It is a server port, but you want one VLAN configured as native (usually for ESXi management) and also a trunk of multiple VLANs for virtual networking (VMware portgroups). Force10 calls this port configuration hybrid. The switch port configuration will look similar to the example below

interface GigabitEthernet 0/11
  description ESX11
  no ip address
  mtu 9252
  portmode hybrid
  switchport
  spanning-tree rstp edge-port
  no shutdown

and for such a hybrid switch port we can have one VLAN configured as untagged (aka native in Cisco terminology) and multiple VLANs as tagged. The VLAN configuration should look similar to ...

interface Vlan 4
  description DC-MGMT
  ip address 192.168.4.254/24
  untagged GigabitEthernet 0/4,6,11-14,34-36,41,43
  no shutdown

VLAN 4 is used for ESXi management, but multiple other VLANs can be carried to the ESXi host as tagged VLANs. This is depicted in the configuration snippet below ...

interface Vlan 22
 description VMOTION
 name VMOTION
 no ip address
 tagged GigabitEthernet 0/11-14,34-36
 shutdown
!
interface Vlan 23
 description VTEP
 name NSX-OVERLAY
 ip address 192.168.23.254/24
 tagged GigabitEthernet 0/11-14,34-36
 no shutdown
!
interface Vlan 24
 description ISCSI
 name ISCSI
 ip address 192.168.24.254/24
 tagged GigabitEthernet 0/11-14,34-36
 untagged GigabitEthernet 0/10
 no shutdown
!
interface Vlan 25
 description NFS
 name NFS
 ip address 192.168.25.254/24
 tagged GigabitEthernet 0/11-14,34-36
 no shutdown
!
interface Vlan 26
 description VSAN
 name VSAN
 no ip address
 tagged GigabitEthernet 0/11-14,34-36
 shutdown
!
interface Vlan 100
 description V2P-PEERING
 name V2P-PEERING
 ip address 172.16.0.254/24
 tagged GigabitEthernet 0/11-14,34
 no shutdown

So the solution above is one way to do it: VLAN by VLAN. But what if I would like to configure two new ports into existing VLANs?

Let's assume I have two switch ports (gi 0/29 and gi 0/34) which I want to configure for an ESXi hypervisor. Below is the basic configuration of the switch ports.

interface GigabitEthernet 0/29
 description ESX01-nic1
 no ip address
 mtu 9216
 portmode hybrid
 switchport
 spanning-tree rstp edge-port
 no shutdown

interface GigabitEthernet 0/34
 description ESX01-nic0
 no ip address
 mtu 9216
 portmode hybrid
 switchport
 spanning-tree rstp edge-port
 no shutdown


Those who are familiar with Cisco switch operating systems would expect VLAN configuration along with the switch port configuration. Dell FTOS is different, because you have to configure VLANs from the VLAN's point of view, not from the switch port's point of view. Let's assume we have VLAN 4 for the vSphere management network segment, where ESXi hosts are connected natively without 802.1Q tagging. The rest of the VLANs we would like to expose to the ESXi hosts must be tagged. These VLANs are 2-3, 5-9, 11, 13, 22-26, 31-34, 51-52, 100-101. So, below are the FTOS CLI commands to add the two particular switch ports to all the required VLANs ...

conf
interface vlan 4
  untagged GigabitEthernet 0/29
  untagged GigabitEthernet 0/34

interface range vlan 2-3,vlan 5-9,vlan 11,vlan 13,vlan 22-26,vlan 31-34,vlan 51-52
  tagged GigabitEthernet 0/29
  tagged GigabitEthernet 0/34

interface range vlan 100-101
  tagged GigabitEthernet 0/29
  tagged GigabitEthernet 0/34


Note: In this particular case, I had to use two ranges because the FTOS interface range is limited. Below is the error message you get if you try to configure a single interface range with all the VLANs mentioned above.
 
interface range vlan 2-3,vlan 5-9,vlan 11,vlan 13,vlan 22-26,vlan 31-34,vlan 51-52,vlan 100-101
% Error: Exceeds maximum number of command arguments ( max = 32 ).

LAGs - Link Aggregates 

Link Aggregation is a general term for channeling multiple links into a single virtual aggregate, also known as a port channel. There are two types of port channels: static and dynamic (aka LACP). For more general information about link aggregation, look here.

Now let's see how you can configure port channels.

Static Port Channel
Below is an example of a static port channel bundling two interfaces (te 0/1 and te 0/2)

interface port-channel 1
  description "Static Port-Channel"
  channel-member tengigabit 0/1
  channel-member tengigabit 0/2
  no ip address
  switchport
  no shutdown

Dynamic Port Channel
Below is an example of a dynamic port channel bundling two interfaces (te 0/1 and te 0/2)
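A dynamic port channel replaces the static channel-member statements with LACP configured on each member interface, following the same pattern as the dynamic VLT example later in this post. A sketch (verify the exact syntax against the FTOS command reference for your release):

```
conf
interface port-channel 1
  description "Dynamic Port-Channel (LACP)"
  no ip address
  switchport
  no shutdown

interface tengigabit 0/1
  port-channel-protocol lacp
    port-channel 1 mode active
  no shutdown

interface tengigabit 0/2
  port-channel-protocol lacp
    port-channel 1 mode active
  no shutdown
```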

VLT (Virtual Link Trunking) is actually a virtual port channel spanned across multiple chassis (aka multi-chassis LAG). A VLT can be a static or dynamic port-channel. When two Force10 switches are configured in a single VLT domain, you create the VLT port-channel independently on each VLT node. You can read more about VLT here.

You configure a VLT port-channel on each node in exactly the same way as classic port-channels. The only difference is that you tell FTOS that this particular port-channel is a VLT and define the peer port-channel ID, which can differ from the ID on the other node. However, the best practice is to use the same port-channel IDs on both VLT nodes, just to keep the configuration simple and more readable.

The directive that marks a port-channel as a VLT is vlt-peer-lag.

So if the port-channel examples above were VLTs, the configuration would be the same with just one additional option. See the examples below.

Static VLT Port Channel

interface port-channel 1
  description "Static Port-Channel"
  channel-member tengigabit 0/1
  channel-member tengigabit 0/2
  vlt-peer-lag port-channel 1
  no ip address
  no shutdown

Dynamic VLT Port Channel

interface port-channel 1
  description "Dynamic Port-Channel (LACP)"
  no ip address
  vlt-peer-lag port-channel 1
  switchport
  no shutdown 

interface tengigabit 0/1
  port-channel-protocol lacp
    port-channel 1 mode active
  no shutdown 

interface tengigabit 0/2
  port-channel-protocol lacp
    port-channel 1 mode active
  no shutdown

Conclusion

Interface and VLAN configuration is a basic network operation. If you are familiar with any other switch vendor's interface configuration, I think Force10 interface configuration will be simple for you. The only different approach is the VLAN configuration, but that is just a matter of habit.

Hope you found this blog post useful, and as always, any comments and feedback are highly appreciated.

Monday, June 29, 2015

DELL Force10 : Virtual Routing and Forwarding (VRF)

VRF Overview

Virtual Routing and Forwarding (VRF) allows a physical router to partition itself into multiple virtual routers (VRs); multiple instances of a routing table co-exist within the same router at the same time. The control and data planes are isolated in each VR, so traffic does NOT flow across VRs.

DELL OS 9.7 supports up to 64 VRF instances. The number of instances may be increased in future versions, so check the current documentation for the authoritative number.

VRF Use Cases

VRF improves functionality by allowing network paths to be segmented without using multiple devices. Using VRF also increases network security and can eliminate the need for encryption and authentication due to traffic segmentation. 

Internet service providers (ISPs) often take advantage of VRF to create separate virtual private networks (VPNs) for customers; VRF is also referred to as VPN routing and forwarding.

VRF acts like a logical router; while a physical router may include many routing tables, a VRF instance uses only a single routing table. VRF uses a forwarding table that designates the next hop for each data packet, a list of devices that may be called upon to forward the packet, and a set of rules and routing protocols that govern how the packet is forwarded. These VRF forwarding tables prevent traffic from being forwarded outside a specific VRF path and also keep out traffic that should remain outside the VRF path.

VRF uses interfaces to distinguish routes for different VRF instances. Interfaces in a VRF can be either physical (Ethernet port or port channel) or logical (VLANs). You can configure identical or overlapping IP subnets on different interfaces if each interface belongs to a different VRF instance.

VRF Configuration

First of all, you have to enable the VRF feature.
conf
feature vrf
The next step is to create an additional VRF instance.
ip vrf tenant-1
The vrf-id is assigned automatically; however, if you want to configure the vrf-id explicitly, you can with an additional parameter. In the example below we use vrf-id 1.
ip vrf tenant-1 1
We are almost done. The last step is assigning interfaces to the particular VRF. You can assign the following interfaces:
  • Physical Ethernet interfaces (in L3 mode)
  • Port-Channel interfaces (static and dynamic/LACP)
  • VLAN interfaces
  • Loopback interfaces
Below is an example of how to assign VLAN 100 to the VRF instance tenant-1.
interface vlan 100
  ip vrf forwarding tenant-1
Configuration is pretty easy, right?

Working in a particular VRF instance
When you want to configure, show, or troubleshoot in a particular VRF instance, you have to explicitly specify which VRF you want to work in.

So, for example, when you want to ping from the tenant-1 VRF instance, you have to use the following command
ping vrf tenant-1 192.168.1.1
Conclusion

VRF is a great technology for L3 multi-tenancy. DELL Network Operating System 9 supports VRF, so you can design interesting network solutions with it.

Saturday, June 20, 2015

DELL Compellent Best Practices for Virtualization

All DELL Compellent best practice documents have been moved here.

The most interesting best practice document for me is "Dell Storage Center Best Practices with VMware vSphere 6.x".

VMware HA Error During VLT Failure

I have received the following message in my mailbox ...
Hi.
I have a customer that has been testing Force10 VLT with peer routing and VMWare and has encountered the warning message on all hosts during failover of the switches (S4810’s) only when the primary VLT node is failed
“vSphere HA agent on this host could not reach isolation address 10.100.0.1”
Does this impact HA at all?  Is there a solution?
Thanks
Paul 

Force10 is the legacy product name of DELL S-Series datacenter networking. Force10 S4810s are datacenter L3 switches. If you don't know what Force10 VLT is, look here. Generally, it is something like Cisco virtual Port Channel (vPC), Juniper MC-LAG, Arista MLAG, etc.

I think my answer can be valuable for broader networking and virtualization community so here it is ...

First of all, let's make some assumptions:
  • Force10 VLT is used for multi-chassis LAG capability
  • Force10 VLT peer routing is enabled in the VLT domain to achieve L3 routing redundancy
  • 10.100.0.1 is the IP address of the VLAN interface on the Force10 S4810 (primary VLT node), and this particular VLAN is used for vSphere management
  • 10.100.0.2 is the IP address on the Force10 S4810 secondary VLT node
  • vSphere 5.x or above is used

Root cause with explanation:
When the primary Force10 VLT node is down, a ping to 10.100.0.1 doesn't work, because peer-routing is an ARP proxy at L2. The secondary node will route traffic on behalf of the primary node, but 10.100.0.1 doesn't answer at L3, therefore ICMP doesn't work.

The VMware HA cluster (vSphere 5 and above) uses network and storage heartbeat mechanisms. The network mechanism uses the two probe algorithms listed below.
  1. ESXi hosts in the cluster send heartbeats to each other. This should work fine during a primary VLT node failure.
  2. ESXi hosts also ping the HA isolation addresses (the default HA isolation address is the default gateway, therefore 10.100.0.1 in your particular case). This does not work during a primary VLT node failure.

That's the reason the VMware HA cluster logs this situation.

Is there any impact?
There is no impact on the HA cluster, because
  • it is just an informative message; algorithm (1) works correctly and there is still network visibility among the ESXi hosts in the cluster, and
  • from vSphere 5 onward there is also a storage heartbeat mechanism, which can compensate for network invisibility among the ESXi hosts in the cluster.

Are there any potential improvements?
Yes, there are. You can configure multiple HA isolation addresses to mitigate default gateway unavailability. In your particular case I would recommend using two IP addresses (10.100.0.1 and 10.100.0.2), because at least one VLT node will always be available.
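For illustration, this is done with HA cluster advanced options (the option names are documented in VMware KB 1002117); a sketch for this particular case, using the two addresses from the scenario above, might look like:

```
das.usedefaultisolationaddress = false
das.isolationaddress0 = 10.100.0.1
das.isolationaddress1 = 10.100.0.2
```

Setting das.usedefaultisolationaddress to false stops HA from additionally probing the default gateway, which is already listed explicitly as the first isolation address here.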


For more information on how to configure multiple HA isolation addresses, look at http://kb.vmware.com/kb/1002117

Monday, June 15, 2015

No data visibility for vSphere Admin

Recently I did a very quick (time-constrained) conceptual/logical design exercise for a customer who had a virtualization-first strategy and was willing to virtualize his Tier 1 business-critical applications. One of his requirements was to preclude data visibility for VMware vSphere admins.

I quickly thought about how to fulfill this particular requirement, and my first general answer was ENCRYPTION. The customer asked me to tell him more about encryption possibilities, and I listed the following options.

Option 1/ Encryption in the Guest OS 

Product examples: Microsoft BitLocker, HyTrust, SafeNet, etc.
A very nice comparison of disk encryption software is here.

Option 2/ Application level encryption

Product examples: Transparent Data Encryption in SQL Server 2008 and higher, Oracle Database Transparent Data Encryption, etc.

Option 3/ Encryption in the Fibre Channel SAN

Examples are the Brocade SAN Encryption Solution and Cisco MDS 9000 Family Storage Media Encryption.

Option 4/ Encryption in the Disk Array

Data encryption behind the storage controllers, usually leveraging Self-Encrypting Drives (aka SED).

The next logical question was: what is the performance impact?
My quick answer was that there is definitely a performance overhead with software encryption, but no performance overhead with hardware encryption, as it is offloaded to special ASICs.

Hmm... right, the more appropriate answer would be that hardware solutions are designed to have no or negligible performance impact. I always recommend testing before any real use in production, but that is what hardware vendors claim, at least in their white papers. Specifically, in option (3) storage I/O has to be redirected to the encryption module/appliance in the SAN, which should take an order of magnitude less time than a typical I/O response, so the impact on storage latency should theoretically be none or negligible.

However, the problem with my recommended options is not the performance claim.
The problem is that only options 1 and 2 are applicable to fulfill the customer's requirement, because options 3 and 4 encrypt and decrypt at lower levels, and the data are decrypted and visible at the vSphere layer. Therefore, a vSphere admin would have visibility into the data.

Options 1 and 2 definitely have some performance overhead, nowadays generally somewhere between 20% and 100%, depending on the software solution, CPU family, encryption algorithm strength, encryption key length, etc.

For completeness, let's say that options 3 and 4 are good for different use cases.

  • Option 3 can help you secure data from a storage admin who does not have access to the SAN network, or from someone having physical access to the disks.
  • Option 4 can help you secure data on disks against theft of the physical storage or disks.

It is worth saying that security is always a trade-off.

Software-based solutions have some negative impact on performance, a medium negative impact on price, and also a negative impact on manageability. The performance of software-based solutions can be significantly improved by leveraging AES hardware offload in modern Intel processors, and the performance overhead will be mitigated year by year.

Pure hardware-based solutions are not applicable options for our specific requirement; but even if they were applicable, and even though they have no or negligible impact on performance, there are drawbacks such as a huge impact on cost and also some impact on scalability and manageability.

Conclusion
I was very quick during my consulting, and I didn't realize which options really fulfill the specific customer's requirement. I often say that I don't trust anybody, not even myself. This was exactly the case, unfortunately :-(

Time-constrained consulting usually doesn't deliver the best results. Good architecture needs some time for review and better comparison of options :-)

Thursday, May 28, 2015

How large is my ESXi core dump partition?

Today I was asked to check the core dump size on an ESXi 5.1 host, because this particular ESXi host experienced a PSOD (Purple Screen of Death) with a message that the core dump was not saved completely because it ran out of space.

To be honest, it took me some time to find out how to determine the core dump partition size, so I have documented it here.

All commands and outputs are from my home lab, where I have ESXi 6 booted from USB, but the principle should be the same.

To run these commands you have to log in to ESXi shell for example over SSH or ESXi troubleshooting console.

The first step is to find out which disk partition is used for the core dump.
 [root@esx01:~] esxcli system coredump partition get
   Active: mpx.vmhba32:C0:T0:L0:9
   Configured: mpx.vmhba32:C0:T0:L0:9
Now we know that the core dump is configured on disk mpx.vmhba32:C0:T0:L0, partition 9.

The second step is to list the disks and disk partitions together with their sizes.
 [root@esx01:~] ls -lh /dev/disks/
 total 241892188
 -rw-------  1 root   root    3.7G May 28 11:25 mpx.vmhba32:C0:T0:L0  
 -rw-------  1 root   root    4.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:1  
 -rw-------  1 root   root   250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:5  
 -rw-------  1 root   root   250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:6  
 -rw-------  1 root   root   110.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:7  
 -rw-------  1 root   root   286.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:8  
 -rw-------  1 root   root    2.5G May 28 11:25 mpx.vmhba32:C0:T0:L0:9  
You can get the same information with partedUtil.
 [root@esx01:~] partedUtil get /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:9
 326 255 63 5242880
Here you can see the partition has 5,242,880 sectors, where each sector is 512 bytes. That means 5,242,880 * 512 / 1024 / 1024 / 1024 = 2.5 GB.
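The same arithmetic, double-checked in a few lines of Python:

```python
# Size of the core dump partition reported by partedUtil:
# 5,242,880 sectors, 512 bytes per sector, converted to GB (binary).
sectors = 5242880
sector_size = 512                                   # bytes
size_gb = sectors * sector_size / 1024 / 1024 / 1024
print(size_gb)  # 2.5
```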

Note: It is 2.5 GB because ESXi is installed on a 4 GB USB flash drive. If you have a regular hard drive, the core dump partition should be 4 GB.

BUT all the above information is not valid if you have changed your scratch location (here is the VMware KB describing how to do it). If your scratch location has been changed, you can display the current scratch location, which is stored in /etc/vmware/locker.conf
 [root@esx01:~] cat /etc/vmware/locker.conf  
 /vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz 0  
and you can list the subdirectories in your custom scratch location
 [root@esx01:~] ls -la /vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz  
 total 28  
 d---------  7 root   root     4096 May 12 21:45 .  
 d---------  4 root   root     4096 May 3 20:47 ..  
 d---------  2 root   root     4096 May 3 21:17 core  
 d---------  2 root   root     4096 May 3 21:17 downloads  
 d---------  2 root   root     4096 May 28 09:30 log  
 d---------  3 root   root     4096 May 3 21:17 var  
 d---------  2 root   root     4096 May 12 21:45 vsantraces  

Please note that the new scratch location contains the custom core dump subdirectory (core) and also log subdirectory (log).  

Other considerations
I usually change the ESXi core dump partition and log directory location to a shared datastore. This is done by the following ESXi host advanced settings, fully described in this VMware KB:
  • CORE DUMP Location: ScratchConfig.ConfiguredScratchLocation
  • Log Location: Syslog.global.logDir and optionally Syslog.global.logDirUnique if you want to redirect all ESXi hosts to the same directory
I also recommend sending logs to the remote syslog server over the network which is done with an advanced setting 
  • Remote Syslog Server(s): Syslog.global.logHost
ESXi core dumps can also be transferred over the network to a central Core Dump Server. It has to be configured with the following esxcli commands.
 esxcli system coredump network set --interface-name vmk0 --server-ipv4 [Core_Dump_Server_IP] --server-port 6500  
 esxcli system coredump network set --enable true  
 esxcli system coredump network check  

Wednesday, May 06, 2015

DELL Force10 VLT and vSphere Networking

DELL Force10 VLT is a multi-chassis LAG technology. I wrote several blog posts about VLT, so for a VLT introduction, look at http://blog.igics.com/2014/05/dell-force10-vlt-virtual-link-trunking.html. All Force10-related posts are listed here. By the way, DELL Force10 S-Series switches have been renamed to DELL S-Series switches with DNOS 9 (DNOS stands for DELL Network Operating System); however, I'll keep using Force10 and FTOS in this series to keep it uniform.

In this blog post I would like to discuss a specific Force10 VLT failure scenario: what happens when the VLTi fails.

A VLT domain is actually a cluster of two VLT nodes (peers). One node is configured as primary and the second as secondary. The VLTi is the peer link between the two VLT nodes. The main role of the VLTi peer link is to synchronize MAC address/interface assignments, which is used for optimal traffic across VLT port-channels. In other words, if everything is up and running, data traffic over VLT port-channels (virtual LAGs) is optimized and the optimal link is chosen to eliminate traffic across the VLTi. The VLTi is used for data traffic only in case of a VLT link failure on one node while the corresponding VLT link is still available on the other node.

Now you may ask what happens in case of a VLTi failure. In this situation a backup link kicks in and acts as a backup communication link for the VLT domain cluster. This situation is called the split-brain scenario, and the exact behavior is nicely described in the VLT Reference Guide:
The backup heartbeat messages are exchanged between the VLT peers through the backup links of the OOB Management network. When the VLTI link (port-channel) fails, the MAC/ARP entries cannot be synchronized between the VLT peers through the failed VLTI link, hence the Secondary VLT Peer shuts the VLT port-channel forcing the traffic from the ToR switches to flow only through the primary VLT peer to avoid traffic black-hole. Similarly the return traffic on layer-3 also reaches the primary VLT node. This is Split-brain scenario and when the VLTI link is restored, the secondary VLT peer waits for the pre-configured time (delay-restore) for the MAC/ARP tables to synchronize before passing the traffic. In case of both VLTi and backup link failure, both the VLT nodes take primary role and continue to pass the traffic, if the system mac is configured on both the VLT peers. However there would not be MAC/ARP synchronization.
With all that said, let's look at some typical VLT topologies with a VMware ESXi host. The Force10 S4810 is an L3 switch, so the VLT domain can provide both switching and routing services. The upstream router is a single router for external connectivity. The ESXi host has two physical NIC interfaces.

First topology

The first topology uses VMware switch-independent connectivity. This is a very common and favorite ESXi network connectivity option because of its simplicity for the vSphere administrator.




The problem with this topology arises when the VLTi peer-link fails (red cross in the drawing). We already know that in this scenario the backup link kicks in and the VLT links on the secondary node are intentionally disabled (black cross in the drawing). However, our ESXi host is not connected via VLT, therefore the server-facing port stays up. The VLT domain doesn't know anything about the VMware vSwitch topology, so it must keep the port up, which results in a black-hole scenario (black circle in the drawing) for virtual machines pinned to VMware vSwitch Uplink 2.
I hear you: what is the solution to this problem? I think there are two solutions. The first, out-of-the-box solution is to use VLT down to the ESXi host, which is depicted in the second topology later in this post. The second solution could be to leverage UFD (Uplink Failure Detection) and track some VLT ports together with server-facing ports. I did not test this scenario, but I think it should work, and there is a big probability I'll have to test it soon.

Second topology

The second topology leverages VMware LACP. LACP connectivity is obviously more VLT-friendly, because VLT is established down to the server and the downlink to the ESXi host is correctly disabled. Virtual machines are not pinned directly to VMware vSwitch uplinks but are connected through the LACP virtual interface. That's the reason you will not experience a black-hole scenario for some virtual machines.







Conclusion

Server virtualization is nowadays in every modern datacenter. That's the reason virtual networking has to be taken into account in any datacenter network design. VMware switch-independent NIC teaming is simple for the vSphere administrator, but it can negatively impact network availability in some scenarios. Unfortunately, the VMware standard virtual switch doesn't support dynamic port-channels (LACP), only static port-channels. A static port-channel should work correctly with VLT, but LACP is recommended because of its keep-alive mechanism. LACP is available only with the VMware distributed virtual switch, which requires the highest VMware license (vSphere Enterprise Plus edition). VMware's distributed virtual switch with an LACP uplink is the best solution for Force10 VLT. In case of a budget or technical constraint, you have to design an alternative solution leveraging either a static port-channel (VMware calls it "IP hash load balancing") or FTOS UFD (Uplink Failure Detection) to mitigate the risk of a black-hole scenario.

Update 2015-05-13:
I have just realized that NPAR is actually a technical constraint preventing the use of port-channel technology on the ESXi host virtual switch. NPAR technology allows switch-independent network partitioning of physical NIC ports into more logical NICs. However, a port-channel cannot be configured on NPAR-enabled NICs, therefore UFD is probably the only solution to avoid a black-hole scenario when the VLT peer-link fails.