Last week I received the following question from one of my readers …
I came to your blog post http://blog.igics.com/2014/05/dell-force10-vlt-virtual-link-trunking.html and I am really happy that you shared this information with us. However, I was wondering if you have tested a scenario with four S4810s with VLT configured 2 x 2 and connected together (sometimes called mLAG). How do you continue to add VLT couples to the setup? I would be really happy if you could provide any info regarding such a setup.
So let's take a deep dive into a VLT port-channel between two Force10 VLT domains, also known as mVLT. Please note that VLT can be configured not only between two Force10 VLT domains but also between a Force10 VLT domain and another multi-chassis port-channel technology, for instance Cisco virtual Port Channel (vPC). However, this blog post focuses on the single-vendor mVLT solution on DELL S-Series switches (previously known as Force10 S-Series).
If you are not familiar with DELL Force10 VLT technology, read my previous blog post where VLT is described in detail. It is really important to understand VLT before you try to understand mVLT (multi-domain VLT). By the way, mVLT is called eVLT (Enhanced VLT) in Force10 documentation, so it might be a little bit confusing. Anyway, mVLT is nothing else than a regular virtual port-channel (VLT) between two VLT domains, so mVLT is quite a good term if you ask me.
mVLT Logical Design
The mVLT logical design is pretty straightforward. The goal is to stretch L2 over two datacenters without any loops. This topology is often called a loop-free topology, and it is depicted on the figure below from the spanning tree (STP) point of view. However, we would also like hardware and link redundancy, so a multi-chassis port-channel technology (Force10 VLT in our particular case) is used to keep a simple loop-free topology from the spanning tree point of view while adding switch unit and physical link redundancy. The Force10 mVLT solution is logically depicted on the figure below.
Please note that each VLT domain acts in spanning tree as a single logical switch.
DELL highly recommends using four links between VLT domains because of higher redundancy and optimal data flow. However, sometimes you are constrained by the number of links between sites. A two-link DCI is also a supported design, but it is not recommended because there is obviously lower link redundancy and therefore a higher probability of communication over the VLTi, which adds a hop and therefore latency. The two-link mVLT DCI, also known as the square design, is depicted on the figure below.
Even though the topology is loop free and from the logical view we have just one switch in each datacenter, a spanning tree protocol should still be enabled and configured as a safety net against human error or a VLT domain failure or split. Rapid Spanning Tree Protocol (RSTP) is good enough, so it is used later in the physical configurations.
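Once the switches are configured (the physical configurations follow below), you can sanity-check the resulting STP view from any of the four switches, for example:

! Verify RSTP root bridge election and port states
show spanning-tree rstp brief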
mVLT Physical Design and Configuration
The physical design below shows the connectivity of four (2 x 2) Force10 S4810 switches leveraging four links for the DCI port-channel (mVLT).
The physical design for a two-link DCI is depicted on the following schema.
Switch configuration snippets for the four-link mVLT are listed below for completeness. The two-link DCI is just a variation of a similar configuration, so you can simply reuse the four-link configuration and slightly change it; a sketch of the two-link variant follows.
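For illustration, a minimal sketch of what changes in the two-link (square) variant: each switch contributes a single 10G member to Port-channel 127 instead of two, so the second DCI-facing interface simply stays out of the LACP group (the interface numbering follows the four-link configuration below and is just an example):

interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 no shutdown
!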
DCA-SWCORE-A – acts as the primary Root Bridge in RSTP in case of a loop
!
hostname DCA-SWCORE-A
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 4096
!
vlt domain 1
 peer-link port-channel 128
 back-up destination 172.16.201.2
 primary-priority 1
 system-mac mac-address 02:00:00:00:00:01
 unit-id 0
 peer-routing
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.201.1/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!
DCA-SWCORE-B – acts as the secondary Root Bridge in RSTP in case of a loop
!
hostname DCA-SWCORE-B
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 8192
!
vlt domain 1
 peer-link port-channel 128
 back-up destination 172.16.201.1
 primary-priority 8192
 system-mac mac-address 02:00:00:00:00:01
 unit-id 1
 peer-routing
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.201.2/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!
DCB-SWCORE-A – acts as the tertiary Root Bridge in RSTP in case of a loop
!
hostname DCB-SWCORE-A
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 12288
!
vlt domain 2
 peer-link port-channel 128
 back-up destination 172.16.202.2
 primary-priority 1
 system-mac mac-address 02:00:00:00:00:02
 unit-id 0
 peer-routing
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.202.1/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!
DCB-SWCORE-B – acts as the quaternary Root Bridge in RSTP in case of a loop
!
hostname DCB-SWCORE-B
!
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 16384
!
vlt domain 2
 peer-link port-channel 128
 back-up destination 172.16.202.1
 primary-priority 8192
 system-mac mac-address 02:00:00:00:00:02
 unit-id 1
 peer-routing
 proxy-gateway lldp
  peer-domain-link port-channel 127
!
interface TenGigabitEthernet 0/46
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface TenGigabitEthernet 0/47
 no ip address
 mtu 12000
 port-channel-protocol LACP
  port-channel 127 mode active
 dampening 10 100 1000 60
 no shutdown
!
interface fortyGigE 0/56
 no ip address
 mtu 12000
 no shutdown
!
interface fortyGigE 0/60
 no ip address
 mtu 12000
 no shutdown
!
interface ManagementEthernet 0/0
 ip address 172.16.202.2/24
 no shutdown
!
interface Port-channel 127
 description "mVLT - interconnect link"
 no ip address
 mtu 12000
 switchport
 vlt-peer-lag port-channel 127
 no shutdown
!
interface Port-channel 128
 description "VLTi - interconnect link"
 no ip address
 mtu 12000
 channel-member fortyGigE 0/56,60
 no shutdown
!
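Once all four switches are configured, it is worth verifying the VLT state on each of them. A sketch of the standard FTOS checks (outputs not shown here):

! Verify VLT domain status, peer status and the VLTi
show vlt brief
! Verify per-LAG VLT status (Port-channel 127 should be up as a VLT LAG)
show vlt detail
! Verify the heartbeat over the out-of-band management network
show vlt backup-link
! Check for configuration mismatches between VLT peers
show vlt mismatch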
Conclusion
Force10 mVLT is a great technology for a loop-free L2 network topology. It can be leveraged for local loop-free topologies inside a single datacenter or as an L2 extension between datacenters. However, our networks are usually built to support IP traffic, so L3 considerations have to be addressed as well. Just think about default IP gateway behavior and the potential DCI trombone. That's where the other VLT features, peer-routing and proxy-gateway, come into play and mitigate the DCI trombone issue. You can see these technologies configured in the VLT configurations above. But that's another topic for another blog post.
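For convenience, these are the lines in the vlt domain block above that enable those two features (shown for DCA-SWCORE-A; the other switches are analogous):

vlt domain 1
 peer-routing
 proxy-gateway lldp
  peer-domain-link port-channel 127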
To be absolutely honest, I personally don't recommend L2 interconnects between datacenters without a good justification. I strongly recommend L3 datacenter interconnects, and when stretched L2 is needed, some network overlay technology can be leveraged. L3 guarantees independent availability zones and splits the L2 failure domain. On the other hand, such a network overlay needs some other bits and pieces, which in some cases increases complexity and cost. Therefore, mVLT can be seriously considered for cost-effective datacenter L2 extensions. That's a typical "it depends" scenario where these two design options have to be compared and the final decision clearly justified.
If you want to know more about these technologies or use cases, just ask and we can go deeper or broader. And as always, any feedback and/or comments are highly appreciated.
19 comments:
Thank you very much for this post, David.
Hello David,
What MAC address should be placed in the vlt domain config? Is it just a formal MAC address, or?
Thanks
Command "system-mac mac-address" is optional and Dell Networking OS automatically creates a VLT-system MAC address used for internal system operations.
Explicit configuration minimize the time required for the VLT system to synchronize the default MAC address of the VLT domain on both peer switches when one peer switch reboots.
VLT-system MAC address is used just for internal system operations therefore any MAC address can be used.
Read my blog post "Locally Administered Address Ranges" at http://blog.igics.com/2014/05/locally-administered-address-ranges.html to deside what MAC adresses to choose.
Hi David, why do we need mVLT if both VLT domains are already loop free? VLT domain 1 and VLT domain 2 are each seen as a single switch. So what is stopping us from just creating a simple VLT LAG between the two?
VLT is nothing else than Dell terminology for a LAG. Actually, a multi-chassis LAG (aka MLAG).
A single VLT domain is still two switches in the network, but in the STP topology they appear as a single logical switch (a single node).
When you have two VLT domains, they appear in the STP topology as two logical switches (two nodes).
mVLT is nothing else than a VLT (aka MLAG) between these two logical switches (four physical chassis, two in each VLT domain).
To make it more complicated, Dell/Force10 documentation also uses the term eVLT for mVLT. mVLT and eVLT are two names for the same thing.
Now, Dell has an enhancement of mVLT for L3 traffic optimization which can be useful for some use cases. Dell calls it mVLT Proxy Gateway. It is based on proxy ARP. That's the only difference between a normal VLT (MLAG) and mVLT (MLAG).
When you interconnect a Force10 VLT domain with a Cisco vPC domain (a pair of Nexus switches), the interconnect is VLT (Dell MLAG) on the Dell side and vPC (Cisco MLAG) on the Cisco side.
Does it make sense now?
So if I don't need mVLT Proxy Gateway, can I use VLT instead of mVLT between two two-switch VLT domains?
Short answer: yes.
Longer answer:
mVLT Proxy Gateway is an L3 function. So if you don't need it, don't use it.
If you are interested only in L2, you can use just a multi-chassis LAG (port-channel) between two VLT domains. You can call it VLT between two VLT domains. It is perfectly fine. Just FYI, it is exactly what mVLT means in Dell terminology. And to confuse it even more, in Dell documentation you can also see the term eVLT, which is just another name for mVLT.
VLT stands for "virtual link trunking", where trunk means a LAG (port-channel). Because Virtual Port-Channel (vPC) is a Cisco term, I assume that Dell (actually the Force10 company) used the term trunk, which is used by HP/3COM for a port-channel. In my opinion, the most precise term would be MLAG (multi-chassis link aggregation), but it is also used by some other vendors.
mVLT stands for "multi-domain VLT".
eVLT stands for "enhanced VLT".
Don't blame me for these different terms from different vendors. It seems to me that each vendor wants to be somehow unique, which confuses all practitioners in the field.
I hope it is absolutely clear now.
Hi David, yes, I know the terminology is a bit confusing. I just wanted to confirm the benefits of using mVLT over VLT for connecting two VLT domains. I guess at the end of the day it doesn't really matter. I can set up mVLT, and if I don't need L3, I just won't be using any of the mVLT enhancements.
Thanks again for the comprehensive answers!
Yes, Nick. That's an absolutely fair statement.
Hi David,
Do you have any insight into how LACP restores the connection in an mVLT topology whenever an incident occurs on one of the VLT pair members, like a reboot or HW failure? I have a similar mVLT/eVLT setup with PVST, and during my tests I've seen 6-10 lost pings where I was expecting 2-3 max, especially in a full-mesh mVLT topology.
Thanks,
David, thanks a lot for this.
A question on VLANs: for example, VLAN 2 has untagged Port-channel 127. Is there no way to "add" PC 127 to another VLAN?
Thanks
Frank
@Anonymous: Of course there is a way to manage VLANs on Port-channel 127. You can manage VLAN tagging as usual. You have to go to the config mode of the particular VLAN and configure it as tagged or untagged on the particular interface. See http://blog.igics.com/2015/07/dell-force10-interface-configuration.html for further information.
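For illustration, a minimal sketch (the VLAN IDs are just examples): in FTOS, VLAN membership is configured from the VLAN interface, not from the port-channel itself.

interface Vlan 10
 tagged Port-channel 127
!
interface Vlan 20
 tagged Port-channel 127
!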
Hey Dave,
thanks a lot for responding so fast.
I'll have a look :-)
Frank
Hi David, can we use this kind of architecture to create a disaster recovery solution? I wonder if we can stretch layer 2 between two datacenters and, using peer-routing and proxy-gateway, achieve HA for VMs. I'm not sure which address I should use as a default gateway for a VM.
Peter
Hi Peter. Yes, you can, but there are some buts, as always, and you have to know what the goal is, what you are addressing, and how the technology really works.
It also depends on whether you want it as DR or as cross-site/geo HA.
DR = recoverability
HA = availability
These two design qualities are different, as I try to explain here:
https://www.slideshare.net/davidpasek/metro-cluster-high-availability-or-srm-disaster-recovery-69964166
By the way, DR is not only about technical aspects but mainly about the recoverability process in a particular organization.
Anyway, from a technical point of view, here are a few buts ...
BUT #1
Stretched L2 is not the best approach for two DR sites because you will join two fault domains together. Yes, one L2 is IMHO a single fault domain because of spanning tree, broadcast storms, unknown unicast flooding, hair-pinning (aka tromboning), etc.
BUT #2
I designed and tested mVLT as an HA/DR solution for one customer a few years ago, and during pre-production tests I realized that the dynamic mVLT configuration works nicely, but when both core switches where the default gateway is configured fail, routing does not work even though you can ping the default gateway IPs. I have been told that a static mVLT configuration would solve it, but I never tested it.
BUT #3
mVLT is a proxy ARP solution, so you have 4 IP addresses (one per switch) which can be used as a default gateway for endpoints. Let's say you use IP addresses 192.168.100.1 (SW-A1), 192.168.100.2 (SW-A2), 192.168.100.3 (SW-B1), 192.168.100.4 (SW-B2) on your four switches, and 192.168.100.1 is configured on endpoint devices as the default GW. When SW-A1 is down, the other three switches can do the work on behalf of SW-A1 for running devices but not for newly spun-up devices. Why? Existing devices already know the MAC address of the default GW (192.168.100.1), so they know to which MAC address to send the traffic, and the other three switches can work on behalf of SW-A1; but a newly booted device trying to resolve the MAC address of IP 192.168.100.1 gets no answer, because that is not what mVLT does. It does not reply to ARP requests.
CONCLUSION
You can use it for DR and HA, but you must know what scenarios this technology addresses out of the box and what scenarios have to be addressed by some other orchestration tool or other technology. I personally believe Dell Force10 for L2 + VMware NSX for L3 services is a better way to go, but even this combination must be tested carefully.
NSX would give you L2 over L3 between sites and BGP or OSPF for L3 fail-over.
If you cannot use NSX, I believe another possible solution is to use mVLT for L2 and physically separated upstream routers with VRRP on top of the Force10 switches. Unfortunately, VRRP within Force10 mVLT was not supported back in the day. To be honest, I do not know what the latest status is.
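For illustration only, here is a minimal VRRP sketch for such upstream routers, assuming they use an FTOS-like syntax (the addresses and group ID are examples; this is not a validated design for the mVLT scenario above):

interface Vlan 100
 ip address 192.168.100.252/24
 vrrp-group 1
  priority 110
  virtual-address 192.168.100.1
 no shutdown
!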
David.
Hi, how would you connect 4 or more VLTi domains together? Do they all share the same port-channel, or is there one port-channel for each pair?
Hi Anonymous,
I think you mean "how to connect 4 or more VLT domains". VLTi is just an interconnect between two switch chassis.
Each VLT domain is a single logical switch from the logical topology point of view; therefore, there is one port-channel (mVLT in Force10 terminology) between each two VLT domains.
Hope this helps.