One of my philosophical rules is "Trust, but verify". A design verification test plan is a good way to confirm how the system you have designed actually behaves. A typical design verification test plan contains usability, performance and reliability tests.
A Force10 VLT domain is effectively a two-node cluster (the system) providing L2/L3 network services. Which network services your VLT domain should provide depends on customer requirements. However, the typical VLT customer requirement is high availability: eliminating network downtime when a system component fails or is being maintained by an administrator. Planning and executing reliability tests is a good way to verify that the customer's high availability requirements have been met.
Below are some reliability tests I think are worth executing. When my gear is back in my lab, I'll try to find some time to run the tests described below and publish real test results.
If you know of other tests that make sense to perform, please don't be shy: leave a comment and I'll run them for you.
Test #1
Description:
Simulate a failure of the secondary VLT Domain node and measure its impact on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use system A and system B, both connected to the VLT Domain via a VLT link
Ping from system A to system B at least 10 times per second
Power off the secondary VLT node
Measure the network disruption
Expected Results:
Traffic disruption should be sub-second.
Test Result: 
TBD
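One way to turn the ping output into a disruption figure is to look at the longest gap between consecutive replies. The sketch below is only an illustration, not part of any Force10 tooling: the function name `longest_outage_ms` and its parameters are my own, and it assumes you have already collected the arrival times (in seconds) of the echo replies seen on system A.

```python
from typing import List


def longest_outage_ms(reply_times_s: List[float], interval_s: float = 0.1) -> float:
    """Estimate the longest traffic disruption in milliseconds.

    reply_times_s: arrival times of ping replies, in seconds (assumed input).
    interval_s:    probe interval; 0.1 s corresponds to 10 pings per second.
    """
    if len(reply_times_s) < 2:
        raise ValueError("need at least two replies to measure a gap")

    times = sorted(reply_times_s)
    # Largest distance between two consecutive replies.
    longest_gap = max(later - earlier for earlier, later in zip(times, times[1:]))
    # A gap of exactly one interval means no probe was lost,
    # so only the time beyond one interval counts as outage.
    outage_s = max(0.0, longest_gap - interval_s)
    return round(outage_s * 1000.0, 3)
```

Keep in mind that the resolution of this measurement is one probe interval, so at 10 pings per second the figure is accurate only to roughly 100 ms. Ping faster if you need finer numbers.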
Test #2
Description:
Simulate a failure of the primary VLT Domain node and measure its impact on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use system A and system B, both connected to the VLT Domain via a VLT link
Ping from system A to system B at least 10 times per second
Power off the primary VLT node
Measure the network disruption
Expected Results:
Traffic disruption should be sub-second.
Test Result:
TBD
Test #3
Description:
Simulate the failure of one link in the VLTi (ISL) port-channel.
Tasks:
Use system A and system B, both connected to the VLT Domain via a VLT link
Ping from system A to system B at least 10 times per second
Pull out one cable participating in the VLTi static port-channel
Measure the network disruption
Expected Results:
The VLT Domain should keep working without traffic impact.
Test Result:
TBD
Test #4
Description:
Simulate the failure of all links in the VLTi (ISL) port-channel.
Tasks:
Use system A and system B, both connected to the VLT Domain via a VLT link
Ping from system A to system B at least 10 times per second
Pull out all cables participating in the VLTi static port-channel
Measure the network disruption
Expected Results:
The backup link should act as arbiter. The VLT Domain should keep working, but in split-brain mode, and only the primary VLT node should handle the traffic.
Test Result:
TBD
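The expected results of Test #3 and Test #4 can be written down as a tiny truth table, which is handy when comparing what you observe against what you expected. This is a toy model of the expectation stated above, not a description of the switch firmware: the function name and arguments are hypothetical, and the case where both the VLTi and the backup link are down is deliberately left undefined because these tests do not cover it.

```python
from typing import Optional


def expected_to_forward(role: str, vlti_up: bool, backup_link_up: bool) -> Optional[bool]:
    """Return whether a VLT node is expected to handle traffic.

    role: "primary" or "secondary" (the node's VLT role).
    Returns True/False for the cases described in Tests #3 and #4,
    or None for the double failure, which is outside this test plan.
    """
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")

    if vlti_up:
        # VLTi still has at least one working link (Test #3): no impact expected.
        return True
    if backup_link_up:
        # VLTi completely down, backup link arbitrates (Test #4):
        # only the primary node is expected to handle the traffic.
        return role == "primary"
    # VLTi and backup link both down: not defined by these tests.
    return None
```

For example, `expected_to_forward("secondary", vlti_up=False, backup_link_up=True)` gives `False`, which matches the expected result of Test #4.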
Test #5
Description:
Simulate a VLT Domain backup link failure. The backup link is configured as an IP heartbeat over out-of-band management.
Tasks:
Use system A and system B, both connected to the VLT Domain via a VLT link
Ping from system A to system B at least 10 times per second
Pull out the cable participating in the backup link
Measure the network disruption
Expected Results:
All traffic should flow correctly, but VLT should report the backup link failure.
Test Result:
TBD
Test #6
Description:
Simulate the failure of one link in a virtual link trunk (aka VLT or virtual port-channel).
Tasks:
Use system A and system B, both connected to the VLT Domain via a VLT link
Ping from system A to system B at least 10 times per second
Pull out a cable participating in the VLT
Measure the network disruption
Expected Results:
The port-channel should survive this failure.
Test Result:
TBD
These six tests should verify the basic high availability and resiliency of a Force10 VLT cluster.
All problems should be reported via SNMP and/or syslog to the central monitoring system, provided it is configured properly. That brings us to usability tests ... but that's another set of tests ...
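If you want to keep expected and measured results side by side, a small record per test is enough. The following is a minimal sketch under my own assumptions: the class and field names are hypothetical, only the two tests with a numeric expectation ("sub-second", taken here as less than 1000 ms) are listed, and the measured values stay empty until real tests are run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReliabilityTest:
    number: int
    description: str
    outage_limit_ms: float                       # measured outage must be below this
    measured_outage_ms: Optional[float] = None   # None means "TBD"

    def verdict(self) -> str:
        """Compare the measured outage with the expected limit."""
        if self.measured_outage_ms is None:
            return "TBD"
        return "PASS" if self.measured_outage_ms < self.outage_limit_ms else "FAIL"


# Tests #1 and #2 both expect a sub-second disruption.
plan: List[ReliabilityTest] = [
    ReliabilityTest(1, "Secondary VLT node failure", outage_limit_ms=1000.0),
    ReliabilityTest(2, "Primary VLT node failure", outage_limit_ms=1000.0),
]

if __name__ == "__main__":
    for test in plan:
        print(f"Test #{test.number} ({test.description}): {test.verdict()}")
```

The other tests have qualitative expectations (no traffic impact, a reported failure, a surviving port-channel), so you would need to pick your own pass criteria before adding them to such a plan.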
And please remember that TOO MUCH TESTING WOULD NEVER BE ENOUGH :-)
 
2 comments:
Hi David. Just wondering if you posted the results somewhere?
I am new to VLT and I have already read everything you've posted here about VLT. VERY HELPFUL!!!
Hi Anonymous.
First of all, thanks for your comment.
I do not publish any test results from customer projects, as they contain customer-specific information such as IP addresses, personal information, comments, etc.
This blog post was just a set of general recommendations for preparing and executing tests to validate a design and a particular implementation before it goes into production. All expected results should be achievable in an ideal situation. Test results are specific to a particular environment, and if a result is not as expected, the architect or implementer should understand why. It can be because of bad design, bad implementation, a physical/link error, human error, bad test assumptions, etc.
Hope this makes sense.