Friday, October 17, 2014

vCenter, Windows 2012 R2, .NET 3.5 issue


It is well known that vCenter Server 5.5 requires .NET Framework 3.5. It is quite easy to install it with the Server Manager GUI or with the following command:
dism /online /enable-feature /featurename:NetFX3 /all /Source:d:\sources\sxs /LimitAccess
The command above assumes the Windows 2012 DVD is in drive d:
 
... but I had an issue with the installation and got the following error.
PS C:\Users\Administrator> dism /online /enable-feature /featurename:NetFX3 /all /Source:d:\sources\sxs /LimitAccess

Deployment Image Servicing and Management tool
Version: 6.3.9600.17031

Image Version: 6.3.9600.17031

Enabling feature(s)
[===========================66.4%======                    ]

Error: 0x800f081f

The source files could not be found.
Use the "Source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077.

The DISM log file can be found at C:\Windows\Logs\DISM\dism.log
PS C:\Users\Administrator>

I discussed this issue with our Microsoft specialist, and he already knew the root cause and the fix. The root cause was a bad Windows update. Microsoft has already fixed it, so if you did not update during the affected period you should not experience this issue. However, if you do hit this bug, the only solution is to run the following Microsoft fix.
NDPFixit-KB3005628-X64.exe

Some more information about this issue:

HowTo

Monday, October 13, 2014

Fibre Channel NPV and NPIV

Customers and colleagues often ask me what the difference is between NPV and NPIV. I don't want to repeat information that has already been well written and explained by someone else, so please read this Tony Bourke blog post, which is IMHO very well written.

Just a quick summary.

NPV is the CISCO term for the same thing as Brocade Access Gateway or DELL Force10 NPG (NPIV Proxy Mode). All these technologies put the Fibre Channel switch into a mode where it has no Fibre Channel Domain ID and therefore works like an absolutely transparent Fibre Channel multiplexer, or an intelligent pass-through if you wish. This significantly simplifies SAN architectures and multivendor interoperability.

NPIV is a feature that allows a Fibre Channel switch to operate multiple FCIDs over a single Fibre Channel switch port. It effectively allows aggregation of multiple Fibre Channel nodes (N-Port IDs) on a single FC link.

Friday, October 10, 2014

Did you know? Mixing of FCoE and iSCSI on the same converged fabric...

Mixing of FCoE and iSCSI on the same converged fabric is not recommended and not supported by Dell.

Tuesday, September 16, 2014

Compellent Storage Center Live Volume and vSphere Metro Cluster

Are you interested in metro clusters (aka stretched clusters)?

Watch this video which introduces the new Synchronous Live Volume features available in Dell Compellent Storage Center 6.5.

And if you need a deeper technical dive, use this guide, which focuses on the two main data protection and mobility features available in Dell Compellent Storage Center: synchronous replication and Live Volume. In this paper, each feature is discussed and sample use cases are highlighted where these technologies fit independently or together.

Compellent Live Volume currently doesn't support automated fail-over based on an arbiter at a third site, which is why it is not certified as VMware vSphere Metro Cluster storage. Certification is just a matter of time. However, you can leverage Compellent Live Volume with vSphere. The only drawback is that a whole storage node fail-over has to be done manually, which can be sufficient, or even the preferred method, in some environments.


Wednesday, September 10, 2014

Tool for Network Assessment and Documentation

Do you need a tool for automated network assessment and documentation? Try NetBrain and let me know how you like it. I'm adding this tool to my list of things to test in my lab, so I'll write another blog post after the test.

NetBrain's deep network discovery will build a rich mathematical model of the network’s topology and underlying design. The data collected by the system is automatically embedded within every diagram and exportable to MS Visio, Word, or Excel.

NetBrain Personal Edition is the totally free version of NetBrain. It will let you discover up to 20 network devices and will never expire.

iSCSI and Ethernet

Each manufacturer of Ethernet switches may implement features unique to a specific model. Below are some general tips to look for when implementing an iSCSI network infrastructure. Each tip may or may not apply to a specific installation. Be aware that this list is inspired by DELL Compellent iSCSI best practices and it is not an all-inclusive list.
  • Bi-Directional Flow Control enabled for all Switch Ports that carry iSCSI traffic, including any inter switch links.
  • Separate networks or VLANs for iSCSI, apart from data traffic.
  • Separate networks or VLANs for iSCSI multi-path traffic as well.
  • Unicast storm control disabled on every switch that handles iSCSI traffic.
  • Multicast disabled at the switch level for any iSCSI VLANs - Multicast storm control enabled (if available) when multicast cannot be disabled.
  • Broadcast disabled at the switch level for any iSCSI VLANs - Broadcast storm control enabled (if available) when broadcast cannot be disabled.
  • Routing disabled between regular network and iSCSI VLANs - Use extreme caution if routing any storage traffic, performance of the network can be severely affected. This should only be done under controlled and monitored conditions.
  • Disable Spanning Tree (STP or RSTP) on ports which connect directly to end nodes (the server or Dell Compellent controller's iSCSI ports.) You can do it by enabling PortFast or EdgePort option  on these ports so that they are configured as edge ports.
  • Ensure that any switches used for iSCSI are of a non-blocking design.
  • Hard-set all switch ports and server ports to Gigabit Full Duplex, if applicable.
  • When deciding which switches to use, remember that you are running SCSI traffic over them. Be sure to use quality, managed, enterprise-class networking equipment. It is not recommended to use SBHO (small business/home office) class equipment outside of lab/test environments.
Do you want configuration examples for DELL PowerConnect and DELL Force10 switches? Leave a comment with the particular switch model and firmware version and I'll do my best to prepare them for you.


Tuesday, September 09, 2014

DELL Force10 switch and NIC Teaming

NIC teaming is a feature that allows multiple network interface cards in a server to be represented by one MAC address and one IP address in order to provide transparent redundancy, balancing, and to fully utilize network adapter resources. If the primary NIC fails, traffic switches to the secondary NIC because they are represented by the same set of addresses.

Let's assume we have a host with two NICs, where the primary NIC is connected to Force10 switch port 0/1 and the secondary NIC to switch port 0/5. When you use NIC teaming, consider that the server MAC address is originally learned on Port 0/1 of the switch and Port 0/5 is the failover port. When the NIC fails, the system automatically sends an ARP request for the gateway or host NIC to resolve the ARP and refresh the egress interface. When the ARP is resolved, the same MAC address is learned on the port where the ARP is resolved (in the previous example, this location is Port 0/5 of the switch). To ensure that the MAC address is disassociated from one port and re-associated with another port in the ARP table, configure the
mac-address-table station-move refresh-arp 
command on the Dell Networking switch at the time that NIC teaming is being configured on the server.

! NOTE: If you do not configure the mac-address-table station-move refresh-arp command, traffic continues to be forwarded to the failed NIC until the ARP entry on the switch times out.

UPDATE 2015-03-16:
I have just discovered another FTOS command ...
arp learn-enable
   Enable ARP learning using gratuitous ARP.

NIC teaming solutions can leverage gratuitous ARP, so in my opinion it is worth enabling.

This command should be very beneficial in VMware environments, where the VMware vSwitch notifies the physical network after a VM is vMotioned from one ESXi host to another.
Strictly speaking, the ESXi host doesn't use gratuitous ARP for this but reverse ARP (aka RARP). Anyway, these two commands are beneficial for VMware vMotion.



Monday, September 08, 2014

Redirect ESXi syslog and coredump over network

Let's assume we have a syslog server at IP address [SYSLOG-SERVER] and a coredump server at [COREDUMP-SERVER]. Here are the CLI commands to quickly and effectively configure network redirection.

REDIRECT SYSLOG  
esxcli system syslog config set --loghost=udp://[SYSLOG-SERVER]
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli network firewall refresh
esxcli system syslog reload
VERIFY SYSLOG SETTINGS

esxcli system syslog config get


REDIRECT COREDUMP
esxcli system coredump network set --interface-name vmk0 --server-ipv4 [COREDUMP-SERVER] --server-port 6500
esxcli system coredump network set --enable true
VERIFY COREDUMP SETTINGS 
esxcli system coredump network check
By the way, did you know that VMware vCenter Server Appliance works as a syslog and coredump server? So why not use it? It is free of charge.

vCenter Log Insight is a much better syslog server because you can easily search centralized logs and do some advanced analytics, but that's another topic.

Sunday, September 07, 2014

vSphere HA Cluster Redundancy

All vSphere administrators and implementers know how easily a vSphere HA Cluster can be configured. However, sometimes a quick and simple configuration doesn't do exactly what is expected. You can, and typically you should, enable Admission Control in the vSphere HA Cluster configuration settings. VMware vSphere HA Admission Control is a control mechanism that checks whether another VM can be powered on in an HA-enabled cluster while still satisfying the redundancy requirement. So far so good. However, complexity starts here, because you have several options for which algorithm to use to fulfill your spare capacity redundancy requirement. So what options do you have?

Admission Control can be configured with one of the following three algorithms:
  1. Define fail-over capacity by static number of hosts
  2. Define fail-over capacity by reserving a percentage of cluster resources
  3. Use dedicated fail-over hosts
Let's deep dive into each option ...

Algorithm 1 is generally N+X host redundancy
When N+X redundancy is required, most vSphere designers go with this option because it looks like the most suitable choice. However, it is important to know that this particular algorithm works with the HA Slot Size. The HA Slot Size is calculated from the reservations defined on powered-on VMs. If you don't use CPU/MEM reservations per VM, then default reservation values (32 MHz, memory virtualization overhead) are used for the HA Slot Size calculation. By the way, VMware recommends setting reservations on resource pools and not per VM, so there is a relatively high probability that you don't have VM reservations and you will have a very low HA Slot Size. That means Admission Control will allow a lot of VMs to be powered on, which introduces high resource over-allocation, and your N+1 redundancy can suffer significantly. On the other hand, if you have just one VM with huge CPU/MEM reservations, it can significantly skew the HA Slot Size, with a negative impact on your VM consolidation ratio.

How can we solve this problem? One solution is HA Cluster Advanced Options described below.

The maximum HA Slot Size can be limited with the following two advanced options.
  • das.slotcpuinmhz - Defines the maximum bound on the CPU slot size. If this option is used, the slot size is the smaller of this value or the maximum CPU reservation of any powered-on virtual machine in the cluster.
  • das.slotmeminmb - Defines the maximum bound on the memory slot size. If this option is used, the slot size is the smaller of this value or the maximum memory reservation plus memory overhead of any powered-on virtual machine in the cluster.
This helps in a situation where you have one VM with high CPU or RAM reservations. Such a VM will not increase the HA Slot Size; instead, it consumes multiple smaller HA Slots.

Default VM reservation values for the HA slot calculation can be defined with two more advanced options.
  • das.vmcpuminmhz - Defines the default CPU resource value assigned to a virtual machine if its CPU reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 32MHz.
  • das.vmmemoryminmb - Defines the default memory resource value assigned to a virtual machine if its memory reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 0 MB.
Default VM reservation values can help you define the HA Slot Size you want, but the result doesn't automatically correspond to the required overbooking and planned spare fail-over capacity, because the HA Slot Size is not proportional to VM sizes in a particular cluster. If you really want one real spare host of fail-over capacity, you have to go with option 3 (Use dedicated fail-over hosts).
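
To make the slot logic more tangible, here is a minimal Python sketch of the calculation as described above. It is a simplified model, not VMware's implementation: the VM class, the function names, the host size (33,600 MHz, 128 GB) and all VM values are assumptions made up for illustration. Only the 32 MHz / 0 MB defaults and the meaning of the four das.* options come from the text.

```python
from dataclasses import dataclass


@dataclass
class VM:
    cpu_reservation_mhz: int  # 0 = no CPU reservation
    mem_reservation_mb: int   # 0 = no memory reservation
    mem_overhead_mb: int      # memory virtualization overhead (illustrative value)


def slot_size(vms, vm_cpu_min_mhz=32, vm_mem_min_mb=0,
              slot_cpu_cap_mhz=None, slot_mem_cap_mb=None):
    """Return (cpu_mhz, mem_mb) of one HA slot for the powered-on VMs.

    vm_cpu_min_mhz / vm_mem_min_mb model das.vmcpuminmhz / das.vmmemoryminmb,
    slot_cpu_cap_mhz / slot_mem_cap_mb model das.slotcpuinmhz / das.slotmeminmb.
    """
    vms = list(vms)
    # Largest reservation wins; VMs without a reservation use the default.
    cpu = max((vm.cpu_reservation_mhz or vm_cpu_min_mhz) for vm in vms)
    mem = max((vm.mem_reservation_mb or vm_mem_min_mb) + vm.mem_overhead_mb
              for vm in vms)
    # Optional upper bounds on the slot size.
    if slot_cpu_cap_mhz is not None:
        cpu = min(cpu, slot_cpu_cap_mhz)
    if slot_mem_cap_mb is not None:
        mem = min(mem, slot_mem_cap_mb)
    return cpu, mem


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host offers as many slots as its scarcer resource allows."""
    return min(host_cpu_mhz // slot[0], host_mem_mb // slot[1])


# Hypothetical host: 16 cores x 2.1 GHz, 128 GB RAM.
HOST = (33600, 131072)
small_vms = [VM(0, 0, 100), VM(0, 0, 120)]       # no reservations at all
mixed_vms = small_vms + [VM(4000, 16384, 200)]   # one VM with a huge reservation

print(slots_per_host(*HOST, slot_size(small_vms)))   # 1050 -> almost no protection
print(slots_per_host(*HOST, slot_size(mixed_vms)))   # 7    -> skewed by one VM
print(slots_per_host(*HOST, slot_size(mixed_vms, slot_cpu_cap_mhz=500,
                                      slot_mem_cap_mb=1024)))  # 67
```

The three printed numbers illustrate the three situations discussed above: no reservations (tiny slot, many slots per host), one big reservation (huge slot, very few slots per host), and a capped slot size.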

Algorithm 2 : percentage cluster spare capacity
This algorithm doesn't use the HA Slot Size. It simply calculates the total cluster CPU/MEM resources and decreases them by the spare capacity defined as a percentage. The remaining cluster resources are further decreased by the reservations of powered-on VMs, and new VMs can be powered on only while some cluster resources are still available. Quite clear and simple, right? However, it also requires VM reservations; otherwise you will end up with an over-allocated cluster and an overbooking ratio that is too high, which can introduce performance issues. So once again, if you really want one real spare host of fail-over capacity without dealing with VM reservations, the best way is to go with option 3 (Use dedicated fail-over hosts).
Note that algorithm 2 doesn't use the HA Cluster Advanced Options related to HA Slots mentioned above. However, das.vmCpuMinMHz and das.vmMemoryMinMB can be used to set default reservations. For more details read this.
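
The percentage-based check can be sketched in a few lines of Python. This is a simplified model of the description above, not the real vSphere HA code: the function name, the cluster size (4 hosts of 33,600 MHz and 128 GB each), the 25% spare capacity and the per-VM values are all assumptions for illustration.

```python
def can_power_on(total_mhz, total_mb, reserved_pct, powered_on, candidate):
    """Simplified percentage-based admission check.

    powered_on: list of (cpu_reservation_mhz, mem_reservation_mb) tuples
    candidate:  (cpu_reservation_mhz, mem_reservation_mb) of the new VM
    """
    # Cluster capacity left after setting aside the fail-over percentage.
    usable_mhz = total_mhz * (100 - reserved_pct) / 100
    usable_mb = total_mb * (100 - reserved_pct) / 100
    # Capacity already claimed by reservations of powered-on VMs.
    used_mhz = sum(cpu for cpu, _ in powered_on)
    used_mb = sum(mem for _, mem in powered_on)
    return (used_mhz + candidate[0] <= usable_mhz
            and used_mb + candidate[1] <= usable_mb)


# Hypothetical cluster: 4 hosts x (33,600 MHz, 128 GB), 25% reserved for fail-over.
TOTAL_MHZ, TOTAL_MB, PCT = 4 * 33600, 4 * 131072, 25

# 3000 VMs counted with defaults only (32 MHz, 100 MB overhead): still admitted.
no_reservations = [(32, 100)] * 3000
print(can_power_on(TOTAL_MHZ, TOTAL_MB, PCT, no_reservations, (32, 100)))  # True

# 48 VMs with realistic reservations: the 49th no longer fits into memory.
with_reservations = [(2000, 8192)] * 48
print(can_power_on(TOTAL_MHZ, TOTAL_MB, PCT, with_reservations, (2000, 8192)))  # False
```

The first case shows why missing reservations lead to heavy overbooking: admission control keeps saying yes long after the cluster is realistically full.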

Algorithm 3 : dedicated fail-over hosts
This algorithm simply dedicates the specified hosts to be unused during normal conditions and used only in case of an ESXi host failure. Multiple dedicated fail-over hosts are supported since vSphere 5.0. This algorithm keeps your capacity and performance absolutely predictable and independent of VM reservations. You'll get exactly what you configure.

UPDATE 2018-01-09: for some additional details about dedicated fail-over hosts read the blog post Admission Control - Dedicated fail-over hosts.

CONCLUSION
So which option should you use? The correct answer is, as usual, ... it depends :-)

However, if VM reservations are not used and absolutely predictable N+X redundancy is required I currently recommend Option 3.

If you have a mental problem with leaving some ESXi hosts unused while the cluster is in a non-degraded state (isn't that exactly what is required?), I recommend Option 1, but VM reservations must be used to get a realistic HA Slot Size. With this option, an artificial HA Slot can be designed by leveraging the advanced options.

If you don't want to deal with HA Slots and you want to use all ESXi hosts in the cluster, you can use Option 2, but VM reservations must be used to provide some capacity guarantee and avoid a high overbooking ratio.

FEATURE REQUEST
It would be great if VMware vSphere had some kind of Cluster Reservation policy for VMs. For example, if you wanted to guarantee a 2:1 overbooking of cluster resources, you would set up 50% CPU and 50% RAM reservations for each VM running in the HA Cluster. This policy should be dynamic, so if someone changes the VM size from a CPU or RAM perspective, the reservations would be recalculated automatically.

Let's break down the example above. We assume the following HA CLUSTER RESERVATION POLICY => CPU 50%, RAM 50% assigned to our HA Cluster. Let's power on a VM with 2 vCPUs and 6GB RAM. The dynamic reservation calculation is quite easy from the RAM perspective, because the memory reservation would be 3GB (50% of 6GB). It is a little more complicated from the CPU perspective. The dynamic CPU reservation has to be calculated based on the physical CPU the VM is running on. So let's assume we have an Intel Xeon E5-2450 @ 2.1GHz. 50% of 2.1GHz is 1.05GHz, but we have 2 vCPUs, so we have to multiply it by 2. Therefore the dynamic CPU reservation for our VM is 2.1GHz. I believe that with such a dynamic reservation policy we would be able to guarantee the overbooking ratio and define cluster redundancy more predictably from the overbooking and performance degradation point of view.
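
The arithmetic of this wished-for policy is simple enough to sketch in Python. Keep in mind that no such feature exists in vSphere as far as this post is concerned; the function name and its parameters are purely hypothetical, and values are given in MHz and MB to keep the math in integers.

```python
def dynamic_reservation(vcpus, vm_mem_mb, core_mhz, cpu_pct=50, mem_pct=50):
    """Hypothetical cluster reservation policy: reserve a fixed percentage
    of whatever the VM is configured with.

    Returns (cpu_reservation_mhz, mem_reservation_mb).
    """
    cpu_mhz = vcpus * core_mhz * cpu_pct // 100
    mem_mb = vm_mem_mb * mem_pct // 100
    return cpu_mhz, mem_mb


# VM with 2 vCPUs and 6 GB RAM on a 2.1 GHz physical CPU, policy 50% / 50%.
print(dynamic_reservation(vcpus=2, vm_mem_mb=6144, core_mhz=2100))
# (2100, 3072) -> 2.1 GHz CPU and 3 GB RAM reserved
```

If somebody resized the VM to 4 vCPUs, the same call would yield 4200 MHz, which is the "dynamic" part of the idea.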

FEEDBACK 
I would like to know what your preferred HA Cluster Admission Control setting is. So, don't hesitate to leave a comment and share your thoughts with the community. Any feedback is very welcome and highly appreciated.

Friday, September 05, 2014

EVO:RAIL Introduction Video

The EVO:RAIL introduction video is quite impressive. Check it out yourself at

https://www.youtube.com/watch?v=J30zrhEUvKQ

I'm really looking forward to the first EVO:RAIL implementation.

Friday, August 29, 2014

How to clear all jobs on DELL Lifecycle Controller via iDRAC

When you have a problem with DELL Lifecycle Controller jobs, you can delete all jobs with a single iDRAC command. This command

racadm -r <ip address> -u <user name> -p <password> jobqueue delete -i JID_CLEARALL_FORCE

deletes all of the jobs plus the orphaned pending and restarts the data manager service on the iDRAC. It will take about 90-120 secs before the iDRAC is able to process another job.

Occasionally the iDRAC may need to be reset after issuing the above command due to other issues outside the job queue processing. In such case you can issue commands in the following order:
  • racadm -r <ip address> -u <user name> -p <password> jobqueue delete -i JID_CLEARALL_FORCE
  • Wait 120 secs
  • racadm -r <ip address> -u <user name> -p <password> racreset




Sunday, August 17, 2014

Network communications between virtual machines

I was contacted by a colleague of mine who pointed to a frequently repeated statement about network communication between virtual machines on the same ESXi host. One such statement is cited below.
"Network communications between virtual machines that are connected to the same virtual switch on the same ESXi host will not use the physical network. All the network traffic between the virtual machines will remain on the host."
He was discussing this topic within his team, and even though they are very skilled virtualization administrators, they had doubts about the real behavior. I generally agree with the statement above, but it is actually correct only in the specific situation when the virtual machines are in the same L2 segment (the same broadcast domain, usually a VLAN).

Figure 1 - L3 routing on physical network
I've prepared the drawing above to explain the real behavior clearly. Network communication between VM1 and VM2 will stay on the same ESXi host because they are in the same L2 segment. However, communication between VM1 and VM3 has to go to the physical switch (pSwitch) to be routed between VLAN 100 and VLAN 200, and then return to the ESXi host and VM3.

The statement discussed above can be slightly reformulated to be always correct.
"Network communications between virtual machines that are connected to the same virtual switch portgroup on the same ESXi host will not use the physical network. All the network traffic between the virtual machines will remain on the host."
Both the VMware standard vSwitch and the distributed vSwitch are dumb L2 switches, so L3 routing must be done somewhere else, typically on physical switches. However, there are two scenarios in which even L3 traffic between virtual machines on the same ESXi host can stay there and not use the physical network.

The first scenario is when L3 routing is done on a virtual machine running on top of the same ESXi host. Examples of such virtual routers are VMware's vShield Edge, Brocade's Vyatta, CISCO's CSR, the open source router pfSense, or some other general OS with routing services. This scenario, also known as network function virtualization, is depicted in Figure 2.

Figure 2 - L3 routing on virtual machine (Network Function Virtualization)
It is worth mentioning that L3 traffic between VM5 and VM6 will go through the physical network because the L3 router is on another ESXi host.

The second scenario is when a distributed virtual router like VMware's NSX is used. This scenario is depicted in Figure 3. In this scenario, L2 and L3 traffic of all virtual machines running on the same ESXi host is optimized and will remain on the host without using the physical network.

Figure 3 - Distributed Virtual Routing (VMware NSX)

So in our particular scenario, L2 and L3 network communication among VM1, VM2, VM3 and VM4 will stay on the same ESXi host. The same applies to VM5 and VM6.
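
The three scenarios can be summarized as a small decision function. This is only an illustrative Python model of the rules described in this post: the VM class, the host names and the routing mode names are assumptions, and the VLAN assignments follow the VLAN 100 / VLAN 200 example from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    host: str  # ESXi host the VM runs on
    vlan: int  # L2 segment of the VM's portgroup


def stays_on_host(a, b, routing="physical", router_host=None):
    """Return True if traffic between a and b never touches the physical network.

    routing: "physical"       - L3 routing on a physical switch (Figure 1)
             "virtual_router" - L3 routing on a router VM (Figure 2)
             "distributed"    - distributed virtual routing (Figure 3)
    router_host: ESXi host running the router VM (virtual_router mode only)
    """
    if a.host != b.host:
        return False  # different hosts always need the physical network
    if a.vlan == b.vlan:
        return True   # same L2 segment is switched inside the host
    if routing == "distributed":
        return True   # routing happens on every host
    if routing == "virtual_router":
        return router_host == a.host  # only if the router VM is local
    return False      # routed by the physical switch


vm1, vm2, vm3 = VM("VM1", "esx1", 100), VM("VM2", "esx1", 100), VM("VM3", "esx1", 200)
print(stays_on_host(vm1, vm2))                                    # True
print(stays_on_host(vm1, vm3))                                    # False
print(stays_on_host(vm1, vm3, "virtual_router", router_host="esx1"))  # True
print(stays_on_host(vm1, vm3, "distributed"))                     # True
```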

I hope I've covered all possible scenarios and that this blog post will be helpful to others during similar discussions in virtualization teams. And as always, comments are very welcome.

Tuesday, July 29, 2014

Force10 VLT - Design Verification Test Plan


One of my philosophical rules is "Trust, but Verify". A Design Verification Test Plan is a good approach to be sure how the system you have designed behaves. A typical design verification test plan contains Usability, Performance and Reliability tests.

A Force10 VLT domain configuration is actually a two-node cluster (the system) providing L2/L3 network services. Which network services your VLT domain should provide depends on customer requirements. However, a typical VLT customer requirement is to have high availability and eliminate network downtime when some system component fails or is being maintained by an administrator. Planning and executing reliability tests is a good approach to verify that the customer's high availability requirements have been achieved.

Below are some reliability tests I think are worth executing. When my gear is back in my lab, I'll try to find some time to execute the tests described below and publish real test results.

If you know about some other tests which make sense to perform, please don't be shy, leave a comment and I'll do it for you.

Test #1
Description: 
Simulate VLT Domain secondary node failure impact on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Power Off secondary VLT node
Measure network disruption
Expected Results:
The disruption should be sub-second.
Test Result:
TBD

Test #2
Description: 
Simulate VLT Domain primary node failure impact on Ethernet traffic. How long (in ms) is traffic disrupted?
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Power Off primary VLT node
Measure network disruption
Expected Results:
The disruption should be sub-second.
Test Result:
TBD

Test #3
Description: 
Simulate one link from VLTi (ISL) port-channel failure.
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out one cable participating in VLTi static port-channel
Measure network disruption
Expected Results:
The VLT Domain should still be working without traffic impact.
Test Result:
TBD

Test #4
Description: 
Simulate all links from VLTi (ISL) port-channel failure.
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out all cables participating in VLTi static port-channel
Measure network disruption
Expected Results:
The backup link should act as an arbiter. The VLT Domain should still be working, but in split-brain mode, and only the primary VLT node should handle the traffic.
Test Result:
TBD

Test #5
Description: 
Simulate VLT Domain backup link failure. Backup link configured as IP heartbeat over out-of-band management.
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out cable participating in backup link
Measure network disruption
Expected Results:
All traffic should work correctly but VLT should report backup link failure.
Test Result:
TBD

Test #6
Description: 
Simulate one link failure on some virtual link trunk (aka VLT or virtual port-channel).
Tasks:
Use system A and system B both connected via VLT link to VLT Domain  
Ping from system A to system B at least 10x per second
Pull out cable participating in VLT
Measure network disruption
Expected Results:
Port channel should survive this failure.
Test Result:
TBD

These six tests should verify the basic high availability and resiliency of a Force10 VLT cluster.
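
For the "Measure network disruption" step, one simple approach is to record for each ping whether a reply arrived and then look for the longest run of lost replies. Below is a minimal Python sketch of that idea. The function name and the sample data are assumptions for illustration; the 100 ms interval corresponds to the 10 pings per second used in the tests above, and it also limits the resolution of the measurement.

```python
def longest_outage_ms(replies, interval_ms=100):
    """Estimate the longest traffic disruption from a ping log.

    replies: sequence of booleans, one per ping sent at a fixed interval
             (True = reply received, False = lost)
    interval_ms: time between pings; 100 ms equals 10 pings per second
    """
    longest = current = 0
    for ok in replies:
        # Extend the current run of losses, or reset it on a reply.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval_ms


# Made-up sample: 2 s of good traffic, 4 lost pings, then recovery.
sample = [True] * 20 + [False] * 4 + [True] * 20
print(longest_outage_ms(sample))  # 400 -> roughly 0.4 s disruption, sub-second
```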

All problems should be reported via SNMP and/or syslog to the central monitoring system, provided it is configured properly. That could move us on to Usability Tests ... but that's another set of tests ...

And please remember that TOO MUCH TESTING WOULD NEVER BE ENOUGH :-)

Wednesday, July 23, 2014

CISCO UDLD alternative on Force10

I've been asked by a DELL System Engineer whether we support CISCO's UDLD feature, because it was required in an RFI. Well, the DELL Force10 Operating System has a similar feature solving the same problem, and it is called FEFD.

Here is the explanation from FTOS 9.4 Configuration Guide ...

FEFD (Far-end failure detection) is supported on the Force10 S4810 platform. FEFD is a protocol that senses remote data link errors in a network. FEFD responds by sending a unidirectional report that triggers an echoed response after a specified time interval. You can enable FEFD globally or locally on an interface basis. Disabling the global FEFD configuration does not disable the interface configuration.

Figure caption: Configuring Far-End Failure Detection

The report consists of several packets in SNAP format that are sent to the nearest known MAC address. In the event of a far-end failure, the device stops receiving frames and, after the specified time interval, assumes that the far-end is not available. The connecting line protocol is brought down so that upper layer protocols can detect the neighbor unavailability faster.

Update 2015-05-20:
If I understand it correctly, the main purpose of CISCO's UDLD is to detect potential unidirectional links and mitigate the risk of a loop in the network, because STP cannot help in this scenario. Force10 has another feature to prevent a loop in such a situation: STP loop guard.

The STP loop guard feature provides protection against Layer 2 forwarding loops (STP loops) caused by a hardware failure, such as a cable failure or an interface fault. When a cable or interface fails, a participating STP link may become unidirectional (STP requires links to be bidirectional) and an STP port does not receive BPDUs. When an STP blocking port does not receive BPDUs, it transitions to a Forwarding state. This condition can create a loop in the network.

Sunday, July 13, 2014

Heads Up! VMware virtual disk IOPS limit bad behavior in VMware ESX 5.5

I've been informed about strange behavior of VM virtual disk IOPS limits by one of my customers, for whom I did a vSphere design recently. If you don't know how VM vDisk IOPS limits can be useful in some scenarios, read my other blog post, "Why use VMware VM virtual disk IOPS limit?". Because I designed this technology in for some of my customers, they are heavily impacted by the bad vDisk IOPS limit behavior in ESX 5.5.

I've tested VM IOPS limits in my lab to see it for myself. Fortunately, I have two labs: an older vSphere 5.0 lab with Fibre Channel Compellent storage and a newer vSphere 5.5 lab with EqualLogic iSCSI storage. First of all, let's look at how it works in ESX 5.0. ESX 5.1 behaves the same way, and this behavior makes perfect sense.

By default, VM vDisks don't have limits, as seen in the next screenshot.


When I run IOmeter with a single worker (thread) on an unlimited vDisk, I can achieve 4,846 IOPS. That's what the datastore (physical storage) is able to give to a single thread.


When I run IOmeter with two workers (threads) on an unlimited vDisk, I can achieve 7,107 IOPS. That's OK, because all shared storage systems implement algorithms that limit performance per thread. That's actually protection against a single thread consuming all the storage performance.


Now let's use SIOC to set a 200 IOPS limit on both vDisks of the VM, as depicted in the picture below.


Due to the settings above, the workload generated by a single IOmeter worker is limited to 400 IOPS (2 x 200) for the whole VM, because all limit values are consolidated per virtual machine per LUN. For more info, look at http://kb.vmware.com/kb/1038241. So it behaves as expected: IOmeter IOPS was oscillating between 330 and 400 IOPS, as you can see in the picture below.
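
The "consolidated per virtual machine per LUN" rule can be expressed as a short Python sketch. It models only the expected behavior described in this post and in the VMware documentation quoted in my other post (the limit is enforced only when every vDisk of the VM on that datastore has a limit). The function name and the datastore names are assumptions for illustration.

```python
from collections import defaultdict


def effective_vm_limits(vdisks):
    """Return the expected aggregate IOPS limit of one VM per datastore (LUN).

    vdisks: list of (datastore_name, iops_limit) tuples for a single VM,
            where iops_limit is None for an unlimited vDisk.
    A datastore maps to None when at least one vDisk on it is unlimited,
    because the limit is then not enforced for the VM.
    """
    per_lun = defaultdict(list)
    for datastore, limit in vdisks:
        per_lun[datastore].append(limit)
    return {
        ds: (None if None in limits else sum(limits))
        for ds, limits in per_lun.items()
    }


# Two vDisks with a 200 IOPS limit each on the same datastore.
print(effective_vm_limits([("ds1", 200), ("ds1", 200)]))   # {'ds1': 400}
# One vDisk left unlimited: no effective limit for the VM on that datastore.
print(effective_vm_limits([("ds1", 200), ("ds1", None)]))  # {'ds1': None}
```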


We can observe similar behavior with two workers.


So in the ESX 5.0 lab everything works as expected. Now let's move to the other lab, where I have vSphere 5.5. It has iSCSI storage, so first of all we will run IOmeter without vDisk IOPS limits to see the maximum performance we can get. In the picture below we can see that a single thread is able to get 1,741 IOPS.


... and two workers can get 3,329 IOPS.


So let's set a 200 IOPS limit on both vDisks of the VM, as in the test on ESX 5.0. This VM also has 2 disks. With these settings, the workload generated by a single IOmeter worker should also be limited to 400 IOPS (2 x 200) for the whole VM. Unfortunately, it is not limited, and it can get 2,000 IOPS. It is strange and, in my opinion, bad behavior.

 

But even worse behavior can be observed when there are more threads. In the examples below you can see the behavior with two and four workers (threads). The VM gets really slow performance.



The ESX 5.5 VM vDisk behavior is really strange, and because all typical OS storage workloads (even OS booting) are multi-threaded, the VM vDisk IOPS limits technology is unusable. I believe it is a bug, and my customer has opened a support request, so VMware Support should help to escalate it to VMware engineering.

UPDATE 2014-07-14 (Workaround):
I tweeted about this issue, and Duncan immediately pointed me in the right direction. He revealed the secret to me ... ESXi has two disk schedulers, the old one and the new one (aka mClock). ESXi 5.5 uses the new one (mClock) by default. If you switch back to the old one, the disk scheduler behaves as expected. Below is the setting to switch to the old one.

Go to ESX Host Advanced Settings and set Disk.SchedulerWithReservation=0

This will switch back to the old disk scheduler.

Kudos to Duncan.

Switching back to the old scheduler is a good workaround which will probably appear in the VMware KB, but there is definitely some reason why VMware introduced the new disk scheduler in 5.5. I hope we will get more information from VMware engineering, so stay tuned for more details ...

UPDATE 2015-12-3:
Here are some references to more information about disk schedulers in ESXi 5.5 and above ...

ESXi 5.5
http://www.yellow-bricks.com/2014/07/14/new-disk-io-scheduler-used-vsphere-5-5/
http://cormachogan.com/2014/09/16/new-mclock-io-scheduler-in-vsphere-5-5-some-details/
http://anthonyspiteri.net/esxi-5-5-iops-limit-mclock-scheduler/

ESXi 6
http://www.cloudfix.nl/2015/02/02/vsphere-6-mclock-scheduler-reservations/

Saturday, July 12, 2014

Why use VMware VM virtual disk IOPS limit?

What is a VM IOPS limit? Here is the explanation from the VMware documentation ...
When you allocate storage I/O resources, you can limit the IOPS that are allowed for a virtual machine. By default, these are unlimited. If a virtual machine has more than one virtual disk, you must set the limit on all of its virtual disks. Otherwise, the limit will not be enforced for the virtual machine. In this case, the limit on the virtual machine is the aggregation of the limits for all virtual disks.
I really like this feature because the VM vDisk IOPS limit is an excellent mechanism to protect the physical storage back-end against overloading by a few disk-intensive VMs, and it allows you to set up a fair-use policy for storage performance. Somebody may argue for the VM disk shares mechanism instead. Yes, that's of course possible as well, and the two can be complementary. However, with a fair-use policy based on shares, your users will get high performance at the beginning, when the back-end storage has a lot of available performance, but their performance will decrease over time as more VMs use the same datastore. It means that performance is not predictable and users can complain.

Let's do a simple IOPS limit example. You have datastores provisioned on a storage pool with automated storage tiering which can serve up to 25,000 IOPS, and you have 100 virtual disks (vDisks) there. Setting a 250 IOPS limit on each virtual disk ensures that even if all VMs use all their IOPS, the back-end datastores will not be overloaded. I agree it is a very strict limitation, and VMs cannot use more IOPS even when performance is available in the physical storage. But this is a business problem, and the best vDisk limiting policy depends on your business model and company strategy. Below are two business models for virtual disk performance limits I've already used on some of my vSphere projects:

  • Service catalog strategy
  • Capacity/performance ratio strategy

The service catalog strategy allows customers (internal or external) to increase or decrease vDisk IOPS as needed and, of course, pay for it appropriately.

The capacity/performance ratio strategy is to calculate the ratio between physical storage capacity and performance and use the same ratio for vDisks. So if you have 50 TB of storage with 25,000 front-end IOPS, each 1 GB comes with 0.5 IOPS. You should define and apply some overbooking ratio because you use shared storage. Let's use a ratio of 3:1, which gives 150 IOPS for a 100 GB vDisk (100 GB x 0.5 IOPS/GB x 3).
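
The arithmetic behind both examples above can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and I assume decimal units (1 TB = 1,000 GB) because that is what matches the numbers in the text.

```python
def per_vdisk_limit(backend_iops: float, vdisk_count: int) -> float:
    """Equal split: per-vDisk IOPS limit so the sum never exceeds the back-end."""
    return backend_iops / vdisk_count


def ratio_based_limit(backend_iops: float, backend_capacity_tb: float,
                      vdisk_size_gb: float, overbooking: float = 1.0) -> float:
    """Capacity/performance ratio: IOPS per GB of the array, scaled by the
    vDisk size and multiplied by an overbooking ratio (3.0 means 3:1)."""
    iops_per_gb = backend_iops / (backend_capacity_tb * 1000)  # assumes 1 TB = 1000 GB
    return iops_per_gb * vdisk_size_gb * overbooking


# Simple example: 25,000 IOPS shared equally by 100 vDisks
print(per_vdisk_limit(25_000, 100))
# Ratio example: 50 TB / 25,000 IOPS array, 100 GB vDisk, 3:1 overbooking
print(ratio_based_limit(25_000, 50, 100, overbooking=3))
```

Keeping the overbooking ratio as an explicit parameter makes it easy to see how aggressive the policy is when you discuss it with the business.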

To be honest, I prefer the service catalog strategy because it is what the real world needs: each workload is different, and a service catalog gives you a better way to define vDisks that match the workloads in your particular environment.

Summary
The VM vDisk IOPS limit approach is useful in environments where you want guaranteed and long-term predictable storage performance (response time) for VM vDisks. Please be aware that even this approach is not totally fair, because IOPS reality is much more complex and the total number of IOPS on the back-end storage is not a static number as in our example. In real physical storage, the number of front-end IOPS you can get is a function of several parameters like IO size, read/write ratio, RAID type, workload type (sequential or random), cache hit ratio, automated storage tiering algorithm, etc.

I hope VMware VVOLs will move this approach to the next level in the future. However, the vDisk IOPS limit is a technology we can use today.

Monday, July 07, 2014

How social media and community sharing help enterprise customers


I'm always happy when someone finds my blog article or shared document useful. Here is one example of a recent email communication from a DELL customer who Googled my DELL OME document (OpenManage Essentials is basic system management for DELL hardware infrastructure) explaining network communication flows among DELL OpenManage components.

All personal information is anonymized, so the customer's real name is changed to Mr.Customer.


VCDX Defense Timer

If you are preparing for VCDX and you want to do a VCDX mock defense, you can use the exact timer which is used during the real VCDX defense.

The timer is available online at https://vcdx.vmware.com/vcdx-timer

Good luck with your VCDX journey!!!

Wednesday, May 28, 2014

DELL Force10 : VLT - Virtual Link Trunking

Do you know CISCO's Virtual Port Channel? Do you want the same with DELL datacenter switches? Here we go.

General VLT overview

Virtual Link Trunking or VLT is a proprietary aggregation protocol developed by Force10 and available in their datacenter-class or enterprise-class network switches. VLT is implemented in the latest firmware releases (FTOS from 8.3.10.2) for their high-end switches like the S4810, S6000 and Z9000 10/40 Gb datacenter switches. Although VLT is a proprietary protocol from Force10, other vendors offer similar features to allow users to set up an aggregated link towards two (logical) different switches, where a standard aggregated link can only terminate on a single logical switch (thus either a single physical switch or on different members in a stacked switch setup).  For example CISCO's similar proprietary protocol is called Virtual Port Channel (aka vPC) and Juniper has another one called Multichassis LAG (MC-LAG).

VLT is a layer-2 link aggregation protocol between end-devices (servers) connected to (different) access-switches, offering these servers a redundant, load-balancing connection to the core-network in a loop-free environment, eliminating the requirement for the use of a spanning-tree protocol. Where existing link aggregation protocols like (static) LAG (IEEE 802.3ad) or LACP (IEEE 802.1ax) require the different (physical) links to be connected to the same (logical) switch (such as stacked switches), the VLT, for example, allows link connectivity between a server and the network via two different switches.

Instead of using VLT between end-devices like servers it can also be used for uplinks between (access/distribution) switches and the core switches.

The general VLT description above is from Wikipedia. For more information about VLT see http://en.wikipedia.org/wiki/Virtual_Link_Trunking

DELL published the Force10 VLT Reference Architecture (PDF - link cached by google), where VLT is explained in detail. It is highly recommended to read it, together with all product documentation and release notes, before any real planning, design and implementation.

VLT Basic concept and terminology

The VLT peers exchange and synchronize Layer2-related tables to achieve harmonious Layer2 forwarding among the whole VLT domain, but the mechanism involved is transparent.

VLT is a trunk (as per its name) attaching remote hosts or switches.
VLTi is the interconnect link between the VLT peers. For historical reasons that is also called ICL (InterConnect Link) in the command outputs.

All the following rules apply to VLT topologies:
  • 2 units per domain (as of FTOS 8.3.10.2)
  • 8 links per port-channel or fewer
  • Units should run the same FTOS version
  • The backup link should use a different link than the VLTi, and preferably a diverse path

Simple implementation plan

Below is a simplified implementation plan for VLT configuration, which should be handy for any lab or proof of concept deployment.

The implementation plan is divided into 6 steps.
  1. Check or configure spanning tree protocol
  2. Check or configure LLDP
  3. Check or configure out of band management leveraged for VLT backup link
  4. Configure VLTi link (VLT inter connect)
  5. Configure VLT domain
  6. Configure VLT port-channel

Step 1 - Check or configure spanning tree protocol
Rapid Spanning-Tree should be enabled to protect against configuration and patching mistakes. The STP configuration depends on the customer environment and spanning tree topology preferences. The parameters below are just examples.

Switch A - configured to become RSTP root
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 ! the lower bridge priority makes this switch the STP root
 bridge-priority 4096

Switch B - configured as backup root.
protocol spanning-tree rstp
 no disable
 hello-time 1
 max-age 6
 forward-delay 4
 bridge-priority 8192
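
If you tune the timers as in the examples above, it is worth checking that they still satisfy the relationship required by IEEE 802.1D: 2 x (forward-delay - 1) >= max-age >= 2 x (hello-time + 1). Below is a minimal Python sketch of that check. The function name is my own and it is not part of FTOS or any library.

```python
def rstp_timers_consistent(hello_time: int, max_age: int, forward_delay: int) -> bool:
    """Check the IEEE 802.1D timer relationship (all values in seconds):
    2 * (forward_delay - 1) >= max_age >= 2 * (hello_time + 1)."""
    return 2 * (forward_delay - 1) >= max_age >= 2 * (hello_time + 1)


# Values used for Switch A and Switch B in the example above
print(rstp_timers_consistent(hello_time=1, max_age=6, forward_delay=4))
```

The example values (hello-time 1, max-age 6, forward-delay 4) sit exactly on the boundary of the first inequality, so lowering forward-delay further would also require lowering max-age.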

Step 2 - LLDP configuration
LLDP must be enabled so that the switch advertises its configuration and receives configuration information from the adjacent LLDP-enabled device.

Switch A
protocol lldp
  advertise management-tlv system-description system-name
  no disable

Switch B
protocol lldp
  advertise management-tlv system-description system-name
  no disable

Step 3 - VLT backup link
The VLT backup link is used to exchange heartbeat messages between the two VLT peers. Configure the management interface on both VLT peers to activate the backup link.

Switch A
interface management 0/0
  ip address switch-A-IP/switch-A-mask
  no shutdown
Switch B
interface management 0/0
  ip address switch-B-IP/switch-B-mask
  no shutdown

Step 4 - VLTi (interconnect) link
Now we configure the VLTi, the connection between both VLT peers. It is recommended to use a static port-channel for redundancy reasons. Two 40GbE interfaces are enough, and we bind them to port-channel 127. No special configuration is required at the interface or port-channel configuration level. To become a VLTi (automatically managed by the system), the port-channel should be in default mode (no switchport).

Switch A
interface port-channel 127
  description "VLTi - interconnect link"
  channel-member VLTi_INTERFACE1
  channel-member VLTi_INTERFACE2
  no ip address 
  mtu 12000
  no shutdown

Switch B
interface port-channel 127
  description "VLTi - interconnect link"
  channel-member VLTi_INTERFACE1
  channel-member VLTi_INTERFACE2
  no ip address 
  mtu 12000
  no shutdown

Note 1: Don't forget to do no shutdown on the physical interfaces acting as port-channel members. Your port-channel stays down unless you bring them up.
Note 2: Neither the port-channel nor the physical ports may be in switchport mode to be used for VLTi.
Note 3: If you are planning to use jumbo frames (bigger MTU size) then you have to use them on the VLTi links as well (max MTU on Force10 is 12000, so it is a good idea to set it to the max).

Use the following configuration for all VLTi interfaces
interface VLTi_INTERFACEx
  no shutdown
  no switchport

Verify port-channel status on both switches
show int po 127 brief

The port-channel should be up and composed of 2 ports.

Step 5 - VLT domain configuration
We have to configure the domain number and the VLT domain options described below.
  • We use the peer-link command to select which interface is the VLTi.
  • To select the interface for the heartbeat message exchange, we use the back-up destination command with the IP address of the other VLT peer.
  • We should set the primary-priority command to configure the VLT role (primary or secondary). The primary VLT node will be the switch with the lower priority.
  • The system-mac mac-address command must match on both peers in the VLT domain.
  • Setting the unit ID (0 or 1) with the unit-id command minimizes the time required for the VLT system to determine the unit ID assigned to each peer switch when one peer switch reboots.

Switch A (primary)
vlt domain 1
  peer-link port-channel 127
  back-up destination switch-B-IP
  primary-priority 1
  system-mac mac-address 02:00:00:00:00:01
  unit-id 0

Switch B (secondary)
vlt domain 1
  peer-link port-channel 127
  back-up destination switch-A-IP
  primary-priority 8192
  system-mac mac-address 02:00:00:00:00:01
  unit-id 1

For verification we can use the commands below
sh vlt brief
sh vlt statistics
sh vlt backup-link
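
Because the two VLT domain stanzas above are mirror images of each other (same domain, peer-link and system-mac, but swapped backup destination and different priority and unit-id), they are easy to generate from a few parameters. Here is a minimal Python sketch of that idea. The function names are mine, and the 192.0.2.x management addresses are documentation-only placeholders standing in for switch-A-IP and switch-B-IP.

```python
def vlt_domain_config(domain_id: int, vlti_port_channel: int, peer_backup_ip: str,
                      primary_priority: int, system_mac: str, unit_id: int) -> str:
    """Render the 'vlt domain' stanza for one peer, mirroring the example above."""
    return "\n".join([
        f"vlt domain {domain_id}",
        f"  peer-link port-channel {vlti_port_channel}",
        f"  back-up destination {peer_backup_ip}",
        f"  primary-priority {primary_priority}",
        f"  system-mac mac-address {system_mac}",
        f"  unit-id {unit_id}",
    ])


def vlt_pair_configs(ip_a: str, ip_b: str, domain_id: int = 1,
                     vlti_port_channel: int = 127,
                     system_mac: str = "02:00:00:00:00:01") -> dict:
    """Return the stanzas for both peers; each peer's backup link points at the other."""
    return {
        "A": vlt_domain_config(domain_id, vlti_port_channel, ip_b, 1, system_mac, 0),
        "B": vlt_domain_config(domain_id, vlti_port_channel, ip_a, 8192, system_mac, 1),
    }


# Hypothetical management IPs (documentation range), replace with your own
configs = vlt_pair_configs("192.0.2.11", "192.0.2.12")
print(configs["A"])
print(configs["B"])
```

Generating both stanzas from one set of inputs avoids the most common mistake, which is a system-mac or backup destination that does not match between the peers.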

Step 6 - VLT Port Channel
It is recommended that VLTs facing hosts/switches are built with LACP, to benefit from the protocol negotiation. However, static port-channels are also supported.

It is also recommended to configure dampening (or equivalent) on the interfaces of connected hosts/switches (access switches, not VLT peers). The reason to use dampening is that at start-up time, once the physical ports are active, a newly started VLT peer takes several seconds to fully negotiate protocols and synchronize (VLT peering, RSTP, VLT backup links, LACP, VLT LAG sync, etc). The attached devices are not aware of that activity, and upon activation of a physical interface the connected device will start forwarding traffic on the restored link even though the VLT peer unit is still unprepared. That traffic is black-holed. Dampening on connected devices (access switches) holds an interface down temporarily after a VLT peer device reload. A reload is detected as a flap: the link goes down and then up. Dampening acts as a cold start delay, ensuring that the VLT peers are up and ready to forward before the physical interface is activated, avoiding temporary black holes. Suggested dampening time: 30 seconds to 1 minute. We use 60 seconds in our example.

So let's finally configure the port channel (dynamic LAG) that interconnects the S4810s (VLT domain) with the upstream S60, which is our hypothetical L3 switch (router).

Switch A
interface port-channel 1
  description "Uplink to S60"
  no ip address
  switchport
  vlt-peer-lag port-channel 1
  no shutdown

interface tengigabit 0/PO1-INTERFACE
  port-channel-protocol lacp
    port-channel 1 mode active
  dampening 10 100 1000 60
  no shutdown
Switch B
interface port-channel 1
  description "Uplink to S60"
  no ip address
  switchport
  vlt-peer-lag port-channel 1
  no shutdown

interface tengigabit 0/PO1-INTERFACE
  port-channel-protocol lacp
    port-channel 1 mode active
  dampening 10 100 1000 60
  no shutdown

I hope it is helpful not only for me but also for someone else. Any comments are welcome.

Locally Administered Address Ranges

MAC Addresses
There are 4 sets of Locally Administered Address Ranges that can be used on your network without fear of conflict, assuming no one else has assigned them on your network:
 
x2-xx-xx-xx-xx-xx
x6-xx-xx-xx-xx-xx
xA-xx-xx-xx-xx-xx
xE-xx-xx-xx-xx-xx

Replacing x with any hex value.

See http://en.wikipedia.org/wiki/MAC_address for more information.
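
The four patterns above are just another way of saying that, in the first octet, the locally administered bit (second least significant bit) is set and the multicast bit (least significant bit) is clear. A minimal Python sketch of that check follows. The function name is mine, and it assumes the address is written with "-" or ":" separators.

```python
def is_locally_administered_unicast(mac: str) -> bool:
    """True if the MAC is unicast and has the locally administered bit set,
    i.e. the second hex digit is 2, 6, A or E."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    locally_administered = bool(first_octet & 0b10)  # second least significant bit
    multicast = bool(first_octet & 0b01)             # least significant bit
    return locally_administered and not multicast


for mac in ("02:00:00:00:00:01", "0A-11-22-33-44-55", "00-30-48-30-a1-ea"):
    print(mac, is_locally_administered_unicast(mac))
```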

Update 2014-10-27: one of my readers notified me that some MAC OUIs complying with the rules above are used by some vendors. See the examples below:
  02-07-01   (hex) RACAL-DATACOM
  02-1C-7C   (hex) PERQ SYSTEMS CORPORATION
  02-60-86   (hex) LOGIC REPLACEMENT TECH. LTD.
  02-60-8C   (hex) 3COM CORPORATION
  02-70-01   (hex) RACAL-DATACOM
  02-70-B0   (hex) M/A-COM INC. COMPANIES
  02-70-B3   (hex) DATA RECALL LTD
  02-9D-8E   (hex) CARDIAC RECORDERS INC.
  02-AA-3C   (hex) OLIVETTI TELECOMM SPA (OLTECO)
  02-BB-01   (hex) OCTOTHORPE CORP.
  02-C0-8C   (hex) 3COM CORPORATION
  02-CF-1C   (hex) COMMUNICATION MACHINERY CORP.
  02-E6-D3   (hex) NIXDORF COMPUTER CORPORATION

Therefore I would always recommend validating it at
http://en.wikipedia.org/wiki/MAC_address#Bit-reversed_notation
For example, 02-00-00 is not used by anybody, so you can most probably use it for internal purposes.

IP Addresses
Private network (internal)
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16

Private network (service provider - subscriber)
100.64.0.0/10
See http://en.wikipedia.org/wiki/Reserved_IP_addresses for more information.
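
If you want to check quickly whether an address falls into one of the ranges listed above, the Python standard library module ipaddress can do it. This is just a small sketch. The labels and the function name are mine.

```python
import ipaddress

# Ranges exactly as listed above; the labels are only descriptive
RANGES = {
    "10.0.0.0/8": "private network (internal)",
    "172.16.0.0/12": "private network (internal)",
    "192.168.0.0/16": "private network (internal)",
    "100.64.0.0/10": "private network (service provider - subscriber)",
}


def classify(ip: str) -> str:
    """Return the label of the listed range containing ip, or a fallback text."""
    address = ipaddress.ip_address(ip)
    for cidr, label in RANGES.items():
        if address in ipaddress.ip_network(cidr):
            return label
    return "not in the ranges listed above"


for ip in ("172.31.255.255", "172.32.0.1", "100.64.0.1"):
    print(ip, "->", classify(ip))
```

Note that 172.16.0.0/12 ends at 172.31.255.255 and 100.64.0.0/10 ends at 100.127.255.255, which is easy to get wrong when checking by eye.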
 

Sunday, May 18, 2014

How to convert thick zeroed virtual disk to thin and save storage space

Last week a colleague notified me about a long-standing VMware vSphere issue described in VMware KB 2048016. The issue is that vSphere Data Protection restores a thin-provisioned disk as a thick-provisioned disk. This sounds like a relatively big operational impact. However, after reading the VMware KB I explained to my colleague that this is not a typical issue or bug but rather the expected behavior of VMware's CBT (Changed Block Tracking) technology and the VADP (vStorage APIs for Data Protection) framework. It's important to mention that it should behave like that only when you do the initial full backup of a thin-provisioned VM which has never been powered on. In other words, if the VM was ever powered on before the initial backup, you shouldn't experience this issue.

After the explanation above, another logical question appeared.
"How can you convert a thick zeroed virtual disk to thin?" ... when you experience the weird behavior explained above and your originally thin-provisioned VM is restored as a thick VM. The obvious objective is to save storage space again by leveraging VM thin provisioning.
My answer was to use "storage vMotion", which allows you to change the vDisk type during migration. But just after my quick answer I realized there can be another potential issue with storage vMotion. If you use VAAI-capable storage, then storage vMotion is offloaded to the storage and it may not reclaim even zeroed vDisk space. This behavior and its resolution are described in VMware KB 2004155, named "Storage vMotion to thin disk does not reclaim null blocks". The workaround mentioned in the KB is to use an offline method leveraging vmkfstools. If you want live storage migration (conversion) without downtime, you would need another datastore with a different block size. You can do it with a legacy VMFS3 filesystem.

I decided to do a test to see the real storage vMotion behavior and know the truth. Everything else would be just speculation. Therefore I tested storage vMotion thick-to-thin migration and space reclamation in my lab, where I have vSphere/ESX 5.5 and EqualLogic storage with VAAI support. To be honest, the result surprised me in a positive way. It seems that svMotion can save the space even when I do svMotion between datastores with the same block size and VAAI is enabled.

You can see the thick eager zeroed 40GB disk in the screenshot below ...
Provisioned size is 44GB because the VM has 4 GB RAM and therefore a 4 GB swap file on VMFS.
Used storage is 40GB.

After live storage vMotion with conversion to thin, the space was saved.
Used storage is just 22 GB. You can see the result in the screenshot below ...

So I have just verified that svMotion can do what you need without downtime. And I don't even need to migrate between datastores with different block sizes.

It was tested on ESX 5.5, EqualLogic firmware 6.x, and VMFS5 datastores created on thin-provisioned LUNs by EqualLogic. Storage thin provisioning is absolutely transparent to vSphere, so this should not have an impact on vSphere thin provisioning.


I know that this is just a workaround for the problem of VADP restore of a never-powered-on VMDK, but it works. It converts thick to thin and is able to reclaim unused (zeroed) space inside virtual disks.

Conclusion:
vSphere 5.5 storage vMotion can convert a thick VM to thin even between datastores having the same block size, at least in the tested configuration. Good to know. If you can do the same test in your environment, just leave a comment. It can be beneficial for others.

A/C Controller

A/C Controller is a FreeBSD-based appliance which monitors environmental temperature and automatically powers Air Conditioning units on/off to achieve the required temperature. It's distributed as a 2GB (204MB zip) pre-installed FreeBSD image.

Project page: https://sourceforge.net/projects/accontrol/
Author: David Pasek

Hardware Infrastructure Monitoring Proxy

Every enterprise infrastructure product like a server, blade system, storage array, fibre-channel or ethernet switch has some kind of CLI or API management. A lot of products support SNMP, but it usually doesn't return everything the CLI/API offers. This project is a set of connectors to different enterprise systems like DELL iDRAC and blade Chassis Management Controller, VMware vCenter and/or ESX, and DELL Compellent. The framework is universal, and connectors to other systems can be easily developed.

Available connectors:
DELL CMC (racadm)
DELL DRAC (racadm)
General IPMI (ipmitools)
VMware vCLI (vcli)
Compellent Enterprise manager (odbc to MS-SQL)
Compellent Storage Center (CompCU.jar)
Brocade FC Switch (cli over telnet)

... other connectors and sensors can be easily developed. So if you have any need, don't hesitate to contact me.

See video introduction at  https://www.youtube.com/watch?v=JRomfnfymlY
Project page: https://sourceforge.net/projects/monitoringproxy/ 
Author: David Pasek

Friday, May 16, 2014

Unable to unmount ESX datastore

I've just been notified about an annoying problem by a customer for whom I did a vSphere 5.5 design. It was not possible to unmount a datastore. In the ESX logs there was something similar to the message below.
Cannot unmount volume 'Datastore Name: vm3:xxx VMFS uuid: 517c9950-10f30962-931f-00304830a1ea' because file system is busy. Correct the problem and retry the operation.
There is a KB about this symptom. The VSAN component VSANTRACE was using the datastore. That was the reason for the busy file system. It was a pretty annoying issue, as VSAN was neither used nor enabled.

The solution is to disable the vsantraced service, so it is necessary to issue the following command on every ESX host ...
chkconfig vsantraced off 

Not so nice, right? That's the downside of fully integrating the VSAN software into the general ESX hypervisor. I'm not happy with this approach. In my opinion, it would be much better to distribute VSAN as additional software installed as a regular VIB (VMware Installable Bundle).

Friday, May 09, 2014

Understanding Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) Terminology

To understand Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) capabilities, you should become familiar with some basic terminology. I have just found an excellent single page explaining all the important terms from the FC and FCoE worlds. It is here.

Thanks to Juniper for preparing it. I'm sure I will come back later to look up some abbreviations.

 

Tuesday, May 06, 2014

Recovering from a Forgotten Password on the Force10 S series switch

I've just spent several hours looking for the procedure to recover from a forgotten password. Google returned just one relevant result, the Force10 tech tip page "How Do I Reset the S-Series to Factory Defaults?". However, the procedure doesn't work because there is no "Option menu" during system boot. It is most probably an old and deprecated procedure.

Here is the new procedure. I hope Google will return it for other people looking for the correct procedure.

Procedure to recover from a forgotten password on Force10 S-series switches:
  1. Use serial console
  2. Power off and then Power on all of the power modules
  3. Wait for message similar to "Hit Esc key to interrupt autoboot: 5" and press Esc key to go to Boot Loader (uBoot) interactive shell
  4. Change environment variable – setenv stconfigignore true - (uBoot - boot loader interactive shell)
  5. Save the changes - saveenv - (uBoot - boot loader interactive shell)
  6. Continue to boot the system – boot  - (uBoot - boot loader interactive shell)
  7. Default configuration is loaded so console login authentication is disabled by default
  8. Go to EXEC mode - en - (FTOS command line)
  9. Load startup configuration -  copy startup-config running-config - (FTOS command line)
  10. Now you can reconfigure the switch to change or add user login credentials
  11. Save configuration -  copy running-config startup-config - (FTOS command line)
  12. Reload the switch just for verification -  reload - (FTOS command line)

This procedure was tested on Force10 S4810 and S60.

Simple TFTP server for windows

Anybody working with networking equipment needs a simple TFTP server. The typical use case is to download and/or upload switch configurations and to perform firmware upgrades.

I generally like simple tools which allow me to do my work quickly and efficiently. That's the reason I really like the portable version of TFTP32.

For more information about TFTP32 go here.


Monday, May 05, 2014

Microsoft Cluster Service (MSCS) support on VMware vSphere

Microsoft Cluster Service (MSCS) is a Microsoft cluster technology that requires shared storage supporting the SCSI reservation mechanism. Microsoft has introduced a new, perhaps more modern and more descriptive, name for the same technology. The new name is "Microsoft Failover Cluster", so don't be confused by the different names.

VMware has supplementary documentation called "Setup for Failover Clustering and Microsoft Cluster Service" covering the subject. Here is the online documentation for vSphere 5.5 and here for vSphere 5.1. This documentation is a must-read to fully understand what is and what is not possible.

However, the documentation is relatively old, so if you want up-to-date information you have to leverage the VMware KB. Here are two KB articles related to the topic
  • Microsoft Cluster Service (MSCS) support on ESXi/ESX (1004617)
  • Microsoft Clustering on VMware vSphere: Guidelines for supported configurations (1037959)
And here is general advice.
Plan, plan, plan ... design ... review ... implement ... verify.
I hope you know what I mean.

DELL Force10 : Initial switch configuration

[ Previous | DELL Force10 : Series Introduction ]

I assume you have serial console access to the switch unit to perform the initial switch configuration. It will probably not surprise you that to switch from read mode to configuration mode you have to use the command
conf
... before continuing, I would like to recap some important basic FTOS commands we will use later in this blog post. If you want to exit from configuration mode, or from a deeper level of the configuration hierarchy, you can do it with one or several
exit 
commands, each of which jumps one level up in the configuration hierarchy and eventually exits conf mode. However, the easiest way to leave configuration mode is to use the command
end
which will exit configuration mode immediately.

The last, but very important and very often used, command is
write mem
which will write your running switch configuration to flash, so the configuration will survive a switch reload. You can do the same with the more general command
copy running-config startup-config
If you want to display the running configuration you can use the command
show running-config
The whole configuration can be pretty long, so if you are interested only in some part of the running configuration you can use the following commands
show running-config interface managementethernet 0/0
show running-config interface gigabitethernet 0/2
show running-config spanning-tree rstp
show running-config boot
As you can see, the FTOS command line interface (CLI) is very similar to CISCO's.

OK, so after the basics let's start with the initial configuration. Switch configuration usually begins with host name configuration. It is generally good practice to use unique host names because you then always know which system you are logged in to.
hostname f10-s60
As a next step I usually configure management IP settings and enable remote access. You have to decide whether you will use in-band management, leveraging normal IP settings usually configured on a VLAN interface dedicated to system management, or the dedicated out-of-band management port. In the example below you can see
  • out-of-band management port for system management IP settings
  • how to create admin user
  • how to enable ssh to allow remote system management
interface ManagementEthernet 0/0
  ip address 192.168.42.101/24
  no shutdown
exit
management route 0.0.0.0/0 192.168.42.1

username admin password YourPassword privilege 15

ip ssh server enable
Now you have to decide if you want to enforce login for users connected via the local console. By default there is no login required, which can be a security risk, especially in environments without strict physical security rules. Below is the configuration which enforces local login credentials when using the serial console.
aaa authentication login default local
At this point I would like to note that a Force10 switch has all capabilities and features disabled in the default factory configuration. That's the reason why, for example, each switch interface must be explicitly enabled before usage: all interfaces are in shutdown state by default.

Before you enable any switch interface, it is good practice to enable the spanning tree protocol as a safety mechanism against potential loops in the network. Once again, the spanning tree feature is not enabled by default, so you have to do it explicitly. Force10 FTOS implements all standard and even some non-standard (CISCO proprietary) spanning tree protocols like PVST+. On the latest FTOS version the following spanning tree protocols are supported:

  • STP (Spanning Tree Protocol)
  • RSTP (Rapid Spanning Tree Protocol)
  • MSTP (Multiple Spanning Tree Protocol)
  • PVST+ (Per-VLAN Spanning Tree Plus)
Below is a configuration example which enables the standard rapid spanning tree protocol (aka RSTP) ...
protocol spanning-tree rstp
  no disable 
Another decision you have to make before implementation is the location from which you want to boot your switch operating system. On some Force10 models (for example on the S60) the default primary boot location is a TFTP server ...
boot system stack-unit 0 primary tftp://192.168.128.1/FTOS-SC-8.3.3.7.bin
boot system stack-unit 0 secondary system: A:
boot system stack-unit 0 default system: B:
boot system gateway 192.168.128.1
You can see that the primary boot location is a TFTP server. Unless you have tens or hundreds of switches, you usually don't want to load FTOS remotely from a TFTP server but from the internal flash in the switch. The default switch configuration would work, because if the TFTP server boot fails the switch boot sequence continues with the secondary location, but it's better to configure the switch boot sequence explicitly based on your requirements. Below is a typical boot sequence configuration.
boot system stack-unit 0 primary system: A:
boot system stack-unit 0 secondary system: B:
boot system stack-unit 0 default system: A:
no boot system gateway
The next thing you should check is which FTOS version you have. Below is the command to check it ...
f10-s60#show version
Dell Force10 Networks Real Time Operating System Software
Dell Force10 Operating System Version: 1.0
Dell Force10 Application Software Version: 8.3.3.7
Copyright (c) 1999-2011 by Dell Inc.
Build Time: Sat Nov 26 01:23:50 2011
Build Path: /sites/sjc/work/build/buildSpaces/build20/E8-3-3/SW/SRC
f10-s60 uptime is 4 minute(s)
System image file is "system://A"
System Type: S60
Control Processor: Freescale MPC8536E with 2147483648 bytes of memory.
128M bytes of boot flash memory.
  1 48-port E/FE/GE (SC)
 48 GigabitEthernet/IEEE 802.3 interface(s)
  2 Ten GigabitEthernet/IEEE 802.3 interface(s)
You can see FTOS version 8.3.3.7, which is not the latest one: at the time of writing this article the latest FTOS version is 8.3.3.9 with boot loader 1.0.0.5. It is generally good practice to upgrade FTOS to the latest version before performing verification tests and going into production. For the latest version you have to go to http://www.force10networks.com and sign in. If you don't have a Force10 account you can register there. Please note that each Force10 switch model uses different FTOS versions. So there can be FTOS 9.4.x for the S4810 and 8.3.x for the S60.
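
If you have more than a few switches, you may want to compare versions in a script instead of by eye. Here is a minimal Python sketch which extracts the FTOS version from "show version" output and compares it with a target version. The function name is mine, the sample text is taken from the output above, and how you collect the output (SSH, serial console, etc.) is left out.

```python
import re

# Two lines copied from the "show version" output above
SAMPLE = """Dell Force10 Operating System Version: 1.0
Dell Force10 Application Software Version: 8.3.3.7"""


def ftos_version(show_version_output: str) -> tuple:
    """Extract the FTOS application software version as a tuple of integers."""
    match = re.search(r"Application Software Version:\s*(\d+(?:\.\d+)*)",
                      show_version_output)
    if match is None:
        raise ValueError("FTOS application software version not found")
    return tuple(int(part) for part in match.group(1).split("."))


installed = ftos_version(SAMPLE)
latest = (8, 3, 3, 9)  # latest S60 release mentioned in the text
print(installed, "upgrade needed:", installed < latest)
```

Comparing tuples of integers rather than strings matters here, because as plain strings "8.3.3.10" would sort before "8.3.3.9".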

Now I'll show you how to upgrade FTOS and the boot loader.
FTOS should be upgraded first and the boot loader later ...
upgrade system tftp: A:
upgrade system stack-unit all A:
(applicable only if you have stack configured)
upgrade boot ftp: (applicable only if  new bootloader compatible with FTOS code exists)
reload
You can check the current FTOS version with
show version
and if you want to know which FTOS version you have in which boot bank you can use
show boot system stack-unit 0
By the way, have I told you there are two boot banks? Boot bank A: and boot bank B:, so you can choose the primary and secondary boot location. We have already covered boot configuration but here it is again ...
conf
  boot system stack-unit 0 primary system: A:
  boot system stack-unit 0 secondary system: B:
FTOS is loaded by the boot loader, and the current boot loader version can be displayed by the command below
show system stack-unit 0
I hope this post is helpful for the IT community. In case you have any question, suggestion or idea for improvement, please share your thoughts in the comments.

Stay tuned and wait for next article ...

[ Next | DELL Force10 : Interface configuration and VLANs ]

DELL Force10 : Series Introduction

I have just decided to write a dedicated blog post series about DELL Force10 networking. Why?

Those who know me in person are most probably aware that my primary professional focus is on VMware vSphere infrastructure and datacenter enterprise hardware. Sometimes I have discussions with infrastructure experts, managers and other IT folks about what the most important/complex/critical/expensive vSphere component is. By vSphere components I mean compute (servers), storage and networking. I think it is a needless discussion because all components are important and have to be integrated into a single system fulfilling all requirements, dealing with known constraints and mitigating all potential risks. Such integrated infrastructures are very often called POD which stands, as far as I know, for Performance Optimized Datacenter. These integrated systems are, from my point of view, new datacenter computers having dedicated but distributed computing, storage and networking components. I would prefer to call such equipment an "Optimized Infrastructure Block" or "Datacenter Computer" because it is not only about performance but also about reliability, capacity, availability, manageability, recoverability and security. We call these attributes infrastructure qualities, and the whole infrastructure block inherits its qualities from its sub-components. Older IT folks often compare this concept with mainframe architectures; however, nowadays we usually use commodity x86 hardware "a little bit" optimized for enterprise workloads.

By the way, that's one of the reasons I like the current DELL datacenter product portfolio: DELL has everything I need to build a POD - servers, storage systems and now also networking - so I'm able to design a single-vendor infrastructure block with unified support, warranty, etc. Some may not know that DELL acquired the storage vendors EqualLogic and Compellent some time ago but, more importantly for this blog post, DELL also acquired Force10, a well-known (at least in the US) datacenter networking vendor. For official acquisition details look here.

But back to networking. Everybody would probably agree that networking is a very important part of vSphere infrastructure, for several reasons. It provides the interconnect between clustered components - think about vSphere networks like Management, vMotion, Fault Tolerance, VSAN, etc. It also routes network traffic to the outside world. And sometimes it even provides storage fabrics (iSCSI, FCoE, NFS). That importance is actually the reason why I'm going to write this series of blog posts about DELL Force10 networking. However, I don't want to write about legacy networking but about the modern networking approach for next-generation virtualized and software-defined datacenters.

Modern physical networking is not only about hardware (intelligence burned into ASICs, with high-bandwidth, fast, low-latency interfaces) but also about software. The main software sits inside DELL Force10 switches. It is the switch firmware called FTOS - Force10 Operating System (for more general information about FTOS look here). However, today it is not only about firmware embedded in the switch but also about the whole software ecosystem - management tools, centralized control planes, virtual distributed switches, network overlays, etc.
 
In future articles I would like to dive deep into FTOS features, configuration examples and virtualization-related integrations.

The next article, actually the first technical one in this series, will be about the typical initial configuration of a Force10 switch. I know it is not rocket science but we have to know the basics before taking off. In the future I would like to write about more complex designs, capabilities and configurations like
  • Multi-Chassis Link Aggregation (aka MC-LAG). In Force10 terminology we call it VLT - Virtual Link Trunking.
  • Virtual Routing and Forwarding (aka VRF). Some S-series Force10 models with FTOS 9.4 support VRF-lite.
  • some Software Defined Networking (aka SDN) capabilities like Python/Perl scripting inside the switch, REST API, VXLAN hardware VTEP, integration with VMware Distributed Virtual Switch, integration with VMware NSX, OpenFlow, etc.
If you prefer some topics, please let me know in the comments and I'll try to prioritize them. Otherwise I'll write future posts based on my preferences.
So let's finish the series introduction and start with some technical stuff, beginning with a deep dive into the initial switch configuration! Just click Next below ...

[ Next | DELL Force10 : Initial switch configuration ]