Sunday, August 25, 2013

DELL OpenManage Essentials (OME)

OpenManage Essentials (OME) is a systems management console that provides simple, basic Dell hardware management and is available as a free download.

DELL OME can be downloaded at https://marketing.dell.com/dtc/ome-software?dgc=SM&cid=259733&lid=4682968

Patch 1.2.1 downloadable at
http://www.dell.com/support/drivers/us/en/555/DriverDetails?driverId=P1D4C

For more information look at DELL Tech Center.

Data Center Bridging

The 4 key DCB protocols:
  •  Priority-based Flow Control (PFC): IEEE 802.1Qbb
  •  Enhanced Transmission Selection (ETS): IEEE 802.1Qaz
  •  Congestion Notification (CN or QCN): IEEE 802.1Qau
  •  Data Center Bridging Capabilities Exchange Protocol (DCBx)
PFC - provides a link-level flow control mechanism that can be controlled independently for each frame priority. The goal of this mechanism is to ensure zero loss under congestion in DCB networks. In other words, PFC pauses traffic independently per priority and enables lossless packet buffers/queuing for particular 802.1p CoS values.

ETS - provides a common management framework for assignment of bandwidth to frame priorities. Bandwidth can be dynamic based on congestion and relative ratios between defined flows. ETS provides minimum, guaranteed bandwidth allocation per traffic class/priority group during congestion and permits additional bandwidth allocation during non-congestion.
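As a back-of-the-envelope illustration of ETS guarantees, the minimum bandwidth per class on a 10GbE link works out as below. The traffic classes and weights are hypothetical, chosen just for this sketch:

```shell
# Hypothetical ETS weights on a 10GbE converged link: storage 50%,
# vMotion 30%, other traffic 20%. During congestion each class is
# guaranteed at least its share; bandwidth of idle classes is
# redistributed to the busy ones.
link_mbps=10000
storage_min=$(( link_mbps * 50 / 100 ))
vmotion_min=$(( link_mbps * 30 / 100 ))
other_min=$(( link_mbps * 20 / 100 ))
echo "storage >= ${storage_min} Mbps, vmotion >= ${vmotion_min} Mbps, other >= ${other_min} Mbps"
```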

CN - provides end-to-end congestion management for protocols that are capable of transmission rate limiting to avoid frame loss. It is expected to benefit even protocols such as TCP that have native congestion management, because CN reacts to congestion in a more timely manner. An excellent blog post about CN is here.

DCBX - a discovery and capability exchange protocol that is used for conveying capabilities and configuration of the above features between neighbors to ensure consistent configuration across the network. Performs discovery, configuration, and mismatch resolution using Link Layer Discovery Protocol (IEEE 802.1AB - LLDP).

DCBX can be leveraged for many applications.
One DCBX application example is iSCSI application priority - Support for the iSCSI protocol in the application priority DCBX Type Length Value (TLV). Advertises the priority value (IEEE 802.1p CoS, PCP field in VLAN tag) for iSCSI protocol. End devices identify and tag Ethernet frames containing iSCSI data with this priority value.
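To make the tagging concrete, here is a quick sketch of how a priority value lands in the VLAN tag. The priority 4 and VLAN 100 below are assumptions for illustration only:

```shell
# The 16-bit TCI field of the 802.1Q tag packs PCP (3 bits), DEI (1 bit)
# and the VLAN ID (12 bits). An iSCSI frame tagged with priority 4 on
# VLAN 100 therefore carries this TCI value:
pcp=4; dei=0; vid=100
tci=$(( (pcp << 13) | (dei << 12) | vid ))
printf 'TCI = %d (0x%04X)\n' "$tci" "$tci"
```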

Friday, August 23, 2013

DELL Force10 I/O Aggregator 40Gb Port Question

Today I received a question about how to interconnect a DELL Force10 IOA 40Gb uplink with DELL Force10 S4810 top-of-rack switches.

I assume the reader is familiar with DELL Force10 datacenter networking portfolio.

Even if you have a 40Gb<->40Gb twinax cable with QSFPs between the IOA and the Force10 S4810 switch, the IOA side is by default configured as 4x10Gb links grouped in Port-Channel 128.

If you connect it directly into a 40Gb port on the Force10 S4810 switch, that 40Gb port is by default configured as a single 1x40Gb interface.

That’s the reason why it doesn’t work out of the box: the port speeds are simply mismatched.

To make it work you have to change the 40Gb switch port to a 4x10Gb port. Here is the S4810 command to change the switch port from 1x40Gb to 4x10Gb:
stack-unit 0 port 48 portmode quad

Here is a snip from the S4810 configuration where 40Gb port 0/48 is configured as a 4x10Gb port in port-channel 128:
interface TenGigabitEthernet 0/48
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown
!
interface TenGigabitEthernet 0/49
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown
!
interface TenGigabitEthernet 0/50
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown
!
interface TenGigabitEthernet 0/51
no ip address
!
port-channel-protocol LACP
  port-channel 128 mode active
no shutdown

interface Port-channel 128
no ip address
portmode hybrid
switchport
no shutdown

Tuesday, August 20, 2013

Best Practices for Faster vSphere SDK Scripts

Reuben Stump published excellent blog post at http://www.virtuin.com/2012/11/best-practices-for-faster-vsphere-sdk.html about performance optimization of PERL SDK Scripts.

The main takeaway is to minimize the ManagedEntity's Property Set.

So instead of

my $vm_views = Vim::find_entity_views(view_type => "VirtualMachine") ||
  die "Failed to get VirtualMachines: $!";

you have to use

# Fetch all VirtualMachines from SDK, limiting the property set
my $vm_views = Vim::find_entity_views(view_type => "VirtualMachine",
          properties => ['name', 'runtime.host', 'datastore']) ||
  die "Failed to get VirtualMachines: $!";

This small improvement has a significant impact on performance because it eliminates the generation and transfer of large SOAP/XML payloads between the vCenter service and the SDK script.

It helped me improve the runtime of my script from 25 seconds to just 1 second, and the impact is even bigger in larger vSphere environments. My old version of the script was almost useless, and this simple improvement helped enormously.

Thanks Reuben for sharing this information.
 

Monday, August 19, 2013

DELL Blade Chassis power consumption analytics in vCenter Log Insight

The DELL Blade Chassis has a capability to send power consumption information via syslog messages. I never understood how to practically leverage this capability. When VMware released vCenter Log Insight, I immediately realized how to leverage this tool to visualize blade chassis power consumption.

I prepared a short video on how to create a blade chassis power consumption graph in vCenter Log Insight. The video is located at http://youtu.be/fda1cLW8enA



Wednesday, August 14, 2013

ESXi Advanced Settings for NetApp NFS

Here are the NetApp recommended ESXi advanced settings for NFS:
Net.TcpipHeapSize=30
Net.TcpipHeapMax=120
NFS.MaxVolumes=64
NFS.HeartbeatMaxFailures=10
NFS.HeartbeatFrequency=12
NFS.HeartbeatTimeout=5

Enable SIOC, or if you don't have an Enterprise+ license, set NFS.MaxQueueDepth to 64, 32, or 16 based on storage workload and utilization.
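A minimal sketch of applying the values above from the ESXi shell, assuming ESXi 5.x esxcli syntax (the TCP/IP heap changes require a host reboot to take effect):

```shell
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 30
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 120
esxcli system settings advanced set -o /NFS/MaxVolumes -i 64
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
```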

Sunday, August 11, 2013

Unified Network, DCB and iSCSI challenges

An iSCSI SAN is a Storage Area Network, and storage needs a lossless fabric. If, for any reason, a unified fabric has to be used, then the quality of the Ethernet/IP network is crucial for problem-free storage operation.

For example, DELL EqualLogic supports and leverages DCB (PFC, ETS and DCBX).
iSCSI-TLV is a part of DCBX. However, the DCB protocol primitives must be supported end to end, so if one member of the chain doesn't support them, the whole mechanism is useless.

How DCB makes iSCSI better is deeply explained here

So think twice whether you really want a converged network (aka unified fabric) or whether a dedicated iSCSI network is the better option for you.

Tuesday, August 06, 2013

DELL EqualLogic general recommendations for VMware vSphere ESXi




Below are eight major recommendations for DELL EqualLogic implementation with vSphere ESXi:

  1. Delayed ACK disabled
  2. LRO disabled
  3. If using Round Robin, set the IOPS path-change value to 3
  4. Install MEM 1.1.2, but ONLY with an Enterprise or Enterprise+ license
  5. Extend login_timeout to 60 seconds.
  6. Don’t have multiple VMDKs on a single Virtual SCSI controller.  (major cause of latency alerts)
  7. Align partitions on 64K boundary
  8. Format with 64K cluster (allocation unit) size with Windows
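For items 3 and 5, here is a hedged sketch with esxcli, assuming vSphere 5.1 or later syntax. The device ID and adapter name are placeholders; list yours with "esxcli storage nmp device list" and "esxcli iscsi adapter list":

```shell
# Item 3: switch paths after every 3 I/Os under Round Robin (device ID is a placeholder)
esxcli storage nmp psp roundrobin deviceconfig set -d naa.60... -t iops -I 3
# Item 5: raise the iSCSI login timeout to 60 seconds (adapter name is a placeholder)
esxcli iscsi adapter param set -A vmhba33 -k LoginTimeout -v 60
```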

Monday, August 05, 2013

CISCO Nexus 1000v - Quality Of Service configuration


class-map type queuing match-any n1kv_control_packet_mgmt_class
 match protocol n1k_control
 match protocol n1k_packet
 match protocol n1k_mgmt

class-map type queuing match-all vmotion_class
 match protocol vmw_vmotion

class-map type queuing match-all vmw_mgmt_class
 match protocol vmw_mgmt

class-map type queuing match-any vm_production
 match cos 0

policy-map type queuing uplink_queue_policy
 class type queuing n1kv_control_packet_mgmt_class
   bandwidth percent 10
 class type queuing vmotion_class
   bandwidth percent 30
 class type queuing vmw_mgmt_class
   bandwidth percent 10
 class type queuing vm_production
   bandwidth percent 40

port-profile type ethernet uplink
 service-policy type queuing output uplink_queue_policy

Wednesday, July 24, 2013

How to downgrade IBM V7000 (Storwize) firmware

Sometimes, especially during problem management, you need to downgrade firmware on some system components. I had such a need for an IBM V7000 storage array. The downgrade process is not documented in IBM's official documentation, so here it is step by step:
  1. Double check you have IP addresses on management interfaces of both canisters (controllers)
  2. Login to management interface of one particular canister over https. https://[ip_mgmt_canister]. You have to use superuser credentials. Default IBM Storwize superuser password is passw0rd
  3. Switch the node to service state. You should wait 15-20 minutes
  4. Login to second node management interface. 
  5. Switch the second node to service state. You should wait another 15-20 minutes
  6. Double check both nodes are in service state
  7. Login to one node and choose the action "Reinstall software". Browse and upload the firmware image via the web browser. Software reinstallation takes a while; you have to wait approximately one or two hours. In the meantime you can ping the canister management IP addresses to check when the nodes come back.
  8. Repeat software reinstallation for second node.
  9. Please be aware that the storage configuration is lost after software reinstallation. Therefore you have to use the default password for superuser. Recall it is passw0rd
  10. When both nodes are up and running, login to one canister node's management interface and exit both nodes from service state. It can take another 15-20 minutes.
  11. When nodes are active you have to regenerate Cluster ID. You have to go to "Configure Enclosure" and enable checkbox "Reset System ID".
  12. After all these actions you have Storwize ready to form a new Cluster. So create cluster and assign cluster virtual IP address you will use for standard storage management.

Sunday, July 14, 2013

ESX host remote syslog configuration

For remote CLI you can use vMA or vCLI. Here is an example of how to configure an ESX host (10.10.1.71) to send logs remotely to a syslog server listening on IP address 10.10.4.72 on TCP port 514.

First of all, we have to instruct ESX where the syslog server is.
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog config set --loghost='tcp://10.10.4.72:514'
Then the syslog service on the ESX host must be restarted to pick up the configuration change.
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog reload

The ESX firewall must be reconfigured to allow syslog traffic:
esxcli -s 10.10.1.71 -u root -p Passw0rd. network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli -s 10.10.1.71 -u root -p Passw0rd. network firewall refresh

If you want to test or troubleshoot syslog logging, you can log in to the ESX host and use the logger command to send a test message to syslog.
logger "Test syslog over network"
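If the test message doesn't arrive, it helps to know what the raw message looks like. A small sketch of the syslog PRI prefix for the logger test above (facility "user", severity "info"):

```shell
# PRI = facility * 8 + severity; logger's default user.info message
# therefore starts with "<14>" on the wire.
facility=1   # user
severity=6   # informational
pri=$(( facility * 8 + severity ))
echo "raw message starts with: <${pri}>"
# To test plain TCP reachability of the syslog server from another host,
# something like this could be used (IP/port from the example above):
#   echo "<14> test message" | nc 10.10.4.72 514
```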

Tuesday, July 09, 2013

Excellent article: "Anatomy of an Ethernet Frame"

Trey Layton (aka EthernetStorageGuy) wrote an excellent article about MTU sizes and Jumbo Frame settings. The article is here. In the article you will learn which MTU size parameters you have to configure along the path among server, network gear and storage. It is crucial to understand the difference between the payload (usually 1500 or 9000) and the various frame sizes (usually 1522, 9018, 9022 or 9216) on networking equipment.

Here is a summary of Trey's deep Ethernet frame anatomy condensed into a simple best practice: "If you want to implement Jumbo Frames, use proper datacenter networking equipment and set the MTU size to the device maximum, which is usually 9216."
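The frame-size arithmetic behind those numbers can be sketched as follows (standard Ethernet header, 802.1Q tag and FCS sizes):

```shell
# frame size = payload + Ethernet header (14 B) + 802.1Q tag (4 B) + FCS (4 B)
eth_hdr=14; vlan_tag=4; fcs=4
frame_std=$(( 1500 + eth_hdr + vlan_tag + fcs ))
frame_jumbo=$(( 9000 + eth_hdr + vlan_tag + fcs ))
echo "1500 B payload -> ${frame_std} B frame; 9000 B payload -> ${frame_jumbo} B frame"
```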

DELL Open Manage Essentials 1.2 has been released

Dell OpenManage Essentials is a 'one to many' console used to monitor Dell Enterprise hardware. It can discover, inventory, and monitor the health of Dell Servers, Storage, and network devices. Essentials can also update the drivers and BIOS of your Dell PowerEdge Servers and allow you to run remote tasks. OME can increase system uptime, automate repetitive tasks, and prevent interruption in critical business operations.

It can be downloaded here.

Fixes & Enhancements
Fixes:
  1. Multiple defect fixes and performance improvements
Enhancements:
  1. Support for Discovery, Inventory and Map View for Dell PowerEdge VRTX devices. 
  2. Addition of Microsoft Windows Server 2012 as a supported operating system for the management station.
  3. Context sensitive Search functionality. 
  4. Ability to configure OpenManage Essentials to send the warranty status of your devices through email at periodic intervals. 
  5. Ability to configure OpenManage Essentials to generate a warranty scoreboard based on your preference and display a notification icon in the heading banner when the warranty scoreboard is available.
  6. Enhanced support for Dell Compellent, Dell Force10 E-Series and C-Series, Dell PowerConnect 8100 series, Dell PowerVault FS7500, and PowerVault NX3500 devices. 
  7. Support for installing OpenManage Essentials on the domain controller.
  8. Device Group Permissions portal. 
  9. Additional reports: Asset Acquisition Information, Asset Maintenance Information, Asset Support Information, and Licensing Information. 
  10. Addition of a device group for Citrix XenServers and Dell PowerEdge C servers in the device tree. 
  11. Availability of storage and controller information in the device inventory for the following client systems: Dell OptiPlex, Dell Latitude, and Dell Precision.
  12. CLI support for discovery, inventory, status polling, and removal of devices from the device tree. 
  13. Availability of sample command line remote tasks for uninstalling OpenManage Server Administrator and applying a server configuration on multiple managed nodes. 
  14. Support for SUDO users in Linux for system updates and OMSA deploy tasks. 
  15. Display of a notification icon in the heading banner to indicate the availability of a newer version of OpenManage Essentials. 
  16. Support for enabling or disabling rebooting after system update for out-of band (iDRAC) system updates.
  17. Support for re-running system update and OpenManage Server Administrator (OMSA) deployment tasks.
  18. Support for Single Sign-On (SSO) for iDRAC and CMC devices. 
  19. Ability to log on as a different user.


Tuesday, July 02, 2013

How to change default path selection policy for particular storage array?

Sometimes the firmware in a storage array has problems and you have to "downgrade" functionality to get an operable system. That sometimes happens with some ALUA storage systems where the Round Robin path policy or Fixed path policy (aka FIXED) should work but doesn't because of a firmware issue.

So a relatively simple solution is to switch back from the more advanced Round Robin policy to the legacy - but properly functioning - Most Recently Used path policy (aka MRU), normally used for active/passive storage systems.

Note: Please be aware that some storage vendors say they have active/active storage even when they do not. Usually, and probably more precisely, they call it "dual-active storage", which is not the same as active/active. Maybe I should write another post about this topic.

You can change the Path Selection Policy in several ways, and as always the best option depends on your specific requirements and constraints.

However, if you have only one instance of a given storage type connected to your ESX hosts, you can simply change the default path selection policy for that particular SATP type. Let's assume you have some LSI storage.

Below is a simple esxcli command to do it ...

esxcli storage nmp satp set --default-psp=VMW_PSP_MRU --satp=VMW_SATP_LSI

... and then your default PSP for VMW_SATP_LSI is now VMW_PSP_MRU

One thing you must be aware of … if you explicitly changed any devices (disks) to another path selection policy in the past, those devices will not follow the new default PSP. There is no esxcli mechanism to change devices back to accepting the default PSP for a particular SATP type. The only solution is to edit /etc/vmware/esx.conf.

All previous explicit changes are written in /etc/vmware/esx.conf, so it is pretty simple to find and remove those lines from the config file. I silently assume you do such operations in maintenance mode, so after an ESX reboot all paths for your devices will follow the default SATP path selection policy.
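A quick sketch of the verification steps (real esxcli namespaces, but treat the grep pattern as an assumption about the esx.conf entry format):

```shell
# show the default PSP per SATP
esxcli storage nmp satp list
# show the PSP actually applied per device
esxcli storage nmp device list
# look for explicit per-device PSP overrides persisted in esx.conf
grep -i psp /etc/vmware/esx.conf
```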

BTW: That’s why I generally don’t recommend changing the PSP for a particular device unless necessary. Sometimes it is necessary, for example for RDMs participating in an MSCS cluster, but it is often abused by admins and implementation engineers. I strongly believe it is always better to set the default PSP to behave as required.

Do you want to stress test your hardware? Try the HeavyLoad tool.

Bring your PC to its limits with the freeware stress test tool HeavyLoad. HeavyLoad puts your workstation or server PC under a heavy load and lets you test whether they will still run reliably.

Look at http://www.jam-software.com/heavyload/

Sunday, June 30, 2013

Simple UNIX Shell Script for generating disk IO traffic

Here is a pretty easy UNIX shell script for disk I/O generation.
#!/bin/sh
# Generate disk I/O load with ten parallel dd writers until CTRL-C is pressed.
dd_threads="0 1 2 3 4 5 6 7 8 9"
finish () {
  killall dd
  for i in $dd_threads
  do
    rm -f /var/tmp/dd.$i.test
  done
  exit 0
}
trap 'finish' INT
while true
do
  for i in $dd_threads
  do
    # On Linux, /dev/urandom avoids blocking when the entropy pool runs dry
    dd if=/dev/random of=/var/tmp/dd.$i.test bs=512 count=100000 &
  done
  # Wait for the whole batch before looping again, otherwise the loop
  # would fork an unbounded number of dd processes.
  wait
done
The generated IOs (aka TPS - transactions per second) can be watched with the following command:
iostat -d -c 100000
Script can be terminated by pressing CTRL-C.

Thursday, June 27, 2013

Calculating optimal segment size and stripe size for storage LUN backing vSphere VMFS Datastore

A colleague of mine (BTW a very good storage expert) asked me what the best segment size is for a storage LUN used for a VMware vSphere datastore (VMFS). Recommendations vary among storage vendors and models, but I think the basic principles are the same for any storage.

I found the IBM Redbook [SOURCE: IBM Redbook redp-4609-01] explanation the most descriptive, so here it is.
The term segment size refers to the amount of data that is written to one disk drive in an array before writing to the next disk drive in the array. For example, in a RAID5, 4+1 array with a segment size of 128 KB, the first 128 KB of the LUN storage capacity is written to the first disk drive and the next 128 KB to the second disk drive. For a RAID1, 2+2 array, 128 KB of an I/O is written to each of the two data disk drives and to the mirrors. If the I/O size is larger than the number of disk drives times 128 KB, this pattern repeats until the entire I/O is completed. For very large I/O requests, the optimal segment size for a RAID array is one that distributes a single host I/O across all data disk drives.
The formula for optimal segment size is:
LUN segment size = LUN stripe width ÷ number of data disk drives 
For RAID 5, the number of data disk drives is equal to the number of disk drives in the array minus 1, for example:
RAID5, 4+1 with a 64 KB segment size = (5-1) * 64KB = 256 KB stripe width 
For RAID 1, the number of data disk drives is equal to the number of disk drives divided by 2, for example:
RAID 10, 2+2 with a 64 KB segment size = (2) * 64 KB = 128 KB stripe width 
For small I/O requests, the segment size must be large enough to minimize the number of segments (disk drives in the LUN) that must be accessed to satisfy the I/O request, that is, to minimize segment boundary crossings.
For IOPS environments, set the segment size to 256KB or larger, so that the stripe width is at least as large as the median I/O size. 
IBM Best practice: For most implementations set the segment size of VMware data partitions to 256KB.

Note: If I am decoding IBM terminology correctly, then IBM's term "stripe width" is actually the "data stripe size". We need to clarify the terminology, because the term "stripe width" is normally used for the number of disks in a RAID group. "Data stripe size" is the payload without parity; the parity is stored on other segment(s) depending on the selected RAID level.

For a clear understanding of the terminology I've created the RAID 5 (4+1) segment/stripe visualization depicted below.

RAID 5 (4+1) striping example

Even though I found this IBM description very informative, I'm not sure why they recommend a 256KB segment size for VMware. It is true that the biggest IO size issued from ESX is by default 32MB, because ESX splits bigger IOs issued from the guest OS into multiple IOs (for more information about the big IO split see this blog post). However, the most important factor is the IO size issued from the guest OSes. If you want to monitor max/average/median IO size from ESX, you can use the vscsiStats tool already included in ESXi for exactly this purpose. It can even show a histogram, which is really cool (for more information about vscsiStats read this excellent blog post). Based on all these assumptions, and also my own IO size monitoring in the field, it seems that the average IO size issued from ESX is usually somewhere between 32 and 64KB. So let's use 64KB as the average data stripe (IO size issued from the OS). Then for RAID 5 (4+1) the data stripe is composed of 4 segments, and the optimal segment size in this particular case should be 16KB (64/4).
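The arithmetic from that paragraph as a one-line sanity check:

```shell
# optimal segment = average IO size / number of data drives in the RAID group
io_kb=64                      # assumed average IO size issued from ESX
drives=5; parity=1            # RAID 5 (4+1)
data_drives=$(( drives - parity ))
seg_kb=$(( io_kb / data_drives ))
echo "optimal segment size: ${seg_kb} KB"
```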

Am I right, or did I miss something? Any comments are welcome and highly appreciated.

Update 2014/01/31:
We discuss this topic very frequently with a colleague of mine who works as a DELL storage specialist. Theory is nice, but only a real test can prove it. Recently he performed a set of IOmeter tests against a DELL PV MD3600f, which is actually the same array as the IBM DS3500. He found that optimal performance (# of IOPS versus response times) is achieved when the segment size is as close as possible to the IO size issued from the operating system. So the key takeaway from this exercise is that the optimal segment size for the example above is not 16KB but 64KB. Now I understand IBM's general recommendation (best practice) to use a 256KB segment size for VMware workloads, as this is the biggest segment size that can be chosen.

Update 2014/07/23:
After more thinking about this topic I've realized that the idea of using a segment size bigger than your biggest IO size can make sense for several reasons:

  • each IO gets a single spindle (disk) to handle it, which shortens the queues down the route, and the IO is served within single-spindle latency, which is the minimum for a single IO
  • a typical virtual infrastructure environment runs several VMs generating many IOs in parallel, depending on the queues available in the guest OS and the ESX disk scheduler settings (see more here on Duncan Epping's blog), so at the end of the day you can generate lots of IOPS from different threads and the load is evenly distributed across the RAID group
However, please note that all this discussion relates to legacy (traditional) storage architectures. Some modern (virtualized) storage arrays do some magic in their controllers, like I/O coalescing: reordering smaller IO writes into a bigger IO in the controller cache and sending this bigger IO down to the disks. This can significantly change segment size recommendations, so try to understand your particular storage architecture, or follow the storage vendor's best practices and try to understand the reasoning behind them for your particular use case. I remember EMC CLARiiONs used IO coalescing into 64KB IO blocks.


Wednesday, June 26, 2013

IOBlazer

IOBlazer is a multi-platform storage stack micro-benchmark. IOBlazer runs on Linux, Windows and OSX and it is capable of generating a highly customizable workload. Parameters like IO size and pattern, burstiness (number of outstanding IOs), burst interarrival time, read vs. write mix, buffered vs. direct IO, etc., can be configured independently. IOBlazer is also capable of playing back VSCSI traces captured using vscsiStats. The performance metrics reported are throughput (in terms of both IOPS and bytes/s) and IO latency.
IOBlazer evolved from a minimalist MS SQL Server emulator which focused solely on the IO component of said workload. The original tool had limited capabilities as it was able to generate a very specific workload based on the MS SQL Server IO model (Asynchronous, Un-buffered, Gather/Scatter). IOBlazer has now a far more generic IO model, but two limitations still remain:
  1. The alignment of memory accesses on 4 KB boundaries (i.e., a memory page)
  2. The alignment of disk accesses on 512 B boundaries (i.e., a disk sector).
Both limitations are required by the gather/scatter and un-buffered IO models.
A very useful new feature is the capability to playback VSCSI traces captured on VMware ESX through the vscsiStats utility. This allows IOBlazer to generate a synthetic workload absolutely identical to the disk activity of a Virtual Machine, ensuring 100% experiment repeatability.

TBD - TEST & WRITE REVIEW

PXE Manager for vCenter

PXE Manager for vCenter enables ESXi host state (firmware) management and provisioning. Specifically, it allows:
  • Automated provisioning of new ESXi hosts stateless and stateful (no ESX)
  • ESXi host state (firmware) backup, restore, and archiving with retention
  • ESXi builds repository management (stateless and stateful)
  • ESXi Patch management
  • Multi vCenter support
  • Multi network support with agents (Linux CentOS virtual appliance will be available later)
  • Wake on Lan
  • Hosts memtest
  • vCenter plugin
  • Deploy directly to VMware Cloud Director
  • Deploy to Cisco UCS blades
TBD - TEST & WRITE REVIEW

vBenchmark

vBenchmark provides a succinct set of metrics in these categories for your VMware virtualized private cloud. Additionally, if you choose to contribute your metrics to the community repository, vBenchmark also allows you to compare your metrics against those of comparable companies in your peer group. The data you submit is anonymized and encrypted for secure transmission.

Key Features:

  • Retrieves metrics across one or multiple vCenter servers
  • Allows inclusion or exclusion of hosts at the cluster level
  • Allows you to save queries and compare over time to measure changes as your environment evolves
  • Allows you to define your peer group by geographic region, industry and company size, to see how you stack up
TBD - TEST & WRITE REVIEW

Tuesday, June 25, 2013

How to create your own vSphere Performance Statistics Collector

StatsFeeder is a tool that enables performance metrics to be retrieved from vCenter and sent to multiple destinations, including 3rd-party systems. The goal of StatsFeeder is to make it easier to collect statistics in a scalable manner. The user specifies the statistics to be collected in an XML file, and StatsFeeder collects and persists them. The default persistence mechanism is comma-separated values, but the user can extend it to persist the data in a variety of formats, including a standard relational database or a key-value store. StatsFeeder is written leveraging significant experience with the performance APIs, allowing the metrics to be retrieved in the most efficient manner possible.
The white paper StatsFeeder: An Extensible Statistics Collection Framework for Virtualized Environments can give you a better understanding of how it works and how to leverage it.




Monday, June 24, 2013

vCenter Single Sign-On Design Decision Point

When you design vSphere 5.1 you have to implement vCenter SSO. Therefore you have to make a design decision about which SSO mode to choose.

There are actually three available options:

  1. Basic
  2. HA (don't confuse it with vSphere HA)
  3. Multisite
Justin King wrote an excellent blog post about SSO here, and it is a worthy source of information for making the right design decision. I fully agree with Justin and recommend Basic SSO to my customers when possible. SSO server protection can be achieved by standard backup/restore methods, and SSO availability can be increased by vSphere HA. All these methods are well known and have been used for a long time.

You have to use Multisite SSO when vCenter Linked Mode is required, but think twice whether you really need it and whether the benefits outweigh the drawbacks.

Thursday, June 20, 2013

Force10 Open Automation Guide - Configuration and Command Line Reference

This document describes the components and uses of the Open Automation Framework designed to run on the Force10 Operating System (FTOS), including:
• Smart Scripting
• Virtual Server Networking (VSN)
• Programmatic Management
• Web graphic user interface (GUI) and HTTP Server

http://www.force10networks.com/CSPortal20/KnowledgeBase/DOCUMENTATION/CLIConfig/FTOS/Automation_2.2.0_4-Mar-2013.pdf

Tuesday, June 18, 2013

How to – use vmkping to verify Jumbo Frames

Here is a nice blog post about Jumbo Frame configuration on vSphere and how to test that it works as expected. This is, BTW, an excellent test for Operational Verification (aka Test Plan).
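The test from that post boils down to something like this (the target IP is a placeholder; 8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):

```shell
# -d sets "don't fragment", so the ping fails loudly if any hop lacks jumbo MTU
vmkping -d -s 8972 10.10.4.72
```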

Architectural Decisions

Josh Odgers – VMware Certified Design Expert (VCDX) #90 is continuously building database of architectural decisions available at  http://www.joshodgers.com/architectural-decisions/

It is a very nice example of one approach to architecture.
 

Monday, June 17, 2013

PowerCLI One-Liners to make your VMware environment rock out!

Christopher Kusek wrote an excellent blog post about useful PowerCLI scripts that fit on a single line. He calls them one-liners. These one-liners can significantly help you with daily vSphere administration. On top of that, you can very easily learn PowerCLI constructs just from reading them.

http://www.pkguild.com/2013/06/powercli-one-liners-to-make-your-vmware-environment-rock-out/

Tuesday, June 04, 2013

Software Defined Networking - SDN

SDN is another big topic in the modern virtualized datacenter, so it is worth understanding what it is and how it can help us solve real datacenter challenges.

Brad Hedlund's explanation "What is Network Virtualization"
http://bradhedlund.com/2013/05/28/what-is-network-virtualization/
Brad Hedlund is a very well-known networking expert. He now works for VMware | Nicira, participating in the VMware NSX product, which should be the next network virtualization platform (aka network hypervisor). He is ex-CISCO and ex-DELL | Force10, so there is a big probability he fully understands what is going on.

It is obvious that "dynamic service insertion" is the most important thing in SDN. OpenFlow and CISCO vPath both try to do it, each in a different way - the same goal with different approaches. What is better? Who knows? The future and real experience will show us. Jason Edelman's blog post very nicely and clearly compares both approaches.
http://www.jedelman.com/1/post/2013/04/openflow-vpath-and-sdn.html

CISCO, as a long-term networking leader and pioneer, of course has its own vision of SDN. The Nexus 1000V and virtual network overlays play a pivotal role in CISCO's software defined networks. A very nice explanation of the CISCO approach is available at
http://blogs.cisco.com/datacenter/nexus-1000v-and-virtual-network-overlays-play-pivotal-role-in-software-defined-networks/


Saturday, May 25, 2013

PernixData: A new storage status quo is coming

Storage SMEs have known for ages that storage design begins with performance. Storage performance is usually much more important than capacity; one IOPS costs more money than one GB of storage. Flash disks, EFDs and SSDs have changed the storage industry already. But the magic and the future are in software. PernixData FVP (Flash Virtualization Platform) looks like a very intelligent, fully redundant and reliable cluster-aware software storage acceleration platform. It leverages any local flash devices to accelerate any back-end storage used for server virtualization. Right now only VMware vSphere is supported, but the solution is hypervisor agnostic, and it is just a matter of time before it is ported to other server virtualization platforms like Hyper-V, Xen, or KVM.

PernixData sets an absolutely new standard of storage quality in the virtualized datacenter. If you have an issue with storage response time (latency), then look at PernixData FVP. What impressed me most is the future potential, because I believe the platform can be improved significantly and new functionality will come soon. I can imagine data compression and deduplication, data encryption, vendor-independent replication, cloning, snapshotting, etc.

So software defined storage virtualization has just begun.

Happy journey PernixData.

For more information look at
http://www.pernixdata.com/
http://www.pernixdata.com/SFD3/

Wednesday, May 22, 2013

Magic Quadrant for General-Purpose Disk Arrays

http://www.gartner.com/technology/reprints.do?id=1-1ENAPKJ&ct=130325&st=sg

A pretty nice overview and comparison of storage vendors. Because I have had the privilege to practically design, implement and work with many storage arrays, I can't agree with IBM's positioning and description. In the past I was also impressed by IBM storage products, but reality is a little bit different. I have troubleshot several big issues with the IBM mid-range storage array IBM V7000 (Storwize) and also with the high-end IBM DS8700 (Shark).
  

Monday, May 20, 2013

Difference between SCSI-2 and SCSI-3 reservation

SCSI-3 reservations are persistent across SCSI bus resets and support multiple paths from a host to a disk. In contrast, only one host can use SCSI-2 reservations with one path. If the need arises to block access to a device because of data integrity concerns, only one host and one path remain active. The requirements for larger clusters, with multiple nodes reading and writing to storage in a controlled manner, make SCSI-2 reservations obsolete.
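The semantic difference can be sketched as a tiny conceptual model. This is a simplification of my own for illustration only; the class and method names are not a real SCSI API:

```python
class Scsi2Reservation:
    """SCSI-2: one host, one path; the reservation is lost on a bus reset."""
    def __init__(self):
        self.holder = None          # (host, path) tuple, or None

    def reserve(self, host, path):
        if self.holder is None:
            self.holder = (host, path)
            return True
        return self.holder == (host, path)

    def bus_reset(self):
        self.holder = None          # SCSI-2 reservation does not survive a reset


class Scsi3PersistentReservation:
    """SCSI-3 PR: multiple hosts register keys; reservation survives resets."""
    def __init__(self):
        self.registered = set()     # registered reservation keys
        self.holder = None          # key currently holding the reservation

    def register(self, key):
        self.registered.add(key)

    def reserve(self, key):
        # Only a registered key may reserve; an existing holder keeps it.
        if key in self.registered and self.holder in (None, key):
            self.holder = key
            return True
        return False

    def bus_reset(self):
        pass                        # persistent: registrations and holder survive
```

The point of the model: with SCSI-3 PR all cluster nodes stay registered across a bus reset, which is what makes controlled multi-node, multi-path access possible.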

Info retrieved from:
http://sfdoccentral.symantec.com/sf/5.0/hpux/html/vcs_install/ch_vcs_install_iofence4.html

Thursday, May 16, 2013

Reduce vCenter DB size by deleting old events and tasks from the vCenter database


The vCenter MS-SQL database contains a stored procedure called cleanup_events_tasks_proc which deletes old data based on the event and task retention settings. vCenter retention settings can be set up in vCenter Settings through the vSphere Client or changed directly in the database. Using the vSphere Client is recommended.


c:> "C:\Program Files\Microsoft SQL Server\90\Tools\Binn\OSQL.EXE" -S \SQLEXP_VIM -E
1> use VIM_VCDB
2> go
1> update vpx_parameter set value='' where name='event.maxAge'
2> update vpx_parameter set value='' where name='task.maxAge'
3> update vpx_parameter set value='true' where name='event.maxAgeEnabled'
4> update vpx_parameter set value='true' where name='task.maxAgeEnabled'
5> go
(1 row affected)
(1 row affected)
(1 row affected)
(1 row affected)
1> exec cleanup_events_tasks_proc
2> go
1> dbcc shrinkdatabase ('VIM_VCDB')
2> go
DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
      5           1       81080         280       78776          78776
      5           2         128         128         128            128

(2 rows affected)
DBCC execution completed. If DBCC printed error messages, contact your system
administrator.
1> quit

Monday, April 29, 2013

DELL PowerConnect Time Configuration


Here is the procedure for setting it up:

enable
configure
sntp unicast client enable
sntp server ntp.cesnet.cz
end

Here is how to verify:


console#show sntp configuration

Polling interval: 64 seconds
MD5 Authentication keys:
Authentication is not required for synchronization.
Trusted keys:
No trusted keys.
Unicast clients: Enable

Unicast servers:
Server          Key             Polling         Priority
---------       -----------     -----------     ----------
195.113.144.201 Disabled        Enabled         1
195.113.144.204 Disabled        Enabled         1

Here is how to check current time:


console#show clock

10:23:42 (UTC+0:00) Apr 29 2013
Time source is SNTP

Summary:
That's how to set the time on DELL PowerConnect switches. Please note that the time is in UTC+0:00; if you want to localize your time you can use the "clock timezone" and "clock summer-time" commands in conf mode, but I don't like that. UTC time is better for troubleshooting.
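If you keep the switch on UTC, you can do the localization in your tooling instead. A quick sketch of converting the switch's reported timestamp to local time (the CEST UTC+2 offset is just my example):

```python
from datetime import datetime, timezone, timedelta

# The switch reported: 10:23:42 (UTC+0:00) Apr 29 2013
switch_time = datetime(2013, 4, 29, 10, 23, 42, tzinfo=timezone.utc)

# Convert to local time in the script, not on the switch.
# CEST (UTC+2) is an example offset, not something the switch knows about.
local_time = switch_time.astimezone(timezone(timedelta(hours=2)))
```

This way logs from all devices correlate in UTC, and local presentation is a display concern only.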



Tuesday, April 16, 2013

Network Overlays vs. Network Virtualization

Scott Lowe published a very nice blog post (a philosophical reflection) about "Network Overlays vs. Network Virtualization".

And this was my comment on his post ..

Scott, excellent write-up. As always. First of all, I absolutely agree that good definitions, terminology, and a conceptual view of each layer are fundamental to fully understanding any technology or system. Modern hardware infrastructure is complex, and the complexity is growing year over year.
Software programming has the same history. Who programs in assembler nowadays? Why have we used object oriented programming for more than 20 years? The answer is ... to avoid complexity and to have control over system behavior. In software the MVC model is often used; it stands for Model-View-Controller. The Model is a logical representation of something we want to run in software, the View is a simplified presentation of the model to the end user, and the Controller is the engine behind the scenes. The same concept applies to SDI (Software Defined Infrastructure), where SDN (Software Defined Network) is another example of the same story.
VMware did an excellent job with infrastructure abstraction. Everything in VMware vSphere is an object, or better said a managed object, which has properties and methods. So that is the Model. The vSphere Client, Web Client, vCLI and PowerCLI are different user interfaces into the system. So that is the View. And who is the Controller? The Controller is vCenter, because it orchestrates system behavior. The vCenter controller includes prepackaged behavior (out-of-the-box), but it can be extended by custom scripts and orchestrated externally, for example by vCenter Orchestrator. That's what I really love about VMware vSphere: it was architected from the beginning to purely represent hardware infrastructure in software constructs.
Now back to Network Virtualization. In my opinion a Network Overlay (for example VXLAN) is a mandatory component to abstract L2 from physical switches and have it in software. The particular network overlay protocol must be implemented in a "Network Hypervisor", which is a software L2 switch. But a "Network Hypervisor" has to implement other protocols and components as well to be classified as "Network Virtualization" and not just another software vSwitch.
As Scott already mentioned in his post, networking is not just L2 but also L3-7 network services, so all network services must be available before we can speak about full "Network Virtualization". Am I correct, Scott? And I feel there is an open question in this post ... who is the controller of "Network Virtualization"? :-)

Monday, April 15, 2013

Tecomat: Industrial and home automation

How to get Managed Object Reference ID ( aka MoRef ) from vSphere

If you've ever scripted vSphere infrastructure you probably know that everything has a software representation, also known as a Managed Object. Each Managed Object has a unique identifier referenced as the Managed Object ID. Sometimes this Managed Object ID is needed.

In PowerCLI you can get it via the following two lines:
$VM = Get-VM -Name $VMName 
$VMMoref = $VM.ExtensionData.MoRef.Value
You can also use a Perl script leveraging the VMware vSphere Perl SDK to get the Managed Object ID of a particular virtual machine or datastore. If you need the MOID of another entity, it's pretty easy to slightly change the script.

The script was developed and tested on vMA (VMware Management Assistant) in the directory /usr/lib/vmware-vcli/apps/general, and the script name is getmoid.pl.

Here is a usage example showing how to get the MOID of a datastore called FreeNAS-iSCSI-01:
./getmoid.pl --server --username --password --dsname FreeNAS-iSCSI-01

Managed Object ID: datastore-162

Here is a usage example showing how to get the MOID of a virtual machine called VMA:
./getmoid.pl --server --username --password --vmname VMA
Managed Object ID: vm-122

Any feedback or comments are welcome.

Monday, April 08, 2013

How to create FreeBSD memstick in running FreeBSD system

# make 2GB image file:
dd if=/dev/zero of=./memstick.img bs=1m count=2000
# load image as virtual disk device:
mdconfig -a -t vnode -f ./memstick.img -u 0
fdisk -iI /dev/md0
bsdlabel -wB /dev/md0s1
newfs /dev/md0s1a
mount /dev/md0s1a /mnt
cd /usr/src
make installkernel installworld DESTDIR=/mnt
umount /mnt
# insert memstick now, assuming it will be /dev/da0...
# raw copy virtual disk content to memstick.
dd if=/dev/md0 of=/dev/da0 bs=1m

Saturday, March 30, 2013

VMware VXLAN Deployment Guide

Vyenkatesh Deshpande recently published the "VMware Network Virtualization Design Guide", which can be downloaded here. However, the deployment guide, available here, is very valuable if you really want to implement VXLAN in your environment.

Sunday, February 24, 2013

SG3_UTILS: How to send SCSI commands to devices

http://sg.danny.cz/sg/sg3_utils.html
http://linux.die.net/man/8/sg3_utils

The sg3_utils package contains utilities that send SCSI commands to devices. As well as devices on transports traditionally associated with SCSI (e.g. Fibre Channel (FCP), Serial Attached SCSI (SAS) and the SCSI Parallel Interface (SPI)), many other devices use SCSI command sets.


How the Cluster service reserves a disk and brings a disk online

http://support.microsoft.com/kb/309186

This article (link above) describes how the Microsoft Cluster service reserves and brings online disks that are managed by the cluster service and related drivers.


Wednesday, February 20, 2013

PuppetLabs | Razor: Next-Generation Provisioning


System administrators require the same agility and productivity from their hardware infrastructure that they get from the cloud. In response, Puppet Labs and EMC collaboratively developed Razor, a next-generation physical and virtual hardware provisioning solution. Razor provides you with unique capabilities for managing your hardware infrastructure, including:
  • Auto-Discovered Real-Time Inventory Data
  • Dynamic Image Selection
  • Model-Based Provisioning
  • Open APIs and Plug-in Architecture
  • Metal-to-Cloud Application Lifecycle Management
Together, Razor and Puppet enable system administrators to automate every phase of the IT infrastructure lifecycle, from bare metal to fully deployed cloud applications.



Monday, February 18, 2013

Automated Storage Tiering - Sub-LUN tiering

Excellent comparisons of the Automated Storage Tiering technologies from different vendors.
I personally believe automated storage tiering (AST) is really important for a dynamic virtualized datacenter, and because AST differs among vendors I'm going to collect the important information for design considerations. I don't want to prefer or offend any product. Each product has some advantages and disadvantages, and we as infrastructure architects have to fully and deeply understand the technology to be able to prepare a good design, which is the most important factor for a reliable and well performing infrastructure.

Good mid-range storage products on the market (my personal opinion):

  • DELL Compellent
  • Hitachi HUS
  • EMC VNX

DELL Compellent
Tiers: SSD, SAS, NL-SAS (SATA)
AST Sub-LUN tiering block: 512KB, 2MB (default), 4MB
Tiering optimisation analysis period: [TBD]
Tiering optimisation relocation period: [TBD]
Tiering algorithm: [TBD]
QoS per LUN: no

Hitachi HUS (HUS 110, HUS 130, HUS 150)
Tiers: SSD, SAS, NL-SAS (SATA)
AST Sub-LUN tiering block: 32MB
Tiering optimisation analysis period: 30 minutes
Tiering optimisation relocation period: [TBD]
Tiering algorithm: [TBD]
QoS per LUN: no

EMC VNX 
Tiers: SSD, SAS, NL-SAS (SATA)
AST Sub-LUN tiering block: 1GB
Tiering optimisation analysis period: 60 minutes
Tiering optimisation relocation period: user defined
Tiering algorithm:
During the user-defined relocation window, 1GB slices are promoted according to both the rank ordering performed in the analysis stage and a tiering policy set by the user. During relocation, FAST VP relocates higher-priority slices to higher tiers; slices are relocated to lower tiers only if the space they occupy is required for a higher-priority slice. This way, FAST VP fully utilizes the highest-performing spindles first. Lower-tier spindles are utilized as capacity demand grows. Relocation can be initiated manually or by a user-configurable, automated scheduler. The relocation process targets creating 10% free capacity in the highest tiers in the pool. Free capacity in these tiers is used for new slice allocations of high-priority LUNs between relocations.
QoS per LUN: yes
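The common mechanics behind all three implementations (count I/O per slice during the analysis period, then relocate the hottest slices to the highest tier until it fills up) can be sketched roughly like this. This is a simplification of my own, not any vendor's actual algorithm:

```python
def plan_relocation(slice_io_counts, tier_capacities):
    """Rank slices by I/O activity and assign them to tiers top-down.

    slice_io_counts: {slice_id: io_count} gathered during the analysis period
    tier_capacities: [slices_fitting_in_tier0, slices_in_tier1, ...],
                     ordered fastest tier first (e.g. SSD, SAS, NL-SAS);
                     the last tier should be large enough for the remainder.
    Returns {slice_id: tier_index}.
    """
    # Hottest slices first, mirroring the rank ordering of the analysis stage.
    ranked = sorted(slice_io_counts, key=slice_io_counts.get, reverse=True)
    placement, tier, used = {}, 0, 0
    for slice_id in ranked:
        # Spill to the next (slower) tier once the current one is full.
        while tier < len(tier_capacities) and used >= tier_capacities[tier]:
            tier += 1
            used = 0
        placement[slice_id] = tier
        used += 1
    return placement
```

The real differentiators among the vendors above are then the slice granularity (512KB vs. 32MB vs. 1GB), the analysis and relocation periods, and policy knobs like the free-capacity headroom kept in the top tier.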


I've collected this information from several public resources, so if some of it is wrong please let me know directly or via comments.



Wednesday, February 13, 2013

Understand SCSI, SCSI command responses and sense codes

When troubleshooting VMware vSphere and storage related issues it is quite useful to understand SCSI command responses and sense codes.

Usually you can see in the log something like "failed H:0x8 D:0x0 P:0x0 Possible sense data: 0xA 0xB 0xC" where:

H: means host codes
D: means device codes
P: means plugin codes
A: is Sense Key
B: is Additional Sense Code
C: is Additional Sense Code Qualifier

Some host codes:
0x2 Bus state busy
0x3 Timeout for other reason
0x5 Told to abort for some other reason
0x8 Bus reset

Some device codes:
00h  GOOD
02h  CHECK CONDITION
04h  CONDITION MET
08h  BUSY
18h  RESERVATION CONFLICT
28h  TASK SET FULL
30h  ACA ACTIVE
40h  TASK ABORTED

Some plugin codes:
00h  No error.
01h  An unspecified error occurred. Note: The I/O cmd should be tried.
02h  The device is a deactivated snapshot. Note: The I/O cmd failed because the device is a deactivated snapshot and so the LUN is read-only.
03h  SCSI-2 reservation was lost.
04h  The plug-in wants to requeue the I/O back. Note: The I/O will be retried.
05h  The test and set data in the ATS request returned false for equality.
06h  Allocating more thin provision space. Device server is in the process of allocating more space in the backing pool for a thin provisioned LUN.
07h  Thin provisioning soft-limit exceeded.
08h  Backing pool for thin provisioned LUN is out of space.

Some SCSI Sense Keys:
SCSI Sense Keys appear in the Sense Data available when a command returns with a CHECK CONDITION status. The sense key, together with the additional sense code and qualifier, describes why the command failed.

Code Name
0h   NO SENSE
1h   RECOVERED ERROR
2h   NOT READY
3h   MEDIUM ERROR
4h   HARDWARE ERROR
5h   ILLEGAL REQUEST
6h   UNIT ATTENTION
7h   DATA PROTECT
8h   BLANK CHECK
9h   VENDOR SPECIFIC
Ah   COPY ABORTED
Bh   ABORTED COMMAND
Dh   VOLUME OVERFLOW
Eh   MISCOMPARE
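To make the tables above practical, here is a small sketch of a parser for such log lines. The dictionaries only cover the subset of codes listed above, and the log line format is the "failed H:.. D:.. P:.." pattern shown earlier:

```python
import re

SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
              0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
              0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE"}

HOST_CODES = {0x0: "OK", 0x2: "Bus state busy", 0x3: "Timeout for other reason",
              0x5: "Told to abort for some other reason", 0x8: "Bus reset"}

def parse_scsi_status(line):
    """Parse a 'failed H:.. D:.. P:.. Possible sense data: ..' log line."""
    m = re.search(r"H:0x([0-9a-fA-F]+) D:0x([0-9a-fA-F]+) P:0x([0-9a-fA-F]+)"
                  r"(?: Possible sense data: 0x([0-9a-fA-F]+)"
                  r" 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+))?", line)
    if not m:
        return None
    h, d, p = (int(m.group(i), 16) for i in (1, 2, 3))
    result = {"host": HOST_CODES.get(h, hex(h)),
              "device": hex(d), "plugin": hex(p)}
    if m.group(4) is not None:
        key = int(m.group(4), 16)
        result["sense_key"] = SENSE_KEYS.get(key, hex(key))
        # ASC/ASCQ meanings are device specific; report them verbatim.
        result["asc"], result["ascq"] = m.group(5), m.group(6)
    return result
```

For the example line from this post, the parser reports a host status of "Bus reset" and a sense key of "COPY ABORTED" (0xA), which is exactly the kind of at-a-glance decoding you want during a troubleshooting session.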

There is a VMware KB with further details here.

It is worth reading the following documents:
http://www.tldp.org/LDP/khg/HyperNews/get/devices/scsi.html (a quite old document for programmers willing to write a SCSI driver)
http://en.wikipedia.org/wiki/SCSI
http://en.wikipedia.org/wiki/SCSI_contingent_allegiance_condition
http://en.wikipedia.org/wiki/SCSI_Request_Sense_Command

What is SCSI reservation
http://mrwhatis.com/scsi-reservation.html

SCSI-3 Persistent Group Reservation
http://scsi3pr.blogspot.cz/


Tuesday, February 12, 2013

Using the VMware I/O Analyzer v1.5: A Guide to Testing Multiple Workloads

I encourage you to watch a great video about good practices for using VMware I/O Analyzer (VMware's bundle of IOmeter).

A very important step to get relevant results is mentioned there: increase the size of the second disk in the virtual machine (OVF appliance). The default size is 4GB, which is not enough because it fits into the cache of almost any storage array, making the results unrealistic and misleading.

Video is here
bit.ly/118kWs1 
or here
http://www.youtube.com/watch?v=zHJr957kN1s&feature=youtu.be

Enjoy.