Wednesday, July 24, 2013

How to downgrade IBM V7000 (Storwize) firmware

Sometimes, especially during problem management, you need to downgrade the firmware on some system components. I had such a need for an IBM V7000 storage array. The downgrade process is not covered in the official IBM documentation, so here it is step by step:
  1. Double check that you have IP addresses assigned to the management interfaces of both canisters (controllers).
  2. Log in to the management interface of one particular canister over https (https://[ip_mgmt_canister]). You have to use superuser credentials. The default IBM Storwize superuser password is passw0rd.
  3. Switch the node to service state. You should wait 15-20 minutes.
  4. Log in to the second node's management interface.
  5. Switch the second node to service state. You should wait another 15-20 minutes.
  6. Double check that both nodes are in service state.
  7. Log in to one node and choose the action "Reinstall software". Browse and upload the firmware image via the web browser. Software reinstallation takes a while - approximately one or two hours. In the meantime you can ping the canister management IP addresses to check when the nodes come back (see the shell sketch after this list).
  8. Repeat the software reinstallation for the second node.
  9. Please be aware that the storage configuration is lost after software reinstallation. Therefore you have to use the default superuser password again; recall it is passw0rd.
  10. When both nodes are up and running, log in to one canister node's management interface and take both nodes out of service state. It can take another 15-20 minutes.
  11. When the nodes are active you have to regenerate the Cluster ID. Go to "Configure Enclosure" and enable the checkbox "Reset System ID".
  12. After all these actions your Storwize is ready to form a new cluster. So create the cluster and assign the cluster virtual IP address you will use for standard storage management.
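
If you want to watch for the nodes coming back (step 7) without staring at a terminal, here is a minimal shell sketch. The IP addresses are placeholders for your canister management addresses and the ping options assume a Linux-style ping, so adjust both for your environment:

#!/bin/sh
# Minimal sketch: report when the canister management IPs start answering again.
# The addresses below are placeholders - replace them with your real canister IPs.
CANISTERS="192.168.70.121 192.168.70.122"
while true
do
  for ip in $CANISTERS
  do
    if ping -c 1 -W 2 $ip > /dev/null 2>&1
    then
      echo "$(date '+%H:%M:%S') $ip is responding"
    else
      echo "$(date '+%H:%M:%S') $ip is still down"
    fi
  done
  sleep 30
done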

Sunday, July 14, 2013

ESX host remote syslog configuration

For remote CLI you can use vMA or vCLI. Here is an example of how to configure an ESX host (10.10.1.71) to send logs remotely to a syslog server listening on IP address 10.10.4.72 on TCP port 514.

First of all we have to tell ESX where the syslog server is.
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog config set --loghost='tcp://10.10.4.72:514'
Then the syslog service on the ESX host must be reloaded to pick up the configuration change.
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog reload

The ESX firewall must be reconfigured to allow syslog traffic:
esxcli -s 10.10.1.71 -u root -p Passw0rd. network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli -s 10.10.1.71 -u root -p Passw0rd. network firewall refresh
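
To verify that the change took effect you can dump the current syslog configuration and check the reported remote host:
esxcli -s 10.10.1.71 -u root -p Passw0rd. system syslog config get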

If you want to test or troubleshoot syslog logging you can log in to the ESX host and use the logger command to send a test message to syslog.
logger "Test syslog over network"

Tuesday, July 09, 2013

Excellent article: "Anatomy of an Ethernet Frame"

Trey Layton (aka EthernetStorageGuy) wrote an excellent article about MTU sizes and Jumbo Frame settings. The article is here. In the article you will learn which MTU size parameters you have to configure along the path between server, network gear and storage. It is crucial to understand the difference between the payload (usually 1500 or 9000) and the various frame sizes (usually 1522, 9018, 9022 or 9216) on networking equipment.

Here is Trey's deep Ethernet frame anatomy summed up into a simple best practice: "If you want to implement Jumbo Frames, use pure datacenter networking equipment and set the MTU size to the device maximum, which is usually 9216."
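
The frame sizes mentioned above follow from simple arithmetic; here is the breakdown (the 9216 maximum is device dependent, so treat it as the usual value rather than a universal one):

1500 B payload + 14 B Ethernet header + 4 B FCS = 1518 B frame
1500 B payload + 14 B Ethernet header + 4 B 802.1Q VLAN tag + 4 B FCS = 1522 B frame
9000 B payload + 14 B Ethernet header + 4 B FCS = 9018 B frame
9000 B payload + 14 B Ethernet header + 4 B 802.1Q VLAN tag + 4 B FCS = 9022 B frame

A switch maximum of 9216 bytes simply leaves headroom for additional encapsulation headers on top of a 9000 B payload.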

DELL Open Manage Essentials 1.2 has been released

Dell OpenManage Essentials is a 'one to many' console used to monitor Dell Enterprise hardware. It can discover, inventory, and monitor the health of Dell Servers, Storage, and network devices. Essentials can also update the drivers and BIOS of your Dell PowerEdge Servers and allow you to run remote tasks. OME can increase system uptime, automate repetitive tasks, and prevent interruption in critical business operations.

It can be downloaded here.

Fixes & Enhancements
Fixes:
  1. Multiple defect fixes and performance improvements
Enhancements:
  1. Support for Discovery, Inventory and Map View for Dell PowerEdge VRTX devices. 
  2. Addition of Microsoft Windows Server 2012 as a supported operating system for the management station.
  3. Context sensitive Search functionality. 
  4. Ability to configure OpenManage Essentials to send the warranty status of your devices through email at periodic intervals. 
  5. Ability to configure OpenManage Essentials to generate a warranty scoreboard based on your preference and display a notification icon in the heading banner when the warranty scoreboard is available.
  6. Enhanced support for Dell Compellent, Dell Force10 E-Series and C-Series, Dell PowerConnect 8100 series, Dell PowerVault FS7500, and PowerVault NX3500 devices. 
  7. Support for installing OpenManage Essentials on the domain controller.
  8. Device Group Permissions portal. 
  9. Additional reports: Asset Acquisition Information, Asset Maintenance Information, Asset Support Information, and Licensing Information. 
  10. Addition of a device group for Citrix XenServers and Dell PowerEdge C servers in the device tree. 
  11. Availability of storage and controller information in the device inventory for the following client systems: Dell OptiPlex, Dell Latitude, and Dell Precision.
  12. CLI support for discovery, inventory, status polling, and removal of devices from the device tree. 
  13. Availability of sample command line remote tasks for uninstalling OpenManage Server Administrator and applying a server configuration on multiple managed nodes. 
  14. Support for SUDO users in Linux for system updates and OMSA deploy tasks. 
  15. Display of a notification icon in the heading banner to indicate the availability of a newer version of OpenManage Essentials. 
  16. Support for enabling or disabling rebooting after system update for out-of band (iDRAC) system updates.
  17. Support for re-running system update and OpenManage Server Administrator (OMSA) deployment tasks.
  18. Support for Single Sign-On (SSO) for iDRAC and CMC devices. 
  19. Ability to log on as a different user.


Tuesday, July 02, 2013

How to change default path selection policy for particular storage array?

Sometimes the firmware in a storage array has problems and you have to "downgrade" functionality to get an operable system. That sometimes happens with ALUA storage systems where the Round Robin path policy or the Fixed path policy (aka FIXED) should work but doesn't because of a firmware issue.

A relatively simple solution is to switch back from the more advanced Round Robin policy to the legacy - but properly functioning - Most Recently Used path policy (aka MRU) normally used for active/passive storage systems.

Note: Please be aware that some storage vendors claim they have active/active storage even when they do not. More precisely, they usually call it "dual-active storage", which is not the same as active/active. Maybe I should write another post about this topic.

You can change the Path Selection Policy in several ways and, as always, the best option depends on your specific requirements and constraints.

However, if you have only one instance of a particular storage type connected to your ESX hosts, you can simply change the default path selection policy for that SATP type. Let's assume you have some LSI-based storage.

Below is a simple esxcli command to do it ...

esxcli storage nmp satp set --default-psp=VMW_PSP_MRU --satp=VMW_SATP_LSI

... and the default PSP for VMW_SATP_LSI is now VMW_PSP_MRU.
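
If you want to double check the result, you can list all SATPs together with their current default PSPs:

esxcli storage nmp satp list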

One thing you must be aware of: if you explicitly changed any devices (disks) to another path selection policy in the past, those devices will not follow the new default PSP. There is no esxcli mechanism to change such devices back to accepting the default PSP for their SATP type. The only solution is to edit /etc/vmware/esx.conf.

All previous explicit changes are written in /etc/vmware/esx.conf, so it is pretty simple to find and remove those lines from the config file. I silently assume you do such operations in maintenance mode, so after an ESX reboot all paths for your devices will follow the default SATP path selection policy.
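
To locate those explicit per-device entries before removing them, a simple case-insensitive search of the config file is usually enough (a rough sketch - the exact key names differ between ESX versions, so review what it returns before deleting anything):

grep -i psp /etc/vmware/esx.conf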

BTW: That's why I generally don't recommend changing the PSP for a particular device unless it is necessary. Sometimes it is necessary, for example for RDMs participating in an MSCS cluster, but it is often overused by admins and implementation engineers. I strongly believe it is better to set the default PSP to behave as required.

Do you want to test heavy load? Try the HeavyLoad tool.

Bring your PC to its limits with the freeware stress test tool HeavyLoad. HeavyLoad puts your workstation or server PC under a heavy load and lets you test whether they will still run reliably.

Look at http://www.jam-software.com/heavyload/

Sunday, June 30, 2013

Simple UNIX Shell Script for generating disk IO traffic

Here is a pretty simple Unix shell script for disk I/O generation.
#!/bin/sh
# Generate disk I/O load with ten parallel dd writers.
# Note: on Linux /dev/random can block when the entropy pool is empty;
# use /dev/urandom there instead.
dd_threads="0 1 2 3 4 5 6 7 8 9"
# Clean up on CTRL-C: stop the writers and remove the temporary files.
finish () {
  killall dd
  for i in $dd_threads
  do
    rm -f /var/tmp/dd.$i.test
  done
  exit 0
}
trap 'finish' INT
while true
do
  # Start one background dd writer per thread ...
  for i in $dd_threads
  do
    dd if=/dev/random of=/var/tmp/dd.$i.test bs=512 count=100000 &
  done
  # ... and wait for the whole batch to finish before starting the next one,
  # otherwise the loop would keep forking new dd processes without any limit.
  wait
done
The generated I/Os (aka TPS - transactions per second) can be watched with the following command:
iostat -d -c 100000
The script can be terminated by pressing CTRL-C.

Thursday, June 27, 2013

Calculating optimal segment size and stripe size for storage LUN backing vSphere VMFS Datastore

A colleague of mine (BTW a very good storage expert) asked me what the best segment size is for a storage LUN used for a VMware vSphere Datastore (VMFS). Recommendations vary among storage vendors and models but I think the basic principles are the same for any storage.

I found the IBM RedBook explanation [SOURCE: IBM RedBook redp-4609-01] the most descriptive, so here it is.
The term segment size refers to the amount of data that is written to one disk drive in an array before writing to the next disk drive in the array. For example, in a RAID5, 4+1 array with a segment size of 128 KB, the first 128 KB of the LUN storage capacity is written to the first disk drive and the next 128 KB to the second disk drive. For a RAID1, 2+2 array, 128 KB of an I/O is written to each of the two data disk drives and to the mirrors. If the I/O size is larger than the number of disk drives times 128 KB, this pattern repeats until the entire I/O is completed. For very large I/O requests, the optimal segment size for a RAID array is one that distributes a single host I/O across all data disk drives. 
The formula for optimal segment size is:
LUN segment size = LUN stripe width ÷ number of data disk drives 
For RAID 5, the number of data disk drives is equal to the number of disk drives in the array minus 1, for example:
RAID5, 4+1 with a 64 KB segment size = (5-1) * 64KB = 256 KB stripe width 
For RAID 1, the number of data disk drives is equal to the number of disk drives divided by 2, for example:
RAID 10, 2+2 with a 64 KB segment size = (2) * 64 KB = 128 KB stripe width 
For small I/O requests, the segment size must be large enough to minimize the number of segments (disk drives in the LUN) that must be accessed to satisfy the I/O request, that is, to minimize segment boundary crossings. 
For IOPS environments, set the segment size to 256KB or larger, so that the stripe width is at least as large as the median I/O size. 
IBM Best practice: For most implementations set the segment size of VMware data partitions to 256KB.
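
Just to make the RedBook arithmetic explicit, here is a trivial shell sketch. The script name and the 64 KB example I/O size are my own illustrative assumptions:

#!/bin/sh
# stripe_calc.sh - make the segment size / stripe width arithmetic explicit.
# Usage: ./stripe_calc.sh <segment_size_KB> <number_of_data_disk_drives>
segment_kb=$1
data_drives=$2
stripe_kb=$((segment_kb * data_drives))
echo "Stripe width: ${stripe_kb} KB (${data_drives} data drives x ${segment_kb} KB segments)"
# The inverse view: the segment size that spreads one host I/O across all
# data drives is the I/O size divided by the number of data drives.
io_kb=64
echo "Optimal segment for a ${io_kb} KB I/O: $((io_kb / data_drives)) KB"

For a RAID5 4+1 group (4 data drives) and a 64 KB segment it reports a 256 KB stripe width, matching the example above.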

Note: If I am decoding IBM terminology correctly, what IBM calls "stripe width" here is actually the "data stripe size". We need to clarify the terminology, because the term "stripe width" is normally used for the number of disks in a RAID group. "Data stripe size" is the payload without the parity; the parity is stored in other segment(s) depending on the selected RAID level.

To make the terminology clear I've created a RAID 5 (4+1) segment/stripe visualization depicted below.

RAID 5 (4+1) striping example

Even though I found this IBM description very informative, I'm not sure why they recommend a 256KB segment size for VMware. It is true that the biggest IO size issued by ESX is 32MB by default, because ESX splits bigger IOs issued by the guest OS into multiple IOs (for more information about the big IO split see this blog post). However, what matters most is the IO size issued by the guest OSes. If you want to monitor the max/average/median IO size from ESX, you can use the vscsiStats tool already included in ESXi for exactly this purpose. It can show you a histogram, which is really cool (for more information about vscsiStats read this excellent blog post). Based on all these assumptions and also my own IO size monitoring in the field, it seems to me that the average IO size issued by ESX is usually somewhere between 32 and 64KB. So let's use 64KB as the average data stripe (IO size issued by the OS). Then for RAID 5 (4+1) the data stripe will be composed of 4 segments and the optimal segment size in this particular case should be 16KB (64/4).

Am I right or did I miss something? Any comments are welcome and highly appreciated.

Update 2014/01/31:
I discuss this topic very frequently with a colleague of mine who works as a DELL storage specialist. The theory is nice, but only a real test can prove any theory. Recently he performed a set of IOmeter tests against a DELL PV MD3600f, which is actually the same array as the IBM DS3500. He found that performance is optimal (# of IOPS versus response times) when the segment size is as close as possible to the IO size issued by the operating system. So the key takeaway from this exercise is that the optimal segment size for the example above is not 16KB but 64KB. Now I understand IBM's general recommendation (best practice) to use a 256KB segment size for VMware workloads, as this is the biggest segment size that can be chosen.

Update 2014/07/23:
After thinking more about this topic I've realized that the idea of using a segment size bigger than your biggest IO size can make sense for several reasons:

  • each IO gets a single spindle (disk) to handle it, which uses the queues down the route and is served within a single spindle's latency, which is the minimum for that single IO, right?
  • a typical virtual infrastructure environment runs several VMs generating many IOs, based on the queues available in the guest OS and the ESX layer disk scheduler settings (see more here on Duncan Epping's blog), so at the end of the day you are able to generate a lot of IOPS from different threads and the load is evenly distributed across the RAID group
However, please note that all of this discussion relates to legacy (traditional) storage architectures. Some modern (virtualized) storage arrays do some magic on their controllers, such as I/O coalescing. I/O coalescing is an IO optimization that reorders and merges smaller IO writes into a bigger IO in the controller cache and sends this bigger IO down to the disks. This can significantly change segment size recommendations, so please try to understand your particular storage architecture, or follow the storage vendor's best practices and try to understand the reasons behind those recommendations in your particular use case. I remember EMC Clariions used IO coalescing into 64KB IO blocks.


Wednesday, June 26, 2013

IOBlazer

IOBlazer is a multi-platform storage stack micro-benchmark. IOBlazer runs on Linux, Windows and OSX and it is capable of generating a highly customizable workload. Parameters like IO size and pattern, burstiness (number of outstanding IOs), burst interarrival time, read vs. write mix, buffered vs. direct IO, etc., can be configured independently. IOBlazer is also capable of playing back VSCSI traces captured using vscsiStats. The performance metrics reported are throughput (in terms of both IOPS and bytes/s) and IO latency.
IOBlazer evolved from a minimalist MS SQL Server emulator which focused solely on the IO component of said workload. The original tool had limited capabilities as it was able to generate a very specific workload based on the MS SQL Server IO model (Asynchronous, Un-buffered, Gather/Scatter). IOBlazer now has a far more generic IO model, but two limitations still remain:
  1. The alignment of memory accesses on 4 KB boundaries (i.e., a memory page)
  2. The alignment of disk accesses on 512 B boundaries (i.e., a disk sector).
Both limitations are required by the gather/scatter and un-buffered IO models.
A very useful new feature is the capability to playback VSCSI traces captured on VMware ESX through the vscsiStats utility. This allows IOBlazer to generate a synthetic workload absolutely identical to the disk activity of a Virtual Machine, ensuring 100% experiment repeatability.

TBD - TEST & WRITE REVIEW

PXE Manager for vCenter

PXE Manager for vCenter enables ESXi host state (firmware) management and provisioning. Specifically, it allows:
  • Automated provisioning of new ESXi hosts stateless and stateful (no ESX)
  • ESXi host state (firmware) backup, restore, and archiving with retention
  • ESXi builds repository management (stateless and stateful)
  • ESXi Patch management
  • Multi vCenter support
  • Multi network support with agents (Linux CentOS virtual appliance will be available later)
  • Wake on Lan
  • Hosts memtest
  • vCenter plugin
  • Deploy directly to VMware Cloud Director
  • Deploy to Cisco UCS blades
TBD - TEST & WRITE REVIEW

vBenchmark

vBenchmark provides a succinct set of metrics for your VMware virtualized private cloud. Additionally, if you choose to contribute your metrics to the community repository, vBenchmark also allows you to compare your metrics against those of comparable companies in your peer group. The data you submit is anonymized and encrypted for secure transmission.

Key Features:

  • Retrieves metrics across one or multiple vCenter servers
  • Allows inclusion or exclusion of hosts at the cluster level
  • Allows you to save queries and compare over time to measure changes as your environment evolves
  • Allows you to define your peer group by geographic region, industry and company size, to see how you stack up
TBD - TEST & WRITE REVIEW

Tuesday, June 25, 2013

How to create your own vSphere Performance Statistics Collector

StatsFeeder is a tool that enables performance metrics to be retrieved from vCenter and sent to multiple destinations, including 3rd party systems. The goal of StatsFeeder is to make it easier to collect statistics in a scalable manner. The user specifies the statistics to be collected in an XML file, and StatsFeeder will collect and persist these stats. The default persistence mechanism is comma-separated values, but the user can extend it to persist the data in a variety of formats, including a standard relational database or key-value store. StatsFeeder is written leveraging significant experience with the performance APIs, allowing the metrics to be retrieved in the most efficient manner possible.
The white paper StatsFeeder: An Extensible Statistics Collection Framework for Virtualized Environments can give you a better understanding of how it works and how to leverage it.




Monday, June 24, 2013

vCenter Single Sign-On Design Decision Point

When you design a vSphere 5.1 environment you have to implement vCenter SSO. Therefore you have to make a design decision about which SSO mode to choose.

There are actually three available options:

  1. Basic
  2. HA (not to be confused with vSphere HA)
  3. Multisite
Justin King wrote an excellent blog post about SSO here and it is a worthwhile source of information for making the right design decision. I fully agree with Justin and recommend Basic SSO to my customers whenever possible. SSO server protection can be achieved by standard backup/restore methods and SSO availability can be increased by vSphere HA. All these methods are well known and have been used for a long time.

You have to use Multisite SSO when vCenter Linked Mode is required, but think twice about whether you really need it and whether the benefits outweigh the drawbacks.

Thursday, June 20, 2013

Force10 Open Automation Guide - Configuration and Command Line Reference

This document describes the components and uses of the Open Automation Framework designed to run on the Force10 Operating System (FTOS), including:
• Smart Scripting
• Virtual Server Networking (VSN)
• Programmatic Management
• Web graphic user interface (GUI) and HTTP Server

http://www.force10networks.com/CSPortal20/KnowledgeBase/DOCUMENTATION/CLIConfig/FTOS/Automation_2.2.0_4-Mar-2013.pdf

Tuesday, June 18, 2013

How to – use vmkping to verify Jumbo Frames

Here is a nice blog post about Jumbo Frame configuration on vSphere and how to test that it works as expected. This is BTW an excellent test for Operational Verification (aka a Test Plan).
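
The essence of the test is a ping with the don't-fragment flag set and a payload sized just below the jumbo MTU; 8972 bytes equals a 9000 byte MTU minus 28 bytes of IP and ICMP headers. The target address below is just a placeholder for a host on the jumbo-enabled network:

vmkping -d -s 8972 10.10.4.72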

Architectural Decisions

Josh Odgers – VMware Certified Design Expert (VCDX) #90 – is continuously building a database of architectural decisions, available at http://www.joshodgers.com/architectural-decisions/

It is a very nice example of one approach to architecture.
 

Monday, June 17, 2013

PowerCLI One-Liners to make your VMware environment rock out!

Christopher Kusek wrote an excellent blog post about useful PowerCLI scripts that fit on a single line. He calls them one-liners. These one-liners can significantly help you with daily vSphere administration. On top of that, you can very easily learn PowerCLI constructs just by reading them.

http://www.pkguild.com/2013/06/powercli-one-liners-to-make-your-vmware-environment-rock-out/

Tuesday, June 04, 2013

Software Defined Networking - SDN

SDN is another big topic in the modern virtualized datacenter, so it is worth understanding what it is and how it can help us solve real datacenter challenges.

Brad Hedlund's explanation "What is Network Virtualization"
http://bradhedlund.com/2013/05/28/what-is-network-virtualization/
Brad Hedlund is a very well known networking expert. He now works for VMware | Nicira, participating in the VMware NSX product, which should be the next network virtualization platform (aka network hypervisor). He is ex-CISCO and ex-DELL | Force10, so there is a big probability he fully understands what is going on.

It is obvious that "dynamic service insertion" is the most important thing in SDN. OpenFlow and CISCO vPath both try to do it, but each in a different way - the same goal with a different approach. Which is better? Who knows? The future and real experience will show us. Jason Edelman's blog post very nicely and clearly compares both approaches.
http://www.jedelman.com/1/post/2013/04/openflow-vpath-and-sdn.html

CISCO, as a long-term networking leader and pioneer, has of course its own vision of SDN. The Nexus 1000V and virtual network overlays play a pivotal role for CISCO in Software Defined Networks. A very nice explanation of the CISCO approach is available at
http://blogs.cisco.com/datacenter/nexus-1000v-and-virtual-network-overlays-play-pivotal-role-in-software-defined-networks/


Saturday, May 25, 2013

PernixData: A new storage status quo is coming

Storage SMEs have known for ages that storage design begins with performance. Storage performance is usually much more important than capacity. One IOPS costs more money than one GB of storage. Flash disks, EFDs and SSDs have already changed the storage industry, but the magic and the future is in software. PernixData FVP (Flash Virtualization Platform) looks like a very intelligent, fully redundant and reliable cluster-aware software storage acceleration platform. It leverages any local flash devices to accelerate any back-end storage used for server virtualization. Right now only VMware vSphere is supported, but the solution is hypervisor agnostic and it is just a matter of time before it is ported to other server virtualization platforms like Hyper-V, Xen, or KVM.

PernixData sets an absolutely new storage quality bar in the virtualized datacenter. If you have an issue with storage response time (latency), then look at PernixData FVP. But what impressed me is the future, because I believe the platform can be improved significantly and new functionality will come soon. I can imagine data compression and deduplication, data encryption, vendor-independent replication, cloning, snapshotting, etc.

So software defined storage virtualization has just begun.

Happy journey PernixData.

For more information look at
http://www.pernixdata.com/
http://www.pernixdata.com/SFD3/