I believe next-generation computing is Software Defined Infrastructure on top of a robust physical infrastructure. You can ask me anything about enterprise infrastructure (virtualization, compute, storage, network) and we can discuss it in depth on this blog. Don't hesitate to contact me.
Sunday, March 17, 2013
VMware PowerCLI - prepare environment after installation
After installing PowerCLI, run the following in a PowerShell session:
Set-ExecutionPolicy RemoteSigned   # allow locally created, unsigned scripts to run
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore   # do not fail on self-signed vCenter certificates
Tuesday, March 12, 2013
Finding slow draining devices
Very nice explanation
http://www.virtualinstruments.com/sanbestpractices/best-practices/finding-slow-draining-devices/
"Slow drain device" definition
https://www.ibm.com/developerworks/mydeveloperworks/blogs/sanblog/entry/defining_san_performance_related_terms9?lang=en_us
How to deal with slow drain devices?
https://www.ibm.com/developerworks/mydeveloperworks/blogs/sanblog/entry/how_to_deal_with_slow_drain_devices20?lang=en
Brocade tools for identification of "slow draining devices"
http://community.brocade.com/thread/4010?start=0&tstart=0
And here is Cisco's way to detect "slow draining devices"
http://www.cisco.com/en/US/docs/switches/datacenter/mds9000/sw/nx-os/configuration/guides/int/int_cli_4_2_published/intf.html#wp1703520
And here is an excellent whitepaper on how to proactively monitor SAN and storage-related issues
http://www.hds.com/assets/pdf/avoiding-san-performance-problems-whitepaper.pdf
Sunday, February 24, 2013
SG3_UTILS: How to send SCSI commands to devices
http://sg.danny.cz/sg/sg3_utils.html
http://linux.die.net/man/8/sg3_utils
The sg3_utils package contains utilities that send SCSI commands to devices. As well as devices on transports traditionally associated with SCSI (e.g. Fibre Channel (FCP), Serial Attached SCSI (SAS) and the SCSI Parallel Interface (SPI)), many other devices use SCSI command sets.
How the Cluster service reserves a disk and brings a disk online
http://support.microsoft.com/kb/309186
This article (link above) describes how the Microsoft Cluster service reserves disks managed by the cluster service and related drivers and brings them online.
Wednesday, February 20, 2013
PuppetLabs | Razor: Next-Generation Provisioning
System administrators require the same agility and productivity from their hardware infrastructure that they get from the cloud. In response, Puppet Labs and EMC collaboratively developed Razor, a next-generation physical and virtual hardware provisioning solution. Razor provides you with unique capabilities for managing your hardware infrastructure, including:
- Auto-Discovered Real-Time Inventory Data
- Dynamic Image Selection
- Model-Based Provisioning
- Open APIs and Plug-in Architecture
- Metal-to-Cloud Application Lifecycle Management
Together, Razor and Puppet enable system administrators to automate every phase of the IT infrastructure lifecycle, from bare metal to fully deployed cloud applications.
Monday, February 18, 2013
Automated Storage Tiering - Sub-LUN tiering
Excellent comparisons of Automated Storage Tiering technologies from different vendors:
- http://searchstorage.techtarget.com/feature/Sub-LUN-tiering-Five-key-questions-to-consider
- http://www.computerweekly.com/feature/Automated-storage-tiering-product-comparison
- http://searchsolidstatestorage.techtarget.com/news/1378753/Vendors-take-different-approaches-to-automated-tiered-storage-software-for-solid-state-drives
- http://www.emc.com/collateral/software/white-papers/h8058-fast-vp-unified-storage-wp.pdf
Good mid-range storage products on the market (my personal opinion):
- DELL Compellent
- Hitachi HUS
- EMC VNX
DELL Compellent
Tiers: SSD, SAS, NL-SAS (SATA)
AST Sub-LUN tiering block: 512 KB, 2 MB (default), 4 MB
Tiering optimisation analysis period: [TBD]
Tiering optimisation relocation period: [TBD]
Tiering algorithm: [TBD]
QoS per LUN: no
Hitachi HUS (HUS 110, HUS 130, HUS 150)
Tiers: SSD, SAS, NL-SAS (SATA)
AST Sub-LUN tiering block: 32MB
Tiering optimisation analysis period: 30 minutes
Tiering optimisation relocation period: [TBD]
Tiering algorithm: [TBD]
QoS per LUN: no
EMC VNX
Tiers: SSD, SAS, NL-SAS (SATA)
AST Sub-LUN tiering block: 1GB
Tiering optimisation analysis period: 60 minutes
Tiering optimisation relocation period: user defined
Tiering algorithm:
During the user-defined relocation window, 1 GB slices are promoted according to both the rank ordering performed in the analysis stage and a tiering policy set by the user. During relocation, FAST VP relocates higher-priority slices to higher tiers; slices are relocated to lower tiers only if the space they occupy is required for a higher-priority slice. This way, FAST VP fully utilizes the highest-performing spindles first. Lower-tier spindles are utilized as capacity demand grows. Relocation can be initiated manually or by a user-configurable, automated scheduler. The relocation process aims to keep 10% free capacity in the highest tiers in the pool. Free capacity in these tiers is used for new slice allocations of high-priority LUNs between relocations.
QoS per LUN: yes
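The relocation logic described above can be illustrated with a toy simulation (my own sketch, not EMC's actual implementation): slices are ranked by "temperature" and placed into the highest tier that still has free capacity, so the fastest spindles fill with the hottest data first.

```python
# Toy illustration of sub-LUN tiering relocation (not EMC's actual FAST VP code).
# Slices are ranked by activity ("temperature") and promoted to the highest
# tier with free capacity, so the fastest tier fills with the hottest data first.

def relocate(slices, tier_capacities):
    """slices: dict of slice name -> temperature (e.g. IOPS on that 1 GB slice).
    tier_capacities: list of slice counts per tier, fastest tier first.
    Returns a list of lists: slice names placed in each tier."""
    ranked = sorted(slices, key=slices.get, reverse=True)  # hottest first
    tiers = [[] for _ in tier_capacities]
    for name in ranked:
        for tier, capacity in enumerate(tier_capacities):
            if len(tiers[tier]) < capacity:
                tiers[tier].append(name)
                break
    return tiers

# Example: room for 2 slices on SSD, 3 on SAS, the rest lands on NL-SAS
demand = {"s1": 900, "s2": 20, "s3": 450, "s4": 5, "s5": 700, "s6": 300}
print(relocate(demand, [2, 3, 10]))  # [['s1', 's5'], ['s3', 's6', 's2'], ['s4']]
```

A real array also has to weigh relocation cost and only demote slices when a hotter slice needs the space, but the greedy placement above captures the basic idea.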
I've collected this information from several public resources, so if something is wrong please let me know directly or via the comments.
Wednesday, February 13, 2013
Understand SCSI, SCSI command responses and sense codes
When troubleshooting VMware vSphere and storage-related issues, it is quite useful to understand SCSI command responses and sense codes.
Usually you can see in log something like "failed H:0x8 D:0x0 P:0x0 Possible sense data: 0xA 0xB 0xC"
H: means host codes
D: means device codes
P: means plugin codes
A: is Sense Key
B: is Additional Sense Code
C: is Additional Sense Code Qualifier
Some host codes:
0x2 Bus state busy
0x3 Timeout for other reason
0x5 Told to abort for some other reason
0x8 Bus reset
Some device codes:
00h GOOD
02h CHECK CONDITION
04h CONDITION MET
08h BUSY
18h RESERVATION CONFLICT
28h TASK SET FULL
30h ACA ACTIVE
40h TASK ABORTED
Some plugin codes:
00h No error.
01h An unspecified error occurred. Note: The I/O command should be retried.
02h The device is a deactivated snapshot. Note: The I/O cmd failed because the device is a deactivated snapshot and so the LUN is read-only.
03h SCSI-2 reservation was lost.
04h The plug-in wants to requeue the I/O back. Note: The I/O will be retried.
05h The test and set data in the ATS request returned false for equality.
06h Allocating more thin provision space. Device server is in the process of allocating more space in the backing pool for a thin provisioned LUN.
07h Thin provisioning soft-limit exceeded.
08h Backing pool for thin provisioned LUN is out of space.
Some SCSI Sense Keys:
SCSI Sense Keys appear in the Sense Data available when a command returns with a CHECK CONDITION status. The sense key contains all the information necessary to understand why the command has failed.
Code Name
0h NO SENSE
1h RECOVERED ERROR
2h NOT READY
3h MEDIUM ERROR
4h HARDWARE ERROR
5h ILLEGAL REQUEST
6h UNIT ATTENTION
7h DATA PROTECT
8h BLANK CHECK
9h VENDOR SPECIFIC
Ah COPY ABORTED
Bh ABORTED COMMAND
Dh VOLUME OVERFLOW
Eh MISCOMPARE
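The decoding described above can be automated. Here is a minimal sketch (my own helper; the function and field names are my choice, and the lookup tables are abbreviated) that parses the "failed H:... D:... P:..." fragment from a vmkernel log line and translates the device status and sense key:

```python
import re

# Minimal decoder for the "failed H:0x.. D:0x.. P:0x.." fragment seen in
# /var/log/vmkernel.log. The tables below are abbreviated; see the SCSI
# spec and the VMware KB for the complete lists.

DEVICE_CODES = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x04: "CONDITION MET",
                0x08: "BUSY", 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL",
                0x30: "ACA ACTIVE", 0x40: "TASK ABORTED"}
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}

def decode(line):
    m = re.search(r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
                  r"(?:.*?sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
                  r" (0x[0-9a-fA-F]+))?", line)
    if not m:
        return None
    host, device, plugin = (int(m.group(i), 16) for i in (1, 2, 3))
    result = {"host": host, "device": DEVICE_CODES.get(device, hex(device)),
              "plugin": plugin}
    if m.group(4):  # sense data is meaningful only on CHECK CONDITION
        result["sense_key"] = SENSE_KEYS.get(int(m.group(4), 16), m.group(4))
        result["asc"], result["ascq"] = m.group(5), m.group(6)
    return result

log = 'failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x96 0x32'
print(decode(log))
# {'host': 0, 'device': 'CHECK CONDITION', 'plugin': 0,
#  'sense_key': 'ILLEGAL REQUEST', 'asc': '0x96', 'ascq': '0x32'}
```

The example line decodes to a CHECK CONDITION with sense key ILLEGAL REQUEST; the ASC/ASCQ pair (0x96/0x32 here) is vendor specific and has to be looked up in the array vendor's documentation.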
There is a VMware KB article with further details.
It is worth reading the following documents:
http://www.tldp.org/LDP/khg/HyperNews/get/devices/scsi.html (a fairly old document for programmers who want to write a SCSI driver)
http://en.wikipedia.org/wiki/SCSI
http://en.wikipedia.org/wiki/SCSI_contingent_allegiance_condition
http://en.wikipedia.org/wiki/SCSI_Request_Sense_Command
What is SCSI reservation
http://mrwhatis.com/scsi-reservation.html
SCSI-3 Persistent Group Reservation
http://scsi3pr.blogspot.cz/
Tuesday, February 12, 2013
Using the VMware I/O Analyzer v1.5: A Guide to Testing Multiple Workloads
I encourage you to watch a great video on good practices for using VMware I/O Analyzer (VMware's bundle of Iometer).
It mentions a very important step for getting relevant results: increase the size of the second disk in the virtual machine (OVF appliance). The default size is 4 GB, which is not enough because it fits in the cache of almost any storage array, so the results are unrealistic and misleading.
Video is here
bit.ly/118kWs1
or here
http://www.youtube.com/watch?v=zHJr957kN1s&feature=youtu.be
Enjoy.
Tuesday, January 22, 2013
HP Flex-10 Design, Plan, Implement, Test
Before the design phase of a VMware vSphere infrastructure, I recommend reading the blog post "Understanding HP Flex-10 Mappings with VMware ESX/vSphere" to get a general overview of the server infrastructure and advanced network interconnects. During the design phase, prepare a detailed test plan (aka operational verification) and run it during the implementation phase. You can use the blog post "Testing Scenario's VMware / HP c-Class Infrastructure" as a template for your test plan. I don't doubt that you normally test infrastructure before putting it into production :-)
Saturday, January 19, 2013
MSCS RDMs causing long boot of ESX
That's because an RDM LUN attached to an MSCS cluster has a permanent SCSI reservation held by the active node of the cluster.
In ESXi 5 you have to mark all such LUNs as perennially reserved, and your ESXi boot will be as fast as usual.
Here is the CLI command to mark a LUN:
esxcli storage core device setconfig -d naa.id --perennially-reserved=true
This has to be changed on all ESXi hosts that have visibility to the LUN.
More info at http://kb.vmware.com/kb/1016106
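Because the flag has to be set per device and per host, it can help to generate the commands from a list of NAA IDs and run the output on every host. A minimal sketch (the helper name and the device IDs below are my own placeholders):

```python
# Generate "perennially reserved" esxcli commands for a list of RDM LUNs.
# Run the generated commands on every ESXi host that can see the LUNs
# (e.g. over SSH). The NAA IDs below are placeholders.

def perennial_commands(naa_ids):
    template = ("esxcli storage core device setconfig "
                "-d {dev} --perennially-reserved=true")
    return [template.format(dev=dev) for dev in naa_ids]

rdm_luns = ["naa.60060e80102d5f500511c97d000000d4",
            "naa.60060e80102d5f500511c97d000000d5"]
for cmd in perennial_commands(rdm_luns):
    print(cmd)
```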
Wednesday, January 09, 2013
How to calculate storage performance from host perspective
Storage performance is usually quantified in IOPS (I/O operations per second). Performance from the storage perspective is quite easy: it really depends on the speed of each particular disk, also known as a spindle. Each disk type has a typical speed; below are the average values usually used for storage performance calculations:
- SATA disk = 80 IOPS
- SCSI DISK(SAS or FC) 10k RPM = 150 IOPS
- SCSI DISK(SAS or FC) 15k RPM = 180 IOPS
- SSD disk (SLC aka EFD) = 6000 IOPS
Here are most common RAID types used on standard disk arrays:
- RAID 0 - no redundancy, disk striping, highest performance => WRITE PENALTY = 1 (each host write is a single back-end write)
- RAID 1 - disk mirror, max bundle of 2 disks, high performance => WRITE PENALTY = 2
- RAID 10 - RAID 1 + RAID 0 for bundling disk pairs, max disk bundle depends on disk array limits, high performance => WRITE PENALTY = 2
- RAID 5 - block level striping with rotated parity, max disk bundle depends on disk array limits, moderate performance => WRITE PENALTY = 4
- RAID 6 - block level striping with double parity, max disk bundle depends on disk array limits, lower performance => WRITE PENALTY = 6
So performance from storage perspective and from host perspective are different. Performance from storage perspective is simply summation of speed of all disks in RAID group. Performance from host perspective depends on selected RAID type.
To calculate estimated storage performance from the host perspective, we need a formula with several variables.
First of all, let's define the variables:
P=write penalty of selected RAID type
R=Read % of disk workload
W=Write % of disk workload
H=IOPS from host perspective
S=IOPS from storage perspective
and now we can write formula to calculate storage performance from host perspective
H = S / (R+W*P)
Do you want to know all the steps to derive this formula? It is simple. Start from another formula which describes the storage behavior:
R*(1*H) + W*(P*H) = S
The formula above says that each host read IOPS generates a single storage IOPS, but each host write IOPS generates multiple storage IOPS based on the RAID type's write penalty (P). Solving for H yields H = S / (R + W*P).
Does it make sense? If not, an example may help you understand.
My RAID group has 9 SAS disks (600 GB / 15k RPM) and I use RAID 5 (8+1).
So from the storage perspective I have 9 disks, each of which can perform 180 IOPS, which means I have 1620 IOPS from the storage perspective. Let's assume a skewed read/write ratio of 20:80.
S = 1620
P = 4 (because of RAID 5)
R = 20% = 0.2
W= 80% = 0.8
I need to know H ... storage performance from host perspective.
H = 1620 / (0.2 + 0.8 * 4) = 1620 / 3.4 = 476.47 IOPS from host perspective.
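The whole calculation can be wrapped in a small helper (my own sketch; the function name and penalty table layout are my choice). The example reproduces the numbers above:

```python
# Estimate host-visible IOPS: H = S / (R + W * P), where S is raw back-end
# IOPS (disks * per-disk IOPS), P is the RAID write penalty, and R/W is the
# read/write split of the workload.

WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}

def host_iops(disks, iops_per_disk, raid, read_ratio):
    s = disks * iops_per_disk            # IOPS from the storage perspective
    p = WRITE_PENALTY[raid]
    w = 1.0 - read_ratio                 # write fraction of the workload
    return s / (read_ratio + w * p)

# 9 x 15k RPM SAS disks (180 IOPS each) in RAID 5, 20% reads / 80% writes
h = host_iops(9, 180, "RAID5", 0.2)
print(round(h, 2))  # 476.47
```

Note that RAID 0 gets a penalty of 1 in the table, not 0: every host write still costs one back-end write, and the formula would break down otherwise.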
Note: Modern disk arrays often offer AST (Automated Storage Tiering). The calculation described in this blog post is valid even for those arrays. You have to fully understand the internal architecture and design of the particular storage, but generally all storage pools are built from sub-groups of disks bundled and protected by some RAID type. So if you have 125 disks grouped five at a time in RAID 5 (4+1), the principle is the same: we have 125 spindles and the write penalty is 4 because of RAID 5.
Saturday, January 05, 2013
Cisco Custom Image for ESXi 5
Cisco Custom Image for ESXi 5.1.0 GA Install CD
https://t.co/EGNxWJ5p
https://my.vmware.com/web/vmware/details?downloadGroup=CISCO-ESXI-5.1.0-GA-25SEP2012&productId=285#product_downloads
Thursday, December 20, 2012
Set the Scratch Partition from the vSphere Client
If a scratch partition is not set up, you might want to configure one, especially if low memory is a concern. When a scratch partition is not present, vm-support output is stored in a ramdisk.
For automated scratch partition configuration you can use vCLI or PowerCLI. For details see VMware KB 1033696.
And here is my PowerCLI script inspired by KB above to set scratch location on all ESXi hosts in particular vSphere clusters.
Wednesday, December 19, 2012
Strange ESXi log entries in /var/log/vmkernel.log
I've just found a lot of the following storage errors in /var/log/vmkernel.log:
2012-12-19T01:34:02.010Z cpu2:4098)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x93 (0x412401965f00, 5586) to dev "naa.60060e80102d5f500511c97d000000d4" on path "vmhba2:C0:T0:L2" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x96 0x32. Act:NONE
2012-12-19T01:34:02.010Z cpu2:4098)ScsiDeviceIO: 2322: Cmd(0x412401965f00) 0x93, CmdSN 0xc6fd5 from world 5586 to dev "naa.60060e80102d5f500511c97d000000d4" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x96 0x32.
The main part of log entry is "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x96 0x32"
If I understand correctly:
D: 0x2 = DEVICE CHECK CONDITION
Sense code 0x5 = ILLEGAL REQUEST
What is it? What does it mean?
I have ESXi 5.0 build 768111, HDS AMS 2300 storage, a Cisco UCS blade system, and Cisco FC switches.
Update 1:
I've thought more about the root cause ... an important detail is that it happens when Storage vMotion or another data migration is running. So I have a hypothesis that it is related to VAAI. The storage is VAAI enabled and VAAI is supported. However, the disk block size differs between datastores (we are just in the middle of a migration from VMFS-3 to VMFS-5).
So I have to do deeper diagnostics and root cause troubleshooting.
Stay tuned.
Update 2:
Solved: VAAI primitives must also be enabled on HDS Host Masking. For more information check
http://www.hds.com/assets/pdf/
Friday, December 07, 2012
Storage Queues and Performance
VMware recently published a paper titled Scalable Storage Performance that delivered a wealth of information on storage with respect to the ESX Server architecture. This paper contains details about the storage queues that are a mystery to many of VMware's customers and partners. I wanted to start a wiki article on some aspects of this paper that may be interesting to storage enthusiasts and performance freaks.
Blog post for more information is at http://communities.vmware.com/docs/DOC-6490
This information is very useful for a deep understanding of the full storage stack.
Wednesday, December 05, 2012
Best Practices for Faster vSphere SDK Scripts
Source at http://www.virtuin.com/2012/11/best-practices-for-faster-vsphere-sdk.html
The VMware vSphere API is one of the more powerful vendor SDKs available in the Virtualization Ecosystem. As adoption of VMware vSphere has grown over the years, so has the size of Virtual Infrastructure environments. In many larger enterprises, the increasing number of VirtualMachines and HostSystems is driving the architectural requirement to deploy multiple vCenter Servers.
In response, the necessity for automation tooling has grown just as quickly. Automation to create daily reports, perform bulk operations, and aggregate data from large, distributed Virtual Infrastructure environments is a common requirement for managing the increasing virtual sprawl.
In a Virtual Infrastructure comprised of thousands of objects, even a simple script to list all VirtualMachines and their associated HostSystem and Datastores can result in very slow runtime execution. Developing automation with the following, simple best practices can take orders of magnitude off your vSphere API tool's runtime.
Monday, December 03, 2012
DELL Active System Manager
DELL Active System is managed by DELL Active System Manager. This is DELL's converged infrastructure solution (blade servers, networking, storage), aiming at the "mainframe of the 21st century" by leveraging server virtualization (hypervisors) to gain enough flexibility to achieve the required infrastructure SLAs.
http://www.youtube.com/watch?v=xU1I93wEHuU
Configuring a Chassis in Dell Active System Manager
http://www.youtube.com/watch?v=cRO0546yJ8U