I have heard about an issue with ESXi 6 Update 2 and HP 3PAR storage where VVOLs are enabled. I have been told that the issue is caused by an unsupported SCSI command being issued to the PE LUN (256). PE stands for Protocol Endpoint; it is the technical VVOL LUN that provides the data path between the ESXi host and the remote storage system.
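By the way, whether a particular device is a Protocol Endpoint can be checked from the ESXi shell; the device list output contains an "Is VVOL PE" field (you can see it in my home lab output later in this post). A quick sketch, where the NAA ID is just a placeholder:

# the NAA ID below is a placeholder, use a real device ID
esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx | grep "Is VVOL PE"
# on ESXi 6.x with VVOLs configured this should also list the PEs the host sees
esxcli storage vvol protocolendpoint list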
Observed symptoms:
- ESXi 6 Update 2 exhibits issues (ESXi hosts disconnect from vCenter, the console is very slow)
- Hosts may take a long time to reconnect to vCenter after a reboot, or hosts may enter a "Not Responding" state in vCenter Server
- Storage-related tasks such as HBA rescans may take a very long time to complete
- I have been told that ESXi 6 Update 1 doesn't experience such issues (the same entries appear in the log file, but no other symptoms occur)
Entries like the following appear in the vmkernel log:
2016-05-18T11:31:27.319Z cpu1:242967)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac0150c3" - issuing command 0x43a657470fc0
2016-05-18T11:31:27.320Z cpu31:33602)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a657470fc0) to dev "naa.2ff70002ac0150c3" failed on path "vmhba0:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
2016-05-18T11:31:27.320Z cpu31:33602)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba0:C0:T2:L256 device naa.2ff70002ac0150c3 - triggering path failover
2016-05-18T11:31:27.320Z cpu31:33602)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac0150c3": awaiting fast path state update before retrying failed command again.
Note the path name vmhba0:C0:T2:L256 (the PE LUN 256) and the sense data 0x5 0x25 0x0, which decodes to ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED. This matches the explanation above that an unsupported SCSI command is being issued to the PE LUN.
Possible workarounds:
- Restarting hostd on the ESXi host helps, therefore SSH to the ESXi hosts was enabled for quick resolution in case of problems
- LUN masking of LUN 256
UPDATE 2016-09-30: There is most probably another workaround: changing the Disk.MaxLUN parameter on ESXi hosts as described in VMware KB 1998 (see the sketch below).
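A minimal sketch of that workaround, assuming ESXi does not scan LUN IDs greater than or equal to the Disk.MaxLUN value, so setting it to 256 should stop the host from ever touching LUN 256. Verify the exact semantics and value against KB 1998 before using this, and note it also hides any real LUNs with an ID of 256 or higher:

# assumption: LUN IDs >= Disk.MaxLUN are not scanned, so 256 hides the PE LUN 256
esxcli system settings advanced set -o /Disk/MaxLUN -i 256
esxcli system settings advanced list -o /Disk/MaxLUN
esxcli storage core adapter rescan --all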
Final solution:
- Application of an HP 3PAR firmware patch (unfortunately, the patch is not available for the current firmware, thus a firmware upgrade has to be planned and executed)
- Investigation of the root cause (why ESXi 6 Update 2 is more sensitive than ESXi 6 Update 1)
- Application of the workarounds mentioned above
HOME LAB EXERCISE
I have tested in my home lab how to mask a particular LUN on an ESXi host, just to be sure I know how to do it.
Below is a quick solution for impatient readers.
Let's say we have the following device with the following path:
- Device: naa.6589cfc000000bf5e731ffc99ec35186
- Path: vmhba36:C0:T0:L1
LUN Masking
esxcli storage core claimrule add -P MASK_PATH -r 500 -t location -A vmhba36 -C 0 -T 0 -L 1
esxcli storage core claimrule load
esxcli storage core claiming reclaim -d naa.6589cfc000000bf5e731ffc99ec35186
LUN Unmasking
esxcli storage core claimrule remove --rule 500
esxcli storage core claimrule load
esxcli storage core claiming unclaim --type=path --path=vmhba36:C0:T0:L1
esxcli storage core claimrule run
... continue reading for details.
LUN MASKING DETAILS
The exact LUN masking procedure is documented in the vSphere 6 documentation here. It is also documented in VMware KB articles 1009449 and 1014953.
List storage devices
[root@esx02:~] esxcli storage core device list
naa.6589cfc000000bf5e731ffc99ec35186
   Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000bf5e731ffc99ec35186)
   Has Settable Display Name: true
   Size: 10240
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.6589cfc000000bf5e731ffc99ec35186
   Vendor: FreeNAS
   Model: iSCSI Disk
   Revision: 0123
   SCSI Level: 6
   Is Pseudo: false
   Status: degraded
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: supported
   Other UIDs: vml.010001000030303530353661386131633830300000695343534920
   Is Shared Clusterwide: true
   Is Local SAS Device: false
   Is SAS: false
   Is USB: false
   Is Boot USB Device: false
   Is Boot Device: false
   Device Max Queue Depth: 128
   No of outstanding IOs with competing worlds: 32
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false

naa.6589cfc000000ac12355fe604028bf21
   Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000ac12355fe604028bf21)
   Has Settable Display Name: true
   Size: 10240
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.6589cfc000000ac12355fe604028bf21
   Vendor: FreeNAS
   Model: iSCSI Disk
   Revision: 0123
   SCSI Level: 6
   Is Pseudo: false
   Status: degraded
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: supported
   Other UIDs: vml.010002000030303530353661386131633830310000695343534920
   Is Shared Clusterwide: true
   Is Local SAS Device: false
   Is SAS: false
   Is USB: false
   Is Boot USB Device: false
   Is Boot Device: false
   Device Max Queue Depth: 128
   No of outstanding IOs with competing worlds: 32
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
So we have two devices with the following NAA IDs:
- naa.6589cfc000000bf5e731ffc99ec35186
- naa.6589cfc000000ac12355fe604028bf21
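If you want to inspect just one of them, the device list can be filtered by device ID (the same -d option we will use later for reclaiming):

esxcli storage core device list -d naa.6589cfc000000bf5e731ffc99ec35186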
[root@esx02:~] esxcli storage nmp path list
iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000bf5e731ffc99ec35186
   Runtime Name: vmhba36:C0:T0:L1
   Device: naa.6589cfc000000bf5e731ffc99ec35186
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000bf5e731ffc99ec35186)
   Group State: active
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}

iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000ac12355fe604028bf21
   Runtime Name: vmhba36:C0:T0:L2
   Device: naa.6589cfc000000ac12355fe604028bf21
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000ac12355fe604028bf21)
   Group State: active
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}
Let's mask the iSCSI device exposed as LUN 1.
So the path we want to mask is vmhba36:C0:T0:L1 and the device UID is naa.6589cfc000000bf5e731ffc99ec35186.
So let's create a masking rule for the path above. In this particular case we have just a single path, because my home lab iSCSI target is reachable over a single path only. In a real environment there are usually multiple paths, and all paths to the device should be masked (see the sketch after the commands below).
esxcli storage core claimrule add -P MASK_PATH -r 500 -t location -A vmhba36 -C 0 -T 0 -L 1
esxcli storage core claimrule load
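In a real multipath environment you would first list all paths of the device and then add one MASK_PATH rule per path. A sketch, assuming a hypothetical device with two paths over two adapters (the NAA ID, adapter names and target numbers below are made up for illustration):

# list all paths of the device to be masked (placeholder NAA ID)
esxcli storage core path list -d naa.xxxxxxxxxxxxxxxx
# one MASK_PATH rule per path, each with a unique rule ID
esxcli storage core claimrule add -P MASK_PATH -r 500 -t location -A vmhba1 -C 0 -T 0 -L 256
esxcli storage core claimrule add -P MASK_PATH -r 501 -t location -A vmhba2 -C 0 -T 0 -L 256
esxcli storage core claimrule load

The rule IDs just have to be unique and must not collide with existing rules (500 and 501 here are arbitrary).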
We can list our claim rules to see the result:
[root@esx02:~] esxcli storage core claimrule list
Rule Class Rule Class Type Plugin Matches XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size
---------- ----- ------- --------- --------- ---------------------------------------- ------------------------------- --------------------------- -----------------------
MP 0 runtime transport NMP transport=usb false false 0
MP 1 runtime transport NMP transport=sata false false 0
MP 2 runtime transport NMP transport=ide false false 0
MP 3 runtime transport NMP transport=block false false 0
MP 4 runtime transport NMP transport=unknown false false 0
MP 101 runtime vendor MASK_PATH vendor=DELL model=Universal Xport false false 0
MP 101 file vendor MASK_PATH vendor=DELL model=Universal Xport false false 0
MP 500 runtime location MASK_PATH adapter=vmhba36 channel=0 target=0 lun=1 false false 0
MP 500 file location MASK_PATH adapter=vmhba36 channel=0 target=0 lun=1 false false 0
MP 65535 runtime vendor NMP vendor=* model=* false false 0
We can see that the new claim rule (500) is in the configuration file (/etc/vmware/esx.conf) and also loaded into the runtime.
However, to really mask the device without an ESXi host reboot, we have to reclaim the device:
[root@esx02:~] esxcli storage core claiming reclaim -d naa.6589cfc000000bf5e731ffc99ec35186
The device disappears from the ESXi host immediately. An ESXi host reboot is not needed.
So we are done. The device is not visible to the ESXi host anymore.
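A quick way to double-check, assuming the standard busybox grep available in the ESXi shell (no output is expected once the device is masked):

esxcli storage core device list | grep 6589cfc000000bf5e731ffc99ec35186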
Note: I was unsuccessful when testing LUN masking with a local device. Therefore, I assume that LUN masking works only with remote disks (iSCSI, Fibre Channel).
LUN UNMASKING
Just in case you would like to unmask the device and use it again, here is the procedure.
Let's start by removing the claim rule for our previously masked path.
[root@esx02:~] esxcli storage core claimrule remove --rule 500
[root@esx02:~] esxcli storage core claimrule list
Rule Class Rule Class Type Plugin Matches XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size
---------- ----- ------- --------- --------- ---------------------------------------- ------------------------------- --------------------------- -----------------------
MP 0 runtime transport NMP transport=usb false false 0
MP 1 runtime transport NMP transport=sata false false 0
MP 2 runtime transport NMP transport=ide false false 0
MP 3 runtime transport NMP transport=block false false 0
MP 4 runtime transport NMP transport=unknown false false 0
MP 101 runtime vendor MASK_PATH vendor=DELL model=Universal Xport false false 0
MP 101 file vendor MASK_PATH vendor=DELL model=Universal Xport false false 0
MP 500 runtime location MASK_PATH adapter=vmhba36 channel=0 target=0 lun=1 false false 0
MP 65535 runtime vendor NMP vendor=* model=* false false 0
[root@esx02:~]
You can see that the rule has been removed from the file configuration, but it is still loaded in the runtime. We have to reload the claim rules from the file into the runtime.
[root@esx02:~] esxcli storage core claimrule load
[root@esx02:~] esxcli storage core claimrule list
Rule Class Rule Class Type Plugin Matches XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size
---------- ----- ------- --------- --------- --------------------------------- ------------------------------- --------------------------- -----------------------
MP 0 runtime transport NMP transport=usb false false 0
MP 1 runtime transport NMP transport=sata false false 0
MP 2 runtime transport NMP transport=ide false false 0
MP 3 runtime transport NMP transport=block false false 0
MP 4 runtime transport NMP transport=unknown false false 0
MP 101 runtime vendor MASK_PATH vendor=DELL model=Universal Xport false false 0
MP 101 file vendor MASK_PATH vendor=DELL model=Universal Xport false false 0
MP 65535 runtime vendor NMP vendor=* model=* false false 0
[root@esx02:~]
Here we go. Now there is no rule with ID 500.
But the device is still not visible, and we cannot execute the command
esxcli storage core claiming reclaim -d naa.6589cfc000000bf5e731ffc99ec35186
because the device is not visible to the ESXi host. We masked it, right? So this is exactly how it should behave. An ESXi host reboot would probably help, but can we do it without rebooting the host?
The answer is yes, we can.
We have to unclaim the path to our device and re-run the claim rules:
esxcli storage core claiming unclaim --type=path --path=vmhba36:C0:T0:L1
esxcli storage core claimrule run
and now we can see both paths to the iSCSI LUNs again:
[root@esx02:~] esxcli storage nmp path list
iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000bf5e731ffc99ec35186
   Runtime Name: vmhba36:C0:T0:L1
   Device: naa.6589cfc000000bf5e731ffc99ec35186
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000bf5e731ffc99ec35186)
   Group State: active
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}

iqn.1998-01.com.vmware:esx02-096fde38-00023d000001,iqn.2005-10.org.freenas.ctl:test,t,257-naa.6589cfc000000ac12355fe604028bf21
   Runtime Name: vmhba36:C0:T0:L2
   Device: naa.6589cfc000000ac12355fe604028bf21
   Device Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000ac12355fe604028bf21)
   Group State: active
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=AO,RTP_id=3,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}
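If the paths did not reappear after the claimrule run in your environment, a storage rescan would be the next thing I would try (a hypothetical recovery step; I did not need it in this lab):

esxcli storage core adapter rescan --all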
I hope this helps other VMware users who have a need for LUN masking / unmasking.