Thursday, May 28, 2015

How large is my ESXi core dump partition?

Today I was asked to check the core dump partition size on an ESXi 5.1 host, because that host had experienced a PSOD (Purple Screen of Death) with a message that the core dump was not saved completely because it ran out of space.

To be honest, it took me some time to work out how to find the core dump partition size, so I have documented it here.

All commands and outputs are from my home lab, where I have ESXi 6 booted from USB, but the principle should be the same.

To run these commands you have to log in to the ESXi shell, for example over SSH or from the ESXi troubleshooting console.

The first step is to find out which disk partition is used for the core dump.
 [root@esx01:~] esxcli system coredump partition get
   Active: mpx.vmhba32:C0:T0:L0:9
   Configured: mpx.vmhba32:C0:T0:L0:9
Now we know that the core dump is configured on disk mpx.vmhba32:C0:T0:L0, partition 9.
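
If you want to pick the partition out of that output in a script, a minimal sketch could look like the following. The function names are hypothetical helpers of mine, not part of ESXi, and the sketch assumes the output has the "Active:" line in the format shown above.

```shell
# Sketch only: the function names are hypothetical, not part of ESXi.

# Read `esxcli system coredump partition get` output on stdin and print
# the value of the "Active:" line with trailing whitespace removed.
active_coredump_partition() {
    awk -F': ' '/Active:/ { sub(/[ \t]+$/, "", $2); print $2 }'
}

# Split "disk:partition" into its two parts; assumes the partition number
# is whatever follows the last colon.
coredump_disk()   { printf '%s\n' "${1%:*}"; }
coredump_number() { printf '%s\n' "${1##*:}"; }

# On the host (not run here):
#   part=$(esxcli system coredump partition get | active_coredump_partition)
#   echo "disk=$(coredump_disk "$part") partition=$(coredump_number "$part")"
```

Splitting on the last colon works here because the disk name itself contains colons and only the final field is the partition number.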

The second step is to list the disks and their partitions together with their sizes.
 [root@esx01:~] ls -lh /dev/disks/
 total 241892188
 -rw-------  1 root   root    3.7G May 28 11:25 mpx.vmhba32:C0:T0:L0  
 -rw-------  1 root   root    4.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:1  
 -rw-------  1 root   root   250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:5  
 -rw-------  1 root   root   250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:6  
 -rw-------  1 root   root   110.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:7  
 -rw-------  1 root   root   286.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:8  
 -rw-------  1 root   root    2.5G May 28 11:25 mpx.vmhba32:C0:T0:L0:9  
You can get the same information with partedUtil.
 [root@esx01:~] partedUtil get /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:9
 326 255 63 5242880
Here you can see that the partition has 5,242,880 sectors, where each sector is 512 bytes. That means 5,242,880 * 512 / 1024 / 1024 / 1024 = 2.5 GB.
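
The same arithmetic can be sketched as a small shell helper. The function names are hypothetical, and the sketch assumes 512-byte sectors and that the sector count is the fourth field of the first output line, as in my lab output.

```shell
# Sketch only: hypothetical helper names; assumes 512-byte sectors.

# Convert a sector count given as the first argument to GiB (two decimals).
sectors_to_gib() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s * 512 / 1024 / 1024 / 1024 }'
}

# Read `partedUtil get <partition device>` output on stdin and convert the
# fourth field of the first line (the sector count) to GiB.
partedutil_gib() {
    awk 'NR == 1 { printf "%.2f\n", $4 * 512 / 1024 / 1024 / 1024 }'
}

# On the host (not run here):
#   partedUtil get /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:9 | partedutil_gib
```

For the 5,242,880 sectors reported in my lab, `sectors_to_gib 5242880` prints 2.50.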

Note: It is 2.5 GB because ESXi is installed on a 4 GB USB drive. If you have a regular hard drive, the core dump partition should be 4 GB.

BUT all the above information is not valid if you have changed your Scratch Location (here is the VMware KB on how to do it). If your Scratch Location has been changed, you can display the current scratch location, which is stored in /etc/vmware/locker.conf
 [root@esx01:~] cat /etc/vmware/locker.conf  
 /vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz 0  
and you can list the subdirectories in your custom scratch location
 [root@esx01:~] ls -la /vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz  
 total 28  
 d---------  7 root   root     4096 May 12 21:45 .  
 d---------  4 root   root     4096 May 3 20:47 ..  
 d---------  2 root   root     4096 May 3 21:17 core  
 d---------  2 root   root     4096 May 3 21:17 downloads  
 d---------  2 root   root     4096 May 28 09:30 log  
 d---------  3 root   root     4096 May 3 21:17 var  
 d---------  2 root   root     4096 May 12 21:45 vsantraces  

Please note that the new scratch location contains the custom core dump subdirectory (core) and also the log subdirectory (log).
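
To locate the core subdirectory from a script, a minimal sketch could read the path from locker.conf. The helper names are hypothetical, and the sketch assumes the path is the first whitespace-separated field of the first line (as in my lab output) and contains no spaces.

```shell
# Sketch only: hypothetical helpers. The optional first argument is the
# path to locker.conf and defaults to /etc/vmware/locker.conf.

# Print the scratch location: the first field of the first line.
scratch_location() {
    awk 'NR == 1 { print $1 }' "${1:-/etc/vmware/locker.conf}"
}

# Print the core dump directory under the scratch location.
scratch_core_dir() {
    printf '%s/core\n' "$(scratch_location "$1")"
}

# On the host (not run here):
#   ls -la "$(scratch_core_dir)"
```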

Other considerations
I usually change the ESXi core dump and log directory locations to a shared datastore. This is done with the following ESXi host advanced settings, fully described in this VMware KB:
  • CORE DUMP Location: ScratchConfig.ConfiguredScratchLocation
  • Log Location: Syslog.global.logDir and optionally Syslog.global.logDirUnique if you want to redirect all ESXi hosts to the same directory
I also recommend sending logs to a remote syslog server over the network, which is done with the advanced setting
  • Remote Syslog Server(s): Syslog.global.logHost
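
The syslog settings can also be changed from the ESXi shell. The sketch below only prints the commands so that you can review them before running anything. As far as I know, the --logdir, --logdir-unique and --loghost options of esxcli system syslog config set correspond to the three Syslog.global settings above, but please verify that on your ESXi version. The function name and the example values are hypothetical.

```shell
# Sketch only: prints the commands instead of running them.
# The function name and the example values are hypothetical.
syslog_commands() {
    logdir=$1    # e.g. a directory on a shared datastore
    loghost=$2   # e.g. udp://syslog.example.com:514
    printf 'esxcli system syslog config set --logdir=%s --logdir-unique=true\n' "$logdir"
    printf 'esxcli system syslog config set --loghost=%s\n' "$loghost"
    printf 'esxcli system syslog reload\n'
}

# Example (prints three lines, changes nothing):
#   syslog_commands /vmfs/volumes/shared01/logs udp://syslog.example.com:514
```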
ESXi core dumps can also be transferred over the network to a central Core Dump Server. This has to be configured with the following esxcli commands.
 esxcli system coredump network set --interface-name vmk0 --server-ipv4 [Core_Dump_Server_IP] --server-port 6500  
 esxcli system coredump network set --enable true  
 esxcli system coredump network check  

2 comments:

polishpaul said...

"BUT all above information is not valid if you have changed your Scratch Location" - why? Where is this explained?

David Pasek said...

I do not know why. I have just tested it in my home lab. So it works like that, or at least it worked at the time I tested it. Unfortunately, not everything is documented; that's why the lab is essential to really understand how things work.