Note: The session title was “Tips for Troubleshooting VMware ESX Server Faults”
At first I thought this session was just going to be a bunch of fluff. The session started off by identifying 5 general areas of trouble: Hardware, Host, VMM, Guest O/S, and Application.
Hardware faults were defined as simply errors with the hardware. Host faults were defined as faults with either the Service Console or the VMkernel. VMM (Virtual Machine Monitor) faults were identified as virtual hardware errors. Guest O/S faults were defined as faults with the Guest Operating System. You can figure out what Application faults were defined as…
The presenter began with basic troubleshooting and how it relates to the 5 defined areas. However, there was some GREAT information that came from this session. Here’s a kind of brain dump of some of the things I took from it:
* If you are having errors in the VMM layer you should review the vmware.log file. This will have clues as to what caused the problem.
* If you are experiencing a repetitive problem with a Guest O/S, he recommended VMotioning the VM to another ESX server to see whether the problem is isolated to a single ESX server.
* After a PSOD (Purple Screen of Death) you should reboot the ESX server. A coredump file should be placed in the /root directory. Run “vmkdump -l <coredump>”. This will extract a vmware-log.1 file that you can use to try and identify the cause of the PSOD.
* Many times with a PSOD, either on the PSOD screen, or in the log file you will find a CPU Exception. Some of the “codes” are:
8 - Double Fault
10 - Invalid TSS (raised during a task switch)
12 - Stack Segment Fault
13 - General Protection Fault
14 - Page Fault
17 - Alignment Check
You should also look for Machine Check Exceptions (MCE). These can be caused by:
CPU Errors
Cache Errors
Bus Control Errors
RAM Errors
(On AMD systems) PCI NorthBridge Errors
I/O Access Errors
* If a PSOD was caused by an MCE, then most of the time it is unrecoverable and you should contact your hardware vendor.
* If you are dealing with a BSOD, there will be a Memory.DMP file in the Windows VM. This is a memory dump, and you can use WinDbg to debug it.
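
To make the CPU exception codes from the PSOD bullet easier to look up, here is a minimal Python sketch of my own (it was not shown in the session). It maps the exception numbers to their names and pulls them out of a chunk of log text. The names follow the x86 vector names, so the wording may differ slightly from my notes. The search pattern "Exception <number>" is an assumption on my part, not a documented ESX log format, so adjust it to match whatever your PSOD screen or extracted log actually shows.

```python
import re

# x86 exception vector numbers mentioned in the session, mapped to names.
CPU_EXCEPTIONS = {
    8: "Double Fault",
    10: "Invalid TSS",
    12: "Stack Segment Fault",
    13: "General Protection Fault",
    14: "Page Fault",
    17: "Alignment Check",
}

# ASSUMPTION: the text contains "Exception <number>". This pattern is
# hypothetical; change it to fit the real PSOD or log output you have.
EXCEPTION_PATTERN = re.compile(r"Exception\s+(\d+)", re.IGNORECASE)


def describe_exception(code):
    """Return the name for an exception number, or a fallback string."""
    return CPU_EXCEPTIONS.get(code, "Unknown or unlisted exception")


def find_exceptions(log_text):
    """Return (code, name) pairs for every pattern match in log_text."""
    codes = [int(match) for match in EXCEPTION_PATTERN.findall(log_text)]
    return [(code, describe_exception(code)) for code in codes]


if __name__ == "__main__":
    # Made-up sample line, only to show how the helper is called.
    sample = "made-up sample: Exception 14 in some world"
    for code, name in find_exceptions(sample):
        print(code, "-", name)
```

You could feed it the contents of the log file extracted by vmkdump, or just type in the number you see on the purple screen and call describe_exception directly.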