Root cause analysis is one of the toughest jobs. When it comes to operating systems, we mostly engage the operating system vendors to find the root cause by analyzing the system logs and the crash dump. System logs alone will not always identify the root cause, and you often badly need a crash dump. But how can a crash dump be generated if the operating system is hung? Most Unix hardware has an option to trigger an operating system panic from the hardware console, so that a crash dump is generated. Nowadays, however, most Linux, Windows and Solaris x86 deployments run in VMware vSphere environments, where there is no direct option to initiate a panic from outside the operating system. In my current environment, many of the Solaris guests hang frequently on vSphere, and we are not able to produce a crash dump. As a last resort, we just reset the guest OS from the VMware console.
Recently I found a utility called “vmss2core” from VMware Labs. This tool converts a guest OS snapshot into a core file: we just need to take a snapshot of the guest OS while it is hung and unable to produce a crash dump by itself. The vmss2core tool can produce core dump files for the Windows debugger (WinDbg), Red Hat crash compatible core files, a physical memory view suitable for the GNU debugger gdb, Solaris MDB (XXX), and Mac OS X formats.
If you have provisioned a lot of storage to the VM, a snapshot might consume a lot of datastore space. In that case, just suspend the virtual machine and use the suspended state file (*.vmss) instead of the snapshot file (*.vmsn).
Here we will see how to generate a crash dump/core dump from a VMware snapshot.
1. Log in to the vSphere Client and navigate to the hung guest OS. Take a snapshot of that guest OS, including the virtual machine's memory, before resetting it.
2. Open the Summary tab of the guest operating system and note which datastore it uses.
3. Right-click that datastore and select Browse. The datastore browser window opens.
4. Select the most recent *.vmsn file (VMware snapshot file) and download it to your laptop (e.g. D:\).
5. Download the “vmss2core” tool for Windows from VMware Labs (e.g. to D:\). A Linux version is also available.
[highlight]Make sure your Windows laptop has the Visual Studio 2008 Service Pack 1 runtime installed.[/highlight]
6. Once you have “vmss2core_win.exe” and the guest OS snapshot (*.vmsn) or suspended state (*.vmss) file, convert the file as follows.
Open a command prompt on the Windows machine: Start -> Run -> cmd
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Test1>d:

D:\>
D:\>vmss2core_win.exe -M SOL10-Snapshot2.vmsn
vmss2core version 3156346 Copyright (C) 1998-2013 VMware, Inc. All rights reserved.
Started core writing.
Writing note section header.
Writing 2 memory section headers.
Writing notes.
... 10 MBs written.
... 20 MBs written.
... 30 MBs written.
... 40 MBs written.
... 50 MBs written.
... 60 MBs written.
... 70 MBs written.
... 80 MBs written.
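If you convert checkpoints regularly, steps 4 to 6 can be scripted. The sketch below is only an illustration, not part of vmss2core: the helper names `newest_checkpoint`, `vmss2core_command` and `convert` are hypothetical. It assumes the downloaded files sit in one local folder and that vmss2core writes its output into the current working directory, as the transcript above suggests (input and output both in D:\).

```python
import subprocess
from pathlib import Path

# Snapshot state (*.vmsn) or suspended state (*.vmss) files.
CHECKPOINT_SUFFIXES = (".vmsn", ".vmss")


def newest_checkpoint(folder):
    """Return the most recently modified *.vmsn or *.vmss file in folder."""
    candidates = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in CHECKPOINT_SUFFIXES
    ]
    if not candidates:
        raise FileNotFoundError(f"no .vmsn/.vmss files in {folder}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def vmss2core_command(checkpoint, option="-M", exe="vmss2core_win.exe"):
    """Build the argument list for one vmss2core run (hypothetical helper)."""
    return [exe, option, str(checkpoint)]


def convert(folder, option="-M", exe="vmss2core_win.exe"):
    """Run vmss2core on the newest checkpoint, with folder as working directory.

    Pass the full path of vmss2core_win.exe as exe so it is found regardless
    of where the script runs.
    """
    checkpoint = newest_checkpoint(folder)
    cmd = vmss2core_command(checkpoint.name, option, exe)
    # check=True raises CalledProcessError if vmss2core exits non-zero.
    subprocess.run(cmd, cwd=folder, check=True)
    return checkpoint
```

For example, `convert("D:\\", exe="D:\\vmss2core_win.exe")` would run the same `-M` conversion shown in the transcript against the newest checkpoint on D:\.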
The above command generates a file called vmss.core. Upload this file to the respective operating system vendor's support portal for review. For example, if the guest operating system is Oracle Solaris, upload the core to the Oracle Support website; if it is Red Hat Linux, upload it to Red Hat.
For Windows guests, you need to generate a *.dmp file (memory dump) using one of the -W options.
Here is the list of options for the vmss2core utility.
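Core files can be several gigabytes, so it may be worth a quick sanity check before uploading. The sketch below reads the first bytes of the output and compares them with well-known signatures. The assumptions are that vmss2core's gdb/Linux cores are ELF files (the "note section header" and "memory section headers" lines in the transcript suggest this) and that its WinDbg output is a standard Windows crash dump starting with `PAGEDUMP` or `PAGEDU64`. `describe_dump` is a hypothetical helper name.

```python
from pathlib import Path

# Well-known file signatures (assumed to apply to vmss2core output).
ELF_MAGIC = b"\x7fELF"
WINDOWS_DUMP_MAGIC = {
    b"PAGEDUMP": "Windows 32-bit crash dump",
    b"PAGEDU64": "Windows 64-bit crash dump",
}


def describe_dump(path):
    """Return a short description of a dump file based on its first bytes."""
    path = Path(path)
    if path.stat().st_size == 0:
        return "empty file"
    with path.open("rb") as f:
        head = f.read(8)
    if head.startswith(ELF_MAGIC):
        return "ELF core file"
    if head in WINDOWS_DUMP_MAGIC:
        return WINDOWS_DUMP_MAGIC[head]
    return "unknown format"
```

An "empty file" or "unknown format" result is a hint to rerun the conversion (or pick a different option) before spending time on a large upload.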
Options | Explanation |
(none) | Without any options, produces linear views of memory (vmss.core<n>), one per virtual CPU. |
-W | Creates a WinDbg file (memory.dmp) of a Windows virtual machine with commonly used build numbers: 2195 for Win32 and 6000 for Win64. |
-W<num> | Creates a WinDbg file (memory.dmp) with <num> as the build number, for example: -W2600 |
-WDDB<num> | Creates a WinDbg file (memory.dmp) with <num> as the debugger data block address in hexadecimal, for example: -WDDB12ac34de |
-WSCAN | Creates a WinDbg file (memory.dmp) and scans all of memory for the debugger data block, instead of just the lower 256 MB. |
-M | Creates a core file (vmss.core) with a physical memory view suitable for the GNU debugger gdb. |
-l <str> | Specifies the starting and ending offsets of Linux kernel data structures for use by the -N and -P options, with <str> expressed as 0xHEXNUM,0xHEXNUM. Ignored when used with other options. |
-N | Creates a Red Hat crash core file (vmss.core) for an arbitrary Linux version, as defined by the -l option. |
-N4 | Creates a Red Hat crash core file (vmss.core) for Linux kernel version 2.4. |
-N6 | Creates a Red Hat crash core file (vmss.core) for Linux kernel version 2.6. |
-P | Prints a list of processes running in the Linux virtual machine at checkpoint time. |
-P<pid> | Creates a core file (core.<pid>) for the Linux process number <pid>. Programs compiled with symbol tables (not stripped) are likely to yield better debug information. |
-X<nn-v> | Creates a Mac OS core dump, with <nn-v> representing the architecture and Darwin kernel version. |
-q | Quiet operation. |
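The table can also be encoded so a script knows which file to expect and does not pass -l where it would be ignored. This is a sketch based only on the rows above; `expected_output` and `build_args` are hypothetical names, and placing options before the checkpoint file name is an assumption that matches the transcript (`vmss2core_win.exe -M SOL10-Snapshot2.vmsn`).

```python
def expected_output(option):
    """Map a vmss2core option to the output file name given in the options table."""
    if not option:
        return "vmss.core<n>"  # one linear memory view per virtual CPU
    if option.startswith("-W"):  # -W, -W<num>, -WDDB<num>, -WSCAN
        return "memory.dmp"
    if option in ("-M", "-N", "-N4", "-N6"):
        return "vmss.core"
    if option.startswith("-P") and option[2:].isdigit():
        return f"core.{option[2:]}"  # -P<pid>
    raise ValueError(f"no output file documented for {option!r}")


def build_args(option, checkpoint, linux_offsets=None):
    """Assemble vmss2core arguments; linux_offsets is an optional (start, end) pair."""
    args = []
    if linux_offsets is not None:
        # The table says -l is only used by -N and -P and ignored otherwise.
        if not option or not (option == "-N" or option.startswith("-P")):
            raise ValueError("-l only applies to -N and -P")
        start, end = linux_offsets
        args += ["-l", f"0x{start:x},0x{end:x}"]  # 0xHEXNUM,0xHEXNUM
    if option:
        args.append(option)
    return args + [checkpoint]
```

Raising an error for -l with other options is a deliberate choice: the tool would silently ignore it, which is easy to miss during an incident.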
I am sure this tool will be very helpful when a guest operating system experiences a soft or hard hang, such as a Windows blue screen or a Linux kernel panic with interrupts disabled. You can simply take a VMware snapshot and convert it into a core file for root cause analysis.
Hope this article is informative to you. Share it! Comment on it! Be sociable!
Hassan Khalil says
How do I generate a Solaris MDB file?
The instructions at https://www.vmware.com/pdf/snapshot2core_technote.pdf only talk about Windows, Linux and Mac OS X, but do not say how to generate a Solaris MDB file.
Even the vmss2core command options don't list any option for generating a Solaris crash dump.
Vikrant Aggarwal says
Very impressive article 🙂
Lingeswaran R says
Thank you !!!
Duane Haas says
Were you ever able to find out why Solaris was hanging on VMware? I'm having a similar issue. I'm trying to run the tool against my memory snapshot, but I keep getting the message “cannot write flashram page”.
http://i.imgur.com/qhkdM3z.png
Lingeswaran R says
1. First, we should not run Solaris VMs on VMware.
2. If you can't avoid it, install Solaris on disk slices (instead of using SVM or ZFS).
3. SVM would be the main culprit, and ZFS will eat up the memory.
4. Check with the VMware team about memory ballooning.
Regards
Lingesh