Two things are certain in Windows life: blue screens of death and program crashes. Sometimes it's the programmer's fault, sometimes a Windows bug, and sometimes it is down to user configuration. Either way, the program you distribute on the internet will sooner or later crash and burn, and often the problem will be unique to the user's PC, so no amount of effort will let you reproduce it on your development machine. But all is not lost: Windows offers an excellent debugging infrastructure that will help you track down and ultimately correct the faulty code.
There are many alternatives for remote troubleshooting, and much depends on the end user. If you are lucky the fault will be reported by an intelligent and savvy user who can describe the problem accurately, and can send a snapshot or crash dump file. For all other users you can grab the information you need using a remote access service like LogMeIn, assuming you get the user's consent to access his or her computer remotely.
You can get a rough idea if the user gives you the crash address, that is, where the program's instruction pointer was when the GPF occurred. This is easy to obtain: just ask the user to copy the extra information from the standard Windows GPF dialog as seen above right. The crash location immediately tells you whether the crash is your program's fault or somebody else's, e.g. a misbehaving shell extension. If it is mea culpa and you have a mapfile for your executable, you can translate the crash address to the function that caused it (even down to the exact source file and line number if you use the /MAPINFO:LINES linker switch).
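The mapfile lookup boils down to finding the last function whose start address is at or below the crash address. Here is a minimal sketch of that search. `MapSymbol` and `FindEnclosingSymbol` are hypothetical names, and the sketch assumes that the function addresses have already been parsed out of the mapfile into a sorted list, and that the crash address is on the same base as the mapfile (i.e. the module was not relocated when it loaded).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One function entry taken from the mapfile (hypothetical structure).
struct MapSymbol {
    std::uint64_t address;  // start address of the function
    std::string   name;     // function name as listed in the mapfile
};

// Returns the symbol with the highest start address that is still at or below
// crashAddress, or nullptr if the address lies before every known symbol.
// Precondition: symbols is sorted by ascending address.
// Note: an address past the end of the module still maps to the last symbol,
// so the caller should first check that the address belongs to the module.
const MapSymbol* FindEnclosingSymbol(const std::vector<MapSymbol>& symbols,
                                     std::uint64_t crashAddress)
{
    assert(std::is_sorted(symbols.begin(), symbols.end(),
        [](const MapSymbol& a, const MapSymbol& b) { return a.address < b.address; }));

    // First symbol that starts strictly after the crash address...
    auto next = std::upper_bound(symbols.begin(), symbols.end(), crashAddress,
        [](std::uint64_t addr, const MapSymbol& s) { return addr < s.address; });

    // ...so the one just before it contains the crash address.
    if (next == symbols.begin())
        return nullptr;
    return &*(next - 1);
}
```

With made-up entries such as `WinMain` at 0x401000 and `OnDrop` at 0x401200, a crash address of 0x4012a4 resolves to `OnDrop`.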
The crash address alone is seldom enough, especially when the crash is in a faulty system or 3rd party DLL that took your program down with it. You must learn which part of your program invoked code from said DLL; in other words, you need a stack trace. Once upon a time it was feasible to ask a customer for the Dr Watson (the old default debugger) crash log, which among other things contains a stack trace that could be used with the mapfile. However, this approach reached an evolutionary dead end once Dr Watson was removed from Vista and mapfiles were dumbed down in the latest versions of the Visual Studio compilers, especially the 64 bit versions.
My preferred remote debugging method is using crash minidumps. These contain a lot of useful information like the stack trace, local variables at the crash location etc, and are much smaller than a full memory dump, typically under 100KB, so they are easy to send via email. What's more, recent versions of Developer Studio can open such *.MDMP files and "run" them by pressing <F5>, so you see all the debugging information as if you were debugging on the user's machine yourself! Obviously this is a static snapshot; you cannot execute any more code, but you get rich information on the crash location. With the matching program database (PDB) file you will also see the relevant source code, so no more looking up mapfiles and other last-century measures.
How do you get hold of a minidump for your program crash? You could join Microsoft's WinQual project to receive crash reports (that's where all the data ends up when one clicks the "Send Error Report" button in the standard Windows crash dialog). As a small ISV I find the WinQual entry requirements overkill, and they keep changing too. The alternative is to create your own minidumps when your program crashes, and either send them to yourself programmatically via email (after politely asking the customer, of course) or ask your user to find the MDMP file and email it to you manually.
Get ready for minidump remote crash analysis
- Install a crash handler that saves a minidump file when your program crashes. In brief, the handler calls the MiniDumpWriteDump API; it takes only a few extra lines of source code, and they go a long way. This works for every Windows version of any consequence in today's software market, Windows XP or later (even Windows 2000 if you redistribute a later DBGHELP.DLL).
- Enable full debug information and store the program database (PDB) file matching the distributed executable. Even release builds can have "debugging information". This results in an insignificant increase in executable size and poses no risk to your intellectual property: just keep the PDB file, which can theoretically allow one to reverse engineer your code, to yourself. As a precaution I add the version number when I create the minidump file (e.g. x2minidump-1.7.2.3 unicode x64.mdmp) so I can pair it with the correct PDB file version.
- Download windbg (the GUI version of the kernel debugger KD.EXE). Minidumps can be opened in recent versions of Developer Studio (not VS6), but its 32 bit versions cannot open 64 bit minidumps. Just get the 32 bit version of windbg, part of the Windows debugging tools, which loads both 32 and 64 bit minidumps.
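To give an idea of how little code the first two steps need, here is a hedged sketch rather than a drop-in implementation. `MakeDumpFileName`, `InstallCrashHandler` and `WriteMiniDumpOnCrash` are hypothetical names; the Win32 calls (SetUnhandledExceptionFilter, CreateFileA, MiniDumpWriteDump) are the documented ones. It assumes the dump can be written to the path you pass in and that the program links against dbghelp.lib. The Windows-only part is guarded so that the file name helper also compiles elsewhere.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>   // declares MiniDumpWriteDump; link with dbghelp.lib
#endif

// Hypothetical helper: embeds the version and build flavour in the dump name,
// e.g. "x2minidump-1.7.2.3 unicode x64.mdmp", so the dump can later be matched
// with the PDB of the same build.
std::string MakeDumpFileName(const std::string& product,
                             const std::string& version,
                             const std::string& flavour)
{
    assert(!version.empty());  // a dump name without a version defeats the purpose
    return product + "minidump-" + version + " " + flavour + ".mdmp";
}

#ifdef _WIN32
// The dump path is prepared up front so the handler itself allocates nothing.
static char g_dumpPath[MAX_PATH] = "crash.mdmp";

// Called by Windows when an exception is not handled anywhere in the program.
static LONG WINAPI WriteMiniDumpOnCrash(EXCEPTION_POINTERS* info)
{
    HANDLE file = CreateFileA(g_dumpPath, GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId          = GetCurrentThreadId();
        mei.ExceptionPointers = info;   // exception record and context for the dump
        mei.ClientPointers    = FALSE;  // pointers are valid in this process

        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, nullptr, nullptr);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;   // let the process terminate
}

// Call once at start-up, before anything has a chance to crash.
void InstallCrashHandler(const std::string& dumpPath)
{
    lstrcpynA(g_dumpPath, dumpPath.c_str(), MAX_PATH);  // truncates and terminates
    SetUnhandledExceptionFilter(WriteMiniDumpOnCrash);
}
#endif
```

Calling `InstallCrashHandler(MakeDumpFileName("x2", "1.7.2.3", "unicode x64"))` early in program start-up would be one way to wire the two together; a real handler would probably write to a folder that is known to be writable, such as the user's temp directory.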
Analyze the minidump with windbg
The Windows debugger is a big and complicated beast, but you can ignore most of its functionality and use it more or less like the trusty Developer Studio built-in debugger. We just want to do our job, i.e. analyse a static snapshot of our program crash, not become windbg experts. Once you get hold of a minidump file and fire up windbg, follow these steps:
- Use the File > Open crash dump menu to open the MDMP file the user emailed to you. I usually put the minidump in the output folder where the xplorer² executable, program database and intermediate files are written.
- Set up the windbg symbol, source and image paths. Here we are not dealing with device drivers but with user mode executables. Using commands from the File menu (e.g. File > Symbol file path) we enter the various paths. Usually the image *.EXE and symbols *.PDB are in the same output folder where we dropped the MDMP file, and the source code is one level up. It is important that the symbols match the executable that actually crashed on the remote PC. That's why I stressed putting the version number in the minidump file name.
- Optionally set up the system symbols by connecting to the Windows online symbol server, adding something like SRV*E:\srvcache*http://msdl.microsoft.com/download/symbols to the windbg symbol file path. Personally I find this useless, even annoying: windbg gets very sluggish fetching online symbols for the various DLLs my program uses. We are not after a Windows bug, are we? So omit this step if you don't need system symbols.
- Arrange the windbg workspace to your taste. Most of the usual debugging panes are available from the View menu, like call stack, processes and threads, local variables etc.
- Make sure the command pane is shown, then type .ecxr and press <Return>. If everything is set up properly, windbg will jump to the instruction in your source code that crashed. You can't single step your code or anything, but rich static information is available, including the stack trace and even local variables!
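The symbol path entered in the steps above is just a semicolon-separated list of folders, optionally followed by a symbol server entry of the form SRV*cache*url. As an illustration of that syntax only, here is a small sketch that assembles such a string; `MakeSymbolPath` is a hypothetical helper and the folder `C:\build\out` in the usage note is a made-up example, not a path from the original setup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Joins local symbol folders with ';' and, if requested, appends a symbol
// server entry "SRV*<cacheDir>*<serverUrl>" at the end.
std::string MakeSymbolPath(const std::vector<std::string>& localDirs,
                           const std::string& cacheDir = "",
                           const std::string& serverUrl = "")
{
    // The cache folder and the server URL only make sense as a pair.
    assert(cacheDir.empty() == serverUrl.empty());

    std::string path;
    for (const std::string& dir : localDirs) {
        if (!path.empty())
            path += ';';
        path += dir;
    }
    if (!cacheDir.empty() && !serverUrl.empty()) {
        if (!path.empty())
            path += ';';
        path += "SRV*" + cacheDir + "*" + serverUrl;
    }
    return path;
}
```

For example, `MakeSymbolPath({R"(C:\build\out)"}, R"(E:\srvcache)", "http://msdl.microsoft.com/download/symbols")` yields `C:\build\out;SRV*E:\srvcache*http://msdl.microsoft.com/download/symbols`, while leaving out the last two arguments gives the local folder alone, which matches the advice to skip system symbols when you don't need them.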
The beauty of it is that you don't have to learn a lot to use windbg, and the information it extracts from the minidump is in technicolor compared to Dr Watson crash logs. If you need more information on windbg commands there are many resources including this basic command reference.
For more clarifications on the minidump static analysis procedure see today's demo video
If you pay close attention to the video you'll see that the minidump was for a drag-drop crash in Windows Vista, but I used windbg on my Windows XP machine to figure it out. And all that just by setting a couple of symbol paths and typing .ecxr. Much easier than squaring the circle! Remember, bug-free may be a programmer's utopia, but if your program has fewer bugs than the next competitor's, that is a significant marketing tool that can drive more sales.