Reading Tea Leaves: The Fine Art of Debugging

Here are my notes from Danny’s session yesterday. Again, this is a big brain dump, and even if most of it looks boring and commonsense, keep reading: there’s some good stuff buried in here. Somewhere. So here goes:

It was let slip that they would repeat what they did last year – we will get a demo build of Delphi 2006 at the con. I was hoping so, but it’s good to have it confirmed. Yay!

Before You Begin

  • The point where it fails is often a while after the point where it goes wrong.
  • Find a reproducible test case. If you can’t repro it, you’ll just be grabbing at straws. And often it’s operator error…
    • Access Violations are good, because they’re really easy to identify. Something like “This one report prints upside-down, and we have no idea why” is a lot harder to track down.
  • Understand what the code is supposed to be doing. If the user says “It should be blue”, find out whether they’re right. Get a baseline description of the expected/desired behavior.
  • Discover what the code is actually doing.
  • Identify assumptions violated by reality.
    • Need to be able to separate yourself (observer) from what you thought you wrote – look at what it’s actually doing. Be objective.
  • Fix the problem without breaking what works.
    • Experienced programmers tend to have a fear of changing working code.
    • How many other places is this routine called? Don’t break the 90% case.
    • If you’re a library vendor, be careful about adding a virtual method to a base class: it shifts the VMT slots that already-compiled descendants rely on, which breaks binary compatibility.

Process: Iterative Convergence

  • Find the bug’s “critical point”
    • It’s rare that you’ll be able to go straight to the line of code (unless it’s an AV or other exception)
  • Study the execution path to the critical point
  • Breakpoint bracketing
  • Start at a high level and work down (and move back up if it doesn’t hit any of your breakpoints)
  • Watch out for bug movement
    • You move your breakpoints down a level, and suddenly the failure shows up outside the region it used to appear in.
    • Probably means it’s not as consistently reproducible as you thought.
    • Or that you’re debugging a paint message with only one monitor.
    • Or that you have more than one bug. When you’re in a buggy area, there tends to be more than one bug, because it’s probably code that hasn’t been exercised very well before. (Fix one, another pops up, etc.)
  • Watch out for Heisenberg effects
    • When the act of debugging changes the behavior.
    • Very rare in modern debuggers, but still sometimes true
    • Win32 code can check whether it’s being debugged – some code might choose to run differently based on that flag
    • The way Windows allocates memory is slightly different under the debugger – can occasionally make things work under the debugger when they fail outside it
    • How to deal with it?
      • Console output
      • OutputDebugString
      • Writing to a log file
      • Watch out – any of these could actually change the behavior, too (especially if multithreaded, or lots of disk I/O, etc.)
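
For the OutputDebugString route, here's a minimal sketch of the kind of trace helper I'd reach for. The names FormatTraceLine and Trace are mine, not Danny's, and tagging each line with the thread ID is my own assumption about what's useful. As the last bullet says, even this can shift timing enough to hide the bug.

```delphi
program DebugTrace;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: builds one trace line tagged with a thread ID.
function FormatTraceLine(ThreadId: Cardinal; const Msg: string): string;
begin
  Result := Format('[thread %d] %s', [ThreadId, Msg]);
end;

// Hypothetical helper: sends the line to whatever is listening for debug
// output (the IDE's Event Log when running under the debugger).
procedure Trace(const Msg: string);
begin
  OutputDebugString(PChar(FormatTraceLine(GetCurrentThreadId, Msg)));
end;

begin
  Trace('entering suspect code');
  // ... the code under investigation would run here ...
  Trace('leaving suspect code');
end.
```

Nothing is printed to the console; the lines only show up in a debug-output listener, which is the point when the debugger itself is changing the behavior.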

Bug Types

  • Crashes and Exceptions
    • If this happens in a method that the debugger doesn’t have debug info for, it’ll stop at the next place up the call stack that it does have debug info for.
  • Resource management (memory leaks, handle leaks)
    • Things like not being able to do the same thing twice in a row, with an error like “cannot open file”
    • Use try..finally so the resource is released even when an exception is raised.
  • Recursion and re-entrancy
    • Watch out for API calls that do their own message pumping
    • Especially if your message handlers deal with global state
  • Memory overwrites – always fun
    • Especially if the memory has already been freed. It’s not a failure that it got overwritten – it’s a failure that somebody’s still pointing to it!
  • Thread collisions accessing shared resources
    • Test multithreaded code on single and dual processor – deadlock, etc. may not fail on a single-processor machine, and then may consistently fail within 10 seconds on a multi-processor machine
    • Single-proc machine might switch threads every 10 ms, so you’re really unlikely to get interrupted within this particular set of three critical instructions (or whatever). Not true on a multi-processor or multi-core machine.
    • Hyperthreaded machine is probably good enough for testing most of these, but you may need multiple actual processors to check for things like bus contention
    • Very little difference between a 2-processor machine and a 16-processor one. Huge difference between 1- and 2-processor.
    • Multiple cores (and hyperthreading) are the Way of the Future. Intel and AMD are both saying they’ve maxed out on CPU speed, so the only way up is to add cores.
    • Few apps are multithreaded today, because it’s hard to do. Danny wants Borland to make this easier for coders. (applause)
    • Rats, I forgot to ask him whether the Delphi compiler is going multithreaded.
  • Timing problems
    • Arrival of events
  • When the app just disappears:
    • Might be a stack overflow, plus there’s nowhere else for the system to go.
    • But often, you’ve got a window handle whose WndProc is in a DLL, and the DLL has been unloaded, and the page is no longer in memory. Next time you call it, you get an exception, and there’s no exception handler in the stack for that window-proc call. Windows message dispatching is not exception-safe. Bye-bye.
    • May be COM, which doesn’t unload DLLs immediately. There’s an API to force it to unload the unused ones (OleClear? OleFlush? Most likely CoFreeUnusedLibraries). Not necessarily ActiveX, but often.
    • Delphi IDE: Load ActiveX control that likes to create window handles outside the IDE frame and doesn’t free them properly. Will blow up when something broadcasts a message to those dangling handles after the ActiveX DLL is unloaded (minimizing IDE, changing system settings, etc.)
  • Special interactions
    • Painting
      • Solution: ask your boss for dual monitors
      • Could try remote debugging, but that brings up timing issues
    • Focus changes
      • Hard to avoid when debugging – focus does go to the debugger! Time for WriteLn and OutputDebugString again.
    • Mouse drag
      • Putting a breakpoint in the middle of mouse-drag code is… problematic.
      • Here, a remote debugger could help. But you’d need two people (or be extremely dexterous, to handle both mice).
  • If a bug occurs when you compile with optimizations, and not if you compile without optimizations, you may be looking at an uninitialized variable.
  • Calling virtual method on garbage pointer
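
To make the "do try..finally" advice from the resource-management item concrete, here's a small sketch. FileByteCount is a made-up helper, and using the program's own EXE as the test file is just a convenience. The idea is that the handle is released on every path, so doing the same thing twice in a row doesn't fail with "cannot open file".

```delphi
program ReleaseHandle;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: returns the size of a file, always closing the handle.
function FileByteCount(const FileName: string): Int64;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := Stream.Size;
  finally
    // Runs whether or not the line above raised an exception.
    Stream.Free;
  end;
end;

begin
  // Two calls in a row: the second only works if the first released the file.
  Writeln(FileByteCount(ParamStr(0)));
  Writeln(FileByteCount(ParamStr(0)));
end.
```

Note that Create sits outside the try: if the constructor raises, there is nothing to free yet.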

Programmer, Know Thy Tools

  • Breakpoints
    • Breakpoint Properties: It can actually be useful to look at the path in the “Filename”, especially if the IDE isn’t showing blue dots in a file you know you’ve compiled. You might be editing the file from the wrong directory.
    • Scenario: Problem is related to volume of traffic. Run a report, and somewhere over halfway through, it fails.
      Tactic: Use Pass Count, and overguess. The Pass Count decrements each time it goes through, so when it does blow up, look at the pass count at that point, and figure out exactly what to set the pass count to next time. (Cool!)
    • Breakpoint groups – useful if you have several breakpoints that you turn on as a group, or off as a group (typically in code that’s used a lot). Ignore them during application startup, then turn them on later.
    • Set a breakpoint that doesn’t break, but does other special stuff like turning breakpoint groups on/off, log stuff, log the call stack, etc.
    • Q: Can you only break within a particular thread? A: Sort of. Use the Threads view to find the thread ID, then set a condition on the breakpoint that compares GetCurrentThreadId against it. Check with Chris Hesik; there may be something nicer coming that uses named threads.
    • Data Breakpoints (note: it will break on the line after the assignment)
  • Watches
  • Local variable auto-watch
  • Flyover evaluator tooltips
    • Quickly evaluate a bunch of different things
    • “Sort of like reading a cactus by Braille”
  • Evaluator
    • Fails if you enter expressions at debug time that aren’t in the executable – if something got smart linked out, you can’t evaluate it
  • Threads view
    • In workstation .NET, GC happens in-process. In server .NET, GC happens in its own thread.
    • When you break, the thread you broke on says “Stopped”, and the others say “Suspended”. When you single-step the one you’re on, some of the others may also get to run briefly – watch out for this!
  • Modules view
    • Select a module, then it’ll show the addresses of all the entry points
    • If the debugger knows it’s Delphi code, and has DCUs with debug info, it’ll also show the units and methods in that module
  • Event log
    • Shows module relocations (Borland automatically runs rebase as part of their build process – maybe we should look into this?)
    • Way back in the day, when things were shipping on floppy, the BDE was shipped without fixups (~20% size difference), because “We know where the Windows DLLs are loaded, so it’ll be okay.” Which was true, until the next major release of Windows. And it *still* ships without fixups today. Ewww.
    • TDUMP will dump (among other things) the relocation table, so you can see how many fixups a DLL actually has – help you figure out the cost of not rebasing
  • Call Stack
  • Inspectors
  • CPU Disassembly and FPU Views
    • Top of CPU view shows something like “[$00409300]=$0040A214 Thread #1668”. This shows up if the current instruction accesses a memory address.
    • Mixed source: source lines are shown in CPU view. These may not be in the original order, and a single line might be repeated (won’t happen often with Delphi compiler, but will happen in C++).
    • First column is address of instruction in memory. To convert that to an offset in the file, subtract the image base ($00400000) to get an RVA, then look up the code section in the binary’s section table, subtract that section’s virtual address, and add its raw file offset.
    • Then shows the hex representation of the instruction.
    • Function names that start with @ are compiler magic. WriteLn is very, very magic.
    • Lower-right pane is the stack.
      • Right-click > Follow > Near Code (neat!)
      • If you see something starting with “0040…”, that’s probably code (or a VMT, which also gets crammed into the code segment).
      • “7C…” is probably in a Windows DLL.
      • “0012…” is probably a stack address.
    • Ctrl+Left and Ctrl+Right tells the CPU View to slide the disassembly start address one byte left or right.
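
The memory-address-to-file-offset conversion from the CPU view notes is just arithmetic, so here's a sketch of it as I understand the PE layout: take off the image base, take off the code section's virtual address, and add the section's raw offset in the file. The two section values below are placeholders I picked for illustration; read the real ones out of your binary's section table (TDUMP will print it).

```delphi
program VaToOffset;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  ImageBase      = $00400000; // default load address of an EXE
  SectionRVA     = $00001000; // ASSUMED virtual address of the code section
  SectionFileOfs = $00000400; // ASSUMED raw file offset of the code section

// Converts an address shown in the CPU view to an offset in the file on disk.
function VAToFileOffset(VA: Cardinal): Cardinal;
begin
  Result := VA - ImageBase - SectionRVA + SectionFileOfs;
end;

begin
  // $00409300 - $00400000 - $1000 + $400 = $8700
  Writeln(IntToHex(VAToFileOffset($00409300), 8)); // prints 00008700
end.
```

This only holds when the module actually loaded at its preferred base; if the Event Log shows it was relocated, use the real load address instead of ImageBase.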

Code Surfing

  • Logical view, programmer’s perspective (not CPU’s perspective)
  • Demo high-level tools
  • Prologue code generated at start of method
  • EBP is part of the stack-frame stuff. If you have local variables, or a try..finally, you’ll have a stack frame, which generates EBP stuff. If the method is small enough, you might not have EBP, which can make debugging tough.
  • Try..finally is actually codegenned as a little inner method, with RET at the end and everything.
  • “FS:” register is a magic segment register, which is thread-local. Each thread has its own FS.
  • FS:[0] holds the head of the current thread’s linked list of exception frames (per-thread, not global, since FS itself is thread-local).
  • If you have a local variable of type string, a try..finally will be magically added behind the scenes
  • @LStrClr finalizes an AnsiString variable
  • For CALL, JMP, etc., the CPU view will look for method addresses it knows about, so it’ll show “CALL TObject.Create” instead of just some cryptic address.
  • Parameter-passing conventions (see Help). Usually not all parameters go on the stack (EAX, EDX, and ECX usually hold the first three parameters), and the return value comes back in EAX. Some registers (like EAX) are considered volatile and can be trashed by any method you call. Others (like EDI) should be preserved.
  • In a method, EAX passes in the Self pointer.
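
A quick way to convince yourself of the register convention is a function written entirely in assembler. This is my own sketch, not something from the session, and it assumes the 32-bit Delphi compiler with the default register calling convention; it won't carry over to other targets.

```delphi
program RegisterCall;

{$APPTYPE CONSOLE}

// Under the default register convention on 32-bit Delphi, A arrives in EAX,
// B in EDX, and C in ECX. The Integer result goes back in EAX.
function AddThree(A, B, C: Integer): Integer;
asm
  ADD EAX, EDX   // EAX := A + B
  ADD EAX, ECX   // EAX := A + B + C, already in the result register
end;

begin
  Writeln(AddThree(1, 2, 3)); // prints 6
end.
```

Step into AddThree in the CPU view and you'll see there's no prologue or EBP setup at all, which is the "no stack frame" case mentioned above.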

Data Surfing

  • After some code that instantiates a TFoo, and stores the reference in, say, EBX, right-click in the bottom pane of the CPU view, select “Goto Address”, and type “EBX”. Then view as DWORDs. The first DWORD is the VMT pointer.
  • Since the EXE is loaded at the same address every time, you have a high degree of address stability between program runs.
  • [EBX] is the VMT (all classes have a VMT pointer at offset 0; this isn’t old-style objects). [EBX+4] has the first instance variable. Compiler doesn’t currently reorder fields to pack sub-DWORD-sized fields together, but the compiler semantics allow for that to be added later.
  • [VMT+0] points to the code for the first virtual method; [VMT+4] is the second; etc. Magic stuff at negative VMT offsets.
  • Class name usually comes at the end of the VMT.
  • To tell whether this is a valid object instance:
    • Follow the VMT pointer.
    • Should be able to find the class name at the end of the VMT.
    • Everything in the VMT should point to valid code.
  • VMT may not be in your code segment, especially in the IDE. E.g., “TForm1”, when you’re editing it, points to a fake VMT that’s synthesized by the IDE. Runtime-generated proxy. Fancy stuff.
  • Right-click on the current statement when it’s a CALL, and select “Follow” to see the code there. So you can tell before you do a virtual call into nowhere. Lots of “???”s are a bad sign.
  • “Goto Current EIP” moves the focus back to current instruction. “New EIP” moves EIP to the instruction you right-clicked on. Both really confusing names.

Memory blocks

  • This assumes Delphi 2005 and earlier (will change in Delphi 2006 with FastMM)
  • Memory block is preceded, and followed, by block size
  • If the block size preceding the block is odd (low bit set), that block has already been freed.
  • When the block is freed, the first four bytes are overwritten to point to the next free block. (Meaning, if that block was an object instance, you just trashed the VMT pointer! I knew it usually crashed if you called a virtual method on a freed object, but I didn’t realize it was guaranteed.)
  • So, if you crash at a call to a virtual method, go back and look at the DWORD preceding the instance pointer. If it’s odd, you know the object has already been freed.

CPU Space

  • Looking at the app from the perspective of the machine
  • Try to debug somebody else’s code that you don’t have source for. Reverse engineering.
  • “Say what? Do that again.” tricks.
  • Rerouting around AVs
  • Strings: string pointer points to the beginning of the string content. Go back 4 bytes = length. Go back another 4 bytes = refcount. If refcount is -1, the RTL knows that string is a constant, and skips refcounting.
  • String concatenations: @LStrCatN does it all in one swoop (i.e., adds all the lengths, then does one allocation). So A := B + C + D + E is better than A := B + C; A := A + D; A := A + E;
  • String literals are stored in the code segment in Win32, in a separate segment in Linux, and in a separate section in .NET.
  • Side note: typed constants in .NET (const X: string = ‘Foo’) are actually assignments that get run at startup. Regular constants (const X = ‘Foo’) aren’t assignments and are inlined where they’re used.
  • Can do a do-over with “New EIP”, but be careful that all the interesting registers are still intact. In the sample Danny gave, he called a method that trashed EAX, which is where an instance method was being kept. So we couldn’t go back and redo that same method call. 🙁
  • Be careful using “New EIP” with loops. Be careful that all necessary registers are set.
  • Slightly mind-bending thought: When you do “Evaluate/Modify”, and change the value of, say, a local string variable, the debugger has to be smart enough to notice any registers that were pointing to that string, and update their values. Eek.
  • How did it get here? = Where does it return to?
    • Note: New feature that may be coming in DeXter = color-coding addresses in the stack pane of the CPU view: code addresses are one color, heap addresses another color, etc.
    • If you follow the current stack location as code, you’d expect that to be the line you’re going to return to.
    • If the thing there isn’t good code (e.g., if it’s the beginning of a method, instead of the part of a method just after it called something else), look for something a ways up that is a stack address followed by a code address. This is probably a stack frame, and the code address should be good. (But if the code address is just a call to @HandleFinally, you’ll probably want to keep going to the next frame.)
  • If you’re following a stack address and it looks like a string, look at the 8 bytes preceding the string data. If it’s got a refcount and a length, it’s a Delphi string (not a C string).
  • When you’re looking at the stack, be careful of big uninitialized local variables (structures, buffers, etc.) – they may be leftovers from an earlier call stack, and look like a real stack trace even though they’re really bogus.
  • If something is odd (low bit set), it’s not a pointer.
  • Really dirty trick: Right-click in the data pane, “Goto address”, and type “EIP”. Now you can directly hack the code bytes, and e.g. change the immediate argument of a MOV (change a MOV EAX, 15 to a MOV EAX, 42 without recompiling). See, we’ve already got Edit and Continue, if you don’t mind directly hacking machine code…
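
The string layout described above (length 4 bytes back, refcount another 4 bytes back) is easy to poke at from code. A sketch, assuming 32-bit Delphi and AnsiString; the two helper names are mine.

```delphi
program PeekString;

{$APPTYPE CONSOLE}

// Hypothetical helper: reads the length stored 4 bytes before the characters.
function HeaderLength(const S: AnsiString): Integer;
begin
  if Pointer(S) = nil then
    Result := 0 // an empty string is a nil pointer, so there is no header
  else
    Result := PInteger(Integer(Pointer(S)) - 4)^;
end;

// Hypothetical helper: reads the refcount stored 8 bytes before the characters.
function HeaderRefCount(const S: AnsiString): Integer;
begin
  if Pointer(S) = nil then
    Result := 0
  else
    Result := PInteger(Integer(Pointer(S)) - 8)^;
end;

var
  OnHeap: AnsiString;
begin
  OnHeap := 'Hello';
  UniqueString(OnHeap); // make sure this one is a heap copy of its own
  Writeln(HeaderLength(OnHeap), ' ', HeaderRefCount(OnHeap));
  // Passing the literal directly hands over the constant stored in the image.
  Writeln(HeaderLength('Hello'), ' ', HeaderRefCount('Hello'));
end.
```

I'd expect the first line to show a length of 5 with a refcount of 1, and the second to show the -1 refcount that marks a constant, but check it on your own compiler version.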

How to…

  • Recognize a freed memory block
  • Recognize a bad instance pointer
  • Crawl the stack when the debugger can’t
  • Figure out what interface you’re inside (hmm, I don’t think he actually covered this – it was on the slide, but I think he was running out of time and didn’t go into details)
  • Discover what other IIDs an interface implements: call QueryInterface, step into it, look for code that does 8-byte compares, and see what’s there
    (shouldn’t that be 16-byte compares? A GUID is 128 bits, right?)
  • Get a name for a GUID
    (Hmm, looks like we didn’t actually go over this one either)
  • Find the name of the class you’re inside
  • Determine how many methods an interface has (look for a bunch of code pointers in the VMT; you can at least make an educated guess)

And Sometimes the Bear Gets You

  • Modifying CPU registers… pay attention! If you change something like EIP, there’s no Undo.
  • Goto Address is relative to focused pane. Right-clicking on the pane does not focus it.
  • Unfortunate module names (winspool.exe) – another thing that was on the slide but he didn’t have a chance to explain, and that I didn’t write down to ask him about after the session
  • Calling destructive functions in the evaluator (calling Destroy in the evaluator?)
  • Debugging the wrong source file

Other miscellanea

  • When the debugger breaks on an exception, it’s too late to do a postmortem – you can’t Evaluate local variables, you’ve lost the Self pointer, etc. Someone asked who we would talk to about why that is and whether it can be changed. Answer: Talk to Chris Hesik. He should be at the Meet the Team. (Almost seems like I remember that this was improved in Delphi 2005 last year, but I don’t have it on this machine to test.)
  • Since we covered objects’ memory layout, I asked what it is that an interface reference actually points to. Answer: COM requires that it point to a pointer to a VMT; beyond that, COM doesn’t care what’s at the end of that pointer. So Delphi allocates a slot within your object’s data, and puts a VMT pointer that points to stubs. Those stubs know which class and which interface they go with; they subtract the offset where that interface VMT pointer is stored within the instance, then call the actual method on the class. If the method on the class is virtual, then the stub is static, but contains a virtual method call. Everybody got that? (I understand what’s going on, but I have no idea whether my explanation is making any sense at all!)
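
Since my explanation of interface references may not have made sense, here's a small sketch that at least shows the "slot within your object's data" part. It assumes 32-bit Delphi; IFoo, TFoo, and InterfaceSlotOffset are names I made up for the demo.

```delphi
program IntfOffset;

{$APPTYPE CONSOLE}

type
  IFoo = interface
    procedure Bar;
  end;

  TFoo = class(TInterfacedObject, IFoo)
    procedure Bar;
  end;

procedure TFoo.Bar;
begin
end;

// Hypothetical helper: distance in bytes from the start of the instance to
// the slot that an IFoo reference points at.
function InterfaceSlotOffset(Obj: TFoo): Integer;
var
  Intf: IFoo;
begin
  Intf := Obj;
  Result := Integer(Pointer(Intf)) - Integer(Pointer(Obj));
end;

var
  Foo: TFoo;
  Keep: IInterface;
begin
  Foo := TFoo.Create;
  Keep := Foo; // hold a reference so the helper's temporary doesn't free Foo
  Writeln(InterfaceSlotOffset(Foo));
end.
```

The number printed is small and positive, never zero, because offset 0 is taken by the object's own VMT pointer. That difference is what the stubs subtract to get back to Self before calling the real method.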
