Reading Tea Leaves: The Fine Art of Debugging

Here are my notes from Danny’s session yesterday. Again, this is a big brain dump, and even if most of it looks boring and commonsense, keep reading – there’s some good stuff buried in here. Somewhere. So here goes:

It was let slip that they would repeat what they did last year – we will get a demo build of Delphi 2006 at the con. I was hoping so, but it’s good to have it confirmed. Yay!

Before You Begin

  • The point where it fails is often a while after the point where it goes wrong.
  • Find a reproducible test case. If you can’t repro it, you’ll just be grasping at straws. And often it’s operator error…
    • Access Violations are good, because they’re really easy to identify. Something like “This one report prints upside-down, and we have no idea why” is a lot harder to track down.
  • Understand what the code is supposed to be doing. If the user says “It should be blue”, find out whether they’re right. Get a baseline description of the expected/desired behavior.
  • Discover what the code is actually doing.
  • Identify assumptions violated by reality.
    • Need to be able to separate yourself (observer) from what you thought you wrote – look at what it’s actually doing. Be objective.
  • Fix the problem without breaking what works.
    • Experienced programmers tend to have a fear of changing working code.
    • How many other places is this routine called? Don’t break the 90% case.
    • If you’re a library vendor, be careful about adding a virtual method to a base class – it shifts the VMT slots of the descendant classes and breaks binary compatibility.

Process: Iterative Convergence

  • Find the bug’s “critical point”
    • It’s rare that you’ll be able to go straight to the line of code (unless it’s an AV or other exception)
  • Study the execution path to the critical point
  • Breakpoint bracketing: put breakpoints on either side of where you think it goes wrong, then move them closer together
  • Start at a high level and work down (and move back up if it doesn’t hit any of your breakpoints)
  • Watch out for bug movement
    • You move your breakpoints down a level, and suddenly it breaks outside the range where the problem used to appear to be.
    • Probably means it’s not as consistently reproducible as you thought.
    • Or that you’re debugging a paint message with only one monitor.
    • Or that you have more than one bug. When you’re in a buggy area, there tends to be more than one bug, because it’s probably code that hasn’t been exercised very well before. (Fix one, another pops up, etc.)
  • Watch out for Heisenberg effects
    • When the act of debugging changes the behavior.
    • Very rare in modern debuggers, but still sometimes true
    • Win32 code can check whether it’s being debugged – some code might choose to run differently based on that flag
    • The way Windows allocates memory is slightly different under the debugger – can occasionally make things work under the debugger when they fail outside it
    • How to deal with it?
      • Console output
      • OutputDebugString
      • Writing to a log file
      • Watch out – any of these could actually change the behavior, too (especially if multithreaded, or lots of disk I/O, etc.)
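
Breakpoint bracketing narrows the gap by hand. When the failure can be replayed deterministically, the same narrowing can be automated. Here’s a minimal Python sketch of bisection (my own illustration, not something from the session). It assumes a hypothetical `state_is_ok(step)` that replays the run up to `step` and checks an invariant, and it assumes that once the state goes bad it stays bad. That is what lets it find where things go wrong rather than where they finally fail.

```python
def first_bad_step(state_is_ok, good, bad):
    """Narrow the bracket [good, bad] until the two ends are adjacent.

    Assumptions (only the endpoints are verified here):
      * state_is_ok(good) is True and state_is_ok(bad) is False
      * once the state goes bad it stays bad, so the check is monotone
    """
    if not state_is_ok(good) or state_is_ok(bad):
        raise ValueError("the bracket must start healthy and end broken")
    while bad - good > 1:
        mid = (good + bad) // 2
        if state_is_ok(mid):
            good = mid   # still healthy at mid: the damage happens later
        else:
            bad = mid    # already broken at mid: the damage happens earlier
    return bad           # the first step at which the check fails


def make_checker(corrupt_at):
    """Hypothetical stand-in for 'replay the run up to step N and check an
    invariant'; here the bug silently corrupts state at step `corrupt_at`."""
    def state_is_ok(step):
        return step < corrupt_at
    return state_is_ok
```

With the made-up bug at step 37 and the crash showing up around step 90, `first_bad_step(make_checker(37), 0, 90)` comes back with 37 after six midpoint checks.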

Bug Types

  • Crashes and Exceptions
    • If this happens in a method that the debugger doesn’t have debug info for, it’ll stop at the next place up the call stack that it does have debug info for.
  • Resource management (memory leaks, handle leaks)
    • Things like not being able to do the same thing twice in a row, with an error like “cannot open file”
    • Use try..finally
  • Recursion and re-entrancy
    • Watch out for API calls that do their own message pumping
    • Especially if your message handlers deal with global state
  • Memory overwrites – always fun
    • Especially if the memory has already been freed. It’s not a failure that it got overwritten – it’s a failure that somebody’s still pointing to it!
  • Thread collisions accessing shared resources
    • Test multithreaded code on both single- and dual-processor machines – a deadlock, etc. may never show up on a single-processor machine, then fail consistently within 10 seconds on a multi-processor one
    • Single-proc machine might switch threads every 10 ms, so you’re really unlikely to get interrupted within this particular set of three critical instructions (or whatever). Not true on a multi-processor or multi-core machine.
    • Hyperthreaded machine is probably good enough for testing most of these, but you may need multiple actual processors to check for things like bus contention
    • Very little difference between a 2-processor machine and a 16-processor one. Huge difference between 1- and 2-processor.
    • Hyperthreading is the Way of the Future. Intel and AMD are both saying they’ve maxed out on CPU speed – only way up is to add cores.
    • Few apps are multithreaded today, because it’s hard to do. Danny wants Borland to make this easier for coders. (applause)
    • Rats, I forgot to ask him whether the Delphi compiler is going multithreaded.
  • Timing problems
    • Arrival of events
  • When the app just disappears:
    • Might be a stack overflow: there’s no stack left to run an exception handler, so there’s nowhere else for the system to go.
    • But often, you’ve got a window handle whose WndProc is in a DLL, and the DLL has been unloaded, and the page is no longer in memory. Next time you call it, you get an exception, and there’s no exception handler in the stack for that window-proc call. Windows message dispatching is not exception-safe. Bye-bye.
    • May be COM – it doesn’t unload DLLs immediately. There’s an API to tell COM to unload DLLs that are no longer in use (CoFreeUnusedLibraries). Not necessarily ActiveX, but often.
    • Delphi IDE: load an ActiveX control that creates window handles outside the IDE frame and doesn’t free them properly. The IDE will blow up when something broadcasts a message to those dangling handles after the ActiveX DLL is unloaded (minimizing the IDE, changing system settings, etc.)
  • Special interactions
    • Painting
      • Solution: ask your boss for dual monitors
      • Could try remote debugging, but that brings up timing issues
    • Focus changes
      • Hard to avoid when debugging – focus does go to the debugger! Time for WriteLn and OutputDebugString again.
    • Mouse drag
      • Putting a breakpoint in the middle of mouse-drag code is… problematic.
      • Here, a remote debugger could help. But you’d need two people (or be extremely dexterous, to handle both mice).
  • If a bug occurs when you compile with optimizations, and not if you compile without optimizations, you may be looking at an uninitialized variable.
  • Calling a virtual method on a garbage pointer
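
Here’s the “can’t do it twice in a row” leak in miniature, as a Python sketch (Python’s try/finally behaves like Delphi’s try..finally for this purpose). `ExclusiveFile` is a hypothetical stand-in for a file handle that only one caller can hold at a time.

```python
class ExclusiveFile:
    """Hypothetical stand-in for a file that only one caller may have open
    at a time, so a leaked handle makes the *next* open fail."""

    def __init__(self):
        self.is_open = False

    def open(self):
        if self.is_open:
            raise OSError("cannot open file")
        self.is_open = True

    def close(self):
        self.is_open = False


def print_report_leaky(f, rows):
    f.open()
    total = sum(rows)   # raises TypeError on a bad row...
    f.close()           # ...so this is skipped and the handle leaks
    return total


def print_report_safe(f, rows):
    f.open()
    try:
        return sum(rows)
    finally:
        f.close()       # runs whether sum() returns or raises
```

After one failed call to `print_report_leaky`, every later call raises “cannot open file”, even with good data; `print_report_safe` releases the handle on the way out and can be called again.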

Programmer, Know Thy Tools

  • Breakpoints
    • Breakpoint Properties: It can actually be useful to look at the path in the “Filename”, especially if the IDE isn’t showing blue dots in a file you know you’ve compiled. You might be editing the file from the wrong directory.
    • Scenario: Problem is related to volume of traffic. Run a report, and somewhere over halfway through, it fails.
      Tactic: Use Pass Count, and overguess. The Pass Count decrements each time it goes through, so when it does blow up, look at the pass count at that point, and figure out exactly what to set the pass count to next time. (Cool!)
    • Breakpoint groups – useful if you have several breakpoints that you turn on as a group, or off as a group (typically in code that’s used a lot). Ignore them during application startup, then turn them on later.
    • Set a breakpoint that doesn’t break, but does other special stuff like turning breakpoint groups on/off, log stuff, log the call stack, etc.
    • Q: Can you break only within a particular thread? A: Sort of – use the Threads view to find the thread ID, then set a condition that checks GetThreadID. Check with Chris Hesik; there may be something nicer in the works, using named threads.
    • Data Breakpoints (note: it will break on the line after the assignment)
  • Watches
  • Local variable auto-watch
  • Flyover evaluator tooltips
    • Quickly evaluate a bunch of different things
    • “Sort of like reading a cactus by Braille”
  • Evaluator
    • Fails if you enter expressions at debug time that aren’t in the executable – if something got smart linked out, you can’t evaluate it
  • Threads view
    • In workstation .NET, GC runs on the thread that triggered the collection. In server .NET, GC runs on its own dedicated thread(s).
    • When you break, the thread you broke on says “Stopped”, and the others say “Suspended”. When you single-step the one you’re on, some of the others may also get to run briefly – watch out for this!
  • Modules view
    • Select a module, then it’ll show the addresses of all the entry points
    • If the debugger knows it’s Delphi code, and has DCUs with debug info, it’ll also show the units and methods in that module
  • Event log
    • Shows module relocations (Borland automatically runs rebase as part of their build process – maybe we should look into this?)
    • Way back in the day, when things were shipping on floppy, the BDE was shipped without fixups (~20% size difference), because “We know where the Windows DLLs are loaded, so it’ll be okay.” Which was true, until the next major release of Windows. And it *still* ships without fixups today. Ewww.
    • TDUMP will dump (among other things) the relocation table, so you can see how many fixups a DLL actually has – that helps you figure out the cost of not rebasing
  • Call Stack
  • Inspectors
  • CPU Disassembly and FPU Views
    • Top of CPU view shows something like “[$00409300]=$0040A214 Thread #1668”. This shows up if the current instruction accesses a memory address.
    • Mixed source: source lines are shown in CPU view. These may not be in the original order, and a single line might be repeated (won’t happen often with Delphi compiler, but will happen in C++).
    • The first column is the address of the instruction in memory. To convert that to an offset in the file, subtract the image base address (usually $00400000) to get an RVA, then subtract the RVA of the code section (CODE in a Delphi EXE, .text in most others) and add that section’s starting offset in the file.
    • The next column shows the hex bytes of the instruction.
    • Function names that start with @ are compiler magic. WriteLn is very, very magic.
    • Lower-right pane is the stack.
      • Right-click > Follow > Near Code (neat!)
      • If you see something starting with “0040…”, that’s probably code (or a VMT, which also gets crammed into the code segment).
      • “7C…” is probably in a Windows DLL.
      • “0012…” is probably a stack address.
    • Ctrl+Left and Ctrl+Right tell the CPU View to slide the disassembly start address one byte left or right.
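
A minimal Python sketch of that address-to-file-offset conversion, assuming the module loaded at its preferred base (no relocation) and that you’ve already read the section table from the PE headers (a tool like TDUMP can dump them). The field names follow the PE section header (VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData); the section values are invented.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int      # section RVA once loaded (relative to image base)
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # file offset where the section's bytes start
    size_of_raw_data: int     # number of bytes actually stored in the file


def va_to_file_offset(va: int, image_base: int,
                      sections: Sequence[Section]) -> Optional[int]:
    """Map an address from the CPU view to an offset in the PE file.

    Assumes the module was loaded at image_base (i.e., not relocated).
    """
    rva = va - image_base
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                return None   # zero-filled tail: exists in memory, not in the file
            return s.pointer_to_raw_data + delta
    return None               # not inside any section


# Invented section table, shaped like a small Delphi EXE.
SECTIONS = [
    Section("CODE", 0x1000, 0x8F00, 0x400, 0x9000),
    Section("DATA", 0xA000, 0x0300, 0x9400, 0x0400),
    Section("BSS", 0xB000, 0x0800, 0x0, 0x0),
]
```

So $00409300 from the CPU view lands at file offset $8700 in this made-up image, while an address in the zero-filled BSS section has no file offset at all.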

Code Surfing

  • Logical view, programmer’s perspective (not CPU’s perspective)
  • Demo high-level tools
  • Prologue code generated at start of method
  • EBP is the frame pointer. If you have local variables, or a try..finally, the method sets up a stack frame (PUSH EBP / MOV EBP, ESP). If the method is small enough, it might not get a frame at all, which can make debugging tough.
  • Try..finally is actually codegenned as a little inner method, with RET at the end and everything.
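  • The FS segment register is magic: on Win32 it points to the current thread’s Thread Information Block, so each thread has its own FS.
  • FS:[0] holds the head of the current thread’s linked list of exception frames (the SEH chain). Each try block pushes a new frame on the stack and links it in at the head.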
  • If you have a local variable of type string, a try..finally will be magically added behind the scenes
  • @LStrClr finalizes an AnsiString variable
  • For CALL, JMP, etc., the CPU view will look for method addresses it knows about, so it’ll show “CALL TObject.Create” instead of just some cryptic address.
  • Parameter-passing conventions (see Help). Usually not all parameters go on the stack (EAX, EDX, and ECX usually hold the first three parameters), and the return value comes back in EAX. Some registers (like EAX) are considered volatile and can be trashed by any method you call. Others (like EDI) should be preserved.
  • In a method, EAX passes in the Self pointer.
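
To make the FS:[0] business concrete: each exception frame is two DWORDs, a pointer to the next (outer) frame followed by the address of the handler, and the last frame’s next pointer is $FFFFFFFF. Here’s a Python sketch that walks that chain over simulated memory (a dict of address to DWORD); all the addresses are made up.

```python
END_OF_CHAIN = 0xFFFFFFFF  # the last record's Next field holds this value


def walk_seh_chain(mem, fs0, max_records=64):
    """Follow a thread's SEH chain, starting from the value stored at FS:[0].

    `mem` is a simulated address space: a dict mapping DWORD-aligned
    addresses to DWORD values. Each record is two DWORDs:
      [record + 0] = pointer to the next (outer) record
      [record + 4] = address of the handler for this frame
    Returns a list of (record_address, handler_address) pairs, innermost first.
    """
    chain = []
    record = fs0
    while record != END_OF_CHAIN:
        if len(chain) == max_records or record not in mem:
            raise RuntimeError(f"SEH chain looks corrupted at {record:#010x}")
        chain.append((record, mem.get(record + 4, 0)))
        record = mem[record]
    return chain


# Invented addresses: two nested try blocks on the stack, innermost first.
stack = {
    0x0012FF00: 0x0012FF40,    # inner record: Next -> outer record
    0x0012FF04: 0x00403A10,    # inner record's handler
    0x0012FF40: END_OF_CHAIN,  # outer record ends the chain
    0x0012FF44: 0x00401E20,    # outer record's handler
}
chain = walk_seh_chain(stack, fs0=0x0012FF00)
```

The record cap and the “not mapped” check are there because a trashed stack can turn the chain into a loop or send it off into garbage.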

Data Surfing

  • After some code that instantiates a TFoo, and stores the reference in, say, EBX, right-click in the bottom pane of the CPU view, select “Goto Address”, and type “EBX”. Then view as DWORDs. The first DWORD is the VMT pointer.
  • Since the EXE is loaded at the same address every time, you have a high degree of address stability between program runs.
  • [EBX] is the VMT pointer (every class instance has its VMT pointer at offset 0; old-style objects don’t necessarily). [EBX+4] has the first instance variable. Compiler doesn’t currently reorder fields to pack sub-DWORD-sized fields together, but the compiler semantics allow for that to be added later.
  • [VMT+0] points to the code for the first virtual method; [VMT+4] is the second; etc. Magic stuff at negative VMT offsets.
  • Class name usually comes at the end of the VMT.
  • To tell whether this is a valid object instance:
    • Follow the VMT pointer.
    • Should be able to find the class name at the end of the VMT.
    • Everything in the VMT should point to valid code.
  • VMT may not be in your code segment, especially in the IDE. E.g., “TForm1”, when you’re editing it, points to a fake VMT that’s synthesized by the IDE. Runtime-generated proxy. Fancy stuff.
  • Right-click on the current statement when it’s a CALL, and select “Follow” to see the code there. So you can tell before you do a virtual call into nowhere. Lots of “???”s are a bad sign.
  • “Goto Current EIP” moves the focus back to the current instruction. “New EIP” moves EIP to the instruction you right-clicked on. Both are really confusing names.
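
The validity checks above, as a Python sketch over simulated memory. Two things here go beyond the session and are assumptions on my part: the VMT offsets (vmtSelfPtr = -76 and vmtClassName = -44, as I remember them from 32-bit Delphi’s System.pas of this vintage), and the extra check that a VMT’s self-pointer slot points back at the VMT. Every address is invented.

```python
VMT_SELF_PTR = -76     # assumption: 32-bit Delphi offsets of this era (System.pas)
VMT_CLASS_NAME = -44   # same assumption; the slot holds a pointer to a ShortString


class FakeMemory:
    """Simulated flat address space: one bytearray mapped at `base`."""

    def __init__(self, base, size):
        self.base, self.data = base, bytearray(size)

    def read(self, addr, n):
        off = addr - self.base
        if off < 0 or off + n > len(self.data):
            raise IndexError(f"unmapped address {addr:#010x}")
        return bytes(self.data[off:off + n])

    def write(self, addr, raw):
        off = addr - self.base
        if off < 0 or off + len(raw) > len(self.data):
            raise IndexError(f"unmapped address {addr:#010x}")
        self.data[off:off + len(raw)] = raw

    def dword(self, addr):
        return int.from_bytes(self.read(addr, 4), "little")

    def set_dword(self, addr, value):
        self.write(addr, value.to_bytes(4, "little"))


def class_name_of(mem, instance, code, n_virtuals):
    """Return the class name if `instance` looks like a live object, else None."""
    if instance % 4:
        return None                                   # misaligned: not an object pointer
    try:
        vmt = mem.dword(instance)                     # [instance + 0] = VMT pointer
        if mem.dword(vmt + VMT_SELF_PTR) != vmt:      # a real VMT points back at itself
            return None
        if any(mem.dword(vmt + 4 * i) not in code for i in range(n_virtuals)):
            return None                               # every virtual slot must be code
        name_ptr = mem.dword(vmt + VMT_CLASS_NAME)
        length = mem.read(name_ptr, 1)[0]             # ShortString: length byte first
        name = mem.read(name_ptr + 1, length)
    except IndexError:
        return None                                   # a pointer led into unmapped memory
    return name.decode("ascii", "replace") if length else None


# Invented layout: code and VMT in the image, the object further up.
mem = FakeMemory(0x00400000, 0x20000)
CODE = range(0x00401000, 0x00409000)
VMT, NAME, OBJ = 0x0040A100, 0x0040A120, 0x00410000
mem.set_dword(VMT + VMT_SELF_PTR, VMT)
mem.set_dword(VMT + VMT_CLASS_NAME, NAME)
mem.set_dword(VMT + 0, 0x00402000)   # first virtual method
mem.set_dword(VMT + 4, 0x00402040)   # second virtual method
mem.write(NAME, bytes([6]) + b"TForm1")
mem.set_dword(OBJ, VMT)
```

`class_name_of(mem, OBJ, CODE, 2)` returns "TForm1". Overwrite the object’s first DWORD (as freeing it would) and the same call returns None, because the “VMT” no longer points back at itself.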

Memory blocks

  • This assumes Delphi 2005 and earlier (will change in Delphi 2006 with FastMM)
  • Memory block is preceded, and followed, by block size
  • If the block size preceding the block is odd (low bit set), that block has already been freed.
  • When the block is freed, the first four bytes are overwritten to point to the next free block. (Meaning, if that block was an object instance, you just trashed the VMT pointer! I knew it usually crashed if you called a virtual method on a freed object, but I didn’t realize it was guaranteed.)
  • So, if you crash at a call to a virtual method, go back and look at the DWORD preceding the instance pointer. If it’s odd, you know the object has already been freed.
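
A toy Python model of the freed-block check, assuming exactly the layout described in this section (a size DWORD before each block, low bit set once freed, first DWORD reused as a free-list link). It is not the real allocator’s code; `FakeHeap` and `alloc_at` are hypothetical.

```python
class FakeHeap:
    """Simplified model of the pre-FastMM behavior described above (not the
    real allocator): a size DWORD sits just before each block, and freeing a
    block sets that DWORD's low bit and reuses the block's first DWORD as a
    link to the next free block."""

    def __init__(self):
        self.mem = {}          # simulated memory: address -> DWORD
        self.free_list = 0     # address of the most recently freed block

    def alloc_at(self, addr, size):
        # Sizes are multiples of 4, so a live block's size DWORD is even.
        self.mem[addr - 4] = size
        return addr

    def free(self, addr):
        self.mem[addr - 4] |= 1               # odd size = freed
        self.mem[addr] = self.free_list       # clobbers whatever was at offset 0
        self.free_list = addr


def was_freed(heap, ptr):
    """The check from the notes: is the DWORD just before the pointer odd?"""
    return (heap.mem.get(ptr - 4, 0) & 1) == 1


heap = FakeHeap()
obj = heap.alloc_at(0x00A00010, 12)
heap.mem[obj] = 0x0040A100      # a live object's VMT pointer (invented)
other = heap.alloc_at(0x00A00040, 16)
before = was_freed(heap, obj)   # False: the size DWORD is 12
heap.free(other)
heap.free(obj)                  # obj's "VMT pointer" now links to `other`
```

After the free, `obj`’s first DWORD points at the other free block instead of a VMT, which is why a virtual call through a freed object goes off into the weeds.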

CPU Space

  • Looking at the app from the perspective of the machine
  • Try to debug somebody else’s code that you don’t have source for. Reverse engineering.
  • “Say what? Do that again.” tricks.
  • Rerouting around AVs
  • Strings: string pointer points to the beginning of the string content. Go back 4 bytes = length. Go back another 4 bytes = refcount. If refcount is -1, the RTL knows that string is a constant, and skips refcounting.
  • String concatenations: @LStrCatN does it all in one swoop (i.e., adds all the lengths, then does one allocation). So A := B + C + D + E is better than A := B + C; A := A + D; A := A + E;
  • String literals are stored in the code segment in Win32, in a separate segment in Linux, and in a separate section in .NET.
  • Side note: typed constants in .NET (const X: string = ‘Foo’) are actually assignments that get run at startup. Regular constants (const X = ‘Foo’) aren’t assignments and are inlined where they’re used.
  • Can do a do-over with “New EIP”, but be careful that all the interesting registers are still intact. In the sample Danny gave, he called a method that trashed EAX, which is where an instance pointer was being kept. So we couldn’t go back and redo that same method call. 🙁
  • Be careful using “New EIP” with loops, and make sure all the necessary registers are set.
  • Slightly mind-bending thought: When you do “Evaluate/Modify”, and change the value of, say, a local string variable, the debugger has to be smart enough to notice any registers that were pointing to that string, and update their values. Eek.
  • How did it get here? = Where does it return to?
    • Note: a new feature that may be coming in DeXter is color-coding addresses in the stack pane of the CPU view: code addresses in one color, heap addresses in another, etc.
    • If you follow the current stack location as code, you’d expect that to be the line you’re going to return to.
    • If the thing there isn’t good code (e.g., if it’s the beginning of a method, instead of the part of a method just after it called something else), look for something a ways up that is a stack address followed by a code address. This is probably a stack frame, and the code address should be good. (But if the code address is just a call to @HandleFinally, you’ll probably want to keep going to the next frame.)
  • If you’re following a stack address and it looks like a string, look at the 8 bytes preceding the string data. If it’s got a refcount and a length, it’s a Delphi string (not a C string).
  • When you’re looking at the stack, be careful of big uninitialized local variables (structures, buffers, etc.) – they may be leftovers from an earlier call stack, and look like a real stack trace even though they’re really bogus.
  • If something is odd (low bit set), it’s not a pointer.
  • Really dirty trick: Right-click in the data pane, “Goto address”, and type “EIP”. Now you can directly hack the code bytes, and e.g. change the immediate argument of a MOV (change a MOV EAX, 15 to a MOV EAX, 42 without recompiling). See, we’ve already got Edit and Continue, if you don’t mind directly hacking machine code…
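
A Python sketch of reading that string header by hand, for pre-Unicode Delphi AnsiStrings: refcount at -8, length at -4, then the characters and a terminating #0 (the #0 check is my own extra sanity test). A byte buffer stands in for memory, with offsets standing in for addresses.

```python
import struct


def ansistring_block(text, refcount):
    """Lay out a pre-Unicode Delphi AnsiString as described in the notes:
    [refcount:int32][length:int32][characters][#0].
    Returns (raw_bytes, offset_of_chars); a string variable points at the chars."""
    data = text.encode("latin-1")
    return struct.pack("<ii", refcount, len(data)) + data + b"\0", 8


def describe_string(buf, p):
    """Interpret offset `p` in `buf` as a Delphi string pointer.

    Returns None if the 8 bytes before `p` don't look like a refcount and
    length (for example, if `p` is really a plain C string)."""
    if p < 8:
        return None
    refcount, length = struct.unpack_from("<ii", buf, p - 8)
    end = p + length
    if length < 0 or end >= len(buf) or buf[end] != 0:
        return None                       # no #0 terminator where the length says
    return {
        "text": buf[p:end].decode("latin-1"),
        "length": length,
        "refcount": refcount,
        "constant": refcount == -1,       # -1: literal, so the RTL skips refcounting
    }
```

A C string fails the test because whatever sits in the 8 bytes before it almost never forms a length that lands exactly on its terminator.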

How to…

  • Recognize a freed memory block
  • Recognize a bad instance pointer
  • Crawl the stack when the debugger can’t
  • Figure out what interface you’re inside (hmm, I don’t think he actually covered this – it was on the slide, but I think he was running out of time and didn’t go into details)
  • Discover what other IIDs an interface implements: call QueryInterface, step into it, look for code that does 8-byte compares, and see what’s there
    (shouldn’t that be 16-byte compares? A GUID is 128 bits, right?)
  • Get a name for a GUID
    (Hmm, looks like we didn’t actually go over this one either)
  • Find the name of the class you’re inside
  • Determine how many methods an interface has (look for a bunch of code pointers in the VMT; you can at least make an educated guess)
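
Crawling the stack by hand, as a Python sketch over a simulated stack (a dict of address to DWORD; all addresses invented). `scan_for_frames` is the “stack address followed by a code address” trick from the CPU Space notes; `walk_ebp_chain` follows saved EBPs from a frame you trust. The decoy entries show how stale leftovers fool the scan but not the chain walk. Frameless methods won’t show up in either.

```python
def scan_for_frames(mem, esp, stack, code, depth=256):
    """Scan upward from ESP for a stack address followed by a code address
    (a saved EBP plus a return address). `stack` and `code` are ranges.
    Stale data from earlier calls can produce false positives."""
    hits = []
    for addr in range(esp, esp + 4 * depth, 4):
        saved_ebp, ret = mem.get(addr, 0), mem.get(addr + 4, 0)
        if saved_ebp in stack and saved_ebp > addr and ret in code:
            hits.append((addr, ret))
    return hits


def walk_ebp_chain(mem, ebp, stack, code, max_frames=64):
    """Follow saved EBPs from a known-good frame: [EBP] is the caller's EBP
    and [EBP+4] is the return address."""
    returns = []
    while ebp in stack and len(returns) < max_frames:
        saved_ebp, ret = mem.get(ebp, 0), mem.get(ebp + 4, 0)
        if ret not in code:
            break                 # not a return address: the chain is broken
        returns.append(ret)
        if saved_ebp <= ebp:
            break                 # callers live at higher addresses; stop otherwise
        ebp = saved_ebp
    return returns


STACK = range(0x0012F000, 0x00130000)
CODE = range(0x00401000, 0x00410000)
mem = {
    0x0012FE00: 0x0012FE40, 0x0012FE04: 0x00403210,  # current frame
    0x0012FE20: 0x0012FE30, 0x0012FE24: 0x00402000,  # stale leftovers (decoy)
    0x0012FE40: 0x0012FF80, 0x0012FE44: 0x00404400,  # caller
    0x0012FF80: 0x00000000, 0x0012FF84: 0x00405000,  # outermost frame
}
```

Here the scan from $0012FDF0 reports the decoy at $0012FE20 and misses the outermost frame (its saved EBP is 0), while the chain walk from $0012FE00 returns the three real return addresses.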

And Sometimes the Bear Gets You

  • Modifying CPU registers… pay attention! If you change something like EIP, there’s no Undo.
  • Goto Address is relative to focused pane. Right-clicking on the pane does not focus it.
  • Unfortunate module names (winspool.drv) – another thing that was on the slide but he didn’t have a chance to explain, and that I didn’t write down to ask him about after the session
  • Calling destructive functions in the evaluator (calling Destroy in the evaluator?)
  • Debugging the wrong source file

Other miscellanea

  • When the debugger breaks on an exception, it’s too late to do a postmortem – you can’t Evaluate local variables, you’ve lost the Self pointer, etc. Someone asked who we would talk to about why that is and whether it can be changed. Answer: Talk to Chris Hesik. He should be at the Meet the Team. (I almost seem to remember that this was improved in Delphi 2005 last year, but I don’t have it on this machine to test.)
  • Since we covered objects’ memory layout, I asked what it is that an interface reference actually points to. Answer: COM requires that it point to a pointer to a VMT; beyond that, COM doesn’t care what’s at the end of that pointer. So Delphi allocates a slot within your object’s data, and puts a VMT pointer that points to stubs. Those stubs know which class and which interface they go with; they subtract the offset where that interface VMT pointer is stored within the instance, then call the actual method on the class. If the method on the class is virtual, then the stub is static, but contains a virtual method call. Everybody got that? (I understand what’s going on, but I have no idea whether my explanation is making any sense at all!)
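
Here’s my reading of that interface-stub explanation as a Python sketch. Memory is a dict of address to DWORD, “code” is a table of Python callables keyed by fake VMT address, and the offsets are hypothetical. A real interface VMT would start with QueryInterface, _AddRef, and _Release; I’ve left those out.

```python
FCOUNT = 4       # hypothetical offset of a field in the object
INTF_SLOT = 8    # hypothetical offset where the interface's VMT pointer is stored


def get_count(mem, self_ptr):
    """The class's real method: expects Self to point at the object start."""
    return mem[self_ptr + FCOUNT]


def make_stub(slot_offset, method):
    """Build the kind of thunk described above: turn the interface pointer
    back into Self by subtracting the slot offset, then call the method."""
    def stub(mem, intf_ptr, *args):
        return method(mem, intf_ptr - slot_offset, *args)
    return stub


# Fake "code segment": interface VMT address -> list of method entries.
CODE_TABLES = {0x00408000: [make_stub(INTF_SLOT, get_count)]}


def call_via_interface(mem, intf_ptr, index, *args):
    """A call through an interface reference: intf_ptr points at a slot that
    holds a VMT pointer; fetch the method from that VMT and pass the
    interface pointer itself as the first argument."""
    intf_vmt = mem[intf_ptr]
    return CODE_TABLES[intf_vmt][index](mem, intf_ptr, *args)


obj = 0x00A00010
mem = {
    obj: 0x0040A100,                 # class VMT pointer at offset 0 (unused here)
    obj + FCOUNT: 42,                # the field
    obj + INTF_SLOT: 0x00408000,     # interface VMT pointer inside the object
}
intf = obj + INTF_SLOT               # what an interface variable actually holds
```

`call_via_interface(mem, intf, 0)` returns 42: the stub subtracts 8 from the interface pointer to recover Self before calling `get_count`.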

Mid-day

Here in Danny’s session, soon to start. I did make it here early enough to get a power outlet. Unfortunately, both the side walls of this room are partitions, so no power; and Danny said he didn’t know of any spare power outlets up front. So I’m stuck heckling from the back of the room. I’ll have to speak up when I ask questions. Or maybe I’ll just submit them via QC; might be easier.

Anyway. Just for posterity (and amusement), here’s the description of this session:

Reading Tea Leaves: The Fine Art of Debugging

You can do a lot with today’s advanced debugging tools, but there are still times when the program ends up so far off the map that it’s beyond the help of any mechanical tool. To figure out when, where, and why the program went astray, sometimes you just have to roll up your sleeves and break out the CPU view and assume the mind meld position. Learn to think like the processor, navigate high level data structures in raw hex dumps, recognize encrypted blonds and brunettes at a glance, and manipulate the time streams with your bare hands.

His slides actually list the subtitle as “The Mystic Art of Debugging”. For what it’s worth.

Agile and Extreme Programming: A Pragmatic Approach (part 4)

Stand-Up Meeting

  • Every day starts with a stand-up meeting
    • Problems
    • Solutions
    • Obstacles
  • You have to stand up. Otherwise people will get comfortable and it’ll take too long.
  • More efficient to have one short meeting with everyone, than a lot of little meetings
  • Cross-team communication is the primary purpose
  • Part of the daily feedback loop
  • You can substitute something that provides daily updates, problems encountered and solved, etc. (wiki, information radiators), but these aren’t as rich a communications channel

Fix XP when it breaks

  • Nothing should be prescriptive – if something isn’t working for you, fix it (making sure you’re not losing any feedback loops)

Build Your Own Tools

  • for tracking projects
  • Evolve your approach over time
  • Changes from project to project
  • You get better and better at it
  • Each PM should create their own tracking sheet – not take someone else’s and mush it around. Psychological – owning the thing you built.
  • Project Manager is liaison between programmers and management – PM’s job is to provide facts to management

Designing Practices

Simplicity

  • A simple design always takes less time to finish than a complex one
  • When you find something complex, replace it with something simple
  • No substitute!

System Metaphor

  • Keeps team on the same page by naming classes and methods consistently
  • Stick as close to the problem domain as possible
  • Take time to do this no matter your methodology

CRC Cards (Class, Responsibility, and Collaboration)

  • Use for design sessions
  • Break away from procedural thought, more fully appreciate OO
  • Individual CRC cards represent objects
  • Write class name at the top, responsibilities (what it’s responsible for) along the left, collaborations along the right
  • Shuffle them around on the table to find relationships you like
  • Prefer hand-written on paper, instead of something created in Visio (this also applies to UI brainstorming)
  • Step through workflow, spot weaknesses and problems
  • Easy to change, but pick something flexible, interactive, whole-team, serves stakeholders, allows “What If” games
  • Many companies prefer more documentation than CRC cards provide
    • “We have to have documentation so that…” Make manager finish this sentence
    • …we can re-create the whole project?
    • …we can kill more trees?
    • …we know what it does?
    • The tests are better documentation than any outdated Word document
    • You can verify, at any time, that the code is doing what it’s supposed to – just run the tests
  • If you must write documentation for the code, apply DRY (Don’t Repeat Yourself). There should be one canonical source of truth for each piece of information – if you need it somewhere else, find a way to generate it.
    • yDoc – tool that creates UML diagrams by looking at the actual code (!)
    • If you really need to manually create developer documentation, make a story card for it in the iteration, so management can see how much more you could be doing if it wasn’t required

Spikes

  • Proof of concept
  • Not called “prototypes” because people try to keep prototypes and turn them into the actual project. Never do this – always throw spikes away.

YAGNI

  • No functionality is added early
  • Don’t substitute this!

Refactor

  • whenever and wherever possible
  • You must be willing to change code any time it should be changed
  • Don’t live with broken windows
    • Nobody will bother the abandoned warehouse until the first window gets broken… then all the windows get broken, graffiti, etc.
    • The first broken window means “I don’t care.”
    • Read “The Pragmatic Programmer”
  • Too much software becomes brittle because of hundreds of tiny compromises

Coding Practices

Customer is always available

  • Business stakeholder must be available
  • If you can’t get a business person, get a worthy substitute
  • Tell them: requirements docs are out of date as soon as they’re written
  • Programmers aren’t business people

Coding Standards

 

Test-First Coding

  • Least done, most important
  • First write the test, then make it pass
  • Helps developer to really consider what needs to be done
  • Requirements are nailed down firmly by tests
  • Can’t misunderstand a spec written in executable code
  • The most effective way to write code. You get feedback right away.
  • You get nearly as good feedback if you write tests concurrently. (But do it – planning to write them next week doesn’t count!)
  • Tests are the only way you can write code with confidence
  • Tell management: “This *is* the set of requirements”; “Here’s the coverage test that tells how much of our code is verified.”

Pair Programming

  • The most Extreme practice in XP!
  • Two people working at the same computer will add about as much functionality as 2 people working separately – but it will be much higher quality
    • Person at keyboard thinks about low-level details
    • The partner thinks about the larger context
  • Increases software quality without impacting time to deliver
  • Works best with 2 developers of more or less equal skill
  • Truck number again
  • Pair programming feels awkward at first
  • ThoughtWorks has two monitors, two mice, and two keyboards at each station
  • Tough to substitute because it’s the most immediate code feedback loop (on the order of minutes)
  • Try sneaking it in on the hard problems. Then show quantifiable results (bug counts, passed tests).
  • Treat it as real-time code review.

Integration

 

Collective Code Ownership

  • Hard!
  • Try to adopt this whatever your methodology

Leave optimization to last

 

No overtime

  • Sucks the spirit and motivation out of the team
  • Write worse code when you’re worn out

Tests

  • Unit tests
  • Acceptance tests for everything, because that’s the only way you know when the story is done

Demonstrate that XP works! Pilot projects, metrics, etc.

 

Substitution Scale: table summarizing what’s easy to substitute and what’s hard (will post this later today)

Agile and Extreme Programming: A Pragmatic Approach (part 3)

Wow, this is going on for a while. Looks like this is part 3 of 4. There’s some Q&A stuff, and then a lot of bits about how you can replace some of the scary bits of XP so management doesn’t duck and run.

Miscellaneous stuff coming out of Q&A at the end of the break:

  • How do you account for time taken by documentation?
    • Business analysts are usually an iteration ahead of the developers
    • Formal documentation (Help files, etc.) is usually an iteration behind the developers
    • When you’re ready to ship to the outside world, documentation usually catches up during the final testing phase
  • Who writes the stories?
    • Typically, the business analyst creates the stories and the release plan, and then gets it approved by the actual end-user.
    • Then technical stories get inserted (and may have dependencies). Keep these to a minimum, but they’re always there.
    • Don’t necessarily break everything down into sub-tasks.
  • How do you get started with the first iteration?
    • Iteration 0
    • Iteration 0 is where you create spikes, and establish architectural guidelines for things you know will be difficult to change.
    • Also find hardware, set up version control, CruiseControl, etc.
    • Ideally you want to be able to sit down and start coding at the beginning of Iteration 1.
    • If necessary, add an Iteration -1.
  • “If you are FedEx-ing hardware around, you have a serious build problem.”

Project Velocity

  • Measure of how fast work is getting done on your project
    • Count how many user stories (or programming tasks) were completed during the iteration
    • Total up the estimates that these stories received
  • Helps future estimation during the next Iteration Planning Meeting
  • During the Iteration Planning Meeting, devs sign up for last week’s velocity worth of stories
  • Don’t divide the velocity by # of developers – not a useful individual metric
    • Doesn’t help determine overall velocity
    • Can’t compare it to other projects
    • Discourages team building
    • If A is behind and B is ahead, B should step forward and say “I’ll help so we can finish everything on time”
    • Re-estimate if needed
  • OK to use other metrics as long as they’re reasonable, accurate, team-based, provide real value, and are simple enough to understand.
    • Cyclomatic complexity – number of independent paths through a method (roughly, the number of decisions and loops, plus one)
    • Together (for Java or .NET) will run audits and metrics for you
    • See Brian Sletten’s “Applied Object-Oriented Metrics” or David Bock’s “Software Metrics and the Great Pyramids of Giza”
  • Metrics are actually really simple. Estimated vs. actual.
  • In general, don’t try to measure this with Microsoft Project
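
Velocity really is just arithmetic. A minimal Python sketch, assuming estimates in ideal days (or points) and a backlog already in the customer’s priority order; `Story` and `plan_iteration` are hypothetical names. Stopping at the first story that doesn’t fit is a simplification, since the customer gets the final say.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    estimate: int      # the developers' own estimate, in ideal days or points
    done: bool = False


def velocity(stories):
    """Total of the estimates for the stories actually finished this iteration."""
    return sum(s.estimate for s in stories if s.done)


def plan_iteration(backlog, budget):
    """Take stories in the customer's priority order until the next one no
    longer fits in the budget (last iteration's velocity)."""
    chosen = []
    for story in backlog:
        if story.estimate > budget:
            break
        chosen.append(story)
        budget -= story.estimate
    return chosen
```

There is deliberately no per-developer figure in here: the team finished 2 + 3 = 5 last time, so the team signs up for 5 this time.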

Iterations give you defense against manager’s “This new feature’s gotta go in right now!”

  • “Sure, we’ll be happy to do that. Which of these accepted features do we take out of the iteration?”
  • They get to prioritize it.

Iterative Development

  • Divide everything into iterations
  • Schedule tasks in the Iteration Planning Meeting
  • It is against the rules to look ahead and try to implement anything not scheduled in this iteration (YAGNI – You Ain’t Gonna Need It)
  • There’s always time later to respond to changing requirements. Don’t try to do it now.
  • Hard to substitute
    • Tight iterations provide valuable feedback
    • Keep team focused
  • Avoid “Big Bang” deployments
    • Even if you must deploy that way, do internal “Little Bang” deployments
    • Deploy often to User Acceptance Testing
  • Tell the boss: “Find bugs faster!”

Iteration Planning

  • Meeting at the beginning of each iteration
  • User stories chosen by the customer (from the release plan, most important first)
  • Failed acceptance tests that need fixing are chosen too
  • Stories and failed tests are broken down into programming tasks written on index cards (before iteration planning, stories might be in Excel, or whatever is useful to the customer)
    • Each task should be 1-3 ideal programming days (apply a multiplier to get “real world” days)
    • Developers sign up for tasks, then estimate how long each will take
  • Velocity lets you tell if the iteration is overbooked
  • Don’t be tempted to change your task and story estimates
    • Planning relies on cold reality of consistent estimates
    • Fudging them to be lower means you’re lying to yourself – you’re telling management “we don’t trust our estimates”, and management is telling you “we don’t trust your estimates, either”
  • Estimates do include the time it will take to write the unit tests, but shouldn’t assume time for bug-fixes
  • It’s normal to re-estimate stories and release plan every 3-5 iterations
  • Always do the most important stories (per the customer) first
  • If you’ve got Big New Stuff That Everyone Needs To Learn, try to include that in the estimates, or add a spike as a technical card (can’t really do it as a spike, since everyone needs to learn it)
  • Substitutions: Any kind of similar planning is OK, but keep it iterative
  • Long-term estimates must be fluid
  • Developers must create their own estimates. Gives dev buy-in, and estimates get better over time.
  • Tell management: “We’re getting developers more interested in the project management side, and honing their estimation skills.”
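The velocity check and the “real world” multiplier above come down to simple arithmetic. Here’s a minimal Python sketch, assuming velocity is measured as the ideal programming days the team actually finished last iteration, and that the multiplier is a single team-wide factor. `Task`, `is_overbooked`, and the sample numbers are hypothetical, not something from the session.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical task card: a name plus the developer's own estimate."""
    name: str
    ideal_days: float  # ideal programming days (the session suggests 1-3 per task)


def planned_ideal_days(tasks: list[Task]) -> float:
    """Total ideal days the team has signed up for this iteration."""
    return sum(t.ideal_days for t in tasks)


def is_overbooked(tasks: list[Task], velocity: float) -> bool:
    """Assumed rule: overbooked if more ideal days are planned than the
    team actually finished last iteration (its velocity)."""
    return planned_ideal_days(tasks) > velocity


def real_world_days(tasks: list[Task], multiplier: float) -> float:
    """Convert ideal days to calendar days; the estimates themselves stay unchanged."""
    return planned_ideal_days(tasks) * multiplier


if __name__ == "__main__":
    iteration = [
        Task("Add customer screen", 2.0),
        Task("Validate customer fields", 1.5),
        Task("Fix failed acceptance test", 1.0),
    ]
    print(planned_ideal_days(iteration))               # 4.5
    print(is_overbooked(iteration, velocity=4.0))      # True
    print(real_world_days(iteration, multiplier=2.0))  # 9.0
```

Note that the multiplier only gets applied when converting to calendar days, never to the developers’ estimates, which fits the advice about not fudging estimates.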

Move People Around

  • Change dev assignments weekly, daily, or hourly
  • Prevents serious knowledge loss and bottlenecks; spreads overall knowledge around
  • May be as simple as encouraging everyone to try working on a different section of the system at least part of each iteration
  • Pair programming eliminates transition time – move one pair at a time
  • Harder if you don’t pair program
  • Feedback goal is to spread understanding around the project
  • Can accomplish this with regular code reviews – less effective but better than nothing
  • Helps avoid the Truck Number anti-pattern (see http://c2.com/cgi/wiki?TruckNumberFixed)
  • Explain Truck Number to management. Explain that it’s risk. Managers understand risk.
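If you want to put an actual number in front of management, here’s one rough way to compute it in Python. It assumes a simplified model that isn’t from the session: the project is in trouble as soon as any one area of the system has nobody left who knows it, so the truck number is the smallest count of people who know any single area. The people and area names are made up.

```python
def expert_counts(knowledge: dict[str, set[str]], areas: set[str]) -> dict[str, int]:
    """How many people know each area (knowledge maps person -> areas they know)."""
    counts = {area: 0 for area in areas}
    for known in knowledge.values():
        for area in known & areas:  # ignore areas not in the system list
            counts[area] += 1
    return counts


def truck_number(knowledge: dict[str, set[str]], areas: set[str]) -> int:
    """Smallest number of people whose loss leaves some area with no expert.

    Under the simplified model, that's the lowest expert count of any area.
    """
    if not areas:
        raise ValueError("need at least one area of the system")
    return min(expert_counts(knowledge, areas).values())


def at_risk_areas(knowledge: dict[str, set[str]], areas: set[str]) -> list[str]:
    """Areas whose experts would all be gone at exactly the truck number."""
    low = truck_number(knowledge, areas)
    return sorted(a for a, c in expert_counts(knowledge, areas).items() if c == low)


AREAS = {"billing", "reports", "ui"}

# Before moving people around: reports and ui each have a single expert.
before = {"alice": {"billing", "reports"}, "bob": {"billing"}, "carol": {"ui"}}
# After rotating assignments for a while, every area has some overlap.
after = {"alice": {"billing", "reports"}, "bob": {"billing", "ui"}, "carol": {"ui", "reports"}}

if __name__ == "__main__":
    print(truck_number(before, AREAS), at_risk_areas(before, AREAS))  # 1 ['reports', 'ui']
    print(truck_number(after, AREAS), at_risk_areas(after, AREAS))    # 2 ['billing', 'reports', 'ui']
```

The before/after pair is the whole argument for moving people around: the same three people, with rotated assignments, double the truck number.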

Agile and Extreme Programming: A Pragmatic Approach (part 2)

Random note: Wouldn’t it be fun to work on a project code-named “Sisyphus”?

Some agile methodologies:

  • Extreme programming
  • Scrum
  • Crystal methodologies (different ones for different team sizes)

XP

  • ThoughtWorks has decided XP is the best thing they’ve tried. Always use it for flat-bid projects, and refuse projects that can’t be done with some agile process.
  • XP started in the mid-1990s (predates the Agile Manifesto)
  • Kent Beck and Ward Cunningham were thinking about what made software simple to create, and what made it difficult
  • In March 1996, Kent started a project at Chrysler (later DaimlerChrysler) using these new concepts, and called the approach “Extreme Programming”

The 4 XP dimensions

  • Communication
  • Simplicity
  • Feedback
  • Courage

Can you create a less-scary XP by substituting out the stuff that scares the boss?

  • XP is carefully calibrated for feedback, and relies on feedback to work. (XP is actually a highly-coupled process!)
  • Lots of backward arrows in the XP workflow diagrams (see e.g. http://www.extremeprogramming.org/map/project.html)
  • If you randomly remove some of those backward arrows, you lose feedback, and you harm everything beyond that point
  • Anything you replace has to provide approximately the same level of feedback
  • You can’t randomly pick and choose – you’ll create something that’s worse than many other methodologies, and you’re sure to fail

The Planning Practices

  • User stories
    • Not requirements lists
    • Narratives that describe the ideal way in which the user plans to use the system for one piece of behavior
    • Trying to capture that rich face-to-face communication through a narrative
    • Used to create time estimates for release planning and acceptance testing
    • Example: “I can add a customer to the system.”
    • Usually about 3 sentences written by the customer, in the customer’s terminology, without techno-speak. You’re not allowed to use words like “database”.
    • Not nearly as detailed as traditional requirements. You discover things like “which database fields do I need?” as you start to write the code. (You’ve got full-time access to a business analyst, right?)
  • Aside: testing
    • Unit tests for developers
    • Acceptance testing to say whether it’s complete
    • There’s no such thing as “80% done” for a single story. A story is either done or it’s not.
  • Acceptance tests:
    • One story may have several acceptance tests (e.g. to cover validation requirements for that new customer, etc.)
    • Business analyst, or end-user, defines the acceptance test – not the programmer!
    • Worst case: acceptance tests in an Excel spreadsheet, and someone runs the tests manually
  • Can you do XP without index cards?
    • Sure
    • Must be very flexible
    • Must be granular
    • “CaliberRM is an outstanding choice for this”
    • Requirements should be as narrative as possible, not just a dry list of facts
    • Caliber is looking at features to make it more agile-friendly
    • Session on Thursday on using Caliber with agile
    • Neither user stories nor requirements should delve into technical details
  • Aside: wikis
    • ThoughtWorks has two systems for shared document management: Lotus Notes and Confluence
    • VeryQuickWiki (Java servlet)
    • Instiki (written in Ruby!) – Three steps: you download it, run it, and there is no third step
  • Release planning
    • Creates the schedule
    • Used to create iteration plans for each iteration
    • Decisions
      • Technical decisions made by technical people
      • Business decisions made by business people
    • Dev team estimates each user story in terms of ideal programming days/weeks (very rough)
    • Do not change estimates just because management is displeased
      • Can’t tell accounting that they have less time to do the taxes this year
      • But they try to do the same thing with developers
    • Project can be quantified by scope, resources, time, quality
      • Management can pick three; developers pick the other
      • Management usually picks scope, resources, and time, and leaves quality to the developers, because they understand the first three, and can’t understand how to quantify quality.
      • Lowering quality may have impact later in the project
    • Quantifying quality
      • Code coverage statistics (AutomatedQA does this for Delphi/Win32)
      • No “I think it’s pretty high quality” or “it’s 80% done” – give actual statistics
      • If you want to let quality slide, I can’t be responsible for bugs – because if I don’t have tests for it, I don’t know whether it works or not.
      • Coverage doesn’t guarantee the code does what the business analyst wants – that’s what acceptance tests are for
  • Release Planning Substitutions
    • Goal: consensus between devs, management, and business people
    • Don’t stop until everyone is happy (or at least equally unhappy)
    • Don’t leave out the business stakeholders
    • Don’t leave out the developers (you wouldn’t change accounting practices without accounting there, would you?)
  • Small Releases
    • Frequent small releases to customers (or customer proxies)
    • These are *releases*, not “90% done”. Done has to be binary (either done or not), not a percentage. That’s the only way you can gather meaningful statistics. (See http://excastle.com/blog/archive/2005/07/18/1928.aspx)
    • It’s releasable – and done – when all the acceptance tests pass
    • Do important stuff first. The longer you wait to add an important feature, the less time you’ll have to fix it.
    • Never slip an iteration date. Stuff can hang over (if it hasn’t passed its acceptance tests yet), but the date can’t move.
    • Hard to substitute. Can simulate by doing small releases within your department, but if you slip into “90% done” mode, you’re sunk.
    • Try hard to get User Acceptance Testing in.
    • If management says “Finding bugs is your job”, say “I’ve found all the technical bugs. I need you to find the business bugs.”
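The “done is binary” rule is easy to make concrete: a story counts only when every one of its acceptance tests passes, and progress gets reported as a count of done stories, not a percentage per story. Here’s a minimal Python sketch with hypothetical names; treating a story that has no acceptance tests yet as not done is my assumption, not something stated in the session.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """A user story plus the results of its acceptance tests (True = passed)."""
    title: str
    acceptance_results: list[bool] = field(default_factory=list)

    @property
    def done(self) -> bool:
        # Binary: done only when every acceptance test passes. A story with
        # no acceptance tests yet is treated as not done (an assumption).
        return bool(self.acceptance_results) and all(self.acceptance_results)


def progress(stories: list[Story]) -> tuple[int, int]:
    """Report (done, total) story counts; there is no "80% done" story."""
    return sum(s.done for s in stories), len(stories)


def releasable(stories: list[Story]) -> bool:
    """A release is ready when every story in it is done."""
    return all(s.done for s in stories)


if __name__ == "__main__":
    release = [
        Story("I can add a customer to the system.", [True, True, True]),
        Story("I can edit a customer.", [True, False]),  # half its tests pass: still not done
        Story("I can print a customer list."),  # no acceptance tests written yet
    ]
    print(progress(release))    # (1, 3)
    print(releasable(release))  # False
```

The second story would look “50% done” under percentage reporting; here it simply isn’t done, which is what keeps the statistics meaningful.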

Agile and Extreme Programming: A Pragmatic Approach (part 1)

The first of the preconference tutorials I’m going to this year. Presented by Neal Ford, Application Architect, ThoughtWorks.

We’re at the first break, so here’s a brain-dump of all the notes I’ve taken so far. I haven’t had a chance to simplify these yet (grin). This part is mostly a background of where agile came from, and some of the principles behind it, so if you’re already on XP you can probably skim this one.

Overview:

  • Agile in general, XP in particular
  • ThoughtWorks uses XP on a daily basis
  • What makes XP a discipline, not just a random set of activities
  • How to gently introduce agile in general, and XP in particular, into your organization
  • It’s called “Agile” for a reason. Be suspicious of anything that claims to be the “One True Way”. It won’t happen. You will have to adapt the process.

  • Waterfall, or BDUF (Big Design Up Front), works well for engineering physical objects
  • Assumes that requirements don’t change
    • Does anyone (besides NASA) have software requirements like this?
    • Not today, in the age of Internet time!
    • By the time you create software this way, the need for it will have disappeared
  • “Let me write the whole thing, and then I’ll tell you exactly how long it’s going to take”
  • Requirements change a lot, and those changes have to be implemented quickly
  • “Waterfall” has gotten negative connotations; euphemisms have sprung up, like SDLC (Software Development Life Cycle)

Cost of change?

  • The later in the dev cycle that a requirement changes, the bigger the impact
  • 50-200 times as costly to change late in the cycle as it is early in the cycle – ripple effects
  • How can you make coding changes with confidence?
    • Agile makes this rule no longer true – cost of change becomes mostly linear
    • This idea dates from the mid-70s – not true anymore (except with waterfall)
    • In a waterfall shop, try asking “We’re still doing development the same way we did in the mid-70s – why don’t we try doing accounting with mechanical calculators and ledger sheets? It was good enough then!”

  • Software isn’t like engineering
  • Tolerances are much tighter
    • If you get a light-switch cover wrong in a building, the building won’t fall down
    • The same is not true of software
  • More flexible – you can test those tolerances in ways you just can’t do in the real world
    • Can mathematically prove the strength of a bridge, but you can’t really do that with software
    • Can’t do test-first with a bridge, because you lose a lot of trucks
    • You can throw as many trucks as you want at a piece of software
  • Instead of telling the developers how to write software, why don’t we find out how they work best?
    • Tight time constraints
    • Business risk of building software is too high
    • Observe successful and unsuccessful projects, and figure out what they did right/wrong
  • 17 Really Smart Guys met in Snowbird, Utah in February 2001
    • XP, Scrum, etc. were all around
    • Expected to spend the whole time arguing
    • Actually wound up writing the Agile Manifesto
    • http://agilemanifesto.org/
    • You cannot create a prescriptive process that says “This is how you write software. Go do this.”

Principles

  • Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
    • Delivery of working software – not design documents
    • Quick wins, high visibility, customer sees it sooner
    • Customer sees it sooner, so you know whether you’re still on the right track
    • Emphasis is on delivering most important (not hardest, not easiest) parts first, as defined by the customer
    • “Most important” will change throughout the course of the project. This is okay.
  • Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
    • Most projects run 1-3 month cycles, with shorter iterations within that
    • Anything longer than that, you fall back into the bad zone
    • The iteration ends on its scheduled date; unfinished requirements move to a later iteration rather than the date moving
    • IT thinks management doesn’t want to hear bad news, so we sugarcoat the news. “I’ll be done in a week.”
    • They don’t mind bad news as long as there’s a good reason, but they hate feeling like they’re being lied to. And you’re not always lying; often you just don’t know how long it’s going to take to finish.
    • Actual users review actual software during the process
  • Working software is the primary measure of progress.
    • Does not discard the need for design documentation – but you diagram to understand, then move to code, and let the diagram wither and die
    • Whiteboards
    • “Do Not Erase” must have a date
    • Draw on a whiteboard, then take a digital photo
  • Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
    • If you can gracefully accommodate changing requirements and your competition can’t, you win.
  • Business people and developers must work together daily throughout the project.
    • There is a strong correlation between links to users and project success. [Frakes 1995]
    • When you get to the point of actually writing the code, design questions come up that you couldn’t have anticipated. If the customer isn’t available, you have to guess, and you probably won’t come up with what they want.
    • The longer it takes to get information to and from the developers, the more damage will occur in the project. The farther you are from the information, the worse the information is.
    • XP insists that business stakeholders are part of the development team. Full-time.
      • This doesn’t really fly, because the customer has other work to do
      • Business analysts can be an effective go-between
  • Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
    • “It is better to have motivated, skilled people communicating well and using no process at all than a well-defined process used by unmotivated individuals.” [Cockburn 2002]
    • Avoid interruptions
      • Development is both creative and technical
      • Works best when you get into “flow” mode
      • If you get broken out of flow, it takes ~15 minutes to get back in
      • The more interruptions, the harder it is to get back into the flow
      • If you have a lot of interruptions (meetings, support calls, etc.), try instituting Quiet Time for the programmers. Turn off the phone and e-mail, no meetings, etc.
      • If a question comes up, go ask the business analyst – that’s OK, it’s still on-topic and won’t interrupt flow like a random phone call will.
  • Developers are the Key
    • Hire the smartest people you can find
    • If you read the Wikipedia description of Asperger’s Syndrome, you’ll swear it’s every programmer you’ve ever met
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
    • Written requirements are not rich and not interactive
    • Two people at a whiteboard are both rich and interactive
  • “Knowledge radiators” – inadvertent communication
    • Move stories up as they’re completed
    • Put user stories near a hallway or a coffee machine – people accidentally find out the status of the whole project
  • The best architectures, requirements, and designs emerge from self-organizing teams.
    • In small steps over time
    • All 3 must be allowed to grow and change
    • No ivory-tower architects that hand something down from on high
    • “Architecture”: Stuff that’s hard to change later. Have as little as possible.
    • Favor chaos over order?
  • Continuous attention to technical excellence and good design enhances agility.
    • Create good designs initially, then review and improve regularly
    • Skipping design cleanup is like taking on debt [Cunningham 2001]
  • Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
    • Social aspect
      • Alert, engaged staff is more effective than a tired, plodding staff
      • Long hours are a symptom that something has gone wrong
      • Only work 8 hours a day
      • Never put in overtime 2 weeks in a row
    • Technical aspect
      • Development can be viewed as a long strategic game
      • Each move sets up the next move
  • Simplicity – the art of maximizing the amount of work not done – is essential.
    • “This letter is longer than I wish, for I had not the time to make it shorter.” – Blaise Pascal
    • Sometimes more difficult to make things simpler
    • DRY (Don’t Repeat Yourself)
    • “Design for simplicity; add complexity only where you must.” – The Art of Unix Programming [Raymond 2003]
    • “Code smell”
  • At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.
    • Be constantly aware
    • Never get lazy
    • XP encourages this: one step is to fix XP when it breaks
    • How light is too light?
    • How simple is too simple?
    • Can we make this simpler?

BorCon 2005: Beginning day one

Got into San Francisco last night. All went well, although it was a bit frustrating to be on a 3.5-hour flight with a laptop that has a 1.5-hour battery. Had a bit of trouble finding outlets in the hotel room, too; wound up plugging it in in the bathroom to charge overnight. I’m in luck this morning, though; I found a seat right next to an outlet.

Added bonus: in all the preconference tutorial rooms, they’ve got an empty glass at each seat and a pitcher of ice water on each table. Cool. I hope that’ll extend to the rest of the con.

This morning I’m sitting in on the Agile and XP session, and this afternoon, of course, is Danny’s session on deep-voodoo debugging. Further updates as circumstances warrant.

Countdown to BorCon

Wow, I’m getting behind in my blogging.

For anyone who might care, I’m going to BorCon next week (though I guess they’re calling it DevCon this year). I’m not sure who was in charge of scheduling it this time around; the Tuesday-through-Thursday schedule is pretty bizarre. But I’m flying out on Sunday, and on Monday I’m taking a couple of preconference tutorials, including Danny’s “Reading Tea Leaves” session. Deep voodoo debugging, from a compiler hacker. You can bet I’ll be blogging then.