Telling Windows we’re not really “Not Responding”

Our app does a lot of data processing, and sometimes it goes for long enough that Task Manager will show the app as “Not Responding”. Our users tend to assume that “Not Responding” means the app is locked up and not coming back, so they’ll hit “End Task”, which frequently ends up leaving corrupted files and other such badness in its wake.

Last iteration, I suggested, and got approval for, a one-hour spike to see what we could do about this. I had a suspicion that there was an easy fix. Not a total fix, mind you, but a significant improvement. And so it was.

According to MSDN (in the docs for IsHungAppWindow), Windows considers an application to be “hung” if it goes for more than five seconds without reading anything from its Windows message queue. The idea is to detect apps that are “locked up” from the user’s point of view: not redrawing when they should, not able to be resized or minimized, etc.

There are two main approaches to fix this the “right” way:

  • The simple fix is to call Application.ProcessMessages every now and then, to read the next message from the Windows message queue, and process it appropriately.
  • The better fix is to go multithreaded: do all the lengthy processing in a second thread, leaving the main thread free to process whatever messages come its way.

But we can’t really use either of these. The biggest reason is global variables. Our app predates a lot of the current best-practice recommendations, and as a result, it has way too many globals (among other things), which right away rules out multithreading — you can’t do multiple threads unless you rigidly control who can access any global data. And our global data isn’t grouped into objects as well as it might be, so there are places where our app could crash if we handled a paint message at an unexpected time (after we had changed one data structure, but before we had synchronized those changes with another object). For safety’s sake, we had to rule out both of those “right way” fixes.

What’s left? Well, cheating. There’s a clue in that IsHungAppWindow documentation: a window is considered hung if it doesn’t call PeekMessage for five seconds. So, we just have to call it more often than that, while avoiding the pitfalls of ProcessMessages. PeekMessage is capable of a non-destructive read, and there’s nothing that says the app actually has to do anything with the messages it reads. So this code, if executed more often than once per five seconds, would be enough to suppress the “Not Responding”:

  var Msg: TMsg;
  begin
    PeekMessage(Msg, 0, 0, 0, PM_NOREMOVE);
  end;

This will not cause the app to paint, or to be resizable, or minimizable, or anything like that. The user still sees the app as frozen. The only benefit is that Task Manager will still list the app as “Running”, so the user is less likely to kill it. But for us, that was an improvement worthy of an hour’s time investment.

The downside to this code is that PeekMessage is on the slow side. Running it 100,000 times in a loop took about half a second. I wanted something that took close enough to zero time that I could call it just about everywhere, including tight calculation loops. So I added a check against GetTickCount, which has a granularity of about 15 milliseconds, and only call PeekMessage when the tick count has changed. Calling PeekMessage every 15 ms should be plenty good enough.

var
  LastPeekMessageTime: Cardinal = 0;

procedure TellWindowsWeArentFrozen;
var
  Msg: TMsg;
begin
  if GetTickCount <> LastPeekMessageTime then
  begin
    PeekMessage(Msg, 0, 0, 0, PM_NOREMOVE);
    LastPeekMessageTime := GetTickCount;
  end;
end;

Much faster: it takes 15,000,000 calls to this code to add up to half a second, which makes it 150 times faster than calling PeekMessage every time.

The remaining work involved calling this procedure frequently during long-running operations. It would take a lot of time to track down every spot where that needs to be done, but covering most of them was no big trick. We already had our own set of wrapper classes around all of our data-access code, and the slowest parts of our program are usually slow because they’re accessing a lot of data. So I plugged in calls to TellWindowsWeArentFrozen in strategic spots in our database wrapper classes: in Open, First, Next, Prior, Delete, Post, ExecSql, etc. Boom, the majority of slow spots in our code no longer report the app as “Not Responding”. And it only took an hour, including research and optimization.
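To give a rough idea of what one of those strategic spots might look like, here is a minimal sketch. The TDataSetWrapper class and the unit name are hypothetical stand-ins (our real wrapper classes aren't shown here), and the unit names in the uses clause assume a classic Delphi setup without unit scope prefixes. TellWindowsWeArentFrozen is repeated from above only so the unit stands on its own.

```pascal
unit DataSetWrapperSketch;

interface

uses
  Windows, DB;

type
  // Hypothetical wrapper class, for illustration only.
  TDataSetWrapper = class
  private
    FDataSet: TDataSet;
  public
    constructor Create(ADataSet: TDataSet);
    procedure Open;
    procedure Next;
  end;

procedure TellWindowsWeArentFrozen;

implementation

var
  LastPeekMessageTime: Cardinal = 0;

procedure TellWindowsWeArentFrozen;
var
  Msg: TMsg;
begin
  // Peek at most once per GetTickCount tick. PM_NOREMOVE leaves the
  // message in the queue, so nothing is actually processed here.
  if GetTickCount <> LastPeekMessageTime then
  begin
    PeekMessage(Msg, 0, 0, 0, PM_NOREMOVE);
    LastPeekMessageTime := GetTickCount;
  end;
end;

constructor TDataSetWrapper.Create(ADataSet: TDataSet);
begin
  inherited Create;
  FDataSet := ADataSet;
end;

procedure TDataSetWrapper.Open;
begin
  // Ping before the potentially slow call.
  TellWindowsWeArentFrozen;
  FDataSet.Open;
end;

procedure TDataSetWrapper.Next;
begin
  // Called once per row, so a long scan pings Windows many times over.
  TellWindowsWeArentFrozen;
  FDataSet.Next;
end;

end.
```

Because the call is so cheap, putting it in per-row methods like Next costs next to nothing. A single call that blocks for more than five seconds (one long Open, say) still won't be covered, since nothing runs in between.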

Not everything is fixed, of course. We have some UPDATE and DELETE queries that take more than five seconds. We have some slow file-access code. We have some slow in-memory calculation code that doesn’t touch the database. But in one hour of work, we made a heck of a lot of progress. And it narrowed the field: the testers are already writing up story cards for the specific parts of the program that still give “Not Responding”.

All in all, not bad for an hour’s work.

Do-It-Yourself Security Checkpoint

Photo from What-the-Hack. A couple of choice quotes:

Research has shown that people need to get inspected to feel secure, even if the actual inspection is a complete farce.

When was the last time you were inspected by someone as smart as you? With our patented inherent adaptive inspection intelligence technology the terrorists don’t stand a chance.

Read the whole thing to get the full effect. This is just too funny.

Via Bruce Schneier’s weblog.

Windows SSH server

I’ve used SSH before (it’s darned cool — I need to do another Mere Moments Guide sometime), but I’ve always used Linux for the server side. Today, Bill MacMillan, from the church technology committee (which I’m co-chair of, by the way), told me about a Windows SSH server he found. It’s called freeSSHd, and in his words, “I got it to work — almost too easily.” Link hereby filed for future reference.

Status: Stoked

We’ve been using a cork bulletin board as our XP status board. At the beginning of each week-long iteration, we pin our story cards up under a “Pending” heading; as we start working on a given story, it moves to the “In Progress” column; and from there, the card gets a big check mark and moves to “Done”.

We’re planning to ship at the end of this iteration, and it’s been going well this week. Really well. It’s Tuesday — two-fifths of the way through the iteration — and this is what our status board looks like right now.

The far right column is “Pending”. The next column over is “In Progress”.

All the rest of it is “Done”. We’re running out of space on the board. The “Done” area won’t even fit into the four columns we’ve got room for — we’ve got cards overlapping other cards, probably five or six columns’ worth.

And it’s only Tuesday. We are kicking butt this week, and I really think the status board is part of it. It does wonders for motivation to see all those “Done” cards pile up.

Our status board really isn’t big enough for us now. We’ve gotten better at breaking big tasks into small stories, which means lots of cards — more than can fit. Usually, partway through the week, someone will take the huge mass of “Done” cards and clip them all together into one stack. I always hate when that happens, because suddenly it’s like we haven’t accomplished anything all week. This week, just before we release, I’m going to push hard to keep all those cards up on the board.

And when we move into our new bullpen area, I’ve already put my vote in for a bigger status board.

Application.HandleException’s parameter

The interesting bit first: You can pass nil to TApplication.HandleException.

Okay, now on to the background. When you’re writing Delphi code, and you throw an exception from a button’s OnClick handler or some such, your app doesn’t crash. (WinForms could learn a thing or two from Delphi.) Instead, you get a dialog box showing the exception message. Once you click OK to that dialog, your application continues to run.

This is handled by the top-level message loop, which catches exceptions and calls Application.HandleException. HandleException, by default, shows the error dialog. (You can customize this by hooking Application.OnException, in which case the dialog is not shown, and your OnException handler is called instead.)

Actually, it’s not just the top-level message loop; quite a few places in the VCL call HandleException — basically anywhere exceptions could occur, and are recoverable. And TApplication.HandleException is public, so your code can call it too. If you’re catching exceptions and showing them to the user yourself, consider it a best practice to use Application.HandleException, instead of ShowMessage or MessageDlg.
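For the OnException hook mentioned above, a minimal sketch might look like the following. The TExceptionLogger class, the InstallExceptionLogger procedure, and the choice to write to the debugger output are all assumptions made up for illustration; only Application.OnException and Application.ShowException come from the VCL. The uses clause again assumes classic unit names without scope prefixes.

```pascal
unit AppExceptionSketch;

interface

uses
  Windows, SysUtils, Forms;

type
  // Hypothetical helper; OnException needs a method of an object,
  // so a plain procedure won't do.
  TExceptionLogger = class
  public
    procedure AppException(Sender: TObject; E: Exception);
  end;

// Hypothetical entry point: call once at startup.
procedure InstallExceptionLogger;

implementation

var
  Logger: TExceptionLogger;

procedure TExceptionLogger.AppException(Sender: TObject; E: Exception);
begin
  // Once OnException is assigned, the default dialog is no longer
  // shown automatically, so log first and then show it ourselves.
  OutputDebugString(PChar(E.ClassName + ': ' + E.Message));
  Application.ShowException(E);
end;

procedure InstallExceptionLogger;
begin
  if not Assigned(Logger) then
    Logger := TExceptionLogger.Create;
  Application.OnException := Logger.AppException;
end;

initialization

finalization
  if Assigned(Logger) then
  begin
    // Unhook before freeing so the event never points at a dead object.
    Application.OnException := nil;
    Logger.Free;
  end;

end.
```

With something like this in place, any code that routes its errors through Application.HandleException gets the same logging for free, which is part of why it beats calling ShowMessage directly.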

HandleException takes one parameter, and for the longest time, I thought I had to pass the exception instance for that parameter, like so:

  on E: Exception do
    Application.HandleException(E);

It turns out I was wrong — I was doing more work than I needed to. That one parameter is, in fact, the ubiquitous Sender: TObject of TNotifyEvent fame. The VCL code usually passes Self.

Even more interestingly, this parameter is never used. Oh, it gets passed along to your OnException handler, if you have one; but what on earth would you do with it? What good would it do to know which control threw an exception? (Actually, you don’t even get that. A quick look through the VCL source shows that the parameter might be a TControl, a TDragObject, a TDataModule, a TDataSet, the TApplication… good luck finding out anything useful from that parameter.)

Bottom line: HandleException’s Sender parameter is, in effect, worthless; it doesn’t tell you anything worth knowing.

So now I just do this: