Delphi bug of the day: FPU stack leak

Brian and I spent a couple hours debugging this afternoon. Our code was throwing an EInvalidOp exception: “Invalid floating-point operation”. And it was doing it in code that looked kinda like this:

S := S + '*';
Figure 1

Yyyyep. String concatenation. Was causing a floating-point exception.

It took us a good long while to figure out why, but we finally did. See, this EInvalidOp was a secondary exception: fallout from an earlier exception that had already been caught and handled. Before getting to the string concatenation, we had run through some code that did something like this:

Result := SomeArray[Index] - SomeArray[Index - 1];
Figure 2

Important details:

  • SomeArray was an array of Double.
  • Index happened to be the lowest index for SomeArray.
  • Range checking was on.

The code in Figure 2 was blowing up with an ERangeError, because SomeArray[Index - 1] was outside the bounds of the array. The code expected this, caught the exception (it was a unit test that specifically tested how the code reacted to boundary conditions), and continued on its way.

And then it did some string concatenation and blew up.

It took a lot of grovelling around in the CPU view and, finally, even the FPU view (which I have never before opened on purpose), to find the answer.

See, the floating-point unit (FPU) has its own stack. A fairly small one: the FPU stack has eight slots. And Delphi code (probably most code, for that matter) appears to pretty much expect the FPU stack to be empty most of the time. As a general thing, any given Delphi statement would start and end with an empty FPU stack. (That’s fairly standard for a stack-based architecture.) So a statement like the one in Figure 2 would break down something like this:

  • Start out with an empty FPU stack.
  • Get the value SomeArray[Index] and push it onto the FPU stack.
  • Get the value SomeArray[Index - 1] and push it onto the FPU stack.
  • Give the FPU a “subtract” instruction, which pops the top two items off the FPU stack, does the subtraction, and puts the result back on the stack. There’s now only one item on the FPU stack.
  • Pop the result off the FPU stack and put it into Result.
  • End the statement with an empty FPU stack again.
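
To make those steps concrete, here’s roughly what they look like as hand-written FPU instructions. This is a sketch, not the compiler’s actual output (the real code generator may well fold the second load into the subtract instruction), and SubtractDoubles is a made-up name. It assumes the 32-bit compiler and the default register calling convention, so the three var parameters arrive as addresses in EAX, EDX, and ECX.

```pascal
program FpuSubtractSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper, not from the original code: Diff := A - B, written by
// hand so that every push and pop on the FPU stack is visible.
// The parameters are declared var so that all three are passed as addresses
// in registers (EAX = @A, EDX = @B, ECX = @Diff).
procedure SubtractDoubles(var A, B, Diff: Double);
asm
  fld   qword ptr [eax]   // push A                     -> 1 slot in use
  fld   qword ptr [edx]   // push B                     -> 2 slots in use
  fsubp st(1), st(0)      // st(1) := A - B, then pop   -> 1 slot in use
  fstp  qword ptr [ecx]   // pop the result into Diff   -> 0 slots in use
end;

var
  A, B, Diff: Double;
begin
  A := 5.0;
  B := 3.0;
  SubtractDoubles(A, B, Diff);
  Writeln(Diff:0:1);
end.
```

The thing to notice is the bookkeeping in the comments: every push is matched by a pop, so the routine leaves the FPU stack exactly as it found it.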

Now let’s switch gears for a minute. When you do string concatenation in place (e.g., S := S + something), the runtime tries to reuse the existing memory block. But if the new string doesn’t fit in the old block, it may need to allocate a new block somewhere else on the heap, and copy the existing content into it. (See SysReallocMem, in $(BDS)\source\Win32\rtl\sys\getmem.inc, if you’re really curious about how this works. Bring a flashlight.)

Delphi 2006 ships with the FastMM4 memory manager. Among other things, FastMM has highly optimized copy routines as part of its ReallocMem code. So highly optimized that it has different routines for different block sizes. Here’s the one for a 36-byte block (copied from $(BDS)\source\Win32\rtl\sys\getmem.inc):

procedure Move36(const ASource; var ADest; ACount: Integer);
asm
  fild qword ptr [eax]
  fild qword ptr [eax + 8]
  fild qword ptr [eax + 16]
  fild qword ptr [eax + 24]
  mov ecx, [eax + 32]
  mov [edx + 32], ecx
  fistp qword ptr [edx + 24]
  fistp qword ptr [edx + 16]
  fistp qword ptr [edx + 8]
  fistp qword ptr [edx]
end;

In case you can’t read that gibberish, it amounts to, “push four quad-words (8 bytes each) onto the FPU stack, then pop them into a different memory location”. The remaining four bytes are done with a regular 32-bit MOV instruction.

This takes advantage of the fact that the FPU can move around eight bytes at a time, but basically it amounts to abusing the FPU stack as scratch space. And it makes assumptions about how many FPU stack slots are available — four for Move36, the one shown above. Move44 needs five slots, Move52 needs six, Move60 needs seven, and Move68 assumes that all eight slots are available. (Beyond that, they use a different strategy.) And those assumptions should be safe, since the FPU stack is always supposed to stay empty.
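
Those slot counts fall straight out of the block sizes: everything except the last four bytes goes through the FPU in eight-byte chunks. Here’s a quick sketch of that arithmetic. FpuSlotsNeeded is a hypothetical name, not something in getmem.inc, and it assumes every MoveNN routine in this range has the same shape as Move36.

```pascal
program MoveSlotsSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper: FPU stack slots needed by a MoveNN routine shaped like
// Move36, where the final 4 bytes are copied through a 32-bit register and
// every other 8-byte chunk is pushed onto the FPU stack.
function FpuSlotsNeeded(BlockSize: Integer): Integer;
begin
  Result := (BlockSize - 4) div 8;
end;

var
  Size: Integer;
begin
  // Walk the block sizes mentioned in the text: 36, 44, 52, 60, 68.
  Size := 36;
  while Size <= 68 do
  begin
    Writeln('Move', Size, ' needs ', FpuSlotsNeeded(Size), ' FPU stack slots');
    Inc(Size, 8);
  end;
end.
```

For Move68 the answer is eight, which is the whole FPU stack. A single leaked slot is enough to make that routine fail.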

Maybe you can see where I’m going with this.

Let’s go through those steps again, this time paying particular attention to what happened in our code today.

  • Start out with an empty FPU stack.
  • Get the value SomeArray[Index] and push it onto the FPU stack.
  • Get the value SomeArray[Index - 1], and — oops. Index - 1 is out of bounds. Raise an ERangeError.
  • And then, of course, you end with an empty FPU stack… right?

Oops.

No, it turns out that if evaluating the second operand of a floating-point operation raises an exception, the first operand is never popped, and you end up with an FPU stack leak. And when there are only eight slots, leaks are Not a Good Thing.

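Especially when your memory manager relies on the FPU stack. Leak enough slots, and the next time a MoveNN routine tries to push onto a full FPU stack, the FPU flags an invalid operation. Hence an EInvalidOp from a string concatenation.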
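
If you hit this yourself, one possible stopgap is to reset the FPU in the handler that swallows the exception. The sketch below is an assumption, not the repro from the QC report and not an official fix: ResetFpu and Delta are made-up names, it assumes the 32-bit compiler, and whether this exact program leaks a slot at all depends on the code the compiler generates. FNINIT empties the FPU stack but also resets the control word, so the sketch puts the RTL’s default control word back afterward.

```pascal
program FpuLeakStopgap;
{$APPTYPE CONSOLE}
{$R+} // range checking on, as in the failing code

uses
  SysUtils;

// Hypothetical helper: throw away anything left on the FPU stack.
procedure ResetFpu;
begin
  asm
    fninit                    // marks all eight slots empty; also resets the control word
  end;
  Set8087CW(Default8087CW);   // restore the RTL's default control word
end;

// Same shape as Figure 2. When Index is the lowest index, Index - 1 is out of range.
function Delta(const Values: array of Double; Index: Integer): Double;
begin
  Result := Values[Index] - Values[Index - 1];
end;

var
  Data: array[0..2] of Double;
  D: Double;
  S: string;
begin
  Data[0] := 1.0;
  Data[1] := 2.0;
  Data[2] := 4.0;
  S := 'boundary test';
  try
    D := Delta(Data, 0);      // raises ERangeError
    Writeln(D:0:1);
  except
    on E: ERangeError do
    begin
      ResetFpu;               // without this, a pushed operand may still be on the FPU stack
      Writeln('Caught: ', E.Message);
    end;
  end;
  S := S + '*';               // string concatenation after the handled exception
  Writeln(S);
end.
```

Resetting the FPU is a blunt instrument, since it discards everything on the stack, so it only belongs in a handler where no floating-point work is in flight. Checking the index before doing the arithmetic avoids the problem entirely.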

I’ve already logged this in QualityCentral, as QC#51215. If you’re curious, the QC report includes a repro case you can play around with. Happy floating-point erroring.
