Delphi grammar quirk of the day: sealed and abstract classes

Delphi’s syntax for sealed and abstract classes is a bit bizarre. The following all compile:

TFoo = class sealed sealed sealed sealed

TBar = class abstract abstract abstract abstract

If you put the sealed keyword more than once, does that make the class somehow extra-sealed?

Interestingly, sealed and abstract are both what I’m calling “semikeywords” — they only have a special meaning in a particular context. Outside that particular context, they can be used as plain old identifiers. So you could actually have a field in the class called sealed or abstract… as long as it’s not the first thing after the class keyword. Add another field first, or a visibility specifier, and you’re fine:

TFoo = class sealed
Index: Integer;
Abstract: Boolean;

TBar = class abstract
strict private
Sealed: string;

But there is a bit of sense amidst all this oddity. They at least don’t let you mix and match sealed (cannot descend from this class) with abstract (must descend from this class):

TBaz = class abstract sealed

[Pascal Error] Project3.dpr(17): E2383 ABSTRACT and SEALED cannot be used together

Now, if they would just make it so that class abstract actually did something… (see QC#24662)

Unfinished Delphi feature of the day: virtual class helper methods

I’ve been researching the syntax for class helpers, and found some very interesting things. First, that class helpers can descend from other class helpers. And second, that they can have virtual methods.

Class helpers, for anyone not familiar with them, are a way of adding methods to an existing class — or at least, making it look like you do. The existing class is left unchanged; you might just as well be writing unit procedures that take an instance as their first parameter, except that class helpers make the code look nicer because you’re actually saying “Foo.NewThing” rather than “NewThing(Foo, …)”.

Now, since you’re not actually modyfing the existing class, your class helper can’t have any fields (there’s no place to store them). Nor can you override methods from the class you’re helping (since that would involve changing the VMT). So this whole “virtual” thing really surprised me.

So back to the interesting discoveries. First, class helpers can descend from other class helpers, but the syntax isn’t what I would have guessed:

TFooHelper = class helper for TFoo
TOtherFooHelper = class helper (TFooHelper) for TFoo

Presumably this would only make sense if they were helpers for different classes, but that’s the syntax: the parent class goes after “helper”, not after the entire “class helper for” clause. The parent must be another class helper (not an ordinary class).

Now, the really interesting bit: the compiler lets you put virtual methods on these guys.

TFooHelper = class helper for TFoo
procedure DoSomething; virtual;

TBarHelper = class helper (TFooHelper) for TBar
procedure DoSomething; override;

Except that as soon as you put “virtual;” on a class helper method, the compiler starts griping:

[Pascal Error] Project3.dpr(29): E2003 Undeclared identifier: 'QueryInterface'
[Pascal Error] Project3.dpr(29): E2003 Undeclared identifier: '_AddRef'
[Pascal Error] Project3.dpr(29): E2003 Undeclared identifier: '_Release'

Do what happened?

I was curious; I added those methods. Since you can’t add fields (e.g. FRefCount) to a class helper, I made _AddRef and _Release both return -1, to indicate that the class wasn’t refcounted. Then I wrote some code that called that virtual method, and ran my app.

The instruction at "0x004112f5" referenced memory at "0x00000000". The memory could not be "read".


So I looked at the assembly code that was generated for that virtual helper call. And sure enough, it was looking for an interface:

Project3.dpr.124: Foo.VirtualHelperMethod;
004112E2 8D4DEC           lea ecx,[ebp-$14]
004112E5 8B15B4004100     mov edx,[$004100b4]
004112EB 8BC3             mov eax,ebx
004112ED E8F629FFFF       call @GetHelperIntf // emphasis mine
004112F2 8B45EC           mov eax,[ebp-$14]
004112F5 8B10             mov edx,[eax]
004112F7 FF520C           call dword ptr [edx+$0c]

Very interesting, says I. That explains why it wanted me to implement IInterface on the helper: it somehow uses interfaces to deal with this “virtual helper method” business. But exactly which interface was it looking for, and why was it crashing? What else did I need to do? What interfaces did I need to implement? How could I implement interfaces, when the compiler doesn’t allow interface syntax (“class helper (TFooHelper, ISomething)” fails with “‘)’ expected but ‘,’ found”)?

So I opened up System.pas to look for this @GetHelperIntf method. Here’s the lone line of code in its implementation, along with the answer to why the app crashed…

function _GetHelperIntf(Instance: TObject; Cls: TClass): IInterface;

Result := nil;

Delphi bug of the day: FPU stack leak

Brian and I spent a couple hours debugging this afternoon. Our code was throwing an EInvalidOp exception: “Invalid floating-point operation”. And it was doing it in code that looked kinda like this:

S := S + '*';
Figure 1

Yyyyep. String concatenation. Was causing a floating-point exception.

It took us a good long while to figure out why, but we finally did. See, this EInvalidOp was a secondary exception. Before getting to the string concatenation, we had run through some code that did something like this:

Result := SomeArray[Index] - SomeArray[Index - 1];
Figure 2

Important details:

  • SomeArray was an array of Double.
  • Index happened to be the lowest index for SomeArray.
  • Range checking was on.

The code in Figure 2 was blowing up with an ERangeError, because SomeArray[Index - 1] was outside the bounds of the array. The code expected this, caught the exception (it was a unit test that specifically tested how the code reacted to boundary conditions), and continued on its way.

And then it did some string concatenation and blew up.

It took a lot of grovelling around in the CPU view and, finally, even the FPU view (which I have never before opened on purpose), to find the answer.

See, the floating-point unit (FPU) has its own stack. A fairly small one: the FPU stack has eight slots. And Delphi code (probably most code, for that matter) appears to pretty much expect the FPU stack to be empty most of the time. As a general thing, any given Delphi statement would start and end with an empty FPU stack. (That’s fairly standard for a stack-based architecture.) So a statement like the one in Figure 2 would break down something like this:

  • Start out with an empty FPU stack.
  • Get the value SomeArray[Index] and push it onto the FPU stack.
  • Get the value SomeArray[Index - 1] and push it onto the FPU stack.
  • Give the FPU a “subtract” instruction, which pops the top two items off the FPU stack, does the subtraction, and puts the result back on the stack. There’s now only one item on the FPU stack.
  • Pop the result off the FPU stack and put it into Result.
  • End the statement with an empty FPU stack again.

Now let’s switch gears for a minute. When you do string concatenation in place (e.g., S := S + something), the runtime tries to reuse the existing memory block. But if the new string doesn’t fit in the old block, it may need to allocate a new block somewhere else on the heap, and copy the existing content into it. (See SysReallocMem, in $(BDS)\source\Win32\rtl\sys\, if you’re really curious about how this works. Bring a flashlight.)

Delphi 2006 ships with the FastMM4 memory manager. Among other things, FastMM has very highly-optimized copy routines as part of its ReallocMem code. So highly-optimized that they have different routines for different block sizes. Here’s the one for a 36-byte block (copied from $(BDS)\source\Win32\rtl\sys\

procedure Move36(const ASource; var ADest; ACount: Integer);
fild qword ptr [eax]
fild qword ptr [eax + 8]
fild qword ptr [eax + 16]
fild qword ptr [eax + 24]
mov ecx, [eax + 32]
mov [edx + 32], ecx
fistp qword ptr [edx + 24]
fistp qword ptr [edx + 16]
fistp qword ptr [edx + 8]
fistp qword ptr [edx]

In case you can’t read that gibberish, it amounts to, “push four quad-words (8 bytes each) onto the FPU stack, then pop them into a different memory location”. The remaining four bytes are done with a regular 32-bit MOV instruction.

This takes advantage of the fact that the FPU can move around eight bytes at a time, but basically it amounts to abusing the FPU stack as scratch space. And it makes assumptions about how many FPU stack slots are available — four for Move36, the one shown above. Move44 needs five slots, Move52 needs six, Move60 needs seven, and Move68 assumes that all eight slots are available. (Beyond that, they use a different strategy.) And those assumptions should be safe, since the FPU stack is always supposed to stay empty.

Maybe you can see where I’m going with this.

Let’s go through those steps again, this time paying particular attention to what happened in our code today.

  • Start out with an empty FPU stack.
  • Get the value SomeArray[Index] and push it onto the FPU stack.
  • Get the value SomeArray[Index - 1], and — oops. Index - 1 is out of bounds. Raise an ERangeError.
  • And then, of course, you end with an empty FPU stack… right?


No, it turns out that if the second argument to a floating-point operation throws an exception, you end up with an FPU stack leak. And when there are only eight slots, leaks are Not a Good Thing.

Especially when your memory manager relies on the FPU stack.

I’ve already logged this in QualityCentral, as QC#51215. If you’re curious, the QC report includes a repro case you can play around with. Happy floating-point erroring.

Generating the grammar document

My Delphi grammar document is built from two pieces: my research, and a tool.

The research

The first step was to do all the research to figure out what the Delphi grammar is. This is not easy. The Delphi 5 documentation included an incomplete, and sometimes wildly inaccurate, BNF grammar. The Delphi 2006 documentation no longer includes the grammar; either the documentation team lost it (along with the docs for the IDE’s regular-expression syntax), or they gave up because it was so far out of date. The language has added loads of features since then: strict private, records with methods, even generics on the way.

So I had a rough sketch to start from, and an undergraduate compiler-design class from ten years ago. The rest — correcting the errors, and filling in the (large) blanks — is trial and error, and a lot of refactoring.

The upshot is, if you see something I’m missing, let me know. Fatih Tolga Ata already put class helpers and generics on my radar — although I can’t really do much with generics yet. Since there is no official (correct) grammar from CodeGear, my main method of discovering the grammar is to type stuff into the IDE and see what compiles (and, often more instructively, what doesn’t), I won’t be able to figure out the generic grammar until I have Highlander.

The tool

As I puzzle out the grammar, I document it in a YAML file. Here’s a snippet from this file:

Doc: |
! -> <number>
. -> <stringliteral>
! -> Ident
! -> NIL
. -> '(' Expression ')'
. -> SetLiteral
Doc: |
. -> BEGIN StatementList END

The ! at the beginning of a line means “I’ve implemented this in my parser”; the . means “I haven’t implemented this yet”. That’s what drives the “(Completed)” and “(In progress)” notations in the grammar document.

I wrote a Ruby script that reads this YAML file and generates the HTML Delphi grammar documentation. That Ruby script is the part that’s cool enough to figure out which rules are fully implemented (shown with a solid underline), which are partially implemented (e.g., Atom, as shown above; shown with a dashed underline), and which ones I haven’t started on yet (no underline). It also figures out the “Likely Targets” — the rules whose dependencies are (fully or partially) done: the ones I can (probably) work on next.

I edit the YAML file frequently — as you can imagine, since it reflects my completion status. And I refer to the generated HTML document just as frequently. So I’ve made the Ruby script part of my Rakefile. It works out fairly well.

Of course, uploading the HTML doc to my Web site happens… a little less frequently. I just uploaded the latest version this morning, but before that, it looks like it had been a little more than a year since my last update. I’ll try to keep it a little more active.

Delphi grammar trivia of the day: double-quoted strings

Quoted strings, in Delphi, use single quotes. Almost always.

But did you know that the Delphi compiler supports double-quoted strings, too?

I didn’t either, until I wrote a Delphi lexer. I set it loose on all of the library source that ships with Delphi, and it complained about this " character that it didn’t understand. In SysUtils.pas, as it turns out, there’s a line that looks like this:

CMP     AL,"'"

Yep. Inside an asm block, you can have a double-quoted character literal. And since I use the same lexer for all source files (i.e., I don’t have a separate lexer for asm blocks), I too have to support double-quoted strings.

But I only support double quotes around a single apostrophe: "'". After all, I don’t want to encourage this sort of thing.

Packrat Parsing

Over the past few days, I’ve taken a couple of looks back at parser generators, with an eye toward (possibly) resuming work on DGrok, my project to write a parser for the Delphi language.

Along the way, I ran across a term I wasn’t familiar with: packrat parsing. It claims to be a way to get the simplicity of a recursive-descent parser, together with the flexibility of an LL or LR parser like yacc. It also claims unlimited lookahead in linear time (at the cost of higher memory requirements).

Don’t know yet how promising packrat parsing will be — I’ve only read the first page or so of the paper — but it should at least be an interesting read.