8.3 backward compatibility, or, why *.dpr returns .dproj files

CodeGear, it turns out, chose poorly when they picked .dproj as the extension to replace .bdsproj.

If you have Delphi 2007 and have tried to run DGrok 0.5, you’ve already seen what might look like a bug in DGrok: when the file masks include *.dpr, DGrok looks at all of your .dproj files as well. Obviously this is wrong, right? I mean, you searched for *.dpr, not *.dpr*. And since .dproj files are XML (which doesn’t so much fit with the Delphi grammar), every .dproj file is listed as a parse failure.

The thing is, it’s not really a DGrok bug (although it is broken, whoever’s fault it may be, and I will have a fix in 0.6). It’s a combination of a Windows backwards-compatibility feature, and someone at CodeGear not knowing about that Windows backwards-compatibility feature (and making a bad choice as a result).

First, a bit of background. Apparently there exist applications (perhaps vital line-of-business apps that run Fortune 500 companies) that still, 12 years after Windows 95 was released, can’t deal with anything but 8.3 filenames. That isn’t too surprising, really; if the app still works, who’s going to mess with it? But (here’s the kicker) some of these 8.3-only apps use the full Win32 API to search for files. I haven’t quite puzzled out how that combination comes to be, since it’s not like there was a time when there were Win32 APIs but not long filenames, but after a few years of reading Raymond Chen’s blog, I’m forced to believe that such apps really do exist somewhere (and I suspect I really don’t want to know how many there are).

Every file has a long filename and a short (8.3) filename. (I think you can turn off 8.3 filename generation, but nobody does, because 8.3 filenames are handy for apps that can’t deal with spaces in path names.) So if you create a file called, say, MyProj.dproj, it will also be given an 8.3 name like, say, MYPROJ~1.DPR: the extension gets cut down to its first three characters.

And because of the possibility of 32-bit 8.3 apps, whenever you supply a search mask, the 32-bit file-search APIs (FindFirstFile, etc.) will match that mask against both the long filename and the short filename. Which means, if you search for *.dpr, Windows will happily give you both .dpr files (whose long and short names both match *.dpr) and .dproj files (whose short names match *.dpr, even though their long names do not). And it doesn’t even have the decency to return the short filename when that’s what it matched — no, it returns the full long filename, the one that specifically didn’t match the mask you gave it.
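To make the double matching concrete, here is a rough Python simulation of the behavior described above. It is only a sketch: fnmatch uses Unix-style wildcards rather than the real Win32 matching rules, and the file list and short names are made up for illustration.

```python
from fnmatch import fnmatchcase

# (long name, hypothetical 8.3 alias) pairs, invented for this example
FILES = [
    ("MyProj.dpr", "MYPROJ.DPR"),
    ("MyProj.dproj", "MYPROJ~1.DPR"),
    ("Unit1.pas", "UNIT1.PAS"),
]

def matches_either_name(mask, long_name, short_name):
    """True if the mask matches the long name OR the short name, ignoring case."""
    mask = mask.lower()
    return (fnmatchcase(long_name.lower(), mask)
            or fnmatchcase(short_name.lower(), mask))

def find_files(mask):
    """Mimic the search: match against both names, but always return the long name."""
    return [long_name
            for long_name, short_name in FILES
            if matches_either_name(mask, long_name, short_name)]

print(find_files("*.dpr"))
```

Searching for *.dpr here returns both MyProj.dpr and MyProj.dproj, and the second one comes back under the long name that never matched the mask.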

Yes, it’s all very weird, and the Windows behavior isn’t exactly friendly. Then again, it’s been that way for, let’s see, 12 years. This behavior is a known, a constant. And it’s up to us, as software developers, to deal with it. It’s just another developer tax we have to pay, like it or not.

Which means that, CodeGear, you blew it with that .dproj extension. By choosing something that’s 8.3-ambiguous with .dpr, you made life hard for every third-party vendor who wants to do anything with Delphi source files. Here’s the rule (for future reference): When you invent a new extension, its first 3 characters should never be the same as an exactly-3-character extension you already own.

I just coded a workaround for 8.3 compatibility, which will be in the next release of DGrok. Specifically, I added a loop that looks at each filename and grabs its extension. If the extension is “.dproj” (case insensitive), I skip over that file. (This means that if you’re silly enough to type in a mask of “*.dproj”, the new version won’t find any files at all.) Clumsy? Yeah. But .dproj files are the only ones (currently) likely to come up as false positives, and they’ll never contain valid Delphi code (unless, of course, you specifically rename them to taunt me). And an awkward fix is better than a broken app.
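The shape of that workaround, sketched in Python (this is not DGrok's actual code, and the function name is made up): take whatever the file search returned, look at each extension, and skip anything ending in .dproj regardless of case.

```python
import os

def drop_dproj_files(filenames):
    """Skip every file whose extension is .dproj (case-insensitive)."""
    kept = []
    for name in filenames:
        extension = os.path.splitext(name)[1]
        if extension.lower() == ".dproj":
            continue  # false positive from 8.3 matching, or a deliberate *.dproj mask
        kept.append(name)
    return kept

print(drop_dproj_files(["MyProj.dpr", "MyProj.dproj", "Other.DPROJ", "Unit1.pas"]))
```

As noted above, the filter is unconditional, so a mask of *.dproj would come back empty too.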

DGrok 0.5 released: Enter {$INCLUDE}

DGrok 0.5 is available for download.

Here’s what’s new:

  • {$INCLUDE} works. However, it doesn’t yet have a way to specify search paths for those include files. Currently it assumes that the included file is in the same directory as the including file. (Relative paths, such as {$I ..\Foo.inc}, should also work.) This also means that include files don’t work in the Ad-Hoc Parsing view, since the “current file” doesn’t have a directory.
  • In the demo app, double-clicking on a failed file no longer brings up the two-pane window with a “Parse” button; it only brings up one pane showing the file with the error. (Yes, it still puts the cursor at the error position.) It shows the file that actually contains the error, even if that’s an include file (cool!). The downside is that you can’t comment out the offending line and click “Parse” again, but I have no idea how that would work if you’re not looking at the top-level file.
  • {$DEFINE} and {$UNDEF} are now supported. They work with {$IFDEF}, {$IFNDEF}, {$IF Defined(...)}, and {$IF not Defined(...)}. (More-complicated $IF expressions, like {$IF Defined(Foo) OR Defined(Bar)}, must still be addressed manually in code.)
  • Improved default list of conditional defines: CONDITIONALEXPRESSIONS, CPU386, MSWINDOWS, WIN32, and a stab at VERxxx.
  • Fixed colon syntax to allow two colons.
  • Allow a number with a decimal point but no digits after it.
  • Demo app: added a place to specify file masks (so you can parse files other than *.pas).
  • Demo app: made source-tree parsing run in a background thread to improve responsiveness.
  • Strong typing on all node types. (As much as possible. There’s not much I can do for, say, statements, which can be represented by any number of different node types.)
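For a feel of what the {$DEFINE} / {$IFDEF} support in the list above involves, here is a minimal Python sketch of a conditional-directive filter. It is an illustration under my own assumptions, not how DGrok does it: the function name is made up, it also handles {$ELSE}, {$ENDIF} and {$IFEND} so the example is complete, and like the list says, it only understands the Defined(...) and not Defined(...) forms of {$IF}.

```python
import re

DIRECTIVE = re.compile(r"\{\$(\w+)\s*([^}]*)\}")
SIMPLE_IF = re.compile(r"(not\s+)?defined\((\w+)\)", re.IGNORECASE)

def active_text(source, defines):
    """Return the source text that survives conditional compilation."""
    defines = {d.upper() for d in defines}
    stack = []   # one bool per open conditional: is its current branch active?
    out = []
    pos = 0
    for match in DIRECTIVE.finditer(source):
        if all(stack):  # text before this directive is kept only if every level is active
            out.append(source[pos:match.start()])
        pos = match.end()
        name, arg = match.group(1).upper(), match.group(2).strip()
        if name == "IFDEF":
            stack.append(arg.upper() in defines)
        elif name == "IFNDEF":
            stack.append(arg.upper() not in defines)
        elif name == "IF":
            simple = SIMPLE_IF.fullmatch(arg)
            if simple is None:
                raise ValueError("unsupported $IF expression: " + arg)
            negated = simple.group(1) is not None
            stack.append((simple.group(2).upper() in defines) != negated)
        elif name == "ELSE":
            stack[-1] = not stack[-1]
        elif name in ("ENDIF", "IFEND"):
            stack.pop()
        elif all(stack):  # DEFINE / UNDEF only count inside active code
            if name == "DEFINE":
                defines.add(arg.upper())
            elif name == "UNDEF":
                defines.discard(arg.upper())
    if all(stack):
        out.append(source[pos:])
    return "".join(out)

sample = "a{$DEFINE FOO}{$IFDEF FOO}b{$ELSE}c{$ENDIF}{$UNDEF FOO}{$IF not Defined(FOO)}d{$IFEND}"
print(active_text(sample, ["MSWINDOWS"]))
```

A compound expression such as {$IF Defined(Foo) OR Defined(Bar)} raises an error in this sketch, which mirrors the "must still be addressed manually" caveat above.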

Next up: the Visitor pattern.

DGrok 0.4 released: grammar 100% done

I’m now able to parse 100% of the Delphi grammar, as far as I can tell — and 100% of the Delphi RTL source. (Not all the source — just the RTL directory and its subdirectories — but still.) Sounds like an excellent time to do another release. DGrok 0.4 is now available.

Important caveat: I’m not yet handling {$INCLUDE}. And I know the Delphi RTL uses it, so it’s possible I’m not parsing everything — there may well be some gotchas inside those include files. {$INCLUDE} will most likely be the next thing I tackle.

For any who are curious, I didn’t make Str’s colon a general binary operator, because that broke case statements. (This is why I would never dream of writing a parser any way but test-first!) Instead, I added another expression type, ParameterExpression, that’s only used in parameter lists.
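Here is a toy Python sketch of that idea: the colon is recognized only while parsing a parameter, never as a general operator. It is not DGrok's code. The function names are made up, each "expression" is just a single identifier or number, and up to two colons are accepted per parameter.

```python
import re

def tokenize(text):
    """Split into identifiers/numbers and the few punctuation marks we need."""
    return re.findall(r"\w+|[():,;]", text)

def parse_parameter(tokens, pos):
    """Parse value[:part[:part]]; the colon only has meaning here."""
    parts = [tokens[pos]]
    pos += 1
    while pos < len(tokens) and tokens[pos] == ":" and len(parts) < 3:
        parts.append(tokens[pos + 1])
        pos += 2
    return tuple(parts), pos

def parse_call(text):
    """Parse Name(param, param, ...) and return (name, [param tuples])."""
    tokens = tokenize(text)
    name = tokens[0]
    if tokens[1] != "(":
        raise SyntaxError("Expected OpenParenthesis")
    pos = 2
    params = []
    while True:
        param, pos = parse_parameter(tokens, pos)
        params.append(param)
        if tokens[pos] == ",":
            pos += 1
        elif tokens[pos] == ")":
            return name, params
        else:
            raise SyntaxError("Expected CloseParenthesis but found " + tokens[pos])

print(parse_call("Str(val:0, S);"))
```

Because the colon never reaches the general expression code, a case label like `1: DoSomething;` is unaffected, which is the property the binary-operator approach lost.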

Bugrit!

The end was in sight. 18 files were failing with “Expected EndKeyword but found WithKeyword”, 2 with “Expected FinallyKeyword but found WithKeyword”. Parsing the “with” statement looked like the only thing between me and 100% parsing of the Delphi RTL source.

So I wrote the tests for RuleType.WithStatement, made them pass, and…

Expected CloseParenthesis but found Colon (1)

D’oh! Now that it was getting past the with statement, it found something else in System.pas. I forgot about Str(), one of those pesky compiler-magic functions. Here’s the offending line of code:

Str(val:0, S);

I think Str is the only function in Delphi that takes that colon syntax. How much code do you suppose is hanging around in the compiler just to handle that one special case?

Sigh. Well, I’ll probably just implement the colon as another binary operator, and figure out what its precedence should be. When I release 0.4 (probably no later than this weekend), it should be able to fully parse all the Delphi RTL source files.

Well, except for {$INCLUDE}. But that’s a different feature…

DGrok 0.3 released

Halfway there! DGrok can now parse 46 of the 91 Delphi RTL source files — a hair over half.

Except that, of course, it’s way more than halfway; it took a lot of work to get this far. Most of what’s left is the various statements like repeat, with, try..finally, etc.

And then, of course, there’s the fact that it’s probably way less than halfway. Include files ({$I} / {$INCLUDE}) aren’t working, and I haven’t figured out how they’ll fit into the demo app yet. And there’s the whole issue around symbol tables, which I’ll need to do anything useful like refactoring.

Still, it’s a major milestone, so I figured I’d do a release. DGrok 0.3 is now available for download. Major new stuff in this release:

  • If you double-click a file that’s failing because of a compiler directive, it will now take you to the error location rather than showing a .NET “unhandled exception” dialog.
  • Began adding statement handling. Currently it can’t handle much — mostly method calls, assignments, and if statements. This is my main area of focus right now.
  • Parsing of method implementations, including the smarts to not expect a method body if the method is declared “forward” or “external”.
  • Parsing of unit implementation sections.
  • Parsing of “program” and “library” files.
  • Fixed parsing of “const” sections that come inside a class or record. (When the const section was followed by another visibility section, it was getting confused and thinking the “public” was the name of another constant; it doesn’t anymore.)
  • Many minor tweaks to the grammar (it turns out that semicolons are optional after field declarations; I didn’t have threadvar in the grammar yet; initialized record-type variables; operator overloads; that sort of thing).
  • Lots of exciting behind-the-scenes stuff that you wouldn’t recognize as cool unless you’d already been working with the code: strongly-typed node properties, generic ListNode and DelimitedItemNode, and partial classes.

U of I, ISU selling student data to credit-card companies

I was in Ames this weekend at a church meeting. On the way back, I stopped at a gas station that offers free Sunday papers with a tank of gas.

Cover story: U of I, ISU use student data to sell credit cards. Subhead: “Bank of America cards are promoted despite public worries about debt loads”.

I can’t even believe it. Someone had the balls to think it was okay to sell private student data to these predators? Someone thought it was okay to railroad these students into debt? (Yes, the students can choose whether to go into debt — but it’s not an educated choice; nobody’s teaching them what debt is actually going to mean to their lives. Nobody but the credit-card companies… and, apparently, these two universities.)

How many students already declare bankruptcy as their first act after graduation? We want this to get worse?

Apparently most of the money isn’t even going to benefit the schools; it’s going to privately-run alumni associations. Alumni associations. They’re preying on the younger generation. And these people can sleep at night?

If I still lived in Iowa, I’d be working my butt off writing to the legislature right now. As it is, I should probably do what I can to make sure this doesn’t happen in Nebraska (or gets fixed, if it’s already happening), and write to the federal Congress as well. This is unacceptable.

Somebody should be getting fired over this stunt. Somebody should be getting arrested.

DGrok 0.2 released

DGrok 0.2 is now available.

It can now parse 13 of the Delphi RTL source files (only 78 more to go). It should be able to parse interface sections in full, with the exception of class (and record) helpers and records with variant sections.

I also made some improvements to the demo app. The biggest change, apart from fixing the hard-coded path, is that you can double-click a filename in the treeview. This brings up that file in a new window — and if there was an error parsing the file, the cursor is positioned at the error location. (But no, there’s no GUI yet for specifying IFDEFs.)

Next stop: Statements.

Parsing directives

You learn the darndest things about a language when you try to parse it.

Directives — the decorations that go after a method heading, like stdcall, virtual, etc. — were, I had thought, always set off by semicolons:

procedure Foo; forward;
procedure Bar; stdcall; deprecated;

Until I ran my parser on Math.pas, and it choked on this:

procedure SinCos(const Theta: Extended; var Sin, Cos: Extended) register;

Well, that’s just great. I had thought that directives came after the required semicolon, and that each directive had its own required trailing semicolon. Now I come to find that they come before the required semicolon, and each directive has its own optional, leading semicolon. Which meant I had to spend a couple of hours rearranging my grammar and updating my regression tests, since all directives now needed to use the same class (with its anomalous leading semicolon) and the logical order of MethodHeadingNode’s properties had changed.
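The revised rule (zero or more directives, each with an optional leading semicolon, then one required semicolon) can be sketched like this in Python. It is an illustration only: the function name and the directive list are my own, the list is far from complete, and tokens are assumed to be already split and lowercased.

```python
# A partial, illustrative set of directive keywords
DIRECTIVES = {"forward", "external", "stdcall", "cdecl", "register",
              "virtual", "override", "deprecated"}

def parse_heading_directives(tokens):
    """Parse what follows a method signature.

    Returns (directives, number of tokens consumed).
    """
    pos = 0
    directives = []
    while True:
        if pos < len(tokens) and tokens[pos] in DIRECTIVES:
            # Directive with no leading semicolon: "... ) register;"
            directives.append(tokens[pos])
            pos += 1
        elif (pos + 1 < len(tokens) and tokens[pos] == ";"
              and tokens[pos + 1] in DIRECTIVES):
            # Directive with its optional leading semicolon: "; stdcall"
            directives.append(tokens[pos + 1])
            pos += 2
        else:
            break
    # The one required semicolon comes after all the directives
    if pos >= len(tokens) or tokens[pos] != ";":
        raise SyntaxError("Expected Semicolon")
    return directives, pos + 1

print(parse_heading_directives(["register", ";"]))
print(parse_heading_directives([";", "stdcall", ";", "deprecated", ";"]))
```

The two-token lookahead is what decides whether a semicolon belongs to the next directive or is the required terminator.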

And then there are procedural types (type TFoo = procedure, procedure of object, etc). I hadn’t even realized that they could have directives, since most directives don’t make any sense for a procedural type, but I forgot about calling conventions. As it turns out, not only do procedural types support directives, they support three very distinct syntaxes:

type
  TFoo = procedure of object; stdcall;
  TBar = procedure of object stdcall;
  TBaz = procedure stdcall of object;

Yep, any directives can show up either before or after “of object”. And the semicolons are optional in the ones after “of object”. But no, you can’t put any semicolons before “of object”. Consistency? Who needs it?

(And no, Sam, procedure of stdcall object doesn’t compile. Sorry.)

You can even mix and match:

type
  TFoo = procedure assembler of object cdecl; far; //1

Where this gets really weird (what, it wasn’t already?) is where it means that you can have a semicolon in the middle of a variable declaration:

var
  Foo: procedure; stdcall = nil;

Try reading that as a sentence.

Makes me really glad I’m hand-coding my parser — automated parsing tools would’ve gone nuts with the ambiguity in something like “zero or more directives, followed by an optional ‘of object’, followed by zero or more possibly semicolon-delimited directives”. Not to mention the variable declaration with a semicolon in the middle. It might be doable with an automated tool, but not without a lot of pain. With a hand-coded parser, it’s just a place to write more automated tests to cover the goofy behavior.

But… goofy or not, it’s valid Delphi, so I’m going to do my best to parse it. So my ProcedureTypeNode now has two properties for directives: FirstDirectives (which comes before Of and Object), and SecondDirectives (which comes after). For any type declared without “of object”, SecondDirectives is always empty.
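A rough Python sketch of that two-list shape follows. It is not the real ProcedureTypeNode. The field and function names are made up, only "procedure" with no parameter list is handled, tokens are assumed lowercased, and I am assuming that when there is no "of object", any semicolon-separated directives simply land in the first list so the second stays empty.

```python
from collections import namedtuple

ProcedureType = namedtuple(
    "ProcedureType", ["first_directives", "of_object", "second_directives"])

# Illustrative directive set: calling conventions plus near/far/assembler
DIRECTIVES = {"stdcall", "cdecl", "pascal", "register", "safecall",
              "near", "far", "assembler"}

def parse_procedure_type(tokens):
    """Return (ProcedureType, number of tokens consumed)."""
    if tokens[0] != "procedure":
        raise SyntaxError("Expected ProcedureKeyword")
    pos = 1
    first = []
    # Before "of object": bare directives only, no semicolons allowed
    while pos < len(tokens) and tokens[pos] in DIRECTIVES:
        first.append(tokens[pos])
        pos += 1
    of_object = tokens[pos:pos + 2] == ["of", "object"]
    if of_object:
        pos += 2
    trailing = []
    # Afterwards: directives, each with an optional leading semicolon
    while True:
        if pos < len(tokens) and tokens[pos] in DIRECTIVES:
            trailing.append(tokens[pos])
            pos += 1
        elif (pos + 1 < len(tokens) and tokens[pos] == ";"
              and tokens[pos + 1] in DIRECTIVES):
            trailing.append(tokens[pos + 1])
            pos += 2
        else:
            break
    if of_object:
        return ProcedureType(first, True, trailing), pos
    return ProcedureType(first + trailing, False, []), pos

print(parse_procedure_type("procedure assembler of object cdecl ; far ;".split()))
print(parse_procedure_type("procedure ; stdcall = nil ;".split()))
```

In the second call the parser stops at the "=", leaving the initializer and the final semicolon for whoever is parsing the variable declaration.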

Sheesh.

1 No, assembler doesn’t make any sense in a procedural type. But that’s okay, because far doesn’t make any sense in Win32 at all. The compiler just ignores them both, so they’re harmless, if meaningless. The only other directives that are allowed on procedural types (besides near) are calling conventions, and I figured an example with assembler and far would be less confusing than an example with two or three contradictory calling conventions. (Which is perfectly valid, by the way, although I have no idea what it does.)

DGrok download updated

If you tried to download DGrok before now, you may have had trouble opening the ZIP. Sorry about that.

I use 7-Zip on my home dev machine, because it’s free and open-source. It also comes with a command-line EXE, so I made a Rake task to automatically build a ZIP file for the DGrok distribution (took me most of Sunday to get everything right). It, ah, didn’t occur to me that 7-Zip would default to using its own file format, instead of standard ZIP. (It worked fine on my machine!)

I dug through the docs and found the “no, really, make a ZIP file” parameter (-tzip, if you’re interested). The updated DGrok 0.1 is now available for download. Let me know if the download causes you any problems.