DGrok 0.5 released: Enter {$INCLUDE}

DGrok 0.5 is available for download.

Here’s what’s new:

  • {$INCLUDE} works. However, it doesn’t yet have a way to specify search paths for those include files. Currently it assumes that the included file is in the same directory as the including file. (Relative paths, such as {$I ..\Foo.inc}, should also work.) This also means that include files don’t work in the Ad-Hoc Parsing view, since the “current file” doesn’t have a directory.
  • In the demo app, double-clicking on a failed file no longer brings up the two-pane window with a “Parse” button; it only brings up one pane showing the file with the error. (Yes, it still puts the cursor at the error position.) It shows the file that actually contains the error, even if that’s an include file (cool!). The downside is that you can’t comment out the offending line and click “Parse” again, but I have no idea how that would work if you’re not looking at the top-level file.
  • {$DEFINE} and {$UNDEF} are now supported. They work with {$IFDEF}, {$IFNDEF}, {$IF Defined(...)}, and {$IF not Defined(...)}. (More complicated $IF expressions, like {$IF Defined(Foo) OR Defined(Bar)}, must still be addressed manually in code.)
  • Improved default list of conditional defines: CONDITIONALEXPRESSIONS, CPU386, MSWINDOWS, WIN32, and a stab at VERxxx.
  • Fixed colon syntax to allow two colons, as in Str(X:10:2, S), where the second colon gives the number of decimal places.
  • Allowed a number with a decimal point but no digits after it.
  • Demo app: added a place to specify file masks (so you can parse files other than *.pas).
  • Demo app: made source-tree parsing run in a background thread to improve responsiveness.
  • Strong typing on all node types. (As much as possible. There’s not much I can do for, say, statements, which can be represented by any number of different node types.)
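Here’s a rough Python sketch of the include rule described above: resolve the file name against the including file’s directory, with no search path. It isn’t DGrok’s code; resolve_include is a hypothetical helper, and it assumes Windows-style paths.

```python
import ntpath
import re

# Matches {$I name} or {$INCLUDE name}. Requiring whitespace after the
# directive name keeps switch directives such as {$I+} / {$I-} from matching.
_INCLUDE = re.compile(r"^\{\$(?:INCLUDE|I)\s+(.+?)\s*\}$", re.IGNORECASE)


def resolve_include(including_file, directive):
    """Return the path an include directive refers to, relative to the
    directory of the file that contains the directive (hypothetical helper)."""
    match = _INCLUDE.match(directive.strip())
    if match is None:
        raise ValueError(f"Not an include directive: {directive!r}")
    if not including_file:
        # Ad-hoc source typed into an edit box has no directory to resolve against.
        raise ValueError("Cannot resolve includes for source with no file name")
    # ntpath understands Windows backslashes even when run on other systems.
    directory = ntpath.dirname(including_file)
    return ntpath.normpath(ntpath.join(directory, match.group(1)))
```

With an empty file name (the Ad-Hoc Parsing case) the sketch raises an error instead of guessing a directory.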

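The {$DEFINE}/{$UNDEF} bookkeeping can be pictured with a small Python sketch. DefineTable and handle_directive are hypothetical names, not DGrok’s API. The sketch assumes that a symbol the parser has never heard of is still an error rather than quietly false (the behavior the 0.1 notes describe), and it only evaluates the four supported forms; anything fancier is rejected.

```python
import re


# Hypothetical sketch: tracks which conditional symbols are known to be
# defined or undefined, and refuses to guess about anything else.
class DefineTable:
    def __init__(self, defined=(), undefined=()):
        # Delphi conditional symbols are case-insensitive.
        self.defined = {s.upper() for s in defined}
        self.undefined = {s.upper() for s in undefined}

    def define(self, symbol):
        key = symbol.upper()
        self.defined.add(key)
        self.undefined.discard(key)

    def undef(self, symbol):
        key = symbol.upper()
        self.undefined.add(key)
        self.defined.discard(key)

    def is_defined(self, symbol):
        key = symbol.upper()
        if key in self.defined:
            return True
        if key in self.undefined:
            return False
        raise KeyError(f"Unknown conditional symbol: {symbol}")


_DIRECTIVE = re.compile(r"^\{\$(\w+)\s*(.*?)\}$", re.DOTALL)
_IF_DEFINED = re.compile(r"^(not\s+)?Defined\(\s*(\w+)\s*\)$", re.IGNORECASE)


def handle_directive(table, text):
    """Apply {$DEFINE}/{$UNDEF} (returns None) or evaluate a condition (returns bool)."""
    match = _DIRECTIVE.match(text.strip())
    if match is None:
        raise ValueError(f"Not a compiler directive: {text!r}")
    name, arg = match.group(1).upper(), match.group(2).strip()
    if name == "DEFINE":
        table.define(arg)
        return None
    if name == "UNDEF":
        table.undef(arg)
        return None
    if name == "IFDEF":
        return table.is_defined(arg)
    if name == "IFNDEF":
        return not table.is_defined(arg)
    if name == "IF":
        condition = _IF_DEFINED.match(arg)
        if condition is None:
            # e.g. {$IF Defined(Foo) OR Defined(Bar)}: not handled by this sketch.
            raise NotImplementedError(f"Unsupported $IF expression: {arg}")
        result = table.is_defined(condition.group(2))
        return not result if condition.group(1) else result
    raise NotImplementedError(f"Unsupported directive: {name}")
```

Seeding it with the default list above would look like DefineTable(defined=["CONDITIONALEXPRESSIONS", "CPU386", "MSWINDOWS", "WIN32"]).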
Next up: the Visitor pattern.

DGrok 0.4 released: grammar 100% done

I’m now able to parse 100% of the Delphi grammar, as far as I can tell — and 100% of the Delphi RTL source. (Not all the source — just the RTL directory and its subdirectories — but still.) Sounds like an excellent time to do another release. DGrok 0.4 is now available.

Important caveat: I’m not yet handling {$INCLUDE}. And I know the Delphi RTL uses it, so it’s possible I’m not parsing everything — there may well be some gotchas inside those include files. {$INCLUDE} will most likely be the next thing I tackle.

For any who are curious, I didn’t make Str’s colon a general binary operator, because that broke case statements, where a colon already separates each case label from its statement. (This is why I would never dream of writing a parser any way but test-first!) Instead, I added another expression type, ParameterExpression, that’s only used in parameter lists.
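A rough Python sketch of that idea follows. It isn’t DGrok’s parser: the token list, the ParameterExpression fields (value, width, precision), and the one-token placeholder standing in for a full expression are all assumptions. The point is that the colon is only looked for while parsing an argument, so it never competes with a case label’s colon. Up to two colons are accepted, as in Str(X:10:2, S).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParameterExpression:
    value: str                       # the argument itself, e.g. val
    width: Optional[str] = None      # after the first colon, e.g. 0
    precision: Optional[str] = None  # after a second colon, if any


class ArgumentListParser:
    """Parses '(' param (',' param)* ')' from a pre-split token list."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"Expected {expected or 'a token'} but found {token}")
        self.pos += 1
        return token

    def expression(self) -> str:
        # Placeholder: a real parser would parse a full expression here.
        return self.take()

    def parameter(self) -> ParameterExpression:
        # The colon is only checked for here, inside an argument list.
        param = ParameterExpression(self.expression())
        if self.peek() == ":":
            self.take(":")
            param.width = self.expression()
            if self.peek() == ":":
                self.take(":")
                param.precision = self.expression()
        return param

    def argument_list(self) -> List[ParameterExpression]:
        self.take("(")
        args: List[ParameterExpression] = []
        if self.peek() != ")":
            args.append(self.parameter())
            while self.peek() == ",":
                self.take(",")
                args.append(self.parameter())
        self.take(")")
        return args
```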


The end was in sight. 18 files were failing with “Expected EndKeyword but found WithKeyword”, 2 with “Expected FinallyKeyword but found WithKeyword”. Parsing the “with” statement looked like the only thing between me and 100% parsing of the Delphi RTL source.

So I wrote the tests for RuleType.WithStatement, made them pass, and…

Expected CloseParenthesis but found Colon (1)

D’oh! Now that it was getting past the with statement, it found something else in System.pas. I forgot about Str(), one of those pesky compiler-magic functions. Here’s the offending line of code:

Str(val:0, S);

I think Str is the only function in Delphi that takes that colon syntax. How much code do you suppose is hanging around in the compiler just to handle that one special case?

Sigh. Well, I’ll probably just implement the colon as another binary operator, and figure out what its precedence should be. When I release 0.4 (probably no later than this weekend), it should be able to fully parse all the Delphi RTL source files.

Well, except for {$INCLUDE}. But that’s a different feature…

DGrok 0.3 released

Halfway there! DGrok can now parse 46 of the 91 Delphi RTL source files — a hair over half.

Except that, of course, it’s way more than halfway; it took a lot of work to get this far. Most of what’s left is the various statements like repeat, with, try..finally, etc.

And then, of course, there’s the fact that it’s probably way less than halfway. Include files ({$I} / {$INCLUDE}) aren’t working, and I haven’t figured out how they’ll fit into the demo app yet. And there’s the whole issue around symbol tables, which I’ll need to do anything useful like refactoring.

Still, it’s a major milestone, so I figured I’d do a release. DGrok 0.3 is now available for download. Major new stuff in this release:

  • If you double-click a file that’s failing because of a compiler directive, it will now take you to the error location rather than showing a .NET “unhandled exception” dialog.
  • Began adding statement handling. Currently it can’t handle much — mostly method calls, assignments, and if statements. This is my main area of focus right now.
  • Parsing of method implementations, including the smarts to not expect a method body if the method is declared “forward” or “external”.
  • Parsing of unit implementation sections.
  • Parsing of “program” and “library” files.
  • Fixed parsing of “const” sections that come inside a class or record. (When the const section was followed by another visibility section, it was getting confused and thinking the “public” was the name of another constant; it doesn’t anymore.)
  • Many minor tweaks to the grammar (it turns out that semicolons are optional after field declarations; I didn’t have threadvar in the grammar yet; initialized record-type variables; operator overloads; that sort of thing).
  • Lots of exciting behind-the-scenes stuff that you wouldn’t recognize as cool unless you’d already been working with the code: strongly-typed node properties, generic ListNode and DelimitedItemNode, and partial classes.
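The const-section fix in the list above comes down to knowing when a section ends. Visibility specifiers such as public are directives rather than reserved words, so to a naive loop they look like ordinary identifiers. Here’s a simplified Python sketch, assuming pre-split tokens, untyped constants with single-token values, and a hypothetical (deliberately incomplete) STOP_WORDS set:

```python
STOP_WORDS = {
    # Visibility specifiers are directives, not reserved words, so without
    # this check "public" would be read as the name of another constant.
    "private", "protected", "public", "published", "strict", "automated",
    # Some reserved words that can also follow a const section (not exhaustive).
    "end", "const", "type", "var", "procedure", "function", "property",
    "constructor", "destructor", "class",
}


def parse_const_section(tokens, pos):
    """Parse 'const Name = Value; ...' and return (constants, next position).

    Simplified: each value is a single token and typed constants are not handled.
    """
    if tokens[pos].lower() != "const":
        raise SyntaxError(f"Expected const but found {tokens[pos]}")
    pos += 1
    constants = []
    while (pos < len(tokens)
           and tokens[pos].isidentifier()
           and tokens[pos].lower() not in STOP_WORDS):
        if tokens[pos + 1:pos + 2] != ["="] or tokens[pos + 3:pos + 4] != [";"]:
            raise SyntaxError(f"Malformed constant declaration at {tokens[pos]}")
        constants.append((tokens[pos], tokens[pos + 2]))
        pos += 4
    return constants, pos
```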

U of I, ISU selling student data to credit-card companies

I was in Ames this weekend at a church meeting. On the way back, I stopped at a gas station that offers free Sunday papers with a tank of gas.

Cover story: U of I, ISU use student data to sell credit cards. Subhead: “Bank of America cards are promoted despite public worries about debt loads”.

I can’t even believe it. Someone had the balls to think it was okay to sell private student data to these predators? Someone thought it was okay to railroad these students into debt? (Yes, the students can choose whether to go into debt — but it’s not an educated choice; nobody’s teaching them what debt is actually going to mean to their lives. Nobody but the credit-card companies… and, apparently, these two universities.)

How many students already declare bankruptcy as their first act after graduation? We want this to get worse?

Apparently most of the money isn’t even going to benefit the schools; it’s going to privately-run alumni associations. Alumni associations. They’re preying on the younger generation. And these people can sleep at night?

If I still lived in Iowa, I’d be working my butt off writing to the legislature right now. As it is, I should probably do what I can to make sure this doesn’t happen in Nebraska (or gets fixed, if it’s already happening), and write to the federal Congress as well. This is unacceptable.

Somebody should be getting fired over this stunt. Somebody should be getting arrested.

DGrok 0.2 released

DGrok 0.2 is now available.

It can now parse 13 of the Delphi RTL source files (only 78 more to go). It should be able to parse interface sections in full, with the exception of class (and record) helpers and records with variant sections.

I also made some improvements to the demo app. The biggest change, apart from fixing the hard-coded path, is that you can double-click a filename in the treeview. This brings up that file in a new window — and if there was an error parsing the file, the cursor is positioned at the error location. (But no, there’s no GUI yet for specifying IFDEFs.)

Next stop: Statements.

Parsing directives

You learn the darndest things about a language when you try to parse it.

Directives — the decorations that go after a method heading, like stdcall, virtual, etc. — were, I had thought, always set off by semicolons:

procedure Foo; forward;
procedure Bar; stdcall; deprecated;

Until I ran my parser on Math.pas, and it choked on this:

procedure SinCos(const Theta: Extended; var Sin, Cos: Extended) register;

Well, that’s just great. I had thought that directives came after the required semicolon, and that each directive had its own required trailing semicolon. Now I come to find that they come before the required semicolon, and each directive has its own optional, leading semicolon. Which meant I had to spend a couple of hours rearranging my grammar and updating my regression tests, since all directives now needed to use the same class (with its anomalous leading semicolon) and the logical order of MethodHeadingNode’s properties had changed.
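That rearranged rule (zero or more directives, each with an optional leading semicolon, followed by the required semicolon) needs one token of lookahead to tell a directive’s leading semicolon from the semicolon that ends the heading. A Python sketch under simplifying assumptions: tokens are already split, DIRECTIVES is a hypothetical subset, and directive arguments (such as a library name after external) are ignored. It also ignores identifiers that happen to share a directive’s name, which a real parser has to cope with.

```python
# A representative (not exhaustive) set of method directives.
DIRECTIVES = {
    "forward", "external", "register", "pascal", "cdecl", "stdcall",
    "safecall", "virtual", "dynamic", "override", "abstract", "overload",
    "reintroduce", "inline", "assembler", "deprecated", "platform",
}


def _is_directive(tokens, i):
    return i < len(tokens) and tokens[i].lower() in DIRECTIVES


def parse_heading_directives(tokens, pos):
    """Parse directives after a method heading, then the required semicolon.

    A semicolon counts as a directive's optional leading semicolon only if the
    token after it is a directive; otherwise it is the required semicolon.
    Returns (directives, position after the required semicolon).
    """
    directives = []
    while True:
        if tokens[pos:pos + 1] == [";"] and _is_directive(tokens, pos + 1):
            pos += 1  # skip the optional leading semicolon
        if not _is_directive(tokens, pos):
            break
        directives.append(tokens[pos].lower())
        pos += 1
    if tokens[pos:pos + 1] != [";"]:
        raise SyntaxError("Expected Semicolon after method heading")
    return directives, pos + 1
```

Fed the tokens that follow each heading, the three examples above come out as [";", "forward", ";"], [";", "stdcall", ";", "deprecated", ";"], and ["register", ";"].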

And then there are procedural types (type TFoo = procedure, procedure of object, etc.). I hadn’t even realized that they could have directives, since most directives don’t make any sense for a procedural type, but I’d forgotten about calling conventions. As it turns out, not only do procedural types support directives, they support three very distinct syntaxes:

TFoo = procedure of object; stdcall;
TBar = procedure of object stdcall;
TBaz = procedure stdcall of object;

Yep, any directives can show up either before or after “of object”. And the semicolons are optional in the ones after “of object”. But no, you can’t put any semicolons before “of object”. Consistency? Who needs it?

(And no, Sam, procedure of stdcall object doesn’t compile. Sorry.)

You can even mix and match:

TFoo = procedure assembler of object cdecl; far; //1

Where this gets really weird (what, it wasn’t already?) is that you can end up with a semicolon in the middle of a variable declaration:

Foo: procedure; stdcall = nil;

Try reading that as a sentence.

Makes me really glad I’m hand-coding my parser — automated parsing tools would’ve gone nuts with the ambiguity in something like “zero or more directives, followed by an optional ‘of object’, followed by zero or more possibly semicolon-delimited directives”. Not to mention the variable declaration with a semicolon in the middle. It might be doable with an automated tool, but not without a lot of pain. With a hand-coded parser, it’s just a place to write more automated tests to cover the goofy behavior.

But… goofy or not, it’s valid Delphi, so I’m going to do my best to parse it. So my ProcedureTypeNode now has two properties for directives: FirstDirectives (which comes before Of and Object), and SecondDirectives (which comes after). For any procedural type without “of object”, SecondDirectives is always empty.
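Here’s a Python sketch of how the two directive lists could be filled in. The ProcedureTypeNode fields and helper names are hypothetical, the parameter list and result type are skipped, and the directive set is limited to what the footnote below says is allowed on procedural types (calling conventions plus assembler, far, and near). Directives on either side may carry a leading semicolon, but any semicolon before “of object” is rejected.

```python
from dataclasses import dataclass, field
from typing import List

# Directives allowed on procedural types: calling conventions plus assembler, far, near.
PROC_TYPE_DIRECTIVES = {
    "register", "pascal", "cdecl", "stdcall", "safecall",
    "assembler", "far", "near",
}


@dataclass
class ProcedureTypeNode:
    first_directives: List[str] = field(default_factory=list)
    of_object: bool = False
    second_directives: List[str] = field(default_factory=list)


def _is_directive(tokens, i):
    return i < len(tokens) and tokens[i].lower() in PROC_TYPE_DIRECTIVES


def _parse_directives(tokens, pos):
    """Zero or more directives, each with an optional leading semicolon.
    Returns (directives, new position, whether any semicolon was used)."""
    found, used_semicolon = [], False
    while True:
        if tokens[pos:pos + 1] == [";"] and _is_directive(tokens, pos + 1):
            pos += 1
            used_semicolon = True
        if not _is_directive(tokens, pos):
            return found, pos, used_semicolon
        found.append(tokens[pos].lower())
        pos += 1


def parse_procedure_type(tokens, pos=0):
    """Parse 'procedure [directives] [of object [directives]]'.
    Stops before whatever ends the type (';' or '= nil', for example)."""
    if [t.lower() for t in tokens[pos:pos + 1]] != ["procedure"]:
        raise SyntaxError("Expected procedure")
    node = ProcedureTypeNode()
    node.first_directives, pos, used_semicolon = _parse_directives(tokens, pos + 1)
    if [t.lower() for t in tokens[pos:pos + 2]] == ["of", "object"]:
        if used_semicolon:
            raise SyntaxError("Semicolons are not allowed before 'of object'")
        node.of_object = True
        node.second_directives, pos, _ = _parse_directives(tokens, pos + 2)
    return node, pos
```

For Foo: procedure; stdcall = nil; the sketch stops at the “=” with stdcall in first_directives, leaving the initializer to the variable-declaration rule.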


1 No, assembler doesn’t make any sense in a procedural type. But that’s okay, because far doesn’t make any sense in Win32 at all. The compiler just ignores them both, so they’re harmless, if meaningless. The only other directives that are allowed on procedural types (besides near) are calling conventions, and I figured an example with assembler and far would be less confusing than an example with two or three contradictory calling conventions. (Which is perfectly valid, by the way, although I have no idea what it does.)

DGrok download updated

If you tried to download DGrok before now, you may have had trouble opening the ZIP. Sorry about that.

I use 7-Zip on my home dev machine, because it’s free and open-source. It also comes with a command-line EXE, so I made a Rake task to automatically build a ZIP file for the DGrok distribution (took me most of Sunday to get everything right). It, ah, didn’t occur to me that 7-Zip would default to using its own file format, instead of standard ZIP. (It worked fine on my machine!)

I dug through the docs and found the “no, really, make a ZIP file” parameter (-tzip, if you’re interested). The updated DGrok 0.1 is now available for download. Let me know if the download causes you any problems.
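The Rake task itself isn’t shown here, but the fix amounts to one extra switch on the 7-Zip command line. A Python sketch that builds and runs that command follows; it assumes the 7-Zip command-line executable is on the PATH as 7z, and any archive or file names you pass in are placeholders.

```python
import subprocess
from typing import List, Sequence


def seven_zip_command(archive: str, files: Sequence[str]) -> List[str]:
    # "a" adds files to an archive; "-tzip" explicitly selects the standard
    # ZIP archive type rather than leaving the format up to 7-Zip.
    return ["7z", "a", "-tzip", archive, *files]


def build_zip(archive: str, files: Sequence[str]) -> None:
    # check=True raises CalledProcessError if 7z exits with a non-zero status.
    subprocess.run(seven_zip_command(archive, files), check=True)
```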

DGrok 0.1 released

I can successfully parse four of the source files that ship with Delphi. I’d say that’s a major milestone. So I’m releasing version 0.1 (alpha) of DGrok.

The source code is included in the download. DGrok is open-source, under the Open Software License (I’d rather use GPL, but I’m stuck with OSL because of NUnitLite).

I’ve included a GUI demo app that shows off the current capabilities a bit. It has two major screens: Ad-Hoc Parsing and Parse Source Tree.

Screenshot of DGrok's Ad-Hoc Parsing screen

Ad-Hoc Parsing lets you type in some source code, select which parse rule you want to use, and click Parse (Alt+P). The box in the lower right shows either the parse tree (if parsing was successful) or the error message. Additionally, if there was an error, the focus is put back in the edit box, and the cursor is moved to the error location.

If you want to type an entire source file, select the Goal rule (this is selected by default when the app starts). Or you could use Unit, if you know it’s really a unit and not a project, library, or package. If you just want to play around with expression parsing, select Expression. Or whatever. There’s a Grammar.html included in the ZIP that shows which rules are working in this release, and to what extent.

Screenshot of DGrok's Parse Source Tree screen

The Parse Source Tree tab lets you point DGrok at a directory, and set it loose. It will automatically search through subdirectories for .pas files (I should probably make it look at .dpr and .dpk files as well), load them, and try parsing them. (Since it knows they’re entire files, it doesn’t need to ask you which rule to apply; it automatically uses Goal.) If a file parses successfully, it gets listed under the “Passing” node; otherwise, the files are listed by error message. As you can see, there are still a lot of errors, so I must not be done yet.
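The Parse Source Tree behavior (walk subdirectories, pick files by mask, and sort them into passing files and failures grouped by error message) can be sketched in Python. parse_source_tree is a hypothetical name, and parse_file stands in for the real parser: it is assumed to raise an exception whose message is the parse error. The masks parameter is an assumption modeled on the file-mask option mentioned in the 0.5 notes.

```python
import fnmatch
import os
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def parse_source_tree(
    root: str,
    parse_file: Callable[[str], object],
    masks: Sequence[str] = ("*.pas",),
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (passing files, {error message: failing files})."""
    passing: List[str] = []
    failures: Dict[str, List[str]] = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            # Compare in lowercase so *.pas also matches Foo.PAS on any OS.
            if not any(fnmatch.fnmatchcase(name.lower(), m.lower()) for m in masks):
                continue
            path = os.path.join(dirpath, name)
            try:
                parse_file(path)
            except Exception as error:  # group every failure by its message
                failures[str(error)].append(path)
            else:
                passing.append(path)
    return passing, dict(failures)
```

Passing masks=("*.pas", "*.dpr", "*.dpk") would cover the project and package files the paragraph above mentions.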

Note that there isn’t currently a GUI for telling it which $IFDEFs evaluate to “true” and which evaluate to “false”. And if it doesn’t know whether something is true or false, it bombs out with an error. This is on purpose — I wanted to make absolutely sure I didn’t miss anything that should be defined — but it’s probably inconvenient if you’re trying to parse anything other than the Delphi RTL source that I’ve already tuned it for. I’ll get a GUI for this in a future version.

Happy parsing!