DGrok 0.8.1: multithreading, default options, GPL

Version 0.8.1 of the DGrok Delphi parser and tools are now available for download. Download DGrok 0.8.1 here.

What is DGrok?

DGrok is a set of tools for parsing Delphi source code and telling you stuff about it. Read more about it on the DGrok project page.

What’s new in 0.8.1?

Quick summary of what’s new (more information below):

  • Now GPL-licensed.
  • Reasonable defaults for {$IFOPT}.
  • Multithreaded parser.
  • Less memory usage when parsing twice.
  • Copy tree results to clipboard.

Now GPL-licensed

Prior versions of DGrok used NUnitLite for their unit tests, and therefore had to ship under the same license as NUnitLite: the OSL (Open Software License). I’ve never been happy about that. The world really doesn’t need yet another tiny variation on the GPL, especially when that variation isn’t GPL-compatible.

So for this release, I dumped NUnitLite and switched to NUnit. That let me drop the OSL and switch to an industry-standard open-source license, the GPL (GNU General Public License).

There are a few downsides. NUnit has tremendous overhead; on my laptop, it takes about fifteen seconds just to start the NUnit console runner and load the tests, plus the time to run them (which is also slower than under NUnitLite). It also adds an extra 321 Kb to the download size. And now I have to clutter my test code with a bunch of stupid [Test] attributes.

If I think that’s a good trade, then apparently the OSL annoyed me more than I thought.

Reasonable defaults for “{$IFOPT}”.

An annoyance in previous versions (even to me) was that, if you were parsing code that contained things like {$IFOPT C+}, you would have to switch to DGrok’s “Options” page to tell it which compiler settings it should consider to be “on” and which are “off”. If it hit an {$IFOPT} you hadn’t told it about, it would fail to parse that source file.

In 0.8.1, that’s no longer the case. DGrok knows about the default compiler options in a clean install of Delphi, and by default, it assumes you’re using those options. You can still use the Options page to override those settings one by one (e.g. if you compile with range checking on, and want DGrok to parse code inside your {$IFOPT R+} sections), but it’s no longer necessary to do it for every single option.

If anyone’s curious, here are the settings DGrok uses. I just opened Delphi (actually Turbo Delphi) and pressed Ctrl+O Ctrl+O, which prefixes the current file with all the compiler directives currently in effect. Then I did a bit of testing on the odd cases, like A and Z (which can have numbers in addition to + or -, and which do have numbers when inserted by Ctrl+O Ctrl+O). Here’s what I wound up with:

B-, C+, D+, E-, F-, G+, H+, I+, J-, K-, L+, M-, N+, O+, P+, Q-, R-, S-, T-, U-, V+, W-, X+, Y+, Z-

You may notice that A isn’t listed. A is an oddball case, in that it’s treated as neither on nor off. That is, {$IFOPT A+} and {$IFOPT A-} will both evaluate as “false”. There’s a compelling reason for that: it’s what Delphi does under the default settings! So don’t blame me; I’m just being compatible with the real Delphi compiler.

Multithreaded parser

When you use the DGrok demo app to parse a source tree, it now spins up multiple threads to do the parsing. There’s a setting on the “Options” tab to control how many threads you want it to use.

I actually implemented this a few months back, and since then, it’s occurred to me that I was making the problem too complicated — life would be simpler if I’d just used the ThreadPool, and queued a work item for each file I wanted to parse. Oh well; what’s there seems to work. I’ll probably do the thread-pool thing in the future, though.

Less memory usage when parsing twice

I’m embarrassed by this one. In previous versions, if you clicked “Parse” more than once in the same program run (e.g. if you were tweaking the “Options” to deal with {$IFOPT}s), DGrok would temporarily take twice as much memory as it needed to. That’s because I built the new list, and then stored it in the top-level variable… so the old list (stored in that same variable) was still “live” as far as the GC knew, up until the point when I overwrote its reference at the very end.

It’s better now — it nulls out the reference before it starts parsing, so the old list gets GCed as soon as the new parse run starts allocating gobs of memory. So if you regularly parse a million-line code base (like I do), you’ll notice significantly less thrashing.

Copy tree results to clipboard

Pretty simple. There’s a “Copy” button under the tree that shows the parse results. This is mainly useful when you’ve used DGrok to search for, for example, all the with statements in your code, and now want to copy that list into Excel for easy sorting and printing.

Happy parsing!

Parallel processing made easy

Microsoft Research and the CLR team are working on something called the Task Parallel Library, which looks like it will make .NET-based parallel processing a whole lot easier than it is today. The library is supposed to be released as a CTP sometime yet this year.

The article in MSDN Magazine has all the details, and is a great read. The parallel “for” loop looks extremely useful, and accordingly gets a lot of column inches in the article — including some tips on how much parallelism is too much, and things to think about when you tune its performance. They’ve also got a convenient way to execute two (or more) methods in parallel. The higher-level stuff is all implemented in terms of task and future classes that you can use on their own.

One of the best things about this library is that it handles exceptions nicely. If you run a loop in parallel, and any of the parallel threads throws an exception, all the other threads are canceled, and the exception is re-thrown in the place you originally started the loop — just like it would be if you’d used an ordinary “for” loop. This exception handling is baked into the base Task class, so it’s there for everything in the library.

I particularly liked some of the ways they squeeze the best performance out of the library. For example, if you create a future, and later you ask for its return value, but it hasn’t yet started to run, it’ll just be run immediately on the calling thread. No task switching. That’s kinda cool! They also optimize for single-core (e.g., Parallel.For just runs as a simple loop if you only have a single-core processor), and they scale up the number of worker threads if there’s a lot of blocking going on.

All in all, this looks like it’ll be a terrific library when it ships. Of course, I found this article after I’d put experimental multithreading support into DGrok! I’ll probably rip out my code and use theirs when they ship — they’re much more expert at multithreading than I am, so not only are they a lot more likely to get all the fringe cases right than I am, but they should be faster than mine as well.

Thanks to Allen Bauer for the link to the MSDN article.

DGrok 0.8: Faster and more configurable

DGrok 0.8 is now available.

You can now configure compiler options (IFDEFs, IFOPT, etc.) from the GUI. Your settings (compiler options, search paths) are saved from session to session. And the parser is probably around 20-25% faster (thanks to Brian for profiling and finding the hot spots).

Here’s the list of new GUI features:

  • FIX: .dproj-file exclusion got broken in 0.7. Made it work again.
  • CHG: Rules are now shown in a treeview instead of as hyperlinks. And the “Parse” button is back (along with its Alt+P accelerator).
  • ENH: Much-requested: Added “Compiler Options” page, where you can specify what to do with IFDEFs and IFOPTs.
  • ENH: Much-requested: Save settings (including search paths and compiler options) on exit, and reload on program startup.
  • ENH: Sped up parsing.
  • ENH: Show elapsed time after parsing.
  • ENH: When running an action, show a summary node with the action name and the number of hits.
  • ENH: Give a better error on duplicate filenames (it now shows these errors in the tree, instead of popping up an “unhandled exception” dialog).
  • ENH: Added Find (Ctrl+F) and Find Next (F3) to source-edit boxes.

New “find stuff” actions:

  • ENH: Added FindAsmBlocks action.
  • ENH: Added FindVariantRecords action.

Things you’ll see if you’re using the parser in your own code:

  • FIX: ToCode() was throwing exceptions on nodes with a trailing, empty ListNode (commonly seen with “procedure of object”). Fixed.
  • CHG: All node-type properties now end with the word Node.
  • ENH: Add a Projects collection to CodeBase.
  • ENH: Added ParentNode property to AstNode.
  • ENH: Added ChildNodes property to AstNode. When you need to iterate a node’s children, this is easier than using Properties. It also skips over nulls.
  • ENH: Added ParentNodeOfType<> method to AstNode.

DGrok 0.7 released: putting the parser to work

DGrok 0.7 is now available.

What’s new:

  • FIX: Initialized variables and typed constants now accept the empty list, ‘()’. The grammar doc already indicated this, but the code didn’t do it. It does now.
  • CHG: In package files, allow assembly attributes after the ‘requires’ and ‘contains’ clauses, rather than before. (I know they can come after, and that’s what I currently need to parse. If they can come other places, I’ll have to extend this later.)
  • ENH: Added support for undocumented {$SOPREFIX}, {$SOSUFFIX}, and {$SOVERSION} compiler directives.
  • ENH: The “search paths” box now allows a semicolon-delimited list of directories. As a side effect, there’s no longer a Browse button.
  • ENH: The “search paths” box now lets you put “\**” at the end of a search path. This means “include subdirectories”. Without that, it only does one specific directory. In a semicolon-delimited list, this gives you the ability to recurse into some directories and not others.
  • ENH: Added features to find all “with” statements, nested methods, and global variables in the code.

You’ll notice that there’s no longer a “Parse” button; instead it’s a hyperlink. This means that it no longer has a keyboard accelerator. Sorry — I miss it too. I’ll probably add a shortcut at some point.

There are five hyperlinks in this version:

  • Parse All: parses all the source files in the search path(s). You have to do this before any of the other links will enable.
  • Show Parse Results: resets the display to show the results of Parse All. This is only useful after you’ve clicked one of the later links, and want to get back to the thing that shows which files did and didn’t parse successfully.
  • FindGlobalVariables: Gives you a list of all global variables.
  • FindNestedMethods: Gives you a list of all nested methods (procedure inside a procedure, etc.).
  • FindWithStatements: Gives you a list of all the “with” statements.

Those last three are the bare beginnings of DxCop, a tool I’m eventually planning to write on top of the DGrok parser. DxCop (named after the .NET Framework’s FxCop) will analyze your code, looking for places where you aren’t following best practices. FxCop looks at your compiled code, while DxCop will look at the source, but it’s the same idea.

For those interested in looking at the code, all three “find” features will show you how to put the Visitor pattern to use. They’re only the beginning. Feel free to suggest other such features to add (or to send me code) — but keep in mind that DGrok doesn’t yet have symbol-table support, so anything involving “find references” isn’t possible just yet.

DGrok 0.6.5 released: fixed $IFNDEF

DGrok 0.6.5 is available. It’s a minor bugfix release, hence the half-version.

What’s new:

  • FIX: {$IFNDEF UnrecognizedSymbol} was ignoring its contents. That’s the right thing for {$IFDEF} but not for {$IFNDEF}. Fixed.
  • ENH: Allow attributes in package files.
  • ENH: Add support for {$INLINE} and {$STACKCHECKS} undocumented compiler directives.

Ordinarily I would have gotten more done in an evening. It’s all Sam’s fault for distracting me with his Erlang talk last night.

DGrok 0.6 released: enter Visitor

DGrok 0.6 is now available for download.

New in this version:

  • All documented compiler directives are supported (plus one undocumented one, {$VARSTRINGCHECKS})
  • Added support for hard casts to the file keyword, e.g., file(AUntypedVarParameter)
  • Changed PointerType to allow any type, not just QualifiedIdent. This allows things like ^string.
  • Fixed runtime cast error when a program or unit has a beginend block or asm block for its initialization section. For simplicity’s sake, all init sections are now represented by InitSectionNode.
  • Ignore .dproj files when you search for *.dpr.
  • Ignore {$DEFINE} and {$UNDEF} when they occur inside a false {$IF...} block. Oops.
  • Ignore unrecognized compiler directives if they’re inside a false {$IF...} block. This lets me ignore, say, FreePascal compiler directives (since they usually occur inside {$IFDEF FPC} blocks).
  • Allow dotted names for packages (in the “Package Name” line).
  • Contains clauses now support “in filename” specifiers. (Actually, contains clauses are now handled by the same rule as uses clauses.)
  • Added support for [assembly: Attribute(...)] syntax. Yes, mostly I’ve been focusing on parsing Win32, but I have some need to deal with .NET code as well.
  • Changed it so that not-yet-defined symbols are treated as false by {$IFDEF} rather than explicitly raising an error. The error served its purpose earlier, but I reached the pain point and shut it off.
  • Added a Visitor class.

Whew. Not bad for one evening’s work.

I think Visitor will really be the way you put the parser to work. I was hoping to get some examples into today’s build, but I ran out of time. Play around if you like.

8.3 backward compatibility, or, why *.dpr returns .dproj files

CodeGear, it turns out, chose poorly when they picked .dproj as the extension to replace .bdsproj.

If you have Delphi 2007 and have tried to run DGrok 0.5, you’ve already seen what might look like a bug in DGrok: when the file masks include *.dpr, DGrok looks at all of your .dproj files as well. Obviously this is wrong, right? I mean, you searched for *.dpr, not *.dpr*. And since .dproj files are XML (which doesn’t so much fit with the Delphi grammar), every .dproj file is listed as a parse failure.

The thing is, it’s not really a DGrok bug (although it is broken, whoever’s fault it may be, and I will have a fix in 0.6). It’s a combination of a Windows backwards-compatibility feature, and someone at CodeGear not knowing about that Windows backwards-compatibility feature (and making a bad choice as a result).

First, a bit of background. Apparently there exist applications — perhaps vital line-of-business apps that run Fortune 500 companies — that still, 12 years after Windows 95 was released, can’t deal with anything but 8.3 filenames. That isn’t too surprising, really; if the app still works, who’s going to mess with it? But (here’s the kicker) some of these 8.3-only apps use the full Win32 API to search for files. I haven’t quite puzzled out how that combination comes to be — it’s not like there was a time when there were Win32 APIs but not long filenames — but after a few years of reading Raymond Chen‘s blog, I’m forced to believe that such apps really do exist somewhere (and I suspect I really don’t want to know how many there are).

Every file has a long filename and and a short (8.3) filename. (I think you can turn off 8.3 filename generation, but nobody does, because 8.3 filenames are handy for apps that can’t deal with spaces in path names.) So if you create a file called, say, MyProj.dproj, it will also be given an 8.3 name like, say, MyProj.dpr.

And because of the possibility of 32-bit 8.3 apps, whenever you supply a search mask, the 32-bit file-search APIs (FindFirstFile, etc.) will match that mask against both the long filename and the short filename. Which means, if you search for *.dpr, Windows will happily give you both .dpr files (whose long and short names both match *.dpr) and .dproj files (whose short names match *.dpr, even though their long names do not). And it doesn’t even have the decency to return the short filename when that’s what it matched — no, it returns the full long filename, the one that specifically didn’t match the mask you gave it.

Yes, it’s all very weird, and the Windows behavior isn’t exactly friendly. Then again, it’s been that way for, let’s see, 12 years. This behavior is a known, a constant. And it’s up to us, as software developers, to deal with it. It’s just another developer tax we have to pay, like it or not.

Which means that, CodeGear, you blew it with that .dproj extension. By choosing something that’s 8.3-ambiguous with .dpr, you made life hard for every third-party vendor who wants to do anything with Delphi source files. Here’s the rule (for future reference): When you invent a new extension, its first 3 characters should never be the same as an exactly-3-character extension you already own.

I just coded a workaround for 8.3 compatibility, which will be in the next release of DGrok. Specifically, I added a loop that looks at each filename and grabs its extension. If the extension is “.dproj” (case insensitive), I skip over that file. (This means that if you’re silly enough to type in a mask of “*.dproj”, the new version won’t find any files at all.) Clumsy? Yeah. But .dproj files are the only ones (currently) likely to come up as false positives, and they’ll never contain valid Delphi code (unless, of course, you specifically rename them to taunt me). And an awkward fix is better than a broken app.

DGrok 0.5 released: Enter {$INCLUDE}

DGrok 0.5 is available for download.

Here’s what’s new:

  • {$INCLUDE} works. However, it doesn’t yet have a way to specify search paths for those include files. Currently it assumes that the included file is in the same directory as the including file. (Relative paths, such as {$I ..\Foo.inc}, should also work.) This also means that include files don’t work in the Ad-Hoc Parsing view, since the “current file” doesn’t have a directory.
  • In the demo app, double-clicking on a failed file no longer brings up the two-pane window with a “Parse” button; it only brings up one pane showing the file with the error. (Yes, it still puts the cursor at the error position.) It shows the file that actually contains the error, even if that’s an include file (cool!). The downside is that you can’t comment out the offending line and click “Parse” again, but I have no idea how that would work if you’re not looking at the top-level file.
  • {$DEFINE} and {$UNDEF} are now supported. They work with {$IFDEF}, {$IFNDEF}, {$IF Defined(...)}, and {$IF not Defined(...)}. (More-complicated $IF expressions, like {$IF Defined(Foo) OR Defined(Bar)}, must still be addressed manually in code.)
  • Improved default list of conditional defines: CONDITIONALEXPRESSIONS, CPU386, MSWINDOWS, WIN32, and a stab at VERxxx.
  • Fixed colon syntax to allow two colons.
  • Allow a number with a decimal point but no digits after it.
  • Demo app: added a place to specify file masks (so you can parse files other than *.pas).
  • Demo app: made source-tree parsing run in a background thread to improve responsiveness.
  • Strong typing on all node types. (As much as possible. There’s not much I can do for, say, statements, which can be represented by any number of different node types.)

Next up: the Visitor pattern.

DGrok 0.4 released: grammar 100% done

I’m now able to parse 100% of the Delphi grammar, as far as I can tell — and 100% of the Delphi RTL source. (Not all the source — just the RTL directory and its subdirectories — but still.) Sounds like an excellent time to do another release. DGrok 0.4 is now available.

Important caveat: I’m not yet handling {$INCLUDE}. And I know the Delphi RTL uses it, so it’s possible I’m not parsing everything — there may well be some gotchas inside those include files. {$INCLUDE} will most likely be the next thing I tackle.

For any who are curious, I didn’t make Str‘s colon a general binary operator, because that broke case statements. (This is why I would never dream of writing a parser any way but test-first!) Instead, I added another expression type, ParameterExpression, that’s only used in parameter lists.


The end was in sight. 18 files were failing with “Expected EndKeyword but found WithKeyword”, 2 with “Expected FinallyKeyword but found WithKeyword”. Parsing the “with” statement looked like the only thing between me and 100% parsing of the Delphi RTL source.

So I wrote the tests for RuleType.WithStatement, made them pass, and…

Expected CloseParenthesis but found Colon (1)

D’oh! Now that it was getting past the with statement, it found something else in System.pas. I forgot about Str(), one of those pesky compiler-magic functions. Here’s the offending line of code:

Str(val:0, S);

I think Str is the only function in Delphi that takes that colon syntax. How much code do you suppose is hanging around in the compiler just to handle that one special case?

Sigh. Well, I’ll probably just implement the colon as another binary operator, and figure out what its precedence should be. When I release 0.4 (probably no later than this weekend), it should be able to fully parse all the Delphi RTL source files.

Well, except for {$INCLUDE}. But that’s a different feature…