I found a chapter called “Object Pascal Grammar” in the Delphi 6 Object Pascal Language Guide, and have been using that as a starting point for my Delphi parser. It gives a decent (though incomplete) overview of what the parser needs to handle, but says nothing about what the lexer should look like.
Here’s something I found really peculiar:
SimpleExpression -> ['+' | '-'] Term [AddOp Term]...
Term -> Factor [MulOp Factor]...
The peculiar thing is that the ‘+’ and ‘-’ prefixes are introduced in SimpleExpression. This is weird on several levels. First of all, when I took math, we learned that the negation operator binds more tightly than multiplication. “-A” should parse as a Factor, not two levels above that as a SimpleExpression.
On a more practical level, this grammar would disallow some language constructs that I would expect to be perfectly reasonable, such as:
A + -B
Now, I suppose you could just write A - B instead. But this one would also be disallowed:
A * -B
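To make the restriction concrete, here is a minimal recognizer for the two rules exactly as the guide writes them, sketched in Python. Everything beyond those two rules is an assumption for illustration: Factor is cut down to an identifier, an unsigned integer, or a parenthesized expression; AddOp and MulOp are limited to + - and * /; and the class name DocumentedGrammarParser is made up. Because the only place a sign can appear is the start of a SimpleExpression, the sketch rejects both A + -B and A * -B: in each case a '-' shows up where a Factor is expected.

```python
import re


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


# Simplified lexer (an assumption): identifiers, unsigned integer literals,
# and any other non-space character as a one-character token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
OPERAND_RE = re.compile(r"[A-Za-z_]\w*|\d+")

ADD_OPS = ("+", "-")  # simplified: the guide's AddOp also has OR and XOR
MUL_OPS = ("*", "/")  # simplified: the guide's MulOp also has DIV, MOD, AND, ...


class DocumentedGrammarParser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.simple_expression()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r}")
        return tree

    # SimpleExpression -> ['+' | '-'] Term [AddOp Term]...
    def simple_expression(self):
        sign = self.take() if self.peek() in ADD_OPS else None
        tree = self.term()
        if sign == "-":
            tree = ("neg", tree)  # the sign wraps the whole first Term
        while self.peek() in ADD_OPS:
            op = self.take()
            tree = (op, tree, self.term())
        return tree

    # Term -> Factor [MulOp Factor]...
    def term(self):
        tree = self.factor()
        while self.peek() in MUL_OPS:
            op = self.take()
            tree = (op, tree, self.factor())
        return tree

    # Factor, heavily simplified (an assumption):
    #   identifier | unsigned number | '(' SimpleExpression ')'
    # Note there is no sign alternative here.
    def factor(self):
        tok = self.take()
        if tok == "(":
            tree = self.simple_expression()
            if self.take() != ")":
                raise ParseError("expected ')'")
            return tree
        if tok is not None and OPERAND_RE.fullmatch(tok):
            return tok
        raise ParseError(f"expected a Factor, got {tok!r}")


if __name__ == "__main__":
    for src in ("-A * B", "A + -B", "A * -B"):
        try:
            print(src, "=>", DocumentedGrammarParser(src).parse())
        except ParseError as err:
            print(src, "=> rejected:", err)
```

Note that -A * B is accepted, but the tree comes out as ('neg', ('*', 'A', 'B')): the sign wraps the whole first Term, not just A. For multiplication the value is the same either way; the real difference is in which expressions are legal at all.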
Odd. I don’t have Delphi installed on my home PC, but I’ll have to check tomorrow whether Delphi actually accepts these. If so, the grammar documentation is hosed.
(Just to test, I downloaded and installed the FreePascal compiler, and it happily parses both of the above. But according to a Pascal EBNF grammar I found, neither one should parse, and neither should A * -1, because numbers are inherently unsigned. If that grammar is right (and it does carry a big disclaimer), numbers only acquire signs once they become SimpleExpressions. Very strange.)
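One way to get the behavior FreePascal shows is to move the sign down into Factor, which matches the precedence rule from math class. Below is a minimal sketch of that hypothetical grammar in Python; it is not taken from FreePascal's sources or from any documentation. SimpleExpression loses its leading sign, Factor gains a ('+' | '-') Factor alternative, and the lexer still produces only unsigned numbers, so "-1" arrives as two tokens. As before, Factor and the operator sets are simplified assumptions, and FactorSignParser is a made-up name.

```python
import re


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


# Simplified lexer (an assumption): numeric literals are always unsigned,
# so "-1" reaches the parser as two tokens, "-" and "1".
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
OPERAND_RE = re.compile(r"[A-Za-z_]\w*|\d+")

ADD_OPS = ("+", "-")
MUL_OPS = ("*", "/")


class FactorSignParser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.simple_expression()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r}")
        return tree

    # SimpleExpression -> Term [AddOp Term]...   (no leading sign here)
    def simple_expression(self):
        tree = self.term()
        while self.peek() in ADD_OPS:
            op = self.take()
            tree = (op, tree, self.term())
        return tree

    # Term -> Factor [MulOp Factor]...
    def term(self):
        tree = self.factor()
        while self.peek() in MUL_OPS:
            op = self.take()
            tree = (op, tree, self.factor())
        return tree

    # Hypothetical Factor:
    #   ('+' | '-') Factor | identifier | unsigned number | '(' SimpleExpression ')'
    def factor(self):
        tok = self.take()
        if tok in ADD_OPS:
            operand = self.factor()  # the sign binds tighter than any MulOp
            return ("neg", operand) if tok == "-" else operand
        if tok == "(":
            tree = self.simple_expression()
            if self.take() != ")":
                raise ParseError("expected ')'")
            return tree
        if tok is not None and OPERAND_RE.fullmatch(tok):
            return tok
        raise ParseError(f"expected a Factor, got {tok!r}")


if __name__ == "__main__":
    for src in ("A + -B", "A * -B", "A * -1", "-A * B"):
        print(src, "=>", FactorSignParser(src).parse())
```

With this shape, A + -B, A * -B and A * -1 all parse, and the sign attaches only to the operand it precedes. A side effect of the recursion in Factor is that doubled signs such as A - -B parse too.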