Parsing directives
You learn the darndest things about a language when you try to parse it.
Directives — the decorations that go after a method heading, like stdcall, virtual, etc. — were, I had thought, always set off by semicolons:
procedure Foo; forward;
procedure Bar; stdcall; deprecated;
Until I ran my parser on Math.pas, and it choked on this:
procedure SinCos(const Theta: Extended; var Sin, Cos: Extended) register;
Well, that’s just great. I had thought that directives came after the required semicolon, and that each directive had its own required trailing semicolon. Now I find that they come before the required semicolon, and each directive has its own optional, leading semicolon. That meant spending a couple of hours rearranging my grammar and updating my regression tests, since all directives now needed to use the same class (with its anomalous leading semicolon), and the logical order of MethodHeadingNode’s properties had changed.
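To make the rearranged rule concrete, here is a minimal Python sketch of the method-heading part of a hand-written parser. It is not the parser described in this post. The tokenize, is_directive and parse_method_heading helpers, the ParseError class, and the small DIRECTIVES set are all hypothetical simplifications: parameter lists are skipped by matching parentheses, and a function result type is assumed to be a single identifier. The key point is that a semicolon only counts as a directive's leading semicolon when a directive follows it. Otherwise it is the heading's required terminator.

```python
import re

# Hypothetical subset of directive names, for illustration only.
DIRECTIVES = {"forward", "register", "stdcall", "cdecl", "pascal", "safecall",
              "virtual", "override", "overload", "deprecated"}


class ParseError(Exception):
    pass


def tokenize(src):
    """Split source into words and single-character symbols."""
    return re.findall(r"\w+|[^\w\s]", src)


def is_directive(tokens, i):
    # Delphi is case-insensitive, so compare lowercased text.
    return i < len(tokens) and tokens[i].lower() in DIRECTIVES


def parse_method_heading(tokens, i=0):
    """Parse 'procedure|function Name [(params)] [: Type] {[;] directive} ;'.

    Returns (name, directives, index just past the final ';').
    """
    if tokens[i].lower() not in ("procedure", "function"):
        raise ParseError("expected 'procedure' or 'function'")
    name = tokens[i + 1]
    i += 2
    if i < len(tokens) and tokens[i] == "(":
        depth = 0
        while True:  # skip parameters, including the semicolons inside them
            if tokens[i] == "(":
                depth += 1
            elif tokens[i] == ")":
                depth -= 1
            i += 1
            if depth == 0:
                break
    if i < len(tokens) and tokens[i] == ":":
        i += 2  # assumes a single-identifier result type
    directives = []
    while True:
        if is_directive(tokens, i):
            # Directive with no leading semicolon, e.g. SinCos(...) register;
            directives.append(tokens[i].lower())
            i += 1
        elif i < len(tokens) and tokens[i] == ";" and is_directive(tokens, i + 1):
            # Optional leading semicolon: it only counts if a directive follows.
            directives.append(tokens[i + 1].lower())
            i += 2
        else:
            break
    if i >= len(tokens) or tokens[i] != ";":
        raise ParseError("expected ';' after method heading")
    return name, directives, i + 1


heading = "procedure SinCos(const Theta: Extended; var Sin, Cos: Extended) register;"
print(parse_method_heading(tokenize(heading)))
# prints ('SinCos', ['register'], 17)
```

Because the function returns the index just past the terminating semicolon, it can be called repeatedly on a stream like procedure Foo; forward; procedure Bar; stdcall; deprecated; to pick up each heading in turn.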
And then there are procedural types (type TFoo = procedure, procedure of object, etc.). I hadn’t even realized that they could have directives, since most directives don’t make any sense for a procedural type, but I had forgotten about calling conventions. As it turns out, not only do procedural types support directives, they support three very distinct syntaxes:
type
  TFoo = procedure of object; stdcall;
  TBar = procedure of object stdcall;
  TBaz = procedure stdcall of object;
Yep, any directives can show up either before or after “of object”. And the semicolons are optional in the ones after “of object”. But no, you can’t put any semicolons before “of object”. Consistency? Who needs it?
(And no, Sam, procedure of stdcall object doesn’t compile. Sorry.)
You can even mix and match:
type
  TFoo = procedure assembler of object cdecl; far; // see note 1
Where this gets really weird (what, it wasn’t already?) is that it means you can have a semicolon in the middle of a variable declaration:
var Foo: procedure; stdcall = nil;
Try reading that as a sentence.
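The semicolon in the middle is easier to see in code. Here is a minimal Python sketch, assuming a hypothetical parse_var_decl that only understands Name: Type [= Value]; where the type is either a single identifier or a bare procedure type (no parameters, no “of object”). After procedure, a semicolon is read as part of the type whenever the next token is a directive name. The directive subset and helper names are illustrative assumptions, not the real parser.

```python
import re

DIRECTIVES = {"stdcall", "cdecl", "register", "pascal", "safecall"}  # hypothetical subset


class ParseError(Exception):
    pass


def tokenize(src):
    """Split source into words and single-character symbols."""
    return re.findall(r"\w+|[^\w\s]", src)


def parse_var_decl(tokens, i):
    """Parse 'Name: Type [= Value];' starting at tokens[i].

    Type is either a single identifier or a bare 'procedure' followed by
    directives. Returns (name, type_text, value, index past the final ';').
    """
    name = tokens[i]
    if tokens[i + 1] != ":":
        raise ParseError("expected ':' after " + name)
    i += 2
    type_parts = [tokens[i]]
    i += 1
    if type_parts[0].lower() == "procedure":
        while i < len(tokens):
            if tokens[i].lower() in DIRECTIVES:
                type_parts.append(tokens[i])
                i += 1
            elif (tokens[i] == ";" and i + 1 < len(tokens)
                  and tokens[i + 1].lower() in DIRECTIVES):
                # This ';' belongs to the type, not to the declaration.
                type_parts.append(tokens[i + 1])
                i += 2
            else:
                break
    value = None
    if i < len(tokens) and tokens[i] == "=":
        value = tokens[i + 1]  # assumes a one-token initializer such as nil
        i += 2
    if i >= len(tokens) or tokens[i] != ";":
        raise ParseError("expected ';' to end the declaration of " + name)
    return name, " ".join(type_parts), value, i + 1


print(parse_var_decl(tokenize("Foo: procedure; stdcall = nil;"), 0))
# prints ('Foo', 'procedure stdcall', 'nil', 8)
```

The same lookahead has a side effect: feeding this sketch Foo: procedure; StdCall: Integer; raises ParseError, because StdCall is consumed as a directive and the parser then finds a colon where it expects = or ;. A variable that happens to be named after a directive can't directly follow such a declaration.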
Makes me really glad I’m hand-coding my parser — automated parsing tools would’ve gone nuts with the ambiguity in something like “zero or more directives, followed by an optional ‘of object’, followed by zero or more possibly semicolon-delimited directives”. Not to mention the variable declaration with a semicolon in the middle. It might be doable with an automated tool, but not without a lot of pain. With a hand-coded parser, it’s just a place to write more automated tests to cover the goofy behavior.
But… goofy or not, it’s valid Delphi, so I’m going to do my best to parse it. My ProcedureTypeNode now has two properties for directives: FirstDirectives (the directives that come before “of object”) and SecondDirectives (the ones that come after it). For procedural types without “of object”, SecondDirectives is always empty.
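Here is one way that two-list split could look, as a minimal Python sketch rather than the actual ProcedureTypeNode implementation. The snake_case field names, the is_method_pointer flag, the directive subset, and the omission of parameter lists and function result types are all assumptions. Directives may carry optional leading semicolons, but if any semicolon was used before “of object”, the type is rejected. For a type without “of object”, every directive lands in the first list, so the second list stays empty.

```python
import re
from dataclasses import dataclass, field

# Hypothetical directive subset: calling conventions plus the no-op oddities.
DIRECTIVES = {"stdcall", "cdecl", "register", "pascal", "safecall",
              "assembler", "far", "near"}


class ParseError(Exception):
    pass


def tokenize(src):
    """Split source into words and single-character symbols."""
    return re.findall(r"\w+|[^\w\s]", src)


@dataclass
class ProcedureTypeNode:
    first_directives: list = field(default_factory=list)   # before "of object"
    is_method_pointer: bool = False                         # "of object" present
    second_directives: list = field(default_factory=list)  # after "of object"


def parse_directives(tokens, i):
    """Collect directives, each with an optional leading ';'.

    Returns (directives, next index, whether any ';' was consumed).
    """
    found, used_semicolon = [], False
    while i < len(tokens):
        if tokens[i].lower() in DIRECTIVES:
            found.append(tokens[i].lower())
            i += 1
        elif (tokens[i] == ";" and i + 1 < len(tokens)
              and tokens[i + 1].lower() in DIRECTIVES):
            found.append(tokens[i + 1].lower())
            i += 2
            used_semicolon = True
        else:
            break
    return found, i, used_semicolon


def parse_procedure_type(tokens, i):
    """Parse a parameterless 'procedure ...' type starting at tokens[i]."""
    if tokens[i].lower() != "procedure":
        raise ParseError("expected 'procedure'")
    node = ProcedureTypeNode()
    node.first_directives, i, used_semicolon = parse_directives(tokens, i + 1)
    if i < len(tokens) and tokens[i].lower() == "of":
        if used_semicolon:
            raise ParseError("no semicolons allowed before 'of object'")
        if i + 1 >= len(tokens) or tokens[i + 1].lower() != "object":
            raise ParseError("expected 'object' after 'of'")
        node.is_method_pointer = True
        node.second_directives, i, _ = parse_directives(tokens, i + 2)
    return node, i


node, _ = parse_procedure_type(tokenize("procedure assembler of object cdecl; far;"), 0)
print(node)
# prints ProcedureTypeNode(first_directives=['assembler'], is_method_pointer=True, second_directives=['cdecl', 'far'])
```

In this sketch, procedure of stdcall object and procedure; stdcall of object both raise ParseError, while the TFoo, TBar and TBaz forms above each parse, differing only in which list holds stdcall.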
Sheesh.
1 No, assembler doesn’t make any sense in a procedural type. But that’s okay, because far doesn’t make any sense in Win32 at all. The compiler just ignores them both, so they’re harmless, if meaningless. The only other directives that are allowed on procedural types (besides near) are calling conventions, and I figured an example with assembler and far would be less confusing than an example with two or three contradictory calling conventions. (Which is perfectly valid, by the way, although I have no idea what it does.)
September 20th, 2007 at 11:56 am
There are several reasons for this apparent inconsistency. Originally, all directives *could not* have a semicolon between them. This was in-keeping with the notion that the semi-colon was a terminator not a separator. However, in practice, this was less than intuitive to explain or remember. So what this all means is that you cannot have a var block like this:
var
  Foo: procedure;
  StdCall: Integer;
Not likely to happen (who would name a variable StdCall?), but syntactically it is possible. Had we enforced the "no semicolon" rule, then you *could* have a construct like the one above.
Remember, directives are *not* reserved words in the same sense as "for", "case", and "while". They’re only semi-reserved based on their context. "requires" and "contains" are the same way: they’re only reserved in the context of a file with a leading "package" clause.
The whole notion of directives syntax was discussed at length internally. We were trying to be as "nice" as possible by limiting the breakage of existing code while making the syntax more "natural."
Allen.
September 21st, 2007 at 7:33 am
Then there’s the directive default, which, IIRC, has a different meaning with and without the semicolon. With the semicolon, it indicates the default indexer; without it, it’s followed by the default value of the property.
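A minimal Python sketch of how a parser might tell the two uses apart. The function name, the assumption that the token list holds exactly one property declaration, and the one-token default value are all illustrative. A default followed by a constant before the property's terminating semicolon is the storage specifier, as in property Count: Integer read FCount write FCount default 0; while default; after that semicolon marks the default array property, as in property Items[Index: Integer]: string read GetItem; default; Index parameters are skipped by bracket depth, since multiple index parameters are themselves separated by semicolons.

```python
import re


def tokenize(src):
    """Split source into words and single-character symbols."""
    return re.findall(r"\w+|[^\w\s]", src)


def parse_property_defaults(tokens):
    """Return (default_value, is_default_array_property) for one property.

    'default X' before the first ';' outside [...] is the storage specifier;
    a trailing 'default;' after that ';' marks the default array property.
    """
    depth, i, default_value = 0, 0, None
    while i < len(tokens):
        tok = tokens[i]
        if tok == "[":
            depth += 1   # entering index parameters, which may contain ';'
        elif tok == "]":
            depth -= 1
        elif depth == 0 and tok == ";":
            break        # end of the property body
        elif depth == 0 and tok.lower() == "default":
            default_value = tokens[i + 1]  # assumes a one-token constant
        i += 1
    is_default_array = (i + 2 < len(tokens)
                        and tokens[i + 1].lower() == "default"
                        and tokens[i + 2] == ";")
    return default_value, is_default_array


print(parse_property_defaults(tokenize(
    "property Count: Integer read FCount write FCount default 0;")))
# prints ('0', False)
print(parse_property_defaults(tokenize(
    "property Items[Index: Integer]: string read GetItem; default;")))
# prints (None, True)
```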
One of the C++ TeamB guys once asked me why directives were preceded by the line separator. I didn’t know, so I directed him to Danny Thorpe, who was standing nearby. I don’t recall Danny’s precise answer, but I think it amounted to, "There’s not really a good reason for it; it’s just the way it is."
Allen, I don’t understand "in-keeping with the notion that the semi-colon was a terminator not a separator." In Pascal the semicolon is a (line/argument) separator, not a (line/argument) terminator. Moreover, semicolons have always appeared midline, as with separating multiple function arguments.
September 24th, 2007 at 1:13 am
Nice idea, nice work and nice findings!
Imagine if CodeGear/Borland had to follow someone else’s standards instead of setting the rules!