What to do when ANTLR breaks your inner rules

Three or four times now, I’ve written one parser rule, test-first, and gotten everything working; and then used that rule from another rule, and watched the original tests all fail. For example:

parameter:
  parameter_style
  ident_list
  (COLON parameter_type)?
  (EQUAL expression)?;

Worked great. It would match A: Integer, it would match const Foo, it would match A, B: array of const. Passed all the tests I could throw at it.

Then I added a rule that used it:

formal_parameters:
  OPEN_PAREN
  (p:parameter)?
  CLOSE_PAREN;

(This new rule isn’t finished yet. Yes, it will accept multiple parameters by the time it’s done.)

As soon as I defined the formal_parameters rule in my grammar, my parameter tests started failing with errors like this:

antlr.NoViableAltException: unexpected token: ["",<1>,line=1,col=17]

I did some digging and some diffing, and found the problem. To take one particular example, the test was trying to parse var Foo: Integer. var matched parameter_style; Foo matched ident_list; and the colon and Integer matched COLON parameter_type. Then it tried to look for EQUAL expression, and it broke.

At that point the generated parser was looking at the next token in the input and deciding what to do. If the next token was EQUAL, it would take the optional EQUAL expression branch. If the next token was something that could validly come after the parameter rule (which ANTLR figured out by looking at every other rule in the grammar that could possibly use parameter), it would skip the optional branch and return. And if neither of those cases held, meaning the next token was neither EQUAL nor a token that could possibly come after a parameter, it would throw a NoViableAltException.
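Here is a rough sketch of that three-way decision in Python. It's a toy model, not the code ANTLR actually generates: the function name, the exception class, and the two token sets are made up for illustration, and the "follow" set is what I assume ANTLR computed once formal_parameters was the only rule using parameter.

```python
class NoViableAlt(Exception):
    """Stand-in for antlr.NoViableAltException (hypothetical name)."""


def choose_optional_branch(lookahead, first_of_optional, follow_of_rule):
    """Decide what to do at an optional sub-rule, given one token of lookahead."""
    if lookahead in first_of_optional:
        return "take"   # the optional branch starts with this token
    if lookahead in follow_of_rule:
        return "skip"   # this token can legally come after the rule
    raise NoViableAlt("unexpected token: " + lookahead)


# Assumed sets for the (EQUAL expression)? branch of parameter.
FIRST_OF_OPTIONAL = {"EQUAL"}
FOLLOW_OF_PARAMETER = {"CLOSE_PAREN"}  # note: no EOF in here

print(choose_optional_branch("EQUAL", FIRST_OF_OPTIONAL, FOLLOW_OF_PARAMETER))
print(choose_optional_branch("CLOSE_PAREN", FIRST_OF_OPTIONAL, FOLLOW_OF_PARAMETER))
```

With those sets, calling choose_optional_branch("EOF", ...) raises NoViableAlt, which is the shape of the failure my tests were hitting when the input ended right after the parameter.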

The problem was, somehow ANTLR got confused, and forgot that EOF was one of those somethings that could validly come after the parameter rule. After all, you might call the parameter rule all on its own, and it’s perfectly legit for it to match everything in the input stream.

ANTLR gets confused this way, every now and then. Most likely some bug in its “look at every other rule in the grammar that could possibly use parameter” logic. Fortunately, I don’t expect to use the parameter rule on its own, except in tests; it’ll always be part of some larger construct that does still work. So I used to just shrug helplessly, and move my parameter unit tests onto formal_parameters. Made the tests messier, because now they were testing two levels deep, but I got adequate coverage.

This latest time, though, a light bulb went on: I could easily force ANTLR to recognize EOF as something that can come after the parameter rule. I wrote another rule, like so, and with no other changes, magically my tests started passing:

parameter_with_eof:
  parameter EOF;

I don’t even need to use this new rule, in my tests or anywhere else. But it gives ANTLR a clue-by-four that yes, EOF can come after parameter. So everywhere I use parameter, it recognizes EOF as a valid signal to skip the optional bits. And everything starts working again.
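To see why an unused rule can make a difference, here's a deliberately simplified Python model of the "what can come after parameter" calculation. This is only my guess at the general idea, not ANTLR's real algorithm: the grammar is a hypothetical dict of rule names to alternatives, optional parts are spelled out as separate alternatives, and the helper only looks at the symbol written directly after each reference.

```python
def tokens_after(rule_name, grammar):
    """Collect every symbol that appears directly after rule_name in any alternative."""
    found = set()
    for alternatives in grammar.values():
        for sequence in alternatives:
            # Stop one short of the end: the last symbol has nothing after it.
            for i, symbol in enumerate(sequence[:-1]):
                if symbol == rule_name:
                    found.add(sequence[i + 1])
    return found


# formal_parameters, with (parameter)? expanded into two alternatives.
grammar = {
    "formal_parameters": [
        ["OPEN_PAREN", "parameter", "CLOSE_PAREN"],
        ["OPEN_PAREN", "CLOSE_PAREN"],
    ],
}
before = tokens_after("parameter", grammar)

# Add the workaround rule; nothing else in the grammar refers to it.
grammar["parameter_with_eof"] = [["parameter", "EOF"]]
after = tokens_after("parameter", grammar)

print(sorted(before))
print(sorted(after))
```

In this model, before contains only CLOSE_PAREN, and after contains both CLOSE_PAREN and EOF. The new rule never has to be called; it only has to exist so that the scan over the grammar finds EOF sitting right after a reference to parameter.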

It’d be nice if ANTLR worked right in the first place, of course. But it is at least encouraging to know that I’ve been able to work around every one of its bugs that I’ve found so far.
