Joe White’s Blog

Life, .NET, and Cats


Archive for April, 2006

Fixture instances in DUnit vs. NUnit

Saturday, April 29th, 2006

Just found this out: when you define a test fixture in NUnit, only one instance of that fixture gets created at runtime. This is different from DUnit, which creates a separate fixture instance for each test.

So this code:

procedure TFooTestCase.TestOne;
begin
  FBar := 1;
  // do something that assumes FBar is set to 1
end;

procedure TFooTestCase.TestTwo;
begin
  // do something that assumes FBar is uninitialized and is still 0
end;

is perfectly valid in DUnit, because there are two instances of TFooTestCase: one for TestOne, the other for TestTwo. In NUnit, on the other hand, TestTwo had better initialize FBar to zero if that’s the behavior it wants.

So DUnit took the “keep tests insulated from each other” side of the tradeoff, whereas NUnit took the “use less memory” path. Neither one is better than the other, but it’s good to know which one you’re using.

More details for the real geeks (like me) who want to dig into the innards of the test frameworks:

In DUnit, when you call TestFramework.RegisterTest(TFooTestCase.Suite), you get one instance of TTestSuite, which contains two instances of TFooTestCase. One TFooTestCase instance has its TestName set to “TestOne”, the other has its TestName set to “TestTwo”.

In NUnit, one instance of TestSuite is automatically created. This TestSuite contains two instances of TestMethod. One TestMethod has its private method field (type MethodInfo) pointing to TestOne; the other TestMethod points to TestTwo. Additionally, there’s a single instance of FooTestCase; the reference to this is held by the TestSuite’s Fixture property.

Making ANTLR able to fail unit tests

Friday, April 28th, 2006

I’m writing my ANTLR grammar test-first. I’ll write an NUnit test method that parses a particular string and checks the resulting parse tree; then I’ll run the test and watch it fail first; and then I’ll start writing the lexer and parser rules to make that test pass.

This is an awesome way to write a grammar. It’s already saved me several times (it was my unit tests that caught the hiccup with ANTLR confusing “.” with “..“). But the thing I learned early on is that it doesn’t work out of the box.

ANTLR’s base lexer and parser classes throw exceptions when they encounter errors. This is great; I want my tests to fail if there are parse errors (except for the tests that expect parse errors, naturally). But by default, all of the code ANTLR generates will include try..catch blocks, which catch those exceptions, output messages to the console, and then backtrack in an attempt to continue parsing past the error. So as far as unit tests (which don’t even look at the console) are concerned, exceptions basically get swallowed, and you can’t tell whether the parse succeeded or not.

Turns out this is easy to change. If you’re writing your grammar test-first, you’ll want to disable the parser’s defaultErrorHandler option. Then ANTLR won’t generate its try..except blocks all over the place, so any parse errors will propagate back up the call stack, and cause your test to fail like you’d expect.

class DelphiParser extends Parser;
options {
  defaultErrorHandler = false;
}

ANTLR: when a token is not a token

Friday, April 28th, 2006

Lexers are good at figuring out where tokens start and end. They read one character at a time, but still, they can tell < from <= with ease. They thrive on it.

ANTLR gets stuck occasionally.

If I define a token named DOT that matches a single period, and another token named ELLIPSIS that matches two periods, it can handle them just fine. Most of the time. But it can’t parse 1..2, because it thinks that first dot is a decimal point, not an ellipsis.

When I tell ANTLR that I expect a dot, I want it to check that the next token is a dot. But instead, it checks that the next character is a dot. Which means, if the next two characters in the input stream are “..“, and I call the ANTLR-generated method mDOT(), it will read the first dot out of those two. I want it to fail, because the next thing isn’t a dot, it’s an ellipsis. But it doesn’t work that way.

The workaround is to use something called “symantic predicates”. (If you understand positive lookahead in regular expressions, it’s the same thing.) Symantic predicates look like this:

(...) => ...

You put this on one leg of an alternation. It means “if x, then match y.” x is the stuff inside the parentheses; y is the stuff after =>. I can use this to tell it “if you look ahead and see “..“, then do something other than matching the first ‘.‘ alone.”

This is really flexible. The biggest downside is, you have to provide this context at every call site. In practice, it hasn’t been too bad yet, but boy am I glad I’ve got my unit tests!

My complete NUMBER lexer rule (excluding unit tests) looks like this. If I hit “..“, I just give an empty alternate, and stop parsing the token:

NUMBER:
  ('0'..'9')+
  (
    // if we find "..", we're done parsing the number
    (ELLIPSIS) => |
    // otherwise, look for a decimal point
    (DOT ('0'..'9')+)?
    (('E' | 'e') ('+' | '-')? ('0'..'9')+)?
  )
  ;

The budget will provide

Wednesday, April 26th, 2006

Our emergency fund was already depleted a bit from our car breaking down earlier this month. Then yesterday, we dropped a $25 check off at a local collection agency — and I later realized that they now had our checking account information, and could probably just go ahead and take out the full amount we owed them (we’ve had it happen before). So another $300 of emergency fund is now tied up covering for that possibility.

And then we had to figure out how to pay the vet bill for a sick cat — nearly another $800. We didn’t have enough in the emergency fund to pay for it. So we went back to the budget.

We’ve already revised this month’s budget twice (once for the car, once for a withholding change). On to #3. Goal: find money we’re spending that we could live without spending for another month. Things like the money we were going to put back into the emergency fund. Clothing money. Do we really need to buy a new ladder this month? A little here, a little there. Didn’t look like it’d be enough.

Then we added it up.

We had found — found, just lying around — $659.

Six hundred and Jesus Christ fifty-nine dollars. Just sitting there in the budget. It’s like finding six hundred dollars under the sofa cushions.

And we’ve done this at least half a dozen times by now (though this is by far the biggest). Things always come up. Things never go according to plan. Sometimes they’re off by several hundred dollars. But every time we’ve gone back to the budget to find money for something, it’s been there. Every time.

It’s amazing. We can start with what we think is the leanest budget we could possibly manage. And then something comes up, and we go back and take more stuff out, and it still works.

This is why we budget. Yes, it helps us plan. Yes, it helps Jennie and me align our priorities and our goals. Yes, it removes stress. But the biggest reason we budget is because it saves our ass.

Paid-off count: 9

Wednesday, April 26th, 2006

Paid off the dentist yesterday morning, for some stuff from last December that our old insurance finally decided they weren’t going to pay for. Gotta love red tape.

On the other hand, gotta love getting rid of debt. One more notch in our belt. Ding!

if not (A <> B)

Wednesday, April 26th, 2006

Not only is A * -B not valid Delphi syntax

Expression -> SimpleExpression [RelOp SimpleExpression]...

RelOp -> '>'
      -> '<'
      -> '<='
      -> '>='
      -> '<>'
      -> IN
      -> IS
      -> AS

I guess I’ll have to stop writing if A = B now, since the equality operator apparently doesn’t exist in Delphi.

Parsing negation

Sunday, April 23rd, 2006

I found a chapter called “Object Pascal Grammar” in the Delphi 6 Object Pascal Language Guide, and have been using that as a starting point for my Delphi parser. It gives a decent (though incomplete) overview of the parser, but nothing on what the lexer should look like.

(For those curious to see it for yourselves, download d62pro.zip, open del6op.hlp, and search the index for “grammar”. Or you can get the Delphi 5 Object Pascal Language Guide as a zipped PDF.)

Here’s something I found really peculiar:

SimpleExpression -> ['+' | '-'] Term [AddOp Term]...
Term -> Factor [MulOp Factor]...

The peculiar thing is that the ‘+’ and ‘-‘ prefixes are introduced in SimpleExpression. This is weird on several levels. First of all, when I took math, we learned that the negation operator binds more tightly than multiplication. “-A” should parse as a Factor, not two levels above that as a SimpleExpression.

On a more practical level, this grammar would disallow some language constructs that I would expect to be perfectly reasonable, such as:

A + -B

Now, I suppose you could just write A – B instead. But this one would also be disallowed:

A * -B

Odd. I don’t have Delphi installed on my home PC, but I’ll have to check tomorrow and see whether these actually are accepted. If so, the grammar documentation is hosed.

(Just to test, I downloaded and installed the FreePascal compiler, and it happily parses both of the above. But according to a Pascal EBNF grammar I found, neither one should parse, and neither should A * -1, because numbers are inherently unsigned — if the grammar is right (and it does carry a big disclaimer), numbers only obtain signs once they become SimpleExpressions. Very strange.)

fallwell.com: Biblical response to Jerry Falwell’s anti-gay preaching

Wednesday, April 19th, 2006

While reading today’s headlines, I ran across a Web site, fallwell.com, that challenges Rev. Jerry Falwell’s attitudes about gays and lesbians, and does so from a Christian and a Biblical perspective.

Many otherwise well-meaning people have believed that gay people are commiting a sin. Jerry Falwell has been one of America’s leading promoters of this mistaken idea. This notion has created profound suffering and torment, not only for gay people, but for their friends and families as well. It has also dealt a devastating blow to Christianity itself, as fair-minded people have turned away in response.

I know this because I have received hundreds of emails from heterosexuals who have left Christianity directly due of its embrace of homophobia and anti-gay prejudice. Peace, acceptance, love, charity and humility have been replaced by a gospel of intolerance, ignorance and political ambition.

There can be no goal that is less holy than that which seeks to maintain a level of societal prejudice against a particular group of fellow human beings.

In contrast, Jesus reached out to those on the margins of society. The only people that Jesus ever condemned were religious fundamentalists. (See Matthew 23). Jesus also said that the two greatest commandments were those that commanded us to love (Mark 12:28-34). Fundamentalists have failed most miserably in fulfilling that task. It has been my privilege to counter their untruths, and it is an endeavor that I will continue.

These quotes are all from the About the Lawsuit page (Falwell sued the site for trademark infringement, and lost — that’s the headline that led me to the site), which provides a very good one-page overview of what the site is all about. Much more content is available through the front page of the site.

If there are any fundamentalists reading my blog, I’d be curious to hear what you think of this site.

Hierarchical ASTs with ANTLR

Wednesday, April 19th, 2006

Here’s something interesting I learned about ANTLR: when you tell it to build an AST (abstract syntax tree, aka parse tree) in memory, the structure of the parser grammar has nothing to do with the structure of the parse tree. I.e., if you give it these parser rules (horribly simplified from my already-simplified grammar):

const_section:
  CONST (constant)+;
constant:
  IDENTIFIER EQUALS NUMBER SEMICOLON;

and if you give it this as input:

const
  A = 5;
  B = 6;

I would expect to end up with a parse tree that looks like this:

const_section
  CONST
  constant
    IDENTIFIER 'A'
    EQUALS
    NUMBER 5
    SEMICOLON
  constant
    IDENTIFIER 'B'
    EQUALS
    NUMBER 6
    SEMICOLON

Ah, but not so. You actually end up with a parse “tree” that looks like this:

CONST
IDENTIFIER 'A'
EQUALS
NUMBER 5
SEMICOLON
IDENTIFIER 'B'
EQUALS
NUMBER 6
SEMICOLON

Not a tree at all — just a flat list! Having a parser rule that says there’s something called a “const_section” does not introduce a parse-tree node called “const_section”. That is, the parse tree doesn’t automatically reflect the structure of the parser rules.

In fact, it turns out that the only things that even can appear in the parse tree are tokens — the things that come out of the lexer. The parser rules have nothing to do with the parse tree! Near as I can tell, the parser rules are more of a convenience than anything else: a handy way to name different parts of the parser rules, but not to impose any kind of structure per se. They include a bit of scoping, but no AST structure.

You can impose a limited amount of structure if you wish, by marking individual nodes with the ^ suffix, which makes that node into a parent node, and the stuff around it into its children. That’s very useful for things like arithmetic expressions (especially handy for infix operators like + and *), where there really isn’t any higher-level structure that you care about as long as the order of operations comes out right.

But I do want higher-level structure. An identifier, an equal sign, a constant expression, and a semicolon all combine (if they come in the right place relative to a “const” keyword) to become a constant: a concept in its own right, greater than the sum of its parts. And so I want a parse tree with a node called “constant”. And above that, one called “const block”. And above that, one called “interface section”. And so on.

How to do this, when there’s no token for “constant”? By definition, a constant isn’t one single token; it’s a collection of tokens that take on a different meaning in combination.

Well, it took some scratching around, but I got it working. (Test-first, of course, so I’d know when I was done.) You need to use the parser’s tokens{} block to introduce an imaginary token — one that the lexer won’t produce, but that is still technically a token, and so can still appear in the parse tree. Then you can write some funky-looking code at the end of the parser rule. It ends up looking a little like this:

class DelphiParser extends Parser;
options {
  buildAST = true;
}
tokens {
  CONST = "const";
  CONST_SECTION;
}
const_section:
  CONST (constant)+
  { #const_section = #([CONST_SECTION, "const ..."], #const_section); };
// ...

I won’t go into depth on what most of this means, but I will break down that stuff inside curly braces at the end. Stuff in curly braces in an ANTLR grammar can mean a lot of different things, but inside a parser rule, it means “whenever the parser rule matches this far, run this block of code.” Within this, you can use some magical ANTLR syntax, which will be replaced with appropriate mundane C# code when ANTLR does its codegen.

[CONST_SECTION, “const …”] is shorthand for “create a new AST node with token type CONST_SECTION and text “const …”. It gets codegenned as a call to the AST factory.

#(ast1, ast2, ast3, ast4, …) is shorthand for “create an AST with ast1 as the root and ast2, ast3, ast4, … as its children”.

So the net result is, once the const_section rule is completely done parsing, we know everything’s a match and we’re not going to be adding anything more into it, then we synthesize a new node with the CONST_SECTION token type, wrap it around whatever we had already parsed for the const_section rule, and say, “Okay, this is what we really parsed in the first place.” Poof: instant (induced) tree.

So there ya go. Not exactly everything you’ll ever need to know to use ANTLR, but a good start if you’re parsing things that fall into a natural hierarchy.

Integrating ANTLR with Visual Studio builds

Sunday, April 16th, 2006

So I’m trying to write a tool that will parse Delphi source code. From there, I hope to move on to writing a tool that will tell me things about that Delphi source code: where we have typecasts, which properties are assigned in base DFMs, etc. Whatever I need it to tell me, really.

I fully expect that I’ll never get far with this, but it’s certainly an interesting toy for however long my attention span lasts. And I haven’t played with compiler-compilers since college. It’s kind of interesting to stick my foot back in.

I’m currently playing with ANTLR, which is kinda like LEX and YACC rolled into one, but different. I’m using it because it can generate C# output. Sorta. (But that’s another blog post.)

After some frustration with having to remember to run ANTLR manually every time I edited the grammar file, I finally buckled down and integrated ANTLR with MSBuild, so Visual Studio would automatically save my grammar file, and then run ANTLR, every time I compiled in the IDE. It wasn’t too hard, but there were a couple of wrinkles:

  • I want my build to fail if ANTLR generates warnings. (Or I could show the warnings as warnings in Visual Studio, but that looks like it’d be a lot more work; I’d have to write an MSBuild plug-in. Failing is simpler for now.) But ANTLR has no option for “treat warnings as errors”. Its exit code will tell you if there were genuine errors, but not if there were warnings. I had to work around this by writing a Ruby script that runs ANTLR and parses its console output.
  • ANTLR sends all its output to STDERR (even the “ANTLR Parser Generator version …” banner — for shame!). Ruby’s IO.popen() runs a process and captures its output, but it only captures STDOUT. Open3.popen3() is supposed to capture STDIN, STDOUT, and STDERR, but I couldn’t get it to work on Windows, because it tries to fork(). I know Sam figured out a way to get popen3 to work on Windows, but I don’t remember the details. I wound up using IO.popen() and 2>&1, which is a hack, but it works.
  • My Ruby script doesn’t check ANTLR’s exit code; it just looks at ANTLR’s console output. I didn’t bother to figure out how to check the process’s exit code after popen(). Looking at the output seems to work fine.
  • With this fix, when ANTLR fails, all that shows up in Visual Studio’s “Error List” is “The command “RunAntlr.rb” exited with code 1.” You have to go to the Output window to see the error. Again, I think I’d have to write an MSBuild plug-in to fix this, so this is good enough for now.

Here’s my Ruby script. It’s got hard-coded paths in it, but it works great for a quick hack. I called it RunAntlr.rb and put it in the same directory as my Delphi.g grammar file (which was itself in the same directory as my .csproj file). Written for ANTLR 2.7.5.

Dir.chdir("C:/svn/AntlrTestbed02/DelphiParser")
new_classpath = "C:\\Program Files\\antlr\\275\\lib\\antlr.jar;#{ENV['CLASSPATH']}"
command = "java -classpath \"#{new_classpath}\" antlr.Tool Delphi.g 2>&1"
exit_code = 0
IO.popen(command) do |pipe|
  pipe.each do |line|
    puts line
    if line !~ /^ANTLR Parser Generator/
      exit_code = 1
    end
  end
end
exit(exit_code)

Then I plugged it into my .csproj file. (To do this inside Visual Studio, first right-click the project in Solution Explorer and select “Unload Project”, then right-click the project again and select “Edit <ProjectName>.csproj”. Make the edits, save, then right-click and “Reload Project”.)

At the end of the .csproj file, there should be a couple of commented-out nodes: BeforeBuild and AfterBuild. Uncomment BeforeBuild and add this node inside it:

<Exec Command="RunAntlr.rb" />


Joe White's Blog copyright © 2004-2011. Portions of the site layout use Yahoo! YUI Reset, Fonts, and Grids.
Proudly powered by WordPress. Entries (RSS) and Comments (RSS). Privacy policy