Joe White's Blog Life, .NET, and cats

Diff, part 16: A look back, and a big O #.NET #Delphi #diff

I've been talking about the diff algorithm for three weeks now, and I'm proud to say that I now own the phrase "stupid diff" (at least as far as Google cares, which is all that really matters anyway). I think I'll start putting a little trademark symbol after it.

[Update, 19 Apr 2005: Only a day after I posted this, and I'm no longer the #1 Google result for "stupid diff". Oh well. Easy come, easy go.]

At this point, I've presented all the parts and pieces needed to make a serviceable Longest Common Subsequence algorithm. Now let's look at just how serviceable. How efficient are these parts and pieces?

Last time, I said we'd reached polynomial time. But what degree polynomial are we talking about? Here's a quick sketch of what the algorithm does:

  • Loop through all the elements in A, and for each Ai:
    • Loop through all the elements in B, and for each Bj that matches Ai:
      • Loop through all the CSes we have so far, and for each CS that we can construct a longer CS from by appending [i, j]:
        • Loop through all the CSes again, calling Expendable on [CSk, CSnew] and discarding CSes as needed.

Looks like our algorithm, so far, is O(N4). Sure beats our old super-exponential time, but N4 still leaves something to be desired.

Can we do better? You bet. In fact, by the time we're done, the limiting factor is going to be how efficiently we can find matches: those outer two loops. Next time, I'll take a look at making that a little more efficient (though still technically O(N2)), and start setting the stage for some of the later improvements.