### Diff, part 15: Implementing Expendable

Friday, April 15th, 2005In yesterday’s episode of Diff Quest, I outlined a function called Expendable, which takes two CSes and tells you whether you can throw one of them away. Today I’ll describe that function’s implementation.

Expendable takes two parameters, CS_{1} and CS_{2}. Each of those is a common subsequence, and therefore consists of a bunch of matches. And each match is a pair of indexes, like [0,1].

The Expendable function is all about the idea of growth potential. Growth potential depends on two things, and two things only: the length of the CS, and the last match in the CS. [0,0] [2,2] has the exact same growth potential as [1,1] [2,2] — the earlier matches don’t matter; what matters is what we can tack onto the end.

This means there are three numbers that we care about, when we’re thinking about a CS’s growth potential: its length; the AIndex of its last match; and the BIndex of its last match. So the inputs to Expendable can be described by six individual numbers, and the different return values of Expendable are determined by the relationship between corresponding pairs:

- Is CS
_{1}Length less than, equal to, or greater than CS_{2}Length? - Is CS
_{1}AIndex less than, equal to, or greater than CS_{2}AIndex? - Is CS
_{1}BIndex less than, equal to, or greater than CS_{2}BIndex?

The following table lists all the possible combinations of these. The “Len” column shows “<” if CS_{1}Length < CS_{2}Length, “=” if CS_{1}Length = CS_{2}Length, and “>” if CS_{1}Length > CS_{2}Length. The “AI” column is the same thing for AIndex, and “BI” is BIndex. After that, there’s a sample CS_{1} and a sample CS_{2} that exhibit these relationships (right-justified so the last matches always line up, since it’s the last match that’s important).

The next two columns are a kind of game, where I try to come up with a set of inputs that would make CS_{1} “win” (i.e., inputs that would build CS_{1} into a longer subsequence without also growing CS_{2}), and then a set of inputs that would make CS_{2} win. It doesn’t matter how contrived those inputs are; what matters is whether those inputs are possible (and sometimes they’re not, which is when expendability kicks in). “EOF” means that no further input is required to make that CS win — if we reached end of file, and there were no more matches, then it’s already won.

Finally, the last column shows the result of Expendable(CS_{1}, CS_{2}).

# LenAIBICS_{1}CS_{2}Make CS_{1}winMake CS_{2}winExp1 < < < `[3,3]`

`[0,0] [6,6]`

`[4,4] [5,5]`

EOF nil 2 < < = `[3,3]`

`[0,0] [6,3]`

`[4,4] [5,5]`

EOF nil 3 < < > `[3,6]`

`[0,0] [6,3]`

`[4,7] [5,8]`

EOF nil 4 < = < `[3,3]`

`[0,0] [3,6]`

`[4,4] [5,5]`

EOF nil 5 < = = `[3,3]`

`[0,0] [3,3]`

None possible EOF 1 6 < = > `[3,6]`

`[0,0] [3,3]`

None possible EOF 1 7 < > < `[6,3]`

`[0,0] [3,6]`

`[7,4] [8,5]`

EOF nil 8 < > = `[6,3]`

`[0,0] [3,3]`

None possible EOF 1 9 < > > `[6,6]`

`[0,0] [3,3]`

None possible EOF 1 10 = < < `[3,3]`

`[6,6]`

`[4,4]`

None possible 2 11 = < = `[3,3]`

`[6,3]`

`[4,4]`

None possible 2 12 = < > `[3,6]`

`[6,3]`

`[4,7]`

`[7,4]`

nil 13 = = < `[3,3]`

`[3,6]`

`[4,4]`

None possible 2 14 = = = `[3,3]`

`[3,3]`

None possible None possible Either 15 = = > `[3,6]`

`[3,3]`

None possible `[4,4]`

1 16 = > < `[6,3]`

`[3,6]`

`[7,4]`

`[4,7]`

nil 17 = > = `[6,3]`

`[3,3]`

None possible `[4,4]`

1 18 = > > `[6,6]`

`[3,3]`

None possible `[4,4]`

1 19 > < < `[0,0] [3,3]`

`[6,6]`

EOF None possible 2 20 > < = `[0,0] [3,3]`

`[6,3]`

EOF None possible 2 21 > < > `[0,0] [3,6]`

`[6,3]`

EOF `[7,4] [8,5]`

nil 22 > = < `[0,0] [3,3]`

`[3,6]`

EOF None possible 2 23 > = = `[0,0] [3,3]`

`[3,3]`

EOF None possible 2 24 > = > `[0,0] [3,6]`

`[3,3]`

EOF `[4,4] [5,5]`

nil 25 > > < `[0,0] [6,3]`

`[3,6]`

EOF `[4,7] [5,8]`

nil 26 > > = `[0,0] [6,3]`

`[3,3]`

EOF `[4,4] [5,5]`

nil 27 > > > `[0,0] [6,6]`

`[3,3]`

EOF `[4,4] [5,5]`

nil

Row #14 is interesting, in that the Expendable function could meaningfully return *either* CS_{1} or CS_{2}. That’s because there’s nothing to differentiate the two inputs — they are, as far as we care, the same. One might be [0,1] [2,2] and the other might be [1,0] [2,2], but they both have the same growth potential, so there’s no need to keep both. We can throw one of them away, and it really doesn’t matter which one; we can decide arbitrarily.

I spent a while staring at this table, trying to figure out the pattern, because of row #14. I originally had it marked as “nil”, but I finally realized that that’s not right; we do want to throw away one or the other. Once I figured out that both CS_{1} and CS_{2} were expendable for that row, the pattern became clear:

- If CS
_{1}Length < CS_{2}Length, and CS_{1}AIndex >= CS_{2}AIndex, and CS_{1}BIndex >= CS_{2}BIndex, then CS_{1}is expendable. - If CS
_{1}Length > CS_{2}Length, and CS_{1}AIndex <= CS_{2}AIndex, and CS_{1}BIndex <= CS_{2}BIndex, then CS_{2}is expendable.

It’s not the most graceful of algorithms, but it’s not a hard one to implement.

At this point, I’ve described all the major ingredients for a practical (better-than-O(2^{n})) LCS algorithm; I could (or you could) go ahead and start writing code, and end up with a working LCS library. But I’m not going to start writing code for it, not just yet. Why? Because even though we’re now well into the realm of polynomial time, we’re still doing more work than we have to. There are a few more sneaky shortcuts in store, and I’ll start delving into them next week.