Mailing List Archive

Stream-oriented diff algorithms
The next major issue to tackle with the new code is diffs. What I'd
really like to find is a stream-oriented difference algorithm rather
than a line-by-line one. I'm not familiar enough with the existing
difflib to know if it could be used that way--perhaps its contributor
could point me to some documentation on it?
Re: Stream-oriented diff algorithms
The difflib code which I took from phpwiki can be used for
stream-based diffs. In fact, we already do that: we first use a
line-based diff to figure out which lines to present, and then we do a
word-based diff on those lines in order to mark the changed words red.

The code works as follows (see
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/wikipedia/phpwiki/fpw/difflib.php?rev=HEAD&content-type=text/vnd.viewcvs-markup):

The main workhorse is the Diff class. You construct it with the two
arrays of strings that you want to compare. You then get a list,
$this->edits, of _DiffOp's which describe how to get from the first
array of strings to the second array of strings with a minimal number
of changes. With that, you can do whatever you want. Normally, you'd
pass such a Diff object to a DiffFormatter (which you should extend),
which then looks at the edits list and produces some output.
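
To make the shape of that concrete, here is a rough sketch of the same
idea. It is not the phpwiki code: it uses Python's standard difflib
(SequenceMatcher) as a stand-in, and the names edit_ops and
PlainFormatter are made up for illustration. Note that SequenceMatcher
does not promise a minimal edit list the way Diff does.

```python
from difflib import SequenceMatcher


def edit_ops(old, new):
    """Return (tag, old_items, new_items) tuples that turn old into new.

    Plays the role of the edits list; tags are 'equal', 'replace',
    'delete' and 'insert' (SequenceMatcher's names, not _DiffOp's).
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


class PlainFormatter:
    """Hypothetical base formatter: subclass it and override the hooks."""

    def format(self, ops):
        out = []
        for tag, old_items, new_items in ops:
            if tag == "equal":
                out.extend(self.context(x) for x in old_items)
            else:
                # 'replace', 'delete' and 'insert' all reduce to
                # "drop these old items, add these new ones".
                out.extend(self.deleted(x) for x in old_items)
                out.extend(self.added(x) for x in new_items)
        return out

    def context(self, item):
        return "  " + item

    def deleted(self, item):
        return "- " + item

    def added(self, item):
        return "+ " + item


old_lines = ["alpha", "beta", "gamma"]
new_lines = ["alpha", "BETA", "gamma", "delta"]
ops = edit_ops(old_lines, new_lines)
report = PlainFormatter().format(ops)
```

The point is only that the comparison knows nothing about lines: it
takes any two sequences, so the caller decides what a unit is (lines,
words, characters).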

Currently, we cut the two article versions into lines, pass those two
arrays to Diff, then to TableDiffFormatter for presenting in a table.
This TableDiffFormatter, when printing out changed lines, calls a
WordLevelDiff (which extends Diff) to color changed words in red.
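
A minimal sketch of that two-pass scheme, again using Python's
standard difflib rather than difflib.php. The function names are
invented, changed words are wrapped in [[...]] in place of red
coloring, and pairing old and new lines one-to-one inside a changed
block is my simplification, not necessarily what TableDiffFormatter
does.

```python
import re
from difflib import SequenceMatcher


def mark_words(old_line, new_line):
    """Word-level pass: wrap the words of new_line that changed in [[...]]."""
    # Split into words and whitespace runs so the line can be rejoined exactly.
    old_words = re.findall(r"\S+|\s+", old_line)
    new_words = re.findall(r"\S+|\s+", new_line)
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        for token in new_words[j1:j2]:
            changed = tag != "equal" and not token.isspace()
            out.append("[[" + token + "]]" if changed else token)
    return "".join(out)


def two_pass_diff(old_text, new_text):
    """Line-level pass first, then a word-level pass on the changed lines."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not presented
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of lines on both sides: compare them pairwise.
            for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
                rows.append(("changed", mark_words(old, new)))
        else:
            rows.extend(("deleted", old) for old in old_lines[i1:i2])
            rows.extend(("added", new) for new in new_lines[j1:j2])
    return rows


rows = two_pass_diff("the quick fox\nstays here\n",
                     "the slow fox\nstays here\nnew line\n")
```

The second pass is just the first pass run again with a finer unit,
which is why the word-level class can simply extend the line-level
one.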

It might be nice to change TableDiffFormatter to produce side-by-side
output, similar to what the sourceforge cvs viewer does.

Axel