Sunday, October 23, 2005

Implementing merge

It seems like I've been implementing merge forever, but it's only been a year.

Merge is a fundamental operation when you have groups of people working together on a single piece of source code. It essentially means "take these other changes and combine them with mine."

The merge tech you'll see in CVS, SVN, Bazaar and GNU Arch is called "three-way merging", because it looks at three versions of the file: the one you had before any changes were made ("BASE"), the one containing the changes the other person made ("OTHER"), and the one containing the changes you made ("THIS").

If only THIS or only OTHER changed a piece of code, or both made the same change, that change goes into the final version. But if THIS and OTHER made different changes to the same code, that's considered a conflict.
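To make the rule concrete, here's a minimal sketch of a line-based three-way merge in Python. It is not the code in baz or bzr; the function names (`merge3`, `_matches`) and the conflict-marker text are made up for illustration, and it assumes each version is simply a list of lines. It finds the lines of BASE that survive in both THIS and OTHER, then applies the rule above to each stretch in between.

```python
from difflib import SequenceMatcher


def _matches(base, side):
    """Map each base line index to the index of its matching line in side."""
    found = {}
    matcher = SequenceMatcher(None, base, side, autojunk=False)
    for b, s, size in matcher.get_matching_blocks():
        for k in range(size):
            found[b + k] = s + k
    return found


def merge3(base, this, other):
    """Three-way merge of lists of lines (illustrative sketch only).

    Returns (merged_lines, number_of_conflicts).
    """
    this_at = _matches(base, this)
    other_at = _matches(base, other)

    # Sync points: base lines that are unchanged in both THIS and OTHER,
    # plus a sentinel marking the end of all three files.
    sync = [(i, this_at[i], other_at[i])
            for i in range(len(base))
            if i in this_at and i in other_at]
    sync.append((len(base), len(this), len(other)))

    merged, conflicts = [], 0
    b0 = t0 = o0 = 0
    for b, t, o in sync:
        base_chunk = base[b0:b]
        this_chunk = this[t0:t]
        other_chunk = other[o0:o]

        if this_chunk == other_chunk:
            merged += this_chunk        # same change (or no change) on both sides
        elif this_chunk == base_chunk:
            merged += other_chunk       # only OTHER changed this stretch
        elif other_chunk == base_chunk:
            merged += this_chunk        # only THIS changed this stretch
        else:
            conflicts += 1              # both changed it, differently
            merged += (["<<<<<<< THIS"] + this_chunk + ["======="]
                       + other_chunk + [">>>>>>> OTHER"])

        if b < len(base):
            merged.append(base[b])      # the shared line itself
        b0, t0, o0 = b + 1, t + 1, o + 1

    return merged, conflicts
```

With BASE `["a", "b", "c"]`, THIS `["a", "X", "c"]` and OTHER `["a", "b", "c", "d"]`, `merge3` returns `(["a", "X", "c", "d"], 0)`: each side's change lands cleanly. Change OTHER to `["a", "Y", "c"]` and the middle line comes back wrapped in conflict markers instead.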

In CVS, "update" was used to merge other people's committed changes with your recent work. In distributed systems, merge is typically its own command, and unlike CVS's update, it's typically invoked after you commit, not before.

My first work on merge was implementing three-way behaviour for baz merge at the Baz Code Sprint last November. At the time, baz and tla only supported three-way merging of text, and only when a particular command was used. I was making baz merge follow three-way behaviour.

At that code sprint, Martin Pool introduced his ideas for a new revision control system, now known as Bazaar-NG or bzr. I decided to take a stab at implementing those ideas myself. My project, BaZing, got as far as being able to do merges and apply Arch changesets. Then I threw in with Martin to work on bzr.

I ported over the merge code to bzr. For a while, that meant a system of shims and adaptors to make one codebase speak to the other. Lately, I've been working on integrating the code better.

Martin implemented the actual text-merging portion of bzr, so we don't depend on diff3. But it generates more conflicts than I think it should, so I've been playing around with my own three-way algorithm.

And I went ahead and integrated weave merge, a technique used by SCCS and (we believe) BitKeeper, into bzr. Again, Martin had written the code, but I hooked it up to bzr's merge tech.

It's important that VCS creators not get hung up on merge. It's only one aspect of usability.

Normal, boring three-way merges are good enough quite a lot of the time. Merge has quite a lot of strange corner cases, but they're not hit all that often. Improvements are welcome, but we will never get it 100% right, because merge doesn't address combining programs, but combining text. Most merge tools don't understand what that text means. What we need, ideally, is artificial intelligence. Since we don't have that, we need tricks that make the program seem intelligent without actually being intelligent.

Merging is an art, not a science. So merge tech is a tar pit. It can take as much time as you're willing to throw at it, and still not be perfect. There are other things bzr needs, like better remote operations, central storage, and better Windows support. So at times, it can be frustrating working on merging yet again.

Sunday, October 02, 2005

The Cathedral is bizarre

I can't friggin' believe Larry McVoy. I mean, I just don't understand him.

Here he is, lead designer of a powerful version control system (VCS). For a long time, BitKeeper had very good buzz in the open source world. (And, perhaps, even in the Free Software one.)

You'd think he would be proud. You'd think he'd focus on how to do even better. The last thing I expect from someone who's done great service to Linux is anticompetitive behaviour.

But lately, that's the kind of stuff we've seen. A while back, he cut Linus out because Tridge was writing an open-source BitKeeper client. How does that work again? Now he's forced Brian O'Sullivan to stop working on Mercurial, claiming he fears O'Sullivan will copy BitKeeper's secret sauce.

Well, last time I checked, BitKeeper was a proprietary, closed-source program. Since Brian can't copy the source code, it can't be an issue of copyright infringement. No, Larry fears that Brian will copy ideas from BitKeeper.

In the first place, isn't that totally wrong? You shouldn't build a better mousetrap if you know how current mousetraps work? Edison has to invent lightbulbs in the dark? The hell?

In the second place, if Larry thinks his ideas are so special, why doesn't he patent them?

One possible reason is that not all the ideas are his own. BK is heavily based on SCCS, a 30-year-old VCS. It uses SCCS files to store its data. From what we can tell, its merge technology is also based on SCCS.

So Larry can base his VCS on someone else's, but Brian can't base his VCS on Larry's? Sure, that seems fair.

Look at the FOSS side of things, and there are no secrets. There are more than a few projects in progress at the moment to build a great distributed VCS, like Bazaar-NG (the one I'm with), Monotone, Codeville, Mercurial, Darcs, SVK, and more. Not only is the code open, but we're always chatting on IRC about merge algorithms, the merits and demerits of GUIDs for files, and other technology. IRC's where I first heard about Larry's latest escapades.

Maybe you think I should be happy that Mercurial's hit a bump in the road? Don't they say the enemy of my enemy is my friend? Maybe they do, but Brian O'Sullivan isn't my enemy, he's a competitor. We both want better open-source VCSes. Why shouldn't we copy each other's best ideas?

Larry, he could have been another friendly competitor. But with anticompetitive behaviour and his talk of "innovation", he's starting to remind me of another of Free Software's enemies. But Microsoft has Visual SourceSafe, which makes them BitKeeper's enemy, too. So I guess the enemy of my enemy is my enemy.