Sunday, October 23, 2005

Implementing merge

It seems like I've been implementing merge forever, but it's only been a year.

Merge is a fundamental operation when you have groups of people working together on a single piece of source code. It essentially means "take these other changes and combine them with mine."

The merge tech you'll see in CVS, SVN, Bazaar and Gnu Arch is called "three-way merging", because it looks at three versions of the file. The one you had before any changes were made ("BASE"), the one containing the changes the other person made ("OTHER"), and the one containing the changes you made ("THIS").

If THIS or OTHER (or both) introduced a change, it goes into the final version. But if THIS and OTHER made different changes to the same code, that's considered a conflict.

In CVS, "update" was used to merge other peoples' committed changes with your recent work. In distributed systems, it's typically its own command, and unlike CVS, it's typically invoked after you commit, not before.

My first work on merge was implementing three-way behaviour for baz merge-- at the Baz Code Sprint last November. At the time, baz and tla only supported three-way merging of text, and only when a particular command was used. I was making baz merge follow three-way behaviour.

At that code sprint, Martin Pool introduced his ideas for a new revision control system, now known as Bazaar-NG or bzr. I decided to take a stab at implementing those ideas, myself. My project, BaZing, got as far as being able to do merges and apply Arch changesets. Then I threw in with Martin to work on bzr.

I ported over the merge code to bzr. For a while, that meant a system of shims and adaptors to make one codebase speak to the other. Lately, I've been working on integrating the code better.

Martin implemented the actual text-merging portion of bzr, so we don't depend on diff3. But it generates more conflicts than I think it should, so I've been playing around with my own three-way algorithm.

And I went ahead and integrated weave merge, a technique used by SCCS and (we believe) BitKeeper, into bzr. Again, Martin had written the code, but I hooked it up to bzr's merge tech.

It's important that VCS creators not get hung up on merge. It's only one aspect of usability.

Normal, boring three-way merges are good enough quite a lot of the time. Merge has quite a lot of strange corner cases, but they're not hit all that often. Improvements are welcome, but we will never get it 100% right, because merge doesn't address combining programs, but combining text. Most merge tools don't understand what that text means. What we need, ideally, is artificial intelligence. Since we don't have that, we need tricks that make the program seem intelligent without actually being intelligent.

Merging is an art, not a science. So merge tech is a tar pit. It can take as much time as you're willing to throw at it, and still not be perfect. There are other things bzr needs, like better remote operations, central storage, better Windows support. So at times, it can be frustrating working on merging yet again.

No comments: