Thursday, March 23, 2006

More tree transforms

Here's a follow-up on tree transforms, now that I've finished implementing them.

One thing I found as I implemented tree transforms in bzr was that the modification step is probably the smallest and most boring part. Content changes can be structured as delete/create pairs, so they don't need to be modifications. That leaves file permission changes as the only modifications bzr does.

Another thing I found is that you can structure a tree transform so that all the risky stuff happens up-front, making it very unlikely that you'll run into an error that leaves your tree in an inconsistent state. The trick is to perform all your creation operations in a temp directory beforehand. Then, when you're applying the transform, you rename-into-place instead of creating.

That means that while you're applying the transform, you're not subject to 'disk full' errors or failures in the text merge. It also means that you can safely perform merge operations that make reference to the working tree, like using '.' as BASE or OTHER in a merge. And the simpler you can make the transform application, the better.
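Here's a rough sketch of the create-then-rename idea. It isn't bzr's code; the function name, the ".staging-" prefix and the dict-of-bytes input are all made up for illustration.

```python
import os
import shutil
import tempfile


def apply_creations(tree_root, new_files):
    """Stage every new file in a temp directory, then rename into place.

    new_files maps a path relative to tree_root to the bytes to write.
    (Hypothetical helper, not part of bzr.)
    """
    # Stage inside the tree so the later renames stay on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=tree_root)
    staged = []
    try:
        # Risky phase: a full disk or a failed merge would surface here,
        # before the visible part of the working tree has been touched.
        for index, (rel_path, content) in enumerate(sorted(new_files.items())):
            temp_path = os.path.join(staging, "new-%d" % index)
            with open(temp_path, "wb") as f:
                f.write(content)
            staged.append((temp_path, os.path.join(tree_root, rel_path)))
    except Exception:
        # Nothing has been renamed yet, so cleanup is just the temp directory.
        shutil.rmtree(staging)
        raise
    # Apply phase: parent directories and renames, nothing else.
    for temp_path, final_path in staged:
        os.makedirs(os.path.dirname(final_path), exist_ok=True)
        os.rename(temp_path, final_path)
    os.rmdir(staging)
```

Putting the staging directory inside the tree is a deliberate choice in this sketch: a rename across filesystems can fail, which would put the risk right back into the apply phase.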

You can even take this a step further, and defer deletions until you've completely updated the working tree. Just rename the doomed files into a temporary directory, and then delete that directory when the working tree has been successfully updated. That should make it relatively easy to roll back to the previous state, should you encounter an error.
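Deferred deletion might look something like this. Again, it's a sketch with invented names (apply_with_deferred_deletes, the update_tree callback, the ".pending-delete-" prefix), and the rollback only puts the parked files back; it doesn't undo whatever update_tree managed to do before failing.

```python
import os
import shutil
import tempfile


def apply_with_deferred_deletes(tree_root, doomed, update_tree):
    """Move doomed paths aside, run update_tree(), then really delete them.

    doomed lists paths relative to tree_root; update_tree is a callable
    that performs the rest of the transform. (Hypothetical helper.)
    """
    limbo = tempfile.mkdtemp(prefix=".pending-delete-", dir=tree_root)
    moved = []
    try:
        for index, rel_path in enumerate(doomed):
            original = os.path.join(tree_root, rel_path)
            parked = os.path.join(limbo, "old-%d" % index)
            os.rename(original, parked)
            moved.append((parked, original))
        update_tree()
    except Exception:
        # Roll back: put each parked path back, most recent move first.
        for parked, original in reversed(moved):
            os.rename(parked, original)
        os.rmdir(limbo)
        raise
    # Success: only now is anything actually destroyed.
    shutil.rmtree(limbo)
```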

Another thing we do before applying the transform is conflict resolution. After all, there may already be files where you want to put new ones. Or a text merge may need to emit .BASE, .THIS and .OTHER files. Or a merge may produce a directory loop. Etc. Bzr has a set of fairly simple rules for detecting and resolving conflicts, and doesn't try to predict how different conflict resolutions could interact with each other. It just runs the resolver up to 10 times, and if that doesn't fix the problem, it gives up.
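The "run the resolver up to 10 times" approach boils down to a loop like the one below. Only the limit of 10 comes from bzr; find_conflicts, resolve_one and the exception class are stand-ins I've invented for the sketch.

```python
MAX_PASSES = 10  # the only number here that comes from bzr


class UnresolvableConflicts(Exception):
    """Raised when conflicts remain after MAX_PASSES resolver passes."""


def resolve_conflicts(transform, find_conflicts, resolve_one):
    """Repeatedly detect and resolve conflicts, giving up after MAX_PASSES.

    find_conflicts(transform) returns a list of conflicts;
    resolve_one(transform, conflict) adjusts the transform in place.
    Neither tries to predict how one resolution affects another; any
    new conflict a resolution causes is simply caught on the next pass.
    """
    for _ in range(MAX_PASSES):
        conflicts = find_conflicts(transform)
        if not conflicts:
            return
        for conflict in conflicts:
            resolve_one(transform, conflict)
    remaining = find_conflicts(transform)
    if remaining:
        raise UnresolvableConflicts("%d conflict(s) left" % len(remaining))
```

For instance, if resolving a name clash by appending ".moved" happens to produce another clash, the second pass notices and appends again, with no special-case logic needed.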

But again, with all of this happening before a single move or deletion, there isn't a huge penalty for failure.

Thursday, March 16, 2006

A satisfying morning's hack

So standards-conformant web sites are mother-and-apple-pie kinda stuff. Everyone knows they're a good thing. The W3C has even been so helpful as to provide a way to validate anything you might have online.

Problem is, a lot of your stuff isn't online. It only goes online once you know it's valid. Sure, the W3C validator will accept file uploads, but if you're doing any kinda dynamic content generation, you have to save the rendered output, and upload that. Every time.

So yesterday (or was it Tuesday?) I hacked up a validator proxy. It's a TurboGears app, but it really only uses the CherryPy part. It pulls down a specified URL, which can be a LAN-only URL, shoves it up to the W3C validator, and returns a slightly-rewritten version of the validator's response. For an encore, I hacked up the W3C validation bookmarklet, to teach it to use my proxy.
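Stripped of the web framework, the proxy's flow is roughly the function below. This is a sketch, not the actual code: fetch and submit are hypothetical callables standing in for the HTTP requests, validator_root is a placeholder, and the particular rewrite shown (pointing root-relative links back at the validator) is my guess at one kind of rewriting such a proxy would need.

```python
def validate_through_proxy(url, fetch, submit, validator_root):
    """Fetch url locally, submit the markup, return a rewritten report.

    fetch(url) returns the page markup as a string; submit(markup)
    returns the validator's HTML report as a string. Both are
    hypothetical stand-ins for real HTTP calls.
    """
    markup = fetch(url)      # may be a LAN-only address
    report = submit(markup)  # the validator's response
    # Assumed rewrite: the report is now served from the proxy's host,
    # so root-relative links need to point back at the validator.
    for attr in ("href", "src"):
        report = report.replace(
            '%s="/' % attr, '%s="%s/' % (attr, validator_root)
        )
    return report
```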

Validation happens a lot more when it's easy. And it's so validating...