Wednesday, December 21, 2005

Tree transforms on POSIX filesystems and related structures

This post comes from some discussion I had with Denys Duchier about reimplementing the inventory-rewriting step of the bzr merge procedure.

The bzr inventory system has a lot of the same constraints as a POSIX filesystem does. You can't have two files with the same name, at any point. You must never delete a directory before deleting its children. You must always create a directory before putting files inside it.

On the other hand, you can simply replace one inventory with another, updating the whole tree structure in one fell swoop. This is fairly straightforward, so that's what I did.

On the filesystem, there's no such shortcut. But there is a rather elegent way of doing it, one that I learned from reading the the GNU Arch source code.

There are 4 things you can do with a file:
  1. Create it
  2. Delete it
  3. Rename it
  4. Modify it
For regular files and directories, "modify" may mean permission changes. For files, it may mean contents changes. And for symlinks, it may mean target changes. It also includes changing a file from one type to another.

In order to allow operations to happen in the required order, we need to split rename into two operations: rename FROM, and rename TO. This means that we can classify the operations like so:
  1. Name removals (delete, rename FROM)
  2. Name insertions (create, rename TO)
  3. File modifications
If you do all your removals before insertions, you can be certain that insertions will never try to use a name that is pending removal. So as long as you don't try to insert the same name twice, you'll never violate the POSIX requirement of only one file per name.

So for name operations, we need two phases
  1. removals
  2. insertions
Splitting rename into two operations means that, during the removal phase, we need to keep renamed files somewhere else. We can keep them in a temp directory, then move them into place afterward.

It's important to do rename-FROM and delete at the same time, because of the ordering issues. Children must always be removed before parents. If you delete the parent first, your delete will fail. If you rename the parent first, you'll lose track of the child, and be unable to perform your operation on it. Because the inheritance hierarchy may include both deletes and rename-FROMs, you can't do renames or deletes as separate steps. They must be intermingled.

The inverse is true of insertions. If you try to create a child before creating its parents, the operation will fail. If you try to rename a child to its new name, the operation will also fail.

Modifications are largely free of naming concerns. However, there is one kind of modification that can affect the availability of names: file type changes. Turning a file into a directory enables it to have children. Turning a directory into something else makes it impossible for it to have children. Since file type modification requires that removals be done first, and since file type modification is required before insertions are safe, the logical place for it is in the middle.

Unfortunately, this means that you must do contents changes while renamed files are still in the temp dir.

So to recap, the three phases are:
  1. Do deletions and rename-to-temp in child-to-parent order
  2. Do file modifications
  3. Do creations and rename-to-final in parent-to-child order
I hope this is as fascinating to you as it is to me.

Thursday, December 01, 2005

When they say "Turbo", they mean it

I've been fiddling around with TurboGears recently. One web app I've worked on is a service that lets you know when the local bands you like are playing in a calendar format. So that you can say "I feel like seeing a band tonight" and see what's on. Another was a help system.

Well, I've just spent a fun few hours putting together a web front end to my Bugs Everywhere bug tracker. It was definitely a "turbo" experience. Went from zero to "does a lot and looks decent" in mere hours. It was also a "gears" experience, because half the fun was how easily everything fit together. Rock on, Turbogears.

There have been a spate of web interfaces for bzr, and I wonder whether they'd be easier using TurboGears or Django. My guess is yes-- lots easier.

These new web megaframeworks interest me a lot, partly because writing Panoramic Feedback required coming up with one. Yes, PF does templates, data persistence, url dispatch and JavaScript (not to mention having its own XML parser and PDF report generation subsystem). But if I was starting out today, there's no way I'd build our own megaframework. They're easy, they're fun, and they let you focus on what you really want to do.

TurboGears isn't the only megaframework in town, and I wouldn't turn up my nose at Django or Rails. But I'd prefer to write in Python, since that's compatible with most of my software. Ruby seems too similar to Python for it to be worthwile knowing both.

What I especially like about TurboGears is its philosophy of collaboration, and Kid templates. The fact that they're valid XHTML is neat. The fact that they pack the whole expressive power of Python (should you need it) is great. In my bug tracker front end, I'm actually passing list of Bug objects to the display template, and the template is retrieving the members it needs. Nice.

I'd like it if CherryPy had better support for RESTful URLs. SQLObject seems a little buggy to me. And CatWalk will be very nice, when it happens.

But it's already come a very long way.