Wednesday, December 21, 2005

Tree transforms on POSIX filesystems and related structures

This post comes from some discussion I had with Denys Duchier about reimplementing the inventory-rewriting step of the bzr merge procedure.

The bzr inventory system has a lot of the same constraints as a POSIX filesystem does. You can't have two files with the same name, at any point. You must never delete a directory before deleting its children. You must always create a directory before putting files inside it.

On the other hand, you can simply replace one inventory with another, updating the whole tree structure in one fell swoop. This is fairly straightforward, so that's what I did.

On the filesystem, there's no such shortcut. But there is a rather elegent way of doing it, one that I learned from reading the the GNU Arch source code.

There are 4 things you can do with a file:
  1. Create it
  2. Delete it
  3. Rename it
  4. Modify it
For regular files and directories, "modify" may mean permission changes. For files, it may mean contents changes. And for symlinks, it may mean target changes. It also includes changing a file from one type to another.

In order to allow operations to happen in the required order, we need to split rename into two operations: rename FROM, and rename TO. This means that we can classify the operations like so:
  1. Name removals (delete, rename FROM)
  2. Name insertions (create, rename TO)
  3. File modifications
If you do all your removals before insertions, you can be certain that insertions will never try to use a name that is pending removal. So as long as you don't try to insert the same name twice, you'll never violate the POSIX requirement of only one file per name.

So for name operations, we need two phases
  1. removals
  2. insertions
Splitting rename into two operations means that, during the removal phase, we need to keep renamed files somewhere else. We can keep them in a temp directory, then move them into place afterward.

It's important to do rename-FROM and delete at the same time, because of the ordering issues. Children must always be removed before parents. If you delete the parent first, your delete will fail. If you rename the parent first, you'll lose track of the child, and be unable to perform your operation on it. Because the inheritance hierarchy may include both deletes and rename-FROMs, you can't do renames or deletes as separate steps. They must be intermingled.

The inverse is true of insertions. If you try to create a child before creating its parents, the operation will fail. If you try to rename a child to its new name, the operation will also fail.

Modifications are largely free of naming concerns. However, there is one kind of modification that can affect the availability of names: file type changes. Turning a file into a directory enables it to have children. Turning a directory into something else makes it impossible for it to have children. Since file type modification requires that removals be done first, and since file type modification is required before insertions are safe, the logical place for it is in the middle.

Unfortunately, this means that you must do contents changes while renamed files are still in the temp dir.

So to recap, the three phases are:
  1. Do deletions and rename-to-temp in child-to-parent order
  2. Do file modifications
  3. Do creations and rename-to-final in parent-to-child order
I hope this is as fascinating to you as it is to me.

No comments: