Downloads

 * [https://www.kitten-technologies.co.uk/project/ugarit/tarball/ugarit-1.0.5.tar.gz?uuid=1.0.5|1.0.5]
 * [https://www.kitten-technologies.co.uk/project/ugarit/tarball/ugarit-1.0.6.tar.gz?uuid=1.0.6|1.0.6]
 * [https://www.kitten-technologies.co.uk/project/ugarit/tarball/ugarit-1.0.7.tar.gz?uuid=1.0.7|1.0.7]
 * [https://www.kitten-technologies.co.uk/project/ugarit/tarball/ugarit-1.0.8.tar.gz?uuid=1.0.8|1.0.8]
 * [https://www.kitten-technologies.co.uk/project/ugarit/tarball/ugarit-1.0.9.tar.gz?uuid=1.0.9|1.0.9]
 * [https://www.kitten-technologies.co.uk/project/ugarit/tarball/ugarit-2.0.tar.gz?uuid=2.0|2.0]

Source Control

You can obtain the latest sources, all history, and a local copy of the ticket database using [http://www.fossil-scm.org/|Fossil], like so:

 fossil clone https://www.kitten-technologies.co.uk/project/ugarit ugarit.fossil

Introduction

Ugarit is a backup/archival system based around content-addressable storage. [./docs/intro.wiki|Learn more...]

News

Development priorities are: performance, better error handling, and fixing bugs! After I've cleaned house a little, I'll be focussing on replicated backend storage (ticket [f1f2ce8cdc]), as I now have a cluster of storage devices at home.


About Ugarit


What's content-addressable storage?

Traditional backup systems work by storing copies of your files somewhere. Perhaps they go onto tapes, or perhaps they're in archive files written to disk. They will either be full dumps, containing a complete copy of your files, or incrementals or differentials, which only contain files that have been modified since some point. This saves making repeated copies of unchanging files, but it means that to do a full restore, you need to start by extracting the last full dump then applying one or more incrementals, or the latest differential, to get the latest state.

Not only do differentials and incrementals let you save space, they also give you a history - you can restore to a previous point in time, which is invaluable if the file you want to restore was deleted a few backup cycles ago!

This technology was developed when the best storage technology for backups was magnetic tape, because each dump is written sequentially (and restores are largely sequential, unless you're skipping bits to pull out specific files).

However, these days, random-access media such as magnetic disks and SSDs are cheap enough to compete with magnetic tape for long-term bulk storage (especially when one considers the cost of a tape drive or two). And having fast random access means we can take advantage of different storage techniques.

A content-addressable store is a key-value store, except that the keys are always computed from the values. When a given object is stored, it is hashed, and the hash is used as the key. This means you can never store the same object twice; the second time, you'll get the same hash, see the object is already present, and re-use the existing copy. Therefore, you get deduplication of your data for free.

But, I hear you ask, how do you find things again, if you can't choose the keys?

When an object is stored, you need to record the key so you can find it again later.
In Ugarit, everything is stored in a tree-like directory structure. Files are uploaded and their hashes obtained, and then a directory object is constructed containing a list of the files in the directory, listing the key of the Ugarit object that stores the contents of each file. This directory object itself has a hash, which is stored inside the directory entry in the parent directory, and so on up to the root. The root of a tree stored in a Ugarit vault has no parent directory to contain it, so at that point, we store the key of the root in a named "tag" that we can look up by name when we want it.

Therefore, everything in a Ugarit vault can be found by starting with a named tag and retrieving the object whose key it contains, then finding keys inside that object and looking up the objects they refer to, until we find the object we want.

When you use Ugarit to back up your filesystem, it uploads a complete snapshot of every file in the filesystem, like a full dump. But because the vault is content-addressed, it automatically avoids uploading anything it already has a copy of, so all we upload is an incremental dump - but in the vault, it looks like a full dump, and so can be restored on its own without having to restore a chain of incrementals.

Also, the same storage can be shared between multiple systems that all back up to it - and the incremental upload algorithm means that any files shared between the servers only need to be uploaded once. If you back up a complete server, then go and back up another that is running the same distribution, then all the files in /bin and so on that are already in the storage will not need to be backed up again; the system will automatically spot that they're already there, and not upload them again.

As well as storing backups of filesystems, Ugarit can also be used as the primary storage for read-only files, such as music and photos.
The principle is exactly the same; the only difference is in how the files are organised - rather than as a directory structure, the files are referenced from metadata objects that specify information about the file (so it can be found) and a reference to the contents. Sets of metadata objects are pointed to by tags as well, so they can also be found.
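The hash-as-key idea above can be sketched in a few lines of shell. This is a toy illustration only: the paths are scratch locations, and the choice of SHA-256 is arbitrary - it is not Ugarit's on-disk format or hash configuration.

```shell
# Toy content-addressable store: the key is always the hash of the value.
store=$(mktemp -d)

cas_put() {
  key=$(sha256sum "$1" | cut -d' ' -f1)  # compute the key from the content
  if [ -e "$store/$key" ]; then
    echo "dedup: $key already present"   # same content => same key => free dedup
  else
    cp "$1" "$store/$key"
    echo "stored: $key"
  fi
}

printf 'hello\n' > /tmp/demo-file
cas_put /tmp/demo-file   # first store uploads the block
cas_put /tmp/demo-file   # second store finds the key already present
```

Storing the same content twice costs nothing beyond a hash lookup - which is exactly why Ugarit's snapshots only ever upload what is new.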

So what's that mean in practice?


Backups

You can run Ugarit to back up any number of filesystems to a shared storage area (known as a vault), and on every backup, Ugarit will only upload files or parts of files that aren't already in the vault - be they from the previous snapshot, earlier snapshots, snapshots of entirely unrelated filesystems, etc. Every time you do a snapshot, Ugarit builds a complete directory tree of the snapshot in the vault - but it reuses any parts of files, whole files, or entire directories that already exist anywhere in the vault, and only uploads what doesn't already exist.

The support for parts of files means that, in many cases, gigantic files like database tables and virtual disks for virtual machines will not need to be uploaded entirely every time they change, as only the changed sections will be identified and uploaded.

Because a complete directory tree exists in the vault for every snapshot, the extraction algorithm is incredibly simple - and, therefore, incredibly reliable and fast. Simple, reliable, and fast are just what you need when you're trying to reconstruct the filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a snapshot every hour, then only a megabyte or two might have changed in your filesystem, so you only upload a megabyte or two - yet you end up with a complete history of your filesystem at hourly intervals in the vault.

Conventional backup systems usually store a full backup then incrementals to their archives, meaning that doing a restore involves reading the full backup then reading every incremental since and applying them - so to do a restore, you have to download *every version* of the filesystem you've ever uploaded - or you have to do periodic full backups (even though most of your filesystem won't have changed since the last full backup) to reduce the number of incrementals required for a restore.
Better results are had from systems that use a special backup server to look after the archive storage, which accepts incremental backups and applies them to the snapshot it keeps, in order to maintain a most-recent snapshot that can be downloaded in a single run; but these then restrict you to using dedicated servers as your archive stores, ruling out cheaply scalable solutions like Amazon S3, or just backing up to a removable USB or eSATA disk you attach to your system whenever you do a backup. And dedicated backup servers are complex pieces of software; can you rely on something complex for the fundamental foundation of your data security system?

Archives

You can also use Ugarit as the primary storage for read-only files. You do this by creating an archive in the vault, and importing batches of files into it along with their metadata (arbitrary attributes, such as "author", "creation date" or "subject").

Just as you can keep snapshots of multiple systems in a Ugarit vault, you can also keep multiple separate archives, each identified by a named tag.

However, as it's all within the same vault, the usual de-duplication rules apply. The same file may be in multiple archives, with different metadata in each, as the file contents and metadata are stored separately (and associated only within the context of each archive). And, of course, the same file may appear in snapshots and in archives; perhaps a file was originally downloaded into your home directory, where it was backed up into Ugarit snapshots, and then you imported it into your archive. The archive import would not have had to re-upload the file, as its contents would have already been found in the vault, so all that needs to be uploaded is the metadata.

Although we have mainly spoken of storing files in archives, the objects in archives can be directories full of files as well as single files. This is useful for storing MacOS-style files that are actually directories, or for archiving things like completed projects for clients, which can be entire directory structures.

System Requirements

Ugarit should run on any POSIX-compliant system that can run [http://www.call-with-current-continuation.org/|Chicken Scheme]. It stores and restores all the file attributes reported by the stat system call - POSIX mode permissions, UID, GID, mtime, and optionally atime and ctime (although the ctime cannot be restored due to POSIX restrictions). Ugarit will store files, directories, device and character special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative streams, forks and other metadata - is possible, thanks to the extensible directory entry format; support for such metadata will be added as required.

Currently, only local filesystem-based vault storage backends are complete: these are suitable for backing up to a removable hard disk or a filesystem shared via NFS or other protocols. However, the backend can be accessed via an SSH tunnel, so a remote server that you are able to install Ugarit on to run the backends can be used as a remote vault.

The next backend to be implemented will be one for Amazon S3, followed by an SFTP backend for storing vaults anywhere you can ssh to. Other backends will be implemented on demand; a vault can, in principle, be stored on anything that can store files by name, report on whether a file already exists, and efficiently download a file by name. This rules out magnetic tapes, due to their requirement for sequential access.

Although we need to trust that a backend won't lose data (for now), we don't need to trust the backend not to snoop on us, as Ugarit optionally encrypts everything sent to the vault.

Terminology

A Ugarit backend is the software module that handles backend storage. An actual storage area - managed by a backend - is called a storage, and is used to implement a vault. Currently, every storage is a valid vault, but the planned future introduction of a distributed storage backend will enable multiple storages (which are not, themselves, valid vaults, as they only contain some subset of the information required) to be combined into an aggregate storage, which then holds the actual vault. Note that the contents of a storage are purely a set of blocks, and a series of named tags containing references to them; the storage does not know the details of encryption and hashing, so cannot make any sense of its contents.

For example, if you use the recommended "splitlog" filesystem backend, your vault might be /mnt/bigdisk on the server prometheus. The backend (which is compiled along with the other filesystem backends in the backend-fs binary) must be installed on prometheus, and Ugarit clients all over the place may then use it via ssh to prometheus. However, even with the filesystem backends, the actual storage might not be on prometheus, where the backend runs - /mnt/bigdisk might be an NFS mount, or a mount from a storage-area network. This ability to delegate via SSH is particularly useful with the "cache" backend, which reduces latency by storing a cache of what blocks exist in a backend, thereby making it quicker to identify already-stored files; a cluster of servers all sharing the same vault might all use SSH tunnels to access an instance of the "cache" backend on one of them (using some local disk to store the cache), which proxies the actual vault storage to a vault on the other end of a high-latency Internet link, again via an SSH tunnel.

A vault is where Ugarit stores backups (as chains of snapshots) and archives (as chains of archive imports).
Backups and archives are identified by tags, which are the top-level named entry points into a vault. A vault is built on top of a storage, along with a choice of hash function, compression algorithm, and encryption, which are used to map the logical world of snapshots and archive imports into the physical world of blocks stored in the storage.

A snapshot is a copy of a filesystem tree in the vault, with a header block that gives some metadata about it. A backup consists of a number of snapshots of a given filesystem.

An archive import is a set of filesystem trees, each along with metadata about it. Whereas a backup is organised around a series of timed snapshots, an archive is organised around the metadata; the filesystem trees in the archive are identified by their properties.

So what, exactly, is in a vault?

A Ugarit vault contains a load of blocks, each up to a maximum size (usually 1MiB, although some backends might impose smaller limits). Each block is identified by the hash of its contents; this is how Ugarit avoids ever uploading the same data twice - it checks whether the data to be uploaded already exists in the vault by looking up the hash. The contents of the blocks are compressed and then encrypted before upload.

Every file uploaded is, unless it's small enough to fit in a single block, chopped into blocks, and each block is uploaded. This way, the entire contents of your filesystem can be uploaded - or, at least, only the parts of it that aren't already there! The blocks are then tied together to create a snapshot: blocks full of the hashes of the data blocks are uploaded, and directory blocks are uploaded listing the names and attributes of the files in each directory, along with the hashes of the blocks that contain the files' contents. Even the blocks that contain lists of hashes of other blocks are checked for pre-existence in the vault; if only a few MiB of your hundred-GiB filesystem has changed, then even the index blocks and directory blocks are re-used from previous snapshots.

Once uploaded, a block in the vault is never again changed. After all, if its contents changed, its hash would change, so it would no longer be the same block! However, every block has a reference count, tracking the number of index blocks that refer to it. This means that the vault knows which blocks are shared between multiple snapshots (or shared *within* a snapshot - if a filesystem has more than one copy of the same file, still only one copy is uploaded), so that if a given snapshot is deleted, the blocks that only that snapshot uses can be deleted to free up space, without corrupting other snapshots by deleting blocks they share.
Keep in mind, however, that not all storage backends support this - there are certain advantages to being an append-only vault. For a start, you can't delete something by accident! The supplied fs and sqlite backends support deletion, while the splitlog backend does not yet. However, the actual snapshot deletion command in the user interface hasn't been implemented yet either, so it's a moot point for now...

Finally, the vault contains objects called tags. Unlike the blocks, the tags' contents can change, and they have meaningful names rather than being identified by hash. Tags identify the top-level blocks of snapshots within the system, from which (by following the chain of hashes down through the index blocks) the entire contents of a snapshot may be found. Unless you happen to have recorded the hash of a snapshot somewhere, the tags are where you find snapshots when you want to do a restore.

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the files, directories, and index blocks required, it looks up the tag you have identified as the target of the snapshot. If the tag already exists, then the snapshot it currently points to is recorded in the new snapshot as the "previous snapshot"; then the snapshot header - containing the previous snapshot hash, along with the date and time and any comments you provide for the snapshot - is uploaded (as another block, identified by its hash). The tag is then updated to point to the new snapshot.

This way, each tag actually identifies a chronological chain of snapshots. Normally, you would use a tag to identify a filesystem being backed up; you'd keep snapshotting the filesystem to the same tag, resulting in all the snapshots of that filesystem hanging from the tag.
But if you want to remember any particular snapshot (perhaps the snapshot you take before a big upgrade or other risky operation), you can duplicate the tag, in effect 'forking' the chain of snapshots, much like a branch in a version control system.

Archive imports cause the creation of one or more archive metadata blocks, each of which lists the hashes of files or filesystem trees in the archive, along with their metadata. Each import then has a single archive import block pointing to the sequence of metadata blocks, and pointing to the previous archive import block in that archive. The same filesystem tree can be imported more than once to the same archive, and the "latest" metadata always wins.

Generally, you should create lots of small archives for different categories of things - such as one for music, one for photos, and so on. You might well create separate archives for the music collections of different people in your household, unless they overlap, and another for Christmas music so it doesn't crop up in random shuffle play! It's easy to merge archives if you over-compartmentalise them, but harder to split an archive if you find it too cluttered with unrelated things.

I've spoken of archive imports and backup snapshots each having a "previous" reference to the last import or snapshot in the chain, but it's actually more complex than that: they have an arbitrary list of zero or more previous objects. As such, it's possible for several imports or snapshots to have the same "previous" - known as a "fork" - and it's possible to have an import or snapshot that merges multiple previous ones.

Forking is handy if you want to basically duplicate an archive, creating two new archives with the same contents to begin with, but each then capable of diverging thereafter.
You might do this to keep the state of an archive before doing a big import, so you can go back to the original state if you regret the import, for instance.

Forking a backup tag is a more unusual operation, but also useful. Perhaps you have a server running many stateful services, and the hardware becomes overloaded, so you clone the basic setup onto another server, and run half of the services on the original and half on the new one; if you fork the backup tag of the original server to create a backup tag for the new server, then both servers' snapshot histories will share the original state.

Merging is most useful for archives; you might merge several archives into one, as mentioned.

And, of course, you can merge backup tags as well. If your earlier splitting of one server into two doesn't work out (perhaps your workload reduces, or you can now afford a single, more powerful, server to handle everything in one place), you might rsync the service state from the two servers back onto the new server, so it's all merged in the new server's filesystem. To preserve this in the snapshot history, you can merge the two backup tags of the two servers to create a backup tag for the single new server, which will accurately reflect the history of the filesystem.

Also, tags might fork by accident - I plan to introduce a distributed storage backend, which will replicate blocks and tags across multiple storages to create a single virtual storage to build a vault on top of; in the event of the network of actual storages suffering a failure, it may be that snapshots and imports are only applied to some of the storages - and then subsequent snapshots and imports only get applied to some other subset of the storages.
When the network is repaired and all the storages are again visible, they will have divergent, inconsistent states for their tags, and the distributed storage system will resolve the situation by keeping the majority state as the state of the tag on all the backends, while preserving any other states by creating new tags with the original name plus a suffix. These can then be merged to "heal" the conflict.
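The chop-into-blocks scheme described in this section can be illustrated with standard tools. The block size matches the usual 1MiB limit, but the paths, the use of SHA-256, and the flat text "index" are illustrative stand-ins, not Ugarit's actual formats:

```shell
# Chop a file into 1 MiB blocks and build a toy "index block" of their hashes.
mkdir -p /tmp/chunk-demo
dd if=/dev/zero of=/tmp/chunk-demo/bigfile bs=1024 count=2500 2>/dev/null  # ~2.44 MiB
split -b 1048576 /tmp/chunk-demo/bigfile /tmp/chunk-demo/blk.
(cd /tmp/chunk-demo && sha256sum blk.*) > /tmp/chunk-demo/index
cat /tmp/chunk-demo/index   # three blocks: 1 MiB + 1 MiB + the remainder
```

Note that the first two blocks are identical 1 MiB runs of zeroes, so they hash the same - in a content-addressed vault, only one copy of that block would ever be stored, even within a single file.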

Using Ugarit


Installation

Install [http://www.call-with-current-continuation.org/|Chicken Scheme] using their [http://wiki.call-cc.org/man/4/Getting%20started|installation instructions].

Ugarit can then be installed by typing (as root):

 chicken-install ugarit

See the [http://wiki.call-cc.org/manual/Extensions#chicken-install-reference|chicken-install manual] for details if you have any trouble, or wish to install into your home directory.

Setting up a vault

Firstly, you need to know the vault identifier for the place you'll be storing your vaults. This depends on your backend. The vault identifier is actually the command line used to invoke the backend for a particular vault; communication with the vault is via standard input and output, which makes it easy to tunnel via ssh.

Local filesystem backends

These backends use the local filesystem to store the vaults. Of course, the "local filesystem" on a given server might be an NFS mount or mounted from a storage-area network.

Logfile backend

The logfile backend works much like the original Venti system. It's append-only - you won't be able to delete old snapshots from a logfile vault, even when I implement deletion. It stores the vault in two sets of files; one is a log of data blocks, split at a specified maximum size, and the other is the metadata: an sqlite database used to track the location of blocks in the log files, the contents of tags, and a count of the logs so a filename can be chosen for a new one.

To set up a new logfile vault, just choose where to put the two parts. It would be nice to put the metadata file on a different physical disk from the logs directory, to reduce seeking. If you only have one disk, you can put the metadata file in the log directory ("metadata" is a good name).

You can then refer to it using the following vault identifier:

 "backend-fs splitlog ...log directory... ...metadata file..."
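As a concrete sketch of the single-disk setup described above (the paths here are hypothetical examples, not defaults):

```shell
# Illustrative single-disk splitlog layout: metadata lives inside the log directory.
mkdir -p /tmp/ugarit-demo/logs
# The vault identifier for this layout would then be:
#   "backend-fs splitlog /tmp/ugarit-demo/logs /tmp/ugarit-demo/logs/metadata"
echo "log directory ready"
```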

SQLite backend

The sqlite backend works a bit like a [http://www.fossil-scm.org/|Fossil] repository; the storage is implemented as a single file, which is actually an SQLite database containing blocks as blobs, along with tags and configuration data in their own tables.

It supports unlinking objects, and the use of a single file to store everything is convenient; but storing everything in a single randomly-accessed file is slightly riskier than the simple structure of an append-only log file: it is less tolerant of corruption, which can easily render the entire storage unusable. Also, that one file can get very large.

SQLite has internal limits on the size of a database, but they're quite large - you'll probably hit a size limit at about 140 terabytes.

To set up an SQLite storage, just choose a place to put the file. I usually use an extension of .vault; note that SQLite will create additional temporary files alongside it with additional extensions, too.

Then refer to it with the following vault identifier:

 "backend-sqlite ...path to vault file..."

Filesystem backend

The filesystem backend creates vaults by storing each block or tag in its own file, in a directory. To keep the objects-per-directory count down, it'll split the files into subdirectories. Because of this, it uses a stupendous number of inodes (more than the filesystem being backed up). Only use it if you don't mind that; splitlog is much more efficient.

To set up a new filesystem-backend vault, just create an empty directory that Ugarit will have write access to when it runs. It will probably run as root in order to be able to access the contents of files that aren't world-readable (although that's up to you), so unless you access your storage via ssh or sudo to run the backend under another user, be careful of NFS mounts that have maproot=nobody set!

You can then refer to it using the following vault identifier:

 "backend-fs fs ...path to directory..."

Proxying backends

These backends wrap another vault identifier, to which the actual storage task is delegated, but add some value along the way.

SSH tunnelling

It's easy to access a vault stored on a remote server. The caveat is that the backend then needs to be installed on the remote server! Since vaults are accessed by running the supplied command and talking to it via stdin and stdout, the vault identifier need only be:

 "ssh ...hostname... '...remote vault identifier...'"

Cache backend

The cache backend is used to cache a list of what blocks exist in the proxied backend, so that it can answer queries as to the existence of a block rapidly, even when the proxied backend is on the end of a high-latency link (eg, the Internet). This should speed up snapshots, as existing files are identified by asking the backend if the vault already has them.

The cache backend works by storing the cache in a local sqlite file. Given a place for it to store that file, usage is simple:

 "backend-cache ...path to cachefile... '...proxied vault identifier...'"

The cache file will be automatically created if it doesn't already exist, so make sure there's write access to the containing directory.

 - WARNING - WARNING - WARNING - WARNING - WARNING -

If you use a cache on a vault shared between servers, make sure that you either:

 * Never delete things from the vault

or

 * Make sure all access to the vault is via the same cache

If a block is deleted from a vault, and a cache on that vault is not aware of the deletion (as it did not go "through" the caching proxy), then the cache will record that the block exists in the vault when it does not. This means that if a snapshot made through the cache would use that block, it will be assumed that the block already exists in the vault when it does not. Therefore, the block will not be uploaded, and a dangling reference will result!

Some setups which *are* safe:

 * A single server using a vault via a cache, not sharing it with anyone else.

 * A pool of servers using a vault via the same cache.

 * A pool of servers using a vault via one or more caches, and maybe some not via the cache, where nothing is ever deleted from the vault.

 * A pool of servers using a vault via one cache, and maybe some not via the cache, where deletions are only performed on servers using the cache, so the cache is always aware.
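The hazard in the warning above can be simulated with plain files standing in for the vault and the sqlite cache (a toy model; the key and paths are invented for illustration):

```shell
# Toy demonstration of the stale-cache hazard: a deletion that bypasses
# the cache leaves the cache claiming a block exists when it doesn't.
vault=$(mktemp -d)
cache=$(mktemp)
key=0123abcd                       # stand-in for a block hash
touch "$vault/$key"                # block uploaded to the vault...
echo "$key" >> "$cache"            # ...and the cache records its existence
rm "$vault/$key"                   # deleted WITHOUT going through the cache
if grep -q "$key" "$cache"; then
  echo "cache: block present - skipping upload"
fi
[ -e "$vault/$key" ] || echo "vault: block missing - dangling reference!"
```

The cache confidently reports the block as present, so a snapshot would never re-upload it - which is exactly why deletions must always pass through the cache, or never happen at all.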

Writing a ugarit.conf

ugarit.conf should look something like this:

 (storage "<vault identifier>")
 (hash tiger "<salt>")
 [(double-check)]
 [(compression [deflate|lzma])]
 [(encryption aes <key>)]
 [(file-cache "<path to file cache>")]
 [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192 (tiger), SHA-256 (sha256), SHA-384 (sha384) and SHA-512 (sha512) are supported; if you omit the line then Tiger will still be used, but it will be a simple hash of the block with the block type appended, which reveals to attackers what blocks you have (as the hash is of the unencrypted block, and the hash is not encrypted). This is useful for development and testing, or for use with trusted vaults, but not advised for vaults that attackers may snoop at. Providing a salt string produces a hash function that hashes the block, the type of block, and the salt string, producing hashes that attackers who can snoop the vault cannot use to find known blocks (see the "Security model" section below for more details).

I would recommend that you create a salt string from a secure entropy source, such as:

 dd if=/dev/random bs=1 count=64 | base64 -w 0

Whichever hash function you use, you will need to install the required Chicken egg with one of the following commands:

 chicken-install -s tiger-hash  # for tiger
 chicken-install -s sha2        # for the SHA hashes

double-check, if present, causes Ugarit to perform extra internal consistency checks during backups, which will detect bugs but may slow things down.

lzma is the recommended compression option for low-bandwidth backends or when space is tight, but it's very slow to compress; deflate or no compression at all are better for fast local vaults. To have no compression at all, just remove the (compression ...) line entirely.
Likewise, to use compression, you need to install the appropriate Chicken egg:

 chicken-install -s z3    # for deflate
 chicken-install -s lzma  # for lzma

WARNING: The lzma egg is currently rather difficult to install, and needs rewriting to fix this problem.

Likewise, the (encryption ...) line may be omitted to have no encryption; the only currently supported algorithm is aes (in CBC mode), with a key given in hex, as a passphrase (hashed to get a key), or as a passphrase read from the terminal on every run. The key may be 16, 24, or 32 bytes, for 128-bit, 192-bit or 256-bit AES. To specify a hex key, just supply it as a string, like so:

 (encryption aes "00112233445566778899AABBCCDDEEFF")

...for 128-bit AES,

 (encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")

...for 192-bit AES, or

 (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

Alternatively, you can provide a passphrase, and specify how large a key you want it turned into, like so:

 (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

I would recommend that you generate a long passphrase from a secure entropy source, such as:

 dd if=/dev/random bs=1 count=64 | base64 -w 0

Finally, the extra-paranoid can request that Ugarit prompt for a passphrase on every run and hash it into a key of the specified length, like so:

 (encryption aes ([16|24|32] prompt))

(note the lack of quotes around prompt, distinguishing it from a passphrase)

Please read the "Security model" section below for details on the implications of different encryption setups.
- -Again, as it is an optional feature, to use encryption, you must -install the appropriate Chicken egg: - - chicken-install -s aes - -A file cache, if enabled, significantly speeds up subsequent snapshots -of a filesystem tree. The file cache is a file (which Ugarit will -create if it doesn't already exist) mapping filenames to -(mtime,size,hash) tuples; as it scans the filesystem, if it finds a -file in the cache and the mtime and size have not changed, it will -assume it is already stored under the specified hash. This saves it -from having to read the entire file to hash it and then check if the -hash is present in the vault. In other words, if only a few files -have changed since the last snapshot, then snapshotting a directory -tree becomes an O(N) operation, where N is the number of files, rather -than an O(M) operation, where M is the total size of files involved. - -For example: - - (storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata'") - (hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ") - (encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw")) - (compression lzma) - (file-cache "/var/ugarit/cache") - -Be careful to put a set of parentheses around each configuration -entry. White space isn't significant, so feel free to indent things -and wrap them over lines if you want. - -Keep copies of this file safe - you'll need it to do extractions! -Print a copy out and lock it in your fire safe! Ok, currently, you -might be able to recreate it if you remember where you put the -storage, but encryption keys and hash salts are harder to remember... - -
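The file-cache decision logic described above can be sketched in a few lines of Python. This is purely illustrative: Ugarit's real cache is a persistent file, the hash function is whichever one you configured, and the in-memory dictionary here is a stand-in.

```python
import hashlib
import os

# Illustrative in-memory file cache: path -> (mtime, size, hash).
cache = {}

def hash_file(path):
    """Hash a file's contents (stand-in for Ugarit's configured hash)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot_file(path):
    """Return the file's hash, reading the file only if it has changed."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == st.st_mtime and entry[1] == st.st_size:
        return entry[2]  # cache hit: skip reading and hashing the file
    digest = hash_file(path)  # cache miss: read and hash the whole file
    cache[path] = (st.st_mtime, st.st_size, digest)
    return digest
```

The second snapshot of an unchanged file costs only a stat call, which is what makes re-snapshotting scale with the number of files rather than their total size.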

Your first backup

- -Think of a tag to identify the filesystem you're backing up. If it's
/home on the server gandalf, you might call it gandalf-home. If
it's the entire filesystem of the server bilbo, you might just call
it bilbo.
-
-Then from your shell, run (as root):
-
- # ugarit snapshot [-c] [-a]
-
-For example, if we have a ugarit.conf in the current directory:
-
- # ugarit snapshot ugarit.conf -c localhost-etc /etc
-
-Specify the -c flag if you want to store ctimes in the vault;
since it's impossible to restore ctimes when extracting from a
vault, doing this is useful only for informational purposes, so it's
not done by default. Similarly, atimes aren't stored in the vault
unless you specify -a, because otherwise, there will be a lot of
directory blocks uploaded on every snapshot, as the atime of every
file will have been changed by the previous snapshot - so with -a
specified, on every snapshot, every directory in your filesystem will
be uploaded! Ugarit will happily restore atimes if they are found in
a vault; their storage is made optional simply because uploading
them is costly and rarely useful.
-

Exploring the vault

- -Now that you have a backup, you can explore the contents of the
vault. This need not be done as root, as long as you can read
ugarit.conf; however, if you want to extract files, run it as root
so the uids and gids can be set.
-
- $ ugarit explore ugarit.conf
-
-This will put you into an interactive shell exploring a virtual
filesystem. The root directory contains an entry for every tag; if you
type ls you should see your tag listed, and within that
tag, you'll find a list of snapshots, in descending date order, with a
special entry current for the most recent
snapshot. Within a snapshot, you'll find the root directory of your
snapshot under contents, and the details of the snapshot itself in
properties.sexpr, and you'll be able to cd into
subdirectories, and so on:
-
- > ls
- localhost-etc/
- > cd localhost-etc
- /localhost-etc> ls
- current/
- 2015-06-12 22:49:34/
- 2015-06-12 22:49:25/
- /localhost-etc> cd current
- /localhost-etc/current> ls
- log.sexpr
- properties.sexpr
- contents/
- /localhost-etc/current> cat properties.sexpr
- ((previous . "a140e6dbe0a7a38f8b8c381323997c23e51a39e2593afb61")
- (mtime . 1434102574.0)
- (contents . "34eccf1f5141187e4209cfa354fdea749a0c3c1c4682ec86")
- (stats (blocks-stored . 12)
- (bytes-stored . 16889)
- (blocks-skipped . 50)
- (bytes-skipped . 6567341)
- (file-cache-hits . 0)
- (file-cache-bytes . 0))
- (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
- (hostname . "ahe")
- (source-path . "/etc")
- (notes)
- (files . 112)
- (size . 6563588))
- /localhost-etc/current> cd contents
- /localhost-etc/current/contents> ls
- zoneinfo
- vconsole.conf
- udev/
- tmpfiles.d/
- systemd/
- sysctl.d/
- sudoers.tmp~
- sudoers
- subuid
- subgid
- static
- ssl/
- ssh/
- shells
- shadow-
- shadow
- services
- samba/
- rpc
- resolvconf.conf
- resolv.conf
- -- Press q then enter to stop or enter for more... 
- q - /localhost-etc/current/contents> ls -ll resolv.conf - -rw-r--r-- 0 0 [2015-05-23 23:22:41] 78B/-: resolv.conf - key: #f - contents: "e33ea1394cd2a67fe6caab9af99f66a4a1cc50e8929d3550" - size: 78 - ctime: 1432419761.0 - -As well as exploring around, you can also extract files or directories -(or entire snapshots) by using the get command. Ugarit -will do its best to restore the metadata of files, subject to the -rights of the user you run it as. - -Type help to get help in the interactive shell. - -The interactive shell supports command-line editing, history and tab -completion for your convenience. - -

Extracting things directly

- -As well as using the interactive explore mode, it is also possible to
directly extract something from the vault, given a path.
-
-For example, given a vault with a snapshot under a tag named
Test, it would be possible to extract its
README.txt file with the following command:
-
- ugarit extract ugarit.conf /Test/current/contents/README.txt
-

Forking tags

- -As mentioned above, you can fork a tag, creating two tags that -refer to the same snapshot and its history but that can then have -their own subsequent history of snapshots applied to each -independently, with the following command: - - $ ugarit fork - -

Merging tags

- -And you can also merge two or more tags into one. It's possible to -merge a bunch of tags to make an entirely new tag, or you can merge a -tag into an existing tag, by having the "output" tag also be one of -the "input" tags. - -The command to do this is: - - $ ugarit merge - -For instance, to import your classical music collection into your main -musical collection, you might do: - - $ ugarit merge ugarit.conf my-music my-music classical-music - -Or if you want to create a new all-music archive from the archives -bobs-music and petes-music, you might do: - - $ ugarit merge ugarit.conf all-music bobs-music petes-music - -

Archive operations

- -

Importing

- -To import some files into an archive, you must create a manifest file
listing them, and their metadata. The manifest can also list
metadata for the import as a whole, perhaps naming the source of the
files, or the reason for importing them.
-
-The metadata for a file (or an import) is a series of named
properties. The value of a property can be any Scheme value, written
in Scheme syntax (with strings double-quoted unless they are to be
interpreted as symbols), but strings and numbers are the most useful
types.
-
-You can use whatever names you like for properties in metadata, but
there are some that the system applies automatically, and an informal
standard of sorts, which is documented in [docs/archive-schema.wiki].
-
-You can produce a manifest file by hand, or use the Ugarit Manifest
Maker to produce one for you. You do this by installing it like so:
-
- $ chicken-install ugarit-manifest-maker
-
-And then running it, giving it any number of file and directory names
on the command line. When given directories, it will recursively scan
them to find all the files contained therein and put them in the
manifest; it will not put directories in the manifest, although it is
perfectly legal for you to do so when writing a manifest by hand. This
is because the manifest maker can't do much useful analysis on
directories to suggest default metadata for them (so there isn't much
point in listing them), and it's far more useful for it to make it easy
for you to import a large number of files individually by referencing
the directory containing them.
-
-The manifest is sent to standard output, so you need to redirect it to
a file, like so:
-
- $ ugarit-manifest-maker ~/music > music.manifest
-
-You can specify command-line options, as well. 
-e PATTERN
-or --exclude=PATTERN introduces a glob pattern for files
to exclude from the manifest, and -D KEY=VALUE or
---define=KEY=VALUE provides a property to be added to
every file in the manifest (as opposed to an import property, which is
part of the metadata of the overall import). Note that
VALUE must be double-quoted if it's a string, as per
Scheme value syntax.
-
-One might use this like so (note that the glob pattern is quoted to
stop the shell expanding it):
-
- $ ugarit-manifest-maker -e '*.txt' -D rating=5 ~/favourite-music > music.manifest
-
-The manifest maker simplifies the writing of manifests for files, by
listing the files in manifest format along with useful metadata
extracted from the filename and the file itself. For supported file
types (currently, MP3 and OGG music files), it will even look inside
the file to extract metadata.
-
-The manifest file it generates will contain lots of comments
mentioning things it couldn't automatically analyse (such as unknown
OGG/ID3 tags, or unknown types of files); and for metadata properties
it thinks might be relevant but can't automatically provide, it
suggests them with an empty property declaration, commented out. The
idea is that, after generating a manifest, you read it by hand in a
text editor to attempt to improve it.
-

The format of a manifest file

- -Manifest files have a relatively simple format. They are based on
Scheme s-expressions, so can contain comments. From any semicolon (not
in a string or otherwise quoted) to the end of the line is a comment,
and #; in front of something comments out that something.
-
-Import metadata properties are specified like so:
-
- (KEY = VALUE)
-
-...where, as usual, VALUE must be double-quoted if it's a
string.
-
-Files to import, with their metadata, are specified like so:
-
- (object "PATH OF FILE TO IMPORT"
- (KEY = VALUE)
- (KEY = VALUE)...
- )
-
-The closing parenthesis need not be on a line of its own; it's
conventionally placed after the closing parenthesis of the final
property.
-
-Ugarit, when importing the files in the manifest, will add the
following properties if they are not already specified:
-
-
import-path
-
The path the file was imported from
- -
dc:format
-
A guess at the file's MIME type, based on the extension
- -
mtime
-
The file's modification time (as the number of seconds since the -UNIX epoch)
- -
ctime
-
The file's change time (as the number of seconds since the UNIX -epoch)
- -
filename
-
The name of the file, stripped of any directory components, and -including the extension.
- -
- -The following properties are placed in the import metadata, -automatically: - -
-
hostname
-
The hostname the import was performed on.
- -
manifest-path
-
The path to the manifest file used for the import.
- -
mtime
-
The time (in seconds since the UNIX epoch) at which the import was -committed.
- -
stats
-
A Scheme alist of statistics about the import (number of -files/blocks uploaded, etc).
-
- -So, to wrap that all up, here's a sample import manifest file: - - -(notes = "A bunch of old CDs I've finally ripped") - -(object "/home/alaric/newrip/track01.mp3" - (filename = "track01.mp3") - (dc:format = "audio/mpeg") - - (dc:publisher = "Go! Beat Records") - (dc:created = "1994") - (dc:contributor = "Portishead") - (dc:subject = "Trip-Hop") - (superset:size = 1) - (superset:index = 1) - (set:title = "Dummy") - (set:size = 11) - (set:index = 1) - (dc:creator = "Portishead") - (dc:title = "Wandering Star") - - (mtime = 1428962299.0) - (ctime = 1428962299.0) - (file-size = 4703055)) - -;;... and so on, for ten more MP3s on this CD, then several other CDs... - - -
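The manifest format above is simple enough to generate mechanically. Here is a minimal sketch in Python of emitting one (object ...) entry per file; it is not the Ugarit Manifest Maker, only an illustration of the format, and its property set and MIME-type guessing are deliberately simplified:

```python
import mimetypes
import os

def manifest_entry(path):
    """Emit one (object ...) form in the manifest format described above."""
    st = os.stat(path)
    # Properties Ugarit would otherwise fill in itself; dc:format is a
    # guess from the file extension, as the documentation describes.
    props = [
        ("filename", '"%s"' % os.path.basename(path)),
        ("dc:format", '"%s"' % (mimetypes.guess_type(path)[0]
                                or "application/octet-stream")),
        ("mtime", repr(st.st_mtime)),
        ("file-size", str(st.st_size)),
    ]
    lines = ['(object "%s"' % path]
    lines += ["  (%s = %s)" % (key, value) for key, value in props]
    return "\n".join(lines) + ")"
```

A real manifest would add the descriptive dc:* and set:* properties by hand, as in the sample above.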

Actually importing a manifest

- -Well, when you finally have a manifest file, importing it is easy: - - $ ugarit import - -

How do I change the metadata of an already-imported file?

- -That's easy; the "current" metadata of a file is the metadata of its
most recent import. Just import the file again, in a new manifest, with new
metadata, and it will overwrite the old. However, the old metadata is
still preserved in the archive's history; tags forked from the archive
tag before the second import will still see the original state of the
archive, by design.
-

Exploring

- -Archives are visible in the explore interface. For instance, an import -of some music I did looks like this: - -
-> ls
-localhost-etc/ <tag>
-archive-tag/ <tag>
-> cd archive-tag
-/archive-tag> ls
-history/ <archive-history>
-/archive-tag> cd history
-/archive-tag/history> ls
-2015-06-12 22:53:13/ <import>
-/archive-tag/history> cd 2015-06-12 22:53:13
-/archive-tag/history/2015-06-12 22:53:13> ls
-log.sexpr <file>
-properties.sexpr <inline>
-manifest/ <import-manifest>
-/archive-tag/history/2015-06-12 22:53:13> cat properties.sexpr
-((stats (blocks-stored . 2046)
-        (bytes-stored . 1815317503)
-        (blocks-skipped . 9)
-        (bytes-skipped . 8388608)
-        (file-cache-hits . 0)
-        (file-cache-bytes . 0))
- (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
- (mtime . 1434135993.0)
- (contents . "fcdd5b996914fdcac1e8a6cfbc67663e08f6eaf0cc952e21")
- (hostname . "ahe")
- (notes . "A bunch of music, imported as a demo")
- (manifest-path . "/home/alaric/tmp/test.manifest"))
-/archive-tag/history/2015-06-12 22:53:13> cd manifest
-/archive-tag/history/2015-06-12 22:53:13/manifest> ls
-1d4269099189234eefeb80b95370eaf280730cf4d591004d:03 The Lemon Song.mp3 <file>
-7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3 <file>
-64092fa12c2800dda474b41e5ebe8c948f39a59ee91c120b:09 How Many More Times.mp3 <file>
-1d79148d1e1e8947c50b44cf2d5690588787af328e82eeef:2-07 Going to California.mp3 <file>
-e3685148d0d12213074a9fdb94a00e05282aeabe77fa60d5:1-01 You Shook Me.mp3 <file>
-d73904f371af8d7ca2af1076881230f2dc1c2cf82416880a:03 Strangers.mp3 <file>
-9c5a0efb7d397180a1e8d42356d8f04c6c26a83d3b05d34a:09 Uptight.mp3 <file>
-01a069aec2e731e18fcdd4ecb0e424f346a2f0e16910f5e9:07 Numb.mp3 <file>
-7ea1ab7fbd525c40e21d6dd25130e8c70289ad56c09375b0:08 She.mp3 <file>
-009dacd8f3185b7caeb47050002e584ab86d08cf9e9aceec:1-03 Communication Breakdown.mp3 <file>
-26d264d629e22709f664ed891741f690900d45cd4fd44326:1-03 Dazed and Confused.mp3 <file>
-d879761195faf08e4e95a5a2398ea6eefb79920710bfeab6:1-10 Band Introduction _ How Many More Times.mp3 <file>
-83244601db42677d110fc8522c6a3cbbc1f22966a779f876:06 All My Love.mp3 <file>
-5eebee9a2ad79d04e4f69e9e2a92c4e0a8d5f21e670f89da:07 Tangerine.mp3 <file>
-dd6f1203b5973ecd00d2c0cee18087030490230727591746:2-08 That's the Way.mp3 <file>
-c0acea15aa27a6dd1bcaff1c13d4f3d741a40a46abeca3fc:04 The Crunge.mp3 <file>
-ea7727ad07c6c82e5c9c7218ee1b059cd78264c131c1438d:1-02 I Can't Quit You Baby.mp3 <file>
-10fda5f46b8f505ca965bcaf12252eedf5ab44514236f892:14 F.O.D..mp3 <file>
-a99ca9af5a83bde1c676c388dc273051defa88756df26e95:1-03 Good Times Bad Times.mp3 <file>
-b5d7cfe9808c7fc0dedbd656d44e4c56159cbd3c2ed963bb:1-15 Stairway to Heaven.mp3 <file>
-79c87e3c49ffdac175c95aae071f63d3a9efdf2ddb84998c:08.Batmilk.ogg <file>
--- Press q then enter to stop or enter for more...
-q
-/archive-tag/history/2015-06-12 22:53:13/manifest> ls -ll 7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
--r--------     -     - [2015-04-13 21:46:39] -/-: 7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
-key: #f
-contents: "7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382"
-import-path: "/home/alaric/archive/sorted-music/Led Zeppelin/Led Zeppelin/04 Dazed and Confused.mp3"
-filename: "04 Dazed and Confused.mp3"
-dc:format: "audio/mpeg"
-dc:publisher: "Atlantic"
-dc:subject: "Classic Rock"
-dc:title: "Dazed and Confused"
-dc:creator: "Led Zeppelin"
-dc:created: "1982"
-dc:contributor: "Led Zeppelin"
-set:title: "Led Zeppelin"
-set:index: 4
-set:size: 9
-superset:index: 1
-superset:size: 1
-ctime: 1428957999.0
-file-size: 15448903
-
-
-

Searching

- -However, the explore interface to an archive is far from pleasant. You
need to go to the correct import, and find your file by name, and then
identify it with a big long name composed of its hash and the original
filename to find its properties and extract it.
-
-I hope to add property-based searching to explore mode in future
(which is why you need to go into a history directory
within the archive directory, as other ways of exploring the archive
will appear alongside). This will be particularly useful when the
explore-mode virtual filesystem is mounted over 9P!
-
-However, even that interface, being constrained to look like a
filesystem, will be limited. The ugarit command-line tool
provides a very powerful search interface that exposes the full power
of the archive metadata.
-

Metadata filters

- -Files (and directories) in an archive can be searched for using -"metadata filters", which are descriptions of what you're looking for -that the computer can understand. They are represented as Scheme -s-expressions, and can be made up of the following components: - -
-
#t
-
This filter matches everything. It's not very useful.
- -
#f
-
This filter matches nothing. It's not very useful.
- -
(and FILTER FILTER...)
-
This filter matches files for which all of the inner filters match.
- -
(or FILTER FILTER...)
-
This filter matches files for which any of the inner filters match.
- -
(not FILTER)
-
This filter matches files which do not match the inner filter.
- -
(= ($ PROP) VALUE)
-
This filter matches files which have the given -PROPerty equal to that VALUE in their metadata.
- -
(= key HASH)
-
This filter matches the file with the given hash.
- -
(= ($import PROP) VALUE)
-
This filter matches files which have the given -PROPerty equal to that VALUE in the metadata -of the import that last imported them.
-
- -
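The filter language above can be modelled with a small evaluator. This sketch represents filters as nested Python tuples rather than Scheme s-expressions, and evaluates them against plain dictionaries of properties; it is an illustration of the semantics, not Ugarit's implementation:

```python
def matches(filt, props, key=None, import_props=None):
    """Evaluate a metadata filter against a file's properties.

    Filters mirror the s-expression forms above, as tuples:
      True / False, ("and", f...), ("or", f...), ("not", f),
      ("=", ("$", prop), value), ("=", "key", hash),
      ("=", ("$import", prop), value).
    """
    if filt is True or filt is False:
        return filt
    op = filt[0]
    if op == "and":
        return all(matches(f, props, key, import_props) for f in filt[1:])
    if op == "or":
        return any(matches(f, props, key, import_props) for f in filt[1:])
    if op == "not":
        return not matches(filt[1], props, key, import_props)
    if op == "=":
        lhs, value = filt[1], filt[2]
        if lhs == "key":
            return key == value  # match on the file's hash
        kind, prop = lhs  # ("$", prop) or ("$import", prop)
        source = props if kind == "$" else (import_props or {})
        return source.get(prop) == value
    raise ValueError("unknown filter: %r" % (filt,))
```

For example, the "Led Zeppelin" filter used in the next section becomes `("or", ("=", ("$", "dc:creator"), "Led Zeppelin"), ("=", ("$", "dc:contributor"), "Led Zeppelin"))`.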

Searching an archive

- -For a start, you can search for files matching a given metadata filter -in a given archive. This is done with: - - $ ugarit search - -For instance, let's look for music by Led Zeppelin: - - $ ugarit search ugarit.conf music '(or - (= ($ dc:creator) "Led Zeppelin") - (= ($ dc:contributor) "Led Zeppelin"))' - -The result looks like the explore-mode view of an archive manifest, -listing the file's hash followed by its title and extension: - - -7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3 -834a1619a59835e0c27b22801e3c829b40be583dadd19770:2-08 No Quarter.mp3 -9e8bc4954838bd9c671f275eb48595089257185750d63894:1-12 I Can't Quit You Baby.mp3 -6742b3bebcdd9cae5ec5403c585935403fa74d16ed076cf2:02 Friends (1).mp3 -07d161f4bd684e283f7f2cf26e0b732157a8e95ef66939c3:05 Carouselambra.mp3 -[...] - - -What of all our lovely metadata? You can view that if you add the word -"verbose" to the end of the command line, which allows you to specify -alternate output formats: - - $ ugarit search ugarit.conf music '(or - (= ($ dc:creator) "Led Zeppelin") - (= ($ dc:contributor) "Led Zeppelin"))' verbose - -Now the output looks like: - - -object a444ff6ef807b080b536155f58d246d633cab4a0eabef5bf - (ctime = 1428958660.0) - (dc:contributor = "Led Zeppelin") - (dc:created = "2008") - (dc:creator = "Led Zeppelin") -[... all the usual file properties omitted ...] - import a43f7a7268ee8b18381c20d7573add5dbf8781f81377279c - (stats = ((blocks-stored . 2046) (bytes-stored . 1815317503) (blocks-skipped . 9) (bytes-skipped . 8388608) (file-cache-hits . 0) (file-cache-bytes . 0))) - (log = "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a") -[... all the usual import properties omitted ...] -object b4cadf48b2c07ccf0303fc4064b292cb222980b0d4223641 - (ctime = 1428958673.0) - (dc:contributor = "Led Zeppelin") - (dc:created = "2008") - (dc:creator = "Led Zeppelin") - (dc:creator = "Jimmy Page/John Paul Jones/Robert Plant") -[...and so on...] 
-
-
-As you can see, it lists the hash of each file, its metadata, the hash
-of the import that last imported it, and the metadata of that import.
-
-That's quite verbose, so you'd probably be wanting to take that as
-input to another program to do something nicer with it. But it's laid
-out for human reading, not for machine parsing. Thankfully, we have
-other formats for that, alist and
-alist-with-imports.
-
-They look like:
-
- $ ugarit search ugarit.conf music '(or
- (= ($ dc:creator) "Led Zeppelin")
- (= ($ dc:contributor) "Led Zeppelin"))' alist
-
-This outputs one Scheme s-expression list per match, the first element
-of which is the hash as a string, the rest of which is an alist of properties:
-
-
-("7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382"
- (ctime . 1428957999.0)
- (dc:contributor . "Led Zeppelin")
- (dc:created . "1982")
- (dc:creator . "Led Zeppelin")
-[... elided file properties ...]
- (superset:index . 1)
- (superset:size . 1))
-("77c960d09eb21ed72e434ddcde0bd3781a4f3d6ee7a6eb66"
- (ctime . 1428958981.0)
- (dc:contributor . "Led Zeppelin")
-[...]
-
-
- $ ugarit search ugarit.conf music '(or
- (= ($ dc:creator) "Led Zeppelin")
- (= ($ dc:contributor) "Led Zeppelin"))' alist-with-imports
-
-This outputs one s-expression list per match, with four
-elements. The first is the key string, the second is an alist of file
-properties, the third is the import's hash, and the last is an alist
-containing the import's properties. It looks like:
-
-
-("64fa08a0080aee6ef501c408fd44dfcc634cfcafd8006fc4"
- ((ctime . 1428958683.0)
- (dc:contributor . "Led Zeppelin")
- (dc:created . "2008")
- (dc:creator . "Led Zeppelin")
-[... elided file properties ...]
- (superset:index . 1)
- (superset:size . 1))
- "a43f7a7268ee8b18381c20d7573add5dbf8781f81377279c"
- ((stats (blocks-stored . 2046)
- (bytes-stored . 1815317503)
-[... elided manifest properties ...]
- (manifest-path . "test.manifest")))
-("4cd56f916a63399b252976e842dcae0b87f058b5a60c93a4"
- ((ctime . 
1428958437.0) - (dc:contributor . "Led Zeppelin") -[...] - - -And finally, you might just want to get the hashes of matching files -(which are particularly useful for extraction operations, which we'll -come to next). To do this, specify a format of "keys", which outputs -one line per match, containing just the hash: - - $ ugarit search ugarit.conf music '(or - (= ($ dc:creator) "Led Zeppelin") - (= ($ dc:contributor) "Led Zeppelin"))' keys - - -ce6f6484337de772de9313038cb25d1b16e28028136cc291 -6af5c664cbfa1acb22a377e97aee35d94c0fc003d239dd0c -92e91e79b384478b5aab31bf1b2ff9e25e7e2c4b48575185 -6ddb9a41d4968468a904f05ecf7e0e73d2c7c7ad76bc394b -a074dddcef67cd93d92c6ffce845894aa56594674023f6e1 -4f65f735bbb00a6fda4bc887b370b3160f55e5e07ec37ffa -97cc8b8ba70c39387fc08ef62311b751aea4340d636eb421 -72358dbe3eb60da42eadcf6de325b2a6686f4e17ea41fa60 -[...] - - -However, to write filter expressions, you need to know what properties -you have available to search on. You might remember, or go for -standard properties, or look at existing files in verbose mode to find -some; but you can also just ask Ugarit what properties it has in an -archive, like so: - - $ ugarit search-props - -You can even ask what properties are available for files matching an -existing filter: - - $ ugarit search-props - -This is useful if you're interested in further narrowing down a -filter, and so only care about properties that files already matching -that filter have. - -For a bunch of music files imported with the Ugarit Manifest Maker, -you can expect to see something like this: - - -ctime -dc:contributor -dc:created -dc:creator -dc:format -dc:publisher -dc:subject -dc:title -file-size -filename -import-path -mtime -set:index -set:size -set:title -superset:index -superset:size - - -Now you know what properties to search, next you'll be wanting to know -what values to look for. 
Again, Ugarit has a command to query the -available values of any given property: - - $ ugarit search-values - -And you can limit that just to files matching a given filter: - - $ ugarit search-values - -The resulting list of values is ordered by popularity, so the most -widely-used values will be listed first. Let's see what genres of -music were in my sample of music files I imported: - - $ ugarit search-values test.conf archive-tag dc:subject - -The result is: - - -Classic Rock -Alternative & Punk -Electronic -Trip-Hop - - -Ok, let's now use a filter to find out what artists -(dc:creator) I have that made Trip-Hop music (what even -IS that?): - - $ ugarit search-values test.conf archive-tag \ - '(= ($ dc:subject) "Trip-Hop")' \ - dc:creator - -The result is: - - -Portishead - - -Ah, OK, now I know what "Trip-Hop" is. - -
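The "ordered by popularity" behaviour of search-values is worth being precise about: distinct values are counted across all matching files, then listed most common first. Conceptually (a sketch against plain dictionaries, not Ugarit's actual code):

```python
from collections import Counter

def values_by_popularity(property_name, files):
    """Return the distinct values of a property across a set of files,
    most widely-used first, mirroring what search-values reports."""
    counts = Counter(
        f[property_name] for f in files if property_name in f
    )
    return [value for value, _ in counts.most_common()]
```

So in the dc:subject example above, "Classic Rock" listed first simply because more of the imported files carried that value.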

Extracting

- -All this searching is lovely, but what it gets us, in the end, is a -bunch of file hashes. Perhaps we might want to actually play some -music, or look at a photo, or something. To do that, we need to -extract from the archive. - -We've already seen the contents of an archive in the explore mode -virtual filesystem, so we could go into the archive history, find the -import, go into the manifest, pick the file out there, and use -get to extract it, but that would be yucky. Thankfully, -we have a command-line interface to get things from archives, in one -of two ways. - -Firstly, we can extract a file (or a directory tree) from an archive, -out into the local filesystem: - - $ ugarit archive-extract - -The "target" is the name to give it in the local filesystem. We could -pull out that Led Zeppelin song from our search results above, like so: - - $ ugarit archive-extract test.conf archive-tag \ - ce6f6484337de772de9313038cb25d1b16e28028136cc291 foo.mp3 - -We now have a foo.mp3 file in the current directory. - -However, sometimes it would be nicer to have it streamed to standard -output, which can be done like so: - - $ ugarit archive-stream - -This lets us write a command such as: - - $ ugarit archive-stream test.conf archive-tag \ - ce6f6484337de772de9313038cb25d1b16e28028136cc291 | mpg123 - - -...to play it in real time. - -

Storage administration

- -Each backend offers a number of commands for
administering the storage underlying vaults. These are accessible via
the ugarit-storage-admin command line interface.
-
-To use it, run it with the following command:
-
- $ ugarit-storage-admin ''
-
-The available commands differ between backends, but all backends
support the info and help commands, which
give basic information about the vault, and list all available
commands, respectively. Some offer a stats command that
examines the vault state to give interesting statistics, but which may
be a time-consuming operation.
-

Administering splitlog storages

- -The splitlog backend offers a wide selection of administrative -commands. See the help command on a splitlog vault for -details. The following commands are available: - -
- -
help
-
List the available commands.
- -
info
-
List some basic information about the storage.
- -
stats
-
Examine the metadata to provide overall statistics about the -archive. This may be a time-consuming operation on large -storages.
- -
set-block-size! BYTES
-
Sets the block size to the given number of bytes. This will affect -new blocks written to the storage, and leave existing blocks -untouched, even if they are larger than the new block size.
- -
set-max-logfile-size! BYTES
-
Sets the size at which a log file is finished and a new one -started (likewise, existing log files will be untouched; this will -only affect new log files)
- -
set-commit-interval! UPDATES
-
Sets the frequency of automatic synching of the storage -state to disk. Lowering this harms performance when writing to the -storage, but decreases the number of in-progress block writes that -can fail in a crash.
- -
write-protect!
-
Disables updating of the storage.
- -
write-unprotect!
-
Re-enables updating of the storage.
- -
reindex!
-
Reindex the storage, rebuilding the block and tag state from the -contents of the log. If the metadata file is damaged or lost, -reindexing can rebuild it (although any configuration changes made -via other admin commands will need manually repeating as they are -not logged).
-
- -

Administering sqlite storages

- -The sqlite backend has a similar administrative interface to the -splitlog backend, except that it does not have log files, so lacks the -set-max-logfile-size! and reindex! commands. - -

Administering cache storages

- -The cache backend provides a minimalistic interface: - -
- -
help
-
List the available commands.
- -
info
-
List some basic information about the storage.
- -
stats
-
Report on how many entries are in the cache.
- -
clear!
-
Clears the cache, dropping all the entries in it.
- -
- -

.ugarit files

- -By default, Ugarit will vault everything it finds in the filesystem
tree you tell it to snapshot. However, this might not always be
desired, so we provide the facility to override this with .ugarit
files, or global rules in your .conf file.
-
-Note: The syntax of these files is provisional; I want to
experiment with usability, as the current syntax is ugly. So please
don't be surprised if the format changes in incompatible ways in
subsequent versions!
-
-In quick summary, if you want to ignore all files or directories
matching a glob in the current directory and below, put the following
in a .ugarit file in that directory:
-
- (* (glob "*~") exclude)
-
-You can write quite complex expressions as well as just globs. The
full set of rules is:
-
- * (glob "pattern") matches files and directories whose names
 match the glob pattern
-
- * (name "name") matches files and directories with exactly that
 name (useful for files called *...)
-
- * (modified-within number seconds) matches files and
 directories modified within the given number of seconds
-
- * (modified-within number minutes) matches files and
 directories modified within the given number of minutes
-
- * (modified-within number hours) matches files and directories
 modified within the given number of hours
-
- * (modified-within number days) matches files and directories
 modified within the given number of days
-
- * (not rule) matches files and directories that do not match
 the given rule
-
- * (and rule rule...) matches files and directories that match
 all the given rules
-
- * (or rule rule...) 
matches files and directories that match - any of the given rules - -Also, you can override a previous exclusion with an explicit include -in a lower-level directory: - - (* (glob "*~") include) - -You can bind rules to specific directories, rather than to "this -directory and all beneath it", by specifying an absolute or relative -path instead of the `*`: - - ("/etc" (name "passwd") exclude) - -If you use a relative path, it's taken relative to the directory of -the .ugarit file. - -You can also put some rules in your .conf file, although relative -paths are illegal there, by adding lines of this form to the file: - - (rule * (glob "*~") exclude) - -
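The rule forms above compose recursively, which a short evaluator makes concrete. This is an illustrative Python sketch of the matching semantics only (rules as nested tuples, mtimes as seconds since the epoch), not Ugarit's actual implementation:

```python
import fnmatch
import time

def rule_matches(rule, name, mtime, now=None):
    """Evaluate a .ugarit-style matching rule against a file's
    name and modification time (seconds since the UNIX epoch)."""
    now = time.time() if now is None else now
    op = rule[0]
    if op == "glob":
        return fnmatch.fnmatch(name, rule[1])
    if op == "name":
        return name == rule[1]
    if op == "modified-within":
        amount, unit = rule[1], rule[2]
        seconds = amount * {"seconds": 1, "minutes": 60,
                            "hours": 3600, "days": 86400}[unit]
        return now - mtime <= seconds
    if op == "not":
        return not rule_matches(rule[1], name, mtime, now)
    if op == "and":
        return all(rule_matches(r, name, mtime, now) for r in rule[1:])
    if op == "or":
        return any(rule_matches(r, name, mtime, now) for r in rule[1:])
    raise ValueError("unknown rule: %r" % (rule,))
```

For instance, `("and", ("glob", "*.log"), ("not", ("name", "keep.log")))` matches every .log file except keep.log, just as the corresponding s-expression rule would.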

Questions and Answers

- -

What happens if a snapshot is interrupted?

- -Nothing! Whatever blocks have been uploaded will remain in the vault, but the
snapshot is only added to the tag once the entire filesystem has been
snapshotted. So just start the snapshot again. Any files that have
already been uploaded will then not need to be uploaded again, so the
second snapshot should proceed quickly to the point where it failed
before, and continue from there.
-
-Unless the vault ends up with a partially-uploaded corrupted block
due to being interrupted during upload, you'll be fine. The filesystem
backend has been written to avoid this by writing the block to a file
with the wrong name, then renaming it to the correct name when it's
entirely uploaded.
-
-Actually, there is *one* caveat: blocks that were uploaded, but never
make it into a finished snapshot, will be marked as "referenced" but
there's no snapshot to delete to un-reference them, so they'll never
be removed when you delete snapshots. (Not that snapshot deletion is
implemented yet, mind). If this becomes a problem for people, we could
write a "garbage collect" tool that regenerates the reference counts
in a vault, leading to unused blocks (with a zero refcount) being
unlinked.
-
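The write-then-rename trick used by the filesystem backend can be sketched as follows. The ".tmp" naming convention here is hypothetical (the real backend simply uses "the wrong name"); the point is the atomic rename:

```python
import os

def store_block_atomically(directory, key, data):
    """Write a block under a temporary name, then rename it into place.

    A reader never sees a half-written block: until the rename, the
    block only exists under the temporary name, and rename() within a
    single filesystem is atomic on POSIX systems.
    """
    final_path = os.path.join(directory, key)
    tmp_path = final_path + ".tmp"  # hypothetical temporary-name convention
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ensure the data reaches disk before the rename
    os.rename(tmp_path, final_path)
```

If the process dies mid-write, only the temporary file is left behind; the block under its real name either exists in full or not at all.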

Should I share a single large vault between all my filesystems?

- -I think so. Using a single large vault means that blocks shared -between servers - eg, software installed from packages and that sort -of thing - will only ever need to be uploaded once, saving storage -space and upload bandwidth. However, do not share a vault between -servers that do not mutually trust each other, as they can all update -the same tags, so can meddle with each other's snapshots - and read -each other's snapshots. - -

CAVEAT

- -It's not currently safe to have multiple concurrent snapshots to the -same split log backend; this will soon be fixed, however. - -

Security model

- -I have designed and implemented Ugarit to be able to handle cases -where the actual vault storage is not entirely trusted. - -However, security involves tradeoffs, and Ugarit is configurable in -ways that affect its resistance to different kinds of attacks. Here I -will list different kinds of attack and explain how Ugarit can deal -with them, and how you need to configure it to gain that -protection. - -

Vault snoopers

- -This might be somebody who can intercept Ugarit's communication with -the vault at any point, or who can read the vault itself at their -leisure. - -Ugarit's splitlog backend creates files with "rw-------" permissions -out of the box to try and prevent this. This is a pain for people who -want to share vaults between UIDs, but we can add a configuration -option to override this if that becomes a problem. - -

Reading your data

- -If you enable encryption, then all the blocks sent to the vault are -encrypted using a secret key stored in your Ugarit configuration -file. As long as that configuration file is kept safe, and the AES -algorithm is secure, then attackers who can snoop the vault cannot -decode your data blocks. Enabling compression will also help, as the -blocks are compressed before encrypting, which is thought to make -cryptographic analysis harder. - -Recommendations: Use compression and encryption when there is a risk -of vault snooping. Keep your Ugarit configuration file safe using -UNIX file permissions (make it readable only by root), and maybe store -it on a removable device that's only plugged in when -required. Alternatively, use the "prompt" passphrase option, and be -prompted for a passphrase every time you run Ugarit, so it isn't -stored on disk anywhere. - -

Looking for known hashes

- -A block is identified by the hash of its content (before compression -and encryption). If an attacker was trying to find people who own a -particular file (perhaps a piece of subversive literature), they could -search Ugarit vaults for its hash. - -However, Ugarit has the option to "key" the hash with a "salt" stored -in the Ugarit configuration file. This means that the hashes used are -actually a hash of the block's contents *and* the salt you supply. If -you do this with a random salt that you keep secret, then attackers -can't check your vault for known content just by comparing the hashes. - -Recommendations: Provide a secret string to your hash function in your -Ugarit configuration file. Keep the Ugarit configuration file safe, as -per the advice in the previous point. - -
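The effect of keying can be sketched with standard primitives; here HMAC-SHA256 stands in for the keyed hash (an illustrative choice on my part; Ugarit's actual keyed-hash construction and configured hash function may differ):

```python
import hashlib
import hmac

block = b"the same file everyone has"

# Unkeyed: anyone holding the same bytes computes the same key,
# so a snooper can test the vault for known files.
plain = hashlib.sha256(block).hexdigest()

# Keyed with a secret salt; HMAC-SHA256 stands in for the keyed hash here.
keyed = hmac.new(b"secret salt from ugarit.conf", block, hashlib.sha256).hexdigest()

# Without the salt, the keyed value is unpredictable to an attacker.
assert plain != keyed
```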

Vault modifiers

- -These folks can modify Ugarit's writes into the vault, its reads -back from the vault, or can modify the vault itself at their leisure. - -Modifying an encrypted block without knowing the encryption key can at -worst be a denial of service, corrupting the block in an unknown -way. An attacker who knows the encryption key could replace a block -with valid-seeming but incorrect content. In the worst case, this -could exploit a bug in the decompression engine, causing a crash or -even an exploit of the Ugarit process itself (thereby gaining the -powers of a process inspector, as documented below). We can but hope -that the decompression engine is robust. Exploits of the decryption -engine, or other parts of Ugarit, are less likely due to the nature of -the operations performed upon them. - -However, if a block is modified, then when Ugarit reads it back, the -hash will no longer match the hash Ugarit requested, which will be -detected and an error reported. The hash is checked after -decryption and decompression, so this check does not protect us -against exploits of the decompression engine. - -This protection is only afforded when the hash Ugarit asks for is not -tampered with. Most hashes are obtained from within other blocks, -which are therefore safe unless that block has been tampered with; the -nature of the hash tree conveys the trust in the hashes up to the -root. The root hashes are stored in the vault as "tags", which a -vault modifier could alter at will. Therefore, the tags cannot be -trusted if somebody might modify the vault. This is why Ugarit -prints out the snapshot hash and the root directory hash after -performing a snapshot, so you can record them securely outside of the -vault. - -The most likely threat posed by vault modifiers is that they could -simply corrupt or delete all of your vault, without needing to know -any encryption keys. - -Recommendations: Secure your vaults against modifiers, by whatever -means possible. 
If vault modifiers are still a potential threat, -write down a log of your root directory hashes from each snapshot, and keep -it safe. When extracting your backups, use the ls -ll command in the -interface to check the "contents" hash of your snapshots, and check -they match the root directory hash you expect. - -
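The read-back check described above can be sketched as follows (a Python toy, ignoring encryption and compression; the names are mine, not Ugarit's):

```python
import hashlib

def fetch(store, key):
    """Read a block back, then check it still hashes to the key we asked for."""
    block = store[key]
    if hashlib.sha256(block).hexdigest() != key:
        raise IOError("block does not match its hash: corrupted or tampered with")
    return block

store = {}
key = hashlib.sha256(b"directory block").hexdigest()
store[key] = b"directory block"
assert fetch(store, key) == b"directory block"  # intact blocks read back fine

store[key] = b"evil replacement"  # a vault modifier rewrites the block...
try:
    fetch(store, key)
    raise RuntimeError("tampering went undetected")
except IOError:
    pass  # ...and the mismatch is detected on read
```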

Process inspectors

- -These folks can attach debuggers or similar tools to running -processes, such as Ugarit itself. - -Ugarit backend processes only see encrypted data, so people who can -attach to that process gain the powers of vault snoopers and -modifiers, and the same conditions apply. - -People who can attach to the Ugarit process itself, however, will see -the original unencrypted content of your filesystem, and will have -full access to the encryption keys and hashing keys stored in your -Ugarit configuration. When Ugarit is running with sufficient -permissions to restore backups, they will be able to intercept and -modify the data as it comes out, and probably gain total write access -to your entire filesystem in the process. - -Recommendations: Ensure that Ugarit does not run under the same user -ID as untrusted software. In many cases it will need to run as root in -order to gain unfettered access to read the filesystems it is backing -up, or to restore the ownership of files. However, when all the files -it backs up are world-readable, it could run as an untrusted user for -backups, and where file ownership is trivially reconstructible, it can -do restores as a limited user, too. - -

Attackers in the source filesystem

- -These folks create files that Ugarit will back up one day. By having -write access to your filesystem, they already have some level of -power, and standard Unix security practices such as storage quotas -should be used to control them. They may be people with logins on your -box, or more subtly, people who can cause servers to write files; -somebody who sends an email to your mailserver will probably cause -that message to be written to queue files, as will people who can -upload files via any means. - -Such attackers might use up your available storage by creating large -files. This creates a problem in the actual filesystem, but that -problem can be fixed by deleting the files. If those files get -stored into Ugarit, then they are a part of that snapshot. If you -are using a backend that supports deletion, then (when I implement -snapshot deletion in the user interface) you could delete that entire -snapshot to recover the wasted space, but that is a rather serious -operation. - -More insidiously, such attackers might attempt to abuse a hash -collision in order to fool the vault. If they have a way of creating -a file that, for instance, has the same hash as your shadow password -file, then Ugarit will think that it already has that file when it -attempts to snapshot it, and store a reference to the existing -file. If that snapshot is restored, then they will receive a copy of -your shadow password file. Similarly, if they can predict a future -hash of your shadow password file, and create a shadow password file -of their own (perhaps one giving them a root account with a known -password) with that hash, they can then wait for the real shadow -password file to have that hash. If the system is later restored from -that snapshot, then their chosen content will appear in the shadow -password file. However, doing this requires a very fundamental break -of the hash function being used. 
- -Recommendations: Think carefully about who has write access to your -filesystems, directly or indirectly via a network service that stores -received data to disk. Enforce quotas where appropriate, and consider -not backing up "queue directories" where untrusted content might -appear; migrate incoming content that passes acceptance tests to an -area that is backed up. If necessary, the queue might be backed up to -a non-snapshotting system, such as rsyncing to another server, so that -any excessive files that appear in there are removed from the backup -in due course, while still affording protection. +

Documentation

+ + * [./docs/intro.wiki|Introduction to Ugarit] + * [./docs/installation.wiki|Installation and Configuration] + * [./docs/commands.wiki|Command reference] + * [./docs/storage-admin.wiki|Storage backend administration] + * [./docs/dot-ugarit.wiki|Fine-tuning snapshots with .ugarit files] + * [./docs/archive-schema.wiki|Archive metadata schema] + * [./docs/faq.wiki|Frequently Asked Questions] + * [./docs/security.wiki|Security guide]

Acknowledgements

The Ugarit implementation contained herein is the work of Alaric Snell-Pym and Christian Kellermann, with advice, ideas, encouragement @@ -1863,166 +110,8 @@ And I'd like to thank my wife for putting up with me spending several evenings and weekends and holiday days working on this thing...

Version history

- * 2.0: Archival mode [dae5e21ffc], and to support its integration - into Ugarit, implemented typed tags [08bf026f5a], displaying tag - types in the VFS [30054df0b6], refactoring the Ugarit internals - [5fa161239c], made the storage of logs in the vault better - [68bb75789f], made it possible to view logs from within the VFS - [4e3673e0fe], supported hidden tags [cf5ef4691c], recording - configuration information in the vault (and providing instant - notification if your vault hashing/encryption setup is incorrect, - thanks to a clever idea by Andy Bennett) [0500d282fc], rearranged - how local caching is handled [b5911d321a], and added support for - the history of a snapshot or archive tag to have arbitrary - branches and merges [a987e28fef], which (as a side-effect) - improved the performance of running "ls" in long snapshot - histories [fcf8bc942a]. Also added an sqlite backend - [8719dfb84f], which makes testing easier but is useful in its own - right as it's fully-featured and crash-safe, while storing the - vault in a single file; and improved the appearance of the - explore mode ls command, as the VFS layout has become more - complex with the new log/properties views and all the archive - mode stuff. + * 2015-06-12: [./docs/release-2.0.wiki|Version 2.0] - * 1.0.9: More humane display of sizes in explore's directory - listings, using low-level I/O to reduce CPU usage. Myriad small - bug fixes and some internal structural improvements. - - * 1.0.8: Bug fixes to work with the latest chicken master, and - increased unit test coverage to test stuff that wasn't working - due to chicken bugs. Looking good! - - * 1.0.7: Fixed bug with directory rules (errors arose when files - were skipped). I need to improve the test suite coverage of - high-level components to stop this happening! 
- - * 1.0.6: Fixed missing features from v1.0.5 due to a fluffed merge - (whoops), added tracking of directory sizes (files+bytes) in the - vault on snapshot and the use of this information to display - overall percentage completion when extracting. Directory sizes - can be seen in the explore interface when doing "ls -l" or "ls -ll". - - * 1.0.5: Changed the VFS layout slightly, making the existence of - snapshot objects explicit (when you go into a tag, then go into a - snapshot, you now need to go into "contents" to see the actual - file tree; the snapshot object itself now exists as a node in the - tree). Added traverse-vault-* functions to the core API, and tests - for same, and used traverse-vault-node to drive the cd and get - functions in the interactive explore mode (speeding them up in the - process!). Added "extract" command. Added a progress reporting - callback facility for snapshots and extractions, and used it to - provide progress reporting in the front-end, every 60 seconds or - so by default, not at all with -q, and every time something - happens with -v. Added tab completion in explore mode. - - * 1.0.4: Resurrected support for compression and encryption and SHA2 - hashes, which had been broken by the failure of the - autoload egg to continue to work as it used to. Tidying - up error and ^C handling somewhat. - - * 1.0.3: Installed sqlite busy handlers to retry when the database is - locked due to concurrent access (affects backend-fs, backend-cache, - and the file cache), and gained an EXCLUSIVE lock when locking a - tag in backend-fs; I'm not clear if it's necessary, but it can't - hurt. - - BUGFIX: Logging of messages from storage backends wasn't - happening correctly in the Ugarit core, leading to errors when the - cache backend (which logs an info message at close time) was closed - and the log message had nowhere to go. - - * 1.0.2: Made the file cache also commit periodically, rather than on - every write, in order to improve performance. 
Counting blocks and - bytes uploaded / reused, and file cache bytes as well as hits; - reporting same in snapshot UI and logging same to snapshot - metadata. Switched to the posix-extras egg and ditched our own - posixextras.scm wrappers. Used the parley egg in the ugarit - explore CLI for line editing. Added logging infrastructure, - recording of snapshot logs in the snapshot. Added recovery from - extraction errors. Listed lock state of tags in explore - mode. Backend protocol v2 introduced (retaining v1 for - compatibility) allowing for an error on backend startup, and logging - nonfatal errors, warnings, and info on startup and all protocol - calls. Added ugarit-archive-admin command line interface to - backend-specific administrative interfaces. Configuration of the - splitlog backend (write protection, adjusting block size and logfile - size limit and commit interval) is now possible via the admin - interface. The admin interface also permits rebuilding the metadata - index of a splitlog vault with the reindex! admin command. - - BUGFIX: Made the file cache check that the file hashes it finds in the - cache actually exist in the vault, to protect against the case - where a crash of some kind has caused unflushed changes to be - lost; the file cache may well have committed changes that the - backend hasn't, leading to references to nonexistent blocks. Note - that we assume that vaults are sequentially safe, eg if the - final indirect block of a large file made it, all the partial - blocks must have made it too. - - BUGFIX: Added an explicit flush! command to the backend - protocol, and put explicit flushes at critical points in higher - layers (backend-cache, the vault abstraction in the Ugarit - core, and when tagging a snapshot) so that we ensure the blocks we - point at are flushed before committing references to them in the - backend-cache or file caches, or into tags, to ensure crash - safety. 
- - BUGFIX: Made the splitlog backend never exceed the file size limit - (except when passed blocks that, plus a header, are larger than - it), rather than letting a partial block hang over the 'end'. - - BUGFIX: Fixed tag locking, which was broken all over the - place. Concurrent snapshots to the same tag should now block for - one another, although why you'd want to *do* that is questionable. - - BUGFIX: Fixed generation of non-keyed hashes, which was - incorrectly appending the type to the hash without an outer - hash. This breaks backwards compatibility, but nobody was using - the old algorithm, right? I'll introduce it as an option if - required. - - * 1.0.1: Consistency check on read blocks by default. Removed warning - about deletions from backend-cache; we need a new mechanism to - report warnings from backends to the user. Made backend-cache and - backend-fs/splitlog commit periodically rather than after every - insert, which should speed up snapshotting a lot, and reused the - prepared statements rather than re-preparing them all the - time. - - BUGFIX: splitlog backend now creates log files with - "rw-------" rather than "rwx------" permissions; and all sqlite - databases (splitlog metadata, cache file, and file-cache file) are - created with "rw-------" rather than "rw-r--r--". - - * 1.0: Migrated from gdbm to sqlite for metadata storage, removing the - GPL taint. Unit test suite. backend-cache made into a separate - backend binary. Removed backend-log. - - BUGFIX: file caching uses mtime *and* - size now, rather than just mtime. Error handling so we skip objects - that we cannot do something with, and proceed to try the rest of the - operation. - - * 0.8: decoupling backends from the core and into separate binaries, - accessed via standard input and output, so they can be run over SSH - tunnels and other such magic. 
- - * 0.7: file cache support, sorting of directories so they're archived - in canonical order, autoloading of hash/encryption/compression - modules so they're not required dependencies any more. - - * 0.6: .ugarit support. - - * 0.5: Keyed hashing so attackers can't tell what blocks you have, - markers in logs so the index can be reconstructed, sha2 support, and - passphrase support. - - * 0.4: AES encryption. - - * 0.3: Added splitlog backend, and fixed a .meta file typo. - - * 0.2: Initial public release. - - * 0.1: Internal development release. + * [./docs/release-old.wiki|Previous Releases] Index: RELEASE.wiki ================================================================== --- RELEASE.wiki +++ RELEASE.wiki @@ -1,18 +1,25 @@ +The tip of trunk is "what's live"; the documentation, ugarit.setup, +and ugarit.release-info from there is what gets served to the public +at the canonical URLs. + +Do not merge documentation changes onto the trunk until you're +releasing, or the live docs will be ahead of the available version! + How to do a release: * Merge desired changes onto the trunk * Update ugarit.setup to set the new version * Install and test to make sure you didn't break it! - * Update ugarit.release-info to refer to the new release * Commit, and tag the commit with the version number * Run ../kitten-technologies/bin/generate-download-page to update DOWNLOAD.wiki + * Update ugarit.release-info to refer to the new release * Commit again * Announce on Google Plus etc. See also: http://www.kitten-technologies.co.uk/project/kitten-technologies/doc/trunk/README.wiki In future, expand this with a way of tagging a pre-release beta in Fossil for fossil followers to try out, before we tag it for henrietta. Index: docs/archive-schema.wiki ================================================================== --- docs/archive-schema.wiki +++ docs/archive-schema.wiki @@ -1,73 +1,201 @@ +

Ugarit Archive Metadata Schema

+ Any symbol can be used as an archive metadata property name, but here are some -standard ones, defined for the sake of interoperability. - -

System-provided import properties

- -previous -contents -mtime -log -stats -hostname -manifest-path - -

System-provided object properties

- -import-path - full path to imported file -filename - filename and extension -dc:format - guessed MIME type - -

Object properties provided by the manifest maker

- -file-size -mtime -ctime -filename -dc:title - made from file name, or in-file metadata -dc:format - MIME type +suggested ones, defined for the sake of interoperability. + +Where possible, we have used the +[http://dublincore.org/documents/2001/04/12/usageguide/generic.shtml|Dublin +Core] vocabulary, as it's a good fit for the kinds of things archive +mode is designed for. Properties imported from Dublin Core are +identified with a dc: prefix. + +Some of these properties are automatically applied by the import +process. However, if these properties are specified in the import +manifest file, then the specified value from the manifest overrides +the default. + +

Import properties

+ +These are properties applied to an import object, rather than to an +individual object in an archive. + +

Internal

+ +These properties are all provided by the system itself, and must not +be specified in an import manifest. + +
+
previous (hash)
+
The hash of a previous import. If there is +no instance of this property, then this is the first import in a +sequence. If there is more than one instance, then this is a +merge.
+ +
contents (hash)
+
The hash of the imported archive manifest. This is probably not of +much interest beyond the Ugarit internals.
+ +
mtime (number)
+
The UNIX timestamp of the import.
+ +
log (hash)
+
The hash of the import log file.
+ +
stats (alist)
+
An alist of import statistics.
+ +
manifest-path
+
The path to the manifest file that was used for the import.
+ +
hostname
+
The hostname on which the import was performed.
+ +
+ +

Core object properties

+ +These object properties apply usefully to almost anything in an archive. + +
import-path
+
The path the file was imported from, as taken from the import +manifest file. (DEFAULT: The path from the manifest file)
+ +
filename
+
The name of the file, including the extension (if applicable), but +not any directory path. This is usually the name the file had when it +was imported (eg, the latter part of import-path), but if +it was imported from some temporary file name while the system knows +of a "proper" filename other than that, they may differ. (DEFAULT: The +import path, minus any directory path)
+ +
dc:format
+
The MIME type of the file. (DEFAULT: A MIME type guessed from the +file extension)
+ +
file-size
+
The size of the file. If it's a directory, then this is the sum of +the sizes of the files within it, not including any directory +metadata.
+ +
mtime (number)
+
The mtime of the file when it was imported, as a UNIX +timestamp.
+ +
ctime
+
The ctime of the file when it was imported, as a UNIX +timestamp.
+ +
dc:title
+
The title of the object. This should be a proper human-readable +title, not just a filename, where possible.
+ +
dc:description
+
A longer description of the object.

Object properties for music

-dc:title -dc:creator -dc:contributor -dc:publisher -dc:created - date -dc:subject - genre -set:title - title of album -set:index - track number -set:size - track count -superset:index - disc number -superset:size - number of discs +Music files should put the song title in dc:title. + +
dc:creator
+
The creator of the piece, generally the artist name.
+ +
dc:contributor
+
Some other contributor to the piece, other than the artist.
+ +
dc:publisher
+
The name of the publisher.
+ +
dc:created
+
The creation date, in YYYY-MM-DD form.
+ +
dc:subject
+
The name of the genre.
+ +
set:title
+
The title of the album.
+ +
set:index
+
Track number within the album.
+ +
set:size
+
Track count within the album.
+ +
superset:index
+
For multi-disk albums, the disk number.
+ +
superset:size
+
For multi-disk albums, the number of disks.

Object properties for photographs

-dc:creator - photographer -dc:description -dc:subject - keyword, person/thing in photo -dc:spatial - place name, or lat/long/alt -dc:temporal - name of event featured -dc:created - timestamp +Use dc:description for a description of the photo. + +
dc:creator
+
The name of the photographer.
+ +
dc:subject
+
Something in the photograph (names of photographed people or +things, or more general keywords).
+ +
dc:spatial
+
The name of the place the photo was taken, or coordinates as a +[https://en.wikipedia.org/wiki/Geo_URI|geo: URL].
+ +
dc:temporal
+
The name of the event the photograph was from.
+ +
dc:created
+
The creation timestamp of the photo, in YYYY-MM-DD format, +optionally with a 24-hour UTC HH:MM:SS time.

Object properties for PDF/PS/ebooks

-dc:title -dc:creator -dc:subject -dc:description -dc:created -dc:publisher -dc:identifier - ISBN -dc:source - download from URL +Use dc:title for the title of the work. + +
dc:creator
+
The name of the author.
+ +
dc:subject
+
A subject or keyword.
+ +
dc:created
+
The creation date in YYYY-MM-DD format.
+ +
dc:publisher
+
The name of the publisher.
+ +
dc:identifier
+
An ISBN, ISSN, or similar identifier, in +[https://en.wikipedia.org/wiki/Uniform_resource_name|URN format] (eg: +urn:isbn:0451450523).
+ +
dc:source
+
The original URL the thing was downloaded from.

Other useful Dublin Core properties

-See -[http://dublincore.org/documents/2001/04/12/usageguide/generic.shtml] -for inspiration. +
dc:alternative
+
An alternative title.
+ +
dc:extent
+
Size, duration, etc. Not the size of the file in bytes, but the +duration of a recording, the size of an image in pixels, etc.
+ +
dc:language
+
The language of the object. en, en-GB, +jbo, etc.
+ +
dc:license
+
A description of the license the file is under.
+ +
dc:accessRights
+
A space-separated list of names of groups that should be allowed to +access the object, under some means of publishing all or part of an +archive. public should refer to unrestricted access.
+ +

Please contribute!

-dc:alternative - alternative name -dc:extent - size, duration, etc. -dc:language - "en", "jbo", etc -dc:license - licensing statement -dc:accessRights - "public" or "private" (the latter being the default) +The above are the conventions I have started to settle towards with +the kinds of things I am using Ugarit archives for. If you use it for +something else, please drop me a line and I'll be glad to help you +choose a good schema, and publish the results here for others to share! ADDED docs/commands.wiki Index: docs/commands.wiki ================================================================== --- docs/commands.wiki +++ docs/commands.wiki @@ -0,0 +1,726 @@ +

Ugarit command-line reference

+ +

Your first backup

+ +Think of a tag to identify the filesystem you're backing up. If it's +/home on the server gandalf, you might call it gandalf-home. If +it's the entire filesystem of the server bilbo, you might just call +it bilbo. + +Then from your shell, run (as root): + +
# ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path>
+ +For example, if we have a ugarit.conf in the current directory: + +
# ugarit snapshot ugarit.conf -c localhost-etc /etc
+ +Specify the -c flag if you want to store ctimes in the vault; +since it's impossible to restore ctimes when extracting from a +vault, doing this is useful only for informational purposes, so it's +not done by default. Similarly, atimes aren't stored in the vault +unless you specify -a, because otherwise, there will be a lot of +directory blocks uploaded on every snapshot, as the atime of every +file will have been changed by the previous snapshot - so with -a +specified, on every snapshot, every directory in your filesystem will +be uploaded! Ugarit will happily restore atimes if they are found in +a vault; their storage is made optional simply because uploading +them is costly and rarely useful. + +

Exploring the vault

+ +Now you have a backup, you can explore the contents of the +vault. This need not be done as root, as long as you can read +ugarit.conf; however, if you want to extract files, run it as root +so the uids and gids can be set. + +
$ ugarit explore ugarit.conf
+ +This will put you into an interactive shell exploring a virtual +filesystem. The root directory contains an entry for every tag; if you +type ls you should see your tag listed, and within that +tag, you'll find a list of snapshots, in descending date order, with a +special entry current for the most recent +snapshot. Within a snapshot, you'll find the root directory of your +snapshot under contents, and the details of the snapshot itself in +properties.sexpr, and will be able to cd into +subdirectories, and so on: + +
> ls
+localhost-etc/ 
+> cd localhost-etc
+/localhost-etc> ls
+current/ 
+2015-06-12 22:49:34/ 
+2015-06-12 22:49:25/ 
+/localhost-etc> cd current
+/localhost-etc/current> ls
+log.sexpr 
+properties.sexpr 
+contents/ 
+/localhost-etc/current> cat properties.sexpr
+((previous . "a140e6dbe0a7a38f8b8c381323997c23e51a39e2593afb61")
+ (mtime . 1434102574.0)
+ (contents . "34eccf1f5141187e4209cfa354fdea749a0c3c1c4682ec86")
+ (stats (blocks-stored . 12)
+  (bytes-stored . 16889)
+  (blocks-skipped . 50)
+  (bytes-skipped . 6567341)
+  (file-cache-hits . 0)
+  (file-cache-bytes . 0))
+ (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
+ (hostname . "ahe")
+ (source-path . "/etc")
+ (notes)
+ (files . 112)
+ (size . 6563588))
+/localhost-etc/current> cd contents
+/localhost-etc/current/contents> ls
+zoneinfo 
+vconsole.conf 
+udev/ 
+tmpfiles.d/ 
+systemd/ 
+sysctl.d/ 
+sudoers.tmp~ 
+sudoers 
+subuid 
+subgid 
+static 
+ssl/ 
+ssh/ 
+shells 
+shadow- 
+shadow 
+services 
+samba/ 
+rpc 
+resolvconf.conf 
+resolv.conf 
+-- Press q then enter to stop or enter for more...
+q
+/localhost-etc/current/contents> ls -ll resolv.conf
+-rw-r--r--     0     0 [2015-05-23 23:22:41] 78B/-: resolv.conf
+key: #f
+contents: "e33ea1394cd2a67fe6caab9af99f66a4a1cc50e8929d3550"
+size: 78
+ctime: 1432419761.0
+ +As well as exploring around, you can also extract files or directories +(or entire snapshots) by using the get command. Ugarit +will do its best to restore the metadata of files, subject to the +rights of the user you run it as. + +Type help to get help in the interactive shell. + +The interactive shell supports command-line editing, history and tab +completion for your convenience. + +

Extracting things directly

+ +As well as using the interactive explore mode, it is also possible to +directly extract something from the vault, given a path. + +Given the sample vault from the previous example, it would be possible +to extract the resolv.conf file with the following +command: + +
$ ugarit extract ugarit.conf /localhost-etc/current/contents/resolv.conf
+ +

Forking tags

+ +As mentioned above, you can fork a tag, creating two tags that +refer to the same snapshot and its history but that can then have +their own subsequent history of snapshots applied to each +independently, with the following command: + +
$ ugarit fork <ugarit.conf> <existing tag> <new tag>
+ +

Merging tags

+ +And you can also merge two or more tags into one. It's possible to +merge a bunch of tags to make an entirely new tag, or you can merge a +tag into an existing tag, by having the "output" tag also be one of +the "input" tags. + +The command to do this is: + +
$ ugarit merge <ugarit.conf> <output tag> <input tag> ...
+ +For instance, to import your classical music collection into your main +musical collection, you might do: + +
$ ugarit merge ugarit.conf my-music my-music classical-music
+ +Or if you want to create a new all-music archive from the archives +bobs-music and petes-music, you might do: + +
$ ugarit merge ugarit.conf all-music bobs-music petes-music
+ +

Archive operations

+ +

Importing

+ +To import some files into an archive, you must create a manifest file +listing them, and their metadata. The manifest can also list +metadata for the import as a whole, perhaps naming the source of the +files, or the reason for importing them. + +The metadata for a file (or an import) is a series of named +properties. The value of a property can be any Scheme value, written +in Scheme syntax (with strings double-quoted unless they are to be +interpreted as symbols), but strings and numbers are the most useful +types. + +You can use whatever names you like for properties in metadata, but +there are some that the system applies automatically, and an informal +standard of sorts, which is documented in [docs/archive-schema.wiki]. + +You can produce a manifest file by hand, or use the Ugarit Manifest +Maker to produce one for you. You do this by installing it like so: + +
$ chicken-install ugarit-manifest-maker
+
+Then run it, giving it any number of file and directory names
+on the command line. When given directories, it will recursively scan
+them to find all the files contained therein and put those in the
+manifest; it will not put the directories themselves in the manifest,
+although it is perfectly legal for you to do so when writing a
+manifest by hand. This is because the manifest maker can't do much
+useful analysis on a directory to suggest default metadata for it;
+naming directories is more useful as an easy way to import a large
+number of files individually.
+
+The manifest is sent to standard output, so you need to redirect it to
+a file, like so:
+
$ ugarit-manifest-maker ~/music > music.manifest
+ +You can specify command-line options, as well. -e PATTERN +or --exclude=PATTERN introduces a glob pattern for files +to exclude from the manifest, and -D KEY=VALUE or +--define=KEY=VALUE provides a property to be added to +every file in the manifest (as opposed to an import property, that is +part of the metadata of the overall import). Note that +VALUE must be double-quoted if it's a string, as per +Scheme value syntax. + +One might use this like so: + +
$ ugarit-manifest-maker -e '*.txt' -D rating=5 ~/favourite-music > music.manifest
+ +The manifest maker simplifies the writing of manifests for files, by +listing the files in manifest format along with useful metadata +extracted from the filename and the file itself. For supported file +types (currently, MP3 and OGG music files), it will even look inside +the file to extract metadata. + +The manifest file it generates will contain lots of comments +mentioning things it couldn't automatically analyse (such as unknown +OGG/ID3 tags, or unknown types of files); and for metadata properties +it thinks might be relevant but can't automatically provide, it +suggests them with an empty property declaration, commented out. The +idea is that, after generating a manifest, you read it by hand in a +text editor to attempt to improve it. + +

The format of a manifest file

+
+Manifest files have a relatively simple format. They are based on
+Scheme s-expressions, so they can contain comments: everything from a
+semicolon (not in a string or otherwise quoted) to the end of the
+line is a comment, and #; in front of an expression comments out
+that expression.
+
+Import metadata properties are specified like so:
+
(KEY = VALUE)
+ +...where, as usual, VALUE must be double-quoted if it's a +string. + +Files to import, with their metadata, are specified like so: + +
(object "PATH OF FILE TO IMPORT"
+  (KEY = VALUE)
+  (KEY = VALUE)...
+)
+
+The closing parenthesis need not be on a line of its own; it's
+conventionally placed after the closing parenthesis of the final
+property.
+
+Ugarit, when importing the files in the manifest, will add the
+following properties if they are not already specified:
+
+
import-path
+
The path the file was imported from
+ +
dc:format
+
A guess at the file's MIME type, based on the extension
+ +
mtime
+
The file's modification time (as the number of seconds since the +UNIX epoch)
+ +
ctime
+
The file's change time (as the number of seconds since the UNIX +epoch)
+ +
filename
+
The name of the file, stripped of any directory components, and +including the extension.
+ +
+ +The following properties are placed in the import metadata, +automatically: + +
+
hostname
+
The hostname the import was performed on.
+ +
manifest-path
+
The path to the manifest file used for the import.
+ +
mtime
+
The time (in seconds since the UNIX epoch) at which the import was +committed.
+ +
stats
+
A Scheme alist of statistics about the import (number of +files/blocks uploaded, etc).
+
+ +So, to wrap that all up, here's a sample import manifest file: + + +(notes = "A bunch of old CDs I've finally ripped") + +(object "/home/alaric/newrip/track01.mp3" + (filename = "track01.mp3") + (dc:format = "audio/mpeg") + + (dc:publisher = "Go! Beat Records") + (dc:created = "1994") + (dc:contributor = "Portishead") + (dc:subject = "Trip-Hop") + (superset:size = 1) + (superset:index = 1) + (set:title = "Dummy") + (set:size = 11) + (set:index = 1) + (dc:creator = "Portishead") + (dc:title = "Wandering Star") + + (mtime = 1428962299.0) + (ctime = 1428962299.0) + (file-size = 4703055)) + +;;... and so on, for ten more MP3s on this CD, then several other CDs... + + +

Actually importing a manifest

+ +Well, when you finally have a manifest file, importing it is easy: + +
$ ugarit import <ugarit.conf> <archive tag> <manifest file>
+ +

How do I change the metadata of an already-imported file?

+
+That's easy; the "current" metadata of a file is the metadata from its
+most recent import. Just import the file again, in a new manifest,
+with new metadata, and it will overwrite the old. However, the old
+metadata is still preserved in the archive's history; tags forked from
+the archive tag before the second import will still see the original
+state of the archive, by design.
+
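+
+For instance, a later import correcting the genre of a track might
+look like this (a purely illustrative manifest; the path and values
+are made up). Since the new metadata overwrites the old, restate any
+properties you want to keep:
+
+(notes = "Correcting the genre of track01")
+
+(object "/home/alaric/newrip/track01.mp3"
+ (filename = "track01.mp3")
+ (dc:format = "audio/mpeg")
+ (dc:creator = "Portishead")
+ (dc:title = "Wandering Star")
+ (dc:subject = "Trip-Hop"))
+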

Exploring

+ +Archives are visible in the explore interface. For instance, an import +of some music I did looks like this: + +
> ls
+localhost-etc/ <tag>
+archive-tag/ <tag>
+> cd archive-tag
+/archive-tag> ls
+history/ <archive-history>
+/archive-tag> cd history
+/archive-tag/history> ls
+2015-06-12 22:53:13/ <import>
+/archive-tag/history> cd 2015-06-12 22:53:13
+/archive-tag/history/2015-06-12 22:53:13> ls
+log.sexpr <file>
+properties.sexpr <inline>
+manifest/ <import-manifest>
+/archive-tag/history/2015-06-12 22:53:13> cat properties.sexpr
+((stats (blocks-stored . 2046)
+        (bytes-stored . 1815317503)
+        (blocks-skipped . 9)
+        (bytes-skipped . 8388608)
+        (file-cache-hits . 0)
+        (file-cache-bytes . 0))
+ (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
+ (mtime . 1434135993.0)
+ (contents . "fcdd5b996914fdcac1e8a6cfbc67663e08f6eaf0cc952e21")
+ (hostname . "ahe")
+ (notes . "A bunch of music, imported as a demo")
+ (manifest-path . "/home/alaric/tmp/test.manifest"))
+/archive-tag/history/2015-06-12 22:53:13> cd manifest
+/archive-tag/history/2015-06-12 22:53:13/manifest> ls
+1d4269099189234eefeb80b95370eaf280730cf4d591004d:03 The Lemon Song.mp3 <file>
+7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3 <file>
+64092fa12c2800dda474b41e5ebe8c948f39a59ee91c120b:09 How Many More Times.mp3 <file>
+1d79148d1e1e8947c50b44cf2d5690588787af328e82eeef:2-07 Going to California.mp3 <file>
+e3685148d0d12213074a9fdb94a00e05282aeabe77fa60d5:1-01 You Shook Me.mp3 <file>
+d73904f371af8d7ca2af1076881230f2dc1c2cf82416880a:03 Strangers.mp3 <file>
+9c5a0efb7d397180a1e8d42356d8f04c6c26a83d3b05d34a:09 Uptight.mp3 <file>
+01a069aec2e731e18fcdd4ecb0e424f346a2f0e16910f5e9:07 Numb.mp3 <file>
+7ea1ab7fbd525c40e21d6dd25130e8c70289ad56c09375b0:08 She.mp3 <file>
+009dacd8f3185b7caeb47050002e584ab86d08cf9e9aceec:1-03 Communication Breakdown.mp3 <file>
+26d264d629e22709f664ed891741f690900d45cd4fd44326:1-03 Dazed and Confused.mp3 <file>
+d879761195faf08e4e95a5a2398ea6eefb79920710bfeab6:1-10 Band Introduction _ How Many More Times.mp3 <file>
+83244601db42677d110fc8522c6a3cbbc1f22966a779f876:06 All My Love.mp3 <file>
+5eebee9a2ad79d04e4f69e9e2a92c4e0a8d5f21e670f89da:07 Tangerine.mp3 <file>
+dd6f1203b5973ecd00d2c0cee18087030490230727591746:2-08 That's the Way.mp3 <file>
+c0acea15aa27a6dd1bcaff1c13d4f3d741a40a46abeca3fc:04 The Crunge.mp3 <file>
+ea7727ad07c6c82e5c9c7218ee1b059cd78264c131c1438d:1-02 I Can't Quit You Baby.mp3 <file>
+10fda5f46b8f505ca965bcaf12252eedf5ab44514236f892:14 F.O.D..mp3 <file>
+a99ca9af5a83bde1c676c388dc273051defa88756df26e95:1-03 Good Times Bad Times.mp3 <file>
+b5d7cfe9808c7fc0dedbd656d44e4c56159cbd3c2ed963bb:1-15 Stairway to Heaven.mp3 <file>
+79c87e3c49ffdac175c95aae071f63d3a9efdf2ddb84998c:08.Batmilk.ogg <file>
+-- Press q then enter to stop or enter for more...
+q
+/archive-tag/history/2015-06-12 22:53:13/manifest> ls -ll 7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
+-r--------     -     - [2015-04-13 21:46:39] -/-: 7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
+key: #f
+contents: "7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382"
+import-path: "/home/alaric/archive/sorted-music/Led Zeppelin/Led Zeppelin/04 Dazed and Confused.mp3"
+filename: "04 Dazed and Confused.mp3"
+dc:format: "audio/mpeg"
+dc:publisher: "Atlantic"
+dc:subject: "Classic Rock"
+dc:title: "Dazed and Confused"
+dc:creator: "Led Zeppelin"
+dc:created: "1982"
+dc:contributor: "Led Zeppelin"
+set:title: "Led Zeppelin"
+set:index: 4
+set:size: 9
+superset:index: 1
+superset:size: 1
+ctime: 1428957999.0
+file-size: 15448903
+
+ +

Searching

+
+However, the explore interface to an archive is far from pleasant. You
+need to go to the correct import, find your file by name, and then
+identify it with a big long name composed of its hash and the original
+filename in order to view its properties or extract it.
+
+I hope to add property-based searching to explore mode in future
+(which is why you need to go into a history directory
+within the archive directory; other ways of exploring the archive
+will appear alongside it). This will be particularly useful when the
+explore-mode virtual filesystem is mounted over 9P!
+
+However, even that interface, being constrained to look like a
+filesystem, will be limited. The ugarit command-line tool
+provides a very powerful search interface that exposes the full power
+of the archive metadata.
+

Metadata filters

+ +Files (and directories) in an archive can be searched for using +"metadata filters", which are descriptions of what you're looking for +that the computer can understand. They are represented as Scheme +s-expressions, and can be made up of the following components: + +
+
#t
+
This filter matches everything. It's not very useful.
+ +
#f
+
This filter matches nothing. It's not very useful.
+ +
(and FILTER FILTER...)
+
This filter matches files for which all of the inner filters match.
+ +
(or FILTER FILTER...)
+
This filter matches files for which any of the inner filters match.
+ +
(not FILTER)
+
This filter matches files which do not match the inner filter.
+ +
(= ($ PROP) VALUE)
+
This filter matches files which have the given +PROPerty equal to that VALUE in their metadata.
+ +
(= key HASH)
+
This filter matches the file with the given hash.
+ +
(= ($import PROP) VALUE)
+
This filter matches files which have the given +PROPerty equal to that VALUE in the metadata +of the import that last imported them.
+
+ +
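+
+These components nest freely. For instance, a filter matching files
+credited to Led Zeppelin but not filed under Classic Rock (the
+property names here are simply those the manifest maker produces)
+might be:
+
+(and (or (= ($ dc:creator) "Led Zeppelin")
+         (= ($ dc:contributor) "Led Zeppelin"))
+     (not (= ($ dc:subject) "Classic Rock")))
+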

Searching an archive

+ +For a start, you can search for files matching a given metadata filter +in a given archive. This is done with: + +
$ ugarit search <ugarit.conf> <archive tag> <filter>
+ +For instance, let's look for music by Led Zeppelin: + +
$ ugarit search ugarit.conf music '(or
+   (= ($ dc:creator) "Led Zeppelin")
+   (= ($ dc:contributor) "Led Zeppelin"))'
+ +The result looks like the explore-mode view of an archive manifest, +listing the file's hash followed by its title and extension: + + +7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3 +834a1619a59835e0c27b22801e3c829b40be583dadd19770:2-08 No Quarter.mp3 +9e8bc4954838bd9c671f275eb48595089257185750d63894:1-12 I Can't Quit You Baby.mp3 +6742b3bebcdd9cae5ec5403c585935403fa74d16ed076cf2:02 Friends (1).mp3 +07d161f4bd684e283f7f2cf26e0b732157a8e95ef66939c3:05 Carouselambra.mp3 +[...] + + +What of all our lovely metadata? You can view that if you add the word +"verbose" to the end of the command line, which allows you to specify +alternate output formats: + +
$ ugarit search ugarit.conf music '(or
+   (= ($ dc:creator) "Led Zeppelin")
+   (= ($ dc:contributor) "Led Zeppelin"))' verbose
+ +Now the output looks like: + + +object a444ff6ef807b080b536155f58d246d633cab4a0eabef5bf + (ctime = 1428958660.0) + (dc:contributor = "Led Zeppelin") + (dc:created = "2008") + (dc:creator = "Led Zeppelin") +[... all the usual file properties omitted ...] + import a43f7a7268ee8b18381c20d7573add5dbf8781f81377279c + (stats = ((blocks-stored . 2046) (bytes-stored . 1815317503) (blocks-skipped . 9) (bytes-skipped . 8388608) (file-cache-hits . 0) (file-cache-bytes . 0))) + (log = "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a") +[... all the usual import properties omitted ...] +object b4cadf48b2c07ccf0303fc4064b292cb222980b0d4223641 + (ctime = 1428958673.0) + (dc:contributor = "Led Zeppelin") + (dc:created = "2008") + (dc:creator = "Led Zeppelin") + (dc:creator = "Jimmy Page/John Paul Jones/Robert Plant") +[...and so on...] + + +As you can see, it lists the hash of each file, its metadata, the hash +of the import that last imported it, and the metadata of that import. + +That's quite verbose, so you'd probably be wanting to take that as +input to another program to do something nicer with it. But it's laid +out for human reading, not for machine parsing. Thankfully, we have +other formats for that, alist and +alist-with-imports. + +Try this: + +
$ ugarit search ugarit.conf music '(or
+   (= ($ dc:creator) "Led Zeppelin")
+   (= ($ dc:contributor) "Led Zeppelin"))' alist
+ +This outputs one Scheme s-expression list per match, the first element +of which is the hash as a string, the rest of which is an alist of properties: + + +("7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382" + (ctime . 1428957999.0) + (dc:contributor . "Led Zeppelin") + (dc:created . "1982") + (dc:creator . "Led Zeppelin") +[... elided file properties ...] + (superset:index . 1) + (superset:size . 1)) +("77c960d09eb21ed72e434ddcde0bd3781a4f3d6ee7a6eb66" + (ctime . 1428958981.0) + (dc:contributor . "Led Zeppelin") +[...] + + +
$ ugarit search ugarit.conf music '(or
+   (= ($ dc:creator) "Led Zeppelin")
+   (= ($ dc:contributor) "Led Zeppelin"))' alist-with-imports
+
+This outputs one s-expression list per match, with four
+elements. The first is the key string, the second is an alist of file
+properties, the third is the import's hash, and the last is an alist
+containing the import's properties. It looks like:
+
+
+("64fa08a0080aee6ef501c408fd44dfcc634cfcafd8006fc4"
+ ((ctime . 1428958683.0)
+ (dc:contributor . "Led Zeppelin")
+ (dc:created . "2008")
+ (dc:creator . "Led Zeppelin")
+[... elided file properties ...]
+ (superset:index . 1)
+ (superset:size . 1))
+ "a43f7a7268ee8b18381c20d7573add5dbf8781f81377279c"
+ ((stats (blocks-stored . 2046)
+ (bytes-stored . 1815317503)
+[... elided manifest properties ...]
+ (manifest-path . "test.manifest")))
+("4cd56f916a63399b252976e842dcae0b87f058b5a60c93a4"
+ ((ctime . 1428958437.0)
+ (dc:contributor . "Led Zeppelin")
+[...]
+
+
+And finally, you might just want to get the hashes of matching files
+(which are particularly useful for extraction operations, which we'll
+come to next). To do this, specify a format of "keys", which outputs
+one line per match, containing just the hash:
+
$ ugarit search ugarit.conf music '(or
+   (= ($ dc:creator) "Led Zeppelin")
+   (= ($ dc:contributor) "Led Zeppelin"))' keys
+ + +ce6f6484337de772de9313038cb25d1b16e28028136cc291 +6af5c664cbfa1acb22a377e97aee35d94c0fc003d239dd0c +92e91e79b384478b5aab31bf1b2ff9e25e7e2c4b48575185 +6ddb9a41d4968468a904f05ecf7e0e73d2c7c7ad76bc394b +a074dddcef67cd93d92c6ffce845894aa56594674023f6e1 +4f65f735bbb00a6fda4bc887b370b3160f55e5e07ec37ffa +97cc8b8ba70c39387fc08ef62311b751aea4340d636eb421 +72358dbe3eb60da42eadcf6de325b2a6686f4e17ea41fa60 +[...] + + +However, to write filter expressions, you need to know what properties +you have available to search on. You might remember, or go for +standard properties, or look at existing files in verbose mode to find +some; but you can also just ask Ugarit what properties it has in an +archive, like so: + +
$ ugarit search-props <ugarit.conf> <archive tag>
+ +You can even ask what properties are available for files matching an +existing filter: + +
$ ugarit search-props <ugarit.conf> <archive tag> <filter>
+ +This is useful if you're interested in further narrowing down a +filter, and so only care about properties that files already matching +that filter have. + +For a bunch of music files imported with the Ugarit Manifest Maker, +you can expect to see something like this: + + +ctime +dc:contributor +dc:created +dc:creator +dc:format +dc:publisher +dc:subject +dc:title +file-size +filename +import-path +mtime +set:index +set:size +set:title +superset:index +superset:size + + +Now you know what properties to search, next you'll be wanting to know +what values to look for. Again, Ugarit has a command to query the +available values of any given property: + +
$ ugarit search-values <ugarit.conf> <archive tag> <property>
+ +And you can limit that just to files matching a given filter: + +
$ ugarit search-values <ugarit.conf> <archive tag> <filter> <property>
+ +The resulting list of values is ordered by popularity, so the most +widely-used values will be listed first. Let's see what genres of +music were in my sample of music files I imported: + +
$ ugarit search-values test.conf archive-tag dc:subject
+ +The result is: + + +Classic Rock +Alternative & Punk +Electronic +Trip-Hop + + +Ok, let's now use a filter to find out what artists +(dc:creator) I have that made Trip-Hop music (what even +IS that?): + +
$ ugarit search-values test.conf archive-tag \
+    '(= ($ dc:subject) "Trip-Hop")' \
+    dc:creator
+ +The result is: + +Portishead + +Ah, OK, now I know what "Trip-Hop" is. + +

Extracting

+ +All this searching is lovely, but what it gets us, in the end, is a +bunch of file hashes. Perhaps we might want to actually play some +music, or look at a photo, or something. To do that, we need to +extract from the archive. + +We've already seen the contents of an archive in the explore mode +virtual filesystem, so we could go into the archive history, find the +import, go into the manifest, pick the file out there, and use +get to extract it, but that would be yucky. Thankfully, +we have a command-line interface to get things from archives, in one +of two ways. + +Firstly, we can extract a file (or a directory tree) from an archive, +out into the local filesystem: + +
$ ugarit archive-extract <ugarit.conf> <archive tag> <hash> <target>
+ +The "target" is the name to give it in the local filesystem. We could +pull out that Led Zeppelin song from our search results above, like so: + +
$ ugarit archive-extract test.conf archive-tag \
+    ce6f6484337de772de9313038cb25d1b16e28028136cc291 foo.mp3
+ +We now have a foo.mp3 file in the current directory. + +However, sometimes it would be nicer to have it streamed to standard +output, which can be done like so: + +
$ ugarit archive-stream <ugarit.conf> <archive tag> <hash>
+ +This lets us write a command such as: + +
$ ugarit archive-stream test.conf archive-tag \
+    ce6f6484337de772de9313038cb25d1b16e28028136cc291 | mpg123 -
+ +...to play it in real time. + ADDED docs/dot-ugarit.wiki Index: docs/dot-ugarit.wiki ================================================================== --- docs/dot-ugarit.wiki +++ docs/dot-ugarit.wiki @@ -0,0 +1,70 @@ +

.ugarit files

+
+By default, Ugarit will vault everything it finds in the filesystem
+tree you tell it to snapshot. However, this might not always be
+desired; so we provide the facility to override this with .ugarit
+files, or global rules in your .conf file.
+
+Note: All of this only applies to snapshots. Archive mode imports are
+not affected by .ugarit files, or global rules.
+
+Note: The syntax of these files is provisional; the current syntax is
+ugly, and I want to experiment with usability. So please don't be
+surprised if the format changes in incompatible ways in subsequent
+versions!
+
+In quick summary, if you want to ignore all files or directories
+matching a glob in the current directory and below, put the following
+in a .ugarit file in that directory:
+
(* (glob "*~") exclude)
+
+You can write quite complex expressions as well as just globs. The
+full set of rules is:
+
+ * (glob "pattern") matches files and directories whose names
+   match the glob pattern
+
+ * (name "name") matches files and directories with exactly that
+   name (useful for files whose names contain glob metacharacters,
+   such as a file actually called *)
+
+ * (modified-within number seconds) matches files and
+   directories modified within the given number of seconds
+
+ * (modified-within number minutes) matches files and
+   directories modified within the given number of minutes
+
+ * (modified-within number hours) matches files and directories
+   modified within the given number of hours
+
+ * (modified-within number days) matches files and directories
+   modified within the given number of days
+
+ * (not rule) matches files and directories that do not match
+   the given rule
+
+ * (and rule rule...) matches files and directories that match
+   all the given rules
+
+ * (or rule rule...) matches files and directories that match
+   any of the given rules
+
+Also, you can override a previous exclusion with an explicit include
+in a lower-level directory:
+
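+
+These rules can be combined, too. For instance (an illustrative
+sketch, not a recommendation), to skip log files that are still being
+actively written to, you might exclude those modified within the last
+hour:
+
+(* (and (glob "*.log") (modified-within 1 hours)) exclude)
+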
(* (glob "*~") include)
+ +You can bind rules to specific directories, rather than to "this +directory and all beneath it", by specifying an absolute or relative +path instead of the `*`: + +
("/etc" (name "passwd") exclude)
+ +If you use a relative path, it's taken relative to the directory of +the .ugarit file. + +You can also put some rules in your .conf file, although relative +paths are illegal there, by adding lines of this form to the file: + +
(rule * (glob "*~") exclude)
+ ADDED docs/faq.wiki Index: docs/faq.wiki ================================================================== --- docs/faq.wiki +++ docs/faq.wiki @@ -0,0 +1,40 @@ +

Questions and Answers

+ +

What happens if a snapshot is interrupted?

+
+Nothing! Whatever blocks have been uploaded will remain in the vault,
+but the snapshot is only added to the tag once the entire filesystem
+has been snapshotted. So just start the snapshot again. Any files that
+have already been uploaded will not need to be uploaded again, so the
+second snapshot should proceed quickly to the point where it failed
+before, and continue from there.
+
+Unless the vault ends up with a partially-uploaded corrupted block
+due to being interrupted during upload, you'll be fine. The filesystem
+backend has been written to avoid this by writing the block to a file
+with the wrong name, then renaming it to the correct name when it's
+entirely uploaded.
+
+Actually, there is *one* caveat: blocks that were uploaded, but never
+make it into a finished snapshot, will be marked as "referenced", but
+there's no snapshot to delete to un-reference them, so they'll never
+be removed when you delete snapshots. (Not that snapshot deletion is
+implemented yet, mind). If this becomes a problem for people, we could
+write a "garbage collect" tool that regenerates the reference counts
+in a vault, leading to unused blocks (with a zero refcount) being
+unlinked.
+

Should I share a single large vault between all my filesystems?

+ +I think so. Using a single large vault means that blocks shared +between servers - eg, software installed from packages and that sort +of thing - will only ever need to be uploaded once, saving storage +space and upload bandwidth. However, do not share a vault between +servers that do not mutually trust each other, as they can all update +the same tags, so can meddle with each other's snapshots - and read +each other's snapshots. + +

CAVEAT

+
+It's not currently safe to have multiple concurrent snapshots to the
+same splitlog backend; this will soon be fixed, however.

Installation

+ +Install [http://www.call-with-current-continuation.org/|Chicken Scheme] using their [http://wiki.call-cc.org/man/4/Getting%20started|installation instructions]. + +Ugarit can then be installed by typing (as root): + + chicken-install ugarit + +See the [http://wiki.call-cc.org/manual/Extensions#chicken-install-reference|chicken-install manual] for details if you have any trouble, or wish to install into your home directory. + +

Setting up a vault

+ +Firstly, you need to know the vault identifier for the place you'll +be storing your vaults. This depends on your backend. The vault +identifier is actually the command line used to invoke the backend for +a particular vault; communication with the vault is via standard +input and output, which is how it's easy to tunnel via ssh. + +

Local filesystem backends

+ +These backends use the local filesystem to store the vaults. Of +course, the "local filesystem" on a given server might be an NFS mount +or mounted from a storage-area network. + +

Logfile backend

+ +The logfile backend works much like the original Venti system. It's +append-only - you won't be able to delete old snapshots from a logfile +vault, even when I implement deletion. It stores the vault in two +sets of files; one is a log of data blocks, split at a specified +maximum size, and the other is the metadata: an sqlite database used +to track the location of blocks in the log files, the contents of +tags, and a count of the logs so a filename can be chosen for a new one. + +To set up a new logfile vault, just choose where to put the two +parts. It would be nice to put the metadata file on a different +physical disk to the logs directory, to reduce seeking. If you only +have one disk, you can put the metadata file in the log directory +("metadata" is a good name). + +You can then refer to it using the following vault identifier: + + "backend-fs splitlog ...log directory... ...metadata file..." + +
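+
+For instance, assuming the hypothetical paths /backup/log for the log
+directory and /backup/log/metadata for the metadata file, the vault
+identifier would be:
+
+ "backend-fs splitlog /backup/log /backup/log/metadata"
+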

SQLite backend

+ +The sqlite backend works a bit like a +[http://www.fossil-scm.org/|Fossil] repository; the storage is +implemented as a single file, which is actually an SQLite database +containing blocks as blobs, along with tags and configuration data in +their own tables. + +It supports unlinking objects, and the use of a single file to store +everything is convenient; but storing everything in a single file with +random access is slightly riskier than the simple structure of an +append-only log file; it is less tolerant of corruption, which can +easily render the entire storage unusable. Also, that one file can get +very large. + +SQLite has internal limits on the size of a database, but they're +quite large - you'll probably hit a size limit at about 140 +terabytes. + +To set up an SQLite storage, just choose a place to put the file. I +usually use an extension of .vault; note that SQLite will +create additional temporary files alongside it with additional +extensions, too. + +Then refer to it with the following vault identifier: + + "backend-sqlite ...path to vault file..." + +

Filesystem backend

+ +The filesystem backend creates vaults by storing each block or tag +in its own file, in a directory. To keep the objects-per-directory +count down, it'll split the files into subdirectories. Because of +this, it uses a stupendous number of inodes (more than the filesystem +being backed up). Only use it if you don't mind that; splitlog is much +more efficient. + +To set up a new filesystem-backend vault, just create an empty +directory that Ugarit will have write access to when it runs. It will +probably run as root in order to be able to access the contents of +files that aren't world-readable (although that's up to you), so +unless you access your storage via ssh or sudo to use another user to +run the backend under, be careful of NFS mounts that have +maproot=nobody set! + +You can then refer to it using the following vault identifier: + + "backend-fs fs ...path to directory..." + +

Proxying backends

+ +These backends wrap another vault identifier which the actual +storage task is delegated to, but add some value along the way. + +

SSH tunnelling

+
+It's easy to access a vault stored on a remote server. The caveat
+is that the backend then needs to be installed on the remote server!
+Since vaults are accessed by running the supplied command, and then
+talking to them via stdin and stdout, the vault identifier need
+only be:
+
+ "ssh ...hostname... '...remote vault identifier...'"
+
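+
+For instance, to reach a splitlog vault on a remote host (the
+hostname and paths here are hypothetical), the full identifier might
+be:
+
+ "ssh backup.example.com 'backend-fs splitlog /backup/log /backup/log/metadata'"
+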

Cache backend

+
+The cache backend is used to cache a list of what blocks exist in the
+proxied backend, so that it can answer queries as to the existence of
+a block rapidly, even when the proxied backend is on the end of a
+high-latency link (eg, the Internet). This should speed up snapshots,
+as existing files are identified by asking the backend if the vault
+already has them.
+
+The cache backend works by storing the cache in a local sqlite
+file. Given a place for it to store that file, usage is simple:
+
+ "backend-cache ...path to cachefile... '...proxied vault identifier...'"
+
+The cache file will be automatically created if it doesn't already
+exist, so make sure there's write access to the containing directory.
+
+ - WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -
+
+If you use a cache on a vault shared between servers, make sure
+that you either:
+
+ * Never delete things from the vault
+
+or
+
+ * Make sure all access to the vault is via the same cache
+
+If a block is deleted from a vault, and a cache on that vault is
+not aware of the deletion (as it did not go "through" the caching
+proxy), then the cache will record that the block exists in the
+vault when it does not. This will mean that if a snapshot is made
+through the cache that would use that block, then it will be assumed
+that the block already exists in the vault when it does
+not. Therefore, the block will not be uploaded, and a dangling
+reference will result!
+
+Some setups which *are* safe:
+
+ * A single server using a vault via a cache, not sharing it with
+   anyone else.
+
+ * A pool of servers using a vault via the same cache.
+
+ * A pool of servers using a vault via one or more caches, and
+   maybe some not via the cache, where nothing is ever deleted from
+   the vault.
+
+ * A pool of servers using a vault via one cache, and maybe some
+   not via the cache, where deletions are only performed on servers
+   using the cache, so the cache is always aware.
+

Writing a ugarit.conf

+
+ugarit.conf should look something like this:
+
+(storage <vault identifier>)
+(hash tiger "<salt>")
+[double-check]
+[(compression [deflate|lzma])]
+[(encryption aes <key>)]
+[(cache "<path>")|(file-cache "<path>")]
+[(rule ...)]
+
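+
+For instance, a complete ugarit.conf for a compressed, encrypted
+splitlog vault might look like this (the paths and salt here are
+examples only; generate your own salt from a secure entropy source):
+
+(storage "backend-fs splitlog /backup/log /backup/log/metadata")
+(hash tiger "aFKLYLYEaDVWbbjkCK5N3kAjyp3zJre")
+(compression lzma)
+(encryption aes (32 prompt))
+(file-cache "/var/ugarit/cache")
+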

Hashing

+ +The hash line chooses a hash algorithm. Currently Tiger-192 +(tiger), SHA-256 (sha256), SHA-384 +(sha384) and SHA-512 (sha512) are supported; +if you omit the line then Tiger will still be used, but it will be a +simple hash of the block with the block type appended, which reveals +to attackers what blocks you have (as the hash is of the unencrypted +block, and the hash is not encrypted). This is useful for development +and testing or for use with trusted vaults, but not advised for use +with vaults that attackers may snoop at. Providing a salt string +produces a hash function that hashes the block, the type of block, and +the salt string, producing hashes that attackers who can snoop the +vault cannot use to find known blocks (see the "Security model" +section below for more details). + +I would recommend that you create a salt string from a secure entropy +source, such as: + +
dd if=/dev/random bs=1 count=64 | base64 -w 0
+ +Whichever hash function you use, you will need to install the required +Chicken egg with one of the following commands: + +
chicken-install -s tiger-hash  # for tiger
+chicken-install -s sha2        # for the SHA hashes
+ +

Compression

+ +lzma is the recommended compression option for +low-bandwidth backends or when space is tight, but it's very slow to +compress; deflate or no compression at all are better for fast local +vaults. To have no compression at all, just remove the +(compression ...) line entirely. Likewise, to use +compression, you need to install a Chicken egg: + +
chicken-install -s z3       # for deflate
+chicken-install -s lzma     # for lzma
+ +WARNING: The lzma egg is currently rather difficult to install, and +needs rewriting to fix this problem. + +

Encryption

+ +Likewise, the (encryption ...) line may be omitted to have no +encryption; the only currently supported algorithm is aes (in CBC +mode) with a key given in hex, as a passphrase (hashed to get a key), +or a passphrase read from the terminal on every run. The key may be +16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a +hex key, just supply it as a string, like so: + +
(encryption aes "00112233445566778899AABBCCDDEEFF")
+ +...for 128-bit AES, + +
(encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")
+ +...for 192-bit AES, or + +
(encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")
+ +...for 256-bit AES. + +Alternatively, you can provide a passphrase, and specify how large a +key you want it turned into, like so: + +
(encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))
+ +I would recommend that you generate a long passphrase from a secure +entropy source, such as: + +
dd if=/dev/random bs=1 count=64 | base64 -w 0
+ +Finally, the extra-paranoid can request that Ugarit prompt for a +passphrase on every run and hash it into a key of the specified +length, like so: + +
(encryption aes ([16|24|32] prompt))
+
+(note the lack of quotes around prompt, distinguishing it from a passphrase)
+
+Please read the [./security.wiki|Security model] documentation for
+details on the implications of different encryption setups.
+
+Again, as it is an optional feature, to use encryption, you must
+install the appropriate Chicken egg:
+
+
chicken-install -s aes
+ +

Caching

+
+Ugarit can use a local cache to speed up various operations. If a path
+to a file is provided through the cache or
+file-cache directives, then a file will be created at
+that location and used as a cache. If not, then a default path of
+~/.ugarit-cache will be used instead.
+
+WARNING: If you use multiple different vaults from the same UNIX
+account, and the same tag names are used in those different vaults,
+and you use the default cache path (or explicitly specify cache paths
+that point to the same file), you will get a somewhat confused
+cache. The effects of this will be annoying (searches finding things
+that then can't be fetched) rather than damaging, but it's still best
+avoided!
+
+The cache is used to cache snapshot records and archive import
+records. This is used by operations that extract snapshot history and
+archive objects; snapshots are stored in a linked list of snapshot
+objects, each referring to the previous snapshot. Therefore, reading
+the history of a snapshot tag requires reading many objects from the
+storage, which can be time-consuming for a remote storage! Similarly,
+archives are represented as a linked list of imports, and searching
+for an object in the archive can involve traversing the chain of
+imports until a match is found (and then searching on until the end to
+see if any further matches can be found!). The cache is even more
+important for archive imports, as it not only keeps a local copy of
+all the import information, it also records the "current" metadata of
+every object in the archive (so that we don't need to search through
+superseded previous versions of the metadata of an object when looking
+for something), and uses B-tree indexes to enable fast searching of
+the cached metadata.
+
+If you configure the cache path with file-cache rather
+than just cache, then as well as the snapshot/archive
+metadata caching, you will also enable file hash caching. 
+ +This significantly speeds up subsequent snapshots of a filesystem +tree. The file cache maps filenames to (mtime,size,hash) tuples; as it +scans the filesystem, if it finds a file in the cache and the mtime +and size have not changed, it will assume it is already stored under +the specified hash. This saves it from having to read the entire file +to hash it and then check if the hash is present in the vault. In +other words, if only a few files have changed since the last snapshot, +then snapshotting a directory tree becomes an O(N) operation, where N +is the number of files, rather than an O(M) operation, where M is the +total size of files involved. + +WARNING: If you use a file cache, and a file is cached in it but then +subsequently deleted from the vault, Ugarit will fail to re-upload it +at the next snapshot. If you are using a file cache and you go +deleting things from your vault (should that be implemented in +future), you'll want to flush the cache afterwards. We might implement +automatic removal of deleted files from the local cache, but file +caches on other Ugarit installations that use the same vault will not +be aware of the deletion. + +

Other options

+ +double-check, if present, causes Ugarit to perform extra +internal consistency checks during backups, which will detect bugs but +may slow things down. + +

Example

+ +For example: + +
(storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata'")
+(hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ")
+(encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw"))
+(compression lzma)
+(file-cache "/var/ugarit/cache")
+ +Be careful to put a set of parentheses around each configuration +entry. White space isn't significant, so feel free to indent things +and wrap them over lines if you want. + +Keep copies of this file safe - you'll need it to do extractions! +Print a copy out and lock it in your fire safe! Ok, currently, you +might be able to recreate it if you remember where you put the +storage, but encryption keys and hash salts are harder to remember... ADDED docs/intro.wiki Index: docs/intro.wiki ================================================================== --- docs/intro.wiki +++ docs/intro.wiki @@ -0,0 +1,375 @@ +

About Ugarit

+ +

What's content-addressable storage?

+
+Traditional backup systems work by storing copies of your files
+somewhere. Perhaps they go onto tapes, or perhaps they're in archive
+files written to disk. They will either be full dumps, containing a
+complete copy of your files, or incrementals or differentials, which
+only contain files that have been modified since some point. This
+saves making repeated copies of unchanging files, but it means that to
+do a full restore, you need to start by extracting the last full dump
+then applying one or more incrementals, or the latest differential,
+to get the latest state.
+
+Not only do differentials and incrementals let you save space, they
+also give you a history - you can restore up to a previous point in
+time, which is invaluable if the file you want to restore was deleted
+a few backup cycles ago!
+
+This technology was developed when the best storage technology for
+backups was magnetic tape, because each dump is written sequentially
+(and restores are largely sequential, unless you're skipping bits to
+pull out specific files).
+
+However, these days, random-access media such as magnetic disks and
+SSDs are cheap enough to compete with magnetic tape for long-term bulk
+storage (especially when one considers the cost of a tape drive or
+two). And having fast random access means we can take advantage of
+different storage techniques.
+
+A content-addressable store is a key-value store, except that the keys
+are always computed from the values. When a given object is stored, it
+is hashed, and the hash used as the key. This means you can never
+store the same object twice; the second time you'll get the same hash,
+see the object is already present, and re-use the existing
+copy. Therefore, you get deduplication of your data for free.
+
+But, I hear you ask, how do you find things again, if you can't choose
+the keys?
+
+When an object is stored, you need to record the key so you can find
+it again later. 
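The idea can be illustrated with a few lines of Python (a sketch of the concept only; Ugarit itself is written in Chicken Scheme):

```python
import hashlib

# A minimal content-addressable store, as described above: keys are
# always computed from values, so identical data is only ever stored once.

class ContentStore:
    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:   # the second store of the same data is free
            self.blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]
```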
In Ugarit, everything is stored in a tree-like
+directory structure. Files are uploaded and their hashes obtained, and
+then a directory object is constructed containing a list of the files
+in the directory, and listing the keys of the Ugarit objects storing
+the contents of each file. This directory object itself has a hash,
+which is stored inside the directory entry in the parent directory,
+and so on up to the root. The root of a tree stored in a Ugarit vault
+has no parent directory to contain it, so at that point, we store the
+key of the root in a named "tag" that we can look up by name when we
+want it.
+
+Therefore, everything in a Ugarit vault can be found by starting with
+a named tag and retrieving the object whose key it contains, then
+finding keys inside that object and looking up the objects they refer
+to, until we find the object we want.
+
+When you use Ugarit to back up your filesystem, it uploads a complete
+snapshot of every file in the filesystem, like a full dump. But
+because the vault is content-addressed, it automatically avoids
+uploading anything it already has a copy of, so all we upload is an
+incremental dump - but in the vault, it looks like a full dump, and so
+can be restored on its own without having to restore a chain of incrementals.
+
+Also, the same storage can be shared between multiple systems that all
+back up to it - and the incremental upload algorithm will mean that
+any files shared between the servers will only need to be uploaded
+once. If you back up a complete server, then go and back up another
+that is running the same distribution, then all the files in /bin
+and so on that are already in the storage will not need to be backed
+up again; the system will automatically spot that they're already
+there, and not upload them again.
+
+As well as storing backups of filesystems, Ugarit can also be used as
+the primary storage for read-only files, such as music and photos. 
The +principle is exactly the same; the only difference is in how the files +are organised - rather than as a directory structure, the files are +referenced from metadata objects that specify information about the +file (so it can be found) and a reference to the contents. Sets of +metadata objects are pointed to by tags as well, so they can also be +found. + +
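The tag-to-object lookup described above can be sketched like so (hedged Python with an invented object format; Ugarit's actual representation differs):

```python
import hashlib
import json

# Hedged sketch of the lookup scheme: everything is reached from a named
# tag holding the key of a root object, whose contents name the keys of
# further objects, down to the file contents.

blocks = {}   # the content-addressed store: hash -> serialised object
tags = {}     # the named entry points: tag name -> hash of a root object

def store(obj) -> str:
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    blocks[key] = data        # storing the same object twice is harmless
    return key

def lookup(tag, *path):
    key = tags[tag]                          # start from the named tag
    for name in path:                        # keys found inside each object
        key = json.loads(blocks[key])[name]  # lead to further objects
    return json.loads(blocks[key])

# A tiny vault: a file, a directory listing it, and a tag naming the root.
tags["backups/home"] = store({"docs": store({"notes.txt": store("hello")})})
```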

So what's that mean in practice?

+ +

Backups

+You can run Ugarit to back up any number of filesystems to a shared
+storage area (known as a vault), and on every backup, Ugarit
+will only upload files or parts of files that aren't already in the
+vault - be they from the previous snapshot, earlier snapshots,
+snapshots of entirely unrelated filesystems, etc. Every time you do a
+snapshot, Ugarit builds an entire complete directory tree of the
+snapshot in the vault - but reusing any parts of files, files, or
+entire directories that already exist anywhere in the vault, and
+only uploading what doesn't already exist.
+
+The support for parts of files means that, in many cases, gigantic
+files like database tables and virtual disks for virtual machines will
+not need to be uploaded entirely every time they change, as the
+changed sections will be identified and uploaded.
+
+Because a complete directory tree exists in the vault for any
+snapshot, the extraction algorithm is incredibly simple - and,
+therefore, incredibly reliable and fast. Simple, reliable, and fast
+are just what you need when you're trying to reconstruct the
+filesystem of a live server.
+
+Also, it means that you can do lots of small snapshots. If you run a
+snapshot every hour, then only a megabyte or two might have changed in
+your filesystem, so you only upload a megabyte or two - yet you end up
+with a complete history of your filesystem at hourly intervals in the
+vault.
+
+Conventional backup systems usually either store a full backup then
+incrementals to their archives, meaning that doing a restore involves
+reading the full backup then reading every incremental since and
+applying them - so to do a restore, you have to download *every
+version* of the filesystem you've ever uploaded, or you have to do
+periodic full backups (even though most of your filesystem won't have
+changed since the last full backup) to reduce the number of
+incrementals required for a restore. 
Better results are had from +systems that use a special backup server to look after the archive +storage, which accept incremental backups and apply them to the +snapshot they keep in order to maintain a most-recent snapshot that +can be downloaded in a single run; but they then restrict you to using +dedicated servers as your archive stores, ruling out cheaply scalable +solutions like Amazon S3, or just backing up to a removable USB or +eSATA disk you attach to your system whenever you do a backup. And +dedicated backup servers are complex pieces of software; can you rely +on something complex for the fundamental foundation of your data +security system? + +

Archives

+ +You can also use Ugarit as the primary storage for read-only +files. You do this by creating an archive in the vault, and importing +batches of files into it along with their metadata (arbitrary +attributes, such as "author", "creation date" or "subject"). + +Just as you can keep snapshots of multiple systems in a Ugarit vault, +you can also keep multiple separate archives, each identified by a +named tag. + +However, as it's all within the same vault, the usual de-duplication +rules apply. The same file may be in multiple archives, with different +metadata in each, as the file contents and metadata are stored +separately (and associated only within the context of each +archive). And, of course, the same file may appear in snapshots and in +archives; perhaps a file was originally downloaded into your home +directory, where it was backed up into Ugarit snapshots, and then you +imported it into your archive. The archive import would not have had +to re-upload the file, as its contents would have already been found +in the vault, so all that needs to be uploaded is the metadata. + +Although we have mainly spoken of storing files in archives, the +objects in archives can be files or directories full of files, as +well. This is useful for storing MacOS-style files that are actually +directories, or for archiving things like completed projects for +clients, which can be entire directory structures. + +

System Requirements

+
+Ugarit should run on any POSIX-compliant system that can run
+[http://www.call-with-current-continuation.org/|Chicken Scheme]. It
+stores and restores all the file attributes reported by the stat
+system call - POSIX mode permissions, UID, GID, mtime, and optionally
+atime and ctime (although the ctime cannot be restored due to POSIX
+restrictions). Ugarit will store files, directories, device and
+character special files, symlinks, and FIFOs.
+
+Support for extended filesystem attributes - ACLs, alternative
+streams, forks and other metadata - is possible, due to the extensible
+directory entry format; support for such metadata will be added as
+required.
+
+Currently, only local filesystem-based vault storage backends are
+complete: these are suitable for backing up to a removable hard disk
+or a filesystem shared via NFS or other protocols. However, the
+backend can be accessed via an SSH tunnel, so a remote server you are
+able to install Ugarit on to run the backends can be used as a remote
+vault.
+
+The next backends to be implemented will be one for Amazon S3,
+and an SFTP backend for storing vaults anywhere you can ssh
+to. Other backends will be implemented on demand; a vault can, in
+principle, be stored on anything that can store files by name, report
+on whether a file already exists, and efficiently download a file by
+name. This rules out magnetic tapes due to their requirement for
+sequential access.
+
+Although we need to trust that a backend won't lose data (for now), we
+don't need to trust the backend not to snoop on us, as Ugarit
+optionally encrypts everything sent to the vault.
+
+

Terminology

+
+A Ugarit backend is the software module that handles backend
+storage. An actual storage area - managed by a backend - is called a
+storage, and is used to implement a vault; currently, every storage is
+a valid vault, but the planned future introduction of a distributed
+storage backend will enable multiple storages (which are not,
+themselves, valid vaults as they only contain some subset of the
+information required) to be combined into an aggregate storage, which
+then holds the actual vault. Note that the contents of a storage are
+purely a set of blocks, and a series of named tags containing
+references to them; the storage does not know the details of
+encryption and hashing, so cannot make any sense of its contents.
+
+For example, if you use the recommended "splitlog" filesystem backend,
+your vault might be /mnt/bigdisk on the server
+prometheus. The backend (which is compiled along with the
+other filesystem backends in the backend-fs binary) must
+be installed on prometheus, and Ugarit clients all over
+the place may then use it via ssh to prometheus. However,
+even with the filesystem backends, the actual storage might not be on
+prometheus where the backend runs -
+/mnt/bigdisk might be an NFS mount, or a mount from a
+storage-area network. This ability to delegate via SSH is particularly
+useful with the "cache" backend, which reduces latency by storing a
+cache of what blocks exist in a backend, thereby making it quicker to
+identify already-stored files; a cluster of servers all sharing the
+same vault might all use SSH tunnels to access an instance of the
+"cache" backend on one of them (using some local disk to store the
+cache), which proxies the actual vault storage to a vault on the other
+end of a high-latency Internet link, again via an SSH tunnel.
+
+A vault is where Ugarit stores backups (as chains of snapshots) and
+archives (as chains of archive imports). 
Backups and archives are +identified by tags, which are the top-level named entry points into a +vault. A vault is based on top of a storage, along with a choice of +hash function, compression algorithm, and encryption that are used to +map the logical world of snapshots and archive imports into the +physical world of blocks stored in the storage. + +A snapshot is a copy of a filesystem tree in the vault, with a header +block that gives some metadata about it. A backup consists of a number +of snapshots of a given filesystem. + +An archive import is a set of filesystem trees, each along with +metadata about it. Whereas a backup is organised around a series of +timed snapshots, an archive is organised around the metadata; the +filesystem trees in the archive are identified by their properties. + +

So what, exactly, is in a vault?

+ +A Ugarit vault contains a load of blocks, each up to a maximum size +(usually 1MiB, although other backends might impose smaller +limits). Each block is identified by the hash of its contents; this is +how Ugarit avoids ever uploading the same data twice, by checking to +see if the data to be uploaded already exists in the vault by +looking up the hash. The contents of the blocks are compressed and +then encrypted before upload. + +Every file uploaded is, unless it's small enough to fit in a single +block, chopped into blocks, and each block uploaded. This way, the +entire contents of your filesystem can be uploaded - or, at least, +only the parts of it that aren't already there! The blocks are then +tied together to create a snapshot by uploading blocks full of the +hashes of the data blocks, and directory blocks are uploaded listing +the names and attributes of files in directories, along with the +hashes of the blocks that contain the files' contents. Even the blocks +that contain lists of hashes of other blocks are subject to checking +for pre-existence in the vault; if only a few MiB of your +hundred-GiB filesystem has changed, then even the index blocks and +directory blocks are re-used from previous snapshots. + +Once uploaded, a block in the vault is never again changed. After all, +if its contents changed, its hash would change, so it would no longer +be the same block! However, every block has a reference count, +tracking the number of index blocks that refer to it. This means that +the vault knows which blocks are shared between multiple snapshots (or +shared *within* a snapshot - if a filesystem has more than one copy of +the same file, still only one copy is uploaded), so that if a given +snapshot is deleted, then the blocks that only that snapshot is using +can be deleted to free up space, without corrupting other snapshots by +deleting blocks they share. 
Keep in mind, however, that not all
+storage backends may support this - there are certain advantages to
+being an append-only vault. For a start, you can't delete something by
+accident! The supplied fs and sqlite backends support deletion, while
+the splitlog backend does not yet. However, the actual snapshot
+deletion command in the user interface hasn't been implemented yet
+either, so it's a moot point for now...
+
+Finally, the vault contains objects called tags. Unlike the blocks,
+the tags' contents can change, and they have meaningful names rather
+than being identified by hash. Tags identify the top-level blocks of
+snapshots within the system, from which (by following the chain of
+hashes down through the index blocks) the entire contents of a
+snapshot may be found. Unless you happen to have recorded the hash of
+a snapshot somewhere, the tags are where you find snapshots from when
+you want to do a restore.
+
+Whenever a snapshot is taken, as soon as Ugarit has uploaded all the
+files, directories, and index blocks required, it looks up the tag you
+have identified as the target of the snapshot. If the tag already
+exists, then the snapshot it currently points to is recorded in the
+new snapshot as the "previous snapshot"; the snapshot header,
+containing the previous snapshot's hash along with the date and time
+and any comments you provide for the snapshot, is then uploaded (as
+another block, identified by its hash). The tag is then updated to
+point to the new snapshot.
+
+This way, each tag actually identifies a chronological chain of
+snapshots. Normally, you would use a tag to identify a filesystem
+being backed up; you'd keep snapshotting the filesystem to the same
+tag, resulting in all the snapshots of that filesystem hanging from
+the tag. 
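The tag-and-chain mechanism can be sketched as follows (a hedged Python illustration with invented formats, not Ugarit's actual representation):

```python
import hashlib
import json

# Hedged sketch: each snapshot header records the previous snapshot's
# key, so a mutable tag identifies an immutable chain of snapshots.

blocks = {}  # immutable: hash -> block (never changed once stored)
tags = {}    # mutable: tag name -> hash of the newest snapshot header

def store(obj) -> str:
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    blocks[key] = data
    return key

def snapshot(tag, root_key, comment):
    header = {"contents": root_key,
              "comment": comment,
              "previous": tags.get(tag)}  # None for the first snapshot
    tags[tag] = store(header)             # only the tag itself changes

def history(tag):
    key, comments = tags.get(tag), []
    while key is not None:                # walk the chain, newest first
        header = json.loads(blocks[key])
        comments.append(header["comment"])
        key = header["previous"]
    return comments
```

In this model, duplicating a tag's entry in `tags` under a new name is all it takes to fork the chain.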
But if you wanted to remember any particular snapshot +(perhaps if it's the snapshot you take before a big upgrade or other +risky operation), you can duplicate the tag, in effect 'forking' the +chain of snapshots much like a branch in a version control system. + +Archive imports cause the creation of one or more archive metadata +blocks, each of which lists the hashes of files or filesystem trees in +the archive, along with their metadata. Each import then has a single +archive import block pointing to the sequence of metadata blocks, and +pointing to the previous archive import block in that archive. The +same filesystem tree can be imported more than once to the same +archive, and the "latest" metadata always wins. + +Generally, you should create lots of small archives for different +categories of things - such as one for music, one for photos, and so +on. You might well create separate archives for the music collections +of different people in your household, unless they overlap, and +another for Christmas music so it doesn't crop up in random shuffle +play! It's easy to merge archives if you over-compartmentalise them, +but harder to split an archive if you find it too cluttered with +unrelated things. + +I've spoken of archive imports, and backup snapshots, each having a +"previous" reference to the last import or snapshot in the chain, but +it's actually more complex than that: they have an arbitrary list of +zero or more previous objects. As such, it's possible for several +imports or snapshots to have the same "previous", known as a "fork", +and it's possible to have an import or snapshot that merges multiple +previous ones. + +Forking is handy if you want to basically duplicate an archive, +creating two new archives with the same contents to begin with, but +each then capable of diverging thereafter. 
You might do this to keep
+the state of an archive before doing a big import, so you can go back
+to the original state if you regret the import, for instance.
+
+Forking a backup tag is a more unusual operation, but also
+useful. Perhaps you have a server running many stateful services, and
+the hardware becomes overloaded, so you clone the basic setup onto
+another server, and run half of the services on the original and half
+on the new one; if you fork the backup tag of the original server to
+create a backup tag for the new server, then both servers' snapshot
+history will share the original shared state.
+
+Merging is most useful for archives; you might merge several archives
+into one, as mentioned.
+
+And, of course, you can merge backup tags, as well. If your earlier
+splitting of one server into two doesn't work out (perhaps your
+workload reduces, or you can now afford a single, more powerful,
+server to handle everything in one place), you might rsync back the
+service state from the two servers onto the new server, so it's all
+merged in the new server's filesystem. To preserve this in the
+snapshot history, you can merge the two backup tags of the two servers
+to create a backup tag for the single new server, which will
+accurately reflect the history of the filesystem.
+
+Also, tags might fork by accident - I plan to introduce a distributed
+storage backend, which will replicate blocks and tags across multiple
+storages to create a single virtual storage to build a vault on top
+of; in the event of the network of actual storages suffering a
+failure, it may be that snapshots and imports are only applied to some
+of the storages - and then subsequent snapshots and imports only get
+applied to some other subset of the storages. 
When the network is +repaired and all the storages are again visible, they will have +diverged, inconsistent, states for their tags, and the distributed +storage system will resolve the situation by keeping the majority +state as the state of the tag on all the backends, but preserving any +other states by creating new tags, with the original name plus a +suffix. These can then be merged to "heal" the conflict. ADDED docs/release-2.0.wiki Index: docs/release-2.0.wiki ================================================================== --- docs/release-2.0.wiki +++ docs/release-2.0.wiki @@ -0,0 +1,33 @@ +

Ugarit 2.0 release notes

+ +

What's new?

+
+Added archival mode [dae5e21ffc] and, to support its integration into
+Ugarit, implemented typed tags [08bf026f5a], displayed tag types in
+the VFS [30054df0b6], refactored the Ugarit internals [5fa161239c],
+improved the storage of logs in the vault [68bb75789f], made it
+possible to view logs from within the VFS [4e3673e0fe], supported
+hidden tags [cf5ef4691c], recorded configuration information in the
+vault (providing instant notification if your vault
+hashing/encryption setup is incorrect, thanks to a clever idea by Andy
+Bennett) [0500d282fc], rearranged how local caching is handled
+[b5911d321a], and added support for the history of a snapshot or
+archive tag to have arbitrary branches and merges [a987e28fef], which
+(as a side-effect) improved the performance of running "ls" in long
+snapshot histories [fcf8bc942a]. Also added an sqlite backend
+[8719dfb84f], which makes testing easier but is useful in its own
+right, as it's fully-featured and crash-safe while storing the vault
+in a single file; and improved the appearance of the explore mode ls
+command, as the VFS layout has become more complex with the new
+log/properties views and all the archive mode features.
+

Upgrading

+
+Ugarit 2.0 uses a new format for tags and logs, as well as the whole
+new concept of archive tags. As such, the vault format has
+changed. Ugarit 2.0 will read a vault created by prior versions of
+Ugarit, and will silently upgrade it when it adds things to the vault
+(by using the new format for new things, and keeping the old format for
+old things). Therefore, when you upgrade to Ugarit 2.0 and start using
+it on an existing vault, older versions of Ugarit will not be able to
+read things that Ugarit 2.0 has added to the vault.

ADDED docs/release-old.wiki
Index: docs/release-old.wiki
==================================================================
--- docs/release-old.wiki
+++ docs/release-old.wiki
@@ -0,0 +1,143 @@
+

Ugarit v1.* release history

+ + + * 1.0.9: More humane display of sizes in explore's directory + listings, using low-level I/O to reduce CPU usage. Myriad small + bug fixes and some internal structural improvements. + + * 1.0.8: Bug fixes to work with the latest chicken master, and + increased unit test coverage to test stuff that wasn't working + due to chicken bugs. Looking good! + + * 1.0.7: Fixed bug with directory rules (errors arose when files + were skipped). I need to improve the test suite coverage of + high-level components to stop this happening! + + * 1.0.6: Fixed missing features from v1.0.5 due to a fluffed merge + (whoops), added tracking of directory sizes (files+bytes) in the + vault on snapshot and the use of this information to display + overall percentage completion when extracting. Directory sizes + can be seen in the explore interface when doing "ls -l" or "ls -ll". + + * 1.0.5: Changed the VFS layout slightly, making the existence of + snapshot objects explicit (when you go into a tag, then go into a + snapshot, you now need to go into "contents" to see the actual + file tree; the snapshot object itself now exists as a node in the + tree). Added traverse-vault-* functions to the core API, and tests + for same, and used traverse-vault-node to drive the cd and get + functions in the interactive explore mode (speeding them up in the + process!). Added "extract" command. Added a progress reporting + callback facility for snapshots and extractions, and used it to + provide progress reporting in the front-end, every 60 seconds or + so by default, not at all with -q, and every time something + happens with -v. Added tab completion in explore mode. + + * 1.0.4: Resurrected support for compression and encryption and SHA2 + hashes, which had been broken by the failure of the + autoload egg to continue to work as it used to. Tidying + up error and ^C handling somewhat. 
+
+ * 1.0.3: Installed sqlite busy handlers to retry when the database is
+   locked due to concurrent access (affects backend-fs, backend-cache,
+   and the file cache), and gained an EXCLUSIVE lock when locking a
+   tag in backend-fs; I'm not clear if it's necessary, but it can't
+   hurt.
+
+   BUGFIX: Logging of messages from storage backends wasn't
+   happening correctly in the Ugarit core, leading to errors when the
+   cache backend (which logs an info message at close time) was closed
+   and the log message had nowhere to go.
+
+ * 1.0.2: Made the file cache also commit periodically, rather than on
+   every write, in order to improve performance. Counting blocks and
+   bytes uploaded / reused, and file cache bytes as well as hits;
+   reporting same in snapshot UI and logging same to snapshot
+   metadata. Switched to the posix-extras egg and ditched our own
+   posixextras.scm wrappers. Used the parley egg in the ugarit
+   explore CLI for line editing. Added logging infrastructure,
+   recording of snapshot logs in the snapshot. Added recovery from
+   extraction errors. Listed lock state of tags in explore
+   mode. Backend protocol v2 introduced (retaining v1 for
+   compatibility) allowing for an error on backend startup, and logging
+   nonfatal errors, warnings, and info on startup and all protocol
+   calls. Added ugarit-archive-admin command line interface to
+   backend-specific administrative interfaces. Configuration of the
+   splitlog backend (write protection, adjusting block size and logfile
+   size limit and commit interval) is now possible via the admin
+   interface. The admin interface also permits rebuilding the metadata
+   index of a splitlog vault with the reindex! admin command. 
+ + BUGFIX: Made file cache check the file hashes it finds in the + cache actually exist in the vault, to protect against the case + where a crash of some kind has caused unflushed changes to be + lost; the file cache may well have committed changes that the + backend hasn't, leading to references to nonexistent blocks. Note + that we assume that vaults are sequentially safe, e.g. if the + final indirect block of a large file made it, all the partial + blocks must have made it too. + + BUGFIX: Added an explicit flush! command to the backend + protocol, and put explicit flushes at critical points in higher + layers (backend-cache, the vault abstraction in the Ugarit + core, and when tagging a snapshot) so that we ensure the blocks we + point at are flushed before committing references to them in the + backend-cache or file caches, or into tags, to ensure crash + safety. + + BUGFIX: Made the splitlog backend never exceed the file size limit + (except when passed blocks that, plus a header, are larger than + it), rather than letting a partial block hang over the 'end'. + + BUGFIX: Fixed tag locking, which was broken all over the + place. Concurrent snapshots to the same tag should now block for + one another, although why you'd want to *do* that is questionable. + + BUGFIX: Fixed generation of non-keyed hashes, which was + incorrectly appending the type to the hash without an outer + hash. This breaks backwards compatibility, but nobody was using + the old algorithm, right? I'll introduce it as an option if + required. + + * 1.0.1: Consistency check on read blocks by default. Removed warning + about deletions from backend-cache; we need a new mechanism to + report warnings from backends to the user. Made backend-cache and + backend-fs/splitlog commit periodically rather than after every + insert, which should speed up snapshotting a lot, and reused the + prepared statements rather than re-preparing them all the + time. 
+ + BUGFIX: splitlog backend now creates log files with + "rw-------" rather than "rwx------" permissions; and all sqlite + databases (splitlog metadata, cache file, and file-cache file) are + created with "rw-------" rather than "rw-r--r--". + + * 1.0: Migrated from gdbm to sqlite for metadata storage, removing the + GPL taint. Unit test suite. backend-cache made into a separate + backend binary. Removed backend-log. + + BUGFIX: file caching uses mtime *and* + size now, rather than just mtime. Error handling so we skip objects + that we cannot do something with, and proceed to try the rest of the + operation. + + * 0.8: decoupling backends from the core and into separate binaries, + accessed via standard input and output, so they can be run over SSH + tunnels and other such magic. + + * 0.7: file cache support, sorting of directories so they're archived + in canonical order, autoloading of hash/encryption/compression + modules so they're not required dependencies any more. + + * 0.6: .ugarit support. + + * 0.5: Keyed hashing so attackers can't tell what blocks you have, + markers in logs so the index can be reconstructed, sha2 support, and + passphrase support. + + * 0.4: AES encryption. + + * 0.3: Added splitlog backend, and fixed a .meta file typo. + + * 0.2: Initial public release. + + * 0.1: Internal development release. ADDED docs/security.wiki Index: docs/security.wiki ================================================================== --- docs/security.wiki +++ docs/security.wiki @@ -0,0 +1,170 @@ +

Security model

+ +I have designed and implemented Ugarit to be able to handle cases +where the actual vault storage is not entirely trusted. + +However, security involves tradeoffs, and Ugarit is configurable in +ways that affect its resistance to different kinds of attacks. Here I +will list different kinds of attack and explain how Ugarit can deal +with them, and how you need to configure it to gain that +protection. + +

Vault snoopers

+ +This might be somebody who can intercept Ugarit's communication with +the vault at any point, or who can read the vault itself at their +leisure. + +Ugarit's splitlog backend creates files with "rw-------" permissions +out of the box to try to prevent this. This is a pain for people who +want to share vaults between UIDs, but we can add a configuration +option to override this if that becomes a problem. + +

Reading your data

+ +If you enable encryption, then all the blocks sent to the vault are +encrypted using a secret key stored in your Ugarit configuration +file. As long as that configuration file is kept safe and the AES +algorithm is secure, attackers who can snoop the vault cannot +decode your data blocks. Enabling compression will also help, as the +blocks are compressed before being encrypted, which is thought to make +cryptographic analysis harder. + +Recommendations: Use compression and encryption when there is a risk +of vault snooping. Keep your Ugarit configuration file safe using +UNIX file permissions (make it readable only by root), and maybe store +it on a removable device that's only plugged in when +required. Alternatively, use the "prompt" passphrase option, and be +prompted for a passphrase every time you run Ugarit, so it isn't +stored on disk anywhere. + +
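The compress-then-encrypt pipeline described above can be sketched as follows. This is a hypothetical illustration only: Ugarit really uses AES, whereas here a hash-based XOR keystream stands in for the cipher so the sketch is self-contained; the point is the ordering (hash the plaintext, compress, then encrypt).

```python
import hashlib
import zlib

def keystream(key: bytes, n: int) -> bytes:
    # Stand-in for a real cipher (Ugarit uses AES): derive a
    # pseudo-random keystream of n bytes from the secret key.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def store_block(content: bytes, key: bytes):
    # Hash the *plaintext* -- blocks are addressed by their original
    # content -- then compress, then encrypt. Compressing first
    # strips redundancy that might otherwise aid cryptanalysis.
    block_hash = hashlib.sha256(content).hexdigest()
    return block_hash, xor_encrypt(zlib.compress(content), key)

def fetch_block(stored: bytes, key: bytes) -> bytes:
    # Reverse the pipeline: decrypt, then decompress.
    return zlib.decompress(xor_encrypt(stored, key))

secret = b"secret-from-config-file"
h, stored = store_block(b"hello " * 100, secret)
assert stored != zlib.compress(b"hello " * 100)   # snooper sees only ciphertext
assert fetch_block(stored, secret) == b"hello " * 100
```

Note that because the block is addressed by the hash of its plaintext, deduplication still works across identical content even though the stored bytes are opaque to a vault snooper.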

Looking for known hashes

+ +A block is identified by the hash of its content (before compression +and encryption). If an attacker was trying to find people who own a +particular file (perhaps a piece of subversive literature), they could +search Ugarit vaults for its hash. + +However, Ugarit has the option to "key" the hash with a "salt" stored +in the Ugarit configuration file. This means that the hashes used are +actually a hash of the block's contents *and* the salt you supply. If +you do this with a random salt that you keep secret, then attackers +can't check your vault for known content just by comparing the hashes. + +Recommendations: Provide a secret string to your hash function in your +Ugarit configuration file. Keep the Ugarit configuration file safe, as +per the advice in the previous point. + +
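The keyed-hash defence above amounts to mixing the secret salt into every hash computation. A minimal sketch in Python (hypothetical, using SHA-256 for illustration rather than Ugarit's actual hash configuration):

```python
import hashlib

def block_key(salt: bytes, content: bytes) -> str:
    # Key the hash with a secret salt: the same content yields a
    # different key under every salt, so an attacker who knows a
    # file's plain hash cannot find it in your vault.
    return hashlib.sha256(salt + content).hexdigest()

block = b"a piece of subversive literature"
unkeyed = hashlib.sha256(block).hexdigest()
keyed = block_key(b"my secret salt", block)

assert keyed != unkeyed                               # precomputed lookups fail
assert keyed == block_key(b"my secret salt", block)   # still deterministic,
                                                      # so dedup still works
```

Determinism is the crucial property: deduplication survives keying because anyone holding the salt always derives the same key for the same content, while anyone without it cannot.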

Vault modifiers

+ +These folks can modify Ugarit's writes into the vault, its reads +back from the vault, or can modify the vault itself at their leisure. + +Modifying an encrypted block without knowing the encryption key can at +worst be a denial of service, corrupting the block in an unknown +way. An attacker who knows the encryption key could replace a block +with valid-seeming but incorrect content. In the worst case, this +could exploit a bug in the decompression engine, causing a crash or +even an exploit of the Ugarit process itself (thereby gaining the +powers of a process inspector, as documented below). We can but hope +that the decompression engine is robust. Exploits of the decryption +engine, or other parts of Ugarit, are less likely due to the nature of +the operations performed upon them. + +However, if a block is modified, then when Ugarit reads it back, the +hash will no longer match the hash Ugarit requested, which will be +detected and an error reported. The hash is checked after +decryption and decompression, so this check does not protect us +against exploits of the decompression engine. + +This protection is only afforded when the hash Ugarit asks for is not +tampered with. Most hashes are obtained from within other blocks, +which are therefore safe unless that block has been tampered with; the +nature of the hash tree conveys the trust in the hashes up to the +root. The root hashes are stored in the vault as "tags", which a +vault modifier could alter at will. Therefore, the tags cannot be +trusted if somebody might modify the vault. This is why Ugarit +prints out the snapshot hash and the root directory hash after +performing a snapshot, so you can record them securely outside of the +vault. + +The most likely threat posed by vault modifiers is that they could +simply corrupt or delete all of your vault, without needing to know +any encryption keys. + +Recommendations: Secure your vaults against modifiers, by whatever +means possible. 
If vault modifiers are still a potential threat, +write down a log of your root directory hashes from each snapshot, and keep +it safe. When extracting your backups, use the ls -ll command in the +interface to check the "contents" hash of your snapshots, and check +they match the root directory hash you expect. + +
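The read-time integrity check described above can be sketched like so. This is a hypothetical Python illustration, not Ugarit's code; in Ugarit the block would be decrypted and decompressed before the hash comparison, while here the vault holds plaintext for brevity:

```python
import hashlib

class CorruptBlockError(Exception):
    """Raised when a fetched block does not hash to the requested key."""

def fetch(vault: dict, requested_hash: str) -> bytes:
    content = vault[requested_hash]
    # Recompute the hash of what actually came back; a vault modifier
    # can change the stored bytes, but cannot make them match the
    # hash we asked for without breaking the hash function.
    if hashlib.sha256(content).hexdigest() != requested_hash:
        raise CorruptBlockError(requested_hash)
    return content

vault = {}
block = b"some file data"
key = hashlib.sha256(block).hexdigest()
vault[key] = block
assert fetch(vault, key) == block     # honest vault: block verifies

vault[key] = b"tampered!"             # a vault modifier at work
try:
    fetch(vault, key)
    raise AssertionError("tampering went undetected")
except CorruptBlockError:
    pass                              # tampering detected, as expected
```

This also shows why the tags are the weak point: `requested_hash` is trusted input here, so a modifier who can rewrite the tag you start from defeats the check for everything beneath it.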

Process inspectors

+ +These folks can attach debuggers or similar tools to running +processes, such as Ugarit itself. + +Ugarit backend processes only see encrypted data, so people who can +attach to that process gain the powers of vault snoopers and +modifiers, and the same conditions apply. + +People who can attach to the Ugarit process itself, however, will see +the original unencrypted content of your filesystem, and will have +full access to the encryption keys and hashing keys stored in your +Ugarit configuration. When Ugarit is running with sufficient +permissions to restore backups, they will be able to intercept and +modify the data as it comes out, and probably gain total write access +to your entire filesystem in the process. + +Recommendations: Ensure that Ugarit does not run under the same user +ID as untrusted software. In many cases it will need to run as root in +order to gain unfettered access to read the filesystems it is backing +up, or to restore the ownership of files. However, when all the files +it backs up are world-readable, it could run as an untrusted user for +backups, and where file ownership is trivially reconstructible, it can +do restores as a limited user, too. + +

Attackers in the source filesystem

+ +These folks create files that Ugarit will back up one day. By having +write access to your filesystem, they already have some level of +power, and standard Unix security practices such as storage quotas +should be used to control them. They may be people with logins on your +box, or more subtly, people who can cause servers to write files; +somebody who sends an email to your mailserver will probably cause +that message to be written to queue files, as will people who can +upload files via any means. + +Such attackers might use up your available storage by creating large +files. This creates a problem in the actual filesystem, but that +problem can be fixed by deleting the files. If those files get +stored into Ugarit, then they are a part of that snapshot. If you +are using a backend that supports deletion, then (when I implement +snapshot deletion in the user interface) you could delete that entire +snapshot to recover the wasted space, but that is a rather serious +operation. + +More insidiously, such attackers might attempt to abuse a hash +collision in order to fool the vault. If they have a way of creating +a file that, for instance, has the same hash as your shadow password +file, then Ugarit will think that it already has that file when it +attempts to snapshot it, and store a reference to the existing +file. If that snapshot is restored, then they will receive a copy of +your shadow password file. Similarly, if they can predict a future +hash of your shadow password file, and create a shadow password file +of their own (perhaps one giving them a root account with a known +password) with that hash, they can then wait for the real shadow +password file to have that hash. If the system is later restored from +that snapshot, then their chosen content will appear in the shadow +password file. However, doing this requires a very fundamental break +of the hash function being used. 
+ +Recommendations: Think carefully about who has write access to your +filesystems, directly or indirectly via a network service that stores +received data to disk. Enforce quotas where appropriate, and consider +not backing up "queue directories" where untrusted content might +appear; migrate incoming content that passes acceptance tests to an +area that is backed up. If necessary, the queue might be backed up to +a non-snapshotting system, such as rsyncing to another server, so that +any excessive files that appear in there are removed from the backup +in due course, while still affording protection. ADDED docs/storage-admin.wiki Index: docs/storage-admin.wiki ================================================================== --- docs/storage-admin.wiki +++ docs/storage-admin.wiki @@ -0,0 +1,91 @@ +

Storage administration

+ +Each backend offers a number of administrative commands for +administering the storage underlying vaults. These are accessible via +the ugarit-storage-admin command line interface. + +To use it, run it with the following command: + +
$ ugarit-storage-admin ''
+ +The available commands differ between backends, but all backends +support the info and help commands, which +give basic information about the vault, and list all available +commands, respectively. Some offer a stats command that +examines the vault state to give interesting statistics, although +this may be a time-consuming operation. + +

Administering splitlog storages

+ +The splitlog backend offers a wide selection of administrative +commands. See the help command on a splitlog vault for +details. The following commands are available: + +
+ +
help
+
List the available commands.
+ +
info
+
List some basic information about the storage.
+ +
stats
+
Examine the metadata to provide overall statistics about the +archive. This may be a time-consuming operation on large +storages.
+ +
set-block-size! BYTES
+
Sets the block size to the given number of bytes. This will affect +new blocks written to the storage, and leave existing blocks +untouched, even if they are larger than the new block size.
+ +
set-max-logfile-size! BYTES
+
Sets the size at which a log file is finished and a new one +started (likewise, existing log files will be untouched; this will +only affect new log files).
+ +
set-commit-interval! UPDATES
+
Sets the frequency of automatic synching of the storage +state to disk. Lowering this harms performance when writing to the +storage, but decreases the number of in-progress block writes that +can fail in a crash.
+ +
write-protect!
+
Disables updating of the storage.
+ +
write-unprotect!
+
Re-enables updating of the storage.
+ +
reindex!
+
Reindex the storage, rebuilding the block and tag state from the +contents of the log. If the metadata file is damaged or lost, +reindexing can rebuild it (although any configuration changes made +via other admin commands will need manually repeating as they are +not logged).
+
+ +
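The interplay between the block size and the log file size limit can be illustrated with a sketch of the rotation rule. This is hypothetical Python inferred from the behaviour described in the changelog (a log file never exceeds the limit, except when a single block plus its header is larger than the limit, in which case it must be written anyway); the 16-byte header size is an assumption for illustration:

```python
HEADER_SIZE = 16  # hypothetical per-block header size

def place_blocks(block_sizes, max_logfile_size):
    """Assign each block to a numbered log file, never letting a file
    exceed max_logfile_size -- except for a single oversized block,
    which still has to be written somewhere."""
    placements = []     # log file number chosen for each block
    current, used = 0, 0
    for size in block_sizes:
        needed = size + HEADER_SIZE
        # Rotate if this block would push a non-empty file past the
        # limit; an oversized block into an empty file is allowed.
        if used > 0 and used + needed > max_logfile_size:
            current += 1
            used = 0
        placements.append(current)
        used += needed
    return placements

# Three 400-byte blocks, 1000-byte limit: two fit, the third rotates.
assert place_blocks([400, 400, 400], 1000) == [0, 0, 1]
# An oversized block is still written, alone in its own log file.
assert place_blocks([400, 2000, 400], 1000) == [0, 1, 2]
```

Under this rule, lowering set-max-logfile-size! only affects where future blocks land, which matches the documented behaviour that existing log files are left untouched.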

Administering sqlite storages

+ +The sqlite backend has a similar administrative interface to the +splitlog backend, except that it does not have log files, so lacks the +set-max-logfile-size! and reindex! commands. + +

Administering cache storages

+ +The cache backend provides a minimalistic interface: + +
+ +
help
+
List the available commands.
+ +
info
+
List some basic information about the storage.
+ +
stats
+
Report on how many entries are in the cache.
+ +
clear!
+
Clears the cache, dropping all the entries in it.
+ +
Index: ugarit-api.scm ================================================================== --- ugarit-api.scm +++ ugarit-api.scm @@ -402,12 +402,12 @@ ('store-atime (set! (job-store-atime? (current-job)) #t)) ('store-ctime (set! (job-store-ctime? (current-job)) #t)) (('storage command-line) - (set! *storage* - (with-backend-logging + (set! *storage* + (with-backend-logging (import-storage command-line)))) (('hash . conf) (set! *hash* conf)) (('compression . conf) (set! *compression* conf)) (('encryption . conf) (set! *crypto* conf)) (('cache path) Index: ugarit.release-info ================================================================== --- ugarit.release-info +++ ugarit.release-info @@ -7,5 +7,6 @@ (release "1.0.4") (release "1.0.5") (release "1.0.6") (release "1.0.7") (release "1.0.9") +(release "2.0") Index: ugarit.setup ================================================================== --- ugarit.setup +++ ugarit.setup @@ -1,8 +1,8 @@ (use posix) -(define *version* "1.0.9") +(define *version* "2.0") (define (newer file1 file2) (or (not (get-environment-variable "UGARIT_FAST_BUILD")) (not (file-exists? file2))