You can obtain the latest sources, all history, and a local copy of
the ticket database using [http://www.fossil-scm.org/|Fossil], like so:
 fossil clone https://www.kitten-technologies.co.uk/project/ugarit ugarit.fossil
Introduction
Ugarit is a backup/archival system based around content-addressable
storage. [./docs/intro.wiki|Learn more...]
News
Development priorities are: Performance, better error handling, and
fixing bugs! After I've cleaned house a little, I'll be focussing on
replicated backend storage (ticket [f1f2ce8cdc]), as I now have a
cluster of storage devices at home.

2015-06-12: [./docs/release-2.0.wiki|Version 2.0] is released, containing rudimentary archive
mode, plus many minor improvements! See the release notes at the
bottom for more details.
2014-11-02: Chicken itself has gained
[http://code.call-cc.org/cgi-bin/gitweb.cgi?p=chicken-core.git;a=commit;h=a0ce0b4cb4155754c1a304c0d8b15276b11b8cd2|significantly
faster hashing], so far only in its development branch at the time of
writing; I look forward to it being in a formal release, as it sped up
Ugarit snapshot benchmarks (dumping a 256MiB file into an sqlite
backend) by a factor of twenty-something.

2014-02-21: User [http://rmm.meta.ph/|Rommel Martinez] has written
[http://rmm.meta.ph/blog/2014/02/21/an-introduction-to-ugarit/|An introduction to Ugarit]!
About Ugarit

What's content-addressable storage?

Traditional backup systems work by storing copies of your files
somewhere. Perhaps they go onto tapes, or perhaps they're in archive
files written to disk. They will either be full dumps, containing a
complete copy of your files, or incrementals or differentials, which
only contain files that have been modified since some point. This
saves making repeated copies of unchanging files, but it means that to
do a full restore, you need to start by extracting the last full dump
then applying one or more incrementals, or the latest differential,
to get the latest state.

Not only do differentials and incrementals let you save space, they
also give you a history - you can restore to a previous point in
time, which is invaluable if the file you want to restore was deleted
a few backup cycles ago!

This technology was developed when the best storage technology for
backups was magnetic tape, because each dump is written sequentially
(and restores are largely sequential, unless you're skipping bits to
pull out specific files).

However, these days, random-access media such as magnetic disks and
SSDs are cheap enough to compete with magnetic tape for long-term bulk
storage (especially when one considers the cost of a tape drive or
two). And having fast random access means we can take advantage of
different storage techniques.

A content-addressable store is a key-value store, except that the keys
are always computed from the values. When a given object is stored, it
is hashed, and the hash used as the key. This means you can never
store the same object twice; the second time you'll get the same hash,
see the object is already present, and re-use the existing
copy. Therefore, you get deduplication of your data for free.

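You can see the same idea with ordinary shell tools (this is just an
illustration of content addressing, not Ugarit's actual storage format
or hash choice):

 $ echo 'hello world' > a.txt
 $ cp a.txt b.txt
 $ sha256sum a.txt b.txt
 a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447  a.txt
 a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447  b.txt

Identical contents always produce the same key, so a store indexed by
that key only ever needs to keep one copy.
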
But, I hear you ask, how do you find things again, if you can't choose
the keys?

When an object is stored, you need to record the key so you can find
it again later. In Ugarit, everything is stored in a tree-like
directory structure. Files are uploaded and their hashes obtained, and
then a directory object is constructed containing a list of the files
in the directory, listing the keys of the Ugarit objects storing
the contents of each file. This directory object itself has a hash,
which is stored inside the directory entry in the parent directory,
and so on up to the root. The root of a tree stored in a Ugarit vault
has no parent directory to contain it, so at that point, we store the
key of the root in a named "tag" that we can look up by name when we
want it.

Therefore, everything in a Ugarit vault can be found by starting with
a named tag and retrieving the object whose key it contains, then
finding keys inside that object and looking up the objects they refer
to, until we find the object we want.

When you use Ugarit to back up your filesystem, it uploads a complete
snapshot of every file in the filesystem, like a full dump. But
because the vault is content-addressed, it automatically avoids
uploading anything it already has a copy of, so all we upload is an
incremental dump - but in the vault, it looks like a full dump, and so
can be restored on its own without having to restore a chain of incrementals.

Also, the same storage can be shared between multiple systems that all
back up to it - and the incremental upload algorithm will mean that
any files shared between the servers will only need to be uploaded
once. If you back up a complete server, then go and back up another
that is running the same distribution, then all the files in /bin
and so on that are already in the storage will not need to be backed
up again; the system will automatically spot that they're already
there, and not upload them again.

As well as storing backups of filesystems, Ugarit can also be used as
the primary storage for read-only files, such as music and photos. The
principle is exactly the same; the only difference is in how the files
are organised - rather than as a directory structure, the files are
referenced from metadata objects that specify information about the
file (so it can be found) and a reference to the contents. Sets of
metadata objects are pointed to by tags as well, so they can also be
found.

So what's that mean in practice?

Backups

You can run Ugarit to back up any number of filesystems to a shared
storage area (known as a vault), and on every backup, Ugarit
will only upload files or parts of files that aren't already in the
vault - be they from the previous snapshot, earlier snapshots,
snapshots of entirely unrelated filesystems, etc. Every time you do a
snapshot, Ugarit builds an entire complete directory tree of the
snapshot in the vault - but reusing any parts of files, files, or
entire directories that already exist anywhere in the vault, and
only uploading what doesn't already exist.

The support for parts of files means that, in many cases, gigantic
files like database tables and virtual disks for virtual machines will
not need to be uploaded entirely every time they change, as the
changed sections will be identified and uploaded.

Because a complete directory tree exists in the vault for any
snapshot, the extraction algorithm is incredibly simple - and,
therefore, incredibly reliable and fast. Simple, reliable, and fast
are just what you need when you're trying to reconstruct the
filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a
snapshot every hour, then only a megabyte or two might have changed in
your filesystem, so you only upload a megabyte or two - yet you end up
with a complete history of your filesystem at hourly intervals in the
vault.

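For instance, hourly snapshots can be driven from cron; here's a
minimal sketch, assuming ugarit is on root's PATH, your configuration
lives at /etc/ugarit.conf, and using the gandalf-home tag from the
example later in this document:

 # root's crontab: snapshot /home at the top of every hour
 0 * * * * ugarit snapshot /etc/ugarit.conf gandalf-home /home
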
Conventional backup systems usually store a full backup then
incrementals to their archives, meaning that doing a restore involves
reading the full backup then reading every incremental since and
applying them - so to do a restore, you either have to download *every
version* of the filesystem you've ever uploaded, or you have to do
periodic full backups (even though most of your filesystem won't have
changed since the last full backup) to reduce the number of
incrementals required for a restore. Better results are had from
systems that use a special backup server to look after the archive
storage, which accept incremental backups and apply them to the
snapshot they keep in order to maintain a most-recent snapshot that
can be downloaded in a single run; but they then restrict you to using
dedicated servers as your archive stores, ruling out cheaply scalable
solutions like Amazon S3, or just backing up to a removable USB or
eSATA disk you attach to your system whenever you do a backup. And
dedicated backup servers are complex pieces of software; can you rely
on something complex for the fundamental foundation of your data
security system?

Archives

You can also use Ugarit as the primary storage for read-only
files. You do this by creating an archive in the vault, and importing
batches of files into it along with their metadata (arbitrary
attributes, such as "author", "creation date" or "subject").

Just as you can keep snapshots of multiple systems in a Ugarit vault,
you can also keep multiple separate archives, each identified by a
named tag.

However, as it's all within the same vault, the usual de-duplication
rules apply. The same file may be in multiple archives, with different
metadata in each, as the file contents and metadata are stored
separately (and associated only within the context of each
archive). And, of course, the same file may appear in snapshots and in
archives; perhaps a file was originally downloaded into your home
directory, where it was backed up into Ugarit snapshots, and then you
imported it into your archive. The archive import would not have had
to re-upload the file, as its contents would have already been found
in the vault, so all that needs to be uploaded is the metadata.

Although we have mainly spoken of storing files in archives, the
objects in archives can be files or directories full of files, as
well. This is useful for storing MacOS-style files that are actually
directories, or for archiving things like completed projects for
clients, which can be entire directory structures.

System Requirements

Ugarit should run on any POSIX-compliant system that can run
[http://www.call-with-current-continuation.org/|Chicken Scheme]. It
stores and restores all the file attributes reported by the stat
system call - POSIX mode permissions, UID, GID, mtime, and optionally
atime and ctime (although the ctime cannot be restored due to POSIX
restrictions). Ugarit will store files, directories, device and
character special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative
streams, forks and other metadata - is possible, due to the extensible
directory entry format; support for such metadata will be added as
required.

Currently, only local filesystem-based vault storage backends are
complete: these are suitable for backing up to a removable hard disk
or a filesystem shared via NFS or other protocols. However, the
backend can be accessed via an SSH tunnel, so a remote server that you
are able to install Ugarit on to run the backends can be used as a
remote vault.

However, the next backends to be implemented will be one for Amazon S3
and an SFTP backend for storing vaults anywhere you can ssh
to. Other backends will be implemented on demand; a vault can, in
principle, be stored on anything that can store files by name, report
on whether a file already exists, and efficiently download a file by
name. This rules out magnetic tapes due to their requirement for
sequential access.

Although we need to trust that a backend won't lose data (for now), we
don't need to trust the backend not to snoop on us, as Ugarit
optionally encrypts everything sent to the vault.

Terminology

A Ugarit backend is the software module that handles backend
storage. An actual storage area - managed by a backend - is called a
storage, and is used to implement a vault; currently, every storage is
a valid vault, but the planned future introduction of a distributed
storage backend will enable multiple storages (which are not,
themselves, valid vaults as they only contain some subset of the
information required) to be combined into an aggregate storage, which
then holds the actual vault. Note that the contents of a storage are
purely a set of blocks, and a series of named tags containing
references to them; the storage does not know the details of
encryption and hashing, so cannot make any sense of its contents.

For example, if you use the recommended "splitlog" filesystem backend,
your vault might be /mnt/bigdisk on the server
prometheus. The backend (which is compiled along with the
other filesystem backends in the backend-fs binary) must
be installed on prometheus, and Ugarit clients all over
the place may then use it via ssh to prometheus. However,
even with the filesystem backends, the actual storage might not be on
prometheus where the backend runs -
/mnt/bigdisk might be an NFS mount, or a mount from a
storage-area network. This ability to delegate via SSH is particularly
useful with the "cache" backend, which reduces latency by storing a
cache of what blocks exist in a backend, thereby making it quicker to
identify already-stored files; a cluster of servers all sharing the
same vault might all use SSH tunnels to access an instance of the
"cache" backend on one of them (using some local disk to store the
cache), which proxies the actual vault storage to a vault on the other
end of a high-latency Internet link, again via an SSH tunnel.

A vault is where Ugarit stores backups (as chains of snapshots) and
archives (as chains of archive imports). Backups and archives are
identified by tags, which are the top-level named entry points into a
vault. A vault is based on top of a storage, along with a choice of
hash function, compression algorithm, and encryption that are used to
map the logical world of snapshots and archive imports into the
physical world of blocks stored in the storage.

A snapshot is a copy of a filesystem tree in the vault, with a header
block that gives some metadata about it. A backup consists of a number
of snapshots of a given filesystem.

An archive import is a set of filesystem trees, each along with
metadata about it. Whereas a backup is organised around a series of
timed snapshots, an archive is organised around the metadata; the
filesystem trees in the archive are identified by their properties.

So what, exactly, is in a vault?

A Ugarit vault contains a load of blocks, each up to a maximum size
(usually 1MiB, although other backends might impose smaller
limits). Each block is identified by the hash of its contents; this is
how Ugarit avoids ever uploading the same data twice, by checking to
see if the data to be uploaded already exists in the vault by
looking up the hash. The contents of the blocks are compressed and
then encrypted before upload.

Every file uploaded is, unless it's small enough to fit in a single
block, chopped into blocks, and each block uploaded. This way, the
entire contents of your filesystem can be uploaded - or, at least,
only the parts of it that aren't already there! The blocks are then
tied together to create a snapshot by uploading blocks full of the
hashes of the data blocks, and directory blocks are uploaded listing
the names and attributes of files in directories, along with the
hashes of the blocks that contain the files' contents. Even the blocks
that contain lists of hashes of other blocks are subject to checking
for pre-existence in the vault; if only a few MiB of your
hundred-GiB filesystem have changed, then even the index blocks and
directory blocks are re-used from previous snapshots.

Once uploaded, a block in the vault is never again changed. After all,
if its contents changed, its hash would change, so it would no longer
be the same block! However, every block has a reference count,
tracking the number of index blocks that refer to it. This means that
the vault knows which blocks are shared between multiple snapshots (or
shared *within* a snapshot - if a filesystem has more than one copy of
the same file, still only one copy is uploaded), so that if a given
snapshot is deleted, then the blocks that only that snapshot is using
can be deleted to free up space, without corrupting other snapshots by
deleting blocks they share. Keep in mind, however, that not all
storage backends may support this - there are certain advantages to
being an append-only vault. For a start, you can't delete something by
accident! The supplied fs and sqlite backends support deletion, while
the splitlog backend does not yet. However, the actual snapshot
deletion command in the user interface hasn't been implemented yet
either, so it's a moot point for now...

Finally, the vault contains objects called tags. Unlike the blocks,
the tags' contents can change, and they have meaningful names rather
than being identified by hash. Tags identify the top-level blocks of
snapshots within the system, from which (by following the chain of
hashes down through the index blocks) the entire contents of a
snapshot may be found. Unless you happen to have recorded the hash of
a snapshot somewhere, the tags are where you find snapshots from when
you want to do a restore.

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the
files, directories, and index blocks required, it looks up the tag you
have identified as the target of the snapshot. If the tag already
exists, then the snapshot it currently points to is recorded in the
new snapshot as the "previous snapshot"; then the snapshot header,
containing the previous snapshot hash along with the date and time
and any comments you provide for the snapshot, is uploaded (as
another block, identified by its hash). The tag is then updated to
point to the new snapshot.

This way, each tag actually identifies a chronological chain of
snapshots. Normally, you would use a tag to identify a filesystem
being backed up; you'd keep snapshotting the filesystem to the same
tag, resulting in all the snapshots of that filesystem hanging from
the tag. But if you wanted to remember any particular snapshot
(perhaps if it's the snapshot you take before a big upgrade or other
risky operation), you can duplicate the tag, in effect 'forking' the
chain of snapshots much like a branch in a version control system.

Archive imports cause the creation of one or more archive metadata
blocks, each of which lists the hashes of files or filesystem trees in
the archive, along with their metadata. Each import then has a single
archive import block pointing to the sequence of metadata blocks, and
pointing to the previous archive import block in that archive. The
same filesystem tree can be imported more than once to the same
archive, and the "latest" metadata always wins.

Generally, you should create lots of small archives for different
categories of things - such as one for music, one for photos, and so
on. You might well create separate archives for the music collections
of different people in your household, unless they overlap, and
another for Christmas music so it doesn't crop up in random shuffle
play! It's easy to merge archives if you over-compartmentalise them,
but harder to split an archive if you find it too cluttered with
unrelated things.

I've spoken of archive imports, and backup snapshots, each having a
"previous" reference to the last import or snapshot in the chain, but
it's actually more complex than that: they have an arbitrary list of
zero or more previous objects. As such, it's possible for several
imports or snapshots to have the same "previous", known as a "fork",
and it's possible to have an import or snapshot that merges multiple
previous ones.

Forking is handy if you want to basically duplicate an archive,
creating two new archives with the same contents to begin with, but
each then capable of diverging thereafter. You might do this to keep
the state of an archive before doing a big import, so you can go back
to the original state if you regret the import, for instance.

Forking a backup tag is a more unusual operation, but also
useful. Perhaps you have a server running many stateful services, and
the hardware becomes overloaded, so you clone the basic setup onto
another server, and run half of the services on the original and half
on the new one; if you fork the backup tag of the original server to
create a backup tag for the new server, then both servers' snapshot
histories will share the original server's history.

Merging is most useful for archives; you might merge several archives
into one, as mentioned.

And, of course, you can merge backup tags, as well. If your earlier
splitting of one server into two doesn't work out (perhaps your
workload reduces, or you can now afford a single, more powerful,
server to handle everything in one place), you might rsync back the
service state from the two servers onto the new server, so it's all
merged in the new server's filesystem. To preserve this in the
snapshot history, you can merge the two backup tags of the two servers
to create a backup tag for the single new server, which will
accurately reflect the history of the filesystem.

Also, tags might fork by accident - I plan to introduce a distributed
storage backend, which will replicate blocks and tags across multiple
storages to create a single virtual storage to build a vault on top
of; in the event of the network of actual storages suffering a
failure, it may be that snapshots and imports are only applied to some
of the storages - and then subsequent snapshots and imports only get
applied to some other subset of the storages. When the network is
repaired and all the storages are again visible, they will have
diverged, inconsistent states for their tags, and the distributed
storage system will resolve the situation by keeping the majority
state as the state of the tag on all the backends, but preserving any
other states by creating new tags, with the original name plus a
suffix. These can then be merged to "heal" the conflict.

Using Ugarit

Installation

Install [http://www.call-with-current-continuation.org/|Chicken Scheme] using their [http://wiki.call-cc.org/man/4/Getting%20started|installation instructions].

Ugarit can then be installed by typing (as root):

 chicken-install ugarit

See the [http://wiki.call-cc.org/manual/Extensions#chicken-install-reference|chicken-install manual] for details if you have any trouble, or wish to install into your home directory.

Setting up a vault

Firstly, you need to know the vault identifier for the place you'll
be storing your vaults. This depends on your backend. The vault
identifier is actually the command line used to invoke the backend for
a particular vault; communication with the vault is via standard
input and output, which is how it's easy to tunnel via ssh.

Local filesystem backends

These backends use the local filesystem to store the vaults. Of
course, the "local filesystem" on a given server might be an NFS mount
or mounted from a storage-area network.

Logfile backend

The logfile backend works much like the original Venti system. It's
append-only - you won't be able to delete old snapshots from a logfile
vault, even when I implement deletion. It stores the vault in two
sets of files; one is a log of data blocks, split at a specified
maximum size, and the other is the metadata: an sqlite database used
to track the location of blocks in the log files, the contents of
tags, and a count of the logs so a filename can be chosen for a new one.

To set up a new logfile vault, just choose where to put the two
parts. It would be nice to put the metadata file on a different
physical disk to the logs directory, to reduce seeking. If you only
have one disk, you can put the metadata file in the log directory
("metadata" is a good name).

You can then refer to it using the following vault identifier:

 "backend-fs splitlog ...log directory... ...metadata file..."

SQLite backend

The sqlite backend works a bit like a
[http://www.fossil-scm.org/|Fossil] repository; the storage is
implemented as a single file, which is actually an SQLite database
containing blocks as blobs, along with tags and configuration data in
their own tables.

It supports unlinking objects, and the use of a single file to store
everything is convenient; but storing everything in a single file with
random access is slightly riskier than the simple structure of an
append-only log file; it is less tolerant of corruption, which can
easily render the entire storage unusable. Also, that one file can get
very large.

SQLite has internal limits on the size of a database, but they're
quite large - you'll probably hit a size limit at about 140
terabytes.

To set up an SQLite storage, just choose a place to put the file. I
usually use an extension of .vault; note that SQLite will
create additional temporary files alongside it with additional
extensions, too.

Then refer to it with the following vault identifier:

 "backend-sqlite ...path to vault file..."

Filesystem backend

The filesystem backend creates vaults by storing each block or tag
in its own file, in a directory. To keep the objects-per-directory
count down, it'll split the files into subdirectories. Because of
this, it uses a stupendous number of inodes (more than the filesystem
being backed up). Only use it if you don't mind that; splitlog is much
more efficient.

To set up a new filesystem-backend vault, just create an empty
directory that Ugarit will have write access to when it runs. It will
probably run as root in order to be able to access the contents of
files that aren't world-readable (although that's up to you), so
unless you access your storage via ssh or sudo to use another user to
run the backend under, be careful of NFS mounts that have
maproot=nobody set!

You can then refer to it using the following vault identifier:

 "backend-fs fs ...path to directory..."


Proxying backends

These backends wrap another vault identifier, to which the actual
storage task is delegated, but add some value along the way.

SSH tunnelling

It's easy to access a vault stored on a remote server. The caveat
is that the backend then needs to be installed on the remote server!
Since vaults are accessed by running the supplied command, and then
talking to them via stdin and stdout, the vault identifier needs
only be:

 "ssh ...hostname... '...remote vault identifier...'"

Cache backend

The cache backend is used to cache a list of what blocks exist in the
proxied backend, so that it can answer queries as to the existence of
a block rapidly, even when the proxied backend is on the end of a
high-latency link (eg, the Internet). This should speed up snapshots,
as existing files are identified by asking the backend if the vault
already has them.

The cache backend works by storing the cache in a local sqlite
file. Given a place for it to store that file, usage is simple:

 "backend-cache ...path to cachefile... '...proxied vault identifier...'"

The cache file will be automatically created if it doesn't already
exist, so make sure there's write access to the containing directory.
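
For instance, a local cache in front of the remote splitlog vault from
the previous example might look like the following sketch (paths and
hostname are illustrative, and it relies on ssh passing the remaining
words through as the remote command, so adjust the quoting to taste):

 "backend-cache /var/ugarit/cache 'ssh ugarit@prometheus backend-fs splitlog /mnt/bigdisk/ugarit-logs /var/ugarit/metadata'"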

 - WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -

If you use a cache on a vault shared between servers, make sure
that you either:

 * Never delete things from the vault

or

 * Make sure all access to the vault is via the same cache

If a block is deleted from a vault, and a cache on that vault is
not aware of the deletion (as it did not go "through" the caching
proxy), then the cache will record that the block exists in the
vault when it does not. This will mean that if a snapshot is made
through the cache that would use that block, then it will be assumed
that the block already exists in the vault when it does
not. Therefore, the block will not be uploaded, and a dangling
reference will result!

Some setups which *are* safe:

 * A single server using a vault via a cache, not sharing it with
   anyone else.

 * A pool of servers using a vault via the same cache.

 * A pool of servers using a vault via one or more caches, and
   maybe some not via the cache, where nothing is ever deleted from
   the vault.

 * A pool of servers using a vault via one cache, and maybe some
   not via the cache, where deletions are only performed on servers
   using the cache, so the cache is always aware.


Writing a ugarit.conf

ugarit.conf should look something like this:

 (storage <storage identifier>)
 (hash tiger "<salt>")
 [(double-check)]
 [(compression [deflate|lzma])]
 [(encryption aes <key>)]
 [(file-cache "<path>")]
 [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192
(tiger), SHA-256 (sha256), SHA-384
(sha384) and SHA-512 (sha512) are supported;
if you omit the line then Tiger will still be used, but it will be a
simple hash of the block with the block type appended, which reveals
to attackers what blocks you have (as the hash is of the unencrypted
block, and the hash is not encrypted). This is useful for development
and testing or for use with trusted vaults, but not advised for use
with vaults that attackers may snoop at. Providing a salt string
produces a hash function that hashes the block, the type of block, and
the salt string, producing hashes that attackers who can snoop the
vault cannot use to find known blocks (see the "Security model"
section below for more details).

I would recommend that you create a salt string from a secure entropy
source, such as:

 dd if=/dev/random bs=1 count=64 | base64 -w 0

Whichever hash function you use, you will need to install the required
Chicken egg with one of the following commands:

 chicken-install -s tiger-hash # for tiger
 chicken-install -s sha2 # for the SHA hashes

double-check, if present, causes Ugarit to perform extra
internal consistency checks during backups, which will detect bugs but
may slow things down.

lzma is the recommended compression option for
low-bandwidth backends or when space is tight, but it's very slow to
compress; deflate or no compression at all are better for fast local
vaults. To have no compression at all, just remove the
(compression ...) line entirely. Likewise, to use
compression, you need to install a Chicken egg:

 chicken-install -s z3 # for deflate
 chicken-install -s lzma # for lzma

WARNING: The lzma egg is currently rather difficult to install, and
needs rewriting to fix this problem.

Likewise, the (encryption ...) line may be omitted to have no
encryption; the only currently supported algorithm is aes (in CBC
mode) with a key given in hex, as a passphrase (hashed to get a key),
or a passphrase read from the terminal on every run. The key may be
16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a
hex key, just supply it as a string, like so:

 (encryption aes "00112233445566778899AABBCCDDEEFF")

...for 128-bit AES,

 (encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")

...for 192-bit AES, or

 (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

Alternatively, you can provide a passphrase, and specify how large a
key you want it turned into, like so:

 (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

I would recommend that you generate a long passphrase from a secure
entropy source, such as:

 dd if=/dev/random bs=1 count=64 | base64 -w 0

Finally, the extra-paranoid can request that Ugarit prompt for a
passphrase on every run and hash it into a key of the specified
length, like so:

 (encryption aes ([16|24|32] prompt))

(note the lack of quotes around prompt, distinguishing it from a passphrase)

Please read the "Security model" section below for details on the
implications of different encryption setups.

Again, as it is an optional feature, to use encryption, you must
install the appropriate Chicken egg:

 chicken-install -s aes

A file cache, if enabled, significantly speeds up subsequent snapshots
of a filesystem tree. The file cache is a file (which Ugarit will
create if it doesn't already exist) mapping filenames to
(mtime,size,hash) tuples; as it scans the filesystem, if it finds a
file in the cache and the mtime and size have not changed, it will
assume it is already stored under the specified hash. This saves it
from having to read the entire file to hash it and then check if the
hash is present in the vault. In other words, if only a few files
have changed since the last snapshot, then snapshotting a directory
tree becomes an O(N) operation, where N is the number of files, rather
than an O(M) operation, where M is the total size of files involved.

For example:

 (storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata'")
 (hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ")
 (encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw"))
 (compression lzma)
 (file-cache "/var/ugarit/cache")

Be careful to put a set of parentheses around each configuration
entry. White space isn't significant, so feel free to indent things
and wrap them over lines if you want.

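If you just want to experiment locally first, a minimal configuration
can be a single storage line; leaving out the hash, compression and
encryption lines gives you unsalted Tiger, no compression and no
encryption, as described above (the path is just an example):

 (storage "backend-sqlite /tmp/test.vault")
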
Keep copies of this file safe - you'll need it to do extractions!
Print a copy out and lock it in your fire safe! Ok, currently, you
might be able to recreate it if you remember where you put the
storage, but encryption keys and hash salts are harder to remember...

Your first backup

Think of a tag to identify the filesystem you're backing up. If it's
/home on the server gandalf, you might call it gandalf-home. If
it's the entire filesystem of the server bilbo, you might just call
it bilbo.

Then from your shell, run (as root):

 # ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>

For example, if we have a ugarit.conf in the current directory:

 # ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the -c flag if you want to store ctimes in the vault;
since it's impossible to restore ctimes when extracting from a
vault, doing this is useful only for informational purposes, so it's
not done by default. Similarly, atimes aren't stored in the vault
unless you specify -a, because otherwise, there will be a lot of
directory blocks uploaded on every snapshot, as the atime of every
file will have been changed by the previous snapshot - so with -a
specified, on every snapshot, every directory in your filesystem will
be uploaded! Ugarit will happily restore atimes if they are found in
a vault; their storage is made optional simply because uploading
them is costly and rarely useful.

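For instance, to snapshot /home on gandalf under the tag suggested
above, recording ctimes but not atimes:

 # ugarit snapshot ugarit.conf -c gandalf-home /home
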
Exploring the vault

Now you have a backup, you can explore the contents of the
vault. This need not be done as root, as long as you can read
ugarit.conf; however, if you want to extract files, run it as root
so the uids and gids can be set.

 $ ugarit explore ugarit.conf

This will put you into an interactive shell exploring a virtual
filesystem. The root directory contains an entry for every tag; if you
type ls you should see your tag listed, and within that
tag, you'll find a list of snapshots, in descending date order, with a
special entry current for the most recent
snapshot. Within a snapshot, you'll find the root directory of your
snapshot under contents, and the details of the snapshot itself in
properties.sexpr, and will be able to cd into
subdirectories, and so on:

 > ls
 localhost-etc/
 > cd localhost-etc
 /localhost-etc> ls
 current/
 2015-06-12 22:49:34/
 2015-06-12 22:49:25/
 /localhost-etc> cd current
 /localhost-etc/current> ls
 log.sexpr
 properties.sexpr
 contents/
 /localhost-etc/current> cat properties.sexpr
 ((previous . "a140e6dbe0a7a38f8b8c381323997c23e51a39e2593afb61")
  (mtime . 1434102574.0)
  (contents . "34eccf1f5141187e4209cfa354fdea749a0c3c1c4682ec86")
  (stats (blocks-stored . 12)
         (bytes-stored . 16889)
         (blocks-skipped . 50)
         (bytes-skipped . 6567341)
         (file-cache-hits . 0)
         (file-cache-bytes . 0))
  (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
  (hostname . "ahe")
  (source-path . "/etc")
  (notes)
  (files . 112)
  (size . 6563588))
 /localhost-etc/current> cd contents
 /localhost-etc/current/contents> ls
 zoneinfo
 vconsole.conf
 udev/
 tmpfiles.d/
 systemd/
 sysctl.d/
 sudoers.tmp~
 sudoers
 subuid
 subgid
 static
 ssl/
 ssh/
 shells
 shadow-
 shadow
 services
 samba/
 rpc
 resolvconf.conf
 resolv.conf
 -- Press q then enter to stop or enter for more...
 q
 /localhost-etc/current/contents> ls -ll resolv.conf
 -rw-r--r-- 0 0 [2015-05-23 23:22:41] 78B/-: resolv.conf
  key: #f
  contents: "e33ea1394cd2a67fe6caab9af99f66a4a1cc50e8929d3550"
  size: 78
  ctime: 1432419761.0

As well as exploring around, you can also extract files or directories
(or entire snapshots) by using the get command. Ugarit
will do its best to restore the metadata of files, subject to the
rights of the user you run it as.

Type help to get help in the interactive shell.

The interactive shell supports command-line editing, history and tab
completion for your convenience.

Extracting things directly

As well as using the interactive explore mode, it is also possible to
directly extract something from the vault, given a path.

Given the sample vault from the previous example, it would be possible
to extract the resolv.conf file with the following
command:

 ugarit extract ugarit.conf /localhost-etc/current/contents/resolv.conf


Forking tags

As mentioned above, you can fork a tag, creating two tags that
refer to the same snapshot and its history but that can then have
their own subsequent history of snapshots applied to each
independently, with the following command:

 $ ugarit fork <ugarit.conf> <existing tag> <new tag>
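
For example, to keep a handle on the current state of the
localhost-etc backup before a risky upgrade (the new tag name is just
an example):

 $ ugarit fork ugarit.conf localhost-etc localhost-etc-pre-upgrade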

Merging tags

And you can also merge two or more tags into one. It's possible to
merge a bunch of tags to make an entirely new tag, or you can merge a
tag into an existing tag, by having the "output" tag also be one of
the "input" tags.

The command to do this is:

 $ ugarit merge <ugarit.conf> <output tag> <input tag> ...
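
For instance, merging two per-person music archives into a new
combined tag (all tag names are just examples):

 $ ugarit merge ugarit.conf all-music alice-music bob-music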