Ugarit
2014-11-02
01:49 Closed ticket [dae5e21ffc]: Archival mode plus 3 other changes artifact: 30a907ac5b user: alaric
2014-10-29
14:59
Tidied up error handling a bit for [4363bc7631], and added checking for duplication in imports for [dae5e21ffc] check-in: 19aee81ea4 user: alaric tags: alaricsp
2014-10-26
09:07
Archive mode usability FIXMEs [dae5e21ffc] check-in: 0d079797f2 user: alaric tags: alaricsp
08:33
Archive searching now returns the associated import objects [dae5e21ffc] check-in: 6d7c0ddce2 user: alaric tags: alaricsp
2014-10-25
22:04
Archive mode [dae5e21ffc]: support for ($import <prop>) searches, and unit tests for advanced archive search expressions. check-in: 7152d37509 user: alaric tags: alaricsp
19:19
Archival mode command-line interface to search an archive tag for matching objects, available properties, and available values; and to extract or stream files from a vault. [dae5e21ffc] check-in: 80b324f3af user: alaric tags: alaricsp
13:54
Basic command-line interface to vault searching for archive mode [dae5e21ffc]. check-in: 40472e20fd user: alaric tags: alaricsp
2014-10-12
22:51
[dae5e21ffc] Tidied up archive entry property handling, import/export, etc. check-in: 367047cc7f user: alaric tags: alaricsp
22:01
Added a MIME type database, and improvements on extension guessing for archival mode [dae5e21ffc]. check-in: 17289c8f53 user: alaric tags: alaricsp
17:47
[dae5e21ffc] Archival mode VFS support, allowing browsing of the audit trail and extraction of individual files thereof, and viewing of log entries (from [68bb75789f]) in the VFS as per [4e3673e0fe]. Nabbed an old-style vault as the start of [068790c20c] check-in: 2944fc5c08 user: alaric tags: alaricsp
2014-10-11
23:18
Partial progress on VFS exploration of archive tags for [dae5e21ffc] check-in: b318b20412 user: alaric tags: alaricsp
15:58
[dae5e21ffc] archival mode - import CLI interface check-in: a575a3184c user: alaric tags: alaricsp
2014-10-05
21:59
[dae5e21ffc] In-progress work on providing references to the archive import objects from archive entries returned by search-archive. check-in: f8cf5a43f5 user: alaric tags: alaricsp
2014-10-04
15:25
Minor improvements to new archive-mode utilities from last commit, but mainly unit tests for them. Work on ticket [dae5e21ffc] check-in: 7cd3e21f7a user: alaric tags: alaricsp
2014-10-01
11:55
Initial implementation of merge-archive-tags!, list-archive-properties and list-archive-property-values for [dae5e21ffc] check-in: 2de06dd81c user: alaric tags: alaricsp
2014-09-28
00:13
Tests for archival mode [dae5e21ffc] and fixes thereto check-in: d140873f0f user: alaric tags: alaricsp
2014-09-27
11:00 Claimed ticket [dae5e21ffc]: Archival mode artifact: 9fca86ad79 user: alaric
10:59 New ticket [a987e28fef] Snapshots and Archives as Trees. artifact: f53d02fe1f user: alaric
2014-09-24
15:50
Logging into an sexpr-stream [68bb75789f]

Storing of job log+stats into archive import blocks for archival mode [dae5e21ffc]

Refactored .setup file to avoid duplication of junk

Preparations to refactor ugarit-core [5fa161239c]

(As yet untested) check-in: df5776d4ca user: alaric tags: alaricsp

2014-09-23
22:15
Made test.scm a clearer demo of [dae5e21ffc] check-in: 24aa36028b user: alaric tags: alaricsp
22:10
Basic archive searching now works! [dae5e21ffc] check-in: 77767b2998 user: alaric tags: alaricsp
2014-09-21
20:33
Implemented typed tags [08bf026f5a] and listing tag types in the VFS [30054df0b6] and building/maintaining a cache of archive metadata in the file cache [dae5e21ffc]. check-in: 2d2532f32b user: alaric tags: alaricsp
14:22
Starting to add framework for archival mode! [dae5e21ffc] check-in: f6b3c6a00a user: alaric tags: alaricsp
2013-01-04
12:31 New ticket [b5911d321a] Better local caching in the front-end. artifact: 497525901b user: alaric
2012-05-04
10:35 Ticket [dae5e21ffc] Archival mode status still Open with 1 other change artifact: ab1ddeba6f user: alaric
2012-04-16
13:20 Ticket [dae5e21ffc]: 1 change artifact: 48f90cc1c8 user: alaric
2012-04-13
10:26 Ticket [dae5e21ffc]: 2 changes artifact: 30f8826959 user: alaric
2012-04-12
14:14 Ticket [dae5e21ffc]: 1 change artifact: 7a7230e687 user: alaric
2012-04-08
14:26 Ticket [dae5e21ffc]: 1 change artifact: 07ebcc2875 user: alaric
14:24 Ticket [dae5e21ffc]: 2 changes artifact: e9a5be1f9a user: alaric
14:24 New ticket [dae5e21ffc]. artifact: 9e6f9fa0af user: alaric

Ticket Hash: dae5e21ffcd3d517e021d8b855fb86ff7d9a271a
Title: Archival mode
Status: Closed Type: Feature_Request
Severity: UNSPECIFIED Priority: 2_Medium
Subsystem: Archival_Frontend Resolution: Open
Last Modified: 2014-11-02 01:49:53
Version Found In:
Description:
So far, I have implemented Ugarit's backup facility through its "snapshot" mode, but Ugarit is meant to be a backup *and archival* system.

Whereas snapshot mode takes a filesystem tree and adds it to a chain of snapshots of the same tree rooted at a tag, archival mode takes a filesystem tree and inserts it into a differently structured thing called a library, also rooted at a tag.

A library is implemented as a chain of snapshot-like blocks, each of which refers to the previous library in the chain, carries a small amount of metadata, and points to a contents block. Unlike a snapshot, however, the contents block is an s-expression stream of metadata entries. Each metadata entry consists of a hash (pointing to the root block of the archived filesystem tree, which will often be a raw file rather than a directory) followed by an alist mapping metadata keys to values.
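To make the layout concrete, here is a minimal Python sketch of a library block and its contents stream. The class names, field names and the text rendering are hypothetical; Ugarit itself is written in Scheme and stores these blocks in the vault using its own encoding, which is not reproduced here.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# An alist: an ordered list of (key, value) pairs; keys may repeat.
Alist = List[Tuple[str, object]]

@dataclass
class MetadataEntry:
    tree_hash: str   # root block of the archived tree (often a raw file)
    metadata: Alist  # metadata keys mapped to values

@dataclass
class LibraryBlock:
    previous: Optional[str]  # hash of the previous library in the chain, if any
    metadata: Alist          # small amount of library-level metadata
    contents: List[MetadataEntry] = field(default_factory=list)

def entry_to_sexpr(entry: MetadataEntry) -> str:
    # json.dumps yields double-quoted, escaped strings and bare numbers,
    # which is close enough to s-expression atoms for this sketch.
    pairs = " ".join(f"({key} . {json.dumps(value)})" for key, value in entry.metadata)
    return f"({json.dumps(entry.tree_hash)} {pairs})"

def contents_stream(library: LibraryBlock) -> str:
    # One s-expression per metadata entry, one per line.
    return "\n".join(entry_to_sexpr(e) for e in library.contents)
```

For example, `entry_to_sexpr(MetadataEntry("ab12", [("type", "music"), ("track", 3)]))` renders as `("ab12" (type . "music") (track . 3))`.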

The metadata for a given archived filesystem tree may be superseded by later libraries in the chain, in which case the earlier metadata is ignored.
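The superseding rule can be sketched as a walk down the chain, newest library first, keeping only the first entry seen for each tree. This assumes each library's contents have already been decoded into (tree hash, alist) pairs, that the chain is supplied in the order reached by following the previous-library pointers from the tag, and that a tree appears at most once within a single library; the function name is hypothetical.

```python
def effective_metadata(chain):
    """chain: libraries ordered newest first; each library is a list of
    (tree_hash, alist) pairs. Returns {tree_hash: alist} holding only the
    newest metadata for each tree."""
    result = {}
    for library in chain:
        for tree_hash, alist in library:
            # A newer library has already supplied metadata for this
            # tree, so this older entry is ignored.
            if tree_hash not in result:
                result[tree_hash] = alist
    return result
```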

The library metadata should be cached by the front-end in an SQLite database, keyed on the tag name. The cache also stores the hash of the latest library it has processed, so that whenever the archive is opened, that hash can be compared with the current state of the library tag; the chain is then followed back (processing updates along the way) until the previously cached point is found, so that only the latest changes are imported. The metadata of a given filesystem tree in the library is the metadata attached to it by its library entry, plus any metadata attached to the top-level library block itself, which is inherited by every entry created in that library.
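A minimal sketch of that incremental cache update, using Python's `sqlite3` module. The table layout, the function names and the in-memory `vault` dictionary (mapping a library hash to its previous pointer, library metadata and decoded contents) are all assumptions for illustration, not Ugarit's actual cache schema. It also assumes an entry's own keys take precedence over inherited library-level keys.

```python
import sqlite3

def open_cache(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS heads (tag TEXT PRIMARY KEY, head TEXT);
        CREATE TABLE IF NOT EXISTS props (tag TEXT, tree TEXT, key TEXT, value TEXT);
    """)
    return db

def sync_tag(db, vault, tag, current_head):
    """Bring the cached metadata for `tag` up to date; returns the number
    of library blocks that had to be processed."""
    row = db.execute("SELECT head FROM heads WHERE tag = ?", (tag,)).fetchone()
    cached_head = row[0] if row else None

    # Follow the chain back from the current head until the cached point.
    pending = []
    h = current_head
    while h is not None and h != cached_head:
        pending.append(vault[h])
        h = vault[h]["previous"]
    if h != cached_head:
        # The cached head is not in the chain (e.g. the tag was rewritten):
        # drop this tag's cached rows and rebuild from the whole chain.
        db.execute("DELETE FROM props WHERE tag = ?", (tag,))

    # Apply oldest first, so newer libraries supersede older entries.
    for block in reversed(pending):
        for tree, alist in block["contents"]:
            own_keys = {k for k, _ in alist}
            # Library-level metadata is inherited unless the entry sets that key.
            inherited = [(k, v) for k, v in block["metadata"] if k not in own_keys]
            db.execute("DELETE FROM props WHERE tag = ? AND tree = ?", (tag, tree))
            db.executemany(
                "INSERT INTO props VALUES (?, ?, ?, ?)",
                [(tag, tree, k, str(v)) for k, v in list(alist) + inherited],
            )
    db.execute("INSERT OR REPLACE INTO heads VALUES (?, ?)", (tag, current_head))
    db.commit()
    return len(pending)
```

Reopening the archive with an unchanged tag then processes no blocks at all, which is the point of remembering the last library hash.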

When the explore command finds a library tag, its default virtual filesystem can present the library chain just like a snapshot chain; the virtual filesystem provided by 9P/NFS/WebDAV/FUSE mode, however, could be configured to provide multiple views of the archive.

One view that comes to mind is defined by a list of metadata keys. The virtual filesystem then has one directory level per metadata key, and every filesystem tree that matches a global filter restriction is found under the directories named by its values for those keys. For example, setting a global restriction of type=music and giving the directory keys as artist, album and title yields a nice music browser. Further extensions might be to generalise the syntax for directory keys from single symbols that select a metadata key to constructs like (track-number "-" title) that generate compound strings at each level, and to configure what to do with filesystem trees that lack the metadata key in question (the options being to ignore that filesystem tree, or to provide a default value such as "Unknown").
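The key-per-level view can be sketched as grouping trees into nested directories. In this hypothetical Python sketch, metadata is simplified to a dict per tree, a level spec is either a key name or a tuple mixing key names with `Lit` literals (standing in for the symbols and strings of `(track-number "-" title)`), and `default=None` means "ignore trees lacking a key". All names here are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    text: str  # literal text inside a compound directory key

def level_name(spec, meta, default):
    """Directory name for one level, or None if the tree should be skipped."""
    parts = spec if isinstance(spec, tuple) else (spec,)
    out = []
    for part in parts:
        if isinstance(part, Lit):
            out.append(part.text)
        elif part in meta:
            out.append(str(meta[part]))
        elif default is None:
            return None          # "ignore" policy for missing keys
        else:
            out.append(default)  # e.g. "Unknown"
    return "".join(out)

def build_view(trees, restriction, levels, default="Unknown"):
    """trees: {tree_hash: {key: value}}; levels must be non-empty.
    Returns nested dicts; the last level maps names to lists of tree hashes."""
    root = {}
    for tree_hash, meta in trees.items():
        # Global filter restriction, e.g. {"type": "music"}.
        if any(meta.get(k) != v for k, v in restriction.items()):
            continue
        names = [level_name(spec, meta, default) for spec in levels]
        if None in names:
            continue
        node = root
        for name in names[:-1]:
            node = node.setdefault(name, {})
        node.setdefault(names[-1], []).append(tree_hash)
    return root
```

With a restriction of `{"type": "music"}` and levels `["artist", "album", "title"]`, a track with no album lands under an `Unknown` directory, or is omitted entirely when `default=None`.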


alaric added on 2012-04-16 13:20:06 UTC:
It would be useful to record the exact absolute path and hostname a file tree came from when it goes into an archive, as that can be useful metadata in figuring out what it is later.

Now, when we import a file into the archive by snapshotting it and then introducing a metadata record about it into an archive delta, we must check whether the file already exists in that archive, so as not to overwrite previous rich metadata with naff initial auto-generated metadata. However, it might be nice to read in the previous metadata and append a new "archived from" entry specifying the hostname, location, and time. As the metadata is an alist, this will be easy as long as "archived from" is a single property, so that the hostname/location/time triple is tied together as a single item.
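A minimal sketch of that import rule, assuming the archive's current metadata is available as a dict from tree hash to alist. The function name and the shape of the archived-from value (a single tuple of time, hostname and location, so the triple stays one alist item) are illustrative assumptions.

```python
def import_metadata(existing, tree_hash, auto_metadata, hostname, location, when):
    """Return the metadata alist to record for an imported tree.

    existing: {tree_hash: alist} for trees already in the archive.
    If the tree is already archived, keep its (possibly hand-curated)
    metadata and append a new archived-from item; otherwise start from
    the auto-generated metadata. `existing` is not modified."""
    archived_from = ("archived-from", (when, hostname, location))
    base = existing.get(tree_hash, auto_metadata)
    return list(base) + [archived_from]
```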


alaric added on 2012-05-04 10:35:07 UTC:
I should also like to add that the "location archived from" should be represented (for convenience) as four components: hostname, absolute directory path, filename, and extension.

So the metadata alist of a file might include one or more entries such as:

(archived-from "2011-04-32 22:45:01" "anger" "/home/alaric/projects/foo" "backup" "tar.gz")

One would hope that the extension would be the same (modulo case?) for all the archives, but we can never be sure :-)
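Building such a record from a path is mostly string handling, except for the extension: in the example above, `backup.tar.gz` splits into `backup` and `tar.gz`, so splitting at the last dot is not enough. This Python sketch assumes a small, hypothetical table of compound extensions and otherwise splits at the last dot; it is not how Ugarit's own extension guessing works.

```python
import posixpath

# Hypothetical compound extensions treated as a single extension.
COMPOUND_EXTENSIONS = ("tar.gz", "tar.bz2", "tar.xz")

def split_extension(filename):
    lower = filename.lower()  # compare case-insensitively, keep original case
    for ext in COMPOUND_EXTENSIONS:
        if lower.endswith("." + ext) and len(filename) > len(ext) + 1:
            return filename[: -len(ext) - 1], filename[-len(ext):]
    stem, dot, ext = filename.rpartition(".")
    if not dot or not stem:  # no extension, or a dotfile such as ".profile"
        return filename, ""
    return stem, ext

def archived_from(timestamp, hostname, path):
    """(archived-from TIME HOSTNAME DIRECTORY FILENAME EXTENSION) as a tuple."""
    directory, filename = posixpath.split(posixpath.abspath(path))
    name, ext = split_extension(filename)
    return ("archived-from", timestamp, hostname, directory, name, ext)
```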

When building search queries, it would therefore be nice to be able to say (second archived-from) to extract the second field from archived-from, or maybe even to define a table of aliases so we can say archived-from-hostname.
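Both ideas can be sketched in Python as follows, assuming each archived-from value is stored as a tuple of its fields (time, hostname, directory, filename, extension) and that the Scheme form `(second archived-from)` is represented as the tuple `("second", "archived-from")`. The field names and alias table are hypothetical. Results are lists because a tree may carry several archived-from entries.

```python
ARCHIVED_FROM_FIELDS = ("time", "hostname", "directory", "filename", "extension")
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

# Hypothetical alias table: archived-from-hostname -> ("archived-from", 1), etc.
ALIASES = {
    f"archived-from-{name}": ("archived-from", index)
    for index, name in enumerate(ARCHIVED_FROM_FIELDS)
}

def property_values(alist, key):
    return [value for k, value in alist if k == key]

def evaluate(expr, alist):
    """expr: a property name, an alias, or (ordinal, property)."""
    if isinstance(expr, tuple):
        ordinal, key = expr
        return [fields[ORDINALS[ordinal]] for fields in property_values(alist, key)]
    if expr in ALIASES:
        key, index = ALIASES[expr]
        return [fields[index] for fields in property_values(alist, key)]
    return property_values(alist, expr)
```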

User Comments:
alaric added on 2014-11-02 01:49:53:

The basics are now there. From the command line, you can import a manifest of objects, search for objects matching a query string, list available properties of objects matching a query string, list available values of a property for objects matching a query string, stream a chosen object to stdout (if it's a file), or extract a chosen object to the filesystem.
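The search and listing operations can be sketched over an in-memory set of entries. Here a plain Python predicate stands in for the query string, whose syntax is not described in this ticket; the names loosely echo `list-archive-properties` and `list-archive-property-values` from the timeline, but the real procedures are Scheme and their signatures are not shown here, so these are hypothetical.

```python
from collections import Counter

def search(entries, predicate):
    """entries: {tree_hash: alist}; keep those whose alist satisfies predicate."""
    return {h: meta for h, meta in entries.items() if predicate(meta)}

def list_properties(entries, predicate):
    """(property, number of matching objects that have it), sorted by name."""
    counts = Counter()
    for meta in search(entries, predicate).values():
        counts.update({key for key, _ in meta})
    return sorted(counts.items(), key=lambda kv: str(kv[0]))

def list_property_values(entries, predicate, key):
    """(value, number of matching objects with that value) for one property."""
    counts = Counter()
    for meta in search(entries, predicate).values():
        counts.update({value for k, value in meta if k == key})
    return sorted(counts.items(), key=lambda kv: str(kv[0]))
```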

Next steps are [9c3ac71f94] for a generic property-based explorer in the VFS, [fff691ada2] for customised views, and [33fd928177] for a manifest generator.

Outside archival mode, but tremendously enhancing its utility, are a 9P/FUSE/puffs client to allow mounting a vault as a read-only filesystem, and replicated storage [f1f2ce8cdc]. With archival mode, the vault starts to become the primary storage for data rather than just a backup, so internal vault replication for resilience becomes all the more important.

Future work of note includes an archive tagging GUI [7b6588068f], a public gallery viewer for images [5b07f64457], and support for storing emails in an archive [ea1b7f9ad7].