MERCURY is the mechanism by which entities communicate with each other. An entity provides a number of "interfaces" via MERCURY, and other entities may access them.
By way of introduction and rationale, some blog posts discuss the thinking behind the design decisions.
MERCURY handles the communication of such requests across a network. There are really three cases. When the target entity is in the same cluster as the originating entity, the request might be handled on the local machine, in which case no network need be involved at all. If not, it can be handled elsewhere within the same cluster using WOLFRAM's communications infrastructure. And if the request is for an entity in another cluster, it will have to use the dedicated inter-cluster MERCURY protocol, which runs over IRIDIUM on a different UDP port from WOLFRAM, for ease of firewalling.
However, the entities themselves need not worry about the distinctions; it's all abstracted away beneath the MERCURY model.
Every entity is identified by an entity ID (EID), which is a binary object. It's not very meaningful to humans - CARBON provides a naming system so that humans need not be exposed to raw EIDs.
The full structure of an EID is:
- EID format version number
- Volume master public key, or just a hash thereof.
- Volume ID version number
- Volume public key, or just a hash thereof.
- Optional signature of the volume public key, against the volume master public key.
- List of IPv4 and IPv6 addresses of inter-volume MERCURY server nodes, organised by priority and weighting in the manner of DNS SRV records
- Volume-local part of entity ID (variable-length string)
- Optional persona field (any IRON object, but generally a signed packet containing a list of a symbol and an arbitrary value)
- Optional list of interfaces (giving their symbolic names and their local numbers) and endpoints for this entity, with endpoints tagged with transit security level numbers, hashcash costs, and any other message requirements for each. This is not an exhaustive list of available endpoints, merely a hint for initial transit security requirements to save making a request that gets rejected due to insufficient extras.
- Optional list of accepted encryption algorithms, each with a transit security clearance. This is compulsory if the previous optional section appears, so that the classifications (which are opaque numbers) can be mapped to available encryption algorithms.
- EID signature (hash of the entire EID apart from this bit, against the volume's public key)
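As a rough illustration, the structure above might be modelled like this (a Python sketch; all field names and types are assumptions for readability, not the actual IRON wire encoding):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ServerAddress:
    ip: str            # IPv4 or IPv6 address of an inter-volume MERCURY node
    priority: int      # lower value = try first, as with DNS SRV records
    weight: int        # relative weighting among nodes of equal priority

@dataclass
class EID:
    version: int                       # EID format version number
    master_key_or_hash: bytes          # volume master public key, or a hash of it
    volume_id_version: int             # incremented when the volume key changes
    volume_key_or_hash: bytes          # volume public key, or a hash of it
    volume_key_signature: Optional[bytes]  # sig of volume key by master key
    servers: list[ServerAddress] = field(default_factory=list)
    local_part: str = ""               # volume-local part of the entity ID
    persona: Optional[object] = None   # any IRON object
    interface_hints: Optional[dict] = None  # interface/endpoint requirement hints
    algorithms: Optional[dict] = None  # encryption algorithm -> transit clearance
    signature: bytes = b""             # hash of everything above, signed
```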
Clearly, this is not a small data structure, nor particularly human-readable.
The validity of an EID can be confirmed by checking the EID signature; that checks for its internal validity, and that it was produced by the volume with that public key. However, a volume is really identified by its master public key hash rather than the volume public key, so even that does not confirm the key has not been tampered with; if the signature of the volume key is present, we can check that against the master key. In the worst case the EID might only contain the hashes of the volume key and the volume master key rather than the full keys, and no signature of the public key, in which case the EID signature can't be checked at all!
So what do we do? Well, every MERCURY server node in the volume can be asked to provide the full content of both keys and the signature of the public key against the master key. So given an EID, one can find a node to provide this extra data, which can then be cached in the calling cluster so we don't need to request it again.
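That fetch-and-cache flow might look like the following sketch (Python; the `fetch_from_node` callback, the cache layout, and the use of SHA-256 are all assumptions for illustration):

```python
import hashlib

key_cache = {}   # master-key hash -> (master_key, volume_key, signature)

def complete_eid_keys(master_hash, volume_hash, fetch_from_node):
    """Given only key hashes from an EID, obtain the full keys.

    fetch_from_node is assumed to ask any MERCURY server node in the
    volume for (master_key, volume_key, signature_of_volume_key).
    """
    if master_hash in key_cache:
        return key_cache[master_hash]          # already cached in this cluster
    master_key, volume_key, sig = fetch_from_node()
    # Check the returned keys actually match the hashes in the EID
    if hashlib.sha256(master_key).digest() != master_hash:
        raise ValueError("master key does not match hash in EID")
    if hashlib.sha256(volume_key).digest() != volume_hash:
        raise ValueError("volume key does not match hash in EID")
    key_cache[master_hash] = (master_key, volume_key, sig)
    return master_key, volume_key, sig
```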
Entity IDs are generated when an entity asks for its own EID. The entity can request whether the EID should contain full keys or just hashes, and whether to include optional parts; and if it does not specify either way, cluster-wide configuration defaults will be used. Whether it's worth including them in the EID or not is really a tuning parameter, based on how likely it is that the EID will need to be verified.
Volume ID changes
A volume is allowed to generate a new public key whenever it wants. When it does so, it must increment the volume ID version number, and then start issuing the new key with EIDs minted thereafter. Clients who have a stored EID include the version number with network requests when they contact that entity, and a new copy of the volume ID will be bundled with the response if it's outdated. When entity IDs are stored in TUNGSTEN, the volume ID is stripped off and stored only once per node, so the volume ID can be quickly updated. This mechanism is also used if the list of addresses of MERCURY server nodes changes.
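The version-check half of that exchange is simple; a sketch (Python; names are assumed):

```python
def check_volume_version(client_version, current_version, current_volume_id):
    """Return an updated volume ID to bundle with the response, or None.

    Clients send the volume ID version number they hold with each
    request; if it is outdated, a fresh copy of the volume ID travels
    back with the reply for the client to store.
    """
    if client_version < current_version:
        return current_volume_id   # client should replace its stored copy
    return None
```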
When MERCURY needs to make contact with an entity, it will use the published IP addresses. However, if the node it tries to contact does not respond, or responds with an error code indicating a node-local problem, it can try another in the specified priority order and with the specified weightings.
However, the client may choose to adjust the priorities and weightings if it realises a particular node is particularly close in network terms, perhaps judged by the number of prefix bits the remote node's address shares with the local node's address.
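The SRV-style selection described above can be sketched as follows (Python; the `(priority, weight, address)` tuple layout is an assumption):

```python
import random

def order_nodes(servers):
    """Order (priority, weight, address) tuples for contact attempts:
    ascending priority, then weighted-random within each priority band,
    in the manner of DNS SRV record selection."""
    out = []
    for prio in sorted({p for p, w, a in servers}):
        band = [(w, a) for p, w, a in servers if p == prio]
        while band:
            pick = random.choices(band, weights=[w or 1 for w, a in band])[0]
            band.remove(pick)
            out.append(pick[1])
    return out
```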
The node might reply with an error including a more recent volume ID. If so, then the volume ID needs to be updated, and the search started again with the new volume ID's address list.
Also, the node might reply with a specific redirect to another IP address, which should then be tried. If that fails as well, then continue with the priority-based order; if the redirect included a newer volume ID, then you have to start again from the beginning.
This could allow an implementation to publish only the IPs of particularly "stable" nodes in the EIDs, while having a larger group of less-stable nodes still able to service requests for that volume. Depending on the desired architecture, the published nodes may proxy requests onward to processing nodes internally, or just reply to clients with redirects.
Interfaces and Endpoints
Each entity provides a number of MERCURY interfaces; each of these is identified by an IRON symbol, which is a name in the CARBON naming hierarchy. However, for purposes of compactness on the wire, each interface is mapped to a small integer, within the scope of a particular entity.
The mapping from interface symbols to numbers is available in the CARBON metadata exposed by an entity (and as CARBON itself runs over MERCURY, this is bootstrapped with the hardcoded knowledge that CARBON is available on interface zero), and may be bundled inside the EID to save the client making a round trip to get it. However, the list inside the EID might be incomplete, and clients not finding the desired interface in the EID can consult the entity via CARBON to try to find it.
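The lookup order (EID hint first, CARBON round trip as fallback) might be sketched like so (Python; `query_carbon` is an assumed stand-in for the real round trip over interface zero):

```python
CARBON_INTERFACE = 0   # hardcoded bootstrap: CARBON lives on interface zero

def resolve_interface(symbol, eid_hints, query_carbon):
    """Map an interface symbol to its entity-local number.

    eid_hints is the (possibly incomplete) mapping bundled in the EID;
    query_carbon is assumed to perform a round trip to the entity's
    CARBON metadata, which is always available on interface zero.
    """
    if eid_hints and symbol in eid_hints:
        return eid_hints[symbol]       # saved a round trip
    return query_carbon(symbol)        # consult the entity's CARBON metadata
```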
The symbolic name of an interface, if resolved in CARBON, leads to a protocol specification that identifies what endpoints are available in the interface, and tags each with a numeric identifier for use on the wire.
The types of endpoint are basically the same as IRIDIUM request types; they might be message endpoints (which are asynchronous message sinks), request endpoints (which produce a reply), or connection establishment endpoints (which look like a request but, if successful, return a connection handle).
The protocol specification gives a type declaration for the messages and responses for each endpoint; this means that the IRON binary encoding used on the wire can elide outer type tags where applicable, as the type is statically known at both ends.
Connections deserve a little more detailed explanation. When you open a connection, both the client (which originates the request) and the server (which receives it) provide interfaces within the context of that connection. In other words, when you request a connection, you need to provide your own mapping of MERCURY interface identifiers to lists of LITHIUM handlers in your entity that will be made available for the server to invoke as callbacks. At the server end, to accept a request, the server must provide a mapping, too. These are like the interfaces published by entities, except that they are within the scope of the connection. The connection is assigned a unique identity on both the client and server, which is provided whenever the handlers are invoked, so the entities at each end can associate "connection state" with the connection identifier if they need to.
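A sketch of that handshake (Python; the shapes of the interface maps and the accept callback are assumptions, and real connection identities would not be a simple counter):

```python
import itertools

_conn_ids = itertools.count(1)   # illustrative only; not a real identity scheme

class Connection:
    """Both ends of a MERCURY connection carry their own callback
    interface map and a locally unique connection identifier."""
    def __init__(self, client_interfaces, server_interfaces):
        self.client_id = next(_conn_ids)   # identity at the client end
        self.server_id = next(_conn_ids)   # identity at the server end
        self.client_interfaces = client_interfaces  # callbacks server may invoke
        self.server_interfaces = server_interfaces  # endpoints client may invoke

def open_connection(client_interfaces, accept):
    """accept is assumed to be the server's decision function: it
    returns the server's interface map, or None to reject."""
    server_interfaces = accept(client_interfaces)
    if server_interfaces is None:
        raise ConnectionRefusedError("server rejected the connection")
    return Connection(client_interfaces, server_interfaces)
```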
Connections exist to give a stateful context for communications, and also as a vehicle to identify groups of MERCURY traffic so they can have bandwidth reservations applied, if necessary.
An important thing to realise about connections is that they exist between entities, not between nodes. A MERCURY connection might be created from one node to another, and this will generally cause an IRIDIUM virtual circuit to be created to handle the connection; but if either node goes down and the entities at each end of the connection try to use it, then it will be re-established to a different node. The IRIDIUM VCs that support MERCURY connections are transient, and re-creatable on demand.
When protocols change, a new interface symbol must be chosen for the new version.
Entities may only declare positive interface numbers; all negative interface numbers, and interface zero, are reserved for special purposes.
The CARBON interface is exported as interface zero by every entity, and is described in more detail in the CARBON page.
Interface -1 is the entity administrative interface, as described in the Administrator's View page.
Interface -2 is the volume administrative interface, also described in the Administrator's View page, but only present on volume entities.
If a message is sent to an EID that contains a persona field, then the persona field is included with the request.
This allows an entity to publish lots of versions of its EID, with different persona fields, and be able to tell which one was used when requests come in.
There are two main security concerns in MERCURY: Access control to entity endpoints, and transit security of the messages in flight from snoopers or tamperers.
For an entity to access an operation on an endpoint on another entity, the request must satisfy both mandatory (configured by the cluster security administrator) and discretionary (configured by the entity itself) access control, as well as anti-spam protection.
The accessing entity
Access control decisions are based around which entity is trying to do the accessing.
For requests between entities within the same cluster, we can trust MERCURY on the originating node to include this information, and we can trust WOLFRAM to protect it from tampering.
For requests from other clusters, we need to cryptographically validate the source EID included in the request, and reject the request if it is not correctly signed by the volume public key.
That gives us the originating entity ID. The important parts of it from an access control perspective are the volume master public key hash (which uniquely identifies the volume), the volume-local entity ID (which identifies the entity within the volume), and the persona field (which identifies the persona the entity is adopting, if it has several). Everything else is unrelated to entity identity and can be ignored.
However, a request may also bear a certificate, stating that the originating entity ID is allowed to act on behalf of another entity ID until a specified absolute timestamp, signed by the volume key of the other entity ID. If the originating entity ID matches the originating entity ID in the certificate, and the timestamp has not passed, then we can consider the request as having come from the other entity ID. However, we should still log the "real" originating entity ID as part of the audit trail, along with the other one!
In principle, a chain of such certificates could be presented with the request, all the way from the originating entity ID to a final other entity ID.
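Walking such a chain might look like this sketch (Python; the certificate fields and the `verify_sig` callback are assumed stand-ins for real signature checking against each principal's volume key):

```python
import time

def effective_identity(origin, certs, verify_sig, now=None):
    """Walk a chain of delegation certificates from the originating
    entity ID to a final identity.

    Each cert is assumed to be a dict with 'holder', 'principal',
    'expires' (an absolute timestamp), and 'signature'; verify_sig is
    assumed to check the signature against the principal's volume key.
    """
    now = time.time() if now is None else now
    current = origin
    for cert in certs:
        if cert["holder"] != current:
            raise ValueError("certificate holder does not match chain")
        if cert["expires"] <= now:
            raise ValueError("certificate has expired")
        if not verify_sig(cert):
            raise ValueError("bad certificate signature")
        current = cert["principal"]   # now acting as this identity
    return current  # still log `origin` too, for the audit trail
```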
A request may also be anonymous, with no information identifying the caller within.
And finally, a request may be identified by a pseudonym rather than an entity - an arbitrary keypair. In that case, the public key is included in the request, and the request is signed with the corresponding private key.
Discretionary access control
Each entity also has an access control list attached to each endpoint in each interface.
The access control list is simply a list of entities that may access it, or alternatively, a blacklist of entities that may not access it.
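A minimal sketch of that check (Python; the `('allow'|'deny', set)` representation is an assumption):

```python
def acl_allows(acl, entity):
    """acl is assumed to be ('allow', {entities}) for a whitelist or
    ('deny', {entities}) for a blacklist."""
    mode, entities = acl
    if mode == "allow":
        return entity in entities      # whitelist: only listed entities
    return entity not in entities      # blacklist: everyone except listed
```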
Spamming is sending lots of unwanted requests via MERCURY, either to try and get attention, to hog a public service, or to try and deny service to others.
As such, alongside the usual access control lists on endpoints, a HashCash cost may be attached. Incoming messages that do not bear a hashcash stamp showing sufficient proof of work will be rejected before any other security checking is performed, and the error message will specify how much hashcash is required. Other forms of payment to use a service can also be attached, but I need to finalise more details of AURUM to decide exactly how that'll work. Suffice it to say, the same part of the protocol header where hashcash stamps can be attached will also be where other payment stamps can go.
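A hashcash check might be sketched as follows (Python; SHA-256 and leading-zero-bit counting are an assumed stamp format, not necessarily what MERCURY would use):

```python
import hashlib

def stamp_cost(stamp: bytes) -> int:
    """Count leading zero bits in the hash of the stamp: the amount of
    proof-of-work it demonstrates, hashcash-style."""
    digest = hashlib.sha256(stamp).digest()
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def check_stamp(stamp, required_bits):
    """Reject under-paid messages before any other security checks,
    telling the sender how much work is required."""
    if stamp is None or stamp_cost(stamp) < required_bits:
        return {"error": "insufficient hashcash", "required": required_bits}
    return None   # stamp is adequate; proceed to access control
```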
When a request is being sent to a remote entity on another node, potentially in another cluster, or a response is being sent back, it must be protected against eavesdropping and tampering by the untrusted network. Both cases are considered as "a message" here.
This is represented as a classification level for the information in transit, which must be met by a clearance of the communication channel.
A minimum transit classification for a given operation may be specified by the entity, in which case it will reject requests with a lower classification (prompting the sender to try again with a more trusted communication path). And the sender may use a higher transit classification if they wish to. The entity ID may contain a suggested minimum transit classification for any given operation, to avoid an initial rejection.
Within the cluster
Within the cluster, transit security is relatively easy, and is the responsibility of WOLFRAM. If there is a communication group offering a protected path at the required clearance, then it can be used. Otherwise, an encryption algorithm offering the desired clearance can be chosen and used.
MERCURY will not allow a message with a transit classification higher than the clearance of the target entity to be sent.
Between clusters, classifications and clearances are not comparable - they are just labels with cluster-specific interpretation. However, each cluster can express a classification or clearance by looking up what encryption algorithms it considers trusted at that level, and sending a list of encryption algorithm IDs. The receiving cluster can then work out the lowest clearance it assigns to any of those algorithms, and have a local equivalent of the clearance or classification.
Therefore, the suggested minimum transit clearances in EIDs are given with reference to lists of encryption algorithm codes, and if a message comes in with an encryption algorithm considered insufficiently trusted for the minimum clearance required, a list of acceptable algorithms is returned in the error message. The originating node chooses an algorithm from the intersection of that set and the set it considers adequate (ideally, the least expensive option...) and tries again.
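The retry negotiation reduces to an intersection and a cheapest-pick; a sketch (Python; the `cost` function is an assumed stand-in for whatever expense metric the sending node uses):

```python
def negotiate_algorithm(acceptable_to_receiver, acceptable_to_sender, cost):
    """Choose from the intersection of the algorithms the receiver will
    accept (from its error message) and those the sender trusts at the
    required classification, preferring the least expensive.
    """
    candidates = set(acceptable_to_receiver) & set(acceptable_to_sender)
    if not candidates:
        return None   # no mutually trusted algorithm; give up
    return min(candidates, key=cost)
```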
Messages from other clusters are NOT rejected if the transit classification exceeds the entity's clearance - that check should be performed by the sender.
Handling incoming messages
When the MERCURY server stack on a node receives an incoming request, it has a decision to make if the request is for a volume that is not mirrored locally (which it can find out by asking WOLFRAM).
It could reply with a redirect to a public node that mirrors the entity (if there is one).
It can redirect the request internally, using MERCURY over WOLFRAM to pass the request to a node that mirrors that entity.
Or it can run the request locally, causing all access to the entity's state to be done remotely via WOLFRAM to a node that carries the entity.
And regardless of which of the above it chooses, it may or may not perform security checks on the request before redirecting.
What to do should be configurable in the cluster entity!
An incoming message, be it to the primary MERCURY interface of an entity or to a connection interface, is handled by firing off an appropriate handler request to LITHIUM (unless it's one of the special system-provided interfaces, in which case the request is handled by the kernel directly).
A message sent to the primary interface is mapped to a handler by looking in a particular slice in the entity's TUNGSTEN state ($MERCURY). Therein are CARBON tuples mapping endpoints to LITHIUM handler specifications.
If the incoming request message contains a persona field, as the request was made to an EID with a persona field attached, then a different CARBON tuple is looked up, in the (persona_scheme $MERCURY) slice. This lets entities provide different interfaces on different personas.
The other part of the persona field (persona parameters) is simply passed to the handler as part of the request metadata.
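The two lookups might be sketched like this (Python; the dictionary representation of TUNGSTEN slices and the exact slice keys are purely illustrative):

```python
def find_handler(state, endpoint, persona=None):
    """Look up the LITHIUM handler for an endpoint in the entity's
    TUNGSTEN state. Without a persona, the $MERCURY slice is consulted;
    with one, a per-scheme slice is used instead, so entities can
    present different interfaces on different personas.
    """
    if persona is not None:
        scheme, params = persona
        slice_key = (scheme, "$MERCURY")   # the (persona_scheme $MERCURY) slice
    else:
        params = None
        slice_key = "$MERCURY"
    handler = state[slice_key][endpoint]
    return handler, params   # persona parameters ride along as metadata
```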
Failures: Rejections and Errors
Of course, a MERCURY request might fail. When that happens, useful information for the caller to know what to do about it is provided in the response.
- If the request was inherently erroneous, then a code indicating why is returned.
- If the request was OK but not enough transit security, hashcash, or some other requirement was applied, then a list of the failings is returned so the caller can address them and try again.
- If the request was all in order but the caller lacks appropriate credentials, then we don't want to reveal anything about the actual rule that was failed as it might give away too much, but we can provide the contact details (if available in the configuration somewhere) of a security administrator to appeal to.
- If the problem is that the resource being requested has been censored (sad, but it happens, and it's good to make it explicit when it's so) then the jurisdiction applying the censorship and a reference to the relevant law should be explained in the response. See HTTP 451 for some prior art.
- If the request would have succeeded but this node experienced a technical problem, the response should include a hint as to whether to just try again, to try a different node in the volume at the caller's choice, or to try a recommended node in the volume (a kind of redirect) - the latter potentially indicating a node that's not publicly listed in the entity ID.
- If the node is just too busy, the response should include a recommended delay before trying again, and as per the previous case, an optional hint as to which node to try next.
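One possible shape for such failure responses, gathering the fields mentioned above (a Python sketch; all field names are assumptions, not a wire format):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Failure:
    """Illustrative union of the failure cases listed above."""
    kind: str                       # "error", "requirements", "denied",
                                    # "censored", "technical", or "busy"
    code: Optional[int] = None      # why the request was inherently erroneous
    missing: Optional[list] = None  # unmet requirements, e.g. hashcash cost
    admin_contact: Optional[str] = None   # security administrator to appeal to
    jurisdiction: Optional[str] = None    # who applied the censorship
    law_reference: Optional[str] = None   # the relevant law
    retry_hint: Optional[str] = None      # "retry", "any-node", or a node address
    retry_after: Optional[float] = None   # recommended delay when busy
```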