There's no shortage of pundits bemoaning the poor security in current operating systems. At the start of the 1970s, Multics had security features that modern systems still don't match (Paul A. Karger, Roger R. Schell, Thirty Years Later: Lessons from the Multics Security Evaluation (IBM, 2002)). The computer systems the commercial world is built upon rarely exceed class "C" in the criteria used to evaluate military computer systems (the "Orange Book"), due to inherent limitations in the security architecture of commodity operating systems.
This saddens me. So for ARGON I've designed what I hope is a good security model, one that should allow installations to achieve class "B" or even "A".
ARGON doesn't have a single named component for security, as security is not the responsibility of any one isolated software component; different aspects of it are handled throughout the kernel components, and this page documents how it all fits together. However, there will clearly be a need for a core library of encryption algorithms, along with tools for operating on classifications and clearances (eg, finding the sum of two, seeing if one is a superset of another, finding out if a given clearance is sufficient for a given classification, and so on) that are needed by the various components; these will go into the ARGON kernel glue code.
Security within the node
User code is run by LITHIUM, by asking an appropriate handler for the type of code. The only handler I'm specifying for now is CHROME, a high-level programming language that compiles to the HYDROGEN code generation interface. HYDROGEN code is able to "do anything"; HYDROGEN provides a single address space, and no way of preventing access to low-level device interfaces. As such, CHROME has to ensure sandbox safety of the compiled code it generates, like a Java virtual machine.
Within CHROME, access to privileged functions is provided in controlled ways by wrapping up privileged capabilities into objects that are injected from outside the sandbox, and can be used by user code within it. There is no way for user code to generate objects with capabilities the code does not already have access to, nor to introspect into objects it is given, to interfere with their operation or obtain their internal state.
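The capability-object pattern described above can be sketched in Python, using a closure as the capability: the privileged operation is minted outside the sandbox and injected in, and user code can invoke it but cannot derive broader authority from it. This is purely illustrative (CHROME is not Python, and Python cannot actually enforce the non-introspection guarantee; CHROME's compiler would). The `make_file_reader` name and file-reading example are invented for this sketch.

```python
# Illustrative sketch of the capability-object pattern, NOT CHROME code.
# A privileged operation is wrapped up outside the sandbox; the wrapped
# object is injected in, and user code can call it but cannot mint new
# capabilities or reach the wrapped internals.

def make_file_reader(allowed_prefix):
    """Mint a read capability restricted to one directory prefix."""
    def read(path):
        # The check lives inside the closure, out of the sandbox's reach.
        if not path.startswith(allowed_prefix):
            raise PermissionError("capability does not cover " + path)
        with open(path) as f:
            return f.read()
    return read  # the closure itself is the capability object

# Outside the sandbox: mint and inject.
read_logs = make_file_reader("/var/log/")

# Inside the sandbox: user code holds no ambient authority; it cannot
# derive a capability for /etc/ from this one.
# read_logs("/var/log/syslog")  -> file contents
# read_logs("/etc/passwd")      -> PermissionError
```

The key property is that authority flows only by explicit injection: there is no global namespace of privileged operations for sandboxed code to reach into.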
Other language handlers might take different approaches, including using hardware memory management to run untrusted code with the ability to manipulate pointers directly.
HELIUM provides resource usage limits for user code, too; handlers are given CPU time and memory limits, and a priority for scheduling access to the CPU and I/O resources, in order to mitigate denial of service attacks through resource starvation.
Security within the cluster
As recommended by the Orange Book, we provide both mandatory and discretionary control for access to information.
Information is assigned a classification, a set of one or more security labels defined cluster-wide by a security administrator (managed by the cluster security entity and stored in the cluster entity). One site might just use labels "Private" and "Public", another might use a more complex hierarchy.
Security labels are connected to each other with a relationship indicating that a label "covers" another; "Private" might cover "Public" because anything trusted to handle "Private" information can also handle "Public". This means that the labels are joined into a directed acyclic graph, but the graph doesn't need to be fully connected. This graph of classification labels is part of the cluster's shared configuration.
Places where information may be stored, processed, or carried from place to place are awarded sets of labels called "clearances", which reflect the maximum secrecy level of information they are trusted to carry and a list of what codewords they are trusted with.
The set of labels in a classification of some data means that anything processing that data needs to be cleared for all the labels in the classification. In other words, adding more labels might reduce the set of things cleared to access it.
The set of labels in a clearance means that the thing with that clearance is cleared to handle any of the labels in the clearance. In other words, adding more labels might increase the set of things it's cleared to access.
To be more precise, something with a given clearance is allowed access to data with a given classification if every label in the classification is either present in the clearance, or is "covered" by a label in the clearance, directly or through a path of multiple "covers" links between labels.
For instance, if the cluster's security label graph is:
- Public
- Company Sensitive (covers Public)
- Customer Private (covers Public)
- Customer Payment Details (covers Customer Private, and therefore Public indirectly)
...and a node is cleared for Customer Payment Details, it will be able to process volumes classified as Public, Customer Private, and Customer Payment Details (or any combination of the above), but not volumes classified as Company Sensitive, even when in combination with other labels such as Public.
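The clearance check and the worked example above can be expressed as a reachability query over the "covers" graph. This is a hypothetical sketch (the data structures and function names are invented, not ARGON APIs):

```python
# The cluster's label graph from the example above, as a mapping from
# each label to the set of labels it directly "covers".
COVERS = {
    "Company Sensitive": {"Public"},
    "Customer Private": {"Public"},
    "Customer Payment Details": {"Customer Private"},
}

def covered_labels(label):
    """All labels reachable from `label` via 'covers' links, inclusive."""
    seen = {label}
    stack = [label]
    while stack:
        for child in COVERS.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def cleared_for(clearance, classification):
    """True if every label in the classification is present in, or
    covered (directly or transitively) by, a label in the clearance."""
    reachable = set()
    for label in clearance:
        reachable |= covered_labels(label)
    return classification <= reachable

# The node from the example, cleared for Customer Payment Details:
node = {"Customer Payment Details"}
cleared_for(node, {"Public", "Customer Private"})   # True
cleared_for(node, {"Company Sensitive", "Public"})  # False
```

Note how adding "Company Sensitive" to a classification denies this node access even though it is cleared for every other label present, matching the rule that extra labels only ever narrow the set of cleared things.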
As well as the storage of volumes, mandatory access control is used to drive the encryption of data in transit between nodes, so that communications links are not given access to data they are not cleared to view.
The cluster has a set of available encryption algorithms, each with a clearance assigned; it is assumed that classified information can be carried over untrusted links if it is encrypted with an algorithm whose clearance rates it for that classification. The cluster configuration also stores a list of "communication groups" of nodes that are connected by particularly featureful network links. The security configuration of the cluster can assign clearances to these groups, allowing encryption to be foregone for classified information that is communicated purely within those groups. This is intended to allow dropping the encryption overhead between machines located entirely within a secure facility, where network traffic between them cannot leave that facility.
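The transit decision above might look something like the following sketch. The group and algorithm structures are invented for illustration, and clearance comparison is simplified to a plain subset test (the real check would go through the "covers" graph as described earlier):

```python
# Hypothetical sketch of the in-cluster transit rule: skip encryption
# only when sender and receiver share a communication group whose
# clearance covers the message's classification; otherwise pick an
# encryption algorithm cleared for it.

GROUPS = {
    "machine-room": {"nodes": {"node-a", "node-b"},
                     "clearance": {"Private", "Public"}},
}

def transit_protection(src, dst, classification, algorithms):
    """`algorithms` maps algorithm name -> clearance label set.
    Returns None (no encryption needed) or an algorithm name."""
    for group in GROUPS.values():
        if {src, dst} <= group["nodes"] and classification <= group["clearance"]:
            return None  # physically protected link inside the facility
    for name, clearance in algorithms.items():
        if classification <= clearance:
            return name
    raise RuntimeError("no algorithm cleared for this classification")

algos = {"aes256": {"Private", "Public"}}
transit_protection("node-a", "node-b", {"Private"}, algos)  # None
transit_protection("node-a", "node-c", {"Private"}, algos)  # "aes256"
```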
The classification level of a message is specified by its sender. WOLFRAM messages replicating TUNGSTEN data are classified according to the classification of the volume the entity is from.
Where these checks happen
Every volume has a set of security labels, which is the classification of the information in the volume, and also the clearance the volume has to store information.
The cluster security volume's classification is the set of security labels available in the cluster - no other volume can be more highly classified.
Every node has a clearance, which is also the classification of the node's volume.
Every entity has a classification, which is also its clearance. The volume storing the entity must be cleared for the classification of the entity.
OPEN QUESTION: How is the classification of a newly created entity assigned?
WOLFRAM will not permit sending a message to a node that is not cleared to receive it.
WOLFRAM will permit sending messages over communications links not cleared to read them, but will use a level of encryption that is trusted to that clearance level to protect the messages in transit.
Nodes may have TUNGSTEN persistent storage devices attached. Those devices are configured with a set of security labels (defaulting to the clearance of the node) that functions as the clearance the device has to store information, and which must be a subset of the clearance of the node. For any storage volumes with a lesser clearance than the node itself, TUNGSTEN will use a level of encryption trusted to the clearance level of the node to store information. For storage volumes with the same clearance as the node, information will be stored unencrypted.
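The storage rule reduces to a simple decision. A minimal sketch, with invented names and clearance comparison simplified to label-set comparison rather than the full "covers"-graph check:

```python
# Hedged sketch of the TUNGSTEN storage-encryption rule: a device whose
# clearance is strictly below the node's is only trusted with ciphertext.

def storage_policy(node_clearance, device_clearance):
    if not device_clearance <= node_clearance:
        raise ValueError("device clearance must not exceed the node's")
    if device_clearance < node_clearance:
        # Strictly lower clearance: encrypt at the node's clearance level.
        return "encrypt-at-node-clearance"
    return "store-plaintext"

storage_policy({"Private", "Public"}, {"Public"})             # encrypt
storage_policy({"Private", "Public"}, {"Private", "Public"})  # plaintext
```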
Intra-cluster MERCURY will not permit sending a message to an entity that is not cleared to receive it. Intra-cluster MERCURY will send messages classified as per the classification of the volume containing the recipient entity, which is known in the cluster configuration.
The sender of a MERCURY request may request a higher classification if it desires, and the recipient may demand a minimum classification for a particular endpoint (rejecting incoming requests without sufficient classification); either may cause the message security level to be raised above the default, but it can never be lowered.
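Since adding labels to a classification only ever restricts who may handle it, the "raise but never lower" rule amounts to taking the union of the defaults and demands. A small sketch with invented names:

```python
# Sketch of the effective classification of an intra-cluster MERCURY
# message: the union of the recipient volume's default, any higher
# classification the sender requested, and any minimum the endpoint
# demands. Unions can only raise the level, never lower it.

def effective_classification(volume_default, sender_request=frozenset(),
                             endpoint_demand=frozenset()):
    return set(volume_default) | set(sender_request) | set(endpoint_demand)

effective_classification({"Customer Private"},
                         endpoint_demand={"Customer Payment Details"})
# -> the union of both labels: the endpoint's demand raised the default
```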
Security between clusters
Security between clusters is more interesting. Any cluster may contact any other over the public Internet via MERCURY, packet filters permitting.
Every volume has a public keypair, the private key of which is known to every node trusted to store data or perform computation for that volume, and the public key of which is part of the public entity ID of every entity within the volume.
Discretionary access control (and other trust decisions) between clusters are based upon the entities receiving the messages. However, we only actually authenticate the originating volume, and when ensuring that our messages cannot be snooped, we merely ensure that they reach the correct volume. This is because we cannot trust an entity any more than we can trust the nodes in the volume that hosts it. If we trust an entity with some information, we cannot really tell if the node hosting that entity is really sending the information to that entity or some other, so there is no point in authenticating at a finer granularity than the volume.
Every cluster's configuration maintains its own mapping of classifications to inter-cluster communications protection algorithms. The function of such an algorithm is, given the source cluster's public key pair and just the target cluster's public key, to convey sequences of bytes (raw MERCURY/CARBON messages) over a lower-level IRIDIUM transport, while ensuring that only the target volume's nodes can recover the message, that no men-in-the-middle can recover or alter the message without detection, and that the target node can check that the source node sent the message.
Of course, how well an algorithm does this job varies wildly. There is a set of algorithms that only sign the data in transit, sending the bytes as-is with no encryption whatsoever. This is fast, so it might be used for 'public'-classified communications.
On the other hand, a better algorithm might open an IRIDIUM virtual circuit and negotiate a session key, signing requests so that each end can check the identity of the other, then proceed to exchange messages using a modern block cipher with the session key, with frequent re-negotiation of said session key.
MERCURY, assuming that algorithms attempt to be fast in the general case by doing session-key negotiation at VC setup and shutdown, will attempt to cache already-negotiated algorithm channels between nodes and reuse them for more than one communication operation, where no security concerns with doing so exist.
Now, if a node is attempting to send a message to a node in another cluster, it will consider the classification of the message, and will attempt to use the algorithm its cluster is configured to use for that classification. Classification hierarchies are unique to each cluster, so when the message (or initial request to set up shared session keys, etc) is received by the destination, it applies its own mapping from classifications to algorithms to decide which classification this algorithm represents (from its perspective), and tags the incoming message with it.
If the destination cluster does not recognise the algorithm or finds it insufficient for the endpoint, it replies with a rejection, specifying the list of algorithms it would consider sufficiently trusted for the transit classification of the target MERCURY/CARBON endpoint. The sending node then finds an algorithm it will trust with the message (eg, an algorithm associated with the message's classification or higher) that is in the list, and uses that. If there is none, then it must sadly fail!
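The rejection-and-fallback flow above can be sketched as follows; algorithm names and the shape of the rejection are invented for illustration:

```python
# Hedged sketch of inter-cluster algorithm negotiation: try the sender's
# preferred algorithm; if the recipient rejects it with a list of
# algorithms it would accept, pick one the sender also trusts for the
# message's classification, or fail.

def choose_algorithm(preferred, trusted_for_classification, rejection):
    """`rejection` is the recipient's list of acceptable algorithms,
    or None if the preferred algorithm was accepted outright."""
    if rejection is None:
        return preferred
    for algorithm in rejection:
        if algorithm in trusted_for_classification:
            return algorithm
    raise RuntimeError("no mutually trusted algorithm; delivery fails")

trusted = {"aes256-sig", "chacha20-sig"}
choose_algorithm("aes256-sig", trusted, None)
# -> "aes256-sig" (accepted outright)
choose_algorithm("aes256-sig", trusted, ["rc4-sig", "chacha20-sig"])
# -> "chacha20-sig" (first acceptable algorithm the sender also trusts)
```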
The minimum required clearance for any given endpoint is listed in the MERCURY metadata in the $MERCURY slice inside an entity, and is also subject to a volume-wide configured minimum set of security labels added to every endpoint's clearance. When that metadata is exposed via CARBON, or embedded in an entity ID, it is converted to the sets of encryption algorithms the receiving cluster would consider sufficient for those classifications, so that other clusters can find a suitable algorithm for their request up front and avoid having to retry with a better one.
TODO: Clarify where all these classifications and clearances are stored
Classifications and clearances are cluster-specific as each cluster may have its own set of security labels.
Generally, they should only be exposed at all to members of some kind of "security administrator" group. They need to be stored in cluster/volume configuration somewhere, and interfaces to access them provided via an endpoint that can be ACLed for security administrators to get at.
What interface should be provided to entity code to understand whether some action is permissible in advance of trying it? How should failure to meet mandatory access control restrictions be communicated back in error messages? Do we reveal the security labels involved and explain the mismatch, or just give the contact details of the security administrator(s) responsible?
OPEN QUESTION: Write up / Read down
Under the current proposal, there is no minimum classification level for the MERCURY messages an entity sends, so an entity full of classified information is free to leak it with impunity.
We should define that all messages sent by an entity must be classified to at least that entity's clearance, so that entities can "write up" (send messages to higher-cleared entities) and "read down" (receive messages from lower-cleared entities).
But we also need to define a framework for security administrators to allow entities limited means to send lower-classified messages (be they outgoing messages/requests, or replies to incoming requests) where the entity is trusted to do so.
Perhaps give entities an optionally overridable "send classification"?
OPEN QUESTION: Mandatory access control on top of MERCURY/CARBON ACLs
The mandatory access control mechanisms described above only protect the state within an entity, and the content of network messages involved in the MERCURY/CARBON protocol between entities.
However, there is also a potential need for mandatory access control alongside the discretionary access control provided by ACLs (as discussed in the Administrator view page).
This could be handled for intra-cluster communications by providing some configuration (controlled at a cluster-wide or volume-wide level, not as part of the entity's state) to put "access classification" levels on MERCURY interfaces and CARBON slices.
Access to the MERCURY interfaces or CARBON slices would then only be granted if the calling entity's clearance was high enough for the classification of the endpoint.
How to do that for inter-cluster communications is harder to define; perhaps the cluster configuration may contain a set of ACLs that provide access to given clearances for external callers? Or can a trust bridge between clusters or volumes be defined, containing a bilateral agreement to use an agreed mapping between the clearance labels on each side, which are then transmitted along with requests between them?
TODO: Clarify access / storage / transit classifications
I've vaguely wandered between access classifications ("What clearance do I need to access this MERCURY interface/endpoint?"), storage classifications ("What nodes may store, or process, state for this entity?") and transit classifications ("What level of encryption is needed to carry this information on this link?") in the discussions above. Break it up better: explain the meanings of security labels, classifications, and clearances. Discuss the security configuration. Discuss how that security configuration is applied for access / storage / transit.
Certificates
It is possible for an entity to act on behalf of another entity for a while. For example, when somebody logs onto a desktop computer, they (in effect) tell that computer what their user agent entity is, then enter a password so that the computer can demonstrate to their user agent that it's really them. The agent entity then sends the computer their favourite user-interface software and settings; but as they browse CARBON from the computer, the entities they interact with must see the actions as coming from their agent, not from the host entity of the computer they're on, or else those entities will be unable to make useful access-control decisions.
This could be done by proxying all the user's activities through their user agent, but that would not be very efficient. Instead, the user interface software keeps a connection to the agent open for the duration of the session, and every minute the agent sends it a certificate (signed by the volume as coming from that entity), stating that the user interface's entity is allowed to act as it for the next two minutes.
This certificate is then sent along with every MERCURY or CARBON message issued on the user's behalf by the user interface (or early on in every virtual circuit, and then left out thereafter). The messages are still signed by the algorithm chosen for communications between the user interface node and the target node, but with the certificate wrapped within. The recipient, upon seeing the certificate, checks that the entity the certificate authorises to act on another's behalf is the same entity that's originating the request, checks that the certificate is not out of date, and then considers the request to have come from the entity that issued the certificate for access control purposes (with the identity of the intermediate entity still kept for auditing purposes). It's possible for a request to contain an ordered list of certificates: if the first certificate in the chain authorises the sender of the request A to act as principal B, and the second certificate authorises principal B to act as principal C, then the request is deemed to come from principal C for all purposes except audit logging (which records the full chain A/B/C of principals and the unique IDs of the certificates).
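The chain-walking check can be sketched as follows. The `Certificate` fields and names are placeholders for whatever ARGON's volume-key-signed certificates actually contain, and signature verification is elided:

```python
import time
from dataclasses import dataclass

# Illustrative model of the delegation-certificate check described
# above. Signature verification against the issuing volume's key is
# assumed to have happened already and is omitted here.

@dataclass
class Certificate:
    grantee: str      # entity allowed to act...
    principal: str    # ...as this principal
    expires: float    # validity deadline (epoch seconds)

def effective_principal(sender, chain, now=None):
    """Walk the chain from the request's sender; return the principal
    the request is deemed to come from, plus the full audit trail."""
    now = time.time() if now is None else now
    current, audit = sender, [sender]
    for cert in chain:
        if cert.expires < now:
            raise ValueError("certificate expired")
        if cert.grantee != current:
            raise ValueError("chain broken at " + cert.grantee)
        current = cert.principal
        audit.append(current)
    return current, audit

# A -> B -> C chain: the UI device acts as the user agent, which in
# turn acts as a persona. (Expiry times are far-future placeholders.)
chain = [Certificate("ui-entity", "user-agent", 9e18),
         Certificate("user-agent", "persona", 9e18)]
effective_principal("ui-entity", chain)
# -> ("persona", ["ui-entity", "user-agent", "persona"])
```

Access control then uses the returned principal, while audit logging records the whole trail.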
The four different discretionary access control mechanisms
In general, there are four different core access control mechanisms:
- Entity based: Access based on the originating entity, where a request is signed by the originating entity's volume key and doesn't use a certificate or capability, and access is granted based on an ACL entry associated with the entity ID.
- Pseudonym based: Access based on a pseudonym, where a request is signed by a pseudonym's key and doesn't use a certificate or capability, and access is granted based on an ACL entry associated with the pseudonym's public key (included in the request).
- Delegated: Access based on a certificate chain, where a request is signed by either the originating entity's volume key or a pseudonym's key and carries a chain of one or more certificates delegating authority from an originating principal, through a chain of other principals, down to the principal signing the request; and access is granted based on an ACL entry associated with the originating principal.
- Capability based: Access based on a capability, where a request is signed by either the originating entity's volume key or a pseudonym's key and may carry delegated authority via a chain of certificates (which now matters only for audit purposes), and access is granted based on the signed and encrypted capability embedded in the request (obtained from the EID, which was minted with an embedded capability).
Each is useful for different things.
Entity based access is the "default": what happens when one obtains a non-capability EID and accesses it without supplying any certificates or pseudonyms. So it's the easiest to use and think about. It also inherently identifies the originator of the request with an entity, which might then be contacted in return: it has a "reply address".
Pseudonyms are useful as they can create an identity that can be shared between multiple entities, and don't have a "reply address" entity. These can be useful for representing a role. A human user might create a suite of pseudonyms and ask their user agent to identify using different pseudonyms as they switch between different personal roles, for instance. If they have multiple user agents, they might share some pseudonyms between some of them by just copying the keypairs over.
Delegated access is useful in the case of a user interface device. The user's user agent, once the user has authenticated through the device, can grant the device a time-limited certificate to act as the user's user agent entity, or as a persona of the user, without needing to reveal the persona's keypair to the device, nor to proxy all requests made on behalf of the user through the user agent. Other uses may appear, too.
Capabilities are useful for situations where access rights need to be communicable as data (like with a pseudonym, which can be shared by giving out the keypair) but without requiring the accessed entity to maintain corresponding individual ACL entries. A capability EID is just an EID that can be passed around and used to gain those capabilities, with the user being oblivious to it being a capability EID. Given that an entity accessing another entity can attach a capability to its own EID used as the originating principal, it's possible to access a remote entity in such a way as to grant it special access to "call back" to you, without needing to keep a list of all the entities you have contacted. This might be a useful option for a user agent, for instance granting entities you use the ability to send you messages back, without opening message-sending ability to the world in general. You might also voluntarily send your personal user agent EID to third parties (as a QR code on a business card, in a message, etc) with a capability granting them chosen rights to contact you. Capabilities let you do that "blindly", as opposed to needing to obtain the EID of the recipient to add them to your ACL.
Either way, the MERCURY security model is able to meet a wide array of use cases by granting access in different ways. To prevent this being confusing - and confusion in security systems generally leads to the worst kinds of trouble - we need to make sure that appropriate levels of complexity are exposed in different contexts. The explanation of the system here has covered the entire system as a whole, but different views of it will exist:
- Entity code developers will largely just access MERCURY using the EIDs given to them from whatever source (not caring if it's a capability EID or not), using the entity's own identity rather than a certificate or pseudonym.
- Situations where a certificate or pseudonym are used will arise during the design of the system, so entity code will "know" when it needs to adopt such an alternate identity and can take appropriate steps.
- Services that need a certificate to be provided in order to operate on behalf of the caller (eg, when user agents delegate to a user interface) will have a requirement for such a certificate to be provided, so the caller will know to generate one and provide it.
- Entity code that wants to mint capability EIDs will do so when they want to, and use those EIDs in the appropriate contexts. They'll declare the capability labels they use in the $MERCURY slice (or the appropriate slice for the persona schema, if in use).
- Security admins editing an ACL can add EIDs and pseudonyms they obtain from any source to ACLs, or can pick from capability labels declared alongside the ACL, to use as the principals in the ACL.