
IRIDIUM is a low-level networking protocol that sits underneath both MERCURY (for inter-cluster communication) and WOLFRAM (for intra-cluster communication); it is not directly accessible to user code. It is implemented in CHROME, which is possible because CHROME does not itself depend upon IRIDIUM. As an ARGON system component, it has access to the basic networking interfaces of HYDROGEN. These may be raw Ethernet interfaces, in which case it has to provide an IP stack as well (shared with FLUORINE); on "hosted" HYDROGEN platforms, it is instead given access to higher-level "IP address" devices that wrap the underlying TCP/IP stack.

It provides both connectionless and virtual-circuit communication of arbitrarily sized messages between nodes, offering flow control, retransmission, bandwidth reservation (where available), and optional sequencing of messages within a virtual circuit. It also provides for the recipient of a message to reply, with the reply and the acknowledgement of receipt of the original request being carried together, and the reply being automatically tied to the request.

I've written on my blog about the rationale for IRIDIUM.

I plan to implement it on top of UDP for Internet use and, before long, on top of raw byte streams for use over point-to-point links (with full bandwidth reservation capability), to provide real-time networking.

It is a design requirement that small requests should be transmissible in a single UDP datagram, with no requirement to negotiate any kind of connection with the server, and that small responses should likewise fit in a single UDP datagram. This is to minimise the latency of various common message transactions.

Consider a DNS server; a single UDP datagram containing a request comes in, and the response goes out as a single UDP datagram. There is no need to establish any connection state with the DNS server beforehand. Therefore, DNS servers manage to serve a vast number of clients with very little resource usage, and very low latency.

When a virtual circuit is established over UDP, a unique source port is allocated for that VC. This is to allow bandwidth reservation for that VC using RSVP, if available.

When a virtual circuit is established over UDPv6, as well as the unique source port, the IPv6 flow label field can be used to enable bandwidth reservation with RSVPv6.

The basic operations made available by IRIDIUM, specified in terms of a Java-like garbage collected host language (although not assuming any object orientation), are:

IridiumHost resolveUDPv4Host(
	byte4 address,
	unsigned short port,
	unsigned int estimatedOutboundBandwidth,
	unsigned int estimatedInboundBandwidth,
	unsigned int estimatedMTU,
	unsigned float estimatedDatagramLoss,
	unsigned float estimatedDelay)
IridiumHost resolveUDPv6Host(
	byte16 address,
	unsigned short port,
	unsigned int estimatedOutboundBandwidth,
	unsigned int estimatedInboundBandwidth,
	unsigned int estimatedMTU,
	unsigned float estimatedDatagramLoss,
	unsigned float estimatedDelay)
IridiumHost resolveStreamHost(
	platform dependent identifier of a serial or parallel
		stream device, eg a UNIX fd,
	unsigned int outboundBandwidth,
	unsigned int inboundBandwidth,
	unsigned int MTU,
	unsigned float estimatedDatagramLoss,
	unsigned float estimatedDelay)

Returns an IridiumHost object for the given remote node. If there is already an IridiumHost object for that node and port, the same object should be returned rather than a new one being created. The reason for this is that an IridiumHost object contains the counters used for flow control to the remote host across all virtual circuits and connectionless traffic. Different UDP destination port numbers on the same node return different IridiumHost objects, but these share the same underlying flow control counters.

No actual communication is performed at this stage.

For stream connections, the MTU must be specified. Otherwise, existing estimates of network capability can be provided as starting points for the flow control and path MTU discovery algorithms - but all zeroes can be given if no existing information is known.
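The sharing rule described above (one object per node and port, one set of flow control counters per node) can be sketched as follows. This is only an illustrative Python sketch, not part of the specification: the names HostTable, Host, and FlowCounters, and the fields inside FlowCounters, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FlowCounters:
    """Per-node flow control state (hypothetical fields)."""
    bytes_in_flight: int = 0
    total_bytes_sent: int = 0


@dataclass
class Host:
    address: bytes
    port: int
    counters: FlowCounters  # shared by every Host with the same address


class HostTable:
    """Hands out one Host per (address, port), one FlowCounters per address."""

    def __init__(self):
        self._hosts = {}     # (address, port) -> Host
        self._counters = {}  # address -> FlowCounters

    def resolve(self, address: bytes, port: int) -> Host:
        key = (address, port)
        host = self._hosts.get(key)
        if host is None:
            # Reuse the counters of any existing Host for this address.
            counters = self._counters.setdefault(address, FlowCounters())
            host = Host(address, port, counters)
            self._hosts[key] = host
        return host
```

Resolving the same address and port twice yields the same object; resolving a different port on the same address yields a different object whose counters attribute is the same FlowCounters instance.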

void sendMessage (
	IridiumHost destination,
	byte[] message,
	AsynchRequestCallback callback,
	boolean idempotent,
	byte dropPriority,
	byte deliveryPriority)
void sendMessage (
	IridiumVC destination,
	byte[] message,
	AsynchRequestCallback callback,
	boolean idempotent,
	byte dropPriority,
	byte deliveryPriority)

Send a message to a host (in connectionless mode) or down a virtual circuit.

This function returns immediately, and the message is queued with the flow control and retransmission engine, which runs asynchronously. The callback is invoked when the message is successfully delivered, or when delivery fails.

For a message with a drop priority above zero, the callback may never be called, because IRIDIUM is not guaranteeing delivery anyway.

The idempotent flag, if set, means that IRIDIUM need not try to avoid the message arriving more than once.

The dropPriority is a QoS parameter. If it is 0, then the message must be delivered, and the IRIDIUM layer will ask the remote end to return an acknowledgement. If the acknowledgement is not received in reasonable time, the message will be retransmitted until there is reasonable certainty that the remote host is not reachable, at which point an exception is raised and reported through the callback. If it is not zero, then it lets the underlying protocol know that, in the case of congestion causing router buffers to overflow, the message can be dropped; messages with a higher drop priority should be dropped in preference to those with a lower drop priority.
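The retransmission behaviour for messages with a drop priority of zero might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the text does not say how many attempts are made or how the timeout grows, so the exponential backoff, the default values, the HostUnreachable exception, and the transmit callable are all hypothetical.

```python
class HostUnreachable(Exception):
    """Hypothetical error raised once retransmission has been abandoned."""


def send_reliably(transmit, max_attempts=5, initial_timeout=0.5):
    """Call transmit(timeout) until it reports an ack; return attempts used.

    transmit is a hypothetical callable that sends the datagram, waits up to
    `timeout` seconds, and returns True if an acknowledgement arrived.
    """
    timeout = initial_timeout
    for attempt in range(1, max_attempts + 1):
        if transmit(timeout):
            return attempt
        timeout *= 2  # assumed exponential backoff between retransmissions
    raise HostUnreachable(f"no acknowledgement after {max_attempts} attempts")
```

A real engine would run this asynchronously and hand the exception to the callback rather than raising it in the caller; the loop is kept synchronous here only to keep the sketch short.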

The deliveryPriority is another QoS parameter; if a link in the underlying physical network is congested, then messages with a higher deliveryPriority should be sent down the link first.
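To make the interaction of the two QoS parameters concrete, here is a small Python sketch of a bounded send queue. It is an illustration under stated assumptions, not the IRIDIUM design: the SendQueue class, its capacity limit, and the tie-breaking rules are invented for the example.

```python
class SendQueue:
    """Bounded queue illustrating dropPriority and deliveryPriority."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = []  # (delivery_priority, drop_priority, seq, message)
        self._seq = 0     # arrival counter, used for FIFO tie-breaking

    def enqueue(self, message, drop_priority, delivery_priority):
        """Queue a message; return the message dropped to make room, or None."""
        self._items.append((delivery_priority, drop_priority, self._seq, message))
        self._seq += 1
        if len(self._items) <= self.capacity:
            return None
        # Overflow: only messages with a non-zero drop priority may be dropped.
        droppable = [item for item in self._items if item[1] > 0]
        if not droppable:
            return None  # nothing may be dropped; the queue exceeds capacity
        # Drop the message with the highest drop priority (newest on a tie).
        victim = max(droppable, key=lambda item: (item[1], item[2]))
        self._items.remove(victim)
        return victim[3]

    def dequeue(self):
        """Return the next message: highest delivery priority, oldest first."""
        item = min(self._items, key=lambda item: (-item[0], item[2]))
        self._items.remove(item)
        return item[3]
```

Messages with a drop priority of zero are never chosen as victims, which mirrors the rule that they must be delivered.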

When IRIDIUM is implemented on top of a datagram protocol like UDP, a message that is small enough may be sent as a single datagram. Otherwise, it is sent with a protocol modelled on NetBLT, where a single datagram is sent indicating the size of the message (along with a message ID), followed by a stream of data packets, each tagged with its offset into the message. In this case, the protocol inherently guarantees delivery, even if not requested to do so by the application. Flow control is maintained on a per-host basis rather than per-message, so individual request datagrams and the datagrams that comprise large requests are all governed by the same flow control.
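The large-message scheme described above can be sketched in Python as follows. This shows only the splitting and reassembly, with datagrams represented as tuples; the tuple layout, the fragment function, and the Reassembler class are assumptions for illustration, and no wire format is implied. It also assumes every chunk is sent at a fixed offset, so a retransmitted chunk simply overwrites the earlier copy.

```python
def fragment(message_id, message, payload_size):
    """Yield a header datagram, then data datagrams tagged with their offset."""
    yield ("header", message_id, len(message))
    for offset in range(0, len(message), payload_size):
        yield ("data", message_id, offset, message[offset:offset + payload_size])


class Reassembler:
    """Rebuilds one message from data datagrams arriving in any order."""

    def __init__(self, total_size):
        self.buffer = bytearray(total_size)
        self.received = {}  # offset -> chunk length; duplicates overwrite

    def add(self, offset, chunk):
        self.buffer[offset:offset + len(chunk)] = chunk
        self.received[offset] = len(chunk)

    def missing_bytes(self):
        return len(self.buffer) - sum(self.received.values())

    def complete(self):
        return self.missing_bytes() == 0
```

Because each data datagram carries its own offset, the receiver can place it directly into the buffer without waiting for earlier datagrams, and can tell from missing_bytes how much is still outstanding.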

If the transmission is going to take some time, then the asynch request callback may be called from time to time to be told transmission is in progress, and given both a percentage progress and an estimated time to completion (based upon the bandwidth estimates kept by the flow control system).
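The progress figures passed to the callback could be derived along these lines. This is a hedged Python sketch: the function name and the choice to base progress on acknowledged bytes are assumptions, and the bandwidth figure stands in for whatever estimate the flow control system holds.

```python
def transfer_progress(bytes_acked, total_bytes, bandwidth_bytes_per_s):
    """Return (percent complete, estimated seconds remaining)."""
    if total_bytes == 0:
        return 100.0, 0.0
    percent = 100.0 * bytes_acked / total_bytes
    remaining = total_bytes - bytes_acked
    if bandwidth_bytes_per_s <= 0:
        # No usable bandwidth estimate yet, so no finite time estimate.
        return percent, float("inf")
    return percent, remaining / bandwidth_bytes_per_s
```

For example, with 250,000 of 1,000,000 bytes acknowledged and an estimated 125,000 bytes per second, the sketch reports 25% progress and 6 seconds remaining.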

IridiumRequest sendRequest (
	IridiumHost destination,
	byte[] message,
	AsynchRequestCallback callback,
	boolean idempotent,
	byte deliveryPriority)
IridiumRequest sendRequest (
	IridiumVC destination,
	byte[] message,
	AsynchRequestCallback callback,
	boolean idempotent,
	byte deliveryPriority)

Send a message to a host in connectionless mode, or down a virtual circuit, then get a response back.

The sending of the original message is done exactly as per sendMessage, except a flag is set indicating that the far end must send back a response. If the code at the far end that generates the response completes quickly enough, then the response message counts as the acknowledgement of receipt of the request; if it appears to be taking longer, then an acknowledgement datagram may be necessary to prevent the client from retransmitting the request.

The response is then sent back in much the same manner as any other message, except marked as a response, and specifying the message ID it is in response to. All the same logic about whether it will fit into a single message or not, retransmission, and flow control apply.

There are eight possibilities: the request may or may not be small enough to fit in a single datagram, the response may or may not be small enough to fit in a single datagram, and the response may or may not be ready in time to piggyback the acknowledgement of receipt. Each combination offers different possibilities for piggybacking acknowledgements on requests, or for acknowledgements not being needed (in the case of the large-message transfer protocol being required on the request).
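The eight combinations can be enumerated mechanically. The Python sketch below also encodes one reading of the text, labelled as an assumption: a separate acknowledgement datagram is needed only when the request was a single datagram (so the large-message protocol has not already guaranteed delivery) and the response is not ready in time to serve as the acknowledgement.

```python
from itertools import product


def needs_separate_ack(request_is_large, response_in_time):
    """One reading of the rules above; not taken from a specification."""
    if request_is_large:
        return False  # large-message transfer already guarantees delivery
    return not response_in_time  # a prompt response doubles as the ack


# (request_is_large, response_is_large, response_in_time, separate_ack)
cases = [
    (req_large, resp_large, in_time, needs_separate_ack(req_large, in_time))
    for req_large, resp_large, in_time in product((False, True), repeat=3)
]

for case in cases:
    print(case)
```

Under this reading, the size of the response does not affect whether the request needs its own acknowledgement; it only determines how the response itself is transferred.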

If either transmission of the request or reception of the response is going to take some time, the asynch request callback may be invoked with the percentage completion of either phase, and an estimated time to completion of that phase.

The asynch request callback is, finally, invoked with the result, once it has arrived.

void cancelRequest(IridiumRequest request)

Cancel an outstanding request. A reliable message is sent to the recipient of the request quoting the request ID and asking for it to be cancelled. If the request is still running when it gets there, then it is told to cancel itself.

unsigned int getOutboundBandwidthEstimate (IridiumHost destination)
unsigned int getInboundBandwidthEstimate (IridiumHost destination)
unsigned float getDelayEstimate (IridiumHost destination)
unsigned float getDatagramLossEstimate (IridiumHost destination)
boolean supportsBandwidthReservation (IridiumHost destination)
unsigned long getTotalBytesSentTo (IridiumHost destination)
unsigned long getTotalBytesReceivedFrom (IridiumHost destination)

Retrieve approximate details about the network between here and the host.

The delay is the time taken (in seconds) for a zero-byte message to get to the remote host; not a round trip time.

The bandwidth estimate is the number of bytes that can be transferred in a second, in each direction.

The datagram loss estimate is the approximate fraction of datagrams that get lost in transit.

supportsBandwidthReservation indicates whether there appears to be support for reserved bandwidth in virtual circuits to the remote host. Bear in mind that this may change unpredictably; a current value is no guarantee of future values.

These statistics may take some communication with the far end to actually settle to useful values, as they are updated based on past performance.
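The text does not say how the estimates are updated from past performance. One common approach, shown here purely as an assumption, is an exponentially weighted moving average; the default weight of 1/8 is borrowed from the smoothing TCP applies to its round-trip time estimate, and the function name is hypothetical. A current value of zero is treated as "no information", matching the convention for the resolve functions.

```python
def update_estimate(current, sample, weight=0.125):
    """Blend a new measurement into a running estimate (assumed EWMA)."""
    if current == 0:
        # No prior information: adopt the first sample as the estimate.
        return float(sample)
    return (1 - weight) * current + weight * sample
```

With a small weight, a single unusual measurement moves the estimate only slightly, which is why the statistics need several exchanges with the far end before they settle.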

IridiumVC openVC (
	IridiumHost destination,
	byte[] message,
	AsynchRequestCallback callback,
	byte deliveryPriority,
	boolean orderMessages,
	unsigned int outboundBandwidthRequired,
	unsigned int inboundBandwidthRequired,
	IridiumVCHandler handler)

Attempts to open a virtual circuit.

The attempt carries an application message with it, which can contain things like login details that the server may use to decide whether to accept or reject the connection. This is unlike TCP, where one must perform a three-way handshake before being able to authenticate.

All of the parameters are as per the sendMessage and sendRequest procedures, except for orderMessages, outboundBandwidthRequired, inboundBandwidthRequired, and handler.

If orderMessages is set, then messages sent on the resulting VC will arrive in the same order they were sent; if the far end receives a message without having received the message before it, it will buffer that message and not pass it to the server application until the missing message has been delivered.
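The buffering behaviour for ordered VCs can be sketched in Python as below. The use of consecutive per-VC sequence numbers starting at zero is an assumption made for the example, as is the ReorderBuffer name; the text only specifies the observable behaviour.

```python
class ReorderBuffer:
    """Holds back out-of-order messages until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 0  # sequence number of the next message to deliver
        self.pending = {}  # seq -> message, received but not yet deliverable

    def receive(self, seq, message):
        """Return the messages that can now be passed to the application."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate of something already seen
        self.pending[seq] = message
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

If message 1 arrives before message 0, nothing is delivered until message 0 arrives, at which point both are released in order.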

If outboundBandwidthRequired or inboundBandwidthRequired is more than zero, then an attempt will be made to reserve bandwidth in that direction. If bandwidth reservation is not possible, either due to lack of support in the protocol beneath or because there is insufficient bandwidth available, then an appropriate exception will be passed to the AsynchRequestCallback. The presence of a successful bandwidth reservation implies that the VC can carry messages up to the requested bandwidth without causing a rise in message loss (leading to them never arriving if they have a drop priority, or to increased latency due to retransmission if not) beyond that which is inherent in the physics of the link (which will hopefully be zero). VCs are welcome to send more data than their reserved bandwidth allows, but they may then suffer message loss due to link congestion.
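The two failure cases for reservation (no support, or not enough bandwidth left) can be illustrated with a toy admission check for one direction of a single link. This Python sketch is hypothetical throughout: real reservation would be delegated to RSVP or to the point-to-point link layer, and the ReservableLink and ReservationRefused names are invented for the example.

```python
class ReservationRefused(Exception):
    """Hypothetical error passed to the callback when reservation fails."""


class ReservableLink:
    """Tracks reserved bandwidth (bytes per second) in one direction."""

    def __init__(self, capacity, supports_reservation=True):
        self.capacity = capacity
        self.supports_reservation = supports_reservation
        self.reserved = 0

    def reserve(self, required):
        if required <= 0:
            return 0  # zero means best effort; nothing to reserve
        if not self.supports_reservation:
            raise ReservationRefused("reservation not supported")
        if self.reserved + required > self.capacity:
            raise ReservationRefused("insufficient bandwidth available")
        self.reserved += required
        return required

    def release(self, amount):
        """Give bandwidth back, for example when the VC is closed."""
        self.reserved = max(0, self.reserved - amount)
```

A request for zero bandwidth always succeeds, even on a link without reservation support, which matches the rule that reservation is attempted only when a non-zero requirement is given.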

The recipient may refuse the request, in which case an exception is passed back via the callback.

If the connection is established, then the IridiumVCHandler instance is notified of any incoming messages or requests, request cancellations, or connection closures that occur on the VC.

void closeVC (IridiumVC connection)

Closes a virtual circuit.

IridiumListener listenUDPv4 (
	unsigned short port,
	IridiumHandler handler)
IridiumListener listenUDPv6 (
	unsigned short port,
	IridiumHandler handler)
IridiumListener listenStream (
	platform dependent identifier of a serial or parallel
		stream device, eg a UNIX fd,
	unsigned int outboundBandwidth,
	unsigned int inboundBandwidth,
	unsigned int MTU,
	IridiumHandler handler)

Sets IRIDIUM listening on a given UDP port or stream device.

The IridiumHandler object has callbacks to handle incoming messages, requests, request cancellations, and VC open requests.

void stopListening (IridiumListener listener)

Closes down a listener.

Implementation

The system maintains a single table of hosts for flow control and liveness checking purposes, as opposed to TCP, which does this separately for each connection. This allows a lot of savings in the protocol's use of the underlying network when there is a lot of communication to the same host, at little more cost than a TCP connection when there is not.

To Do

I need to extend the above API to provide multicast and anycast communication models, taking advantage of these services where they are provided by underlying protocols such as IPv6, or IPv4 with the MBONE, and then write it up in DocBook as a draft specification.