CAESIUM is responsible for scheduling periodic events. Entities are able to react (via LITHIUM) to asynchronous events such as MERCURY requests and messages, but they need not remain quiescent until explicitly invoked by some external agent; they can also register to perform periodic scheduled actions. They do this by placing CARBON tuples into their TUNGSTEN state in a section called /argon/caesium/schedule. These tuples look a bit like crontab entries, and refer to a LITHIUM handler ID.
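The exact tuple format is not pinned down here, but a schedule entry might look something like this (the field names and CARBON syntax below are purely illustrative, not the real format):

```
;; Hypothetical entry in /argon/caesium/schedule -- all field names
;; are invented for illustration:
(schedule
  (minute 0) (hour 3)                 ; fire daily at 03:00, crontab-style
  (handler nightly-maintenance)       ; LITHIUM handler ID to invoke
  (partition-semantics at-most-once)) ; see "Dealing with network partitions"
```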

As such, there is no CAESIUM API exposed to entity code, per se - all the entity needs to do is declare its wishes in TUNGSTEN, and CAESIUM will pick them up.

When changes to tuples in this section are replicated via WOLFRAM to the TUNGSTEN replicas of an entity, the CAESIUM module on each node receiving the update is informed, and updates its index of the scheduled activities that node may have to deal with.

The nodes within each replication volume elect a schedule master for that volume, which examines its CAESIUM index for that volume and looks for upcoming scheduled events. When it decides an event is due, it selects a node mirroring the volume to run it on based on current load levels reported by WOLFRAM. It then uses WOLFRAM to send a request to that node to run the specified handler on the specified entity.
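The selection policy is not specified beyond "based on current load levels", but a minimal sketch, assuming WOLFRAM reports a single numeric load figure per node and the schedule master simply picks the least-loaded mirror, might look like:

```python
def pick_execution_node(mirror_nodes, load_by_node):
    """Choose which mirror of the volume should run the handler.

    mirror_nodes: node IDs currently mirroring the volume
    load_by_node: node ID -> load level as reported by WOLFRAM
    (a single numeric load figure is an assumption of this sketch)
    """
    # Least-loaded node wins; a richer policy could slot in here.
    return min(mirror_nodes, key=lambda n: load_by_node[n])


loads = {"node-a": 0.7, "node-b": 0.2, "node-c": 0.5}
print(pick_execution_node(["node-a", "node-b", "node-c"], loads))  # -> node-b
```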

The cluster configuration contains an "election priority" for each node; the reachable node with the highest election priority wins the election, with ties broken at random among the nodes sharing that highest priority. This allows administrators to encourage more reliable nodes to win the elections, preventing the scheduling delays caused by re-elections when the schedule master fails or disappears.
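The election rule itself can be sketched as follows (in practice every node must arrive at the same answer, so a real implementation would use a deterministic, agreed tie-break rather than local randomness):

```python
import random


def elect_schedule_master(reachable_nodes, election_priority):
    """Elect the reachable node with the highest election priority,
    breaking a shared top priority at random (illustrative only)."""
    top = max(election_priority[n] for n in reachable_nodes)
    candidates = [n for n in reachable_nodes if election_priority[n] == top]
    return random.choice(candidates)


priorities = {"node-a": 1, "node-b": 3, "node-c": 3}
# node-b and node-c share the highest priority, so either may win:
print(elect_schedule_master(["node-a", "node-b", "node-c"], priorities))
```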

Dealing with network partitions

An extra field in the CARBON tuples, with no counterpart in crontab entries, specifies the scheduling semantics in the event of network partitions. When a node cannot see all the other nodes mirroring an entity, it cannot tell whether the other nodes are down or merely unreachable. If it decides to perform a scheduled action, a node on the other side of the network failure might also decide to do it, meaning it gets done twice.

Therefore, there are two choices for the partitioned scheduling semantics: "at most once" and "at least once". If "at most once" is selected, the system cannot risk the scheduled item running twice; it would rather risk it not happening at all. The schedule will never be run unless a quorum (more than half) of the nodes mirroring that entity is reachable. In the event of a network split there can never be more than one reachable quorum in effect (although there might be none), so we get "at most once" execution.
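The quorum test is just a strict-majority check over the mirrors; a sketch (not actual CAESIUM code):

```python
def may_run_at_most_once(reachable_mirrors, total_mirrors):
    """At-most-once semantics: only fire when a strict majority of the
    nodes mirroring the entity is reachable.  At most one side of any
    partition can hold such a majority, so the event runs at most once."""
    return reachable_mirrors * 2 > total_mirrors


# Five mirrors split 3/2: only the three-node side may run the schedule.
print(may_run_at_most_once(3, 5))  # -> True
print(may_run_at_most_once(2, 5))  # -> False
# Four mirrors split 2/2: neither side has a quorum, so nothing runs.
print(may_run_at_most_once(2, 4))  # -> False
```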

If "at least once" is chosen, network partitions do not affect the execution of the schedule. Since a network partition causes a schedule master to be elected in each mutually-reachable group of nodes carrying the volume, the scheduled handler will be invoked once in each such group. The only way it can avoid being executed at all is if no node mirroring that volume is running, meaning the entire volume is unavailable.
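Under these semantics the number of invocations is simply the number of mutually-reachable groups containing at least one live mirror; a sketch:

```python
def at_least_once_executions(partition_groups):
    """At-least-once semantics: each mutually-reachable group of live
    nodes elects its own schedule master, so the handler runs once per
    group.  partition_groups is a list of sets of live node IDs."""
    return len(partition_groups)


print(at_least_once_executions([{"a", "b", "c"}, {"d", "e"}]))  # -> 2
print(at_least_once_executions([]))  # -> 0: every mirror is down
```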