ARGON
Documentation

HELIUM is the thread scheduler and memory manager; the core mediator of access to the fundamental shared resources of processor time and memory space. It will use the processor state control and signal handling interfaces in HYDROGEN to schedule threads, and to allocate per-thread heaps from a global heap. It depends only on HYDROGEN itself, so must be written as low-level HYDROGEN source code.

ARGON has a somewhat different execution model to UNIX. Under UNIX, or indeed anything even vaguely like POSIX, you tend to have long-lived processes which spend much of their time blocked waiting for user input or incoming network requests. They all consume memory, all need to be started at boot time and stopped at shutdown, and all need monitoring and restarting should they fail.

I think this is inefficient and unnecessary.

Threads

An ARGON node will not tend to have long-running user threads sitting around. Indeed, the design tries to keep blocked threads to a minimum, since they have state encapsulated in them that would be hard to recreate if the node crashed.

HELIUM threads are more like "tasks", with the expectation of a short lifespan. As such, they are very lightweight to create and destroy. They start off with a fixed-size heap that they allocate from by advancing a pointer, and only start reusing freed space if they fill it up, and only start requesting more memory blocks from the global heap when they need to. Threads that do not exceed their initial memory budget have only a single memory block to return to the heap when they terminate, as even thread bookkeeping state can be allocated from that block.
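The allocation order described above (advance a pointer, then reuse freed space, then ask the global heap for another block) can be illustrated with a minimal Python sketch. All names here (GlobalHeap, ThreadHeap, alloc, release, destroy) are hypothetical and chosen for illustration; the first-fit reuse policy and fixed block size are assumptions, not part of the HELIUM design.

```python
class GlobalHeap:
    """Hypothetical global heap that hands out fixed-size blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.blocks_out = 0  # number of blocks currently lent to threads

    def acquire(self):
        self.blocks_out += 1
        return bytearray(self.block_size)

    def release(self, block):
        self.blocks_out -= 1


class ThreadHeap:
    """Per-thread heap: bump pointer first, free list second, new block last."""

    def __init__(self, global_heap):
        self.global_heap = global_heap
        self.blocks = [global_heap.acquire()]  # the single initial block
        self.top = 0    # bump pointer into the newest block
        self.free = []  # freed regions as (block_index, offset, size)

    def alloc(self, size):
        if size > self.global_heap.block_size:
            raise MemoryError("request larger than one block")
        # 1. Fast path: advance the pointer.
        if self.top + size <= self.global_heap.block_size:
            addr = (len(self.blocks) - 1, self.top)
            self.top += size
            return addr
        # 2. Newest block is full: reuse freed space (first fit, an assumption).
        for i, (blk, off, free_size) in enumerate(self.free):
            if free_size >= size:
                if free_size == size:
                    del self.free[i]
                else:
                    self.free[i] = (blk, off + size, free_size - size)
                return (blk, off)
        # 3. Nothing reusable: request another block from the global heap.
        self.blocks.append(self.global_heap.acquire())
        self.top = size
        return (len(self.blocks) - 1, 0)

    def release(self, addr, size):
        self.free.append((addr[0], addr[1], size))

    def destroy(self):
        # A thread that stayed within budget returns exactly one block here.
        for block in self.blocks:
            self.global_heap.release(block)
        self.blocks = []
```

In this sketch a short-lived thread that never fills its first block only ever touches the fast path, and its teardown is a single block release.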

On my blog, I have documented an approach to CPU scheduling designed for ARGON, and a design for a garbage collector for short-lived processes.

How do threads get created? LITHIUM is the software component responsible for that; in response to various stimuli (hardware events, incoming network requests from MERCURY or FLUORINE, scheduled invocations from CAESIUM, distributed processing via WOLFRAM, etc), it reads in handler code from entities in TUNGSTEN and compiles them into executable code with CHROME (unless there's already a cached version in TUNGSTEN), and then runs them in a HELIUM thread.

This radical departure from the POSIX model of processes that block waiting for jobs to do is justified, I believe, because most software these days boils down to event handlers. Web applications are all about reacting to events, as indeed are any kind of network server. User interface code is all about event handling.

There are situations where you need long-running activities, however; but they are generally "batch jobs" such as large database queries, and are all fundamentally invoked as event handlers. The HELIUM scheduling and memory management model supports a natural evolution from lightweight threads into heavyweight ones if they fail to terminate soon and continue to allocate memory, so naturally handles this case. In addition, direct access to threading will be provided to entity code in CHROME, so that bulk data processing can take advantage of multiple processors. Rather than choosing how many threads to create and splitting the workload between them, however, such libraries are encouraged to instead register a "job generator" callback with HELIUM, which will be invoked whenever there is no work available at that priority level or higher, and which will return bounded jobs until there are none left. That way, HELIUM will automatically and efficiently split the jobs over as many processors as are available, taking into account workloads at the same or higher priority, and how CPU-bound the jobs are. And, of course, it'll be possible to submit a single job, which goes into the same pool of jobs.
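To make the job generator idea concrete, here is a minimal single-threaded Python sketch. The names JobPool, submit, register_generator, next_job and make_chunk_generator are hypothetical, as is the convention that a larger number means a higher priority and that a generator returns None when it has no jobs left. It only models the rule that a generator is consulted when no queued work exists at its priority or higher; it does not model multiple processors or CPU-boundness.

```python
class JobPool:
    """Hypothetical pool holding single jobs and job generators by priority."""

    def __init__(self):
        self.queued = {}      # priority -> list of singly submitted jobs
        self.generators = {}  # priority -> list of generator callbacks

    def submit(self, priority, job):
        self.queued.setdefault(priority, []).append(job)

    def register_generator(self, priority, callback):
        self.generators.setdefault(priority, []).append(callback)

    def next_job(self):
        """Return the next job for an idle processor, or None if there is none."""
        priorities = sorted(set(self.queued) | set(self.generators), reverse=True)
        for prio in priorities:
            jobs = self.queued.get(prio, [])
            if jobs:
                return jobs.pop(0)
            gens = self.generators.get(prio, [])
            while gens:
                job = gens[0]()      # ask the generator for one bounded job
                if job is not None:
                    return job
                gens.pop(0)          # generator is exhausted, drop it
        return None


def make_chunk_generator(data, chunk_size, results):
    """Build a generator that yields one bounded job per chunk of data."""
    position = 0

    def generate():
        nonlocal position
        if position >= len(data):
            return None
        chunk = data[position:position + chunk_size]
        position += chunk_size
        return lambda: results.append(sum(chunk))

    return generate


results = []
pool = JobPool()
pool.register_generator(1, make_chunk_generator(list(range(10)), 4, results))
pool.submit(5, lambda: results.append("urgent"))

job = pool.next_job()
while job is not None:
    job()
    job = pool.next_job()
```

Because jobs are produced on demand, the library never has to guess how many workers exist; each idle processor would simply call next_job again.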

However, user code will not directly use HELIUM's scheduling interface; WOLFRAM will provide facilities to entity code to request that algorithms be distributed across the cluster. Again, a job generator is provided, which produces jobs on demand, or a single job; but WOLFRAM jobs are specified as an entity handler to invoke with LITHIUM and a job object to pass to it. WOLFRAM communicates with available nodes to register job generators with HELIUM, which will in turn call the distributed job generator and run the jobs in parallel via LITHIUM.

Finally, low-priority "idle job generators" may be configured cluster-wide or for individual nodes, to try to find uses for spare CPU power for grid computing applications.

The CPU scheduling priorities can also be used to schedule access to disk bandwidth (to make higher-priority commits more rapid) and WOLFRAM/MERCURY network bandwidth - bandwidth reservation conflicts may be handled by allowing higher-priority tasks to cancel existing lower-priority tasks' reservations, and the scheduling priority may be used by WOLFRAM and MERCURY to calculate the IRIDIUM delivery priorities.

Blocking

When a thread needs to be blocked for some future event, it is suspended, placed onto a wait queue, and the scheduler invoked to run the next applicable thread.

Wait queues are a facility provided by HELIUM, but they are created and managed by external components that need them, such as device drivers. When a thread wants to synchronously perform some action on a device that the driver cannot satisfy immediately, the driver is responsible for placing the thread onto a queue it has created for that purpose, and for having some interrupt handler or other piece of code that picks the thread off the queue and asks HELIUM to make it live again. Making it live might involve pre-empting a lower-priority thread already running on a CPU, or adding it to the run queue (which is itself just another HELIUM wait queue) for when a CPU becomes available. Wait queues are concurrent-safe data structures, implemented as lock-freely and wait-freely as possible.

Wait queues automatically order threads by the thread's priority. That means that the priority doesn't just affect CPU scheduling, but also I/O scheduling. However, when a thread is placed on a wait queue, the caller can give a sub-priority value that is used to order threads within the same scheduling priority band; this can be used to give an advantage to threads that tend to release the waited-for resource quickly, in order to improve responsiveness. Wait queues also support real-time deadline scheduling, as discussed in the scheduling blog post; they can allow threads to reserve exclusive access to a resource in advance, according to a regular schedule.
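As an illustration of the ordering rule, the following Python sketch orders waiters by scheduling priority, then by sub-priority, then by arrival order. The class and method names are hypothetical, the "larger number wins" convention and the first-come tie-break are assumptions, and the sketch is neither lock-free nor does it model deadline reservations.

```python
import heapq
import itertools


class WaitQueue:
    """Hypothetical wait queue ordered by (priority, sub-priority, arrival)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter, breaks remaining ties

    def enqueue(self, thread, priority, sub_priority=0):
        # heapq is a min-heap, so negate the values that should sort highest first.
        entry = (-priority, -sub_priority, next(self._seq), thread)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        """Remove and return the best waiting thread, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]


queue = WaitQueue()
queue.enqueue("a", priority=1)
queue.enqueue("b", priority=2, sub_priority=0)
queue.enqueue("c", priority=2, sub_priority=5)
queue.enqueue("d", priority=2, sub_priority=5)
woken = [queue.dequeue() for _ in range(4)]
```

Here "c" and "d" share a priority band with "b" but are woken first because the caller gave them a higher sub-priority.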

However, some resources aren't accessed by priority alone, as it may sometimes be more efficient to handle some requests than others. How much the device trades off between convenience and strict priority adherence is up to it, as it depends on the details of the device, so the wait queues just provide the tools. Namely, a thread added to a wait queue can be assigned a numeric "bin", and a driver requesting a thread from the queue may request that it receive the highest priority thread in a given bin, or in the nearest occupied bin if there are none in that bin. This might be used by a disk driver to handle all pending writes to a given track (regardless of priority) to avoid the cost of a seek, then to find the track with the highest-priority pending write and go there to deal with all writes to that track, or various other combinations - the wait queues need to have a range of accessors to make it easy to have different access patterns.
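The disk driver example can be sketched as follows. This is a hypothetical Python model, not the HELIUM interface: BinnedWaitQueue, take_near, best_bin, has_waiters and drain_by_track are invented names, bins stand in for track numbers, larger priority numbers are assumed to win, and ties between equally distant bins are assumed to go to the lower bin number.

```python
import heapq
import itertools


class BinnedWaitQueue:
    """Hypothetical wait queue whose waiters carry a numeric bin."""

    def __init__(self):
        self._bins = {}                # bin number -> heap of waiters
        self._seq = itertools.count()  # arrival order breaks priority ties

    def enqueue(self, thread, priority, bin_number):
        heap = self._bins.setdefault(bin_number, [])
        heapq.heappush(heap, (-priority, next(self._seq), thread))

    def has_waiters(self, bin_number):
        return bin_number in self._bins

    def take_near(self, bin_number):
        """Pop the highest-priority thread in the given bin, or in the
        nearest occupied bin if that bin is empty. Returns (bin, thread)."""
        if not self._bins:
            return None
        nearest = min(self._bins, key=lambda b: (abs(b - bin_number), b))
        heap = self._bins[nearest]
        thread = heapq.heappop(heap)[2]
        if not heap:
            del self._bins[nearest]    # keep only occupied bins
        return nearest, thread

    def best_bin(self):
        """Return the bin holding the highest-priority waiter, or None."""
        if not self._bins:
            return None
        return min(self._bins, key=lambda b: self._bins[b][0][:2])


def drain_by_track(queue):
    """Go to the track with the most urgent write, service every write
    there regardless of priority, then repeat."""
    order = []
    track = queue.best_bin()
    while track is not None:
        while queue.has_waiters(track):
            order.append(queue.take_near(track))
        track = queue.best_bin()
    return order


writes = BinnedWaitQueue()
writes.enqueue("w1", priority=1, bin_number=10)
writes.enqueue("w2", priority=9, bin_number=40)
writes.enqueue("w3", priority=2, bin_number=40)
writes.enqueue("w4", priority=5, bin_number=10)
order = drain_by_track(writes)
```

Note that "w3" is serviced before the higher-priority "w4" because it shares a track with "w2"; a driver wanting stricter priority adherence would combine the same accessors differently.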

Real Time

I hope to imbue ARGON with good real-time programming facilities in the longer run. The scheduler supports real-time scheduling, but that is only as good as the honesty of the estimates of how long handlers take to run.

Soft real-time is easy enough, but high-priority interrupt handlers, variable memory access times, and contention over the memory bus between processors are all hard to accurately account for.

Hard real time systems

Hard real time is more challenging. I suspect that only nodes with hard real time hardware (easily-modelled caches, that sort of thing) will be able to provide true hard real time.

However, a soft real time or non real time node can act as an interface between the world of ARGON and a hard real time embedded system. It can communicate via a serial link, to provide an entity that allows interaction with the hard real time node.

Quotas

HELIUM also provides accounting and quotas for CPU time and memory used by a thread. Soft and hard limits can be set - if a soft CPU limit is exceeded, then the thread will be interrupted by a specified signal handler; if a soft memory limit is exceeded, allocations will fail with an error code until the soft limit is raised. Hard CPU limits will result in thread termination, and hard memory limits will make further allocation impossible. This basic mechanism is built upon by AURUM as a place into which abstract resource credits can be "cashed" in order to use them.
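The four limit behaviours can be summarised in a small Python sketch. Every name here (Quota, ThreadAccount, ThreadTerminated, charge_cpu, allocate, raise_soft_memory) is hypothetical. Two details are assumptions rather than anything stated above: the soft CPU handler fires once per thread, and an allocation that would cross a limit is refused outright rather than partially granted.

```python
class ThreadTerminated(Exception):
    """Raised in this sketch when a hard CPU limit is exceeded."""


class Quota:
    def __init__(self, soft, hard):
        self.soft = soft
        self.hard = hard
        self.used = 0


class ThreadAccount:
    """Hypothetical per-thread accounting of CPU time and memory."""

    def __init__(self, cpu, memory, on_soft_cpu):
        self.cpu = cpu
        self.memory = memory
        self.on_soft_cpu = on_soft_cpu   # the specified signal handler
        self._soft_signalled = False

    def charge_cpu(self, ticks):
        self.cpu.used += ticks
        if self.cpu.used > self.cpu.hard:
            raise ThreadTerminated("hard CPU limit exceeded")
        if self.cpu.used > self.cpu.soft and not self._soft_signalled:
            self._soft_signalled = True  # assumption: signal only once
            self.on_soft_cpu(self)

    def allocate(self, size):
        """Return True on success, False as the error code."""
        new_total = self.memory.used + size
        if new_total > self.memory.soft or new_total > self.memory.hard:
            return False
        self.memory.used = new_total
        return True

    def raise_soft_memory(self, new_soft):
        # The soft limit can be raised, but never past the hard limit.
        self.memory.soft = min(new_soft, self.memory.hard)


signals = []
account = ThreadAccount(
    cpu=Quota(soft=10, hard=20),
    memory=Quota(soft=100, hard=200),
    on_soft_cpu=lambda acct: signals.append(acct.cpu.used),
)
account.charge_cpu(8)    # under the soft limit, nothing happens
account.charge_cpu(4)    # crosses the soft limit, handler runs
first = account.allocate(80)
second = account.allocate(40)   # would exceed the soft memory limit
account.raise_soft_memory(150)
third = account.allocate(40)    # succeeds once the soft limit is raised
```

A layer such as AURUM would presumably sit above this, converting resource credits into raised limits, but that mapping is not modelled here.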