CHROME is the high-level programming language that the bulk of ARGON itself, and user applications written above ARGON, will be written in. HYDROGEN already provides a programming language, but it's terribly low level, and "unsafe"; HYDROGEN has a significant level of access to the low-level aspects of its own implementation. In HYDROGEN one can write code which writes random numbers to random locations in memory until the system falls over. Therefore, HYDROGEN code cannot be accepted from untrusted sources. Apart from low-level libraries, things involved in bootstrapping, and the occasional bit of hand-coded stuff that requires particularly high performance, as much HYDROGEN code as possible should be generated by higher-level language compilers - such as the CHROME compiler.
The design goals of CHROME are, in order of decreasing priority:
- Safety. It should be possible to run untrusted code, giving it access only to resources explicitly granted, with no way for the untrusted code to obtain access to other resources or interfere with the operation of other parts of the system (although issues such as the consumption of CPU time and memory are outside of the language's scope; HELIUM exists to control those).
- Ease of programming. The language should be simple to understand, and easy to reason about. There should be the minimum of surprises and gotchas. The language should be expressive enough to handle complex concepts naturally.
- Efficiency of implementation. The compiler should be able to easily convert the input source code to efficient HYDROGEN VM code with the minimum of complex time-consuming analysis. There are two efficiencies at stake here - the efficiency of the compiler, and the efficiency of the resulting code. I'm also conflating time and space efficiency, too.
As a language, it's a purely functional language with a strong LISPy flavour, representing code in the IRON data model rather than s-expressions. The IRON data model also serves as the fundamental data model of CHROME itself; CHROME programs manipulate IRON values. Functions are first-class and bindings are lexically scoped, so functions close over their arguments.
Laziness can come with a performance cost, but the Haskell folks have done lots of good work on strictness analysis. In exchange, you can write conditional constructs without needing macros, streams, infinite data structures, some performance gains, and easier semantics for recursive expressions.
Uniqueness typing has many many advantages, so we'll use that instead of nasty monads. But we can do monads too, when they're useful.
Haskell-style typeclasses and generic functions are used to provide polymorphism, rather than class- or prototype-based object orientation.
Hygienic macros (with a module system based on the CARBON namespaces used for IRON symbols) are provided for syntactic extensibility. However, they are "second-class values"; they are bound in the lexical environment just like functions and other values, but they're not fully first-class as there is a constraint that macros in code to be compiled must be statically deducible, in that every reference to them must be through an expression whose value can be found at compile time. Attempting to apply a macro at run time is an error.
Macros are arbitrary code, given access to the input expression as a "syntax object", decorating the original source code with useful metadata such as the position in the original input source code expression (so error locations can be mapped back) and the lexical environments in effect. Lexical environments are first-class objects; in fact, they're CARBON knowledge bases containing name->value bindings, type information, and other arbitrary declarations (documentation, typeclasses, etc).
But for common cases, we'll implement syntax-rules style rewriting macros as a library in terms of the core's low-level macros.
The language defines a range of primitives, for all the usual arithmetic and other core functions that can't practically be implemented otherwise. These primitives are what core language symbols are bound to. They are a subtype of functions, in that they can be applied directly, but unlike functions in general, they allow equality comparison. A macro can tell if an application is a primitive, and which one it is.
Bounded-space tail calls and first-class partial continuations provide for flow control, but I need to think more about how that will interact with the uniqueness types. A continuation may contain many closed-over unique values, so the continuation itself must by a unique object. That's fine for implementing threading with continuations, but what about amb, exceptions, and so on? Perhaps we need a means for capturing a non-unique continuation that only works if the environment we close over in the continuation contains no unique values. Code that handles mutation by passing a chain of unique values through it, but which contains try/throw/catch exceptions, would need the throw to embed ALL unique values currently in scope (but not in scope at the catch) in the exception object to avoid leaving them in the abandoned continuation; so invoking a continuation (and thus abandoning the current continuation) would be forbidden if the current continuation closed over unique values (and was therefore unique). You'll need to explicitly destroy them, or pass them to the new continuation.
There's no need for unwind-protect or dynamic-wind in a pure language, thankfully.
Dynamically-scoped parameters are a boon in Scheme, so we'll have them, too.
Static typing is important, as it can help to detect a wide range of programming errors. However, it can't become a straitjacket. It's useful in Scheme to be able to (read) arbitrary s-expressions. Haskell's typeclass generics go a long way, but it should be perfectly possible to have some bindings that we know nothing static about - which are entirely dynamically typed - alongside bindings with very narrowly defined static types, which can be implemented fully unboxed in the compiled code, with everything specialised, and type violations detected at compile time. So let's have a type hierarchy that's a pointed complete partial order, giving us an "Any" type we can use in those cases.
Dependant types are awesome, too. Let's have those while we're at it. We're now wel into "Type system is Turing complete" territory, so we'd better have function totality as part of the type system to keep our type expressions halting.
Idris does a pretty epic job of making such a type system work in practice, but I do find the Haskelly syntax for types a bit obtuse at times; it's too pattern-matchy for me. The way constraints are written in "design by contract" languages (eg, Eiffel) are much more readable, to my eyes at least - and as far as I can see, identical in expressiveness. Design by Contract and dependent types are much the same idea, just expressed in different ways, so let's bring them together; we can statically check the constraints that are amenable to that, and leave anything else to run time checking. It just becomes another aspect of gradual typing.
Currying is sometimes handy, but (I find) often conflates what's going on; so let's not put currying in the core syntax. We'll still have single-argument lambdas, but we'll pass tuples to them, thereby making it practical to get at the tuple as a first-class value to implement variadic functions. The tuples will normally be unique values, created afresh, so the system will have little difficulty in keeping them off the heap.
In summary, it looks a lot like a Scheme/Clean/Idris hybrid written in IRON syntax, without mutation and with generic functions in the core; and with macros as second-class values.
Clearly, I need to write a lot more detail on this. Details matter tremendously in programming language design, but while major innovation happens at the large-scale of a language design, the details are usually a matter of "not making mistakes" rather than "being clever" - so relatively tedious!
The CHROME implementation appears (to the rest of the system) to be an interpreter, accepting CHROME expressions encoded in IRON and returning the IRON value they evaluate to. However, closures are represented as executable code, so evaluating the definition of a function results in its compilation, something like how FORTH (and, therefore, HYDROGEN) interprets source code to produce compiled code. All the ARGON system components other than HYDROGEN (written in a platform-specific language), HELIUM and IRON (written in HYDROGEN), and the CHROME boostrap are written in CHROME - even CHROME itself; the "CHROME bootstrap" is a very simple and naive CHROME interpreter written in HYDROGEN which is used to run the CHROME compiler, which then compiles itself and replaces the naive interpreter. The boot process of an ARGON node will involve loading the HYDROGEN source code for HELIUM, IRON, and the CHROME bootstrap, compiling the full CHROME compiler, then compiling the rest of the system components from CHROME source code. But the biggest user of CHROME will be LITHIUM, which reads application code from entities, compiles it with CHROME (using a persistent cache of compiled code where useful), and runs it.
System components will be compiled with an initial namespace enriched beyond the standard libraries with interfaces that allow them to inject raw HYDROGEN code, as a kind of "foreign function interface" allowing them to directly access hardware interfaces. User code will not be given this option - any access to hardware facilities beyond the standard library will have to be provided indirectly through system components that expose a "safe" access-controlled interface.
I'm leaning towards the CPS transformation as an implementation model, as documented elsewhere.
The structure of the implementation is a pipeline:
Input IRON source code is converted into syntax objects, node-for-node. Every node is annotated with the path to it from the root in the original source, and is bound to the initial default lexical environment, which is filled with bindings from modules the source code imports.
Macro-expansion occurs, by walking the source tree from the top down, performing partial evaluation; names are looked up in the environment, and the values they refer to are substituted for them, so environments are eliminated as we move down the expression. When an application turns out to be a macro application, the macro is applied to the body syntax object, and the result (with a possibly modified environment, if the macro chose to do so) replaces the subexpression for further expansion. Macros may recursively invoke the expander on their arguments, in order to manipulate the post-expansion code; if the input to the expander had no free variables then this will be fully reduced to the kernel language, but otherwise, it may only be partially reduced (and the expansion of the result of the macro will ensure that it is fully reduced). Where a macro creates new syntax objects afresh, they will inherit the environment and location path from the macro application syntax object; where it substitutes values into a template defined in the macro body, then the nodes in that template will keep their environment and location path from the macro definition; and nodes passed through from the macro call will, of course, preserve their metadata from the input. Therefore, every node in the output has useful metadata.
The result of macro-expansion is an expression in the minimal kernel language, which is then ripe for type inference, and uniqueness and totality verification. The partial evaluation process will have simplified the code to a great extent, by performing applications of known functions to known arguments and the like, and the code is at this point an arbitrary DAG rather than a tree: shared subexpressions may exist.
The result is an expression in the kernel language with full type information and trust that the type information is correct, which the implementation can then interpret or compile as it sees fit.