Evolving ECMAScript into Spice -- a rationale

1 introduction

This document presents some of the ways in which the current definition of Spice differs from ECMAScript (ECMA-262) and justifies them in terms of the goals that Spice was designed to satisfy. Further changes are likely as work proceeds on merging ECMAScript and Spice.

We attempt to show how these changes can be viewed as evolutionary, rather than revolutionary, steps.

In reading this be prepared for syntax like:

  for ... do ... endfor

  while ... do ... endwhile

  switch ... case ... endswitch

rather than the more familiar "C" style syntax:

  for (...)
   { ... }

  while ( ... )
   { ... }

  switch ( ... )
   { ... }

The new syntax makes it easier for programmers to ensure that control constructs are correctly nested. However, Spice also supports the "C" style syntax for people who don't want to change.

aside. While this document refers throughout to "the [Spice] compiler", this is not meant to preclude an interpreted implementation. The reader should regard compilation as simply the steps required to prepare Spice source for execution, including a variety of checks for legality.

Existing ECMAScript programs cannot be blindly treated as Spice code; they will need conversion. While mechanical translation is possible (and we advise that tools to do this be created and made available) the full benefits of Spice can only be obtained by rewriting code to exploit its new features.

table of contents


1 introduction
2 Spice goals
3 style sheets
  3.1 units
4 scaling
  4.1 modules
  4.2 avoiding gotchas: declarations
  4.3 closing-keyword syntax
5 expressive power
  5.1 array expressions
  5.2 procedure expressions
  5.3 multiple values
  5.4 loops and switches are expressions
  5.5 loops
  5.6 switches
  5.7 enumerations
6 taming properties: classes, methods, slots, and overloading
  6.1 property declarations
  6.2 class declarations
  6.3 uniform reference

2 Spice goals

Spice has several more-or-less consistent goals:

3 style sheets

Style sheets are the original reason for Spice existing. They provide a declarative mechanism for attaching rendering properties to markup and are used to control how XML tags are mapped to flow objects. Such objects are things like paragraphs that know how to flow their text within given margins, images, form fields and tables etc. Objects can also be interfaces to instruments, or databases etc. You can import flow objects from libraries and you can write a few lines of script to extend imported flow objects. This makes Spice very attractive for building Web-based applications using XML.

A style declaration specifies the style properties to be attached to a node in an XML tree that satisfies a node selector. A node in an XML tree has a tag (in HTML terms, the part that says h1 or p or table) and some attributes. A style declaration looks like


  style selector
  {
     name: value ;
     name: value ;
     ...
     name: value ;
  }

The selector defines a pattern that can be used to check whether the style rule matches a given node in the markup tree. The syntax and semantics are those defined by W3C's CSS2 specification. See: http://www.w3.org/TR/REC-CSS2

Selectors match tag names and can place constraints on attribute values, ancestor elements, and immediately preceding peer elements. This allows style rules to be applied progressively as the markup is received over the network, without the need to wait until the full markup tree has been parsed. There is no need to keep the complete markup tree in memory.

The body of the style rule contains one or more named property values. These values are used by flow objects to control formatting and to handle events such as rendering requests. This places a premium on efficient access to style values. You don't want to waste time in a paint method!

To allow flow objects to interpret style property values efficiently, we would like to represent them as symbolic expressions rather than as character strings. These expressions are evaluated at run-time using an environment that provides the appropriate context, for instance, defining "100%" as the current font height or as the distance between the current left and right margins.

Spice doesn't limit style rules to the properties defined by CSS. You can use additional properties as appropriate to the flow objects you wish to use in your application.

The mechanism for applying the rules to markup trees starts with the root node of the markup tree. The display property is used to identify the name of a flow object class. An instance of this class is created and its format method called to format the markup tree. In many cases, the markup tree and the flow object hierarchy are isomorphic, i.e. each node in the markup tree corresponds to a different flow object. However, you are not restricted to this.
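
The mechanism can be sketched roughly in Python. This is only an illustration under stated assumptions: selectors are reduced to bare tag names, nodes are plain dictionaries, and the names FlowObject, Block, Inline, FLOW_CLASSES, style_for and build, as well as the default of "inline" when no display property is set, are hypothetical and not part of Spice.

```python
class FlowObject:
    """Hypothetical base class: a flow object formats one markup node."""

    def __init__(self, style):
        self.style = style

    def format(self, node, rules):
        # Format this node, then build a flow object for each child node.
        children = [build(child, rules) for child in node.get("children", [])]
        return (type(self).__name__, node["tag"], children)


class Block(FlowObject):
    pass


class Inline(FlowObject):
    pass


# Assumed mapping from values of the display property to flow object classes.
FLOW_CLASSES = {"block": Block, "inline": Inline}


def style_for(node, rules):
    """Collect the properties of every rule whose selector matches the node."""
    style = {}
    for selector, properties in rules:
        if selector == node["tag"]:  # simplification: tag-name selectors only
            style.update(properties)
    return style


def build(node, rules):
    """Create the flow object named by display and call its format method."""
    style = style_for(node, rules)
    flow_class = FLOW_CLASSES[style.get("display", "inline")]
    return flow_class(style).format(node, rules)
```

In this sketch the flow object hierarchy happens to mirror the markup tree; a format method is free to do otherwise.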

3.1 units

Spice allows values to have units attached. This is most obvious in the syntax of numeric literals, where the units can be directly attached: 44cm and 10px are both legal Spice values.

A unit has an associated dimension (eg mass, length, time) and scale (either absent or a number and another unit). Values with units can be added and subtracted if they have the same dimensions (otherwise an exception is thrown). Values with units can be multiplied and divided.

If the scale of a unit is absent, then adding it to a dimensionally-compatible value will generate a symbolic expression, rather than a simple unit value.

aside. This is required so that stylesheets can contain property bindings such as width: 1cm + 2px, where px and cm are both lengths but the scaling is not known until the rendering object has access to the size of pixels.

Spice also has percentage values, generated by the postfix operator %; these also generate symbolic expressions when added to values with units.

aside. There is an issue with the use of % as both the legacy remainder operator and as the percentage operator. Abolishing % as remainder (replacing it with rem or mod) is the cleanest solution, but may fail on legacy issues. A possible hack is to recognise the special cases of % followed by an expression starter (treat it as rem) or not (treat it as percent), but this fails with the current stylesheet grammar.

The procedure resolve(u, c), where u is a value with units and c is a context giving scaling factors for the unscaled units of u, delivers a single value with units calculated using the scaling factors and taking account of any percentages.
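
As a rough model of the behaviour described in this section, the following Python sketch keeps a sum symbolic until a context is supplied. It is not the Spice implementation: the names Quantity, Sum, UNITS, BASE and add, the choice of metres and seconds as base units, and the representation of a context as a dictionary (with a "%" entry giving the size that 100% denotes) are all assumptions made for illustration. Multiplication and division are omitted, and resolve assumes at least one term that is not a percentage.

```python
from dataclasses import dataclass

# Hypothetical unit table: name -> (dimension, scale to the base unit).
# A scale of None marks a unit whose size is unknown until rendering.
UNITS = {
    "m": ("length", 1.0),
    "cm": ("length", 0.01),
    "mm": ("length", 0.001),
    "px": ("length", None),
    "s": ("time", 1.0),
}
BASE = {"length": "m", "time": "s"}


@dataclass(frozen=True)
class Quantity:
    number: float
    unit: str  # a key of UNITS, or "%" for a percentage


@dataclass(frozen=True)
class Sum:
    terms: tuple  # symbolic sum of Quantity values, kept until resolve()


def _terms(value):
    return value.terms if isinstance(value, Sum) else (value,)


def add(a, b):
    terms = _terms(a) + _terms(b)
    dimensions = {UNITS[t.unit][0] for t in terms if t.unit != "%"}
    if len(dimensions) > 1:
        raise TypeError("cannot add values of different dimensions")
    if all(t.unit != "%" and UNITS[t.unit][1] is not None for t in terms):
        # Every scale is known, so fold to a simple value in the first unit.
        scale = UNITS[terms[0].unit][1]
        total = sum(t.number * UNITS[t.unit][1] for t in terms)
        return Quantity(total / scale, terms[0].unit)
    return Sum(terms)  # otherwise stay symbolic


def resolve(u, context):
    """context maps each unscaled unit (and "%") to a size in base units."""
    total, dimension = 0.0, None
    for t in _terms(u):
        if t.unit == "%":
            total += t.number / 100.0 * context["%"]
        else:
            dimension, scale = UNITS[t.unit]
            total += t.number * (context[t.unit] if scale is None else scale)
    return Quantity(total, BASE[dimension])
```

Here add(Quantity(1, "cm"), Quantity(2, "px")) yields a Sum, which resolve turns into a single length once the context says how large a pixel is.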

3.1.1 historical note

There have been several attempts in the past to define programming languages with support for units, yet current popular languages don't have it. Why is this?

We suspect it is a combination of several factors. First, such languages tried to do all the unit resolution at compile-time, representing the values internally as plain numbers. This makes it hard to write generic procedures operating on values with units (as similarly happened to Pascal's arrays before the various extensions for formal array parameters became common).

Second, the same obsession with reducing unit arithmetic to machine arithmetic meant that representations which involved store allocation (as ours can do) were not considered.

Third, while it is "nice" to be able to operate with units in scientific and engineering programs, careful commenting and explicit scaling will often serve; there simply wasn't the user pressure. However, stylesheet arithmetic needs to be able to manipulate values (such as 4px + 20%) which cannot be represented as plain numbers; once the machinery is in place to support those, the rest of the unit implementation comes cheaply.

If implementations of Spice pay careful attention to the efficient implementation of "small" values in a well-chosen set of units, much of the space (and some of the time) overhead can be saved.

4 scaling

As scripts get larger, it becomes increasingly important to avoid trivial mistakes (eg those due to misspellings) and to introduce ways in which a script can be broken cleanly into (reusable) components.

4.1 modules

Spice introduces a simple module system. This allows a script to be made up of multiple separate (reusable) components and provides the primary encapsulation mechanism. The Spice module system should present no problems (apart from new reserved words) to ECMAScript programmers.

All declarations can be marked public or private, with private being the default. A private name is simply not accessible outside its defining module.

Modules are named with a dot-separated sequence of identifiers. Spice does not require there to be any close relationship between the name of a module, the name of a file that may contain it, and the names of any directories that that file may be in; in particular, it is explicitly expected that modules with widely differing names be accessible from within a single directory.

Full module names only appear in module and import declarations. Normally a module will import other modules "in full": their public names are visible to the importing module without qualification. However, a module may be imported qualified, in which case an identifier x within an imported module eg alpha.beta.gamma is referred to as gamma::x; only the last component (the leafname) of the module name is used to qualify identifiers. Where there is ambiguity of leafnames, a module may be imported qualified with a new leafname.

This means that module names do not appear scattered through a module's body, that where they do appear they are not ridiculously long, and that name conflicts can be easily resolved.
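
The leafname rule for qualified imports can be modelled in a few lines of Python. This is a sketch only: the MODULES table, the second module name other.lib.gamma, and the functions import_qualified and lookup are hypothetical, and the actual Spice syntax for supplying a new leafname is not shown here.

```python
# Hypothetical module table: full module name -> its public names.
MODULES = {
    "alpha.beta.gamma": {"x": 1, "y": 2},
    "other.lib.gamma": {"x": 10},
}


def import_qualified(scope, full_name, leafname=None):
    """Make a module's public names reachable as leafname::name."""
    # By default only the last component of the module name is used.
    leaf = leafname or full_name.rsplit(".", 1)[-1]
    if leaf in scope:
        raise NameError("ambiguous leafname: " + leaf)
    scope[leaf] = MODULES[full_name]


def lookup(scope, reference):
    """Resolve a qualified reference such as gamma::x."""
    leaf, name = reference.split("::")
    return scope[leaf][name]
```

Importing both modules under the leafname gamma is rejected in this model; importing the second with a new leafname resolves the clash.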

4.2 avoiding gotchas: declarations

Implicit declarations (such as are found in many scripting languages) leave programs susceptible to spelling mistakes: a misspelt name silently becomes a new variable. Spice requires declarations.

4.2.1 variable declarations

Spice requires that all variables be declared; this avoids misspelling gotchas. The cost to the script-writer is insignificant (at worst the insertion of a var Name definition earlier in the procedure or top-level code).

As a recovery action, the Spice compiler can automatically declare undeclared identifiers as top-level variables. Whether or not a warning message is generated in browsers could be a matter for user preference settings.

4.2.2 typed identifiers and results

To make code easier to understand for the human reader, and to give the compiler more opportunity to check and optimise code, Spice allows variables, arguments, and procedure results to be typed.

For example, the declarations var x = 3 is Int and var y: String = "z" type their new identifiers. Only values of the specified types can be assigned to those variables. An untyped variable is implicitly typed Any, and can hold any [single] value whatsoever.

Arguments can be typed, as in function f( x: Int ) ..., where x is required to be an Int value. (This permits overloading; see later.) The result of a function can be typed, as in function f() returns Int ...

4.3 closing-keyword syntax

As a general rule, Spice allows most C-style control constructs to appear in an alternative closing-keyword form, and does the same with its new constructs. For example, the C-style

if (E) yes(); else no();

can be written instead as

if E then yes() else no() endif

The advantages of this are that compiler error reporting is better, and that adding (or removing) statements from the then or else arms does not involve adding (or removing) braces.

aside. Additionally it reduces the religious wars about where such braces should be written.

Also it is easier to see the end of a construct, because it is automatically tagged in a way that matches the beginning (making it redundant to attach explanatory comments to closing braces, as is sometimes seen in C code).

5 expressive power

"Expressive power" means, roughly, that it's easy for the programmer to write code that does common things, and that there are ways of re-using code that let you "think big thoughts".

5.1 array expressions

Spice has an explicit array expression; [E1, E2, ..., En] evaluates to an array whose first component is the value of E1 and so on.

aside. The description above takes no account of multiple values.

5.2 procedure expressions

It is very useful to be able to write a procedure-valued expression without having to define a named function, usually remote from the place where it is used. Spice borrows from other languages the notion of a lambda expression; the form (args => expression), where args is a sequence of arguments as seen in a procedure declaration, represents an anonymous procedure with arguments args and body expression.

Spice lambda expressions can refer to, and update, variables in scope where the lambda expression was written; Spice has full lexical scope.

Spice has a further shorthand for lambda expressions where the body is a procedure call or operator expression where the operands are the arguments to the lambda; those operands may be replaced by holes, written _ possibly followed by an integer, and the lambda-arguments (and arrow) dropped.

Thus _ + 1 is shorthand for (x => x + 1) [where x can be any fresh identifier].
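
For readers more familiar with other languages, the two ideas above have close Python analogues. The sketch below is an analogy rather than Spice semantics; the names add_one and make_counter are invented for the example, and numbered holes are not modelled.

```python
# What the hole expression `_ + 1` abbreviates: (x => x + 1).
add_one = lambda x: x + 1


def make_counter():
    """Return a procedure that updates a variable in its defining scope."""
    count = 0

    def bump():
        nonlocal count  # full lexical scope: the enclosing variable is updated
        count += 1
        return count

    return bump
```

Each call to make_counter produces an independent counter, because each closure captures its own count.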

5.3 multiple values

Spice introduces expressions with multiple values. The principal reason for this is to allow procedure calls to deliver multiple results (thus avoiding the need for reference parameters and equivalent hacks).

Multiple values are generated by the comma operator; x1, x2 represents the collection of values consisting of all the values from x1 followed by all the values from x2. This is how the arguments to a procedure call are collected; as a multiple value formed from the argument expressions.

A procedure call (or the execution of an operator) consumes all the multiple argument values; "extra" ones do not escape. The procedure (or operator) checks that the right number of arguments have been supplied, and raises an exception if not.

An assignment may involve multiple values, in which case the target(s) of the assignment must consume exactly as many values as are provided by the source expression. Thus (x,y)=(1,2) assigns 1 to x and 2 to y, and all of (x,y)=1 and x=(1,2) and (x,y)=(1,2,3) are illegal.

The source expression is evaluated first, and then the assignments are done right-to-left. Thus (x,y)=(y,x) will exchange the values of x and y.

A declaration may also accept multiple values: eg var (x, y) = (1, 2) declares x and y, and assigns 1 to x and 2 to y.
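
The counting rules for assignment can be illustrated with a small Python model in which a multiple value is a tuple. This is a sketch under assumptions: the functions values and assign, and the use of a dictionary as the variable environment, are hypothetical stand-ins for what a Spice implementation would do internally.

```python
def values(*parts):
    """The comma operator: concatenate the values of each part in order."""
    result = ()
    for part in parts:
        result += part
    return result


def assign(env, targets, source):
    """Assign the values in `source` to the names in `targets`."""
    if len(targets) != len(source):
        raise ValueError(
            f"{len(targets)} target(s) cannot consume {len(source)} value(s)")
    # The source has already been evaluated; store right-to-left.
    for name, value in reversed(list(zip(targets, source))):
        env[name] = value
```

Because the source tuple is built before any store happens, assign(env, ("x", "y"), values((env["y"],), (env["x"],))) exchanges x and y, mirroring (x,y)=(y,x).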

implementation aside. We intend that Spice multiple values can be implemented using a single value stack, plus local copies of the top-of-stack pointer or stack depth counter.
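
One way to picture this is the following Python sketch of a single value stack, where a procedure call remembers the stack depth before its arguments are pushed and uses the difference to count them. The class ValueStack and the function call are hypothetical; a real implementation would work at a much lower level.

```python
class ValueStack:
    """A single stack shared by all expressions (hypothetical model)."""

    def __init__(self):
        self.items = []

    def mark(self):
        # A local copy of the stack depth, taken before arguments are pushed.
        return len(self.items)

    def push(self, *values):
        self.items.extend(values)

    def pop_since(self, mark):
        # Remove and return everything pushed since the mark was taken.
        values = tuple(self.items[mark:])
        del self.items[mark:]
        return values


def call(stack, procedure, arity, push_arguments):
    """Evaluate arguments onto the stack, check their count, push results."""
    mark = stack.mark()
    push_arguments(stack)
    args = stack.pop_since(mark)  # the call consumes all argument values
    if len(args) != arity:
        raise TypeError(f"expected {arity} argument(s), got {len(args)}")
    stack.push(*procedure(*args))  # results may be zero, one or many values
```

A procedure returning two results simply leaves two values on the stack for the enclosing expression to consume.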

Loops (see section 5.5) may deliver multiple values from their break and with result clauses. In addition, if the do is followed by the reserved word all, all the results from the loop body contribute to the result of the loop. Thus

for i from 1 to 10 do all i endfor

delivers ten values, the integers from 1 to 10. This is particularly useful for array construction, where the entire array value may be built in one go, rather than allocating an array of the required size and then initialising its elements one by one.

5.3.1 controlling multiple values

The expression one E evaluates E and picks the first of its values, throwing away the rest; if E delivers no values, one delivers the special value absent.

The expression none delivers no values, and the expression none E evaluates E but discards all its values.
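
Continuing the model in which a multiple value is a Python tuple, these operators and the all form of the loop might be sketched as follows. The names ABSENT, one, none and loop_all are assumptions for this illustration only.

```python
ABSENT = object()  # stand-in for Spice's special value absent


def one(values):
    """Pick the first value, or absent if there are none."""
    return (values[0],) if values else (ABSENT,)


def none(values=()):
    """Discard all values; with no argument, simply deliver no values."""
    return ()


def loop_all(first, last):
    """Model of `for i from first to last do all i endfor`."""
    return tuple(i for i in range(first, last + 1))
```

An array expression such as [for i from 1 to 10 do all i endfor] then corresponds to list(loop_all(1, 10)) in this model.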

5.4 loops and switches are expressions

In Spice loops (and switches, and ifs) are expressions which return values. The result from a switch or if is the result from the arm that is executed. The result from a loop is given by a value attached to the break that it executes, if any, and otherwise by the expression in a with result clause. This is convenient for searches.

In a statement sequence S1; S2 any results from S1 are discarded.

5.5 loops

Spice has additional loop structures to those in ECMA-262.

The easy addition is the until loop, which is just a while loop with the test inverted; it is available simply to help loops read better.

The more complex addition is a general (and powerful) for loop syntax which allows several different collections of values to be iterated over at the same time, and without the programmer having to write explicit stepping code.

The for loop introduces a set of bindings which specify that a variable (thereby declared, local to the loop, and immutable) take on the values from some collection in turn until one of the collections is exhausted or until a break is executed or a while or until terminates the loop. For example

for ch in "alphabetical": String do ... ch ... endfor

will bind ch to successive elements of "alphabetical" in turn. The :String informs the compiler (and the human reader) that this is a String iteration (which is obvious when it's a string literal) and allows the compiler to generate efficient string access code. (In particular, it does not need to do bounds-checking on each access to an element of the string.)

for i from 1, ch in s: String with result absent do
    if ch == lookFor then break i endif
endfor

This binds i to the integers from 1 upwards, and ch to successive elements of the string s. If one of those elements is equal to lookFor then the loop terminates delivering the appropriate value of i; otherwise it terminates delivering absent.

Note again that the compiler can use unchecked access to the elements of s and lightly-checked arithmetic to increment i, because both identifiers are under its control and the type of the iteration is tightly constrained.

This multiple-iteration approach, combined with loops as expressions and multiple values, makes many searches straightforward.
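
For comparison, the search loop above corresponds closely to the following Python, where zip stops as soon as either sequence is exhausted. This is an analogy, not a translation scheme; the function name find_position is invented, and None stands in for absent.

```python
from itertools import count

ABSENT = None  # stand-in for Spice's absent


def find_position(s, look_for):
    """Return the 1-based position of look_for in s, or ABSENT."""
    # i runs over 1, 2, 3, ... while ch runs over the elements of s.
    for i, ch in zip(count(1), s):
        if ch == look_for:
            return i  # corresponds to `break i`
    return ABSENT  # corresponds to `with result absent`
```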

aside. The iteration techniques for a given type may be known by the compiler as special cases, or may be part of the specification for the types that may appear after the colon; this protocol is not yet fixed.

5.6 switches

Spice allows switch expressions. The value switched on may be any Spice value; it is not restricted to small integers. In particular, switching on String values allows simple table lookups to be done conveniently in-line (for example, decoding a small command set from an input string).

The case clauses in a Spice switch body are not as free and easy as those of C (to reduce gotchas). Each clause consists of a non-zero number of case E: prefixes, followed by a statement sequence. The statement sequence is terminated by the next clause, or by a default:. The expression E is not required to be a compile-time constant, and may deliver multiple values.

When the switch expression is executed, the switched value is evaluated and reduced to a single value as for one E. If there is a compile-time constant case label equal (as tested by ==, which is structural equivalence) to this value, the corresponding statement sequence is executed. Otherwise, the run-time case labels are tested for equality with the switched value, and the first matching one gets its statement sequence executed. If no labels match, the default is executed; if there isn't one, it is as though default: none had been supplied.

This allows the traditional switch on integer values, and a less traditional switch on String values. Spice compilers are expected to do better for String case labels than a simple linear search (for example, by hashing the constant labels and doing a preliminary hash-test on the switched value).
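
The matching order described above (constant labels first, by hashing; then run-time labels in order; then the default) might be modelled in Python like this. It is a sketch under assumptions: make_switch is a hypothetical helper, statement sequences are represented as zero-argument functions, constant labels are assumed to be hashable, an empty tuple stands for "no values", and case labels that deliver multiple values are not modelled.

```python
def make_switch(constant_cases, runtime_cases=(), default=lambda: ()):
    """Build a switch from constant labels, run-time labels and a default.

    constant_cases: dict mapping a constant label to its body.
    runtime_cases:  sequence of (label_function, body) pairs, tested in order.
    default:        body run when nothing matches (by default, no values).
    """
    table = dict(constant_cases)  # hashed once, when the switch is prepared

    def switch(value):
        if value in table:  # preliminary hash test on the switched value
            return table[value]()
        for label, body in runtime_cases:
            if label() == value:  # run-time labels are evaluated when tested
                return body()
        return default()

    return switch
```

Decoding a small command set then needs one dictionary lookup for the constant labels rather than a linear scan.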

5.7 enumerations

Many programs call for a small set of named values, distinguished from all other values, to be used as handy labels for states -- eg the labels on nodes of a parse tree, or of display types. Languages without these enumeration values typically have to imitate them with constant named integers, which loses all type-safety.

Spice has a shorthand enumeration declaration

define enum EName = Name1, ..., Namen enddefine

to define enumeration values. EName is made the name of a new class type (see elsewhere), with the Namei being constants bound to instances of that type. Each instance has an associated index, which is i for Namei, and a name, which is the obvious name.

Because enumerations are simply shorthand for classes, no new linguistic machinery is needed to handle them.

aside. However, implementors are encouraged to use a concise immediate representation for enumeration values.
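
To show that nothing beyond classes is needed, here is a Python sketch that expands an enumeration into a class with named instances. The helper define_enum and the attribute names index and name are assumptions chosen to match the description above; indices start at 1 because Namei has index i.

```python
def define_enum(ename, *names):
    """Create a class called ename with one instance per name."""
    cls = type(ename, (), {"__repr__": lambda self: self.name})
    for index, name in enumerate(names, start=1):
        member = cls()
        member.index = index  # i for Namei
        member.name = name    # the obvious name
        setattr(cls, name, member)  # bind the constant to the instance
    return cls
```

For example, Colour = define_enum("Colour", "Red", "Green", "Blue") gives Colour.Green.index == 2, and Colour.Red is distinct from the integer 1, which is the type-safety that named integer constants lose.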

6 taming properties: classes, methods, slots, and overloading

The property mechanism of ECMAScript is a neat general-purpose mechanism which allows values to be associated with arbitrarily many named slots of objects. Unfortunately it suffers from serious scalability defects.

While speed and space concerns might be addressed by clever implementations, we feel that it is better to have explicit language constructs that the compiler can optimise freely.

6.1 property declarations

To address gotchas, Spice requires that properties be declared explicitly, using the syntax property Name = Value. "Standard" properties are either implicitly declared by the language or imported using the module system (see elsewhere).

However, Spice replaces many of the uses of properties by method and slot declarations in classes.

6.2 class declarations

The notion of prototypes in ECMAScript, where they appear only as dynamic values with no compile-time syntax, is replaced in Spice by classes and class definitions.

The present ECMAScript approach to methods-on-objects relies on (a) the inheritance of properties from prototypes and (b) the assignment of functions into (typically prototype) properties; Spice replaces these with method and slot [instance variable, data member] declarations.

A class definition defines a new class object with a specified name, and says what slots objects of that class have. Each class has an associated prototypical object. All objects have a (fixed) associated class, which is returned by typeof, and the names of classes are legal type names for variable and argument declaration.

6.2.1 method definitions

The typical ECMAScript code for setting up methods is replaced by Spice definitions and actions as shown in the table.

  ECMAScript                                           Spice
  Create a prototype object P                          Declare a class P
  Define a free-standing function f to be the method   Define a method f in the class P
  Assign f to P.f                                      (no additional action required)
  Fetch the method by referring to x.f                 Refer simply to f
  Invoke by x.f(A)                                     Invoke by x.f(A)

The principal differences are that the definition of the method replaces a function definition and an assignment, and that the method is accessed simply by name rather than as a property; the calling syntax remains the same.

define class Example
define method show() { print( this ) }
enddefine

Or in the traditional style of syntax:

class Example
{
method show()
{
    print( this );
}
}

method seems preferable to function here, to make it clear that this is a method definition rather than a function.

6.2.2 slot definitions

Just as method definitions both define what happens when a method is called and attach that definition to the class, slot definitions define slots of a class and provide access to those slots.

define class Example2
    slot x
    slot y: Int
enddefine

Instances of Example2 have two slots, x and y; y is required to be of type Int.

If E is an instance of Example2, then E.x gets the value of E's x-slot and E.y gets the value of E's y-slot.

aside. The use of slot rather than var is intended to make it more obvious that the declaration is that of an object property. If you want to share the value of a slot with all instances of a class, you can mark it as shared.

6.3 uniform reference

Many programming languages [eg Java, C++] espouse object-orientation because it supports isolating the details of representation from the use of a type, yet they still distinguish between access to the value of a slot of a data-structure and calling a function bound to that data-structure.

Spice does not do this. The syntax x.f is a call to the procedure f with argument x. The syntax x.f(A) is a call to the procedure f with arguments (x,A). [And thus x.f and x.f() are equivalent.] Slot declarations cause procedures to be generated for each slot; in Example2, x and y are both defined as methods which extract the appropriate slots from instances of Example2.

This uniform approach to methods/functions/slots means that a developer is free to change how an attribute of an object is implemented; it may be extracted with a monadic method or it may be a slot -- the calling syntax is the same. Further, methods are "just" functions with an implicit this argument.

Further, it is not just functions that are callable -- other object classes are callable, too, in particular properties; hence x.p, where p is a property, will "call" p (which in turn will do a property lookup for x).

6.3.1 updaters

Spice's uniform reference would be worthless if it prevented slots from being updated; how do we preserve the notation x.p = E to assign the value of E to x's p-slot or property?

Every procedure may have an updater. When a procedure call appears as the target of an assignment, it is its updater that is invoked. Every slot procedure has an updater that does the obvious thing. User-defined procedures may also be given updaters (by declaring them with special syntax).
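
The combination of uniform reference and updaters can be modelled in Python as below. This is a sketch, not Spice's mechanism: the class Procedure, its update method (standing for a call used as an assignment target), the helper slot, and the computed attribute total are all hypothetical.

```python
class Procedure:
    """A callable that may carry an updater (hypothetical model)."""

    def __init__(self, function, updater=None):
        self.function = function
        self.updater = updater

    def __call__(self, *args):
        return self.function(*args)

    def update(self, value, *args):
        # Models `p(args) = value`, a call used as an assignment target.
        if self.updater is None:
            raise TypeError("procedure has no updater")
        self.updater(value, *args)


def slot(name):
    """Generate the access procedure, with its updater, for a slot."""

    def get(obj):
        return vars(obj)[name]

    def put(value, obj):
        vars(obj)[name] = value

    return Procedure(get, put)


class Example2:
    def __init__(self, x, y):
        vars(self).update(x=x, y=y)


x = slot("x")  # what `slot x` would generate
y = slot("y")  # what `slot y: Int` would generate (type check omitted)
total = Procedure(lambda e: x(e) + y(e))  # a computed attribute, no updater
```

With e = Example2(1, 2), the calls x(e) and total(e) look the same to the caller, so a slot can later be replaced by a computed method without changing call sites; x.update(5, e) plays the role of e.x = 5.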

6.3.2 overloading

If two classes both define slots called thing, what happens?

Clearly it's not good enough to declare one of the classes the "winner" and forget the other definition for thing; we'd like to allow x.thing to extract the appropriate value from x, whichever class it is.

Spice permits procedures to be overloaded, that is, to have multiple definitions with different argument types. If an overloaded procedure is called, the "most closely matching" definition is invoked.

Slot and method definitions are overloaded on their first (implicit) argument. Thus different classes may have slots and methods with the same name without interference.

Procedures may be overloaded on any of their arguments, not just the first. This means that script writers do not have to write multiple-dispatch code "by hand" when required (ie, implement the so-called "visitor pattern"); Spice overloading subsumes this.

Spice does not limit overloading to methods defined in classes; any function definition can be overloaded. Thus there is no artificial line between "methods in a class" and "functions outside a class".

aside. Spice overloading is dynamic, not static, although if the compiler can optimise away the method dispatch it may. Thus an overloaded method can be passed as a parameter and used on objects of different types within the called procedure -- unlike, say, C++ and Java.
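
A minimal Python sketch of dynamic overloading on all arguments follows. The rule used here for "most closely matching" (the smallest total distance along each argument's class hierarchy) is an assumption for illustration, since this document does not define the rule; the class Overloaded and the example classes are likewise hypothetical.

```python
class Overloaded:
    """A procedure with several definitions, chosen by argument types."""

    def __init__(self, name):
        self.name = name
        self.definitions = []  # list of (argument types, function)

    def define(self, *types):
        def register(function):
            self.definitions.append((types, function))
            return self  # keep the name bound to the overloaded procedure
        return register

    def __call__(self, *args):
        best, best_cost = None, None
        for types, function in self.definitions:
            if len(types) != len(args):
                continue
            cost = 0
            for arg, wanted in zip(args, types):
                mro = type(arg).__mro__
                if wanted not in mro:
                    break  # this definition does not accept the argument
                cost += mro.index(wanted)  # 0 means an exact class match
            else:
                if best_cost is None or cost < best_cost:
                    best, best_cost = function, cost
        if best is None:
            raise TypeError(f"no definition of {self.name} matches")
        return best(*args)


class Shape: pass
class Circle(Shape): pass
class Square(Shape): pass

overlap = Overloaded("overlap")

@overlap.define(Shape, Shape)
def overlap(a, b):
    return "general test"

@overlap.define(Circle, Circle)
def overlap(a, b):
    return "compare radii"
```

Dispatch happens at call time on the classes of both arguments, so overlap can be passed around as an ordinary value and still select the right definition.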

6.3.3 dynamic method definition

Spice does not presently have a mechanism for dynamic method definition (whereas present ECMAScript has only that).

If the syntactic approach that Spice uses is inadequate in practice for scripting, then additional procedure-call syntax can be added to give similar effects.