Evolving ECMAScript to add style rules and boost performance

A comparison between ECMAScript 262 and Spice

Dave Raggett <dsr@w3.org>
25th September 1998

This version incorporates a number of syntax changes inspired by discussions at ECMA, for instance the use of 'class' rather than 'prototype' and the ability to use reserved words as identifiers.

Abstract

Application developers frequently find that the limitations of HTML get in the way. Wouldn't it be nice if you could just add your own tags and write a few lines of script to handle them? Well, HP has been busy developing a solution based upon a new way to combine style sheets and scripting, building upon CSS and ECMAScript. The result, called "Spice", allows you to add new style properties and to define or import support for new ways to render XML tags. This paper suggests how to extend ECMAScript 262 to add style rules and boost performance for flexible object-oriented scripting.

HP has submitted Spice to the World Wide Web Consortium, see

http://www.w3.org/TR/1998/NOTE-spice-19980123.html

Style Rules

Style rules are an addition to ECMAScript and necessitate the introduction of "style" as a new reserved word. The application of style rules to markup trees is handled through libraries. This allows for different choices in how XML is parsed, e.g. do you keep the entire parse tree or just the ancestor elements and immediately preceding peers; do you apply style rules eagerly as tags are parsed or do you wait until the parse tree has been fully built?

Style rules conform to the definitions in the W3C CSS2 specification for the precedence of selectors and the cascading mechanism for associating style properties with elements in the markup parse tree. Note that you are not restricted to the properties defined in CSS2, and are free to use others as appropriate to new kinds of flow objects.

A style rule consists of the style keyword followed by a CSS2 selector and a property list:

style em { fontstyle: italic; display: inline }

Style rules are sorted by specificity for the tag to which they apply:

style p { fontfamily: "Times New Roman", serif; display: block }

style p.note { fontfamily: "MS Comic Sans", "sans-serif" }

As a result, p tags with the class "note" get the properties:

   fontfamily: "MS Comic Sans", "sans-serif"
   display: block

while other p tags get the properties:

   fontfamily: "Times New Roman", serif
   display: block
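The ordering can be sketched in ECMAScript as follows. This is a minimal sketch, assuming simple selectors of the form tag, tag#id and tag.class only; the names parseSelector, specificity, matches and cascade are hypothetical, and combinators, !important and style sheet origins are ignored. Matching rules are applied from least to most specific, so the p.note rule overrides the fontfamily set by the p rule while display: block is kept.

```javascript
// Simple selectors only: an optional tag, an optional #id and any number of
// .class parts, in that order (a simplifying assumption).
const SIMPLE_SELECTOR = /^([A-Za-z][\w-]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$/;

function parseSelector(selector) {
  const m = SIMPLE_SELECTOR.exec(selector);
  if (!m) throw new Error("unsupported selector: " + selector);
  return {
    tag: m[1] || null,
    id: m[2] || null,
    classes: m[3] ? m[3].slice(1).split(".") : [],
  };
}

// CSS2 specificity: [number of ids, number of classes, number of tag names],
// compared lexicographically.
function specificity(sel) {
  return [sel.id ? 1 : 0, sel.classes.length, sel.tag ? 1 : 0];
}

function compareSpecificity(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function matches(sel, element) {
  if (sel.tag && sel.tag !== element.tag) return false;
  if (sel.id && sel.id !== element.id) return false;
  const have = element.classes || [];
  return sel.classes.every((c) => have.includes(c));
}

// Apply matching rules from least to most specific (ties keep source order),
// so that more specific rules override less specific ones.
function cascade(rules, element) {
  const applicable = rules
    .map((rule, order) => ({ sel: parseSelector(rule.selector), order, properties: rule.properties }))
    .filter((r) => matches(r.sel, element));
  applicable.sort(
    (a, b) => compareSpecificity(specificity(a.sel), specificity(b.sel)) || a.order - b.order
  );
  const result = {};
  for (const r of applicable) Object.assign(result, r.properties);
  return result;
}

// The two rules from the text, with property values kept as source text.
const rules = [
  { selector: "p", properties: { fontfamily: '"Times New Roman", serif', display: "block" } },
  { selector: "p.note", properties: { fontfamily: '"MS Comic Sans", "sans-serif"' } },
];
```

With these rules, cascade(rules, { tag: "p", classes: ["note"] }) yields the MS Comic Sans family together with display: block, while a p element without the class keeps the Times New Roman family.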

Hyphenation

CSS uses hyphens within style property names. This presents problems in a scripting language, as the hyphen will be confused with the infix minus operator. As a result, Spice doesn't allow hyphens in property names: font-family, for instance, is written as fontfamily.

What about hyphens in tag names and attribute names? One idea would be to allow hyphens when immediately preceded by a backslash character, e.g. my\-tag, which matches "my-tag". Alternatively, you could use the hexadecimal Unicode character escape sequence, as in my\u002Dtag. Note that a general escaping mechanism is needed to cater for the liberal definition of NameChar in the XML 1.0 specification. This should perhaps apply to all ECMAScript identifiers.
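The following sketch shows how a lexer might map such escaped names back to the characters they stand for. It assumes that \u is followed by exactly four hexadecimal digits, as in ECMAScript string literal escapes, and that a backslash before any other character simply quotes that character; decodeIdentifier is a hypothetical name.

```javascript
// Decode an escaped identifier such as my\-tag or my\u002Dtag into the name
// it stands for. Assumptions: "\u" takes exactly four hex digits; a backslash
// before any other character quotes that character.
function decodeIdentifier(source) {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    const hex = source.slice(i + 2, i + 6);
    if (source[i + 1] === "u" && /^[0-9A-Fa-f]{4}$/.test(hex)) {
      out += String.fromCharCode(parseInt(hex, 16)); // \uXXXX escape
      i += 5; // skip "u" and the four hex digits
    } else if (i + 1 < source.length) {
      out += source[i + 1]; // quoted character, e.g. \-
      i += 1;
    } else {
      throw new Error("dangling backslash in " + source);
    }
  }
  return out;
}
```

Both spellings from the text decode to the same name, so a selector written as my\-tag or my\u002Dtag would match the XML element my-tag.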

Flow Objects

The display property names the flow object class for formatting the markup element the rule applies to. The above examples use inline flow objects for em elements and block for p elements.

Flow objects are imported from libraries or written directly in the scripting language. The process of applying the style rules to a markup tree creates a corresponding hierarchy of flow objects.

All flow objects must support a small set of methods:

   object.format(element);  // format markup element
   object.append(child);    // append flow object as child

Additional methods support rendering to a window, reformatting upon changes, e.g. when the window is resized, and dispatching of events.

Markup elements must support the properties:

   element.attributes;      // the set of attributes
   element.content;         // the element's content
   element.style;           // the element's style properties

Each attribute must support the properties:

   attribute.name;          // e.g. href
   attribute.value;         // e.g. monica.jpg

The format method allows the flow object to look at the element's attributes and style properties to determine what styles to adopt. The format method is called after the new flow object has been appended to its parent flow object. This ensures that the flow object can ask its parent flow object for inherited styles, for example font size and family.
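A minimal sketch of this protocol in ECMAScript, assuming that style properties have already been attached to each element and that a table maps display values to flow object classes. Only format(element) and append(child) come from the text; the FlowObject class, its parent, children and style fields, the inherited() helper, the build() function and the choice of inherited properties are all hypothetical. The format method resolves a percentage font size against the parent, which is only possible because append is called first.

```javascript
// Properties looked up through the parent chain (an assumed, partial list).
const INHERITED = new Set(["fontsize", "fontfamily"]);

class FlowObject {
  constructor() {
    this.parent = null;
    this.children = [];
    this.style = {};
  }

  // append(child): add a child flow object and link it back to this one.
  append(child) {
    child.parent = this;
    this.children.push(child);
  }

  // Look up a style property locally, then through the parent chain if the
  // property is inherited.
  inherited(name) {
    if (Object.prototype.hasOwnProperty.call(this.style, name)) return this.style[name];
    if (this.parent && INHERITED.has(name)) return this.parent.inherited(name);
    return undefined;
  }

  // format(element): adopt the element's style properties. A percentage font
  // size is resolved against the parent's size, which works only because the
  // object has already been appended to its parent.
  format(element) {
    Object.assign(this.style, element.style);
    const size = this.style.fontsize;
    if (typeof size === "string" && size.endsWith("%") && this.parent) {
      this.style.fontsize = (parseFloat(size) / 100) * this.parent.inherited("fontsize");
    }
  }
}

// Build the flow object tree: pick the class named by the display property,
// append first, then format. Text content is skipped in this sketch.
function build(parentFlow, element, classes) {
  const flow = new classes[element.style.display]();
  parentFlow.append(flow);
  flow.format(element);
  for (const child of element.content) {
    if (typeof child === "object") build(flow, child, classes);
  }
  return flow;
}
```

For an inline element with fontsize "150%" inside a block with fontsize 12, the inline flow object ends up with a font size of 18.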

At a minimum, style property values can be parsed as a sequence of lexical tokens, represented as text strings. For greater efficiency, you can represent property values as symbolic expressions. Simple property values are identifiers such as bold or center, or string literals such as "Times New Roman". Compound property values may include both comma-separated and space-separated items. You can also use operators like + and / in expressions, and functions such as rgb(128, 128, 255).
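As an illustration of the token-level representation, the following sketch splits a property value into strings, numbers, numbers with units, identifiers and punctuation. The token kinds and the tokenize function are hypothetical, and numbers without a leading digit (such as .5) are not handled.

```javascript
// Token patterns: a string literal, a number with an optional unit suffix,
// an identifier, or a single punctuation/operator character.
const TOKEN = /\s*(?:"((?:[^"\\]|\\.)*)"|(\d+(?:\.\d+)?)(%|[A-Za-z]+)?|([A-Za-z_]\w*)|([,()+\-*\/]))/y;

function tokenize(value) {
  const tokens = [];
  TOKEN.lastIndex = 0;
  while (TOKEN.lastIndex < value.length) {
    const start = TOKEN.lastIndex;
    const m = TOKEN.exec(value);
    if (m === null) {
      if (value.slice(start).trim() === "") break; // only trailing spaces left
      throw new Error("unexpected character at offset " + start);
    }
    if (m[1] !== undefined) {
      tokens.push({ kind: "string", text: m[1] }); // text between the quotes
    } else if (m[2] !== undefined && m[3] !== undefined) {
      tokens.push({ kind: "dimension", value: Number(m[2]), unit: m[3] });
    } else if (m[2] !== undefined) {
      tokens.push({ kind: "number", value: Number(m[2]) });
    } else if (m[4] !== undefined) {
      tokens.push({ kind: "ident", text: m[4] });
    } else {
      tokens.push({ kind: "punct", text: m[5] });
    }
  }
  return tokens;
}
```

An unquoted sans-serif comes out as the three tokens sans, - and serif, which illustrates the hyphen problem discussed earlier.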

Spice defines a number of units, such as "cm" for lengths, as in 12cm. Other common units are "px" for pixels, "pt" for points and "em" for ems, where one em is the same length as the current font height. Less common are "s" for seconds and "Hz" for frequency. "%" is used for percentages, e.g. for expressing the font size relative to that used by the parent flow object.

Units

The common case is a number followed immediately by an identifier indicating the units of measure, e.g.

12pt        // height in points
2.5cm       // distance in centimetres
3s          // time in seconds
1000Hz      // frequency in Hertz
120%        // 120 percent

To allow units to be used with variables, a different syntax is appropriate to avoid syntactic ambiguities. For instance:

units(x, pt)    // x is in points

Spice allows you to declare units in terms of other units. In some cases this allows the compiler to reduce expressions at compile time. For some units such as the em (the height of the current font) you don't know the conversion factor at compile time. The meaning of the % unit also depends on the context, e.g. the height of the parent font, or the distance between the current left and right margins. The evaluation of symbolic expressions involving units is carried out in the context of an environment that provides the means to convert units. This environment is set up by flow objects to contain the current definition of 1pt, 1em and 100% etc. Note that the conversion of lengths such as points and inches to pixels is not fixed and can be adjusted to zoom the document's contents.
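The environment can be sketched as a plain object supplied by the current flow object. The field names pxPerPt, emPx and percentBasePx are hypothetical, as is the choice of pixels as the common target unit; the fixed factors follow from 1in = 72pt = 2.54cm and 1pc = 12pt. Zooming the document then amounts to scaling pxPerPt.

```javascript
// Fixed conversions between absolute lengths, expressed in points.
const POINTS_PER_UNIT = new Map([
  ["pt", 1],
  ["pc", 12],
  ["in", 72],
  ["cm", 72 / 2.54],
  ["mm", 72 / 25.4],
]);

// Convert a length to pixels in an environment supplied by the flow object:
//   pxPerPt:       pixels per point (changing it zooms the document)
//   emPx:          the current font size in pixels, i.e. the value of 1em
//   percentBasePx: what 100% means here, e.g. the parent's font size
function toPixels(value, unit, env) {
  if (unit === "px") return value;
  if (POINTS_PER_UNIT.has(unit)) return value * POINTS_PER_UNIT.get(unit) * env.pxPerPt;
  if (unit === "em") return value * env.emPx;
  if (unit === "%") return (value / 100) * env.percentBasePx;
  throw new Error("unknown length unit: " + unit);
}

// Zooming only changes the environment, not the style rules.
function zoom(env, factor) {
  return { ...env, pxPerPt: env.pxPerPt * factor };
}
```

Conversions between absolute units could be folded at compile time, whereas em and % must wait until a flow object has set up the environment.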

Units can be handled in a similar fashion to automatic semicolon insertion. One way to think of this is as a filter between the lexer and the parser. The filter inserts missing semicolons and recognizes tokens acting as units of measure, labeling them as postfix operators. This approach avoids the need to treat units as reserved words, and leaves you free to use compiler generator technologies such as yacc for the parser.
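A minimal sketch of such a filter, covering the units part only (semicolon insertion is omitted). It assumes the lexer records the source offsets of each token, so the filter can tell 12pt from 12 pt, and it uses a fixed set of unit names rather than declared units. The toy lexer, the token shape and the "postfix-unit" label are all hypothetical.

```javascript
// A toy lexer that records where each token starts and ends.
function lexWithOffsets(src) {
  const re = /\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(\S))/g;
  const tokens = [];
  let m;
  while ((m = re.exec(src)) !== null) {
    const text = m[1] || m[2] || m[3];
    const start = m.index + m[0].length - text.length;
    const type = m[1] !== undefined ? "number" : m[2] !== undefined ? "ident" : "punct";
    tokens.push({ type, text, start, end: start + text.length });
  }
  return tokens;
}

// Units known to the filter (in Spice these would come from unit declarations).
const UNITS = new Set(["pt", "px", "pc", "em", "cm", "mm", "in", "s", "Hz"]);

// The filter: an identifier naming a unit that immediately follows a number,
// with no intervening space, is relabelled as a postfix operator.
function* unitFilter(tokens) {
  let prev = null;
  for (const tok of tokens) {
    const isUnit =
      tok.type === "ident" && UNITS.has(tok.text) &&
      prev !== null && prev.type === "number" && prev.end === tok.start;
    yield isUnit ? { ...tok, type: "postfix-unit" } : tok;
    prev = tok;
  }
}
```

For 12pt + x * 2 pt the first pt becomes a postfix operator, while the final pt, separated from its number by a space, is passed on as an ordinary identifier.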

The % token is an interesting case, as it also serves as the remainder operator. The filter can use a simple rule to identify whether a % token should be considered as the postfix percent operator or as the infix remainder operator: choose the infix role when the % is followed by a term that unambiguously starts an expression, otherwise choose the postfix role. By making % bind more tightly than +, the expression "10% + 3" is always interpreted as (10%) + 3, while in "10% (+3)" or "10 % 3" the % is interpreted as remainder.
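The rule can be sketched as a second pass over the token stream. Where exactly to draw the line for a term that unambiguously starts an expression is an assumption here: numbers, identifiers, an opening parenthesis and the prefix-only operators ! and ~ count, while + and - do not, because they may be binary operators following a postfix %. The toy lexer and the role labels are hypothetical.

```javascript
// A toy lexer: numbers, identifiers and single punctuation characters.
function lexTokens(src) {
  return (src.match(/\d+(?:\.\d+)?|[A-Za-z_]\w*|\S/g) || []).map((text) => ({
    type: /^\d/.test(text) ? "number" : /^[A-Za-z_]/.test(text) ? "ident" : "punct",
    text,
  }));
}

// Tokens that can only begin an operand (an assumed list).
function startsOperand(tok) {
  if (tok === undefined) return false;
  if (tok.type === "number" || tok.type === "ident") return true;
  return tok.type === "punct" && ["(", "!", "~"].includes(tok.text);
}

// Relabel each % token as the infix remainder or the postfix percent operator.
function classifyPercent(tokens) {
  return tokens.map((tok, i) => {
    if (tok.type !== "punct" || tok.text !== "%") return tok;
    const role = startsOperand(tokens[i + 1]) ? "infix-remainder" : "postfix-percent";
    return { ...tok, type: role };
  });
}
```

Giving postfix % a higher precedence than + in the grammar then lets the parser read 10% + 3 as (10%) + 3.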

Importing Flow Object Classes

You will often want to use existing libraries of flow object classes. Such libraries could be written in Java, C++ or even ECMAScript. The import statement allows you to import named flow objects.

Interoperability across vendors and platforms is crucial to the Web as it greatly increases the number of people who can read each document. This in turn encourages the creation of content, and helps to explain why the Web has grown so dramatically.

Spice style sheets support interoperability by decoupling flow objects from their implementations. A Spice library (aka spice rack) names a set of flow objects with support for particular style properties. Each library is identified by a URL.

Libraries are decoupled from their implementations. This makes it practical to provide different implementations of a library for each platform. Each flow object has a name such as "paragraph" that is local in scope to the library in which it is defined. The import statement is used to import flow objects from libraries, e.g.

    import document, block, inline from "http://www.w3.org/Style/std.lib";

Here document, block and inline are names of flow objects from the (hypothetical) library identified by the URL <http://www.w3.org/Style/std.lib>.

The implements statement is used to specify implementations for particular libraries, e.g.

    "css.spice" implements "http://www.w3.org/Style/std.lib" on "Spice";
    "css.jar" implements "http://www.w3.org/Style/std.lib" on "Java";
    "css.cab" implements "http://www.w3.org/Style/std.lib" on "ActiveX/win32";

Here "css.spice", "css.jar" and "css.cab" are relative URLs, resolved relative to the URL of the current style sheet. You can also use absolute URLs. The on keyword precedes a string naming the platform that the implementation applies to.

In the absence of a matching implements statement, the import statement expects to get the implementation from the URL specified by the from keyword. If that too is missing, you can simply list the URLs for the files you want to import, for instance:

    import "housestyle.css";    // imports a CSS style sheet
    import "koolbits.spice";    // imports a Spice style sheet
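The resolution order described above can be sketched as a function that picks the implements entry matching the current platform, falls back to the URL given after from, and resolves relative URLs against the style sheet's own URL using the standard WHATWG URL class. The entry layout and the name resolveImplementation are hypothetical, and the style sheet URL in the example is a placeholder.

```javascript
// Pick the implementation URL for a library on a given platform.
function resolveImplementation(libraryUrl, platform, implementsEntries, styleSheetUrl) {
  const match = implementsEntries.find(
    (e) => e.library === libraryUrl && e.platform === platform
  );
  // No matching implements statement: fall back to the URL after "from".
  const location = match ? match.location : libraryUrl;
  // Relative locations are resolved against the current style sheet's URL.
  return new URL(location, styleSheetUrl).href;
}

// The implements statements from the text, expressed as data.
const implementsEntries = [
  { location: "css.spice", library: "http://www.w3.org/Style/std.lib", platform: "Spice" },
  { location: "css.jar", library: "http://www.w3.org/Style/std.lib", platform: "Java" },
  { location: "css.cab", library: "http://www.w3.org/Style/std.lib", platform: "ActiveX/win32" },
];
```

A Java-based browser loading a style sheet from http://example.org/styles/site.spice would fetch http://example.org/styles/css.jar, while a platform with no matching entry would fall back to the library URL itself.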

Greater Performance

As the scripting language becomes more powerful, its performance will correspondingly become more important. Unfortunately, ECMAScript 262 presents barriers to effective compilation:

The difficulties

Objects are bound to a set of properties. This set can be changed dynamically. The binding can be represented as a view from objects to properties or as a view from properties to objects. ECMAScript allows each property to be accessed by name or by number. The numbers needn't be contiguous.
Accessing a named property of an object involves a lookup process. The cost can be reduced by caching the result either with the indexing mechanism on the heap, or as local pointers in function definitions.
Properties may be inherited from prototypes, which may be changed dynamically. This needs to be taken into account in determining when the cache is invalid.
For function definitions, the compiler can allocate local pointers to cache lookups for properties accessed within the function definition. Code within loops can then use the local pointer rather than the heavyweight lookup mechanism.
If the code includes a function call, the compiler can't be sure that the local pointer remains valid after the call. Unfortunately, this is a particularly common case. Threading would also pose problems if two threads accessed the same property concurrently.
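The caching and invalidation problem can be illustrated with an explicit object model in ECMAScript. This is a minimal sketch under a deliberately crude assumption: any change to the set of properties of any object, or to any prototype link, bumps a global epoch that invalidates every cached lookup. Each LookupSite plays the role of a local pointer in a compiled function; because a call may run code that bumps the epoch, the site re-checks the epoch on every use instead of trusting the cache after the call. The names Obj, LookupSite and epoch are hypothetical.

```javascript
// Global counter bumped whenever any object's property set or prototype changes.
let epoch = 0;

class Obj {
  constructor(proto = null) {
    this.own = new Map();
    this.proto = proto;
  }
  // Adding a new own property may shadow an inherited one, so it bumps the
  // epoch; updating the value of an existing property does not.
  define(name, value) {
    if (!this.own.has(name)) epoch++;
    this.own.set(name, value);
  }
  setProto(proto) {
    this.proto = proto;
    epoch++;
  }
}

// Slow path: walk the prototype chain to find the object holding `name`.
function findHolder(obj, name) {
  for (let o = obj; o !== null; o = o.proto) {
    if (o.own.has(name)) return o;
  }
  return null;
}

// One cached access site, standing in for a local pointer in compiled code.
class LookupSite {
  constructor(name) {
    this.name = name;
    this.obj = null;
    this.holder = null;
    this.epoch = -1;
    this.misses = 0;
  }
  get(obj) {
    // Re-check on every use: any intervening call may have bumped the epoch.
    if (obj !== this.obj || this.epoch !== epoch) {
      this.misses++;
      this.obj = obj;
      this.holder = findHolder(obj, this.name);
      this.epoch = epoch;
    }
    return this.holder === null ? undefined : this.holder.own.get(this.name);
  }
}
```

The cache stores the object that holds the property rather than its value, so plain value updates stay visible without invalidation, while shadowing or a prototype change forces a fresh walk.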

A fresh approach

Dramatic improvements in speed and memory requirements can be achieved by a number of adjustments to the language semantics. The first step is to provide an explicit syntax for declaring the properties and methods of classes. For instance:

  class Warning extends block
  {
      method format(element)
      {
          this.style.borderStyle = solid;
          this.append(new Text("Warning!"));
          ProcessChildren(element, this);
      }
  }

The next step is to preclude the ability to change methods by assigning a new method at run-time. This ensures that the compiler can exploit a static dispatch mechanism. This limitation is not unduly restrictive, as languages such as C++, Smalltalk and Java demonstrate. In ECMAScript 262 you had to define class methods by assigning functions to properties, but this need goes away with the new syntax.
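In present-day ECMAScript the effect of this restriction can be approximated by freezing a class prototype, so that a call such as warning.format(element) always reaches the method defined in the class. This is a sketch in standard ECMAScript class syntax rather than Spice; Block is a hypothetical stand-in for the library flow object, the appended child is just a string, and the ProcessChildren step from the example above is omitted.

```javascript
// Hypothetical stand-in for a library flow object class.
class Block {
  constructor() {
    this.style = {};
    this.children = [];
  }
  append(child) {
    this.children.push(child);
  }
}

class Warning extends Block {
  format(element) {
    this.style.borderStyle = "solid";
    this.append("Warning!"); // a real flow object would append a Text object
  }
}

// Freeze the prototypes so that methods cannot be replaced at run time.
Object.freeze(Block.prototype);
Object.freeze(Warning.prototype);
```

After the freeze, assigning a new function to Warning.prototype.format, or to format on an instance, is silently ignored in sloppy mode and throws a TypeError in strict mode.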

To allow efficient code to be generated for local variables, the compiler wants to be able to allocate variables to known machine registers or known positions on the stack frame. This becomes possible if you can identify the scope of a variable statically, which precludes the ECMAScript with statement and the use of eval with local variables. This is hard to work around, because you don't know whether or not a procedure you call invokes these to look at this function's environment.
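The payoff of static scoping is that every local variable reference can be resolved at compile time to a fixed address. The sketch below shows this lexical addressing step, assuming each scope is represented simply as an ordered list of the names it declares; the scope layout, the example scopes and the resolve function are hypothetical.

```javascript
// scopes[0] is the innermost scope; each scope lists its declared names in
// slot order. Returns the (depth, slot) address of `name`, or null when it is
// not a local and must be looked up as a global at run time.
function resolve(scopes, name) {
  for (let depth = 0; depth < scopes.length; depth++) {
    const slot = scopes[depth].indexOf(name);
    if (slot !== -1) return { depth, slot };
  }
  return null;
}

// Hypothetical scopes for a loop inside a method format(element):
// the loop block declares i and total, the method declares element and flow.
const formatScopes = [
  ["i", "total"],
  ["element", "flow"],
];
```

A with (obj) statement would insert a scope whose names are only known once obj exists, and a direct eval such as eval("var extra = 0") in a non-strict function can add a local at run time; either breaks the assumption that these addresses are fixed.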

Yet another way to improve performance is to allow people to provide optional type information, for instance to specify that a given variable will always be a number. For large data structures, knowing the types can dramatically reduce the time the garbage collector spends tracing the heap.
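ECMAScript typed arrays, which postdate this paper, give a flavour of what such type information buys. When every element is known to be a number, a Float64Array stores raw 64-bit doubles instead of general values; since the buffer holds no references, a collector has nothing inside it to trace. How much this saves in practice depends on the engine. The helper names numberColumn and total are hypothetical.

```javascript
// A column of numbers stored as raw 64-bit doubles.
function numberColumn(values) {
  return Float64Array.from(values);
}

// Sum a column; every element is known to be a number.
function total(column) {
  let sum = 0;
  for (let i = 0; i < column.length; i++) sum += column[i];
  return sum;
}
```

Storing a non-number into such a column converts it with the usual number conversion, so the column can never hold a reference to an object.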

ECMAScript 262 reserves a number of words for future use. Spice takes advantage of these where appropriate, but needs to reserve a few more, e.g. "style" and "method". To allow the language to grow it should be possible to add to the set of reserved words in revisions to the language. At the same time, older scripts in which these words have been used as variables should continue to work. How can this be arranged?

One idea is for names occurring in variable declarations, on the left-hand side of assignments, or as class or function names to be treated as identifiers regardless of whether or not they are reserved words. An exception is when the word has previously been used in its intended role as a reserved word. This makes it practical to detect typographical errors where an identifier has been mistyped so that it clashes with a reserved word.
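A minimal sketch of this rule, working on a pre-split list of tokens. The word list and the test for a name position are simplified assumptions: a word directly after var, function or class, or directly before a single =, is in name position. Treating later uses of a word that was declared as a variable as identifiers is also an assumption, made so that older scripts keep working; classifyWords is a hypothetical name.

```javascript
// Words reserved by the revised language (assumed list).
const NEW_RESERVED = new Set(["style", "method"]);

function classifyWords(tokens) {
  const usedAsKeyword = new Set();
  const declared = new Set();
  return tokens.map((tok, i) => {
    if (!NEW_RESERVED.has(tok)) return { text: tok, role: "other" };
    const prev = tokens[i - 1];
    const next = tokens[i + 1];
    const namePosition = ["var", "function", "class"].includes(prev) || next === "=";
    if (namePosition) {
      // A name that clashes with a word already used as a keyword: likely a typo.
      if (usedAsKeyword.has(tok)) return { text: tok, role: "error" };
      declared.add(tok);
      return { text: tok, role: "identifier" };
    }
    // Assumption: once declared as a variable, later uses refer to it.
    if (declared.has(tok)) return { text: tok, role: "identifier" };
    usedAsKeyword.add(tok);
    return { text: tok, role: "keyword" };
  });
}
```

An old script that declares var style keeps working, while a script that has already written a style rule gets an error if it later declares a variable called style.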

Further details are given in a companion paper.