ECMAScript Proposals

Dave Raggett, 10th December 1998

This is a summary of the ideas that have been discussed for extending ECMAScript to support types, classes, packages, and units. It is intended to stimulate discussion on the details and to prepare the ground for drafting text suitable for inclusion in the ECMAScript version 2 specification.

  1. Packages
    1. The import statement
    2. The package statement
  2. Declaring Types
    1. Core Types
    2. Type declaration syntax
    3. Defining types before use in type declarations?
    4. Typeof Operator
    5. Types of function arguments
    6. Function return type
    7. Arrays
  3. Classes
    1. Basic Syntax
    2. When is this needed?
    3. Early versus late binding
    4. Class variables and methods
    5. Static code
    6. Private, Protected and Public
    7. Static versus Dynamic Properties
    8. Visibility and early/late binding
  4. Interfaces
    1. Sharing a function across several interfaces
    2. Binding the same name to different functions
    3. Casting an object to an interface
  5. Units
    1. Declaring Units
    2. Deferred Evaluation
    3. The percentage unit

1 Packages

The fact that the wireless community have felt the need (in WMLscript) to provide a packaging scheme even on the smallest platforms only serves to emphasise the need to add a package mechanism to ECMAScript.

ECMA 262 rev 2 reserves the words package and import. It would appear natural to use these to define the package mechanism. WMLScript introduces the reserved word use for a similar purpose.

1.1 The import statement

The import statement could be used to import named packages, for instance:

import package-name [version version] [as local-name] [from  source];

The qualifiers have the following definitions:

package-name
An ECMAScript identifier or string literal specifying the package name. It must correspond to the name declared with the package statement in the package itself (see below).

Are package names restricted to ECMAScript identifiers, or can they be dot separated compound names as per Java, e.g. com.acme.animation ?

version
A string literal that allows you to specify a particular version of the package.
local-name
An ECMAScript identifier which can be used to designate an identifier from a given package, even when two packages independently use the same identifier, and allows you to distinguish the different definitions.

What is the default for the local-name? The examples below assume it's the package name, or perhaps the last word if the package name is a dot separated compound name.

source
A string literal containing a URI (URL or URN) specifying the source of the package. Relative URIs are allowed and expanded using the normal rules.

Example:

   import graphics version "1.3.1" from "packages.zip";

If a package defines an identifier x which is also defined locally, the local definition always takes precedence. That is to say the x in the package is shadowed by the local definition and is therefore invisible. To get around this you can qualify the identifier by the package's short name.

We have yet to decide on the exact syntax for this, e.g. foo.x versus foo#x versus foo::x. The first one mixes the namespace for package names with identifiers for classes, functions and variables, and has the merit of mirroring Java. It would be an error to define a global variable or a class with the same name as an imported package.

The # syntax keeps the namespaces apart and borrows from URL fragment syntax, and is also used for the same purpose in WMLScript (a subset of ECMAScript defined by the WAP Forum). The :: syntax is used in C++ for binding methods to classes. The following examples assume the dot syntax.

   import graphics;    // package defines identifier x
   var x;

   x = ...             // refer to local x
   graphics.x = ...    // refer to x in package graphics

Here is an example showing how the as qualifier can be used to deal with the case where you want to import two different packages which have the same name.

   import foo from "http://acme.com/";
   import foo as bar from "http://utopia.com/";

   foo.x = ...    // refer to x in package foo from acme.com
   bar.x = ...    // refer to x in package foo from utopia.com

It is not an error to import two packages that both define the same identifier. It becomes an error, though, if the program attempts to access an identifier with multiple definitions. This can be detected when the packages are loaded or reported as a run-time error.
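The lookup rules described in this section can be sketched in present-day JavaScript. This is only an illustration under stated assumptions: makeScope, resolve and resolveQualified are hypothetical names, packages are modelled as plain maps, and the ambiguity is reported at the point of access rather than at load time.

```javascript
// Hypothetical resolver, for illustration only.
// locals:   Map of identifier -> value
// packages: Map of local package name -> Map of identifier -> value
function makeScope(locals, packages) {
  return {
    // Unqualified reference, e.g. x
    resolve(name) {
      // A local definition always shadows package definitions.
      if (locals.has(name)) return locals.get(name);
      const owners = [...packages].filter(([, defs]) => defs.has(name));
      if (owners.length === 0) {
        throw new ReferenceError(name + " is not defined");
      }
      // Importing clashing packages is fine; using the clashing name is not.
      if (owners.length > 1) {
        const names = owners.map(([pkg]) => pkg).join(", ");
        throw new ReferenceError(name + " is ambiguous: defined in " + names);
      }
      return owners[0][1].get(name);
    },
    // Qualified reference, e.g. foo.x
    resolveQualified(pkg, name) {
      const defs = packages.get(pkg);
      if (!defs || !defs.has(name)) {
        throw new ReferenceError(pkg + "." + name + " is not defined");
      }
      return defs.get(name);
    },
  };
}

const scope = makeScope(
  new Map([["x", 1]]),
  new Map([
    ["foo", new Map([["x", 2], ["y", 3], ["z", 4]])],
    ["bar", new Map([["y", 5]])],
  ])
);
```

With these definitions, scope.resolve("x") yields the local value, scope.resolveQualified("foo", "x") reaches the shadowed package value, and scope.resolve("y") raises an error because both foo and bar define y.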

Note that the URI must be placed in quotes. This avoids any problem with determining where the URI ends. Chris and Steve are also interested in future proofing the syntax to allow for further extensions, for instance to allow for further qualifiers. As a possible motivation you could think of adding message digests (cryptographic hash values computed using MD5 or SHA) so that you can detect any tampering with the package you asked for.

1.2 The package statement

It seems reasonable to declare packages via package. It should be practical to declare more than one package per file but there is no need for nested packages. The proposed syntax declares the name of the package being defined, and any language extensions used within it:

  package package-name [version version] [with  extensions];
package-name
An ECMAScript identifier or string literal specifying the package name. This is the name that import statements must use to refer to the package (see above).
version
A string literal that allows you to specify a particular version of the package.
extensions
A string literal consisting of a comma separated list of names of language extensions.

For example:

  package forms version "1.3" with "style";

The scope of the package statement includes all subsequent statements until the end of the file or the next package statement, whichever comes first.

This avoids the need for enclosing the package's contents within { and }, and the consequent urge to indent everything. It relies on packages not being nestable.

The extensions mechanism provides a way to add language extensions such as the Spice style rules without running into the normal problems associated with introducing new reserved words into programming languages. As an example, "style" is used in the style extension as a reserved word that starts the definition of a style rule. In the absence of the style extension on the package statement, any occurrence of "style" will be treated as a normal ECMAScript identifier and won't be recognized as a reserved word. This ensures that any scripts that just happen to use "style" as an identifier will continue to work unchanged.
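As a rough illustration of how a scanner might treat "style" as a reserved word only when the extension is enabled, consider the following JavaScript sketch. The names parseExtensions and classifyWord, the abbreviated core word list, and the contents of the extension table are all assumptions made for the example.

```javascript
// Abbreviated list of core reserved words (illustrative, not complete).
const CORE_RESERVED = new Set([
  "var", "function", "if", "else", "return", "while", "for", "new", "this",
]);

// Assumed table: extension name -> reserved words that extension adds.
const EXTENSION_RESERVED = new Map([["style", ["style"]]]);

// Split the string literal from "with" into a list of extension names.
function parseExtensions(list) {
  return list.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// Decide whether a word is a keyword in a package with the given extensions.
function classifyWord(word, extensions) {
  if (CORE_RESERVED.has(word)) return "keyword";
  for (const ext of extensions) {
    const words = EXTENSION_RESERVED.get(ext) || [];
    if (words.includes(word)) return "keyword";
  }
  return "identifier";
}
```

For a package declared with "style", classifyWord("style", parseExtensions("style")) reports a keyword; for a package without extensions, classifyWord("style", []) reports an ordinary identifier.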

A single file can define more than one package, but you can't define a package from multiple files, except via the import statement. In other words, to create a package from multiple files, you first need to define each file as a separate package, and then to import them to define the aggregate package. There is, however, no support for exporting code into packages.

2 Declaring Types

Hitherto ECMAScript hasn't provided any means for programmers to declare the types of variables. Adding this capability would provide two benefits: the ability to detect certain errors more easily, and the ability to generate more efficient code.

By declaring the type of a variable, the programmer is asserting that only values of this type, or of a subtype of this type, will be assigned to this variable. The default type is any, which can be used for any value. Expressions involving variables of type any may lead to run-time errors when a value has a type incompatible with the rest of the expression.

2.1 Core Types

Some fairly obvious core data types that programmers would like to use include integers, floating point numbers, and booleans. Beyond these it would be valuable to allow for character strings and other classes of objects, as well as arrays (more on these later). A void type would be useful for functions which don't return anything.

2.2 Type declaration syntax

In the teleconference discussions, there was interest in building upon widespread familiarity with Java, C and C++, and in seeking a syntax that matches this to the needs of ECMAScript. There seems to be agreement that we keep the var syntax, leading to:

  var x, int i, boolean flag, float p = 2.5857;

Now consider:

   var int x, y, z;

This raises the question as to whether y and z are of type int or of type any. Programmers familiar with C or Java will probably expect the former interpretation. If this choice is made, you could still use any to ensure that subsequent variables accept any values, for instance:

  var int x, any y, z;

An alternative to the Java/C prefix syntax for types would be to use a postfix syntax, such as that used by Pascal, e.g.

  var x, i:int, flag:boolean, p:float = 2.5857;

where x is of type any, i of type int, etc.

2.3 Defining types before use in type declarations?

For a scripting language, it doesn't feel appropriate to require people to declare all classes and function names prior to using them in expressions. How much extra work is it to allow types to be used in declarations prior to those types being defined?

2.4 Typeof Operator

The typeof operator allows programmers to obtain the type of any expression and can be used to test the type of a variable, for instance:

  if (typeof(x) == float)
     ....

2.5 Types of function arguments

Types can also be used to constrain the arguments passed to functions, for instance:

   function foo(int i, string s)
   {
   }

2.6 Function return type

ECMAScript also permits anonymous functions, e.g.

    var f = new Function(p1, p2, ..., pn, body);

where the last argument is a string containing statements to be executed when the function is called. The preceding arguments name the function's arguments.

The general feeling was that this syntax makes it undesirable to specify the function's return type before the function's name as this would interfere with the operation of the new operator. Instead a postfix syntax is proposed:

   function foo(int i, string s) returns boolean;
   {
   }

where the semicolon after the return type is optional. For anonymous functions:

    var f = new Function(p1, p2, ..., pn, body) returns boolean;

Providing type constraints on functions makes it practical to dispatch function calls on the types of the arguments. The same function name can then have as many definitions as are appropriate.
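Dispatch on argument types can be emulated in present-day JavaScript, which gives a feel for what the compiler or run-time would need to do. The sketch below is not the proposed syntax: typeName and defineOverloads are hypothetical helpers, the first matching definition wins, and treating an int as acceptable where a float is expected is an assumption of the example.

```javascript
// Map a run-time value onto the type names used in this document.
function typeName(value) {
  if (typeof value === "number") {
    return Number.isInteger(value) ? "int" : "float";
  }
  return typeof value; // "string", "boolean", ...
}

// Does a declared parameter type accept an actual argument type?
function accepts(declared, actual) {
  if (declared === "any" || declared === actual) return true;
  return declared === "float" && actual === "int"; // assumed widening rule
}

// definitions: array of { params: [type names], body: function }
function defineOverloads(definitions) {
  return function (...args) {
    const actual = args.map(typeName);
    const match = definitions.find(
      (d) =>
        d.params.length === actual.length &&
        d.params.every((p, i) => accepts(p, actual[i]))
    );
    if (!match) {
      throw new TypeError("no definition accepts (" + actual.join(", ") + ")");
    }
    return match.body(...args);
  };
}

// Two definitions sharing the one name "foo".
const foo = defineOverloads([
  { params: ["int", "string"], body: (i, s) => s.repeat(i) },
  { params: ["float"], body: (p) => p * 2 },
]);
```

Here foo(2, "ab") selects the first definition and foo(1.5) the second, while foo(true) raises a TypeError because no definition matches.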

Do we want to allow programmers to define functions that overload the ECMAScript operators such as "+"? What syntax is appropriate for this?

Is the default return type void ? Do we want to use a closer syntax to Java, i.e. drop the need to write "function" and allow the type as a prefix qualifier on the grounds that this is what most programmers will write?

2.7 Arrays

What should the type language say about arrays? One fairly obvious idea is to treat arrays as a mapping from integers to other types, e.g. from integers to strings. Should the size or the number of dimensions of an array be part of its type? This isn't strictly necessary given that ECMAScript will provide run-time bounds checking for arrays. However detecting errors at compile-time is often preferable.

The syntax for array declarations can specify the type of the array's items in several ways, for instance:

  int p[10];
  var int p[10];
  var a[64];                 // an array of type 'any'
  var p:array 0..9 of int;   // Pascal like syntax

The first two are more likely to be preferred by programmers coming from the C/Java world. Arrays of unknown size are useful when declaring the types of function arguments, and when you want to constrain a variable to be an array of a given type, for instance:

  int p[];   // an array of unknown size?

What about arrays with more than one dimension? Arrays with two dimensions are very useful for transformations in graphics code. Programmers comfortable with languages such as Basic would presumably like a syntax such as p[i,j], whereas C programmers might expect p[i][j]. The former seems better suited for a scripting language - I think it presents less mental anguish!

In our discussions, it was agreed that we limit ourselves to one dimensional arrays for now.

3 Declaring Classes of Objects

The objects ECMAScript currently supports are kinds of associative lists and preclude efficient compilation. Smaller and faster code could be generated if programmers were able to declare classes for which the inheritance path and the instance variables and methods are known at compile-time. This doesn't preclude the ability to add new properties dynamically at run-time, although these would be less efficient to access.

We are interested in supporting classes and interfaces, but want to escape the limitation in Java, where the namespaces of the interfaces implemented by an object are automatically merged in the class definition. As a result, a Java class definition is not allowed to bind the same name found on different interfaces on an object to different implementations. COM has no such restriction and the interface namespaces are truly independent.

Unlike Java, ECMAScript supports late-bound types. This means that an expression involving an access to a named property may not be resolvable until run-time. The proposal outlined below combines early and late-bound types with a means to limit visibility of properties and classes via qualifiers (private and protected). The goal is to follow Java's lead where practical.

3.1 Basic Syntax

It seems natural to build upon the syntax used in Java, e.g.

  class foo extends bar
  {
      var i, j, k;      // instance variables

      function a(b)     // instance method
      {
      }
  }

3.2 When is this needed?

The current instance of the class the method is called on is passed implicitly via the hidden argument this. I am uncertain as to whether ECMAScript currently requires programmers to write this.x when they want to access the instance variable x, or whether you can omit the "this." prefix in the body of the method's code.

3.3 Early versus late binding

Here is an example of early versus late binding:

   class Foo
   {
      var x;

      function m (Bar f, any g)  // Bar is a subclass of Foo
      {
         this.x = ...      // this is early bound
         f.x = ...         // and so is this
         g.x = ...         // this is late-bound and 
      }                    // depends on what g is
   }

   class Bar extends Foo
   {
      ...
   }

The approach applies early binding where practical and otherwise uses late binding. Type declarations and casts can be used when appropriate if greater efficiency is needed. It allows you to access properties of other objects without them being shadowed by the properties of the current object (the one identified by 'this').

3.4 Class variables and methods

Can you define variables and methods that belong to the class and not to instances of that class? If so, what syntax is appropriate? Presumably, you could use static as a prefix for such declarations, for instance:

  class foo extends bar
  {
      var i, j, k;        // instance variables
      static var size;    // class variable

      function a(b)       // instance method
      {
      }
  }

3.5 Static code

Can classes have code statements without these being embedded within methods, as in the following:

  class foo extends bar
  {
      var i, j, k;        // instance variables

      document.write("hello world!");

      function a(b)       // instance method
      {
      }
  }

where the statement document.write is presumably called when the program is loaded (i.e. once per class and not once per instance).

3.6 Private, Protected and Public

When building larger scale programs it is helpful to distinguish which identifiers are public and which are for internal use in some restricted part of the program. It seems reasonable to follow the lead set by Java, and to use packages and classes as the basis for controlling access and visibility of identifiers, as well as using the same syntax.

When you declare identifiers you can qualify them as public, protected or private. The following table summarizes the choices and follows Java:

Accessibility of Identifiers

  Accessible in                      public   protected   default   private
  Same class                         yes      yes         yes       yes
  Same package                       yes      yes         yes       no
  Subclass in different package      yes      yes         no        no
  Non-subclass, different package    yes      no          no        no

Here are examples showing the syntax for using the qualifiers:

  var x;                          // this package
  public var z;                   // any package

  public class foo extends bar    // any package
  {
      private var i, j, k;        // this class only

      protected var p;            // and subclasses

      public function a(b)        // any package
      {
      }
  }

Where the qualifiers have the following meaning:

default
The identifier is visible anywhere within this package, but not from within other packages.
private
Applicable to identifiers defined within classes. It limits visibility to this class.
protected
Like 'private' but also visible in subclasses.
public
can be accessed anywhere in any package that imports this one.

If a package B imports a package C, and is imported by yet another package A, then the public identifiers of C are not visible in package A unless it also imports C.
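The accessibility table earlier in this section can be read as a simple lookup, sketched here in JavaScript. The record shape ({ pkg, cls, ancestors }) and the names relationOf and isAccessible are hypothetical, and the sketch follows the table literally (so protected identifiers are also visible within the same package).

```javascript
// One row per "Accessible in" entry of the table.
const ACCESS = {
  "same class":
    { public: true, protected: true, default: true, private: true },
  "same package":
    { public: true, protected: true, default: true, private: false },
  "subclass in different package":
    { public: true, protected: true, default: false, private: false },
  "non-subclass, different package":
    { public: true, protected: false, default: false, private: false },
};

// accessor and owner describe where the code lives:
//   { pkg: package name, cls: class name, ancestors: [superclass names] }
function relationOf(accessor, owner) {
  if (accessor.pkg === owner.pkg && accessor.cls === owner.cls) {
    return "same class";
  }
  if (accessor.pkg === owner.pkg) return "same package";
  if (accessor.ancestors.includes(owner.cls)) {
    return "subclass in different package";
  }
  return "non-subclass, different package";
}

// qualifier is one of "public", "protected", "default", "private".
function isAccessible(qualifier, accessor, owner) {
  return ACCESS[relationOf(accessor, owner)][qualifier];
}

const owner = { pkg: "shapes", cls: "bar", ancestors: [] };
const subclassElsewhere = { pkg: "app", cls: "foo", ancestors: ["bar"] };
const strangerElsewhere = { pkg: "app", cls: "baz", ancestors: [] };
```

For example, isAccessible("protected", subclassElsewhere, owner) is true, while isAccessible("protected", strangerElsewhere, owner) is false.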

In the example given in section 3.3, the compiler can use the early bound types to generate code for this.x and f.x. The expression involving g.x is late-bound and is likely to result in less efficient code.

3.7 Static versus Dynamic Properties

If the compiler knows that a given class cannot have new properties added at run-time, it may be able to generate more efficient code to access its properties even when the object is referenced via a variable of type any. What syntax should be used to permit or disallow dynamic properties on a given class?

Should dynamic properties be allowed to have the same name as statically declared properties, and hence to shadow them?

3.8 Visibility and early/late binding

How do early and late binding interact with visibility? The following example is used to explain this:

  class C
  {
    private var n = 42;

    function f(x)
    {
       return this.n + x.n;
    }
  }

The private var 'n' is of type 'any' and initialized to the number 42. The function argument 'x' is of type 'any'. Both of these assertions assume that variables and arguments default to the type 'any' in the absence of a type declaration. How does the compiler evaluate the following expression?

    return this.n + x.n;

x is of type any, so we compile code to get the run-time value of the property named 'n' from the object x. A run-time error is raised if 'n' is either a private property or an unknown property of x. A refinement of this is to make the run-time check take into account the class from which the check was invoked (C in the example). This makes it possible to access private variables of instances of class C for late bound variables.

The programmer can choose to tell the compiler what he or she thinks the type of x should be. This can be done by casting. One way is via assignment to a variable of a known type:

  class C
  {
    private var n = 42;

    function f(x)
    {
       var C y = x;
       return this.n + y.n;
    }
  }

Another is to simply specify the type of the function argument:

  class C
  {
    private var n = 42;

    function f(C x)
    {
       return this.n + x.n;
    }
  }

Given this information, the compiler can generate code for a run-time check that the argument is indeed of type C or can be cast to it. The code for the return expression can then use an efficient means to access the private property 'n' on the object x.
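For comparison only, and not as part of this proposal, present-day JavaScript offers private class fields whose run-time check resembles the refinement described in this section: access to a private property of a late-bound object succeeds when the code doing the access belongs to the object's class, and raises an error otherwise. A minimal sketch:

```javascript
class C {
  #n = 42; // private: can only be named inside the body of class C

  f(x) {
    // x is untyped, so x.#n is checked at run time. The access succeeds
    // for any instance of C and throws a TypeError for any other object.
    return this.#n + x.#n;
  }
}

const first = new C();
const second = new C();
const sum = first.f(second); // both operands are instances of C
```

Calling first.f({}) raises a TypeError, since the plain object has no private property n belonging to C.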

4 Interfaces

An interface represents a contract which classes that implement the interface are required to honor. An interface specifies a set of variables and methods that a class must provide. For instance:

  public interface drawable
  {
     public function SetColor(Color c);
     public function SetPosition(double x, double y);
     public function Draw(Window w);
  }

The syntax for specifying the type of return values should be the same as that for declaring functions. Since we have already elected for a postfix notation, this looks like:

  interface foo
  {
     function IsCool(x) returns boolean;
  }

See syntax questions for functions in section 2.6.

4.1 Sharing a function across several interfaces

The same functions can appear in several interfaces. Indeed it is common to create a new interface that is exactly the same as an earlier one except for some minor additions. Here is an example:

 interface foo
 {
    var int n;

    function a (int x)
      returns boolean;
 }

 interface bar
 {
    function a (int x)
      returns boolean;

    function b (int x)
      returns string;
 }

 class C implements foo, bar
 {
    var int n;

    function a (int x)
       returns boolean
    {
       ....
    }

    function b (int x)
       returns string
    {
       ....
    }
 }

Here function a appears in both interface definitions. Variable n only appears in interface foo, while function b only appears in interface bar.

4.2 Binding the same name to different functions

Sometimes you want to change the name of the functions internally but need to bind these names to the former ones used in an existing interface. How should this be represented? Note that the same function may need to be bound to different names in different interfaces.

One possibility, using only words from the reserved list in ECMA 262 rev 2, is:

  class C implements I
  {
     function a implements b in interface I;
     ...
  }

The example shows how you can bind a function defined in a class to the name used for that function in a given interface. The same syntax can be used for variables by swapping "var" for "function".

  class C implements I
  {
     var x implements y in interface I;
     ...
  }

The var and function prefix is needed since ECMAScript permits variables and functions to have the same name.

4.3 Casting an object to an interface

This allows you to constrain an object to the properties and methods specified in a given interface. The casting can be done via assignment without the need for special syntax:

  interface I
  {
    void M1();
  }

  class C implements I
  {
    public void M1() { }
    public void M2() { } 
  }

  void function(C c)
  {
    var I x = c;

    x.M1();  // succeeds since M1 is in both C and I
    x.M2();  // fails since M2 is in C but not in I
  }

or via typing the function argument:

  void function (I x)
  {
    x.M1();  // succeeds since M1 is in both C and I
    x.M2();  // fails since M2 is in C but not in I
  }

The example makes the assumption that the identifiers for interfaces fall into the same namespace as those for other types, since that makes it easy to use the interface name as a type in variable declarations.
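The effect of casting an object to an interface can be approximated in present-day JavaScript with a proxy that only exposes the members the interface names. This is a sketch under assumptions: castToInterface is a hypothetical helper, the interface is reduced to a list of member names, and the failure is reported at run time rather than at compile time.

```javascript
// Hypothetical helper: wrap an object so that only interface members
// can be reached through the wrapper.
function castToInterface(object, memberNames) {
  const allowed = new Set(memberNames);
  return new Proxy(object, {
    get(target, name) {
      if (typeof name === "string" && !allowed.has(name)) {
        throw new TypeError(name + " is not a member of the interface");
      }
      const value = target[name];
      // Bind methods so that 'this' inside them is the underlying object.
      return typeof value === "function" ? value.bind(target) : value;
    },
  });
}

const I = ["M1"]; // interface I declares only M1

class C {
  M1() { return "M1 called"; }
  M2() { return "M2 called"; }
}

const c = new C();
const x = castToInterface(c, I);
```

Through x, the call x.M1() succeeds, whereas x.M2() raises a TypeError even though the underlying object c still provides M2.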

5 Units

This is about being able to write 2.54cm, 1.0in, 3s or 10% in expressions and have the compiler sort out the results. The syntax consists of a number followed immediately (no whitespace) by an identifier designating the unit of measure.

One idea we have discussed for implementing units is to use objects. You can't simply associate each unit with a different class, since operators like times and divide return values that combine the dimensions of the arguments, leading to an explosion of types. Another idea is to use closures for the results of expressions involving units whose values are not yet fully known. The following description deliberately avoids details of possible implementations.

5.1 Declaring Units

Units must be declared before they are used. This is achieved via a library call supplying the identifier and a scaling factor relating it to another unit with the same dimensions, for instance, via something like:

   System.declareUnit(in, 2.54, cm);    // 1in == 2.54cm
   System.declareUnit(Hz, null, null);  // new dimension

These declarations make it practical to detect expressions involving incompatible dimensions, e.g. the following is in error:

   var x = 10cm + 4Hz;   // distance vs frequency

Some expressions combine different dimensions, e.g.

   var speed = 10cm/5s;          // result is 2 cm per s
   var distance = speed * 10s;   // result is 20cm

This can be implemented by internally representing units of measure as a set of tuples of the form: <unit, signed-integer>. When adding values with the same dimensions, but different units, the compiler should generate code that minimises the loss of accuracy.
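The tuple representation can be sketched in JavaScript as a map from base unit to signed exponent. This is one possible reading rather than a settled design: declareUnit mirrors the System.declareUnit call shown above but takes strings, the helpers quantity, add, mul and div are hypothetical, and values are simply converted to the base unit without any attempt to minimise loss of accuracy.

```javascript
// Registry: unit name -> { base: base unit name, factor: scale to base }.
const units = new Map();

function declareUnit(name, factor, other) {
  if (other === null) {               // a new dimension
    units.set(name, { base: name, factor: 1 });
    return;
  }
  const o = units.get(other);
  if (!o) throw new Error("undeclared unit " + other);
  units.set(name, { base: o.base, factor: factor * o.factor });
}

// A quantity is a value in base units plus tuples <unit, exponent>.
function quantity(value, unitName) {
  const u = units.get(unitName);
  if (!u) throw new Error("undeclared unit " + unitName);
  return { value: value * u.factor, dims: { [u.base]: 1 } };
}

function sameDims(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const k of keys) {
    if ((a[k] || 0) !== (b[k] || 0)) return false;
  }
  return true;
}

function add(p, q) {
  if (!sameDims(p.dims, q.dims)) throw new TypeError("incompatible dimensions");
  return { value: p.value + q.value, dims: { ...p.dims } };
}

// Multiply (sign = +1) or divide (sign = -1): exponents add or subtract.
function combineDims(p, q, sign) {
  const dims = { ...p.dims };
  for (const [unit, exponent] of Object.entries(q.dims)) {
    const sum = (dims[unit] || 0) + sign * exponent;
    if (sum === 0) delete dims[unit]; else dims[unit] = sum;
  }
  return dims;
}
const mul = (p, q) => ({ value: p.value * q.value, dims: combineDims(p, q, +1) });
const div = (p, q) => ({ value: p.value / q.value, dims: combineDims(p, q, -1) });

declareUnit("cm", null, null);
declareUnit("in", 2.54, "cm");   // 1in == 2.54cm
declareUnit("s", null, null);
declareUnit("Hz", null, null);

const speed = div(quantity(10, "cm"), quantity(5, "s"));  // dims: cm^1 s^-1
const distance = mul(speed, quantity(10, "s"));           // dims: cm^1
```

With these definitions add(quantity(10, "cm"), quantity(4, "Hz")) raises a TypeError, while add(quantity(1, "in"), quantity(1, "cm")) is accepted because both operands reduce to the same base unit.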

5.2 Deferred Evaluation

In some situations, expressions involving units need to be deferred until the value of some unit is known. An example is the em unit, which is commonly used in style sheets and signifies the current font height. The percentage unit is another:

   width = 10%;       // can't be fully evaluated at compile time
   left = 2cm + 10%;  // likewise

When the variables width and left appear on the right-hand side of an assignment, they are evaluated in the current context, which may or may not provide a definition for the current value of "%".

The programmer can set and reset the context for particular units by calls to a library function, e.g. something like:

   System.setUnit("%", 6in);
   System.resetUnit("%");

To allow for future support of threading, these calls should modify the local context rather than the global context. It seems a little bit error prone to have to reset the unit back to its previous value upon returning from a function etc. This suggests it would be beneficial if the compiler could do this automatically upon returning from the function in which the unit was set. You would still need the ability to manually reset the unit if you made more than one call to set the unit's value, e.g. in a loop.

In discussion, some people have said they would like to pass the context for the current definition of units to the code that evaluates the expression, rather than have the expression make calls to access the current definitions. This is a matter for implementations to choose and should be invisible to script writers. The script writer still needs the ability to set/reset the current definition, e.g. for units like % or em.
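One way to picture deferred evaluation together with setting and resetting a unit is the JavaScript sketch below, which uses closures for deferred values and a stack of definitions per unit. All names here (setUnit, resetUnit, currentUnit, withUnit, percent, cm, plus) are hypothetical, lengths are held in centimetres, and the context is a single module-level map rather than a per-thread one.

```javascript
// Context: unit name -> stack of definitions (most recent last).
const context = new Map();

function setUnit(name, value) {
  if (!context.has(name)) context.set(name, []);
  context.get(name).push(value);
}

// Restore the definition that was current before the matching setUnit.
function resetUnit(name) {
  const stack = context.get(name);
  if (!stack || stack.length === 0) {
    throw new Error("no definition to reset for " + name);
  }
  stack.pop();
}

function currentUnit(name) {
  const stack = context.get(name);
  if (!stack || stack.length === 0) {
    throw new Error("unit " + name + " has no current definition");
  }
  return stack[stack.length - 1];
}

// Deferred values are closures, evaluated only when called.
const cm = (n) => () => n;
const percent = (n) => () => (n / 100) * currentUnit("%");
const plus = (a, b) => () => a() + b();

// Set a unit for the duration of body and reset it automatically,
// even if body throws.
function withUnit(name, value, body) {
  setUnit(name, value);
  try {
    return body();
  } finally {
    resetUnit(name);
  }
}

const left = plus(cm(2), percent(10));  // 2cm + 10%, not yet evaluated
const sixInches = 6 * 2.54;             // 6in expressed in cm
const result = withUnit("%", sixInches, () => left());
```

Evaluating left() outside withUnit raises an error because "%" has no current definition, which corresponds to the case where the context does not provide one.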

5.3 The percentage unit

This is widely used in style sheets for things like the current line height or the width between the current left and right margins. It doesn't always signify a distance value, though; for instance, it may be used to represent how loud a given sound should be played.

Percent is also defined in ECMAScript as an infix operator returning the remainder from an integer division of its operands. The question arises as to how to distinguish whether in any given situation % is being used as a unit or as a remainder operator.

The remainder operator interpretation applies when the % is followed by an identifier, a numeric literal (as in the first example below), or a prefix operator other than plus or minus. Otherwise the unit interpretation should be used.

  10%20         // 10 remainder 20
  10%-20        // 10% less 20
  30px+10%      // 30 pixels plus 10%
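The rule can be sketched as a one-character lookahead in JavaScript. The function classifyPercent is hypothetical; skipping whitespace before the lookahead, treating a digit as the start of an operand (inferred from the 10%20 example), and limiting the prefix operators to ! and ~ are assumptions of the sketch, which covers only the cases listed above.

```javascript
// Classify the '%' found at index i of the source text as either the
// remainder operator or the percentage unit.
function classifyPercent(source, i) {
  if (source[i] !== "%") throw new Error("no % at index " + i);

  // Look at the first non-whitespace character after the '%'.
  let j = i + 1;
  while (j < source.length && /\s/.test(source[j])) j++;
  const next = source[j];

  if (next === undefined) return "unit";                 // end of input
  if (/[A-Za-z_$]/.test(next)) return "remainder";       // identifier
  if (/[0-9.]/.test(next)) return "remainder";           // numeric literal
  if (next === "!" || next === "~") return "remainder";  // prefix operator
  return "unit";                                         // includes + and -
}

// Convenience wrapper for the first '%' in a piece of text.
function classifyFirstPercent(source) {
  return classifyPercent(source, source.indexOf("%"));
}
```

Applied to the three examples, classifyFirstPercent gives "remainder" for 10%20 and "unit" for both 10%-20 and 30px+10%.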