Subject:  Re: HP ECMAScript comments
From:  MIKE_KSAR@HP-PaloAlto-om4.om.hp.com @ D06AU010
SendTo:  e-tc39@ECMA.CH
ReplyTo:  MIKE_KSAR@HP-PaloAlto-om4.om.hp.com
PostedDate:  19.05.98 22:40:31

TC39 technical experts,

Here is some input from an experienced Unicode implementer, Ken
Whistler of Sybase, who is also one of the technical directors of
Unicode.  Ken responds to Tom McFarland's feedback on I18N issues for
version 2.  I recommend that TC39 technical experts review Ken's
feedback and retrieve and study the documents he refers to in his
message, which are available on the Unicode website.  As Ken says, the
issue applies not only to ECMAScript but to all programming languages.

Mike

______________________________ Forward Header __________________________________
Subject: Re: HP ECMAScript comments
Author:  Non-HP-kenw (kenw@sybase.com) at HP-PaloAlto,shargw3
Date:    5/19/98 3:06 PM



Mike,

I concur with most of Tom's comments. Although I am
not familiar with the details of ECMAScript, the concerns
he raises are all valid and should be addressed by any
language standard that uses Unicode as its reference
character set.

To this I would add that one elegant way for a programming
language standard to sidestep some of the complexities
introduced by alternative representations of the same text
elements (particularly for Latin precomposed letters) is
to define valid program text in terms of a *normalized* form of
Unicode.

Programming standards have been loath to use the normalized
*decomposed* form of text, even though that is the most
elegant way to handle the problem, partly because it expands
the program text, but also for the practical reason that many
systems do not yet handle the combining forms correctly.
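
For illustration, here is a sketch of that expansion, using the
String.prototype.normalize() method that a much later edition of
ECMAScript standardized (an anachronism here, included only to make
the point concrete):

    // U+00E9 is the precomposed LATIN SMALL LETTER E WITH ACUTE.
    const precomposed = '\u00E9';

    // Canonical decomposition (NFD) splits it into a base letter plus
    // a combining mark, doubling its length in code units.
    const decomposed = precomposed.normalize('NFD');

    console.log(precomposed.length); // 1  ('é')
    console.log(decomposed.length);  // 2  ('e' + U+0301 COMBINING ACUTE ACCENT)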

The alternative is to use a normalized *composed* form of
text. That is what Mark Davis' document

ftp://ftp.unicode.org/WorkingGroups/Properties/wdutr-Composition1.2.html

attempts to define rigorously. If a programming language
standard were to specify canonical composition per that
document as its normalized form for program text, then many
of the complications raised in Tom's comments would drop away.
Binary comparison of Unicode identifiers and strings would then be valid.
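
As a sketch of that payoff, again using the later-standardized
normalize() method: once both representations are reduced to the
canonical composed form (NFC), a plain binary comparison suffices:

    const a = 'caf\u00E9';  // 'café' with precomposed é
    const b = 'cafe\u0301'; // 'café' with combining acute accent

    console.log(a === b);                                   // false: code units differ
    console.log(a.normalize('NFC') === b.normalize('NFC')); // true: same canonical form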

The committee working on I18N for ECMAScript V2 should study the
Java Collator class, of course, but also the Java resource
bundles for localized data. To gain perspective on the
implementation guidelines for Unicode, they should also look
at the Unicode Technical Reports published or drafted
on the Unicode website:

http://www.unicode.org/unicode/reports/techreports.html

Among these is the Draft UTR #10 on Unicode Collation, which describes
in great detail the Unicode recommendations and data files for
support of multilingual, culturally correct collation.
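
For a sense of what such collation looks like in practice, here is a
sketch using the Intl.Collator facility that ECMAScript eventually
acquired in ECMA-402 (again an anachronism, for illustration only):

    const words = ['zebra', 'échelle', 'apple'];

    // A naive sort compares code units, so 'é' (U+00E9) lands after 'z'.
    console.log([...words].sort());
    // ['apple', 'zebra', 'échelle']

    // A collator applies language-aware, culturally correct ordering.
    const collator = new Intl.Collator('en');
    console.log([...words].sort(collator.compare));
    // ['apple', 'échelle', 'zebra']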

Regarding some of Tom's other comments:

11.8.5 Comparison operators

"to pay for the cost of doing a locale-sensitive sort"

I would characterize this rather as a "culturally correct sort",
so as not to beg the question of how the collation is defined.
"Locale-sensitive" tends to reflect a model that presumes the
existence of a "locale" that the application queries, whereas
that is not how Java does it and may not be the right way for
ECMAScript to do it.
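
Java's approach, which the later Intl facility also takes (shown here
purely as an illustration), is to pass the collation conventions
explicitly as a parameter rather than query one ambient locale; the
same data then sorts differently under different conventions:

    // 'ä' sorts near 'a' under German conventions...
    console.log(new Intl.Collator('de').compare('ä', 'z')); // negative: 'ä' < 'z'

    // ...but after 'z' under Swedish conventions, where it is a
    // distinct letter near the end of the alphabet.
    console.log(new Intl.Collator('sv').compare('ä', 'z')); // positive: 'ä' > 'z'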

11.9.3 Equality operators

Introducing normalization of program text would keep the basic
equality operators simple.

As Tom points out, this does not obviate the need to define
levels of equality under collation as well.
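
The sensitivity option of the later Intl.Collator (used here only to
illustrate the idea) sketches what such levels of equality might look
like:

    const base   = new Intl.Collator('en', { sensitivity: 'base' });
    const accent = new Intl.Collator('en', { sensitivity: 'accent' });

    // Base level: letter identity only; accents and case are ignored.
    console.log(base.compare('resume', 'résumé') === 0);   // true

    // Accent level: accents distinguish, case is still ignored.
    console.log(accent.compare('resume', 'résumé') === 0); // false
    console.log(accent.compare('RESUME', 'resume') === 0); // true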

15.13.5.39 toLocaleString()

"Need some way to specify which locale will be used"

Java has a mechanism for this all spelled out, making use of the
java.text.Format class (and its subclasses DateFormat, NumberFormat, etc.).
It would be good for ECMAScript to follow that well-thought-out
model, if possible.
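
The Intl formatters that ECMAScript later gained mirror those Java
classes and show one shape such a mechanism could take, with the
locale passed explicitly (again an anachronistic sketch):

    const n = 1234567.891;

    // Intl.NumberFormat parallels Java's NumberFormat.
    console.log(new Intl.NumberFormat('de-DE').format(n)); // '1.234.567,891'
    console.log(new Intl.NumberFormat('en-US').format(n)); // '1,234,567.891'

    // Intl.DateTimeFormat parallels Java's DateFormat.
    const d = new Date(Date.UTC(1998, 4, 19));
    console.log(new Intl.DateTimeFormat('de-DE', { timeZone: 'UTC' }).format(d)); // '19.5.1998'

    // toLocaleString itself later accepted an explicit locale argument.
    console.log(n.toLocaleString('de-DE')); // '1.234.567,891'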

--Ken

--------------------------------------------------------------------------------
FROM: kenw@sybase.com
TO: MIKE_KSAR@hp.com
CC: tommc@hp.com,
    kenw@sybase.com