[dart-misc] Dart Language and Library Newsletter (2017-10-13)

'Florian Loitsch' via Dart Misc

2017-10-13 21:25:35 UTC

Github link for this newsletter:
https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md

Earlier newsletters: https://github.com/dart-lang/sdk/tree/master/docs/
newsletter

Dart Language and Library Newsletter

Welcome to the Dart Language and Library Newsletter.
<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#follow-up>Follow
Up
<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#evaluation-order>Evaluation
Order

Last week we discussed the evaluation order of method calls, like
o.foo(bar()). As some readers correctly pointed out, the section was
partially misleading and sometimes based on wrong assumptions. Thanks for
the feedback.

We claimed that there was an exception to the left-to-right evaluation in
the spec. That's misleading: the is spec is careful (and it uses a lot of
otherwise unnecessary machinery) to say that in e.foo(e1, ..., eN), e.foo is
not an expression that is evaluated. As such, it is meaningless to talk
about the "order of evaluation" of method calls, when there are no
expressions that are evaluated.

Even after our change - evaluating getters first when they are used as
call-targets - it is not meaningful to talk about the evaluation order of
expressions, since o.foo still isn't an expression.

We also claimed that the expected behavior was to evaluate from left to
right. We missed the fact that some method-based languages only look up the
method before doing the call. C# and Java, both evaluate the arguments
before reporting nullerrors. In contrast, JavaScript and C++ (at least for
virtual functions in gcc) report null errors before evaluating the
arguments to the call.

JavaScript actually changed their behavior in ECMAScript 5.0 (see es5
<http://es5.github.io/#x11.2.3>). The change was that GetValue(ref) in step
2 is now evaluated before evaluating the arguments. The reason for this
change was for similar reasons: initially, ECMAScript did not have getters,
but with getters it made sense to evaluate the GetValue(ref) first, so that
the evaluation went from left to right.

Fundamentally, the question boils down to: Is o.foo(bar()) the same as
(o.foo)(bar())?

With the proposed change it will be for most programs. There are some
issues with noSuchMethod when a function is invoked with the wrong number
of arguments, but this corner-case is only relevant in broken programs.
<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#did-you-know>Did
You Know?
<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#doubletostring>
double.toString

Dart has two number types: integers and floating-point numbers. For the
latter, Dart uses the IEEE double-precision floating-point type. There are
some interesting alternatives, like unums
<https://en.wikipedia.org/wiki/Unum_(number_format)>, but these are not
directly supported by our CPUs and are not yet universally available.

In this section I will focus on Dart's toString methods of double. There
are 4 methods:

- toString
- toStringAsExponential
- toStringAsFixed
- toStringAsPrecision

The double.toString method is the most commonly used way of converting a
double to a string. It uses a mixture of toStringAsExponential and
toStringAsFixed to provide the most visibly appealing output. For large
numbers (more than 20 digits) it emits an exponential representation as in
1.3e+21. The same is true for very small floating-point numbers, strictly
smaller than the one represented by 0.000001. For example 0.000000998 is
printed as 0.98e-7. All other numbers are printed in the "normal" decimal
representation: 5.0, 1.23, or 0.001.

The other methods just force a specific representation. I encourage readers
to experiment with them, but I won't explain them in this section.

This is, however, only the easy part of printing a floating-point number.
The much harder part is to compute the individual digits. Converting
floating-point numbers to strings is surprisingly hard. Not only do
floating-point numbers have multiple possible string representations,
computing them requires some thought.

In general, the minimal requirement we want is that the output of toString can
be parsed back into a floating-point number with double.parse and yields
the same number. This is called the *internal identity requirement*. For a
long time, C libraries were known to not even satisfy this requirement.
Lack of communication (no Internet) was probably a big reason why it took
the libraries so long.

The first known publication of a correct algorithm was alread in 1984. In
J. T. Coonen's there is a correct (and very efficient) algorithm for
correct conversions. That thesis (or at least the algorithm in it) went
largely unnoticed...

It took until 1990 when G. L. Steele Jr. and J. L. White published their
printing-algorithm (called "Dragon") at a big conference (PLDI) that C
libraries and languages finally started to do the "right" thing. (Note that
a draft of the paper existed for a long time and was even mentioned in
"Knuth, Volume II" in 1981).

There are two reasons why converting floating-point numbers to strings is
difficult:

1. Loss of precision doing operations.
2. Multiple possible representations.

<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#loss-of-precision>Loss
of Precision

The term "floating-point" refers to the fact that the decimal point can
"float". It's helpful to look at examples in decimal representation.
Imagine a floating-point number type that allows for 5 decimal digits, and
2 exponent digits. This would allow for numbers in the range 00001e-99 to
99999e99 (and, of course, 0 itself). This covers a really big range, but
the precision is often pretty poor. The best approximation of the number
123456789 would be 12346e04. Let's see what happens, if we add 1 to that
number. The correct result would be 123460001. Since that number doesn't
fit, we need round again, and end up with 12346e04 again. This means that
the 1 was lost due to imprecision. Readers can try the same experiment in
IEEE double floating point by writing 1e20 + 1. The 1 is simply lost.

For a long time algorithms used doubles themselves to compute the digits of
another double. However, this can lead to imprecision, because every
operation on the input introduces rounding errors.

The correct way to compute the digits of a double is to use a number type
that has a higher precision than the input. There are algorithms that
manage to compute the *correct* output with extended precision
floating-point numbers (aka "long doubles"). Other algorithms use 64 bit
integers (which is more precise than the 53 bit precision of doubles).

While a slightly bigger data type is enough to compute the *correct* result,
it doesn't always yield the *optimal* string representation. There are
multiple representations for every floating-point number and some require
more work than others.
<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#multiple-possible-representations>Multiple
Possible Representations

Floating-point numbers have a precise value, but, for a specific
configuration, are spaced out, so that two consecutive numbers always have
a gap between them. Getting back to our decimal floating point numbers, we
can see that the two consecutive numbers 12345e42 and 12346e42 are at a
distance of 1e42 from each other. For simplicity we think of floating-point
numbers as representing intervals. Any real value x that lies closer to one
floating-point number f than to the neighboring numbers is part of the
interval represented by f. One of the qualities of floating-point numbers
is that the gap between two consecutive numbers (and thus the interval) is
dependent on the value of the number: the smaller the number, the smaller
the gap.

For a string-conversion this means that any output that lies within the
interval satisfies the internal identity requirement. Programming languages
then have to decide which properties they value when a user asks for the
conversion.

C libraries tend to provide the most precise representation, but let users
decide how many of those digits they want.

Languages that were designed after 1990 generally tend to prefer
representations that use the shortest amount of precision digits. This
includes Java, C#, ECMAScript, and Dart.

Let's look at an example: 0.3.

The most precise representation of this number is:
0.299999999999999988897769753748434595763683319091796875.

In C users can now ask for a specific amount of digits (6 by default):

#include <stdio.h>
int main() {
printf("%f\n", 0.3); // => 0.300000
printf("%.54f\n", 0.3); // =>
0.299999999999999988897769753748434595763683319091796875
}

Modern C libraries make it possible to print floating-point numbers with
any precision, but they don't tell users where they can stop. Modern (C11)
libraries provide macros (like DBL_DECIMAL_DIG) to give users a way to emit *at
least* enough digits to be able to read them in again, but in many cases
(like 0.3) there are shorter valid representations, and there is no way in
C to just emit the shortest valid representation.

In Dart, 0.3.toString() simply returns "0.3". That doesn't mean that Dart
(like ECMAScript) will always return the shortest representation. For
example 0.00001 is converted to "0.00001" and not "1e-5" which would be
shorter. However, the number of significant digits is chosen to be minimal.

While the algorithm for finding these digits *efficiently* is non-trivial,
the idea is straightforward: given that the floating-point number
represents an interval, one just needs to look through every possible
string representation in that interval and look for the one that is:

1. the shortest possible, and
2. the closest possible *if* there is more than one possible choice.

This sounds complicated, but is actually very simple for humans. Let's take
a few examples:

double d = 0.3.
Lower boundary: 0.2999999999999999611421941381195210851728916168212890625
Exact value: 0.299999999999999988897769753748434595763683319091796875
Upper boundary: 0.3000000000000000166533453693773481063544750213623046875

double d = 0.1234567
Lower boundary: 0.123466999999999986481480362954243901185691356658935546875
Exact value: 0.1234669999999999934203742668614722788333892822265625
Upper boundary: 0.123467000000000000359268170768700656481087207794189453125

Given the lower and upper boundary for these numbers, humans have no
trouble finding the shortest that lies inside the interval. Once the
leading digits differ, we can discard all remaining digits. The reason the
algorithms are complicated is due to corner cases (like when the shortest
representation lies exactly on the boundary), and because of the desire to
use data types that are small and efficient.
<https://github.com/dart-lang/sdk/blob/master/docs/newsletter/20171013.md#language-and-library-design>Language
and Library Design

As part of the language and library team we frequently get requests like
"can we not just simply add feature X" to the language or libraries. In
this section I will explain why even simple requests often require much
more thought and why they often take so much time.

One of the fundamental qualities of a language and library is a consistent
feel. This requires that library and language design is done holistically.
Features that work in other languages might need a different design to fit
into Dart, or should be provided through completely different means. I will
provide some examples in the remainder of this section. I want to emphasize
that these examples are chosen because they show how many features require
to think about many other unrelated and often more complex features. They
are not necessarily planned, and some might have already been rejected.

Let's start with the simple request that some classes should have a way to
serialize themselves to JSON.

There are immediately several questions:

1. Should this work for all classes or just some selected ones? Dart
could provide a new value type that is serializable and let class types
stay non-serializable.
2. Should it be limited to JSON or would the feature make it possible to
serialize to other formats, like Google's protobuffs.
3. Does the caller know which class gets serialized (the static shape),
or is the instance dynamic with many possible concrete types. Very often,
the type of the object that needs to be encoded is fully known, and the
callsite could do the serialization without any help.
4. Can a user provide a serialization scheme for classes they can't
modify?
5. Is the amount of work small?
6. Can the same instance be encoded differently for different RPC calls?
7. Is it cheap?

In some sense, Dart already supports one way of serializing to JSON: the
toJson method. For instances that the JSON encoder doesn't recognize, it
simply invokes toJson to make the instance encode itself in a way that is
suitable for JSON.

Looking at the questions above, we have:

1. Yes. Works for all classes.
2. No. Limited to JSON.
3. No. The caller doesn't need to know. JSON.encode(randomObject) works
as long as the randomObject has a toJson.
4. Yes/No. The converter has a toEncodable parameter, but that's an
additional hook and not toJson.
5. No. There is too much boilerplate code.
6. No. The same instance is always encoded the way the toJson specifies.
7. Yes. Adding a simple method doesn't cost significantly in output size
and in speed.

Dart features another way of serializing instances: mirrors. Using
reflection libraries can inspect any object. This makes it trivial to write
a library that looks at each instance, gets its fields and emits them. The
responses to the questions are now completely different:

1. Yes. Works for all classes.
2. Yes. Works for every possible serialization scheme.
3. No. The caller can encode any dynamic object.
4. Yes/No. *Every* class is serializable. If the serialization needs to
be adapted, it's up to the library to provide additional hooks.
5. Yes. The users of the library don't need to do any work. The work is
for the authors of the serialization library.
6. Yes/No. The same instance can only have different serializations if
there are hooks in the library.
7. No. The current reflection is too expensive to be usable in compiled
Dart programs (dart2js or AoT).

We could imagine a better version of mirrors, let's call it dart:reflect that
is more suitable for compilation. If it is similar to the reflect package
it would require annotations on a class to make it reflectable (and thus
serializable). Depending on the concrete implementation this could solve
the performance problem, but might come with the limitation that external
users wouldn't be able to provide serializations for classes they didn't
write. A new dart:reflect library would, in any case, be a major
undertaking.

An alternative would be to provide a new "class" type that automatically
provides serialization methods. For example, we could add a value keyword
that takes the place of class:

value Point {
int x;
int y;
}

This value classes would not need any toJson and (at least in the example)
would also come automatically with a constructor. We could even add a
hashCode and == implementation that uses the fields.

Once we have such a class, we need to think about value types. That is,
should these instances have an identity or not? Should we specify these
value classes in such a way that it would be possible to pass them on the
stack?

If yes, would these classes have mutable or non-mutable fields? If they
were mutable, then their semantic would differ from normal classes:

var p = new Point(1, 2);var p2 = p; // Value type -> copy.
p2.x = 3;print(p); // 1, 2print(p2); // 3, 2

Without mutable fields the behavior for value types and non-value types
would only be visible when using identical.

Similarly, value types have lots of questions with respect to Object (are
they Objects or not?), and how lists of them should be implemented in the
VM.

When talking about value types we should also think about tuples. Tuples
are (for this document) just a collection of values with different types.
For example a Tuple<int, String> would be a class with two fields (let's
just call them fst and snd) of types int and String. It would make sense to
make tuples value types.

One could even imagine that value is just a layer on top of tuples. That
is, that every value class is just a different view of a similarly typed
tuple.

value Point {
int x;
int y;

bool get isOrigin => x == 0 && y == 0;
}

value Vector {
int x;
int y;

int get length => sqrt(x*x + y*y);
}
var p = new Point(3, 4);print(p.isOrigin); // false.Tuple<int, int> t
= p; // It's the same thing.Vector v = t; // Still the same thing.
v.print(v.length); // 5.

This approach would reduce the output code size since many value types
could be mapped to the same tuple. However, they would make our
serialization go out of the window. The serialization function would only
be able to see that the receive object was a tuple, but not which one.

String serialize(Object o) {
if (o is Tuple) { /* now what? */ }
...
}

That doesn't mean that the value/tuple approach is doomed. We could provide
static descriptions for classes that users could pass to the serialization
function. Every value class could have a static field shape that returns
the shape of the value type (or even for every class):

var p = new Point(3, 4);var shape = Point.shape;serialize(p, shape);
String serialize(Object o, Shape shape) {
if (o is int) print(o);
if (o is String) ...;

// A non-trivial object:
print("{");
for (field in shape.fields) {
print("${field.name}: ${serialize(field.value(o), field.shape)},");
}
print("}");
}

This would require that every call to serialize knew the static type of the
object that should be serialized. It wouldn't work if the serialization
needed to do dynamic is-checks. This sounds like a big restriction, but in
practice, serialization often happens on concrete types, or the type of the
object can be determined without is checks.

In some sense this static value type would also be very similar to
extension methods. They would almost be extension classes. This means that
we should look at extension methods too, and make sure that they would fit.

The exploration doesn't really stop here, but let's have a look at another,
completely different alternative for serialization: macros.

If Dart had macros, then users could just write the value type themselves.

define-macro value ID { CLASS_BODY };
Ast value(Identifier id, ClassBody body) { ... }

This is, admittedly, just a very hypothetical macro system, but it would
immediately remove the need for a value type that provides the toJson method.
In fact, a good macro system makes many language features unnecessary,
since they can just be implemented on top of macros.

The disadvantage of macros is that they can get complicated to get right
(even Scheme has two variants) and that they are almost too powerful. They
would definitely be a major undertaking for Dart.

This section is getting long, so I won't cover other alternatives, but I
hope I showed how language features can have dependencies that overlap. For
each of the related features we would now explore other alternatives and
see if they would conflict with a specific direction.

As mentioned in the beginning of this section, we value a consistent feel
of the language. This means that these explorations are very important to
make sure that things fit together. This is also the reason that features
that work well in other languages might not be a great fit for Dart. We
need to evaluate each feature and see how it behaves with the current and
the future planned features of Dart.

--
For other discussions, see https://groups.google.com/a/dartlang.org/

For HOWTO questions, visit http://stackoverflow.com/tags/dart

To file a bug report or feature request, go to http://www.dartbug.com/new
---
You received this message because you are subscribed to the Google Groups "Dart Misc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to misc+***@dartlang.org.