@shriramk Since I know you appreciate Python pop quizzes:
```python
my_heterogeneous_map = {
    (+0.0): "positive zero",
    (-0.0): "negative zero",
    (0): "integer zero",
}
print("my_heterogeneous_map=%r\n" % my_heterogeneous_map)
del my_heterogeneous_map[False]
print("my_heterogeneous_map=%r\n" % my_heterogeneous_map)
```
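For anyone who wants the answer spelled out, here is a minimal sketch of why the quiz behaves as it does. Python dicts treat two keys as the same when they hash equally and compare equal with `==`, and `0.0`, `-0.0`, `0`, and `False` satisfy both conditions. The dict keeps the first key object it saw and the last value assigned to it.

```python
# The four "zeros" compare equal under ==, and their hashes agree.
zeros = [0.0, -0.0, 0, False]
print(all(z == 0 for z in zeros))   # True
print({hash(z) for z in zeros})     # {0}

m = {+0.0: "positive zero", -0.0: "negative zero", 0: "integer zero"}
# Only one entry survives: the first key (0.0) paired with the last value.
print(m)                            # {0.0: 'integer zero'}

# False == 0 and hash(False) == 0, so this deletes the sole entry.
del m[False]
print(m)                            # {}
```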
@shriramk Yeah. It's a bit of a cheap shot since every PL has historical baggage, but I have been losing myself in a maze trying to understand the semantics of `in`.
https://docs.python.org/3/reference/expressions.html#comparisons
> The built-in containers typically assume identical objects are equal to themselves.
Twisty twist: that "typically" is load-bearing.
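A small sketch of where that "typically" bites, using only built-ins. The language reference describes `x in y` for sequences as `any(x is e or x == e for e in y)`, so the same NaN object is found even though it is not equal to itself, while a freshly created NaN is not found. The `contains` helper is a hypothetical re-statement of that rule.

```python
nan = float("nan")
items = [nan]

print(nan == nan)             # False: IEEE 754 semantics for ==
print(nan in items)           # True: identity is checked before ==
print(float("nan") in items)  # False: a different NaN object, and NaN != NaN

# Hypothetical helper restating the documented membership rule.
def contains(container, x):
    return any(e is x or e == x for e in container)

print(contains(items, nan), contains(items, float("nan")))  # True False
```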
@shriramk @cks @mvsamuel It's partly IEEE's fault. IEEE 754 says that every comparison involving NaN must return false, except `!=`, which returns true (for good reasons, given that NaN is basically an error return value). But IEEE does not have to deal with notions of object identity.
The "clean" solution for Python would probably be to return true for a comparison of identical NaN-valued objects, and false otherwise. That would restore substitutability, but I doubt it would reduce confusion.
@shriramk @cks @mvsamuel Indeed. But then you have to think about NaN in the early phases of language design.
An even more radical solution: don't allow floating-point numbers as keys. There are few situations where comparing floats for equality is the right thing to do, and for those you can offer a workaround (such as: conversion to a byte sequence).
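One possible reading of the "conversion to a byte sequence" workaround, sketched with the standard `struct` module (`float_key` is a hypothetical helper name, not an existing API): key on the 8-byte IEEE 754 encoding, so key equality becomes bitwise. Positive and negative zero become distinct keys, and NaNs with the same bit pattern become equal keys; NaNs with different payloads would still be distinct.

```python
import struct

def float_key(x: float) -> bytes:
    """Hypothetical helper: the 8-byte big-endian IEEE 754 encoding of x."""
    return struct.pack(">d", x)

table = {}
table[float_key(0.0)] = "positive zero"
table[float_key(-0.0)] = "negative zero"
table[float_key(float("nan"))] = "a NaN"

print(len(table))                      # 3: +0.0 and -0.0 no longer collide
print(table[float_key(float("nan"))])  # a NaN  (same bits, so same key)
```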
@shriramk @cks @mvsamuel No float equality at all seems a bit harsh. There are some valid use cases in low-level float algorithms (where the results you are testing have not been subjected to rounding). And there are valid use cases for equality testing in unit tests (e.g. when testing your toolchain rather than your numerical algorithm).
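As a sketch of the "testing your toolchain" case: when a test checks that a float conversion round-trips, bit-exact equality is what you actually want, and a tolerance would only hide bugs. This uses the built-in `float.hex`, `float.fromhex`, and `repr`; the sample values are arbitrary (`2.0**-1074` is the smallest subnormal double).

```python
import math

samples = [0.1, 1 / 3, 2.0**-1074, 1e308, -0.0, math.pi]

for x in samples:
    # Hex and shortest-repr round trips are exact for finite floats,
    # so == (not math.isclose) is the right assertion here.
    assert float.fromhex(x.hex()) == x
    assert float(repr(x)) == x
    # == cannot see the sign of zero, so check the sign separately.
    assert math.copysign(1.0, float(repr(x))) == math.copysign(1.0, x)

print("all round trips exact")
```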
@shriramk @khinsen @cks On float comparison vs generic programming, in case you're unaware of the tricks the JVM folk had to pull to make generic collections easy to use, and to provide generic operations on collections like `<T> int firstIndexOf(Iterable<T> items, T target)`:
In Java, `==` applied to builtin (pass-by-copy) `double` values follows IEEE-754. But the boxing reference type, `Double`, which is what is stored in `ArrayList`s and other standard library collections, has a different notion of equality.
> Note that in most cases, for two instances of class `Double`, `d1` and `d2`, the value of `d1.equals(d2)` is `true` if and only if `d1.doubleValue() == d2.doubleValue()` also has the value `true`. However, there are two exceptions:
> - If `d1` and `d2` both represent `Double.NaN`, then the `equals` method returns `true`, even though `Double.NaN==Double.NaN` has the value `false`.
> - If `d1` represents `+0.0` while `d2` represents `-0.0`, or vice versa, the `equal` test has the value `false`, even though `+0.0==-0.0` has the value `true`.
>
> This definition allows hash tables to operate properly.
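To see how those two exceptions fall out of comparing bit patterns, here is a Python sketch. It follows the Javadoc's definition of `Double.equals` (two values are the same if `doubleToLongBits` returns identical `long`s, and `doubleToLongBits` collapses every NaN to one canonical pattern). `double_to_long_bits` and `JavaDouble` are hypothetical Python names, and the hash is Python's own rather than Java's `hashCode` formula.

```python
import math
import struct

CANONICAL_NAN_BITS = 0x7FF8000000000000  # the single pattern every NaN maps to

def double_to_long_bits(x: float) -> int:
    """Mimics Java's Double.doubleToLongBits: the raw IEEE 754 bits as a signed
    64-bit integer, except that every NaN maps to one canonical pattern."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    return struct.unpack(">q", struct.pack(">d", x))[0]

class JavaDouble:
    """Hypothetical stand-in for a boxed java.lang.Double used as a dict key."""

    def __init__(self, value: float):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, JavaDouble):
            return NotImplemented
        # Same rule as Double.equals: compare canonicalized bit patterns.
        return double_to_long_bits(self.value) == double_to_long_bits(other.value)

    def __hash__(self):
        # Not Java's hashCode formula; it only has to agree with __eq__.
        return hash(double_to_long_bits(self.value))

nan_a, nan_b = float("nan"), float("nan")
print(nan_a == nan_b, JavaDouble(nan_a) == JavaDouble(nan_b))  # False True
print(0.0 == -0.0, JavaDouble(0.0) == JavaDouble(-0.0))        # True False
print(JavaDouble(nan_b) in {JavaDouble(nan_a): "boxed NaN"})   # True
```

Because equality and hashing are both derived from the same canonical bits, a dict keyed on `JavaDouble` can always find a NaN key again, which is the "hash tables operate properly" point.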
So Java's core libraries (ab)use a distinction between reference and primitive float values to treat float equality as "same notional value" when the float is boxed, which is typically the case when doing some generic operation on floats.
But when the floats are primitive, which is typically the case when a numerical methods person is doing numerical methods person things, they use 754 semantics.
This is a kludge that works out really well in practice, but I think there's a principle here.
A PL should allow type-generic function authors to write sensible code without having to know the quirks of builtin operators applied to special types.
Special semantics should require special operators or opt-in.
imo, OCaml gets this right with a different operator, `<.`, for floaty comparison instead of `<`.
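A rough Python rendering of that principle, reusing the `firstIndexOf` example from earlier in the thread. `first_index_of` and `same_value` are hypothetical names: generic code defaults to a "same notional value" equality (NaN matches NaN, `-0.0` differs from `0.0`, much like boxed `Double.equals`), and plain IEEE 754 `==` is available only when the caller opts in.

```python
import math
import operator
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def same_value(a, b) -> bool:
    """Hypothetical 'notional value' equality: like ==, except that for floats
    NaN equals NaN and +0.0 differs from -0.0."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        if a == 0.0 and b == 0.0:
            return math.copysign(1.0, a) == math.copysign(1.0, b)
    return a == b

def first_index_of(items: Iterable[T], target: T,
                   eq: Callable[[T, T], bool] = same_value) -> int:
    """Index of the first item equal to target under eq, or -1."""
    for i, item in enumerate(items):
        if eq(item, target):
            return i
    return -1

data = [1.0, float("nan"), -0.0, 0.0]
print(first_index_of(data, float("nan")))               # 1
print(first_index_of(data, 0.0))                        # 3
print(first_index_of(data, float("nan"), operator.eq))  # -1 (IEEE opt-in)
print(first_index_of(data, 0.0, operator.eq))           # 2  (IEEE opt-in)
```

The generic author never has to know about NaN or signed zero; the numerical-methods caller who wants 754 semantics asks for them explicitly.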
@inthehands @shriramk @khinsen @cks
Hehe. It's not a perfect fix. You can almost get lulled into thinking you can do arithmetic with capital-D Double.
```java
Double a = 3.14159265;
Double b = 3.14159265;
Double c = a + b; // unboxes, adds, reboxes
System.out.println("a == b -> " + (a == b)); // compares references: false in practice, but under-specified behaviour
System.out.println("c -> " + c); // 6.28....
```
But in the history of PL kludges, it's a pretty great one.
@inthehands @shriramk @khinsen @cks If you consider Rust in the ML family, then its trait `PartialEq` does that. And which Rust traits apply depends on what you've imported into the local scope.
OCaml doesn't, because of its super-aggressive type erasure.
https://dev.realworldocaml.org/runtime-memory-layout.html
> OCaml uses a uniform memory representation in which every OCaml variable is stored as a value. An OCaml value is a single memory word that is either an immediate integer or a pointer to some other memory.
Its structural equality (`=`) is based on recursively comparing those uniform values.
Its physical equality (`==`) is based on identical bits, iirc.
iirc, SML is similar to that.
@inthehands @shriramk @khinsen @cks I write very little Rust in anger, but iiuc, you can wrap a value in a struct with zero abstraction overhead, and use that as the basis to apply the same traits differently.
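A Python analogue of the wrap-it-in-a-newtype idea. Python has no zero-cost structs, so this shows only the semantics, not the overhead claim. `TotalOrderFloat` is a hypothetical wrapper that reinterprets equality, hashing, and ordering via the IEEE 754 totalOrder predicate (the one Rust exposes as `f64::total_cmp`), using the common bit-flipping trick, while the wrapped float keeps its usual IEEE semantics.

```python
import functools
import struct

def _total_key(x: float) -> int:
    """Map a float's bits to an int whose natural order matches IEEE 754 totalOrder."""
    bits = struct.unpack(">q", struct.pack(">d", x))[0]  # signed 64-bit view
    # Negative floats: flip the 63 non-sign bits so larger magnitudes sort lower.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits

@functools.total_ordering
class TotalOrderFloat:
    """Hypothetical newtype: same float, different equality/ordering 'traits'."""
    __slots__ = ("value",)

    def __init__(self, value: float):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, TotalOrderFloat):
            return NotImplemented
        return _total_key(self.value) == _total_key(other.value)

    def __lt__(self, other):
        if not isinstance(other, TotalOrderFloat):
            return NotImplemented
        return _total_key(self.value) < _total_key(other.value)

    def __hash__(self):
        return hash(_total_key(self.value))

nan = float("nan")
print(nan == nan, TotalOrderFloat(nan) == TotalOrderFloat(nan))    # False True
print(0.0 == -0.0, TotalOrderFloat(0.0) == TotalOrderFloat(-0.0))  # True False
xs = [1.0, 0.0, float("-inf"), -0.0, -2.5]
print([w.value for w in sorted(map(TotalOrderFloat, xs))])
# [-inf, -2.5, -0.0, 0.0, 1.0]
```

Every call site that wants the alternative semantics has to wrap and unwrap explicitly, which hints at why the newtype approach is described as not scaling well.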
@inthehands @mvsamuel @shriramk @khinsen @cks
Rust's `PartialEq` and `Eq` could have been used to implement "equality" and "identity" respectively, if Rust designers hadn't put those traits into a sub-typing hierarchy.
It's infuriatingly close to being right.
The often-mentioned hackaround to wrap things into newtypes is sadly neither scalable nor modular.
@inthehands @shriramk @khinsen @cks It got so difficult to back-port new functions because of the oddities of `==` that TC39 standardized `Object.is` to make that easier.
@soc @khinsen @shriramk @cks perhaps a source of confusion comes from languages inheriting C++-specific behaviour, and people assuming it comes from 754 because that attribution appears in other operators' documented semantics.
iiuc, `operator<` and friends are typically now based on the standard three-way comparison (`<=>`):
https://en.cppreference.com/w/cpp/language/operator_comparison
> Otherwise, the operands have floating-point type, and the operator yields a prvalue of type `std::partial_ordering`. The expression `a <=> b` yields
> - `std::partial_ordering::less` if `a` is less than `b`
> - `std::partial_ordering::greater` if `a` is greater than `b`
> - `std::partial_ordering::equivalent` if `a` is equivalent to `b` (`-0 <=> +0` is equivalent)
> - `std::partial_ordering::unordered` (`NaN <=> anything` is unordered).
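To make the four outcomes concrete, here is a Python sketch of the floating-point case of `<=>` as the quoted rules describe it. `PartialOrdering` and `three_way` are hypothetical names that only mirror the C++ ones; they are not a real Python API.

```python
import math
from enum import Enum

class PartialOrdering(Enum):
    """Hypothetical mirror of C++'s std::partial_ordering values."""
    LESS = "less"
    EQUIVALENT = "equivalent"
    GREATER = "greater"
    UNORDERED = "unordered"

def three_way(a: float, b: float) -> PartialOrdering:
    """Sketch of a <=> b for floating-point operands, per the quoted rules."""
    if math.isnan(a) or math.isnan(b):
        return PartialOrdering.UNORDERED
    if a < b:
        return PartialOrdering.LESS
    if a > b:
        return PartialOrdering.GREATER
    return PartialOrdering.EQUIVALENT  # includes -0.0 vs +0.0

print(three_way(-0.0, 0.0))          # PartialOrdering.EQUIVALENT
print(three_way(float("nan"), 1.0))  # PartialOrdering.UNORDERED

# Deriving < from the three-way result reproduces the familiar 754 answers.
def less(a: float, b: float) -> bool:
    return three_way(a, b) is PartialOrdering.LESS

print(less(float("nan"), 1.0), less(1.0, float("nan")))  # False False
```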
@cks @shriramk Yeah, value interning is often under-specified behaviour, so on CPython one call to `float('nan')` does not return the same object as another, and neither is the same object as `math.nan`.
I don't know that it's a fast path. As it's been explained to me, `in` needs to accurately predict whether reading from a dict returns or raises, and the `is` check preserves that relationship.
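A sketch of that relationship on CPython: because both `key in d` and `d[key]` check identity before `==`, they agree with each other for a NaN key, whether or not the NaN you hold is the very object stored in the dict.

```python
import math

nan = math.nan
d = {nan: "found"}

# On CPython these are distinct objects, per the observation above.
print(float("nan") is float("nan"), float("nan") is math.nan)  # False False

# Same object: `in` says yes, and lookup succeeds.
print(nan in d, d[nan])  # True found

# Different NaN object: `in` says no, and lookup raises, as predicted.
other = float("nan")
print(other in d)        # False
try:
    d[other]
except KeyError:
    print("KeyError")
```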
@shriramk @cks Sounds like a good time to collect use cases for dicts.
Relations: the keys are abstract mathematical objects.
So if users are likely to craft custom abstract mathematical types, then allowing those as keys is nice. NaN seems a legit choice for a key there.
Custom hashing allows you to support mathematical objects for which it's inconvenient to provide a canonicalizing function.
Side tables: keys are arbitrary values that you want to associate extra info with.
In this case, keying off identity is important, but identity is a giant can of worms.
Often you want keys to weakly refer.
Memo tables: keys are inputs to a function, so they need to use some notion of equivalence that meshes with the function.
Often weakly/softly referencing keys is good here, though LRU can work in a pinch.
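A small illustration of why a memo table's equivalence has to mesh with the function, using the standard `functools.lru_cache`. Its cache keys rely on `==` and hashing, which cannot tell `0.0` from `-0.0`, so a sign-sensitive function can be handed a stale answer. `sign_of` is a hypothetical example function.

```python
import functools
import math

@functools.lru_cache(maxsize=None)
def sign_of(x: float) -> float:
    """Hypothetical sign-sensitive function: +1.0 or -1.0 from the sign bit."""
    return math.copysign(1.0, x)

print(sign_of(0.0))   # 1.0
print(sign_of(-0.0))  # 1.0: cache hit, since -0.0 == 0.0 and they hash alike
sign_of.cache_clear()
print(sign_of(-0.0))  # -1.0 when computed fresh
```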
Grouping: As an intermediate step in a larger computation processing a series, I want to bucket like things together so I can operate on them as a group. ime, keys here are often derived from the group items and may often be expressed as tuples of values that have their own notion of equivalence.
Others?