I'm reading a small section of the Unicode standard. Who gets to decide how the standard works, and how do I become an apprentice of this accursèd standard?

Also apparently sequences with combining diacritics have a canonical form that they normalize to, which means that it's a programming language

That is, U+2261 ≡ (identical to) followed by U+20D2 ⃒ (combining long vertical line overlay) is equivalent to U+2262 ≢ (not identical to).
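
One way to poke at this is Python's standard unicodedata module. This is only a sketch, and one caveat is mine rather than the thread's: as far as I know, the canonical decomposition of U+2262 in the character database uses U+0338 COMBINING LONG SOLIDUS OVERLAY, so that is the sequence normalization should rewrite. The vertical-line sequence from the sentence above is included for comparison, with no claim about what it prints.

```python
import unicodedata

NOT_IDENTICAL = "\u2262"   # ≢ NOT IDENTICAL TO
SLASHED = "\u2261\u0338"   # ≡ followed by COMBINING LONG SOLIDUS OVERLAY
BARRED = "\u2261\u20d2"    # ≡ followed by COMBINING LONG VERTICAL LINE OVERLAY

def codepoints(s: str) -> list[str]:
    """Render a string as a list of U+XXXX labels for easy inspection."""
    return [f"U+{ord(c):04X}" for c in s]

# NFC composes where a canonical composition exists; NFD decomposes.
composed = unicodedata.normalize("NFC", SLASHED)
decomposed = unicodedata.normalize("NFD", NOT_IDENTICAL)
barred_nfc = unicodedata.normalize("NFC", BARRED)

print("NFC of slashed sequence:", codepoints(composed))
print("NFD of U+2262:          ", codepoints(decomposed))
print("NFC of barred sequence: ", codepoints(barred_nfc))
```

If the barred sequence comes back unchanged, the "equivalence" in the quoted sentence is about meaning rather than about canonical normalization.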

Reduction rule


unicode.org/notes/tn27/
Unicode is kind of a disaster and I think it's charming

A few highlights:

U+200B ZERO WIDTH SPACE
This isn't a "space". It is an invisible character that can be used to provide line break opportunities.

U+2118 SCRIPT CAPITAL P
Should have been called calligraphic small p or Weierstrass elliptic function symbol, which is what it is used for. It is not a capital "P" at all.
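
The "isn't a space" point can be checked from Python, as a rough sketch using only the standard library. The sample string is made up for illustration, and the results depend on the Unicode data bundled with your Python version.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

name = unicodedata.name(ZWSP)          # the official character name
category = unicodedata.category(ZWSP)  # general category from the database
is_space = ZWSP.isspace()              # does Python treat it as whitespace?

# Hypothetical sample text: two words joined only by a ZWSP.
sample = f"zero{ZWSP}width"
parts = sample.split()                 # split() breaks on whitespace only

print(name, category, is_space, parts)
```

The name says "space", but the category is a format character (Cf) rather than a space separator (Zs), and split() leaves the sample in one piece.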

These two are my favourite:

U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
A spelling error: "brakcet" should be "bracket". A formal alias correcting this error has been defined.

U+FEFF ZERO WIDTH NO-BREAK SPACE
Byte Order Mark (Naming it ZWNBSP was a mistake from the start.)
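
Both favourites are visible from Python's standard library. A small sketch: unicodedata.lookup has accepted formal name aliases since Python 3.3, and the utf-8-sig codec treats a leading U+FEFF as a byte order mark. The "hi" payload is just a placeholder.

```python
import codecs
import unicodedata

# The misspelled name is permanent; the corrected spelling exists as an alias.
bracket = "\ufe18"
official_name = unicodedata.name(bracket)
via_alias = unicodedata.lookup(
    "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
)

# U+FEFF is named as a space but mostly used as a byte order mark.
bom_name = unicodedata.name("\ufeff")
raw = codecs.BOM_UTF8 + "hi".encode("utf-8")  # placeholder payload
kept = raw.decode("utf-8")          # plain utf-8 keeps U+FEFF in the text
stripped = raw.decode("utf-8-sig")  # utf-8-sig drops the leading BOM

print(official_name)
print(via_alias == bracket)
print(bom_name, repr(kept), repr(stripped))
```

So the typo lives on in the official name, and whether U+FEFF counts as a character or as a signature depends on which codec you pick.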

This one is interesting:

U+262B FARSI SYMBOL
This symbol is so named because as symbol of Iran it cannot be encoded in ISO standards.

I'm not sure what prevents it from being named that way, maybe the ISO standards don't let you have countries in codepoint names unless it's like a flag or country code?

@ionchy i assume because they're so unique the weirdness with Iran comes from UN sanctions?? no clue how that would change standards though
