I was doing some Unicode stuff in JavaScript today. I needed to extract the code points from a string. You might think that the way to get the code point at a given position in a string is:
cp = str.codePointAt( i );
Hah hah, no. The catch is that the index you pass to codePointAt() counts UTF-16 code units, not code points. Characters outside the 16-bit Basic Multilingual Plane, such as many emoji, take two code units each, so your positions drift, and if i lands in the middle of one of those surrogate pairs you get garbage.
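To make that concrete, here's what happens with an astral-plane character in the middle (the thinking-face emoji is just my example; U+1F914 is stored as the surrogate pair 0xD83E 0xDD14):

str = "a🤔b";
str.codePointAt( 0 );  // 97, "a": fine
str.codePointAt( 1 );  // 129300, i.e. 0x1F914, the whole emoji, because index 1 is the leading surrogate
str.codePointAt( 2 );  // 56596, i.e. 0xDD14, a lone trailing surrogate: garbage
str.codePointAt( 3 );  // 98, "b", which now lives at index 3 because the emoji took two slots

Note that codePointAt() itself is perfectly capable of returning an astral code point; the problem is purely where i points.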
To handle code points outside the BMP you instead do it like so:
chars = Array.from( str );
cp = chars[i].codePointAt( 0 );  // i now indexes code points, not UTF-16 code units
Array.from() knows how to correctly split a string into individual code points. Why does something as generically named as Array.from() have intimate knowledge of Unicode? ¯\_(ツ)_/¯ (The less shruggy answer: it doesn't, really. Strings come with a built-in iterator that walks by code point, surrogate pairs and all, and Array.from() just consumes whatever iterator it's handed.)
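You can watch that iterator at work: spreading a string or looping over it with for...of splits on exactly the same boundaries (same example emoji as above):

Array.from( "a🤔b" );  // [ "a", "🤔", "b" ]: three entries
[ ..."a🤔b" ];         // same three entries, via the same iterator
"a🤔b".length;         // 4, because .length still counts UTF-16 code units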
And why does codePointAt() behave itself here when it didn't before? It's not that the single-character strings produced by Array.from() carry some invisible encoding flag; there's no such thing. codePointAt() will happily decode a surrogate pair whenever its index lands on the pair's leading half, and index 0 of a one-code-point string is always the leading half. The earlier version only fell over because i could land in the middle of a pair.
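Which also means that if all you want is every code point in order, you can skip building the intermediate array; a quick sketch of that variant:

for ( const ch of str ) {
    // ch is one whole code point, so index 0 is always the leading half of any pair
    cp = ch.codePointAt( 0 );
    // ... do something with cp ...
}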
Anyway, that's how you extract Unicode code points in JavaScript. Thank you for coming to my TED talk.