Kuhn's idea is also used in in Python 3, so that garbage bytes can (optionally!)...

kazinator · on Aug 15, 2015

Interesting; I implemented exactly the same thing in the TXR language. I can read an arbitrary file in /bin/ as UTF-8 to a string, and when that string is converted to UTF-8, it reproduces that file exactly. All invalid bytes go to DCXX, including the null character. The code U+DC00 is called "pnul" (pseudo-null) and can even be written like #\pnul in the language as a character constant. Thanks to pnul, you can easily manipulate data that contains nulls, like /proc/<pid>/environ, or null-delimited strings from GNU "xargs -0". The underlying C strings are nicely null-terminated with the real U+0000 NUL, and everything is cool.

kragen · on Aug 15, 2015

You and the Python guys should get together and make your hack compatible, and then pressure everyone else to standardize on it, instead of the horrible nightmare where every string input and output operation potentially corrupts data or crashes.

joeyh · on Aug 15, 2015

Same in haskell (ghc haskell at least).