* It doesn't have "true" types in the sense that Ion does. It's basically just a binary serialization of JSON, with extra stuff.
* Despite being a binary format, it's actually bulkier than JSON in most situations.
* It removes any semblance of canonicity from many representations. A number, for instance, can potentially be represented by any of at least 3 types (double, int32, and int64).
* It has signed 32-bit length limits all over the place. Not that I'd want to be storing 2GB of data in a single JSON document either, but it's not even possible to do so with BSON!
* It requires redundant null bytes in unpredictable places. For instance, all strings must be stored with a trailing null byte, which is included in their length. There's also a trailing null byte at the end of a document for no reason at all.
* It is unabashedly JavaScript-specific, containing types like "JavaScript code with scope" which are meaningless to other languages.
* It also contains some MongoDB-specific cruft, such as the "ObjectID" and "timestamp" types (the latter of which, despite its name, cannot actually be used to store time values).
* It contains numerous "deprecated" and "old" features (in version 1.0!) with no guidance as to how implementations should handle them.
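The byte-level complaints are easier to see with a real document. Below is a minimal sketch that hand-encodes BSON with Python's `struct` module, following the layout in the BSON specification. The helpers `encode_string_doc` and `encode_number_element` are hypothetical names written for illustration, and they cover only string and numeric elements. They show the int32 length prefixes, the NUL that is counted in each string's length, the trailing NUL on the document, and how one JSON number can become three different BSON byte strings.

```python
import json
import struct

def encode_string_doc(fields: dict) -> bytes:
    """Encode a flat {key: str} mapping as a BSON document of string elements."""
    body = b""
    for key, value in fields.items():
        raw = value.encode("utf-8") + b"\x00"   # every string gets a trailing NUL...
        body += (
            b"\x02"                             # element type 0x02: UTF-8 string
            + key.encode("utf-8") + b"\x00"     # the key is a NUL-terminated cstring
            + struct.pack("<i", len(raw))       # ...which the int32 length includes
            + raw
        )
    body += b"\x00"                             # document terminator
    # Signed little-endian int32 total length, counting its own 4 bytes;
    # struct.pack("<i", ...) raises struct.error above 2**31 - 1.
    return struct.pack("<i", 4 + len(body)) + body

# One numeric value, three encodings: type codes for double, int32, int64.
NUMERIC_TYPES = {0x01: "<d", 0x10: "<i", 0x12: "<q"}

def encode_number_element(key: str, value: int, type_code: int) -> bytes:
    """Encode a single numeric element (not a whole document)."""
    fmt = NUMERIC_TYPES[type_code]
    number = float(value) if type_code == 0x01 else value
    return bytes([type_code]) + key.encode("utf-8") + b"\x00" + struct.pack(fmt, number)

doc = {"hello": "world"}
bson_bytes = encode_string_doc(doc)
json_bytes = json.dumps(doc, separators=(",", ":")).encode("utf-8")
print(len(bson_bytes), len(json_bytes))  # 22 17
for code in NUMERIC_TYPES:
    print(hex(code), encode_number_element("n", 1, code).hex(" "))
```

For `{"hello": "world"}` this gives 22 bytes of BSON against 17 bytes of compact JSON, and the value `1` comes out as three distinct element encodings depending on which numeric type the encoder picks.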
Yes, and not only that. It is also inherently insecure, whereas JSON is, together with msgpack, the only fast and secure serialization format out there. The problem is that objects and code are encoded without any checksumming, so they can be trivially tampered with, leading to very nice exploits, mostly remote ones.
YAML does most of these things and more, and it can be made quite secure by limiting the allowed types to a trusted minimum, but this is not implemented in, e.g., the Perl backend, only in the Python one. By default, YAML is extremely insecure.
There are also newer readable and typed JSON variants out there. For example, jzon-c should be faster than Ion, and there are also Hjson and SJSON. See https://github.com/KarlZylinski/jzon-c
Most of this comes from BSON also being the internal storage format for a database server. Some of it at least has a rationale: the redundant string NULs make it possible to use C library functions on strings without copying them, the unpacked (fixed-width) ints allow direct dereferencing, etc.
I've no clue about the trailing NUL on the record itself, perhaps a safety feature?
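Python's `ctypes` gives a rough feel for the in-place access described above: once the document sits in a C-owned buffer, fixed-width int32 fields can be viewed directly, and NUL-terminated strings can be read by scanning for the terminator instead of consulting the length field. This is a sketch under two assumptions: a little-endian host (BSON integers are little-endian, so a raw int32 view only matches there) and a hard-coded, already trusted document. The offsets are specific to the `{"hello": "world"}` example, and `read_in_place` is a hypothetical helper.

```python
import ctypes
import sys

# {"hello": "world"} in BSON: int32 total length, type 0x02, key "hello\0",
# int32 string length 6, "world\0", then the document terminator.
RAW = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

def read_in_place(raw: bytes):
    """Read fields of the example document at fixed offsets, C style."""
    # A raw int32 view only matches BSON's little-endian layout on such hosts.
    if sys.byteorder != "little":
        raise RuntimeError("this sketch assumes a little-endian host")
    buf = ctypes.create_string_buffer(raw, len(raw))  # one copy into C memory
    base = ctypes.addressof(buf)
    # from_buffer views the bytes directly, like dereferencing an int32_t *.
    doc_len = ctypes.c_int32.from_buffer(buf, 0).value
    str_len = ctypes.c_int32.from_buffer(buf, 11).value  # includes the NUL
    # With no size given, string_at scans to the first NUL, as strlen would;
    # this is only safe if the terminator really is where BSON says it is.
    key = ctypes.string_at(base + 5)
    value = ctypes.string_at(base + 15)
    return doc_len, key, str_len, value

print(read_in_place(RAW))  # (22, b'hello', 6, b'world')
```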
> I've no clue about the trailing NUL on the record itself, perhaps a safety feature?
Could be. Or perhaps there's enough code paths in common between string parsing and document parsing that they decided to put a trailing null byte on both.
Stepping back a bit, though, the fact that BSON is optimized for "direct" use in C code is really scary. That suggests that any failure to completely validate BSON data could open up vulnerabilities in C code manipulating it.
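To make the validation concern concrete, here is a minimal sketch of a strict reader, assuming for brevity a document that contains only string elements; `validate_string_doc` is a hypothetical helper, not part of any BSON library. It checks every declared int32 length against the actual buffer and confirms that each trailing NUL is really present before trusting it, which are exactly the assumptions C code reading BSON in place would otherwise make blindly. Rejecting embedded NUL bytes is an extra policy choice of this sketch, since a C string function would silently stop at one.

```python
import struct

def validate_string_doc(buf: bytes) -> list:
    """Strictly parse a BSON document containing only string (0x02) elements.

    Every declared length is checked against the real buffer before use, and
    any inconsistency raises ValueError instead of being trusted.
    """
    if len(buf) < 5:
        raise ValueError("too short to be a BSON document")
    (total,) = struct.unpack_from("<i", buf, 0)
    if total != len(buf):
        raise ValueError("declared document length does not match buffer size")
    if buf[-1] != 0x00:
        raise ValueError("missing document terminator")
    end = len(buf) - 1  # index of the terminating NUL
    pos, fields = 4, []
    while pos < end:
        if buf[pos] != 0x02:
            raise ValueError(f"unsupported element type {buf[pos]:#04x}")
        key_end = buf.find(b"\x00", pos + 1, end)
        if key_end < 0:
            raise ValueError("unterminated key")
        len_pos = key_end + 1
        if len_pos + 4 > end:
            raise ValueError("truncated string length")
        (length,) = struct.unpack_from("<i", buf, len_pos)
        start, stop = len_pos + 4, len_pos + 4 + length
        if length < 1 or stop > end:
            raise ValueError("string length out of bounds")
        if buf[stop - 1] != 0x00:
            raise ValueError("string is missing its trailing NUL")
        if b"\x00" in buf[start:stop - 1]:
            raise ValueError("embedded NUL would truncate the string in C")
        key = buf[pos + 1:key_end].decode("utf-8")  # UnicodeDecodeError is a ValueError
        fields.append((key, buf[start:stop - 1].decode("utf-8")))
        pos = stop
    return fields

good = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
print(validate_string_doc(good))  # [('hello', 'world')]
tampered = good[:11] + struct.pack("<i", 0x7F) + good[15:]  # inflated string length
try:
    validate_string_doc(tampered)
except ValueError as err:
    print("rejected:", err)  # rejected: string length out of bounds
```

A reader that skipped the `stop > end` check would happily walk 127 bytes from a 22-byte buffer, which is the kind of out-of-bounds read the comment is worried about.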