This seems on its face to be a significant improvement on the goals of YAML, but I think the tradeoffs it makes will likely move YAML’s problems into a different place, creating a whole different set of difficulties in understanding what a given piece of data is, means, or does.
The problem with human friendly formats is that the thing that typically makes them human friendly is removing the things that make reading and editing difficult, but that also make disambiguation possible. If the format ever needs to be read by a machine, something has to do that disambiguation.
If it’s not provided by the format, you’ve turned every usage into a potential source of bugs that would otherwise be restricted to interchange/stack implementation incompatibilities. In other words, now your format can have a different set of expectations even on the same system.
The natural response to that problem will be to bolt on validation, types, and documentation that is provided arbitrarily (and with varying quality).
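To make that concrete, here's a rough sketch in Python of two consumers reading the same unmarked scalars. Everything in it is made up for illustration (the `RAW` values, `eager_reader`, `literal_reader`); it isn't how any particular parser works, just the general shape of the problem when the format itself doesn't say what a value is.

```python
# Hypothetical: a format with no quoting or type markers hands every
# consumer the same raw text and leaves the interpretation to them.
RAW = {"country": "no", "version": "1.10", "port": "08080"}

TRUE_WORDS = {"yes", "true", "on"}
FALSE_WORDS = {"no", "false", "off"}


def eager_reader(text):
    """Consumer A: guesses booleans, then numbers, then falls back to str."""
    lowered = text.lower()
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def literal_reader(text):
    """Consumer B: treats every scalar as a plain string."""
    return text


# Same input, two readers, two different sets of values.
for key, text in RAW.items():
    print(key, repr(eager_reader(text)), repr(literal_reader(text)))
```

Neither reader is wrong by its own rules, which is the point: once the format stops carrying the type, the disagreement moves into whatever code happens to read the file.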
IMO, efforts in human friendly formats should focus less on stripping out funny characters, and more on which minimal set of funny characters provide:
- Good readability
- Good editability
- Clarity of structure
- Clarity of data types
- Reasonable tolerance and flexibility for variance in arbitrary formatting/style preference (particularly in delimiting long form/multiline text and annotations), because no one can agree what good readability or editability means
- A flexible type system that allows machines and humans to know what a given datum is without variation or surprises
I generally find that the biggest problem with human friendly formats like YAML, which I think this also has, is that they tend to decouple readability from writability, and this encourages all sorts of complexity and polymorphism that seem superficially expressive, but end up just being difficult to work with. I've seen so many cases where YAML schemas turn into a quasi-DSL, because the developer thought that it was more important to have a clean looking configuration than one that is easy to edit. The result is that things like indentation get really weird, because the developer didn't optimize for having a sane underlying model.
A great comparison for this is CircleCI's config syntax and that used by GitHub Actions. The Circle format is extremely error prone; about half the time when I'm modifying a Circle config, I'll end up pushing a broken config, even though the YAML syntax itself is valid. With the GitHub Actions format, I almost never screw it up. I don't think it's a coincidence that if you convert a Circle configuration to JSON, it looks twisted and bizarre, whereas if you do the same with a GHA config, it looks perfectly ordinary and sensible.
If you think of YAML as "a prettier version of JSON", and design as if your users will work primarily with JSON, you can do fine with it. If you think of it as a medium for building your own configuration language, you'll make something awful. The problem is that any human friendly format is going to inherently encourage the latter.
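One way to hold yourself to the "design for JSON first" rule is to write the model as plain data types and check that it round-trips through JSON before thinking about how it looks in YAML. A minimal sketch, where `Step`, `Job`, and `job_from_dict` are invented names and not the schema of GHA, Circle, or any other real tool:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Step:
    name: str
    run: str


@dataclass
class Job:
    name: str
    steps: List[Step] = field(default_factory=list)


def job_from_dict(data):
    """Rebuild a Job from the plain dict that json.loads returns."""
    return Job(name=data["name"], steps=[Step(**s) for s in data["steps"]])


job = Job("test", [Step("install", "make deps"), Step("unit", "make test")])

# If the model is sane, the JSON form is boring: objects, lists, strings.
as_json = json.dumps(asdict(job), indent=2)
round_tripped = job_from_dict(json.loads(as_json))
print(as_json)
```

If the JSON that comes out looks ordinary, the YAML rendering of the same structure will tend to be easy to edit as well. If it takes custom tags or magic keys to get there, that's usually the sign the config has started turning into its own language.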
See also the travesty that is Ansible's YAML-based DSL, which includes fun stuff like an in-line replacement language with tokens enclosed in braces, which of course you have to quote in some cases so that pyyaml doesn't think they are dicts.
- Maybe humans should just use a GUI?