Zig-NestedText release 0.1.0

This seems to be a place for sharing Zig projects, so in case anyone’s interested…

I’ve recently released the first version of zig-nestedtext, which implements the NestedText spec - see NestedText: A Human Friendly Data Format — NestedText 1.3.1 documentation (note that there are some changes to the spec in the works).

NestedText is a simple human-friendly markup language based on YAML (which makes it YAYAML…?).

You can include zig-nestedtext as a package dependency (it’s up on astrolabe.pm) or download the lib and CLI (for converting between NestedText and JSON) from the GitHub releases page.

Any suggestions/bug reports welcome, please direct to GitHub issues :slight_smile:

I haven’t heard of nestedtext but it looks cool!

it is similar to JSON, YAML and TOML, but without the complexity and risk of YAML and without the syntactic clutter of JSON and TOML

Maybe I’ll have to consider it in one of my projects.

Very cool!

I didn’t know about NestedText but it looks really good and having a Zig implementation is even better!

Well done!

Update on this: I recently released v0.2.0 (see the README), which implements the new NestedText spec (v2.0) and updates to Zig 0.8 (thanks to @jeang3nie for the contribution here!). It’s available from GitHub releases, from gyro, or via a plain git clone :slight_smile:

This release adds support for multiline object keys and inline flow-style objects/lists, meaning any UTF-8 data can now be represented (without the need for any quoting or escaping).

I also have a WIP for adding support for parsing NestedText input directly into Zig types, as per std.json.parse().

I wanted to take the opportunity to request some feedback on the implementation of one detail of the NestedText format.

This is something that I’ve been trying to work out with the creators without yet managing to reach a consensus. (Apologies if this is considered slightly off-topic - the relevance is that it could well affect how I implement the zig-nestedtext package!).

First, a brief background/introduction:

  • NestedText syntax is heavily inspired by YAML
  • Flow-style lists/objects are now accepted in NestedText, e.g. [1, 2, 3] → ["1", "2", "3"] and {foo: 1, bar: 2} → {"foo": "1", "bar": "2"}
  • This provides a way to represent empty lists/objects: [] and {}
  • NestedText flow-style is restricted to single-line representation, to preserve the property of NestedText that the type of each line can be determined without the context of surrounding lines
  • Empty strings can be represented in general by the absence of a value (quotes are always treated as character literals), e.g. > → "", - → [""] and foo: → {"foo": ""}
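To make the "type of each line can be determined without context" property concrete, here is a rough Zig sketch that classifies a single line purely from its own leading characters. It is a simplification written for this post, not zig-nestedtext code: LineKind and classify are hypothetical names, and tabs, Unicode whitespace and several edge cases of the real spec are ignored.

```zig
const std = @import("std");

// Hypothetical, simplified line classifier: each line's kind is decided from
// that line alone, with no look-behind or look-ahead. Tabs, Unicode
// whitespace and other edge cases of the real spec are ignored.
const LineKind = enum {
    blank,
    comment, // "# ..."
    list_item, // "- value" or a bare "-"
    string_item, // "> text" or a bare ">"
    key_item, // ": key text" or a bare ":" (multiline key)
    inline_list, // "[...]"
    inline_dict, // "{...}"
    dict_item, // "key: value" or "key:"
    unrecognized,
};

// True when a tag character is followed by a space or by end of line.
fn tagEndsHere(rest: []const u8) bool {
    return rest.len == 0 or rest[0] == ' ';
}

fn classify(line: []const u8) LineKind {
    // Indentation and trailing spaces do not affect the line's kind.
    const s = std.mem.trim(u8, line, " ");
    if (s.len == 0) return .blank;
    switch (s[0]) {
        '#' => return .comment,
        '-' => if (tagEndsHere(s[1..])) return .list_item,
        '>' => if (tagEndsHere(s[1..])) return .string_item,
        ':' => if (tagEndsHere(s[1..])) return .key_item,
        '[' => return .inline_list,
        '{' => return .inline_dict,
        else => {},
    }
    // Otherwise it is a dict item if it contains ": " or ends with ':'.
    if (std.mem.indexOf(u8, s, ": ") != null or s[s.len - 1] == ':') return .dict_item;
    return .unrecognized;
}

pub fn main() void {
    const lines = [_][]const u8{ "- 1", ">", "  foo:", "[1, 2, 3]", "# note", "plain text" };
    for (lines) |line| {
        std.debug.print("{s} -> {s}\n", .{ line, @tagName(classify(line)) });
    }
}
```

Note that "-x: 1" falls through the '-' check (no space after the dash) and is classified as a dict item, which is the kind of single-line decision the flow-style restriction is meant to preserve.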

The design decision in question is: what to do with empty values in flow-style?

Suppose we support empty values. The problem we then have is that it’s ambiguous whether [] is an empty list or contains a single empty string. We certainly want to be able to represent empty lists/objects with this syntax (because it’s otherwise not possible to), so it then becomes impossible to represent a list containing a single empty string in flow-style.

One possible solution to this is to allow trailing commas in flow-style, and require a trailing comma to mark an empty value at the end of a flow-style list (or object). That is, [] is an empty list, [,] is a list containing a single empty value ([""]), [1,] → ["1"], etc.

The argument against this goes as follows:

  • If you accept that empty values are permitted, at first glance [,] looks like a list containing two empty values (since the main purpose of the comma is as a separator), and similar for larger lists
  • The common reason for allowing trailing commas (to minimise diffs when adding elements over multiple lines) does not apply in NestedText, since flow-style is restricted to a single line - i.e. trailing commas would only be serving the purpose of delimiting empty values
  • One of the main goals of NestedText is to be “easily understood and used by both programmers and non-programmers” and that “people can understand it by looking at a few examples and they can use it without needing to remember any arcane rules”
  • Therefore we should prioritise readability and not overload the comma separator as an end-of-list delimiter

If we decide that trailing commas should be disallowed (and in absence of the introduction of another way to indicate empty values at the end of lists), we’re left with two options:

  • Don’t allow empty strings in flow-style at all
  • Allow empty strings in flow-style, but accept that there’s no way to represent [""] because [] is empty and [,] → ["", ""]
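The difference between the trailing-comma rule and the separator-only rule can be sketched in Zig as below. This is a hypothetical helper, not zig-nestedtext code: it only handles flat single-line lists and assumes items contain no commas or brackets. Under the trailing-comma rule an empty segment after the last comma is dropped; under the separator-only rule it is kept as an empty string, which is exactly why [""] becomes unrepresentable.

```zig
const std = @import("std");

// The two candidate rules for empty values in single-line flow-style lists.
const Rule = enum {
    // A trailing comma marks a final empty value: "[,]" is [""].
    trailing_comma_marks_empty,
    // Commas are only separators: "[,]" is ["", ""].
    separator_only,
};

const max_items = 16;

// Fixed-capacity result so the sketch needs no allocator.
const Items = struct {
    buf: [max_items][]const u8 = undefined,
    len: usize = 0,

    fn append(self: *Items, item: []const u8) !void {
        if (self.len == max_items) return error.TooManyItems;
        self.buf[self.len] = item;
        self.len += 1;
    }
};

// Hypothetical helper: flat lists only, items must not contain ',' or ']'.
fn splitFlowList(text: []const u8, rule: Rule) !Items {
    if (text.len < 2 or text[0] != '[' or text[text.len - 1] != ']') return error.NotAFlowList;
    const inner = text[1 .. text.len - 1];
    var items = Items{};
    // "[]" is the empty list under both rules.
    if (std.mem.trim(u8, inner, " ").len == 0) return items;

    var start: usize = 0;
    var i: usize = 0;
    while (i < inner.len) : (i += 1) {
        if (inner[i] == ',') {
            try items.append(std.mem.trim(u8, inner[start..i], " "));
            start = i + 1;
        }
    }
    // The segment after the last comma: dropped if empty under the
    // trailing-comma rule, kept as "" under the separator-only rule.
    const last = std.mem.trim(u8, inner[start..], " ");
    if (rule == .separator_only or last.len != 0) try items.append(last);
    return items;
}

pub fn main() !void {
    const inputs = [_][]const u8{ "[]", "[,]", "[1,]", "[ , 0, null]" };
    for (inputs) |input| {
        const a = try splitFlowList(input, .trailing_comma_marks_empty);
        const b = try splitFlowList(input, .separator_only);
        std.debug.print("{s}: trailing-comma rule {d} item(s), separator-only rule {d} item(s)\n", .{ input, a.len, b.len });
    }
}
```

With these inputs, [,] yields one item under the trailing-comma rule and two under the separator-only rule, while [] yields zero under both. Under the second rule, no input produces a list holding a single empty string.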

An example where allowing empty strings in flow-style can be significantly more compact is below (inspired by freezerbox/test_fields.nt at master · kalekundert/freezerbox · GitHub):

given: ""
expected:
    {by_index: [,], by_name: {}}

otherwise having to be written as

given: ""
expected:
    by_index:
        -
    by_name:
        {}

Similarly, representing a medium-long list of short values (which may be empty strings) is well-suited to flow-style, e.g. a list of ‘falsey’ values: [ , 0, null, false, [], {}].

Any thoughts?

Yeah, so a primary use is config files. My config struct for Zterm uses multiple tagged unions, several of which have a field that can be void alongside fields that hold a value. Someone else might very much want to use a simple optional type.

And this isn’t Zig-specific: a lot of languages have optional types, as well as enum types that may or may not hold a value. Rust, for one, has both Option and enums with payloads. So I think those use cases should definitely be supported.

I’m not personally thrown by the trailing comma, but a new user might be. I find it better than no solution at all, however.

Another thought is that here is where a keyword such as null or void might be useful. But I would hesitate to throw in extra syntax without the consensus of the authors.

Re: the request to have an optional/null/void value - this can be handled in the application (or parsing library such as zig-nestedtext).

For example, in my implementation of parsing into Zig types, I plan on supporting interpreting ‘null’, ‘NULL’ or the empty string as a null value, see here. This gives a way to represent null without needing to use the empty string, but is intentionally not built into the NestedText spec.
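A minimal sketch of that null convention, assuming an optional integer target. isNullText and parseOptionalU8 are hypothetical names, not the library’s API, and the check is case-sensitive, so only the exact spellings listed above count as null.

```zig
const std = @import("std");

// Hypothetical helper mirroring the convention described above: the exact
// strings "null" and "NULL", or an empty string, mean "no value".
fn isNullText(text: []const u8) bool {
    return text.len == 0 or std.mem.eql(u8, text, "null") or std.mem.eql(u8, text, "NULL");
}

// Parse a NestedText string value into an optional integer field.
fn parseOptionalU8(text: []const u8) !?u8 {
    if (isNullText(text)) return null;
    return try std.fmt.parseInt(u8, text, 10);
}

pub fn main() !void {
    const inputs = [_][]const u8{ "", "null", "NULL", "42" };
    for (inputs) |input| {
        if (try parseOptionalU8(input)) |n| {
            std.debug.print("\"{s}\" -> {d}\n", .{ input, n });
        } else {
            std.debug.print("\"{s}\" -> null\n", .{input});
        }
    }
}
```

Because the mapping lives in the typed-parsing layer, the NestedText document itself still only contains strings; a text like "Null" is not treated as null here and fails as an invalid integer instead.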

I’m not personally thrown by the trailing comma, but a new user might be. I find it better than no solution at all, however.

Note that it would still be possible to represent empty strings inside lists or objects, the discussion is only over whether to allow them in flow-style or not. (Perhaps you were aware of this and referring specifically to preferring allowing this in flow-style?)

Would that hold as well for a tagged union?

const Foo = enum {
    bar,
    baz,
};

const FooType = union(Foo) {
    bar: u8,
    baz: void, // no value
};

There’s my use case in a nutshell. By doing that, I can go from having a bool and a value to just the union, eliminating a struct field. But it does no good if it’s tricky to get on and off disk.

Yes. Your example looks concise to me, and easy enough to interpret.

I’m totally new to NestedText and haven’t used it, but looking at the issue as a newcomer, I would find it intuitive if the rules stated that [] is always an empty list, and that a list of only one empty string has to be written as [""] as a special case. The [,] solution is counterintuitive to me.

Would that hold as well for a tagged union?

I anticipate the answer to that being ‘yes’, I just haven’t got to that bit of the implementation yet :slight_smile:

Thanks for your thoughts. The problem with using [""] to represent a list containing a single empty string, is then how do you represent a list containing a string of those two quotes? You start going down the path of disallowing/escaping special characters. A lack of quotation marks is a feature of NestedText, so I don’t think this is really an option sadly.

I have yet to see a language or format that didn’t struggle with strings at some point. I think you’re completely right that you don’t want to go down the rabbit hole of escaping/special characters. All languages have compromises, markup languages included. But this should be good for its intended problem space, I think, without having perfect string handling.

I’ve just released v0.3.0 (although I’m having problems publishing to gyro), which adds support for parsing into a comptime Zig type, such as structs/unions.

This should allow the use-case above of parsing into something like the following union, based on element types.

union(enum) {
    bar: u8,
    baz, // no value
};

Any feedback welcome.

Oh, I also created a zig-master branch, which has the tests running daily against a nightly Zig build via GitHub actions. It’s currently failing the new Zig AST check performed during stage1. This is due to a single error in the zig-clap dependency, but this should be quite easy to fix upstream, if anyone’s particularly driven to do so :slight_smile:

How do I replicate this? I’m on branch zig-master, with zig compiler dated 2021-06-23. Running zig build test results in 93 passes, 2 skips and 0 fails.

The pull request got merged a few hours ago, so make sure you have the newest version.

I seem to be having some trouble with parseTyped.

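const std = @import("std");
const nestedtext = @import("nestedtext"); // package name as set up in build.zig (assumed)
const allocator = std.heap.page_allocator; // any allocator will do here

const Kind = union(enum) {
    color: []const u8,
    plain,
};

pub fn main() !void {
    var p = nestedtext.Parser.init(allocator, .{});
    const plain_kind = try p.parseTyped(Kind, "> null");
    std.debug.print("{s}\n", .{plain_kind});
}

$ zig build run
Kind{ .color = { 110, 117, 108, 108 } }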

I might very well be doing something incorrect, but it seems to always return .color instead of .plain (the bytes 110, 117, 108, 108 spell "null").

Ah, I see the problem - any value can be parsed into a string, so you’re just getting ‘null’ as a string (since the ‘color’ field comes before the ‘plain’ field). As a workaround you can move ‘plain’ above ‘color’ in the union. Perhaps this should just be the recommended solution… I’ve created Typed parsing should prefer non-string types? · Issue #18 · LewisGaul/zig-nestedtext · GitHub for this. Thoughts?
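The idea behind that issue can be illustrated with a hand-written Zig function for this particular union. It is a sketch, not the library’s implementation: parseKind is hypothetical, and it assumes that null-like text ("null", "NULL" or empty) is what should select the void plain variant, with the string variant tried last because any text is a valid string.

```zig
const std = @import("std");

const Kind = union(enum) {
    color: []const u8,
    plain,
};

// Hypothetical hand-written parser for Kind (not the library's parseTyped):
// the void variant is tried first, using the null-like strings as its
// assumed spelling, and the string variant is the fallback because any
// text is a valid string.
fn parseKind(text: []const u8) Kind {
    const null_like = text.len == 0 or
        std.mem.eql(u8, text, "null") or
        std.mem.eql(u8, text, "NULL");
    if (null_like) return .plain;
    return .{ .color = text };
}

pub fn main() void {
    const inputs = [_][]const u8{ "null", "red" };
    for (inputs) |text| {
        switch (parseKind(text)) {
            .plain => std.debug.print("{s} -> .plain\n", .{text}),
            .color => |c| std.debug.print("{s} -> .color = {s}\n", .{ text, c }),
        }
    }
}
```

A generic version would do the same thing over the union’s fields at comptime, trying every non-string field before any string field; moving plain above color gets the same effect when fields are tried in declaration order.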

Note I’ve just pushed a fix for a bug this uncovered in parseTypedFree() (to main and zig-master branches).

Sorry - this typed parsing stuff is a bit rough-and-ready at this point, thanks for giving it a try and letting me know about any issues!

It’s fine. If I can help by testing, that’s still helping, right? I’ll pull your changes and check again.

Ok, so putting plain on top works. Thanks for the workaround.

Any plans on going the other direction, i.e. from a Zig struct to NestedText? Just curious, but this is already looking good.