ZOMB File Format + Zig Implementation (WIP)

Hey there!

I’ve been working on a new file format I call ZOMB, which (so far) I describe as a mix of JSON and TOML, with macros. Long story short, I wanted a file format that would make maintaining my JSON color schemes for various editors less painful. As I started defining this format, I realized it might be useful for others as well…we’ll see, I guess.

I’ve also been trying to learn Zig (which I am enjoying thoroughly), so at the same time I decided to start implementing zomb-zig, a library for the ZOMB file format in Zig.

Both of these projects are very much in their infancy - they’re basically in-progress prototypes (especially the Zig library), but the file format specification does feel pretty close to graduating from this prototype phase, so I figured it might be a good time to open things up and get some feedback if anyone is so inclined.

Thanks for your time!


Edit: I forgot to clearly mention that one of the reasons I started this was just for fun – I think that’s an important point to make. I have no idea if this file format will end up being useful, but so far it feels like it has potential.


Update: Since posting this, I’ve been working on convincing myself that it actually solves my original problem, but I realized it did not (at least not entirely). As a result, I’ve added a new feature I’m calling “macro batching” which does indeed seem to solve my original repetition problem. I’ve also cleaned up the README a bit. Check it out!

As I understand it, the key difference between this format and others is its macro system. Would you mind sharing some use cases that made you devise such a format?

The reason I ask is because I have an impression that there are better ways of handling repetition. It can be in the program which consumes the file. Or data repetition could be a sign that data structure might need refactoring. Sometimes it is just a generated config file where we don’t care about repetition. I would be curious to learn where macros are at their best.

A note on arrays and objects. As I understand, it is inspired a lot by json. But looking at it, it feels like you can just remove all the brackets and it will still make sense. What do you think about using indentation-based semantics, as inspired by yaml?

@greenfork - Thank you for taking a look and for the great questions! It really helps me think more about the design and purpose. I’ve tried my best to address your questions and comments below.


So this idea originally came from maintaining color schemes for VS Code and Sublime Text. They both follow the same basic pattern (more or less) which looks something like this:

"tokenColors": [
    {
        "name": "basic comment",
        "scope": "comment",
        "foreground": "#8a8a8a"
    },
    {
        "name": "message comment",
        "scope": "comment.line.message",
        "foreground": "#8eb1e5"
    },
    {
        "name": "git commit comment",
        "scope": "comment.line.number-sign.git-commit",
        "foreground": "#ff43a1"
    },
    // ... and many more of the same "name, scope, 'settings'" structure
    // for each thing you want to colorize...
]

This is fine, but it becomes difficult to tweak these values when you’re changing things. For example, if I want to change a group of these colors, I have to go through the file and change each one. With the ZOMB format, I could just use a macro, and change it in one place.

$color = {
    basic_grey = #8a8a8a
    bluish_grey = #8eb1e5
    hot_pink = #ff43a1
}
$colorize(name, scope, color) = {
    name = %name
    scope = %scope
    color = %color
}
tokenColors = [
    $colorize("basic comment", comment, $color.basic_grey)
    $colorize("message comment", "comment.line.message", $color.bluish_grey)
    $colorize("git commit comment", "comment.line.number-sign.git-commit", $color.hot_pink)
]
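
As a rough illustration (in Python rather than ZOMB), the macros above behave like a lookup table plus a small function that fills in parameters. The names COLOR and colorize are made up for this sketch, and it says nothing about how a ZOMB parser works internally:

```python
import json

# Stand-in for the $color macro: a plain lookup table.
COLOR = {
    "basic_grey": "#8a8a8a",
    "bluish_grey": "#8eb1e5",
    "hot_pink": "#ff43a1",
}


def colorize(name, scope, color):
    """Stand-in for $colorize: substitute each %parameter into the object."""
    return {"name": name, "scope": scope, "color": color}


# Mirrors the three $colorize(...) calls in the tokenColors array.
token_colors = [
    colorize("basic comment", "comment", COLOR["basic_grey"]),
    colorize("message comment", "comment.line.message", COLOR["bluish_grey"]),
    colorize(
        "git commit comment",
        "comment.line.number-sign.git-commit",
        COLOR["hot_pink"],
    ),
]

print(json.dumps({"tokenColors": token_colors}, indent=4))
```

Changing a value in COLOR changes every entry that refers to it, which is the "change it in one place" property the macro is meant to provide.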

You certainly could be right. I tried to find some tools to handle this, but I came up short for one reason or another – nothing solved my problem the way I wanted it to.

The other thing is that I don’t control the end file format – meaning the JSON file that must be provided to the code editor in the specified format to do syntax highlighting. Sublime Text’s format (still just JSON) tries to help with the colors by defining a “colors” object, which you can then reference, but you still have to copy and paste the basic “name, scope, foreground, …” objects everywhere.

To me it felt natural to just add a variable/macro-like syntax so I could define, in one place, the objects/arrays/strings I would otherwise have to copy and paste everywhere, and then reuse them.


So, I thought about (and experimented with) many different style-related ideas – including indentation-based semantics like YAML has. Here are some design choices I made (and why):

  • I chose to keep JSON’s object/array delimiters []{} and not use indentation as a delimiter for two main reasons: (1) the square brackets and curly braces are well known and are a decent visual indicator of their respective types, and (2) using indentation as a delimiter removes the user’s ability to visually organize/style the file. That second point isn’t all that compelling, but there have been many times where I just wished a certain file format wouldn’t dictate how many/few spaces/tabs I wanted to place between things in a certain scenario – so I didn’t want to burden anyone else with that style limitation.
  • I chose to make commas optional, mostly for styling: when the next item is on the next line, a comma is redundant (at least as a visual indicator), so it’s nice not to need one. I did go through a phase where I tried to require commas between things on the same line, but it felt like one extra rule that wasn’t really necessary. The way I designed ZOMB files, commas are only ever a visual element, i.e., the parser doesn’t need them.
  • I chose to reduce the number of double-quotes necessary. This is one thing that I get annoyed by in large JSON files…so many double quotes. Don’t get me wrong, I get the double-quote design – it’s super simple for both humans and parsers, but as I was prototyping, it became apparent that allowing bare strings was also pretty simple.
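
To show why optional commas and bare strings are cheap to support, here is a minimal tokenizer sketch in Python. The token rules are assumptions made for this example only (they are not the ZOMB grammar): commas are skipped exactly like whitespace, and anything that is not punctuation or a quoted string is read as a bare string.

```python
import re

# Assumed token rules for this sketch only (not the actual ZOMB grammar):
#   - whitespace and commas are both skipped as separators
#   - "..." is a quoted string
#   - { } [ ] = are single-character punctuation
#   - anything else, up to the next separator or punctuation, is a bare string
TOKEN = re.compile(r"""
    (?P<skip>[\s,]+)
  | (?P<quoted>"[^"]*")
  | (?P<punct>[{}\[\]=])
  | (?P<bare>[^\s,{}\[\]="]+)
""", re.VERBOSE)


def tokenize(text):
    """Return the token strings, dropping separators and surrounding quotes."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = match.end()
        if match.lastgroup == "skip":
            continue
        value = match.group()
        if match.lastgroup == "quoted":
            value = value[1:-1]  # strip the double quotes
        tokens.append(value)
    return tokens


with_commas = tokenize('colors = [ red, "hot pink", #ff43a1 ]')
without_commas = tokenize('colors = [ red "hot pink" #ff43a1 ]')
print(with_commas == without_commas)  # True
```

Because the comma lives in the same "skip" rule as whitespace, the parser never sees it, which matches the idea that commas are purely visual.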

Thanks for such a detailed answer!

The case with color schemes seems rather interesting. For my personal purposes I would use a scripting language like Ruby or Python, but having a file format is better for security: I can safely use it in other software, not just for personal use.

A note on the format of color schemes - another solution is to encode everything in positional arguments, for example kakoune does it like this:

add-highlighter global/ regex '\b(TODO|FIXME|NOTE|REVIEW)\b' 0:default+rb

where you have “command” - “scope” - “type” - “value” - “colors”. The interesting bit is default+rb, which means: use the default color (it can be any named color or a straight rgb value) and add reverse (r) and bold (b) to it.
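
As a small illustration of that positional style, the last argument can be pulled apart with a couple of string splits. This Python sketch only covers the shape shown above (number, colon, color, plus sign, attribute letters) and the two attribute letters mentioned here. The function name and the returned dictionary are made up, the leading number is assumed to be the regex capture group, and Kakoune's real face syntax has more options than this handles.

```python
# Attribute letters mentioned in the post; Kakoune defines more than these.
ATTRIBUTES = {"r": "reverse", "b": "bold"}


def parse_face_spec(spec):
    """Split 'group:color+attrs' into its parts (hypothetical helper)."""
    group, _, face = spec.partition(":")
    color, _, flags = face.partition("+")
    return {
        "capture_group": int(group),
        "color": color,
        # Unknown letters are passed through unchanged.
        "attributes": [ATTRIBUTES.get(flag, flag) for flag in flags],
    }


print(parse_face_spec("0:default+rb"))
```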

For the text editor I’m trying to write, I chose the zzz file format, which also has some new ideas. An interesting contrast with ZOMB: a same-line array in zzz doesn’t use square brackets [] but requires commas, the complete opposite of ZOMB. I think that duplicated keys in zzz can be used as an advantage for a general config file.

Color scheme could be a different case though. From my understanding, portable color schemes are provided in some kind of a parsable format. So in order to import it into some editor, you would need to do parsing and recombination of the info to the new format. ZOMB could be a good intermediate representation format, or just a format. Would be interesting to see the API with macros, since it is a partly-programmable file format.

Thanks for your work, this looks very interesting, I have a lot of new ideas.

So I wanted to showcase the new “macro batching” feature I’ve added.

Let’s say I’ve defined a couple of macros like this:

$color = {
    black = #000000
    red = #ff0000
}
$colorize(scope, color, alpha) = {
    scope = %scope
    settings = { foreground = %color + %alpha }
}

Obviously, I’m going to need that $colorize object many times (otherwise I wouldn’t need a macro for it), so here’s one way I might use it:

tokenColors = [
    // all the scopes I want colored with black
    $colorize("editor.background", $color.black, 55)
    $colorize("editor.border", $color.black, 66)
    // ... many more

    // all the scopes I want colored with red
    $colorize("editor.foreground", $color.red, 7f)
    $colorize("editor.highlightBorder", $color.red, ff)
    // ... many more
]

Now, this is slightly better than having to write the entire $colorize object over and over, but I think we can do better:

tokenColors =
    $colorize(?, $color.black, ?) % [
        [ "editor.background" 55 ]
        [ "editor.border"     66 ]
        // ... many more
    ] +
    $colorize(?, $color.red, ?) % [
        [ "editor.foreground"      7f ]
        [ "editor.highlightBorder" ff ]
        // ... many more
    ]

To me it’s clear that this will end up being much easier to maintain in the long run, and there’s a nice grouping structure that still has some flexibility.
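
For anyone who finds it easier to read as code, here is one way to think about batching, sketched in Python. It models each ? as a hole that is filled, in order, from every row of the batch, and models + as concatenation (both for %color + %alpha and for joining the two batches). HOLE, batch, and the other names are invented for this sketch; it is a mental model under those assumptions, not the zomb-zig implementation.

```python
COLOR = {"black": "#000000", "red": "#ff0000"}

# Marker standing in for the `?` placeholder in a batched macro call.
HOLE = object()


def colorize(scope, color, alpha):
    """Stand-in for $colorize; `%color + %alpha` is modeled as concatenation."""
    return {"scope": scope, "settings": {"foreground": color + alpha}}


def batch(macro, template, rows):
    """Call `macro` once per row, filling each HOLE in `template` in order."""
    results = []
    for row in rows:
        values = iter(row)
        args = [next(values) if arg is HOLE else arg for arg in template]
        results.append(macro(*args))
    return results


# Mirrors the two batches joined with `+` in the ZOMB example.
token_colors = (
    batch(colorize, [HOLE, COLOR["black"], HOLE], [
        ["editor.background", "55"],
        ["editor.border", "66"],
    ])
    + batch(colorize, [HOLE, COLOR["red"], HOLE], [
        ["editor.foreground", "7f"],
        ["editor.highlightBorder", "ff"],
    ])
)

for entry in token_colors:
    print(entry)
```

Each batch fixes the color once and only lists what varies per entry, which is where the grouping comes from.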

Not to discourage you from learning and playing around, but I thought I’d mention Dhall here (https://dhall-lang.org/), which seems fairly mature and seems to fit your bill: its current tagline is “Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports”.

So having Zig bindings for Dhall (or, rather more challenging, a native Zig implementation) might be useful.

Thanks for the suggestion! Dhall looks really interesting. It does seem to address my repetition problem (I knew I wasn’t the only one), but it does so in a more complicated way than I’d like. They’ve made some interesting design choices (like the “in” scoping keyword), but Dhall clearly leans more towards a programming language than a configuration format. One of the goals I have for the ZOMB format is to keep the complexity low, and so far I think I’ve managed to achieve that with macros (macro batching deviates slightly from this simplicity goal, but so far I think it’s rather intuitive…still a WIP though).

Thanks again! I’m really happy to have learned about Dhall. I’ll certainly poke around its documentation to learn more.

I’m back with an update - zomb-zig is working!

There’s an example in the example directory with a zomb2json converter, a ZOMB file, and the resulting JSON file to show that it works.

I still have a lot of work to do: a bunch of code cleanup, getting a good API for the library, more tests, more comments, and all that.

Working with Zig has been a blast! I can’t wait to get this thing finished up so I can use it for my other projects.
