A few questions on Zig language design and implementation

Although I have many questions, let me start with simple ones:

  • Is @This() enough to guarantee that recursive types are finite?
  • Can I still create non-trivial recursive types, e.g. a pair with a swap: test.conformance/mar.tyr at master · tyr-lang/test.conformance · GitHub
  • How is evaluation order of functions with comptime(=CT) arguments determined?
  • Can I use a CT function at runtime(=RT)?
  • Is the result the same at RT and at CT?
  • Is the result of a function taking CT arguments the same for the same arguments or is it identical?

Kind regards,
Timm

  • Is the result of a function taking CT arguments the same for the same arguments or is it identical?

Functions that only have comptime arguments are memoized and won’t be executed a second time when you invoke them with the same arguments. This means that they return the exact same thing:

const std = @import("std");

pub fn UniqueEmpty(comptime id: comptime_int) type {
    _ = id; // discard the parameter value
    return struct {};
}

pub fn main() void {
    const E_1 = UniqueEmpty(1);
    const E_2 = UniqueEmpty(1);
    const E_3 = UniqueEmpty(2);

    if (E_1 != E_2) @compileError("memoization");
    if (E_1 == E_3) @compileError("memoization");
}

  • Can I still create non-trivial recursive types, e.g. a pair with a swap: …

See this:

const std = @import("std");

pub fn Swap(comptime A: type, comptime B: type) type {
    return struct {
        a: A,
        b: B,

        pub fn swap(self: @This()) Swap(B,A) {
            return Swap(B,A){
                .a = self.b,
                .b = self.a,
            };
        }
    };
}

pub fn main() void {
  const swp_f_i = Swap(f32, i32){ .a = 3.14, .b = 42 };
  const swp_i_f = swp_f_i.swap();

  std.debug.print("{}\n", .{ swp_f_i });
  std.debug.print("{}\n", .{ swp_i_f });
}

  • Can I use a CT function at runtime(=RT)?

There is no way to declare a whole function as comptime; only parameters can be forced to be comptime-known. There are some types that are only allowed at comptime, though (examples: comptime_int, type), and if you return such a type, the function can only be used in a comptime context (such as a type declaration or a top-level constant/variable assignment).
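As a minimal sketch (the function name Vec is mine, not from the thread): returning a type means every call site must be a comptime context, such as a type annotation:

```zig
fn Vec(comptime n: usize) type {
    return [n]f32; // returning `type` makes calls comptime-only
}

pub fn main() void {
    // OK: a type annotation is a comptime context.
    var v: Vec(3) = .{ 0.0, 0.0, 0.0 };
    v[0] = 1.0;
    _ = v;
}
```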

Important: Functions with only comptime parameters, but a runtime return type might still be forced to evaluate at runtime, as they can still be impure and do things like invoking syscalls and such:

pub fn readFromStream(comptime T: type) !T {
    var val: T = undefined;
    try stream.readAll(std.mem.asBytes(&val)); // access a global variable
    return val;
}

Every function that has no side effects (mutating a global variable, doing inline assembly, and the like) can be called at comptime:

var glob: u32 = 0;

fn CountType(comptime T: type) type {
    glob += 1; // error: unable to evaluate constant expression
    return T;
}

  • Is the result the same at RT and at CT?

Yes, comptime emulates the target architecture, up to architectural bugs like the Pentium FDIV bug (at least, that’s the plan).
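To illustrate (a sketch of mine, not from the thread): the same function can be evaluated at comptime and at runtime, and the results agree, because wrapping arithmetic and shifts are defined identically in both worlds:

```zig
const std = @import("std");

fn mix(x: u32) u32 {
    // wrapping multiply and shift: well-defined both at comptime and on the target
    return (x *% 2654435761) ^ (x >> 16);
}

pub fn main() void {
    const at_comptime = comptime mix(12345);

    var input: u32 = 12345; // runtime value
    input += 0;
    const at_runtime = mix(input);

    std.debug.assert(at_comptime == at_runtime);
}
```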


  • Is @This() enough to guarantee that recursive types are finite?

@This() has nothing to do with recursive types. It is just required for types that have no name assigned to them:

try readFromStream(&stream, struct {
    count: u32,

    // there is no way to refer to the struct declaration otherwise.
    pub fn hasCapacity(self: @This(), at_least: u32) bool {
        return self.count >= at_least;
    }
});

Finite type recursion is enforced by a graph analysis with lazy evaluation:

// error: struct 'A' depends on itself
const A = struct {
    a: A, // note: while checking this field
};

Note that a pointer type *T can be resolved lazily and is not required to compute the size of the struct, so no recursive type dependency exists: pointers have a known size without knowing the pointee type.
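For example (my own sketch), a self-referential type is fine as long as the recursion goes through a pointer:

```zig
const std = @import("std");

const Node = struct {
    value: u32,
    next: ?*const Node, // pointer has a known size, so no infinite layout
};

pub fn main() void {
    const tail = Node{ .value = 2, .next = null };
    const head = Node{ .value = 1, .next = &tail };
    std.debug.assert(head.next.?.value == 2);
}
```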


  • How is evaluation order of functions with comptime(=CT) arguments determined?

Everything at non-function scope has no defined evaluation order and is lazily evaluated when referenced. You have no guarantee about whether or in which order functions are invoked at comptime. But since they cannot access or modify any globals, and are thus pure functions, it doesn’t matter.
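A small illustration of that laziness (a sketch; this relies on Zig analyzing declarations only when referenced):

```zig
// This decl is never referenced, so it is never analyzed
// and the @compileError does not fire:
const unused = @compileError("only evaluated if referenced");

pub fn main() void {}
```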

At function scope, comptime calls follow the same top-down evaluation order that normal statements have:

fn CountType(comptime count: *u32, comptime T: type) type {
    count.* += 1;
    return T;
}

pub fn main() void {
    comptime var count: u32 = 0;
    const a: CountType(&count, u32) = 0;
    const b: CountType(&count, u32) = 0;
    const c: CountType(&count, u32) = 0;
    _ = a; _ = b; _ = c;

    // | 3
    // ./example.zig:14:5: error: found compile log statement
    @compileLog(count);
}

comptime var declares a function-scoped variable that can be mutated at comptime, but acts as a constant at runtime. This allows nice things such as the one displayed above.
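Another common use (my own sketch): mutating a comptime var inside an unrolled loop, then freezing the result as a constant:

```zig
const std = @import("std");

pub fn main() void {
    comptime var sum: u32 = 0;
    // `inline for` is unrolled at comptime, so `sum` may be mutated here:
    inline for (.{ 1, 2, 3 }) |v| {
        sum += v;
    }
    const total: u32 = sum; // comptime-known from here on
    std.debug.assert(total == 6);
}
```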


OK, thanks a lot for the details.

Based on your first example:
Is there no way to check for type equivalence?

Furthermore:
Maybe a step back: perhaps I did not get comptime right. Is a comptime parameter basically a value whose bit pattern is known to the compiler before evaluating the respective function?

If a function can create a type and that function can be RT-executed, will there be any optimization?
Also, does this imply that the first thing a Zig application does is to reconstruct its type context, because it might be required?

I’m still not sure if I understood the answer to the Swap/evaluation order topic.
Let me add some spice here.
Let’s assume we had a container library offering Array(T), Set(T) and Iterator(T).
I want Array and Set to offer Iterators and I want Iterator to be able to transform into Array and Set.
But, I do not want to declare Array and Set in the same compilation unit.
Can this be done?
The issue with the Swap response is that the strategy would also work in Ada, which is known not to scale to real-world complexity with respect to container libraries and other CT- or template-related issues. It’s still fine for its target niche :)

Also, regarding the A/A example. If the general strategy is lazy evaluation + abusing the name as elaboration marker, does this imply that a definition like:

A { a: *B}
B { a : A}

Is not allowed?

Can a function definition depend on its own parameters, e.g.

fn Add(comptime T: type, comptime a: T, comptime b: T) T { return a + b; }

Lastly, I got another graph-related question:
I noticed undefined. Is there a way to check initialization of undefined values?

First of all: Sorry for the late answer!

Based on your first example:
Is there no way to check for type equivalence?

What is “type equivalence” here? Checking whether two types are the same type is simply done with TypeA == TypeB. For structural equivalence, you have to implement that in userland yourself, as there are no clear rules for it.
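To make the distinction concrete (a sketch of mine): type identity in Zig is nominal, so two structurally identical declarations are still distinct types:

```zig
const std = @import("std");

const A = struct { x: u32 };
const B = struct { x: u32 }; // same layout, but a distinct declaration

comptime {
    std.debug.assert(A == A);
    std.debug.assert(A != B); // nominal, not structural, identity
}

pub fn main() void {}
```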

Furthermore:
Maybe a step back: Maybe I did not get comptime right.

I get the feeling that this might be the case. comptime is basically C++ templates with template parameters and function parameters intermixed into a unified syntax. Functions get one instantiation per comptime parameter set, just like C++ templates get one instance per template parameter set. The optimizations on runtime code apply in the same way: a division by a comptime-known value will be optimized into something more efficient when possible, and the same goes for branches of any kind.
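A sketch of what that looks like in practice (the function name is mine): each distinct comptime argument produces its own instantiation, in which the divisor is a constant the optimizer can strength-reduce:

```zig
fn divByConst(comptime d: u32, x: u32) u32 {
    // one instantiation per `d`; dividing by a known constant
    // can be turned into multiply-and-shift by the optimizer
    return x / d;
}

pub fn main() void {
    var x: u32 = 100;
    x += 0; // keep x runtime-known
    const q = divByConst(7, x);
    _ = q;
}
```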

But, I do not want to declare Array and Set in the same compilation unit.
Can this be done?

No, as Zig itself can only interact between compilation units via the C ABI.

Can a function definition depend on its own parameters, e.g.

fn Add(comptime T: type, comptime a: T, comptime b: T) T { return a + b; }

Yes, this is totally possible.
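A usage sketch (my own, not from the thread): the earlier comptime parameter T constrains the later ones, so a and b are type-checked as T:

```zig
fn Add(comptime T: type, comptime a: T, comptime b: T) T {
    return a + b;
}

const sum = Add(u8, 3, 4); // a and b are checked against T = u8

comptime {
    if (sum != 7) @compileError("unexpected sum");
}

pub fn main() void {}
```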

Lastly, I got another graph-related question:
I noticed undefined. Is there a way to check initialization of undefined values?

No. undefined is literally a message to the compiler not to initialize something. There are no additional bits stored; the compiler will just not emit any initialization code for that variable. In Debug builds, undefined memory is filled with the bit pattern 0xAA…AA to make it easier to spot undefined content.
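A minimal sketch of what that means in code:

```zig
pub fn main() void {
    var x: u32 = undefined; // no initialization code is emitted here
    // (in Debug builds the bytes behind `x` are filled with 0xAA)
    x = 5; // the first real write
    _ = x;
}
```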

Also, regarding the A/A example. If the general strategy is lazy evaluation + abusing the name as
elaboration marker, does this imply that a definition like:

Assuming you meant this:

const A = struct { a: *B };
const B = struct { a: A };

It’s 100% possible: *B is a pointer type, so B can be lazily resolved. Type names are not involved in lazy/deferred type resolution, only in lazy declaration evaluation (decls that are not referenced are not evaluated; once referenced, they get fully evaluated).