Lambdas, Closures and everything in between #1048
Comments
Trampolines require that the stack is marked executable. That's not the default on most systems and it reduces security, it makes injecting shellcode much easier. I have some ideas on how to tackle closures but I don't want to derail this issue with counterproposals. |
How are the closures allocated? |
@bnoordhuis Huh, you got me there; I completely forgot about that, damn. I guess the big challenge with this problem is trying to figure out an efficient solution that isn't just 'every closure gets a struct with a function ptr', since that just deliberately hides information from the coder. Counterproposals are fine since it's more of an implementation detail. The other idea I had was a 'fat ptr'-like solution; however, that just bloats a lot of calls and forces us to use C-style function calls (that allow extra parameters). |
@bronze1man sorry I don't understand your questions? No need to free closures as we are talking about stack allocations, you can read up on trampolining if you really want the nitty gritty implementation detail. |
Could we use @newStackCall as part of the allocation?
var needle : i32 = 10;
var lambda = fn [needle] (x: var) bool { return x > needle; }; // yes, cpp capture syntax
What I imagine is that lambda is a struct that holds the captured stack instance and a function pointer:
struct {
    stack: Struct { needle : i32 },
    func: (x: var) bool,
}
Calling the lambda would use newStackCall; I am just not sure how to put the stack into scope for the call. |
Actually I think you are onto something. Originally I dismissed your idea since I was like "nah, that can't work"; however, upon considering it, I definitely think it could work. What if you could bind a stack to a function, regardless of the function's purpose, such as;
var stack : [8]u8;
var x : *i32 = &stack[0]; // skipping cast to make it easier to understand
x.* = 10;
var lambda = fn (a: var) bool { return a < @stack(0, i32); }; // Longer syntax for clearness, @stack giving you the stackptr, then asking for an 'i32' integer
doWhatever(@bindstack(stack[4..], lambda, i32)); // @bindstack returns a function ptr, 4.. to indicate stack begins from the 4th position
Basically the language would have the concept of allowing you to bind a stack to a function, so there is no need to 'implicitly' cast. In effect, yes, it would look like that struct you gave (kinda), but instead of requiring some weird obscure cast or changing how functions work, we just allow functions to carry a different stack than the one we give it. Of course this requires the function to have a scope within the caller; if the scope is outside, perhaps we allocate at the start of the program and deallocate at the end?
As a side note, I'm also not too sure about the cpp syntax: it looks clunky, and it is the cause of much confusion and many bugs due to some finer details that are a little odd. I prefer the more common style that is prevalent in languages like C#, Go, and Python, for example, since it is much clearer. |
I get that the capture syntax is tricky when you're learning, but it was added to C++ for good reason. One of the good things about the capture syntax is that it gives you control over what goes into the closure. This fits nicely with the zen and my personal rule of the principle of least surprise (i.e., I have no idea what is getting captured in the closure, and I have run into performance issues and bugs in D where it was capturing and trying to make deep copies of things I did not want/need). |
Of course, however I still feel that the syntax I originally proposed is clearer i.e. Honestly a personal preference and we both agree that a clear copy/by ref is really needed which is important, which way it goes I think is not important right now; more important is figuring out how the lambda will be implemented. |
Ah sorry, I was not trying to defend the c++ syntax, just the concept of capturing. So yeah I am on the same page as you. |
@BraedonWooding When I say that more discussion is needed before Zig gets closures, it has nothing to do with your proposal, only with the concept in general. I just don't think that anonymous functions and closures are a good idea in Zig. Zig aims to be a C alternative, not a C++ alternative. C does not have anonymous functions. Lambdas have not been added to C because in over 45 years, no one has ever felt that they needed them. Anonymous functions are simply not useful in imperative languages such as C and Zig. I love Zig because it aims to be an alternative for C, and it is the only language with this goal that is on the right track. As of now, Zig is small and simple, and fit for low-level/embedded/OS development. Rust is horrible for this purpose: it is endlessly bloated with features, lang items, and 3 standard libraries (core, alloc, std). Individually, these features are very small, like Rust closures. But boy, do they add up, and make for a hell on earth. So, unless you can come up with some real-life code examples of why we need lambdas, rather than just some quick convenience in sort(), I remain unconvinced. Everyone, please give your own perspectives and ideas! I don't want to be "that guy", just spouting disapproval in every Zig GitHub issue. ;) |
This proposal seems way too complicated to me; the syntax of funcs should always be nearly the same. A normal func looks like this:
thus a lambda should look like this:
And it should be possible to have a func inside a func so that it can only be used locally. This is good for code hygiene and refactoring. The syntax is the same as for a global function, thus it's very easy to move them around. Closures are simple as well
That is obviously false; just because it's not in C in no way means no one needed it |
Here are your examples written in current zig. First:
Second:
I don't see the benefit of these functional programming features here. I'm not saying I hate closures: I just want to see an example of them being legitimately useful. Zig zen bullet 4:
|
local functions would be useful as described
Making foo available although it is just used for sorting is bad because it makes it subject to being used elsewhere, thus unwanted dependencies arise and refactoring becomes a problem. That is why local functions and lambdas are useful. Of course those are contrived examples. I'm not all for closures; I was merely pointing out that they can be added to the language without fuss or really new syntax
If you look at how zig does OO this is not the case currently #1205 (comment) (its clearly not the obvious way but rather the hacked way). Thus some language features might indeed be needed to improve code quality. |
@monouser7dig
Ok, finished my case study of std/io.zig. It's a bit cluttered, and you're right: it's a bit hacked. But the Plan 9 extension thing is the solution: not closures. I can imagine an InputStream interface being defined as a struct with only abstract functions, which would be anonymously included in a FileInputStream. That would be a nice, clean, "OO" way to do this. (Besides, the code for FileInputStream and InputStream could just be merged into one "InputStream", everything should be abstracted as a File in the first place, but that's a discussion for another issue/MR) EDIT: |
Is a function object one pointer in size or two? If Zig designs a syntax that supports closures without hidden allocation, then there may be two kinds of callable type: one for a plain function with no context, which is one pointer in size, and one for a closure with context, which is two pointers in size. |
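A minimal sketch of the two-pointer layout described above, written against Zig 0.11 or later (an assumption; the thread itself predates that syntax). The names `Closure`, `call_fn`, and `greaterThan` are hypothetical and only illustrate a type-erased context pointer paired with a function pointer; nothing here is an existing or planned language feature.

```zig
const std = @import("std");

// Hypothetical two-pointer closure: a type-erased pointer to the captured
// state plus a pointer to a function that takes that state explicitly.
const Closure = struct {
    ctx: *const anyopaque,
    call_fn: *const fn (ctx: *const anyopaque, x: i32) bool,

    fn call(self: Closure, x: i32) bool {
        return self.call_fn(self.ctx, x);
    }
};

// The "body" of the closure; `ctx` points at the captured i32.
fn greaterThan(ctx: *const anyopaque, x: i32) bool {
    const needle: *const i32 = @ptrCast(@alignCast(ctx));
    return x > needle.*;
}

test "fat-pointer closure" {
    const needle: i32 = 10; // captured by pointer; must outlive the closure
    const closure = Closure{ .ctx = &needle, .call_fn = &greaterThan };
    try std.testing.expect(closure.call(11));
    try std.testing.expect(!closure.call(3));
}
```

No allocation is hidden here: the captured value lives wherever the caller put it, and the caller is responsible for keeping it alive while the closure is in use.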
Just a note, local, anonymous functions can technically be done already, albeit with a very non-intuitive syntax.
|
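The snippet from the comment above did not survive on this page. As a hedged sketch of the kind of pattern it refers to (not a reconstruction of the original code), a function can be declared inside an anonymous struct and pulled out by name, which keeps it local to the enclosing function. `sumOfSquares` and `square` are hypothetical names.

```zig
const std = @import("std");

fn sumOfSquares(values: []const i32) i32 {
    // Local helper: declared inside an anonymous struct and extracted with
    // `.f`, so it is not visible outside sumOfSquares.
    const square = struct {
        fn f(x: i32) i32 {
            return x * x;
        }
    }.f;

    var total: i32 = 0;
    for (values) |v| total += square(v);
    return total;
}

test "local function via anonymous struct" {
    try std.testing.expect(sumOfSquares(&[_]i32{ 1, 2, 3 }) == 14);
}
```

Note that `square` cannot refer to runtime locals of `sumOfSquares`; it is a plain function, not a closure.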
then just do it like coroutines:
the syntax is already established in zig you just have to combine it if needed I put the
|
@bronze1man, why can't a closure be allocated the same way any other object on the stack is allocated? |
@binary132 |
what do you mean?
yes I think this would be possible with the proposed syntax that would mimic coroutines, right? |
Sorry, I read the document of the coroutines, and I found I am wrong. |
I have a proposal for the syntax of anonymous functions and closures: anonymous function:
An anonymous function is a function type; it is one pointer in size. It cannot capture any local variables. closure:
A closure is a type defined in the std package. It is not the same as a function type. It can pass in local variables by function call. |
If the user wants to capture variables in a closure, they should be careful not to capture stack-locals by reference if that closure is going to be passed back. I guess that's why C++ lambdas default to capture by-value. Otherwise, I don't see the problem, and I definitely do not agree that Zig should be careful not to introduce a syntax that causes heap allocation implicitly or by default. On that note, I actually really like the C++ syntax for lambdas. It is maybe too featureful, but it is cool. Another point re. C++ closures: if the value is moved into the lambda, then you can just deallocate it when the lambda's lifetime ends, just as you would with any other value. I don't know if that is supported in Zig, but it would be great. |
@binary123 ... the other issue about "CPP syntax is nice" really destroys the language, because having two totally unrelated ways of defining functions/closures makes for a very bad experience, as described earlier. Zig must keep the syntax coherent and not just copy-paste different languages together. |
I said I personally like the C++ lambda syntax. I did not say I think Zig should use it. As a matter of fact, I criticized it. I am not at all suggesting Zig should bolt on syntax features from other languages. I am suggesting Zig should learn from the semantics of C++ lambdas. I agree, having a single consistent syntax for functions is a good thing. But if function declarations can capture variables, that syntax should take capture semantics (move, stack reference, allocation? please no. etc.) into account. This is an unending source of foot-shootings in Go. I personally like the way Rust does it where the closure context is an implicit struct having the closure as a "method". If you then disallow structs borrowing stack-local references to leave the lifetime of those references, that might be one way around it. But that would add a lot of language complication to get something which could be solved just by letting the user decide whether to capture by reference or not. |
@binary132 C++ does the same thing as Rust. If you know C++03, there were only hand-written functor structs. C++11 just added compiler-generated anonymous structs. |
@monouser7dig borrowing from other languages can be an invaluable tool. Programming languages have been around for a lot longer than Zig, and many encountered similar issues. Looking at how they solved certain problems should be encouraged. |
You could just have a comptime function to create a struct type, and construct the struct, from some names in local scope, optionally by reference, and a function that would become an "eval" method on that struct. That new struct type would then be your "functor".... To evaluate the functor, you'd need to then call That might fit nicely with the "everything is a struct" issue. |
AFAIK the idea of comptime does not extend to complete code generation. It isn't the same as a preprocessor. From what I understand, there is no way to generate a struct on the fly from a set of identifiers using comptime. I came across a simpler instance of this recently. As far as I know, there is no way to generate an array of strings corresponding to enum members at compile time. This is a somewhat unrelated issue, but thought it was worth bringing up to point out comptime isn't a panacea. |
you'd need some more type/reflection capabilities in comptime, and something like a "context" object containing metadata of the local context. |
@isaachier that's no excuse for inventing new language constructs that could have been done equally well with existing syntax, which does exist as shown above by me. Or are there any remaining shortcomings in that syntax that you see? |
@monouser7dig 100% agree there. If there is a straightforward path from existing syntax to a certain feature, that is the best approach. |
http://number-none.com/blow/blog/programming/2014/09/26/carmack-on-inlined-code.html
making another point for local functions #1048 (comment) (and a sane syntax, not structs, that encourage usage of such) and as a further IMO very good suggestion
|
It's already possible to define a "closure" using a function within a struct. This program compiles successfully, for example:
|
@jarble Note that this only works as |
@bjornpagen this is not true. |
This works at runtime:
|
@leira This works because of first-class function support in the language. For an example of where it falls short, notice that First-class functions without closure support can be a bit frustrating, which leads me to this issue in the first place. I was hoping to build parser combinators with function composition, and lack of closure support is kinda biting me, mostly because I was really determined to try to find a workaround but have so far been unsuccessful. 😅 |
I wonder how no one mentions https://www.nmichaels.org/zig/interfaces.html |
I don't know why Andrew marked as completed. Maybe a little unrelated but it's a sticking point where someone mentions "I like to be able to create code dynamically like via macros. It dramatically eases some workflows" and points to Rust macros or even the C preprocessor and then you point out Zig's compiletime system and say "no need" but they say "for structs; what about functions?" and I really don't have much of an answer. I know being able to construct functions at compile time may be a bit much but to me it does seem very much possible with my understanding even without macros. I agree with those that say that using structs to define anonymous functions is a major weakness and easily needlessly complicated. To those who say "why add a language feature when it's already possible", I point to the entire language and say "why add a language when it's already possible via C". Having a common feature be achieved via a workaround (even if small), especially if users of all levels will have to see, do, and deal with it is, in fact, a major weakness. We shouldn't need to use the struct workaround. Honestly, when I first came to this language, I was extremely confused by the fact that functions aren't just all consts bound to anonymous functions / closures. It's very bizarre for a language seemingly on the surface as fluid and reflective as the ocean, be full of these hard, inflexible poles sticking out. |
Issue close reasons are a pretty recent GitHub feature, and didn't exist at the time this was closed. (GitHub marked all old closures as "completed" when they added this feature.) Functions returning functions is absolutely possible: comptime values can be captured into a nested struct scope, and logic in the function can use those comptime-known values to do conditional compilation.
fn isMultipleOfFn(comptime n: u64) fn (u64) bool {
    return struct {
        fn f(x: u64) bool {
            return x % n == 0;
        }
    }.f;
}
This is a little bit clunky, but that's kind of intentional. Zig to an extent discourages this kind of metaprogramming pattern, because logic is almost always significantly easier to write, understand, and debug when functions contain "direct" logic, even if it's a little longer. If you're using callbacks, it's almost always more correct to use a context struct with methods, to avoid requiring globals. But this functionality exists and works, and you can use it if it's genuinely the right tool. For more complex code generation, the Zig build system makes it really easy to add custom build steps that generate Zig source code if required. Again, this isn't a thing you should have to do often, but if it feels like the right choice then there's nothing wrong with doing it! As for the rest of your comment, it sounds a lot like you're discussing #1717: I recommend taking a look at the last comment on that issue for a bit more context as to why functions retain the syntax they have today. |
Hm yes that does help to elaborate. I disagree but I can see a bit more. I still think that if we have Although I adore functional programming, I completely understand sticking to your imperative guns but there are many times where using callbacks is not only very warranted but necessary so I feel like punishing users for doing something they may have to do frequently if they're working with certain C libraries and many low-level APIs is not good. I'm specifically thinking about windowing and audio which tends to rely heavily on you supplying callbacks. Of course that's all very much possible with Zig and furthermore, naming those callbacks is a good idea but it makes the point that there are plenty of times that a user would have to supply a small lambda / callback and anonymous functions would make that so much easier. I also don't like how they said that "anonymous functions don't show up in call stacks" because Rust proves this (like a lot of anti-anon-func arguments) not necessarily correct. Anonymous functions may not have names but that doesn't mean they can't have ID. In rust, the name and type of closures (closure are also unique anonymous types) is something like Oh well I suppose |
In pretty much any scenario, plain callbacks actually aren't what you want: instead you want a context type. The core issue here is that Zig doesn't have closures, so you can't easily refer to locals scoped outside your callback. So let's say you're using some library which streams data to you in a callback, and you want to buffer all that data into an ArrayList. With simple function callbacks, that would look like this (in a fictional world with anon functions):
test {
var data_stream = try beginDataStream();
var bytes = std.ArrayList(u8).init(std.testing.allocator);
defer bytes.deinit();
try data_stream.stream(fn (data: []const u8) !void {
// Hmm...
});
}
What do we put in the function body? We can't easily refer to the locals declared outside it. C libraries generally handle this by allowing the callback to receive an arbitrary user-supplied pointer as a parameter, generally called the "user pointer" or "user data" or similar. This might be e.g. an integer you've cast to a pointer, or maybe it's an actual pointer to some bigger structure in memory. This solution totally works, but it comes at the cost of type-safety and may add unnecessary pointer indirections. It makes sense for C, but in Zig we can do better! This is where context structs come in. Instead of taking a single function as our callback, we instead take a type with a method on it (we would usually put all the callbacks we need on that one type). We then take a value of that type, and call the methods on it. So to implement the above example, you do this:
var data_stream = try beginDataStream();
var bytes = std.ArrayList(u8).init(std.testing.allocator);
defer bytes.deinit();
const Context = struct {
out_bytes: *std.ArrayList(u8),
pub fn process(ctx: @This(), data: []const u8) !void {
try ctx.out_bytes.appendSlice(data);
}
};
try data_stream.stream(Context{ .out_bytes = &bytes });
This solution gives us more type safety, improves clarity, and allows the context to be stored and passed directly, potentially avoiding an unnecessary pointer indirection. I don't want to assert without evidence that this pattern is always applicable where you want to use a callback, but it definitely makes sense in every case I've seen. The fact is, plain anonymous functions in a language without closures aren't actually overly useful. They sometimes make sense for, say, sorting functions (to which you are providing a comparison function), but sorting can still make use of contexts sometimes, so it still makes sense for us to provide this more powerful (and still safe) API. |
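The context-struct pattern described above can be sketched in a self-contained form. This assumes a hypothetical producer, `streamChunks`, standing in for the comment's `data_stream.stream`, and a hypothetical `ByteCounter` context that tallies byte counts instead of appending to an ArrayList (chosen only to keep the sketch independent of allocator APIs).

```zig
const std = @import("std");

// Hypothetical producer: hands each chunk to `ctx.process`. Any type with a
// matching `process` method can be passed, checked at compile time.
fn streamChunks(ctx: anytype) !void {
    const chunks = [_][]const u8{ "ab", "cd", "ef" };
    for (chunks) |chunk| try ctx.process(chunk);
}

// The context carries a typed pointer to the caller's local, so no
// untyped "user pointer" is needed.
const ByteCounter = struct {
    total: *usize,

    pub fn process(ctx: @This(), data: []const u8) !void {
        ctx.total.* += data.len;
    }
};

test "context struct reaches the caller's local" {
    var total: usize = 0;
    try streamChunks(ByteCounter{ .total = &total });
    try std.testing.expect(total == 6);
}
```

The design choice is the one argued for in the comment: the state the callback needs is an explicit, typed field, and the caller decides where that state lives.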
This works at runtime without context structs:
const print = @import("std").debug.print;
pub fn main() void {
var j: usize = do: {
for(0..101) |i| {
if (i*2 == 100) break :do i;
}
};
procedure(j);
j = 100;
procedure(j);
}
fn procedure(j: usize) void {
const closure = (opaque {
var hidden_variable: usize = 0;
pub fn init(state: usize) *const @TypeOf(run) {
hidden_variable = state;
return &run;
}
fn run() void {
print("{}\n",.{hidden_variable});
hidden_variable += 1;
}
}).init(j);
useClosure(closure, 10);
}
fn useClosure(func: anytype, times: usize) void {
for (0..times) |_| {
func();
}
}
|
@expikr container-level variables have static lifetimes, so unfortunately that pattern doesn't have the intended effect: all the "closures" created using that method share the same variable:
const print = @import("std").debug.print;
pub fn main() void {
const counter1 = counter(0);
counter1();
const counter2 = counter(5);
counter2();
counter1();
}
fn counter(j: usize) *const fn () void {
return (opaque {
var hidden_variable: usize = 0;
pub fn init(state: usize) *const @TypeOf(run) {
hidden_variable = state;
return &run;
}
fn run() void {
print("{}\n", .{hidden_variable});
hidden_variable += 1;
}
}).init(j);
}
This prints 0, 5, 6, while it would be expected to print 0, 5, 1 if these were true closures. Using a context struct avoids this by leaving the caller in control of the memory for the state data. |
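As a hedged illustration of the point above, the same counter written with caller-owned state yields 0, 5, 1 in that call order. `Counter` and `next` are hypothetical names for this sketch; it returns the values rather than printing them so they can be checked directly.

```zig
const std = @import("std");

// Each Counter value owns its own state, so two counters never interfere.
const Counter = struct {
    value: usize,

    // Returns the current value, then advances the counter.
    fn next(self: *Counter) usize {
        const current = self.value;
        self.value += 1;
        return current;
    }
};

test "each counter owns its state" {
    var counter1 = Counter{ .value = 0 };
    var counter2 = Counter{ .value = 5 };
    try std.testing.expect(counter1.next() == 0);
    try std.testing.expect(counter2.next() == 5);
    try std.testing.expect(counter1.next() == 1);
}
```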
One could further force a hack with an inline function:
const print = @import("std").debug.print;
pub fn main() void {
const counter1 = counter(0);
counter1();
const counter2 = counter(5);
counter2();
counter1();
}
inline fn counter(j: usize) *const fn () void {
return (opaque {
var hidden_variable: usize = 0;
pub fn init(state: usize) *const @TypeOf(run) {
hidden_variable = state;
return &run;
}
fn run() void {
print("{}\n", .{hidden_variable});
hidden_variable += 1;
}
}).init(j);
}
Of course, this is still static instancing at compile time, but for use cases where you don't expect to have runtime-known instancing counts it could be more concise under specific circumstances. Mostly it's just for fun to figure out code golfing tricks though |
Key points of anonymous functions:
// 1
const f : fn(i32,i32) i32 = (a,b) => a+b;
// 2
const Func = fn(i32,i32)i32; // Arbitrarily complex function types
const f: Func = (a,b)=>a+b;
// 3
const f = (a,b)=> a+b; // fn(a:anytype, b:anytype) @TypeOf(a,b) {return a+b;}
// 4
var a:i32 = 123;
var b:i32 = 456;
const f = (self)=> self.a + self.b;
_ = f(.{.a = a, .b = b}); // capture var
// 5
const a = 123;
const b = 456;
const f = ()=> a + b; // capture comptime const
// 6
const f = (a,b)=>{return a+b;} // Similar to below
//const f = struct { fn anonymous(a:anytype, b:anytype) @TypeOf(a,b) {return a + b;}}.anonymous; |
Would this work?
fn someFunction() void {
// some local vars to capture
var x: u32 = 0;
const y: i32 = 10;
// proposed syntax A, like C++'s auto x = [&]() -> void {...}
const lambda = .{&x, &y}() void {
...
};
// or (preferably) proposed syntax B
const lambda = @lambda(.{&x, &y}, inline fn () void {
// pointer
capture[0].* = ...
// copy
capture[1] = ...
...
});
// which expands into this, for a single @lambda() call with unique captures: (works in 0.12)
const lambda = blk:{
const _capture = .{&x, y};
const Lmb_Ptr_x_Val_y = struct {
var capture: @TypeOf(_capture) = undefined;
inline fn func() void {
...
// pointer
capture[0].* = ...
// this is a copy
capture[1] = ...
}
};
Lmb_Ptr_x_Val_y.capture = _capture;
break:blk Lmb_Ptr_x_Val_y.func; // or &Lmb_Ptr_x_Val_y.func
};
lambda();
Otherwise, if more than a single lambda uses the same set of captures (and all of them being pointers), this could be expanded/optimised into:
var x: u32 = 0;
var y: i32 = 10;
// inserted directly below the last capture
const Lambda_Ptr_x_Ptr_y = struct {
var capture: @TypeOf(.{&x, &y}) = undefined;
inline fn funcA() void {
...
// pointer
capture[0].* = ...
// also a pointer here, since both captures are taken by reference
capture[1].* = ...
}
inline fn funcB() void {
...
}
};
Lambda_Ptr_x_Ptr_y.capture = .{&x, &y};
...
const lambda = Lambda_Ptr_x_Ptr_y.funcA;
const lambda_b = Lambda_Ptr_x_Ptr_y.funcB; |
I've been thinking about this topic for a few weeks now, and after quite a bit of research I think I have a solution that fits Zig's needs :).
This is building on #229 (and similar issues), since this is talking about implementation specifically and the issue is old I felt it was worth creating a new one.
Step 1: Make all functions anonymous; i.e. you can define a function like;
const Foo = (bar: i32) i32 => { ... };
however of course we could support using fn Foo(bar: i32) i32 { ... } as the short form. In this way you can define functions inline; these aren't closure functions, and when compiled they will of course be placed outside the scope.
Step 2: Lambdas; building on anonymous functions, you can define 'inline' lambda functions which are just a single statement that returns, like;
The $ just acts as a wildcard, effectively matching whatever types the input requires. If the input is a 'var' type then it makes a function def that is fn X(a: var, b: var) var, perhaps? Or maybe that is just a compile error; I'm not sold either way.
Step 3: Closures; in the case where you actually want a closure, you would define it like any other function but indicate the type of function input;
The above is synonymous to the following Zig code if we allow some kind of implicit cast;
HOWEVER, this is where the problem occurs: you require this pointer to exist in the definition, so we need some way to get around this call. It's been suggested in the past that you can pass around some kind of 'closure' type that allows you to call it like a function but is really just this struct. Personally, I feel this hides information from the coder and goes against Zig's core; furthermore, would you allow the above 'closure' type to be passed into a function with definition (a: i32) bool?
Instead I propose that we can use LLVM trampolining in quite a few cases to 'excise' a parameter from the call, which would be the closure information, and the call would rather become something like;
Note: of course in this case I'm using trampoline as if it was a builtin. I'm not actually sure if we want it as a builtin; in reality it is more like generating the LLVM code that trampolines the function. A trampoline (said that word too many times now) basically just stores the value on the stack in advance of the call. This would be much more efficient than a typical closure, as it would avoid the need for a function pointer and would avoid the ugliness of indirection. HOWEVER, there is not much information on how this affects things like 'arrays of closures', which may instead require a different approach. Note: this is how GCC implements nested functions; this is a relevant section.
So basically I propose that we approach this in the order I've given, and perhaps implement that builtin before integrating the closure x = x syntax.