@dra27 dra27 commented Oct 2, 2025

Eliminates both the need to probe AT_FDCWD and the need to use Obj.magic in the library to pass it to the C stubs.

If my understanding of the calling convention for optional arguments is correct, even with the default in the code and a non-opaque interface, the option is still allocated at call-sites, so I don't think the allocation profile changes.
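For readers less familiar with the convention being discussed, here is a minimal sketch of how an optional argument looks from both sides of a call. Inside the callee the argument is an `option`, and `~dir:v` at a call-site is shorthand for `?dir:(Some v)`. The names `resolve`, `resolve_default` and `at_fdcwd` are hypothetical and not taken from the PR; the value -100 is what Linux uses for AT_FDCWD and is hard-coded here purely for illustration.

```ocaml
(* Illustrative stand-in for AT_FDCWD (-100 on Linux); a real binding
   would obtain this value from C rather than hard-coding it. *)
let at_fdcwd = -100

(* Without a default, the callee receives the argument as an option. *)
let resolve ?dir path =
  match dir with
  | None -> (at_fdcwd, path)
  | Some fd -> (fd, path)

(* With a default, the compiler generates the equivalent match. *)
let resolve_default ?(dir = at_fdcwd) path = (dir, path)

let a = resolve "foo"                 (* dir is None inside resolve *)
let b = resolve ~dir:3 "foo"          (* sugar for the next line *)
let c = resolve ?dir:(Some 3) "foo"   (* the option is explicit here *)
let d = resolve_default "foo"         (* the default is filled in *)
```

Whether the `Some 3` box is actually allocated at runtime depends on the optimiser, which is the question raised above.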

Contributor

@avsm avsm left a comment

Looks right to me, but not merging yet as I haven't run an e2e test on eio

Collaborator

@talex5 talex5 left a comment

> If my understanding of the calling convention for optional arguments is correct, even with the default in the code and a non-opaque interface, the option is still allocated at call-sites

That's disappointing. But I agree the new code is cleaner anyway.

Collaborator

talex5 commented Oct 7, 2025

This isn't an objection to this PR, but I wanted to comment on the blog post you linked elsewhere (File descriptors are not integers):

> Stepping back, let’s consider if instead of treating file descriptors as int, Thompson and Ritchie had instead used opaque pointers for open, read, and so forth in the first version of Unix.

I think the blog is trying to argue that FDs-as-ints is just a C wart and therefore OCaml code should never see them (and there should be no way of getting the int from a Unix.file_descr). But the ints don't come from libc, they come from the kernel/user-space interface. If Unix had never had a C userspace and used OCaml from the beginning, we'd still need to get the underlying ints to talk to the kernel.

For example, we could easily imagine rewriting liburing in OCaml. Then the OCaml code would certainly need to know the integer value in order to write it to the ring buffer.
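A rough sketch of what that would involve: the OCaml code would store the descriptor number into a submission queue entry itself. The layout assumed below (a 64-byte entry with a signed 32-bit `fd` field at byte offset 4, in native byte order) is my reading of `struct io_uring_sqe` and should be checked against the kernel headers; `set_fd` and `get_fd` are hypothetical helpers, not part of any existing library.

```ocaml
(* Assumed layout: a 64-byte entry with the s32 fd field at byte offset 4. *)
let sqe_size = 64
let fd_offset = 4

(* Hypothetical helper: write the raw descriptor number into an entry. *)
let set_fd (sqe : Bytes.t) (fd : int) : unit =
  Bytes.set_int32_ne sqe fd_offset (Int32.of_int fd)

(* Hypothetical helper: read the descriptor number back out. *)
let get_fd (sqe : Bytes.t) : int =
  Int32.to_int (Bytes.get_int32_ne sqe fd_offset)

let sqe = Bytes.make sqe_size '\000'
let () = set_fd sqe 7
```

There is no way to write `set_fd` without access to the integer value, however the descriptor type is presented to the rest of the program.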

The idea that OCaml shouldn't see the ints pushes us in the direction of using more C code, which I dislike. Having access to the ints is dangerous, and I'd much prefer dangerous code to be in OCaml where it's easier to keep it under control.

With all that said, I don't particularly mind this PR. The logic being moved to C is only a few lines, and I don't think it matters either way. The main advantage is it works around the lack of a Unix.int_of_file_descr, but really that should be fixed.

Contributor

avsm commented Oct 7, 2025

My takeaway from the blog is really that Windows file descriptors genuinely aren't ints; on Linux, as @talex5 notes, they are most definitely integers (actually, just offsets into a kernel-side table), as they are on every Unix I've ever used. So for Linux-specific code, I am in favour of having a private int be their type (e.g. to check it's a non-negative number).
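A minimal sketch of the private int idea, assuming a hypothetical module `Fd` with a checked constructor `of_int` (neither is an existing Eio or uring API):

```ocaml
module Fd : sig
  type t = private int           (* readable as an int, not constructible as one *)
  val of_int : int -> t option   (* hypothetical checked constructor *)
end = struct
  type t = int
  (* Reject negative numbers, which are never valid descriptors. *)
  let of_int n = if n >= 0 then Some n else None
end

(* Reading the value needs an explicit coercion; no conversion code runs. *)
let describe (fd : Fd.t) = Printf.sprintf "fd %d" (fd :> int)
```

With this shape, code outside `Fd` can inspect the number but cannot turn an arbitrary int into an `Fd.t` without going through the check.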

I'm in favour of this PR because it's patently obvious from ocaml/ocaml#14258 that any use of Obj should be avoided where possible, and shifting our representation of fds (on Linux only) will make this possible.

Perhaps we should be plumbing this deeper into Eio_linux and just not using Unix.file_descr until the last possible minute that it has to be passed into an existing Unix function...

Author

dra27 commented Oct 7, 2025

The point I was trying to make when considering the C type as abstract is that they are not general integers - yes, they are C integers, and we then choose in the OCaml implementation to represent those as OCaml integers, and I contend that exposing that is the danger. If the C fd were a pointer (or, perish the thought, if you’d had a Unix system which used the full range of int for fds!) then we wouldn’t think twice about using a different OCaml representation. (Windows - and my Windows involvement - is, I think, a red herring here.)

I’m a pragmatist - give me an OS with OCaml back to the kernel and we’ll do that (if the bindings were written using ctypes, we’d possibly have a different story) - but while there is C code doing the glue, I think those boundaries are better respected.

On a similar pragmatic note, there will never ever be Unix.int_of_file_descr … I’m all for dreaming when I’m asleep, but otherwise I prefer to put brain cycles to “could ever happen” solutions 😊

Author

dra27 commented Oct 7, 2025

Oh, by the by - if it turns out that the option can be better optimised, my suggestion instead would have been to retrieve the magic value from C at program startup

Contributor

avsm commented Oct 8, 2025

I think I understand your point now @dra27, but let me restate it to check. It is that the OCaml representation of an integer is different from a C one (the high bit) and so we need to do a representation transformation to turn it into a C fd anyway. And to do that, we should keep the type abstract and only expose safe conversions to and from Unix.file_descr and C ints. Is that right?

I think there are three uses of an abstract unix fd:

  • pass it into C level bindings
  • print the fd number for external tools that need it (a formatter function does the trick here)
  • hand it off to the OCaml Unix module, which needs conversion to a Unix.file_descr
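As a sketch of how the first two uses might look with the type kept abstract (the module `Abstract_fd` and its functions are hypothetical, and `of_raw` only stands in for what a C stub such as an `open` binding would return):

```ocaml
module Abstract_fd : sig
  type t                                   (* abstract: no arithmetic on fds *)
  val of_raw : int -> t                    (* stand-in for a value produced by a C stub *)
  val pp : Format.formatter -> t -> unit   (* print the number for external tools *)
end = struct
  type t = int
  let of_raw n = n
  let pp ppf fd = Format.fprintf ppf "%d" fd
end

let fd = Abstract_fd.of_raw 5
let shown = Format.asprintf "fd=%a" Abstract_fd.pp fd
```

C-level bindings would take `Abstract_fd.t` directly in their `external` declarations. The third use, conversion to a `Unix.file_descr`, is not shown because it needs either a C stub or support from the Unix library itself.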

Author

dra27 commented Oct 8, 2025

Yes! And indeed in dra27/ocaml@562a497 there's a logging function for the OCaml side. Key to all of it - and a flaw in OCaml's Unix API at the moment - is I absolutely think you should be able to keep that type abstract in OCaml without incurring any additional conversions or OCaml/C call boundaries (the latter is what I dislike about #129 - no magic, but an extra crossing from OCaml to C). The C API part of it I propose in dra27/ocaml@112b274 also very intentionally inlines down to just the Int_val(fd) you'd expect on the C side when running on Unix. It's just predicated on the assumption that you'd always be doing that immediately prior to calling a kernel (i.e. C) function anyway, so there's no extra C call incurred there.
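For context on why `Int_val(fd)` is so cheap: OCaml stores an immediate integer n as the machine word 2n + 1, and `Int_val` recovers n with a single arithmetic right shift. The functions below only model that arithmetic in plain OCaml; `val_int` and `int_val` are illustrative names, not the runtime's code.

```ocaml
(* Model of tagging: n becomes (n << 1) + 1, so the low bit is always set. *)
let val_int n = (n lsl 1) lor 1

(* Model of untagging: an arithmetic shift right by one drops the tag bit
   and preserves the sign. *)
let int_val v = v asr 1
```

So descriptor 3 is held as the word 7, and a negative sentinel such as -100 survives the round trip unchanged.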

Author

dra27 commented Oct 8, 2025

Incidentally, I haven't double-checked the details on clambda, but the fact that both flambda 1 and 2 optimise optional arguments properly means that the approach in this specific PR should be different, and I'll push that suggestion at some point in the coming days.

Contributor

avsm commented Oct 8, 2025

Thanks for thinking this through so carefully. In terms of uring, there are other allocations we could easily elide (such as the option allocation for ring responses), but I am saving that for a future oxcaml port instead...

Contributor

avsm commented Oct 8, 2025

As an amusing idle thought, the hard fd limit ulimit -Hn shows it's roughly 2^31 on Linux, which means that an OCaml pointer would be an adequate representation of the limited range C integer ;-)
