Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support elixir 1.15 and OTP 26 #72

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

juanazam
Copy link

@juanazam juanazam commented Sep 5, 2023

Closes #69

Update to Elixir 1.15.2 and OTP 26.0.2
The main issue here is that Port.comand is erroring out because of a behavior change on OTP that's not going to be updated.

Related issue on Statix: #69
Related issue on top: erlang/otp#7130

In summary, Statix was using Port.command to avoid a performance issue with using :gen_udp.send.
This approach no longer works because of OTP updates happening at the networking layer.

In order to avoid this, this PR uses gen_udp.send (which has been optimized), but with this change, a lot of the existing code is no longer useful.

Port.command expects a packet containing the IP and port of the destination to which the packet will be sent.
This is no longer required with the new approach, but we need to keep the address and port on the Conn module to be able to do the :gen_udp.send calls to send Statds servers.

This patch has been tested in production, see: knocklabs#1

@juanazam juanazam changed the title Jazambuja update to elixir 1.15 Support elixir 1.15 and OTP 26 Sep 5, 2023
@Billzabob
Copy link

Any updates here? How do we get this merged? Anxiously awaiting this since it's the last thing stopping use from going to OTP 26.

@juanazam
Copy link
Author

@Billzabob I'm not sure to be 100% honest.

@lexmag Sorry to ping you directly (I know you haven't been active on this project for a bit), but is there any chance we can get this reviewed, merged, and released?

@ckoch-cars
Copy link

@lexmag
Bump

@aselder
Copy link

aselder commented Sep 25, 2023

If @lexmag has abandoned this project, does anybody want to take it over, either by getting the repo transferred or creating a canonical fork?

@andyleclair
Copy link

Any updates on this @lexmag? This is a blocker for us moving to OTP 26 as well

@dhaspden
Copy link

dhaspden commented Nov 7, 2023

This is blocking us. How can we get this published or migrated to somebody who will publish?

Thanks

@carrascoacd
Copy link

carrascoacd commented Nov 10, 2023

Since this is blocking many of us, the company I'm working on, Cabify, could try to take the temporal ownership of this repo under this fork: https://github.com/cabify/statix. We've been running this solution in production for some weeks with success.

If you are ok with this, please, react to this message and open the same PR in the fork and we will be glad to merge it and publish the library under a temporal name cabify_statix.

We want to make explicit that, in case the owner is back, we will just get rid of the fork and sync it with this repo. So if you agree, let's continue and adopt OTP 26! If you are not ok with this proposal, just let us know also 😁

@juanazam
Copy link
Author

@carrascoacd Thanks for testing the fix and taking ownership of the package!

@carrascoacd
Copy link

Before continuing, please @whatyouhide since you appear as the maintainer of the library. Could you share your thoughts?

@juanazam
Copy link
Author

Update here, I have reached Andrea and he has not been involved with the project for a while, he suggested reaching for @lexmag, now I'm trying to get a hold of him.

@lexmag
Copy link
Owner

lexmag commented Nov 18, 2023

Hi folks, I'm sorry to have kept you waiting on this PR.

@juanazam thank you again for reaching out.

First of all, I must say that Statix does not indeed get a high priority among all activities I have to address, but I still do want it to be fixed and improved, thus, during next two weeks I'm going to merge this PR, look into other issues, and level the project up in general.

That's said, if anyone is willing to push the work beyond what I'm able to dedicate this time, nothing stops you from doing that considering the license is extremely permissive.

As for this PR, I first need to read up on what specifically has changed in OPT 26.
From a quick glance, the PR unfortunately cannot be merged in its current shape as it drastically affects the performance of OTP pre-26. We need to have a conditional compilation to preserve the previous approach for older OTP versions.

@juanazam
Copy link
Author

Hi @lexmag,

Thanks for taking a look at this.
Regarding older versions of OTP, I didn't want to spend a lot of time on it without knowing what's the plan on maintenance for them.
I saw some checks for OTP 18 compatibility, which was released in June 2015.
Maybe it's work considering doing a major release and remove support for older OTP versions to keep the code simpler (this is just an idea).

If you know what's the oldest version you would want to support and have ideas on how to support it, I can help with updating this PR.

@imsoulfly
Copy link

Hey there,
kindly asking what the ETA on a release of this might be? 😃

@juanazam
Copy link
Author

Hey @lexmag do you need any help with this?

@mtrudel
Copy link

mtrudel commented Feb 12, 2024

From a quick glance, the PR unfortunately cannot be merged in its current shape as it drastically affects the performance of OTP pre-26. We need to have a conditional compilation to preserve the previous approach for older OTP versions.

Does anyone know specifically what @lexmag is referring to here? I don't immediately see anything in this change that sets up alarm bells performance wise...

@juanazam
Copy link
Author

@mtrudel I can give some context there. It seems that using :gen_udp.send instead of Port.command was less performant on older versions of OTP.

From the related OTP thread erlang/otp#7130 (comment)

In epgsql direct port call was implemented (the idea was borrowed from RabbitMQ) because epgsql tends to accumulate a large message queue (due to active=true by default). And pre-OTP26 gen_tcp:send had a performance issues when the process calling it has a long message queue.

@mtrudel
Copy link

mtrudel commented Feb 16, 2024

@mtrudel I can give some context there. It seems that using :gen_udp.send instead of Port.command was less performant on older versions of OTP.

Thanks for the context @juanazam!

Unless I'm reading this wrong, the way statix currently uses Port.command nets to the exact same thing that RabbitMQ's hack seeks to avoid. To wit:

  • RabbitMQ uses Port.command specifically to get around the fact that :gen_tcp.send selectively receives on a {:inet_reply,...} tuple to hear back from the port.
  • They're able to get the benefit of this solely because they purposely do not receive the response code at the point of call. (Separately and mostly unrelated, they drop those messages on the floor as part of their main message handling logic so as to avoid unbounded mailbox growth).
  • Again, the whole point of this approach is to avoid having to selectively receive on the {:inet_reply,...} tuple at the time you want to send data to a port.

With this in hand, let's look at how statix does this today:

defp transmit(packet, sock) do
try do
Port.command(sock, packet)
rescue
ArgumentError ->
{:error, :port_closed}
else
true ->
receive do
{:inet_reply, _port, status} -> status
end
end
end

  • On line 52, we call Port.command instead of gen_udp.send. Cool. We've avoided a selective receive so far!
  • Then, on line 59, we manually make the exact same selective receive we're trying to avoid. Not only does this net to the same amount of worst-case work as gen_udp.send, but it loses all of gen_udp's careful handling of edge cases, and even more importantly, it uses private APIs to do so.

Again, unless I'm misunderstanding this, statix's implementation here amounts to an 'adhoc implementation of half of gen_tcp.send' (with apologies to Robert Virdig). There's no point in trying to worry about preserving what's here for older OTPs because it's been slow this whole time.

What am I missing here?

@juanazam
Copy link
Author

@mtrudel Everything you mention here makes sense! I tried digging up discussions on old issues but couldn't find anything meaningful. The Port.command(sock, packet) call was added as part of the initial commit, so it has always been there.

I remember seeing a discussion about this somewhere, but I can't find it (this was months ago so my memory is failing me), sorry!

@carrascoacd
Copy link

If the performance problems only apply to older versions, we could add a conditional compilation as suggested, ie OTP < 26. It should work. It should unblock this while we find how to improve it.

@juanazam
Copy link
Author

Decided to implement backwards compat as @carrascoacd mentioned to make this even less of a concern. It's not the cleanest implementation but IMO this path will be eventually removed, so no need to keep it too fancy.

@mtrudel while implementing it, I confirmed one thing!
I had to keep this

true ->
receive do
{:inet_reply, _port, status} -> status
end
end

Because I was getting this error while running the tests:

Unexpectedly received message {:inet_reply, #Port<0.13>, :ok} (which matched _any)

I think that back then I read that Port.command immediately replied to the caller process with this type of message, but I can't seem to find the thread were I read that :(.

Anyway I hope this makes it easier for @lexmag to get it merged

@juanazam
Copy link
Author

One more finding on why this snippet was needed:

true ->
receive do
{:inet_reply, _port, status} -> status
end
end

It seems it was happening because of an internal detail on the prim_inet implementation that was removed at the same time as the braking change was added.

erlang/otp@f17f802#diff-86e7e31a3a23b92ea754ad259d8ffd6eb347deb522218fae1803878bf9d15e8bL572

There is a flow where the process running Port.command receives a reply, and that's why the error happens and the receive block is needed.

@mtrudel
Copy link

mtrudel commented Mar 12, 2024

There is a flow where the process running Port.command receives a reply, and that's why the error happens and the receive block is needed.

Yes, but my point stands: this implementation as it exists today is just an adhoc copy of what :gen_udp.send does; we're not actually saving any effort here. The whole point of the RabbitMQ hack is to avoid a receive; if we put it back then we're no better off than we are just calling :gen_udp.send and doing it the right way.

@carrascoacd
Copy link

I think we are ready to merge, right? cc/ @lexmag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken in Erlang 26 (Port.command / :port_command)
10 participants