Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jt/82 webrtcinrustping #84

Conversation

John-LittleBearLabs
Copy link

@John-LittleBearLabs John-LittleBearLabs commented Nov 29, 2022

Finally appears to be working, with proviso below.

Connection established! PeerId("12D3KooWNfGZwVxzv4Eu2RvmCrLKQibdp2U57E93WH36rwygYk8Y")=Dialer { address: "/ip4/16.13.0.2/udp/1234/webrtc/certhash/uEiBLxsx_9DM981WqZCG8H0LDEcHRzJxJM8zgkpqXhVH95Q", role_override: Dialer }

Outcome: success

Before this is merged in we'd want to undo a couple things that I have to short-circuit to quickly test this:

  • The default value for the transport testparam must be tcp, not webrtc
  • Need to re-enable the older versions in rust.toml

@p-shahi p-shahi linked an issue Dec 1, 2022 that may be closed by this pull request
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction here looks good to me.

@John-LittleBearLabs does a Testground run with these changes succeed for you locally?

Would you mind resolving the merge conflicts and marking it ready for review? That way we can test your changes on our CI.

ping/rust/src/lib.rs Outdated Show resolved Hide resolved
@John-LittleBearLabs John-LittleBearLabs marked this pull request as ready for review December 2, 2022 16:07
@p-shahi
Copy link
Member

p-shahi commented Dec 5, 2022

Approved and ran the CI workflow, I think it'll be good to add you guys as libp2p contributors in github-mgmt

Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi John, this already looks very good! Left some minor nits. Btw @mxinden should we add a composition (manually for now) testing webrtc to webrtc transport?

ping/rust/Dockerfile Outdated Show resolved Hide resolved
ping/rust/src/lib.rs Outdated Show resolved Hide resolved
ping/rust/src/lib.rs Outdated Show resolved Hide resolved
ping/rust/src/bin/testplan_0440.rs Outdated Show resolved Hide resolved
ping/rust/src/bin/testplan_0500.rs Outdated Show resolved Hide resolved
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks John! @thomaseizinger maybe you also wanna take a look.

ping/rust/src/lib.rs Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

LGTM, thanks John! @thomaseizinger maybe you also wanna take a look.

Thanks for the ping, I'll take a look tomorrow!

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good. A couple of improvement suggestions :)

ping/rust/src/bin/testplan_0500.rs Outdated Show resolved Hide resolved
ping/rust/src/bin/testplan_0500.rs Outdated Show resolved Hide resolved
ping/rust/src/bin/testplan_0500.rs Outdated Show resolved Hide resolved
Some(SwarmEvent::ConnectionEstablished {
peer_id, endpoint, ..
}) => {
info!("Connection established! {:?}={:?}", &peer_id, &endpoint);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PeerId supports Display so we don't need to use Debug here. Would also be a nicer log message if we get a multi-address out of endpoint and Display log that one.

Suggested change
info!("Connection established! {:?}={:?}", &peer_id, &endpoint);
info!("Connection established! {}={:?}", &peer_id, &endpoint);

info!("Connection established! {:?}={:?}", &peer_id, &endpoint);
connected.insert(peer_id);
}
Some(event) => info!("Received event {:?}", &event), //This is useful, because it sometimes logs error messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, info is likely way too noisy.

let local_addr = match transport.as_str() {
"tcp" => format!("/ip4/{local_ip_addr}/tcp/{LISTENING_PORT}"),
"webrtc" => format!("/ip4/{local_ip_addr}/udp/{LISTENING_PORT}/webrtc"),
unhandled => unimplemented!("Transport unhandled in test: '{}'", unhandled),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of panicking, we could just return an error with anyhow::bail!().

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more comments.

Comment on lines 69 to 71
peer,
result: Ok(ping::Success::Ping { .. }),
})) = self.0.next().await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we undo these formatting changes please?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was resolved with a whole pass of cargo fmt

@@ -70,19 +73,20 @@ where
// two peers dialling each other at the same time).
//
// We can do this because sync service pubsub is ordered.
.take_while(|a| ready(a != &local_addr));
.take_while(|a| ready(a != &local_addr && a != &dialable_multiaddr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can now simplify this.

Suggested change
.take_while(|a| ready(a != &local_addr && a != &dialable_multiaddr));
.take_while(|a| ready(a != &dialable_multiaddr));

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The final step is to remove the client parameter again as we no longer need it now outside the PingSwarm.

Comment on lines 36 to 38
let client = testground::client::Client::new_and_init()
.await
.expect("Unable to init testground cient.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need the client here now so I'd like to move its creation back into PingSwarm.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed this has been removed.

Comment on lines 90 to 93
debug!(
"NewListenAddr event: listener_id={:?}, address={:?}",
&listener_id, &address
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you deem the listener ID very important, we could move this log to the ping swarm so all tests benefit from it.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! All looks good to me. We are not enforcing it in here but can you make sure that everything is formatted with cargo fmt?

.into_iter()
.find(|iface| iface.name == "eth1")
.unwrap()
.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "Can't find iface eth1"))?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "Can't find iface eth1"))?
.context("Can't find iface eth1"))?

We are using anyhow so this would work too!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! This has been updated.

.unwrap()
let client = testground::client::Client::new_and_init()
.await
.expect("Unable to init testground cient.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.expect("Unable to init testground cient.");
.context("Unable to init testground client.")?;

@ddimaria
Copy link

Thanks! All looks good to me. We are not enforcing it in here but can you make sure that everything is formatted with cargo fmt?

This is done. I'll run clippy as well shortly. Did we want to add any fmt/lint to CI @thomaseizinger?

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Dec 14, 2022

Thanks! All looks good to me. We are not enforcing it in here but can you make sure that everything is formatted with cargo fmt?

This is done. I'll run clippy as well shortly. Did we want to add any fmt/lint to CI @thomaseizinger?

That would be great! If you are up for it, please send a separate PR, we don't need to block this on that :)

client
.run_parameters()
.test_instance_params
.get("transport").cloned()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure rustfmt would want that on a separate line now 🙃

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, you're too fast for me @thomaseizinger 😉

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That, and rust analyzer needed a restart.

@ddimaria
Copy link

Thanks for the comments and suggestions @thomaseizinger! I believe all of them have been resolved. Please let me know if there anything else you'd like done here. I left the unwraps alone that were in existing code...I'm not familiar with this codebase but figured there was a reason they were in there 🤷. Did you want those done in a separate PR?

@thomaseizinger
Copy link
Contributor

Thanks for the comments and suggestions @thomaseizinger! I believe all of them have been resolved. Please let me know if there anything else you'd like done here. I left the unwraps alone that were in existing code...I'm not familiar with this codebase but figured there was a reason they were in there 🤷. Did you want those done in a separate PR?

It doesn't matter much because either will abort the test program and report a failure. Just a matter of consistency really but nothing too important :)

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more suggestions, not pressing. Otherwise looks good to me! :)

Comment on lines 40 to 41
.map_err(|e| anyhow::anyhow!(e.to_string()))
.context("Unable to init testground cient.")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map_err(|e| anyhow::anyhow!(e.to_string()))
.context("Unable to init testground cient.")?;
.context("Unable to init testground cient.")?;

Double mapping is a bit useless :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at this in the morning (diff TZ than you), but the suggestion caused compilation errors:

error[E0599]: the method `context` exists for enum `Result<Client, Box<dyn std::error::Error>>`, but its trait bounds were not satisfied
   --> src/lib.rs:41:10
    |
41  |           .context("Unable to init testground cient.")?;
    |            ^^^^^^^ method cannot be called on `Result<Client, Box<dyn std::error::Error>>` due to unsatisfied trait bounds
    |
   ::: .rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:504:1
    |
504 |   pub enum Result<T, E> {
    |   --------------------- doesn't satisfy `_: anyhow::Context<Client, Box<dyn std::error::Error>>`
    |
   ::: .rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:194:1
    |
194 | / pub struct Box<
195 | |     T: ?Sized,
196 | |     #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global,
197 | | >(Unique<T>, A);
    | | -
    | | |
    | | doesn't satisfy `Box<dyn std::error::Error>: Sync`
    | |_doesn't satisfy `Box<dyn std::error::Error>: std::marker::Send`
    |   doesn't satisfy `_: anyhow::context::ext::StdError`
    |
    = note: the following trait bounds were not satisfied:
            `Box<dyn std::error::Error>: anyhow::context::ext::StdError`
            which is required by `Result<Client, Box<dyn std::error::Error>>: anyhow::Context<Client, Box<dyn std::error::Error>>`
            `Box<dyn std::error::Error>: std::marker::Send`
            which is required by `Result<Client, Box<dyn std::error::Error>>: anyhow::Context<Client, Box<dyn std::error::Error>>`
            `Box<dyn std::error::Error>: Sync`
            which is required by `Result<Client, Box<dyn std::error::Error>>: anyhow::Context<Client, Box<dyn std::error::Error>>`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah damn, that trait object is not Send + 'static! I think it would work otherwise. That is a shame. Let's leave it then!

let client = testground::client::Client::new_and_init()
.await
.map_err(|e| anyhow::anyhow!(e.to_string()))
.context("Unable to init testground cient.")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.context("Unable to init testground cient.")?;
.context("Unable to init testground client")?;

Typo.

Comment on lines 210 to 211
.cloned()
.unwrap_or_else(|| "tcp".to_owned())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.cloned()
.unwrap_or_else(|| "tcp".to_owned())
.as_deref()
.unwrap_or("tcp")

Just curious whether this would work? (Together with &str as a return type).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.get("transport") returns an Option<&String> and .as_deref() has no effect on the return type. From the docs:

Converts from Option<T> (or &Option<T>) to Option<&T::Target>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're looking to avoid the extra allocation, we could .remove() (instead of .get()) to pull the value out of the hashmap if the value is no longer needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.get("transport") returns an Option<&String> and .as_deref() has no effect on the return type. From the docs:

Converts from Option<T> (or &Option<T>) to Option<&T::Target>.

Hmm, the intention was to to convert &String to &str which should work because String implements Deref<Target = str>. Seems like we are missing something. It is not important though, I was just curious :)

@@ -94,7 +102,7 @@ where
swarm
.await_connections(client.run_parameters().test_instance_count as usize - 1)
.await;

info!("Connections awaited.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info!("Connections awaited.");
info!("Connected to {} peers", client.run_parameters().test_instance_count - 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, much more meaningful log output!

@ddimaria
Copy link

ddimaria commented Dec 15, 2022

CI is failing on run-ping-all: https://github.com/John-LittleBearLabs/tg-libp2p/actions/runs/3706321649/jobs/6282598377

LLVM ERROR: IO failure on output stream: No space left on device

It's running for 49 mins before running out, that's a long running test suite.

@ddimaria
Copy link

@thomaseizinger do you have opinions about the failing CI? More specifically, do you think tackling that is in the scope of this PR? If so, how has PL handled this in the past (disk cleanup, hosted runners, ..etc.)? There are other options to investigate, but I wanted to check in with you first.

@galargh
Copy link
Contributor

galargh commented Dec 23, 2022

@thomaseizinger do you have opinions about the failing CI? More specifically, do you think tackling that is in the scope of this PR? If so, how has PL handled this in the past (disk cleanup, hosted runners, ..etc.)? There are other options to investigate, but I wanted to check in with you first.

Thanks to https://github.com/pl-strflt/tf-aws-gh-runner we can set up self-hosted runners in no time. Once we clear this repo to use PL self-hosted runners and we prepare a runner that has all the deps testground needs, all it should require is changing labels here.

I know @laurentsenta also wanted to set up a long running testground daemon on EC2. For this, the only difference would be setting a testground daemon url that the CLI should communicate with.

The latter is a tad bit more work. I think I'd be able to get the former ready today. Worth noting, the two are not mutually exclusive. What's more, the AMI we prepare for self-hosted runners will surely be useful for the long-running daemon machine too so we won't be duplicating work in either case. Finally, since the changes required are so minimal, we're not locking ourselves in in any given solution.

I'll update once self-hosted runners are ready for use here. I'll propose the changes in a separate PR.

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Dec 23, 2022

@thomaseizinger do you have opinions about the failing CI? More specifically, do you think tackling that is in the scope of this PR? If so, how has PL handled this in the past (disk cleanup, hosted runners, ..etc.)? There are other options to investigate, but I wanted to check in with you first.

Thanks for the ping and sorry for the late reply.

GH-hosted runners give us 14GB of disk space. The fact that we are exceeding this is in my opinion another pointer that the current architecture of these tests simply does not scale.

It is completely unnecessary that we rebuild all of these tests on every test run. I am of the strong opinion that these docker containers should be an artifact that is produced once (1x), uploaded to a registry and then only pulled for each test run.

It is also unnecessary that we maintain them in parallel. Old versions of the test should never need to change. The test should be co-located with the library code over at https://github.com/libp2p/rust-libp2p and versioned along side with it, producing a new docker container for every released version.

That is the proper solution in my opinion. I opened an issue about this in testground a while ago: testground/testground#1522

I have no opinion on how we fix this in the short term. Happy to throw more disk-space at the problem for now.

@ddimaria
Copy link

Thanks for the reply @thomaseizinger! It looks like @galargh will take care of CI in a separate PR, so I'll close this one out, or can wait for that to land before merging? I agree with the pre-built binaries/images approach. I'll read through that Testground issue. A possible challenge might be features in Rust, though pulling the features in this PR and moving to a rust-libp2p CI build-image job (specifically for Testground?) seems sensible.

@galargh
Copy link
Contributor

galargh commented Dec 23, 2022

Quick update: I didn't manage to get to fully setting up/testing self-hosted runners for this repo today but it's the first item on my TODO list for next week.

@galargh
Copy link
Contributor

galargh commented Dec 30, 2022

Quick update: I didn't manage to get to fully setting up/testing self-hosted runners for this repo today but it's the first item on my TODO list for next week.

Ready 🥳 #92 🎉

@ddimaria
Copy link

@p-shahi I wanted to make sure this PR close was intentional? I was waiting for #92 to merge in order to validate this work before merging this PR.

@jxs
Copy link
Member

jxs commented Jan 12, 2023

Hi @ddimaria, this PR targets the old dir which has been superseeded by the new one introduced with #97. I started working on introducing WebRTC support for the Rust "test-plans" there, see here, is that ok with you folks?

@ddimaria
Copy link

Thanks for the info @jxs! This sounds good to me, I just didn't want any work to go to waste unintentionally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add webrtc to logic of ping/rust
7 participants