Test TemplateProvider OS E13 #1278

Closed
jbesraa opened this issue Dec 9, 2024 · 12 comments · Fixed by #1293, #1359 or #1527
Comments

@jbesraa
Contributor

jbesraa commented Dec 9, 2024

TemplateProvider in integration-tests sometimes fails to start because it can't access OS folders

https://github.com/stratum-mining/stratum/actions/runs/12231267837/job/34113994881?pr=1242#step:4:377

@plebhash
Collaborator

plebhash commented Jan 9, 2025

re-opening as this is still happening, unfortunately #1293 didn't fix it

also, this is not particular to the macos runner, it also happens on ubuntu

some recent failed executions (which went away by re-running):

@jbesraa
Contributor Author

jbesraa commented Jan 22, 2025

If this issue comes up again, I suggest creating a user in the CI and giving it the correct permissions for writing/creating files.
I would also try to make the integration tests more resilient by restarting the template provider if it does not come up on the first try.
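
The restart idea could be sketched roughly as below. This is only an illustration: `retry_start`, its parameters, and the closure standing in for the template provider start-up are hypothetical names, not part of the integration test framework.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `start` up to `max_attempts` times, sleeping `delay` after each failed attempt.
/// Returns the first success, or the last error if every attempt fails.
/// (Hypothetical helper; the closure receives the 1-based attempt number.)
fn retry_start<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut start: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut attempt = 1;
    loop {
        match start(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Stand-in for starting the template provider: fails twice, then succeeds.
    let result: Result<&str, String> = retry_start(5, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("attempt {attempt} failed"))
        } else {
            Ok("started")
        }
    });
    println!("{result:?}");
}
```

A retry like this would only hide an intermittent failure rather than explain it, so it is probably best treated as a stopgap.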

@plebhash
Collaborator

If this issue is raised again, I suggest to create a user in the CI and give it the correct permissions for writing/creating files. I would also try to make the integration tests more versatile by trying to re-start the template provider if it is not coming up in the first try.

what I reported on #1374 is not unique to CI runs, I also witnessed it locally

@plebhash
Collaborator

possibly related: rust-bitcoin/bitcoind#90

@plebhash
Collaborator

plebhash commented Jan 22, 2025

closing #1374 and re-opening this, as this is where the issue was originally reported and #1374 seems to be just another instance of the same problem

@plebhash plebhash reopened this Jan 22, 2025
@plebhash
Collaborator

plebhash commented Jan 22, 2025

possibly related: rust-bitcoin/bitcoind#90

looks like the bitcoind crate is on its way to being deprecated: rust-bitcoin/bitcoind#165

we should be able to get similar functionality (and ongoing support) from https://crates.io/crates/corepc-node

perhaps switching dependencies could improve things? or at least if we continue getting this problem, we could eventually report it and get support from the maintainers

@plebhash
Collaborator

plebhash commented Mar 1, 2025

I was able to make some progress in investigating this problem.

My strategy consisted of writing the following script:

https://github.com/plebhash/stratum/blob/investigate_permission_error/roles/tests-integration/run_until_error.sh

When running this script with line 30 enabled and line 31 disabled (running one isolated integration test instead of all tests concurrently), the loop repeated for over 24h (tens of thousands of iterations) without ever hitting the permission error.

When running the script with line 31 enabled and line 30 disabled (running all tests concurrently), the permission error was triggered after 26 iterations.

This indicates that the current execution strategy, where multiple tests in the same file are executed in parallel, is likely causing some kind of race condition under the hood.

@Shourya742
Contributor

Shourya742 commented Mar 1, 2025

I was able to make some progress in investigating this problem.

My strategy consisted of writing the following script:

https://github.com/plebhash/stratum/blob/investigate_permission_error/roles/tests-integration/run_until_error.sh

When running this script with enabled line 30 and disabled line 31 (running one isolated integration test instead of all tests concurrently), the script loop repeated for over 24h (tens of thousands of execution loops) without ever running into the permission error.

When running the script with enabled line 31 and disabled line 30 (running all tests concurrently), after 26 loops the permission error was triggered.

This indicates that the current execution strategy, where multiple tests in the same file are executed in parallel, is likely causing some kind of race-condition under the hood.

This might be related to exclusive access to a resource during execution.
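
If exclusive access turns out to be the issue, one way to serialize just the suspect step is a process-wide lock. The following is a minimal sketch under that assumption; `NODE_SETUP_LOCK`, `with_exclusive_setup`, and `max_concurrent_setups` are hypothetical names, and the closure body stands in for whatever step needs exclusive access.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::thread;
use std::time::Duration;

/// Process-wide lock guarding the step suspected of racing (hypothetical name).
static NODE_SETUP_LOCK: Mutex<()> = Mutex::new(());

/// Runs `setup` while holding the lock, so only one thread in this process runs it at a time.
fn with_exclusive_setup<T>(setup: impl FnOnce() -> T) -> T {
    // A panicking test poisons the mutex; the guarded data is `()`, so recovering the guard is safe.
    let _guard: MutexGuard<'_, ()> = NODE_SETUP_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    setup()
}

/// Demo: spawns `threads` workers and reports the highest number seen inside `setup` at once.
fn max_concurrent_setups(threads: usize) -> usize {
    static ACTIVE: AtomicUsize = AtomicUsize::new(0);
    static MAX_SEEN: AtomicUsize = AtomicUsize::new(0);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(|| {
                with_exclusive_setup(|| {
                    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
                    MAX_SEEN.fetch_max(now, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(5)); // simulated setup work
                    ACTIVE.fetch_sub(1, Ordering::SeqCst);
                })
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    MAX_SEEN.load(Ordering::SeqCst)
}

fn main() {
    println!("max concurrent setups: {}", max_concurrent_setups(8));
}
```

A lock like this only covers threads inside a single test binary. Running `cargo test -- --test-threads=1` is the blunter alternative, since it makes the test harness run the tests in each binary one at a time.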

@jbesraa
Contributor Author

jbesraa commented Mar 3, 2025

#1507 CI kept failing because of this but after 45d1259 it seems to be ok.

@plebhash
Collaborator

plebhash commented Mar 3, 2025

#1507 CI kept failing because of this but after 45d1259 it seems to be ok.

tbh I don't think this is a wise way to fix this problem

I already said a few times in the past that it's a very bad idea to be executing Integration Tests in parallel

what we are witnessing here is just one of the many potential problems that can come from this

just to name another that I see coming around the corner: having CPU miners running in parallel while fine-tuning difficulty targets based on measured hashrate will create a bunch of non-deterministic behaviors that will not be fun to debug

I still don't understand the motivations for running Integration Tests in parallel... whatever optimizations we get in execution time are very marginal, and code organization is very subjective and can be achieved in different ways without parallel execution as a pre-requisite

why can't we establish that ITF can only execute one test at a time? @jbesraa is this something you are highly opinionated about? if yes, can you elaborate on why?

@jbesraa
Contributor Author

jbesraa commented Mar 3, 2025

I'm not completely opposed. I'd prefer other options, but I'm open to trying it.

@plebhash
Collaborator

plebhash commented Mar 4, 2025

ran the script again, now with RUST_BACKTRACE=1:

Failed to create Node: Error while executing "/home/ubuntu/stratum/roles/tests-integration/template-provider/bitcoin-sv2-tp-0.1.13/bin/bitcoind"

Caused by:
    Permission denied (os error 13)

Stack backtrace:
   0: <E as anyhow::context::ext::StdError>::ext_context
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.95/src/context.rs:27:29
   1: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::with_context
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.95/src/context.rs:65:31
   2: corepc_node::Node::with_conf
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/corepc-node-0.5.0/src/lib.rs:360:27
   3: integration_tests_sv2::template_provider::TemplateProvider::start
             at ./lib/template_provider.rs:90:24
   4: integration_tests_sv2::start_template_provider
             at ./lib/mod.rs:100:29
   5: jd_integration::jds_should_not_panic_if_jdc_shutsdown::{{closure}}
             at ./tests/jd_integration.rs:18:25
   6: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   7: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   8: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:729:57
   9: tokio::runtime::coop::with_budget
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/coop.rs:107:5
  10: tokio::runtime::coop::budget
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/coop.rs:73:5
  11: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:729:25
  12: tokio::runtime::scheduler::current_thread::Context::enter
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:428:19
  13: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:728:36
  14: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:807:68
  15: tokio::runtime::context::scoped::Scoped<T>::set
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context/scoped.rs:40:9
  16: tokio::runtime::context::set_scheduler::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context.rs:180:26
  17: std::thread::local::LocalKey<T>::try_with
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:284:16
  18: std::thread::local::LocalKey<T>::with
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:260:9
  19: tokio::runtime::context::set_scheduler
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context.rs:180:9
  20: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:807:27
  21: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:716:19
  22: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:196:28
  23: tokio::runtime::context::runtime::enter_runtime
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context/runtime.rs:65:16
  24: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:184:9
  25: tokio::runtime::runtime::Runtime::block_on_inner
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:368:47
  26: tokio::runtime::runtime::Runtime::block_on
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:342:13
  27: jd_integration::jds_should_not_panic_if_jdc_shutsdown
             at ./tests/jd_integration.rs:28:5
  28: jd_integration::jds_should_not_panic_if_jdc_shutsdown::{{closure}}
             at ./tests/jd_integration.rs:17:49
  29: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  30: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  31: test::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/test/src/lib.rs:621:18
  32: test::run_test_in_process::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/test/src/lib.rs:644:60
  33: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
  34: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  35: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  36: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  37: test::run_test_in_process
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/test/src/lib.rs:644:27
  38: test::run_test::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/test/src/lib.rs:567:43
  39: test::run_test::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/test/src/lib.rs:595:41
  40: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  41: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:528:17
  42: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
  43: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  44: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  45: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  46: std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:527:30
  47: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  48: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
  49: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
  50: std::sys::pal::unix::thread::Thread::new::thread_start
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys/pal/unix/thread.rs:108:17
  51: <unknown>
  52: <unknown>
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:1654:5
   3: core::result::Result<T,E>::expect
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:1034:23
   4: integration_tests_sv2::template_provider::TemplateProvider::start
             at ./lib/template_provider.rs:90:24
   5: integration_tests_sv2::start_template_provider
             at ./lib/mod.rs:100:29
   6: jd_integration::jds_should_not_panic_if_jdc_shutsdown::{{closure}}
             at ./tests/jd_integration.rs:18:25
   7: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   8: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   9: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:729:57
  10: tokio::runtime::coop::with_budget
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/coop.rs:107:5
  11: tokio::runtime::coop::budget
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/coop.rs:73:5
  12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:729:25
  13: tokio::runtime::scheduler::current_thread::Context::enter
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:428:19
  14: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:728:36
  15: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:807:68
  16: tokio::runtime::context::scoped::Scoped<T>::set
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context/scoped.rs:40:9
  17: tokio::runtime::context::set_scheduler::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context.rs:180:26
  18: std::thread::local::LocalKey<T>::try_with
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:284:16
  19: std::thread::local::LocalKey<T>::with
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:260:9
  20: tokio::runtime::context::set_scheduler
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context.rs:180:9
  21: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:807:27
  22: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:716:19
  23: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:196:28
  24: tokio::runtime::context::runtime::enter_runtime
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/context/runtime.rs:65:16
  25: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/scheduler/current_thread/mod.rs:184:9
  26: tokio::runtime::runtime::Runtime::block_on_inner
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:368:47
  27: tokio::runtime::runtime::Runtime::block_on
             at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:342:13
  28: jd_integration::jds_should_not_panic_if_jdc_shutsdown
             at ./tests/jd_integration.rs:28:5
  29: jd_integration::jds_should_not_panic_if_jdc_shutsdown::{{closure}}
             at ./tests/jd_integration.rs:17:49
  30: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  31: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    jds_should_not_panic_if_jdc_shutsdown

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 20.03s

error: test failed, to rerun pass `--test jd_integration`

the line in corepc-node where the error originates is this: https://docs.rs/corepc-node/0.5.0/src/corepc_node/lib.rs.html#360 (the panic itself comes from the `expect` at ./lib/template_provider.rs:90)
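
For reference, `os error 13` is `EACCES`, which Rust's standard library reports as `std::io::ErrorKind::PermissionDenied`. The sketch below (Unix only) reproduces the same error class by trying to spawn a file that lacks the execute bit. It illustrates the error class only and makes no claim about what triggers it in the integration tests; `try_spawn_non_executable` and the temp file name are made up for the example.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::process::Command;

/// Writes a small script without the execute bit, tries to run it,
/// and returns the error produced by the failed spawn.
fn try_spawn_non_executable() -> io::Error {
    // Hypothetical file name, made unique per process to avoid clashes.
    let path = std::env::temp_dir().join(format!("not-executable-{}", std::process::id()));
    fs::write(&path, "#!/bin/sh\nexit 0\n").expect("write temp file");
    // 0o644: readable and writable, but not executable by anyone.
    fs::set_permissions(&path, fs::Permissions::from_mode(0o644)).expect("chmod temp file");

    let err = Command::new(&path)
        .spawn()
        .expect_err("spawn should fail without the exec bit");

    let _ = fs::remove_file(&path); // best-effort cleanup
    err
}

fn main() {
    let err = try_spawn_non_executable();
    println!("kind = {:?}, raw os error = {:?}", err.kind(), err.raw_os_error());
}
```

Matching on `err.kind()` like this would let a test helper tell a permission failure apart from other start-up errors before deciding whether to retry.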
