Contributor

mateczagany commented Sep 28, 2025

What is the purpose of the change

Tasks were failing because they were not run in the main thread. This is related to FLINK-38114 (PR #26821).

Brief change log

  • Update test so that all ExecutionGraph operations happen from the main thread

Verifying this change

I could not reproduce the issue after applying this fix.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

flinkbot commented Sep 28, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@davidradl
Contributor

@mateczagany I am curious as to the cause of this issue - I assume it was introduced by another PR, but how did the CI succeed on that merge? It would be good to document our understanding either in the PR or in the Jira.

@mateczagany
Contributor Author

Hi @davidradl, I will try to give some details. On my local machine it is quite rare to see this test fail, as with the test in my other PR: out of 10,000 test runs, about 100-200 fail without this fix.

If you set the rootLogger level to WARN in log4j2-test.properties, you can see these errors:

Violation of main thread constraint detected: expected <Thread[#28,ForkJoinPool-1-worker-1,5,main]> but running in <Thread[#36,pool-2-thread-1,5,main]>.

While writing this comment, I also looked a bit deeper and realized that my original solution did not really solve the core issue. I think this might be related to FLINK-38114 (PR #26821), where not all tests were adjusted properly to fit the new behavior. I have pushed a new commit that updates the test so that all operations on the ExecutionGraph run from the main thread.
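For readers unfamiliar with this kind of constraint, here is a minimal, Flink-free sketch of the failure mode and of the shape of the fix. It is not the actual test change and does not use Flink's classes: GuardedGraph, violatesFromOtherThread and runOnMainThread are hypothetical names made up for illustration, and the single-threaded executor only stands in for whatever executor the test treats as the main thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MainThreadSketch {

    /** Hypothetical stand-in for a component that must only be touched from one thread. */
    static final class GuardedGraph {
        private volatile Thread mainThread;
        private int transitions;

        void bindMainThread(Thread thread) {
            mainThread = thread;
        }

        void transition() {
            Thread actual = Thread.currentThread();
            if (actual != mainThread) {
                // Message modeled on the log line quoted above; the real check lives in Flink.
                throw new IllegalStateException(
                        "Violation of main thread constraint detected: expected <"
                                + mainThread + "> but running in <" + actual + ">.");
            }
            transitions++;
        }

        int transitions() {
            return transitions;
        }
    }

    /** Calls the graph from a foreign thread; returns true if the call was rejected. */
    public static boolean violatesFromOtherThread() throws Exception {
        ExecutorService mainExecutor = Executors.newSingleThreadExecutor();
        ExecutorService otherExecutor = Executors.newSingleThreadExecutor();
        try {
            GuardedGraph graph = new GuardedGraph();
            mainExecutor.submit(() -> graph.bindMainThread(Thread.currentThread())).get();
            try {
                otherExecutor.submit(graph::transition).get();
                return false;
            } catch (ExecutionException e) {
                return e.getCause() instanceof IllegalStateException;
            }
        } finally {
            mainExecutor.shutdownNow();
            otherExecutor.shutdownNow();
        }
    }

    /** Routes every operation through the main executor; returns the number of transitions. */
    public static int runOnMainThread(int count) throws Exception {
        ExecutorService mainExecutor = Executors.newSingleThreadExecutor();
        try {
            GuardedGraph graph = new GuardedGraph();
            mainExecutor.submit(() -> graph.bindMainThread(Thread.currentThread())).get();
            for (int i = 0; i < count; i++) {
                // Waiting on the future makes the update visible to the calling thread.
                mainExecutor.submit(graph::transition).get();
            }
            return graph.transitions();
        } finally {
            mainExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rejected off main thread: " + violatesFromOtherThread());
        System.out.println("transitions on main thread: " + runOnMainThread(3));
    }
}
```

The sketch rejects the off-thread call every time because the guard throws. In the real test the violation was only visible as a logged error and the failure was intermittent, which would be consistent with the low failure rate mentioned above.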
