
[Improvement]: Eliminate AMS Full GC impact deriving from local terminal clean spark context #2969 #2973

Merged
merged 2 commits into apache:master
Jul 2, 2024

Conversation

nicochen
Contributor

This PR is related to issue #2969 and has been manually tested locally.

Why are the changes needed?

Close #2969.

Brief change log

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added the module:ams-server Ams server module label Jun 27, 2024
@baiyangtx
Contributor

I think it might be more reasonable to adjust the value of spark.cleaner.periodicGC.interval instead.

In the Spark source code, the parameter spark.cleaner.referenceTracking controls whether the ContextCleaner is created.

(screenshot: Spark source showing the ContextCleaner is created only when spark.cleaner.referenceTracking is enabled)

Inside the ContextCleaner, two threads are managed:

  • A cleaning thread, which cleans up RDDs (and related shuffle data) that are no longer used.
  • A periodic GC thread, which calls System.gc() every spark.cleaner.periodicGC.interval.

(screenshots: ContextCleaner source showing the cleaning thread and the periodic GC thread)

I think the cleaning thread plays an important role, especially when we run a resident Spark context, since it releases disk resources in time.
On the other hand, the GC triggered inside AMS is not a big problem: spark.cleaner.periodicGC.interval can be adjusted to a very large value to avoid actively triggering GC.
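For illustration, a minimal sketch of that suggestion for a local session. The class name, app name, and the 10000h value are hypothetical; only the two spark.cleaner.* keys come from this discussion.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class LargeGcIntervalSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("ams-local-terminal")
                // Keep the ContextCleaner (and its RDD/shuffle cleaning thread) enabled.
                .set("spark.cleaner.referenceTracking", "true")
                // Push the periodic System.gc() far into the future; it cannot be
                // disabled on its own, only delayed (the default interval is 30min).
                .set("spark.cleaner.periodicGC.interval", "10000h");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        // ... run terminal SQL here ...
        spark.stop();
    }
}
```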

@nicochen
Contributor Author

@baiyangtx I also considered the spark.cleaner.periodicGC.interval config key, but as you said it can only be set to an extremely large value rather than disabled. In our production case, the Full GC really matters: on JDK 1.8 a Full GC under G1 falls back to a slow stop-the-world collection, so it takes more than 30 seconds, triggers a ZooKeeper session timeout, and causes an AMS failover, which is unacceptable. Also, I believe the local terminal is designed for lightweight, infrequent SQL tasks and would not produce much RDD or shuffle garbage; heavy tasks should go to Kyuubi or Spark. Thus, I chose overall stability over enlarging the GC interval.
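For comparison, a minimal sketch of the alternative described here: turning off reference tracking for the local terminal session so the ContextCleaner (and its periodic System.gc() thread) is never created. The class name, app name, and query are hypothetical, and this is not necessarily the exact code in the patch.

```java
import org.apache.spark.sql.SparkSession;

public class DisableCleanerSketch {
    public static void main(String[] args) {
        // With referenceTracking off, Spark does not create the ContextCleaner,
        // so neither the cleaning thread nor the periodic GC thread exists and
        // AMS avoids the cleaner-driven Full GC entirely.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("ams-local-terminal")
                .config("spark.cleaner.referenceTracking", "false")
                .getOrCreate();

        spark.sql("SELECT 1").show();  // the terminal only runs lightweight SQL like this
        spark.stop();
    }
}
```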

Contributor

@zhoujinsong zhoujinsong left a comment


LGTM.

@zhoujinsong
Contributor

@baiyangtx What do you think about the PR now?

Contributor

@baiyangtx baiyangtx left a comment


LGTM. Disabling the RDD cleaner does not matter for a local Spark session.

@baiyangtx baiyangtx merged commit fad99c5 into apache:master Jul 2, 2024
4 checks passed