[jvm-packages] Supports external memory #11186

Merged 28 commits on Feb 25, 2025

Conversation

Contributor

@wbo4958 wbo4958 commented Jan 27, 2025

This PR adds external memory support, building on #11181.

@wbo4958
Contributor Author

wbo4958 commented Jan 27, 2025

Hi @trivialfis, could you please review this?

@trivialfis
Member

trivialfis commented Feb 8, 2025

Note:

  • External memory is currently supported only on GPU; we should make this clear.
  • RMM must be enabled.
  • A global configuration is needed.

@trivialfis
Member

We need to let XGBoost access as many CPU threads as available.

@trivialfis
Member

  • The default nthread parameter of the XGBoost estimator in the Spark package is 1.
  • Other factors are still limiting the number of OpenMP threads in my test run. Perhaps Spark is setting environment variables for executors?
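
One way to investigate the question above is to print what the JVM and the process environment report on an executor. The following is a minimal sketch, not part of this PR: `ThreadDiagnostics` and `report` are hypothetical names, and the list of environment variables is an assumption about which ones an OpenMP runtime is likely to honor.

```scala
object ThreadDiagnostics {
  // Environment variables commonly honored by OpenMP runtimes (assumed list).
  val ompVars: Seq[String] = Seq("OMP_NUM_THREADS", "OMP_THREAD_LIMIT")

  /** Collects the CPU count visible to the JVM together with any
    * OpenMP-related variables found in the supplied environment. */
  def report(env: Map[String, String] = sys.env): Map[String, String] = {
    val cpus = Runtime.getRuntime.availableProcessors()
    // Keep only the variables that are actually set.
    val omp = ompVars.flatMap(name => env.get(name).map(value => name -> value))
    Map("availableProcessors" -> cpus.toString) ++ omp
  }
}
```

Calling `ThreadDiagnostics.report()` inside a task and logging the result should show whether a thread limit is already present in the executor's environment before XGBoost starts.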

@trivialfis
Member

For reference, setting --conf spark.task.cpus=2 affects the global OpenMP runtime.

@trivialfis
Member

trivialfis commented Feb 11, 2025

The two remaining items:

  • Make sure CPU code cannot run into this option accidentally.
  • Expose global configuration for RMM (Java, Scala).
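
The first item could be handled with a fail-fast check before training starts. The sketch below is an illustration only: `ExternalMemoryGuard`, `isGpu`, and `validate` are hypothetical names, and the `device` and `useExternalMemory` arguments are stand-ins for whatever the estimator actually exposes. It assumes device strings of the form `cpu`, `cuda`, `gpu`, or one of the latter two with an ordinal suffix such as `cuda:0`.

```scala
object ExternalMemoryGuard {
  /** True when the device string names a GPU, with or without an ordinal. */
  def isGpu(device: String): Boolean = {
    val kind = device.trim.toLowerCase.takeWhile(_ != ':')
    kind == "cuda" || kind == "gpu"
  }

  /** Throws IllegalArgumentException if external memory is requested
    * for a device that is not a GPU (hypothetical parameters). */
  def validate(device: String, useExternalMemory: Boolean): Unit =
    require(
      !useExternalMemory || isGpu(device),
      s"External memory is only supported on GPU, but device is '$device'"
    )
}
```

Running a check like this on the driver, before any data is cached, would keep a CPU job from reaching the external memory path at all.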

@trivialfis trivialfis mentioned this pull request Feb 12, 2025
9 tasks
@wbo4958
Contributor Author

wbo4958 commented Feb 22, 2025

Hi @trivialfis, could you please review it again?

@trivialfis
Member

Will review. I have tested the PR on my local machine with 2 GPUs.

  • Overlapping is working.
  • Initialization is quite slow, probably due to disk writes. We will need better profiling annotations in the future.

inputNextIsCalled = true
withResource(new GpuColumnBatch(iter.next())) { batch =>
  if (iter.eq(input)) {
    externalMemory.cacheTable(batch.table)
Member

It would be great if we could write data asynchronously in the future (after this PR). This way, we can let XGBoost handle the batch while it's being written simultaneously.
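
A rough sketch of that idea, under simplifying assumptions: the batch is represented as a plain byte array rather than a GPU table, and `AsyncCache` and `cacheWhileProcessing` are hypothetical names that do not exist in this PR. The write is started in a `Future` so the consumer can work on the in-memory data while the disk write is in flight.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncCache {
  /** Starts writing `bytes` to `target` in the background, runs `process`
    * on the same data in the calling thread, then waits for the write. */
  def cacheWhileProcessing[A](bytes: Array[Byte], target: Path)(
      process: Array[Byte] => A)(implicit ec: ExecutionContext): A = {
    // The disk write runs concurrently with the processing below.
    val write = Future { Files.write(target, bytes) }
    // The consumer works on the in-memory copy in the meantime.
    val result = process(bytes)
    // Wait for the write so that any I/O failure surfaces before returning.
    Await.result(write, 1.minute)
    result
  }
}
```

A real implementation would also need to keep the batch alive until the write completes, which is why the sketch waits on the future before returning.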

val path = Paths.get(dirPath)
if (!Files.exists(path)) {
  Files.createDirectories(path)
} else {
Member

Empty else clause, is this intended?

Contributor Author

Good catch. Removed the empty else.
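
As a side note, `Files.createDirectories` does not throw when the directory already exists, so the existence check may not be needed at all. A minimal sketch of that simplification, where `CacheDir` and `ensure` are hypothetical names and not the code merged here:

```scala
import java.nio.file.{Files, Path, Paths}

object CacheDir {
  /** Returns the directory at `dirPath`, creating it and any missing
    * parents. Calling it again on an existing directory is a no-op. */
  def ensure(dirPath: String): Path =
    Files.createDirectories(Paths.get(dirPath))
}
```

The call still throws if the path exists but is not a directory, which is probably the desired behavior for a cache location.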

Member

@trivialfis trivialfis left a comment

Looks good to me.

@trivialfis trivialfis merged commit 337ee78 into dmlc:master Feb 25, 2025
60 checks passed
2 participants