Optimize Three-Way Partition #228

gevtushenko
Collaborator

Description

closes #168

This PR changes the three-way partition implementation to use a single decoupled look-back instead of two. The change improves performance for 32-bit offsets without any tuning, but causes significant regressions (up to 40%) for 64-bit offsets. Since three-way partition doesn't provide a device API for 64-bit offsets, we can proceed with the change and focus on 32-bit offsets for now. The regression will be addressed later as part of #220.
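
To illustrate why one look-back can be enough, here is a minimal host-side sketch, not the CUB implementation. It assumes the two running counts (items selected for the first part and for the second part) are carried together in one value, so a single exclusive scan yields both output offsets at once. The names `Counts`, `Partitioned`, and `three_way_partition` are hypothetical and exist only for this example. In a parallel version, each tile would presumably publish one such pair as its aggregate, so a single look-back resolves both prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: running counts for the first and second parts,
// carried as one value so that a single scan produces both.
struct Counts {
  std::uint32_t first;
  std::uint32_t second;
};

// Scan operator: element-wise addition of the two counters.
inline Counts operator+(Counts a, Counts b) {
  return {a.first + b.first, a.second + b.second};
}

struct Partitioned {
  std::vector<int> first;
  std::vector<int> second;
  std::vector<int> unselected;
};

// Sequential reference: one exclusive scan over Counts drives all three outputs.
// Note: this sketch keeps unselected items in input order, which may differ
// from the ordering the library uses.
template <class Pred1, class Pred2>
Partitioned three_way_partition(const std::vector<int>& in,
                                Pred1 in_first, Pred2 in_second) {
  std::vector<Counts> offsets(in.size());
  Counts running{0, 0};

  // Single exclusive scan: offsets[i] holds both counts before item i.
  for (std::size_t i = 0; i < in.size(); ++i) {
    offsets[i] = running;
    const bool f = in_first(in[i]);
    const bool s = !f && in_second(in[i]);
    running = running + Counts{f ? 1u : 0u, s ? 1u : 0u};
  }

  Partitioned out;
  out.first.resize(running.first);
  out.second.resize(running.second);
  out.unselected.resize(in.size() - running.first - running.second);

  // Scatter: the unselected offset follows from the other two counts.
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool f = in_first(in[i]);
    const bool s = !f && in_second(in[i]);
    if (f) {
      out.first[offsets[i].first] = in[i];
    } else if (s) {
      out.second[offsets[i].second] = in[i];
    } else {
      out.unselected[i - offsets[i].first - offsets[i].second] = in[i];
    }
  }
  return out;
}
```

With 32-bit counters the pair fits into 64 bits, whereas 64-bit counters double that footprint. This may be related to the different behaviour of the two offset widths reported below, but that is an assumption, not something measured here.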

H100/HBM3

| Entropy | I8 | I16 | I32 | I64 | I128 | F32 | F64 |
|---|---|---|---|---|---|---|---|
| 0 | -38.61% | -38.65% | -34.61% | -30.08% | -37.93% | -15.22% | -30.90% |
| 0.544 | -38.59% | -38.56% | -36.25% | -29.25% | -37.84% | -16.28% | -29.93% |
| 1 | -38.24% | -38.24% | -35.97% | -29.61% | -37.76% | -15.69% | -30.33% |


| T{ct} | OffsetT{ct} | Elements{io} | Entropy | Cmp Noise | %Diff |
|---|---|---|---|---|---|
| I8 | I32 | 2^16 | 1 | 2.54% | 5.82% |
| I8 | I32 | 2^20 | 1 | 1.53% | -9.53% |
| I8 | I32 | 2^24 | 1 | 0.69% | -31.45% |
| I8 | I32 | 2^28 | 1 | 0.18% | -38.24% |
| I8 | I32 | 2^16 | 0.544 | 2.31% | 0.42% |
| I8 | I32 | 2^20 | 0.544 | 2.05% | -11.28% |
| I8 | I32 | 2^24 | 0.544 | 0.73% | -32.28% |
| I8 | I32 | 2^28 | 0.544 | 0.17% | -38.59% |
| I8 | I32 | 2^16 | 0 | 2.73% | -1.27% |
| I8 | I32 | 2^20 | 0 | 1.90% | -12.70% |
| I8 | I32 | 2^24 | 0 | 0.76% | -32.43% |
| I8 | I32 | 2^28 | 0 | 0.15% | -38.61% |
| I8 | I64 | 2^16 | 1 | 1.71% | 30.64% |
| I8 | I64 | 2^20 | 1 | 1.69% | 22.04% |
| I8 | I64 | 2^24 | 1 | 0.66% | 6.52% |
| I8 | I64 | 2^28 | 1 | 0.20% | 2.84% |
| I8 | I64 | 2^16 | 0.544 | 2.04% | 26.12% |
| I8 | I64 | 2^20 | 0.544 | 1.79% | 18.41% |
| I8 | I64 | 2^24 | 0.544 | 0.65% | 4.72% |
| I8 | I64 | 2^28 | 0.544 | 0.21% | 2.22% |
| I8 | I64 | 2^16 | 0 | 1.52% | 22.84% |
| I8 | I64 | 2^20 | 0 | 1.60% | 17.40% |
| I8 | I64 | 2^24 | 0 | 0.64% | 4.21% |
| I8 | I64 | 2^28 | 0 | 0.21% | 1.54% |
| I16 | I32 | 2^16 | 1 | 2.36% | 4.34% |
| I16 | I32 | 2^20 | 1 | 1.70% | -8.23% |
| I16 | I32 | 2^24 | 1 | 0.90% | -31.69% |
| I16 | I32 | 2^28 | 1 | 0.22% | -38.24% |
| I16 | I32 | 2^16 | 0.544 | 2.52% | -1.88% |
| I16 | I32 | 2^20 | 0.544 | 1.49% | -10.88% |
| I16 | I32 | 2^24 | 0.544 | 0.90% | -32.21% |
| I16 | I32 | 2^28 | 0.544 | 0.21% | -38.56% |
| I16 | I32 | 2^16 | 0 | 2.67% | -4.22% |
| I16 | I32 | 2^20 | 0 | 1.99% | -12.02% |
| I16 | I32 | 2^24 | 0 | 0.85% | -32.77% |
| I16 | I32 | 2^28 | 0 | 0.20% | -38.65% |
| I16 | I64 | 2^16 | 1 | 1.55% | 52.42% |
| I16 | I64 | 2^20 | 1 | 1.90% | 16.07% |
| I16 | I64 | 2^24 | 1 | 0.86% | -10.62% |
| I16 | I64 | 2^28 | 1 | 0.29% | -20.91% |
| I16 | I64 | 2^16 | 0.544 | 1.76% | 51.82% |
| I16 | I64 | 2^20 | 0.544 | 2.05% | 16.65% |
| I16 | I64 | 2^24 | 0.544 | 0.86% | -11.43% |
| I16 | I64 | 2^28 | 0.544 | 0.29% | -21.32% |
| I16 | I64 | 2^16 | 0 | 2.38% | 49.49% |
| I16 | I64 | 2^20 | 0 | 1.76% | 14.63% |
| I16 | I64 | 2^24 | 0 | 0.85% | -13.30% |
| I16 | I64 | 2^28 | 0 | 0.28% | -22.89% |
| I32 | I32 | 2^16 | 1 | 1.80% | -10.60% |
| I32 | I32 | 2^20 | 1 | 1.52% | -14.66% |
| I32 | I32 | 2^24 | 1 | 1.22% | -29.94% |
| I32 | I32 | 2^28 | 1 | 0.29% | -35.97% |
| I32 | I32 | 2^16 | 0.544 | 1.73% | -9.31% |
| I32 | I32 | 2^20 | 0.544 | 1.63% | -12.20% |
| I32 | I32 | 2^24 | 0.544 | 1.18% | -30.12% |
| I32 | I32 | 2^28 | 0.544 | 0.29% | -36.25% |
| I32 | I32 | 2^16 | 0 | 2.06% | -9.51% |
| I32 | I32 | 2^20 | 0 | 1.87% | -11.53% |
| I32 | I32 | 2^24 | 0 | 1.06% | -28.54% |
| I32 | I32 | 2^28 | 0 | 0.25% | -34.61% |
| I32 | I64 | 2^16 | 1 | 1.69% | 25.42% |
| I32 | I64 | 2^20 | 1 | 1.69% | 18.76% |
| I32 | I64 | 2^24 | 1 | 1.23% | 3.39% |
| I32 | I64 | 2^28 | 1 | 0.38% | -1.54% |
| I32 | I64 | 2^16 | 0.544 | 1.78% | 23.51% |
| I32 | I64 | 2^20 | 0.544 | 1.84% | 19.14% |
| I32 | I64 | 2^24 | 0.544 | 1.09% | 2.92% |
| I32 | I64 | 2^28 | 0.544 | 0.32% | -1.56% |
| I32 | I64 | 2^16 | 0 | 1.51% | 25.89% |
| I32 | I64 | 2^20 | 0 | 1.63% | 17.38% |
| I32 | I64 | 2^24 | 0 | 1.00% | 3.47% |
| I32 | I64 | 2^28 | 0 | 0.31% | -0.58% |
| I64 | I32 | 2^16 | 1 | 2.51% | -9.20% |
| I64 | I32 | 2^20 | 1 | 1.86% | -10.67% |
| I64 | I32 | 2^24 | 1 | 1.05% | -30.86% |
| I64 | I32 | 2^28 | 1 | 0.30% | -29.61% |
| I64 | I32 | 2^16 | 0.544 | 2.71% | -8.47% |
| I64 | I32 | 2^20 | 0.544 | 1.74% | -12.92% |
| I64 | I32 | 2^24 | 0.544 | 1.10% | -30.39% |
| I64 | I32 | 2^28 | 0.544 | 0.31% | -29.25% |
| I64 | I32 | 2^16 | 0 | 2.82% | -11.54% |
| I64 | I32 | 2^20 | 0 | 1.56% | -14.11% |
| I64 | I32 | 2^24 | 0 | 1.12% | -31.52% |
| I64 | I32 | 2^28 | 0 | 0.32% | -30.08% |
| I64 | I64 | 2^16 | 1 | 1.45% | 16.71% |
| I64 | I64 | 2^20 | 1 | 2.12% | -3.95% |
| I64 | I64 | 2^24 | 1 | 1.06% | -18.66% |
| I64 | I64 | 2^28 | 1 | 0.28% | -21.86% |
| I64 | I64 | 2^16 | 0.544 | 1.38% | 15.98% |
| I64 | I64 | 2^20 | 0.544 | 2.04% | -5.02% |
| I64 | I64 | 2^24 | 0.544 | 1.01% | -18.27% |
| I64 | I64 | 2^28 | 0.544 | 0.26% | -21.69% |
| I64 | I64 | 2^16 | 0 | 1.82% | 15.99% |
| I64 | I64 | 2^20 | 0 | 2.08% | -4.97% |
| I64 | I64 | 2^24 | 0 | 1.07% | -20.57% |
| I64 | I64 | 2^28 | 0 | 0.30% | -24.03% |
| I128 | I32 | 2^16 | 1 | 2.61% | -8.39% |
| I128 | I32 | 2^20 | 1 | 3.22% | -19.74% |
| I128 | I32 | 2^24 | 1 | 0.92% | -36.30% |
| I128 | I32 | 2^28 | 1 | 0.23% | -37.76% |
| I128 | I32 | 2^16 | 0.544 | 2.63% | -9.50% |
| I128 | I32 | 2^20 | 0.544 | 3.25% | -19.97% |
| I128 | I32 | 2^24 | 0.544 | 0.88% | -36.40% |
| I128 | I32 | 2^28 | 0.544 | 0.50% | -37.84% |
| I128 | I32 | 2^16 | 0 | 2.60% | -9.86% |
| I128 | I32 | 2^20 | 0 | 3.29% | -21.08% |
| I128 | I32 | 2^24 | 0 | 0.92% | -36.43% |
| I128 | I32 | 2^28 | 0 | 0.25% | -37.93% |
| I128 | I64 | 2^16 | 1 | 1.92% | 6.66% |
| I128 | I64 | 2^20 | 1 | 2.09% | -4.73% |
| I128 | I64 | 2^24 | 1 | 1.19% | -22.12% |
| I128 | I64 | 2^28 | 1 | 0.30% | -25.92% |
| I128 | I64 | 2^16 | 0.544 | 2.33% | 8.84% |
| I128 | I64 | 2^20 | 0.544 | 2.10% | -4.98% |
| I128 | I64 | 2^24 | 0.544 | 1.19% | -22.38% |
| I128 | I64 | 2^28 | 0.544 | 0.50% | -26.04% |
| I128 | I64 | 2^16 | 0 | 2.29% | 7.97% |
| I128 | I64 | 2^20 | 0 | 2.22% | -6.01% |
| I128 | I64 | 2^24 | 0 | 1.19% | -23.13% |
| I128 | I64 | 2^28 | 0 | 0.31% | -26.84% |
| F32 | I32 | 2^16 | 1 | 2.51% | -9.32% |
| F32 | I32 | 2^20 | 1 | 1.52% | -5.02% |
| F32 | I32 | 2^24 | 1 | 1.11% | -10.23% |
| F32 | I32 | 2^28 | 1 | 0.38% | -15.69% |
| F32 | I32 | 2^16 | 0.544 | 2.09% | -8.11% |
| F32 | I32 | 2^20 | 0.544 | 1.77% | -5.80% |
| F32 | I32 | 2^24 | 0.544 | 1.36% | -10.76% |
| F32 | I32 | 2^28 | 0.544 | 0.53% | -16.28% |
| F32 | I32 | 2^16 | 0 | 1.54% | -8.92% |
| F32 | I32 | 2^20 | 0 | 1.51% | -5.76% |
| F32 | I32 | 2^24 | 0 | 1.43% | -8.33% |
| F32 | I32 | 2^28 | 0 | 0.59% | -15.22% |
| F32 | I64 | 2^16 | 1 | 1.95% | 18.90% |
| F32 | I64 | 2^20 | 1 | 1.67% | 16.84% |
| F32 | I64 | 2^24 | 1 | 1.18% | 2.87% |
| F32 | I64 | 2^28 | 1 | 0.37% | -1.92% |
| F32 | I64 | 2^16 | 0.544 | 1.69% | 20.68% |
| F32 | I64 | 2^20 | 0.544 | 1.76% | 15.92% |
| F32 | I64 | 2^24 | 0.544 | 1.13% | 2.12% |
| F32 | I64 | 2^28 | 0.544 | 0.34% | -1.74% |
| F32 | I64 | 2^16 | 0 | 1.67% | 19.46% |
| F32 | I64 | 2^20 | 0 | 2.33% | 17.25% |
| F32 | I64 | 2^24 | 0 | 1.03% | 2.94% |
| F32 | I64 | 2^28 | 0 | 0.29% | -0.89% |
| F64 | I32 | 2^16 | 1 | 1.96% | -8.65% |
| F64 | I32 | 2^20 | 1 | 1.64% | -12.57% |
| F64 | I32 | 2^24 | 1 | 1.10% | -31.88% |
| F64 | I32 | 2^28 | 1 | 0.31% | -30.33% |
| F64 | I32 | 2^16 | 0.544 | 1.86% | -8.60% |
| F64 | I32 | 2^20 | 0.544 | 1.92% | -13.45% |
| F64 | I32 | 2^24 | 0.544 | 1.23% | -31.34% |
| F64 | I32 | 2^28 | 0.544 | 0.33% | -29.93% |
| F64 | I32 | 2^16 | 0 | 2.70% | -10.08% |
| F64 | I32 | 2^20 | 0 | 1.93% | -14.33% |
| F64 | I32 | 2^24 | 0 | 1.19% | -32.73% |
| F64 | I32 | 2^28 | 0 | 0.31% | -30.90% |
| F64 | I64 | 2^16 | 1 | 2.20% | 16.84% |
| F64 | I64 | 2^20 | 1 | 2.02% | -4.65% |
| F64 | I64 | 2^24 | 1 | 1.13% | -19.54% |
| F64 | I64 | 2^28 | 1 | 0.32% | -23.02% |
| F64 | I64 | 2^16 | 0.544 | 1.97% | 19.02% |
| F64 | I64 | 2^20 | 0.544 | 2.05% | -4.65% |
| F64 | I64 | 2^24 | 0.544 | 1.09% | -19.06% |
| F64 | I64 | 2^28 | 0.544 | 0.27% | -22.48% |
| F64 | I64 | 2^16 | 0 | 2.03% | 15.75% |
| F64 | I64 | 2^20 | 0 | 2.02% | -6.81% |
| F64 | I64 | 2^24 | 0 | 1.14% | -21.69% |
| F64 | I64 | 2^28 | 0 | 0.29% | -25.25% |

H100/HBM2e

| Entropy | I8 | I16 | I32 | I64 | I128 | F32 | F64 |
|---|---|---|---|---|---|---|---|
| 0 | -36.92% | -37.54% | -30.16% | -27.19% | -31.41% | -16.93% | -27.99% |
| 0.544 | -36.66% | -37.38% | -31.57% | -26.44% | -31.44% | -18.98% | -27.35% |
| 1 | -36.06% | -37.05% | -30.36% | -26.52% | -31.35% | -18.23% | -27.44% |

A100

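| Entropy | I8 | I16 | I32 | I64 | I128 | F32 | F64 |
|---|---|---|---|---|---|---|---|
| 0 | -24.98% | -19.93% | -5.22% | -14.80% | -2.78% | -5.24% | -20.26% |
| 0.544 | -25.44% | -19.82% | -5.85% | -14.90% | -2.89% | -5.85% | -20.43% |
| 1 | -24.12% | -19.69% | -5.97% | -14.99% | -2.95% | -5.96% | -20.44% |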

V100

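| Entropy | I8 | I16 | I32 | I64 | I128 | F32 | F64 |
|---|---|---|---|---|---|---|---|
| 0 | -22.54% | -21.69% | -12.36% | -19.48% | -9.62% | -12.41% | -19.46% |
| 0.544 | -23.11% | -22.11% | -10.99% | -20.39% | -8.83% | -11.02% | -20.30% |
| 1 | -22.70% | -22.10% | -10.92% | -20.61% | -10.57% | -10.93% | -20.54% |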
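
For reading the tables above, here is a small sketch of the arithmetic. It assumes that %Diff is the relative change in run time against the reference, `(new - ref) / ref * 100`, so negative values mean the new version is faster. That reading matches the description (32-bit offsets improve), but the formula is an assumption, and the function names are hypothetical helpers for this example only.

```cpp
#include <cassert>
#include <cmath>

// Assumed definition of %Diff: relative change in run time, in percent.
// Negative means the new run is faster than the reference.
inline double percent_diff(double ref_time, double new_time) {
  return (new_time - ref_time) / ref_time * 100.0;
}

// Throughput ratio (new / reference) implied by a %Diff in run time.
inline double speedup_from_percent_diff(double pct) {
  return 1.0 / (1.0 + pct / 100.0);
}

// True when the magnitude of the change is larger than the reported noise,
// i.e. the difference is unlikely to be measurement jitter alone.
inline bool exceeds_noise(double pct, double noise_pct) {
  return std::fabs(pct) > noise_pct;
}
```

Under this assumption, -38.61% (I8, I32 offsets, 2^28 elements, entropy 0) corresponds to roughly a 1.63x speedup, while +52.42% (I16, I64 offsets, 2^16 elements, entropy 1) corresponds to roughly 0.66x of the reference throughput. The 0.42% entry for I8/I32 at 2^16 elements lies within its 2.31% noise and should not be read as a real change.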

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@gevtushenko added the "benchmark" label (Feature related to benchmarking our libraries) on Jul 17, 2023
Collaborator

@miscco left a comment

Only nits that do not warrant rerunning CI.

@gevtushenko requested review from a team as code owners on July 18, 2023 05:58
@gevtushenko requested review from alliepiper and removed the request for a team on July 18, 2023 05:58
@gevtushenko merged commit 55b181e into NVIDIA:main on Jul 18, 2023
368 checks passed
Development

Successfully merging this pull request may close these issues.

[FEA]: Combine Three-Way Partition Look-backs