
Feature/pyright benchmarking #942

Open

wants to merge 6 commits into main

Conversation

qh681248 (Contributor) commented:

#912

PR Type

  • Bugfix

Description

Static type check (pyright) fixes for the benchmark scripts.

How Has This Been Tested?

Existing tests pass as expected.

Pyright passes on the benchmark directory with 0 warnings, 0 errors, and 0 informations.

Does this PR introduce a breaking change?

Checklist before requesting a review

  • I have made sure that my PR is not a duplicate.
  • My code follows the style guidelines of this project.
  • I have ensured my code is easy to understand, including docstrings and comments where necessary.
  • I have performed a self-review of my code.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.
  • I have updated CHANGELOG.md, if appropriate.

github-actions bot (Contributor) commented:

Performance review

Commit 62fdcf8 - Merge b512f38 into cc6a510

No significant changes to performance.

tp832944 self-requested a review January 31, 2025 12:30
benchmark/blobs_benchmark.py: outdated review comment (resolved)
benchmark/mnist_benchmark.py: outdated review comment (resolved)
benchmark/mnist_benchmark.py: outdated review comment (resolved)
@@ -61,9 +61,9 @@ def benchmark_coreset_algorithms(
     reshaped_data = raw_data.reshape(raw_data.shape[0], -1)

     umap_model = umap.UMAP(densmap=True, n_components=25)
-    umap_data = umap_model.fit_transform(reshaped_data)
+    umap_data = jnp.asarray(umap_model.fit_transform(reshaped_data))
Review comment (Contributor):

Remove corresponding jnp.asarray in line 75.

Review comment (Contributor):

Needs to be data = Data(umap_data).
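
For context, a minimal sketch of the change the two comments above are asking for, assuming the script already uses umap and jax.numpy and that coreax's Data container is importable as shown; the helper function name is illustrative, not the project's actual code:

```python
# Sketch only: cast the UMAP output to a JAX array once, then build the Data
# container directly from it so no second jnp.asarray call is needed later.
import jax.numpy as jnp
import umap

from coreax import Data  # import path assumed


def prepare_umap_data(raw_data):
    """Flatten the images, reduce with UMAP, and wrap the result for coreax."""
    reshaped_data = raw_data.reshape(raw_data.shape[0], -1)
    umap_model = umap.UMAP(densmap=True, n_components=25)
    # Single conversion point keeps the static type checker satisfied about
    # the array type.
    umap_data = jnp.asarray(umap_model.fit_transform(reshaped_data))
    # As requested in the review: construct Data from the converted array.
    data = Data(umap_data)
    return data
```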

benchmark/pounce_benchmark.py: outdated review comment (resolved)
github-actions bot (Contributor) commented:

Performance review

Commit 0a93416 - Merge c88b35d into 594dec1

No significant changes to performance.

qh681248 requested a review from tp832944 February 3, 2025 09:57
@@ -24,13 +24,11 @@ def plot_benchmarking_results(data):
     """
     Visualise the benchmarking results.

-    :param data: A dictionary where the first key is the original sample size
-        and the rest of the keys are the coreset sizes (as strings) and values
+    :param data: A dictionary where keys are the coreset sizes (as strings) and values
Review comment (Contributor):

Need to make same updates (inc to example) in print_metrics_table.
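
A hedged sketch of what carrying the same docstring update (and example) over to print_metrics_table might look like; the function body and the nested dictionary layout are illustrative assumptions, not the project's actual implementation:

```python
def print_metrics_table(data: dict) -> None:
    """
    Print the benchmarking results as a simple table.

    :param data: A dictionary where keys are the coreset sizes (as strings) and
        values are dictionaries mapping metric names to their measured values
    """
    for coreset_size, metrics in data.items():
        # Illustrative formatting only; the real table layout may differ.
        row = ", ".join(f"{name}={value}" for name, value in metrics.items())
        print(f"coreset size {coreset_size}: {row}")
```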

@@ -66,8 +61,7 @@ def plot_benchmarking_results(data):
     for i, metric in enumerate(metrics):
        ax = axs[i]
        ax.set_title(
-            f"{metric.replace('_', ' ').title()} vs "
-            f"Coreset Size (n_samples = {n_samples})",
+            f"{metric.replace('_', ' ').title()} vs Coreset Size (n_samples = {1_000})",
Review comment (Contributor):

Don't hardcode values. If it would be useful to display n_samples, either include it in the results dictionary as metadata (or you could import a constant - probably not such a good idea).
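
One possible shape for the reviewer's suggestion, assuming the benchmark script writes its results to JSON; the key names and helper functions are illustrative, not the project's actual schema:

```python
import json


def save_results(results: dict, n_samples: int, path: str) -> None:
    """Store the per-coreset-size results with the sample size as metadata."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump({"n_samples": n_samples, "results": results}, file, indent=2)


def load_results(path: str) -> tuple[dict, int]:
    """Read the results back and recover n_samples for use in plot titles."""
    with open(path, encoding="utf-8") as file:
        payload = json.load(file)
    return payload["results"], payload["n_samples"]
```

The plotting script could then interpolate the loaded n_samples into the axis title instead of the hardcoded 1_000.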

General musings not requiring any action

I'm still uneasy about having different JSON formats for each benchmarking script.

The format of the JSON could change in future. A useful piece of metadata to add might be a version specifier. We do this for the performance data, although we place it in the file name rather than inside the file. I don't think it's particularly important here as both scripts will be run manually. So long as the code is in sync between the scripts, all is fine.
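
If the version-specifier idea were ever taken up, a minimal sketch might look like the following; the constant name and the schema_version key are assumptions, not an agreed format:

```python
import json

RESULTS_SCHEMA_VERSION = 1  # bump whenever the JSON layout changes


def load_versioned_results(path: str) -> dict:
    """Load benchmark results, refusing formats this script does not understand."""
    with open(path, encoding="utf-8") as file:
        payload = json.load(file)
    version = payload.get("schema_version")
    if version != RESULTS_SCHEMA_VERSION:
        raise ValueError(
            f"Unsupported results format {version!r}; expected {RESULTS_SCHEMA_VERSION}"
        )
    return payload
```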



github-actions bot commented Feb 4, 2025

Performance review

Commit 1b2ba9d - Merge e8a28a6 into bc7cf80

No significant changes to performance.

qh681248 requested a review from tp832944 February 4, 2025 15:40

github-actions bot commented Feb 4, 2025

Performance review

Commit 2d85bdb - Merge 98df94b into bc7cf80

No significant changes to performance.

Development

Successfully merging this pull request may close these issues.

Fix static type checker complaints for benchmarking