feat: "smart" diff #2518

Merged: 9 commits, Feb 12, 2025

@jqnatividad (Collaborator) commented Feb 12, 2025

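if the stats cache is available for both CSVs, diff will use the cache to:
- compare fingerprint hashes, and if identical, short-circuit the diff
- fetch cardinality from stats cache
- fetch rowcount from dataset cache
- if there is a single --key column, validate that its cardinality == rowcount (i.e. ALL values are UNIQUE)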

pivotp and frequency now also get the row count from the dataset stats cache, instead of calculating it.
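
A minimal sketch of the "cached row count instead of calculating it" idea. The function name `row_count_or_scan` and the fallback to a scan when no cached value exists are assumptions made for illustration; they are not taken from qsv's source.

```rust
/// Return the row count, preferring a value already stored in the cache.
///
/// `cached` is the row count read from the dataset stats cache, if any.
/// `scan` is a (potentially expensive) closure that counts rows by reading
/// the file; it only runs when no cached value is available.
/// Both the name and the fallback behavior are hypothetical.
pub fn row_count_or_scan<F>(cached: Option<u64>, scan: F) -> u64
where
    F: FnOnce() -> u64,
{
    cached.unwrap_or_else(scan)
}
```

Taking the scan as a closure keeps the expensive path lazy: when the cache has a value, the file is never read.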

resolves #2493

as we use it to create perfect hash functions for the stats cache
if the stats cache is available for both CSVs, it will use the cache to:
- compare fingerprint hashes, and if identical, short-circuit the diff
- fetch cardinality from stats cache
- fetch rowcount from dataset cache
- if there is a single --key column, validate that its cardinality == rowcount (i.e. ALL values are UNIQUE)
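
The pre-checks listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not qsv's implementation: the `CachedStats` struct, its field names, the `Precheck` enum and the `precheck` function are all hypothetical, and how the real command reacts to a failed key validation is not stated in this PR.

```rust
/// Hypothetical view of the cached metadata for one CSV.
/// Field names are illustrative, not qsv's actual cache schema.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedStats {
    /// Fingerprint hash stored with the stats cache.
    pub fingerprint_hash: String,
    /// Row count taken from the dataset cache.
    pub row_count: u64,
    /// Cardinality (number of distinct values) of the key column.
    pub key_cardinality: u64,
}

/// Outcome of the cache-based pre-checks (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Precheck {
    /// Fingerprints match, so the diff can be short-circuited.
    Identical,
    /// The single key column is not unique on the named side.
    DuplicateKeys { side: &'static str },
    /// No shortcut applies; run the regular diff.
    RunDiff,
}

pub fn precheck(
    left: Option<&CachedStats>,
    right: Option<&CachedStats>,
    single_key: bool,
) -> Precheck {
    // The shortcuts only apply when the cache exists for BOTH inputs.
    let (l, r) = match (left, right) {
        (Some(l), Some(r)) => (l, r),
        _ => return Precheck::RunDiff,
    };

    // Identical fingerprint hashes: nothing to diff.
    if l.fingerprint_hash == r.fingerprint_hash {
        return Precheck::Identical;
    }

    // With a single key column, every value must be unique,
    // i.e. cardinality must equal the row count.
    if single_key {
        for (side, stats) in [("left", l), ("right", r)] {
            if stats.key_cardinality != stats.row_count {
                return Precheck::DuplicateKeys { side };
            }
        }
    }

    Precheck::RunDiff
}
```

The fingerprint comparison runs first because it is the cheapest check and makes the key validation unnecessary when the inputs are identical.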
… function, instead of hardcoded const array
@jqnatividad merged commit 5041318 into master Feb 12, 2025
13 checks passed
@jqnatividad deleted the 2493-smart-diff branch February 12, 2025 23:53