Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend #6619

Merged
merged 38 commits into from
Jan 21, 2025

Conversation

rabbull
Copy link
Contributor

@rabbull rabbull commented Nov 19, 2024

This PR introduces the contains filter operator for the SQLite backend, implemented using custom functions rather than SQLAlchemy's high-level abstractions. Comprehensive tests have been included to cover all potential use cases for contains on the SQLite backend, providing a detailed demonstration of its behavior across various scenarios.

The primary goal of this PR is to replicate the functionality of the contains operator from PostgreSQL on SQLite. However, due to the complex and sometimes unexpected behavior of PostgreSQL's containment operator (#6618), a pure SQL implementation would be highly complex and difficult to maintain. To address this, custom functions were chosen, leveraging the flexibility of high-level programming languages (Python, in this case) to handle complex control flows more effectively.

While invoking functions from other languages introduces potential performance limitations, this trade-off is acceptable compared to the alternative of lacking this functionality entirely. Benchmarks will be added to assess the performance impact and ensure it remains within acceptable bounds.

This implementation serves as a proof of concept to evaluate the feasibility of using custom functions for the contains operator. While functional, the current code structure is still preliminary and requires significant cleanup. If this approach proves successful, a subsequent refactor will address technical debt and improve the maintainability of the codebase. This PR is marked as WIP while the approach is evaluated, benchmarks are added, and the necessary refactor is performed.

Closes #6494

@rabbull rabbull requested a review from GeigerJ2 November 19, 2024 17:35
@rabbull rabbull changed the title [WIP] ORM: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend [WIP] QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend Nov 19, 2024
Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.06%. Comparing base (c88fc05) to head (7617861).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/storage/sqlite_zip/utils.py 91.67% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6619      +/-   ##
==========================================
+ Coverage   78.00%   78.06%   +0.07%     
==========================================
  Files         563      563              
  Lines       41766    41790      +24     
==========================================
+ Hits        32574    32618      +44     
+ Misses       9192     9172      -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe shelve this for now, until we have resolved contains for PSQL? Also, just out of curiosity, any reason for going through the custom implementation, rather than through SQLA, apart from the weird behavior for PSQL?

If we agree that the behavior for PSQL is fine, I'd vouch that we also go the SQLA route here. We just need to make sure the behavior is consistent for both backends, as well as document it somewhere.

src/aiida/storage/sqlite_zip/utils.py Show resolved Hide resolved
@rabbull
Copy link
Contributor Author

rabbull commented Nov 22, 2024

any reason for going through the custom implementation, rather than through SQLA, apart from the weird behavior for PSQL?

The main reason for opting for a custom implementation over SQLA is it's too hard to implement such an operator in SQL, as well as in SQLA, due to the insufficient support of recursive operations and effective branching in SQL. Even if a viable solution could be carried out, it is most likely to be unreadable and unmaintainable.

@rabbull
Copy link
Contributor Author

rabbull commented Nov 26, 2024

Hi @GeigerJ2,

I’ve updated the tests for this feature by combining all tests for JSON filters dedicated on the SQLite backend into those on the PostgreSQL backend, and removed requires_psql marks for related test cases. This migration also helped uncover and fix a few bugs in the implementation of JSON filters for the PostgreSQL backend.

As for the placement of the custom function _json_contains, I believe its current position is acceptable for now. Since everything related to sqlite_dos is currently located in the sqlite_zip package, which doesn't seem a proper structure for me and is planned to be refactored in the near future (#6607), a partial refactor doesn't look worthwhile.

Please kindly review the new modification when you are available.

@rabbull rabbull changed the title [WIP] QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend Nov 26, 2024
@GeigerJ2
Copy link
Contributor

GeigerJ2 commented Dec 5, 2024

Hi @rabbull, will review this PR soon. Sorry for the delay!

@rabbull
Copy link
Contributor Author

rabbull commented Dec 5, 2024

Hi @rabbull, will review this PR today. Sorry for the delay!

No worries. Take your time.

I was just trying to figure out why it is blocked by the codecov/patch. It seems including files that I didn't changed. I assumed it was because my branch is out-of-date so that comparing to the main branch a lot files looks modified, but it seems not the case because it is still blocking even if I merged main branch :(

Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, great work, @rabbull! Let's figure out why codecov fails, and then I'd say we get it merged!

src/aiida/storage/psql_dos/orm/querybuilder/main.py Outdated Show resolved Hide resolved
@rabbull rabbull force-pushed the sqlite-json-contains branch from 322c3d6 to fbb7ee3 Compare December 10, 2024 16:05
@rabbull rabbull self-assigned this Dec 17, 2024
@rabbull
Copy link
Contributor Author

rabbull commented Jan 16, 2025

Hooray! The CI has passed successfully. This PR is now ready for your review and merging. Thanks in advance! cc @GeigerJ2

@GeigerJ2 GeigerJ2 merged commit aa0aa26 into aiidateam:main Jan 21, 2025
10 checks passed
@GeigerJ2
Copy link
Contributor

Thanks for the great work, @rabbull! 🚀

unkcpz pushed a commit to unkcpz/aiida-core that referenced this pull request Jan 23, 2025
…am#6619)

This commit re-introduces the `contains` filter operator for the SQLite backend. The primary goal is to replicate the functionality of `contains` from PostgreSQL to achieve feature parity between the two different storage backends.

Due to the particular (and sometimes unexpected) behavior of PostgreSQL's containment operator (aiidateam#6618), a pure SQL implementation would be highly complex and difficult to maintain. As the behavior of the `contains` operator should be _exactly_ the same, irrespective of which backend is being used, custom Python functions were chosen for the implementation (rather than SQLAlchemy's high-level abstractions).

Comprehensive tests and benchmarks that cover all potential use cases for `contains` on the SQLite backend have been added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QueryBuilder: 'contains' operator not implemented for SQLite storage backends
3 participants