-
Couldn't load subscription status.
- Fork 1
Gfllm search export #215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Gfllm search export #215
Conversation
|
This adds the necessary parts for the GoFlowLLM search export. Built off of the dev branch, which I think needs some synchronisation, but is where this should be going I think. |
|
I think there are a bunch of unrelated changes in there. Is there anyway to focus this down to just the goflowllm bits. That would make it possible to review. |
|
Those unrelated things are from the dev branch being out of sync with main I think. I'll see if I can merge main -> dev and rebase this branch. Only the most recent 3 or so commits here are actually relevant |
Improve the docstring for the r2dt CLI Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Remove unused releases parameter Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Raise value error when no match for release number found Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
f-string is the reccomended way for exceptions, c-style for logging Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for exporting GoFlowLLM annotations in the search export. This includes new SQL queries, Rust data structures, and updates to the Nextflow workflow to process and integrate this new data source. Additionally, the PR contains several useful refactorings and fixes, such as improving the robustness of fetching the latest Ensembl release, making the R2DT sequence selection more explicit, and enhancing the idempotency of database scripts. My review focuses on improving the robustness and style of some of these refactored parts. Overall, the changes are well-structured and enhance the pipeline's functionality and reliability.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds the necessary components to export GoFlowLLM annotations for search. The changes span across SQL, Python, Rust, and Nextflow files to plumb the new data source through the export pipeline. The implementation is mostly sound, but I've identified a few critical issues in the Nextflow workflow that would cause pipeline failures, and a high-severity data contract mismatch between the Rust and Python components that would lead to runtime errors. I've also included some medium-severity suggestions to improve code maintainability and correct documentation.
| path(text) | ||
| path(litsumm) | ||
| path(editing_events) | ||
| path(go_flow_annotations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The go_flow_annotations input is correctly added to this process, but it is not being passed to the search-export sequences merge command within the script block. This omission will cause the pipeline to fail, as the file_joiner in the Rust application expects this file. Please add $go_flow_annotations to the arguments of the search-export command.
| edit_repeat_type, | ||
| keys="editing_events", | ||
| ), | ||
| field("has_go_flow_llm_annotation", has_go_flow_llm_annotation, keys="goflow"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key goflow used here does not match the field name go_flow_llm_annotations in the Normalized Rust struct that generates the JSON data. This will result in a KeyError when the Python script tries to access the data. To ensure consistency with other fields like litsumm and editing_events, I've suggested a change in the Rust code to rename the field to goflow during serialization. Alternatively, you could change this key to go_flow_llm_annotations.
| &self.editing_events | ||
| } | ||
|
|
||
| /// Get a reference to the raw's editing events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds the bits needed to do search export for the GoFlowLLM annotations.