Skip to content

Conversation

@afg1
Copy link
Contributor

@afg1 afg1 commented Oct 13, 2025

Adds the bits needed to do search export for the GoFlowLLM annotations.

@afg1 afg1 marked this pull request as ready for review October 13, 2025 14:12
@afg1 afg1 requested review from blakesweeney and pmb59 October 13, 2025 14:13
@afg1
Copy link
Contributor Author

afg1 commented Oct 13, 2025

This adds the necessary parts for the GoFlowLLM search export. Built off of the dev branch, which I think needs some synchronisation, but is where this should be going I think.

@blakesweeney
Copy link
Member

I think there are a bunch of unrelated changes in there. Is there anyway to focus this down to just the goflowllm bits. That would make it possible to review.

@afg1
Copy link
Contributor Author

afg1 commented Oct 13, 2025

Those unrelated things are from the dev branch being out of sync with main I think. I'll see if I can merge main -> dev and rebase this branch. Only the most recent 3 or so commits here are actually relevant

afg1 and others added 9 commits October 24, 2025 15:48
Improve the docstring for the r2dt CLI

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Remove unused releases parameter

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Raise value error when no match for release number found

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
f-string is the reccomended way for exceptions, c-style for logging

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@afg1 afg1 marked this pull request as draft October 24, 2025 15:16
@afg1
Copy link
Contributor Author

afg1 commented Oct 27, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for exporting GoFlowLLM annotations in the search export. This includes new SQL queries, Rust data structures, and updates to the Nextflow workflow to process and integrate this new data source. Additionally, the PR contains several useful refactorings and fixes, such as improving the robustness of fetching the latest Ensembl release, making the R2DT sequence selection more explicit, and enhancing the idempotency of database scripts. My review focuses on improving the robustness and style of some of these refactored parts. Overall, the changes are well-structured and enhance the pipeline's functionality and reliability.

@afg1
Copy link
Contributor Author

afg1 commented Oct 27, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds the necessary components to export GoFlowLLM annotations for search. The changes span across SQL, Python, Rust, and Nextflow files to plumb the new data source through the export pipeline. The implementation is mostly sound, but I've identified a few critical issues in the Nextflow workflow that would cause pipeline failures, and a high-severity data contract mismatch between the Rust and Python components that would lead to runtime errors. I've also included some medium-severity suggestions to improve code maintainability and correct documentation.

path(text)
path(litsumm)
path(editing_events)
path(go_flow_annotations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The go_flow_annotations input is correctly added to this process, but it is not being passed to the search-export sequences merge command within the script block. This omission will cause the pipeline to fail, as the file_joiner in the Rust application expects this file. Please add $go_flow_annotations to the arguments of the search-export command.

edit_repeat_type,
keys="editing_events",
),
field("has_go_flow_llm_annotation", has_go_flow_llm_annotation, keys="goflow"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The key goflow used here does not match the field name go_flow_llm_annotations in the Normalized Rust struct that generates the JSON data. This will result in a KeyError when the Python script tries to access the data. To ensure consistency with other fields like litsumm and editing_events, I've suggested a change in the Rust code to rename the field to goflow during serialization. Alternatively, you could change this key to go_flow_llm_annotations.

&self.editing_events
}

/// Get a reference to the raw's editing events.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This documentation comment appears to be a copy-paste from the editing_events function. It should be updated to describe the go_flow_llm_annotations function.

Suggested change
/// Get a reference to the raw's editing events.
/// Get a reference to the raw's go flow llm annotations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants