Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: remove the Rust dependency #164

Merged
merged 5 commits into from
Feb 8, 2024
Merged

Conversation

misialq
Copy link
Collaborator

@misialq misialq commented Jan 10, 2024

This PR removes the fastq_writer to get rid of the Rust dependency for our conda builds.

@misialq
Copy link
Collaborator Author

misialq commented Jan 10, 2024

Hey @ebolyen, hi @lizgehret, is this what you had in mind or can we do it better? 😆

Copy link

codecov bot commented Jan 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b418f80) 98.74% compared to head (ab2716f) 98.73%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
- Coverage   98.74%   98.73%   -0.01%     
==========================================
  Files          29       29              
  Lines        3031     3012      -19     
==========================================
- Hits         2993     2974      -19     
  Misses         38       38              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@lizgehret lizgehret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On first glance, this all seems reasonable to me @misialq!

EDIT: That being said, it might be worth adding a couple of unit tests just to make sure we're getting the same output as before.

q2_fondue/sequences.py Outdated Show resolved Hide resolved
@misialq
Copy link
Collaborator Author

misialq commented Jan 16, 2024

Hey @ebolyen, thanks for your suggestions! I did try to change the code accordingly, but something strange is happening here... When I compare the file before and after compression (by "eye" and programmatically) they look identical. And yet QIIME keeps complaining that something is not quite right with the new one - here is the message I get when one of the tests fails:

Quality score length doesn't match sequence length for record beginning on line 13

I get a similar one when I just try to import that file into an artifact from the CLI. Is there anything obvious I'm missing here? I have not yet looked at what the validator does exactly...

@ebolyen
Copy link
Collaborator

ebolyen commented Jan 18, 2024

Hi @misialq,

This code is correct, although there's a tiny strangeness to either the test-file or our validator, depending on perspective.

The original code did newline processing, and so would politely terminate the file with a final newline (you can run cat on the test file to see your shell bleed onto the last line).

Our validator is comparing line length, and includes the newline in its count. So:

ACGT\n  # this is 5 chars
85)9   # no trailing newline on the last line, so 4 chars

The super easy fix is to just add one more line to your test file. I don't think SRA will give you a non-newline terminated file, so I think just adding the newline to the test file is the way to go here.

The less easy fix is to update the validator to ignore newlines without adding too much more processing time (so maybe end-of-file only).

Alternatively we update this code to tack on a final newline if it's missing pre-gzip, but I don't think there's any nice way to do that.

@misialq misialq marked this pull request as ready for review January 19, 2024 15:24
@misialq
Copy link
Collaborator Author

misialq commented Jan 19, 2024

Hey @ebolyen, I knew your keen eye would spot it! Thanks! That makes a lot of sense. I decided to go for the first and easiest solution, simply because (as you pointed out) the sequences fetched from SRA do have those newlines. Let me know whether that works for you and whether there are any other changes you'd like to see here. Otherwise, I'd be ready to merge. Thanks a lot!

Copy link
Collaborator

@ebolyen ebolyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this took me a minute to get back to! I've been swamped. This looks great, and we should be able to get it into our distros now :)

@ebolyen ebolyen merged commit 90e9d6d into bokulich-lab:main Feb 8, 2024
7 checks passed
@misialq misialq deleted the no-more-rust branch February 21, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants