Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow fasta_digest to accept Path or bytes #93

Open
nsheff opened this issue Feb 19, 2025 · 2 comments · Fixed by #98
Open

Allow fasta_digest to accept Path or bytes #93

nsheff opened this issue Feb 19, 2025 · 2 comments · Fixed by #98

Comments

@nsheff
Copy link
Member

nsheff commented Feb 19, 2025

Right now digest_fasta strictly requires a string.

works:

from gtars.digests import digest_fasta
path_to_fasta_file = "../gtars/gtars/tests/data/base.fa"
df = digest_fasta(path_to_fasta_file)

Doesn't work:

from pathlib import Path
p = Path(path_to_fasta_file)
digest_fasta(p)
# TypeError: argument 'fasta': 'PosixPath' object cannot be converted to 'PyString'

Also fails:

digest_fasta(path_to_fasta_file.encode('utf-8'))
TypeError: argument 'fasta': 'bytes' object cannot be converted to 'PyString'

Would it be easy/possible/advisable to make this friendly enough to convert string-like inputs to string automatically?

Raised by @stolarczyk

@nleroy917
Copy link
Member

PyO3 has a lot of native mappings from Python types to Rust types. If you want to accept Path objects and str objects from Python into Rust, I think the function argument needs to be of type PathBuf. Although, if I am reading their chart correctly, PyString should be sufficient. You currently have it at &str so maybe that's the issue.

that said... I wonder if we could leverage the .extract() API from pyo3 and do a nested if let Ok(...) pattern to accept multiple types.

Something like:

#[pyfunction]
pub fn digest_fasta(fasta: PyObject) -> PyResult<Vec<PyDigestResult>> {
  
  let fasta_file;
  if let Ok(fasta) = fasta.extract::<PyString>() {
    fasta_file = fasta;
  } else if let Ok(fasta) = fasta.extract::<PathBuf>() {
     fasta_file = fasta;
  } else {
    return Err(PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("Must be of type String or Path"))
  }

    match gtars::digests::digest_fasta(fasta) {
        Ok(digest_results) => {
            let py_digest_results: Vec<PyDigestResult> = digest_results.into_iter().map(PyDigestResult::from).collect();
            Ok(py_digest_results)
        },
        Err(e) => Err(PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("Error processing FASTA file: {}", e))),
    }
}

@nsheff
Copy link
Member Author

nsheff commented Feb 20, 2025

I went a different direction in the end.

The .extract works, but this copies the python string into a rust object. so you're duplicating data. this is very bad news for something when you're copying a huge string you're trying to digest (like a chromosome).

Instead I can use .downsample with PyAny. There's also a new way in PyO3 recommend for using PyAny, where you use a Bound type. ANyway, I got this all working and you can see this in my PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants