Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows for file-like objects to be passed to read_molecule_file #84

Merged
merged 12 commits into from
Jul 16, 2020

Conversation

IAlibay
Copy link
Contributor

@IAlibay IAlibay commented Jul 14, 2020

Fixes #83

Changes:

  • open_file_for_reading: fseek -> seek
  • read_molecule_file: adds an extra argument filename which can be used to identify the file type when passing a file-like object.
  • A version of the basic tests that passes a file object (:class:_io.TextIOWrapper) instead of the actual file path (this might be excessive / in the wrong place, but I thought I'd add them in anyways as a "proof of concept").

Question:

  • Should the extra attribute checks for the file-like objects removed in De-lint PROPKA #40 be added back?

@codecov
Copy link

codecov bot commented Jul 14, 2020

Codecov Report

Merging #84 into master will increase coverage by 0.16%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   70.24%   70.41%   +0.16%     
==========================================
  Files          25       25              
  Lines        4137     4140       +3     
==========================================
+ Hits         2906     2915       +9     
+ Misses       1231     1225       -6     
Impacted Files Coverage Δ
propka/input.py 62.87% <100.00%> (+4.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01debbf...81e51c3. Read the comment docs.

Copy link
Collaborator

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good. Maybe a few more docs and an explicit test with StringIO?

propka/input.py Outdated Show resolved Hide resolved
@@ -152,3 +152,66 @@ def test_regression(pdb, options, tmp_path):
run_propka(options, pdb_path, tmp_path)
if ref_path is not None:
compare_output(pdb, tmp_path, ref_path)


def run_propka_stream(options, input_file, tmp_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also work with StringIO buffers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :) I've moved everything over to a separate test_streamio.py file (and added an init.py), it was getting a tad bit crowded for a file that was really meant to test basic regression. I've also:
a) tests both TextIO and StringIO inputs.
b) reduced the number of test cases (all that needed to be checked was if it gave the same results and also allowed arguments).
c) switched the usage of tmp_dir over to pytest's tmpdir, which seems a lot cleaner to use.
d) added a test to capture the TypeError that should be thrown if you don't pass a value to filename in read_molecule_file.

propka/input.py Outdated Show resolved Hide resolved
@orbeckst
Copy link
Collaborator

Should the extra attribute checks for the file-like objects removed in #40 be added back?

I don't remember why I added all of them way-back-when – probably related to MDAnalysis.lib.util.isstream. It is possible that I wasn't quite sure what was needed to ultimately make it work. If your streamlined version does the job then I think it's fine. I would make sure the StringIO works.

(I wouldn't bother with streams from the network (via urllib2 or request) although it could be fun :-).)

propka/input.py Outdated Show resolved Hide resolved
@IAlibay
Copy link
Contributor Author

IAlibay commented Jul 15, 2020

Should the extra attribute checks for the file-like objects removed in #40 be added back?

I don't remember why I added all of them way-back-when – probably related to MDAnalysis.lib.util.isstream. It is possible that I wasn't quite sure what was needed to ultimately make it work. If your streamlined version does the job then I think it's fine. I would make sure the StringIO works.

(I wouldn't bother with streams from the network (via urllib2 or request) although it could be fun :-).)

As it is now, it's probably not the safest way to ensure the object will be read, but it should work for normal use cases.

In some cases e.g. BufferedIOBase derived objects, might not work (i.e. if object.seekable() returns False). Although I'm not really sure of a case where this would be useful / worth handling.

@orbeckst
Copy link
Collaborator

Very nice, thank you @IAlibay .

@sobolevnrm / @speleo3 do you want to have a quick look at the PR? (Disclosure: I am biased, I want the feature to work again as soon as possible...)

@speleo3
Copy link
Collaborator

speleo3 commented Jul 15, 2020

Looks pretty good to me, but I have one comment: I think the argument logic of read_molecule_file can be streamlined. Instead of having the filename either in the first or the third argument, and allowing the first argument to be either a path or a stream, I suggest this API:

def read_molecule_file(filename: str, mol_container, stream=None):
    """
    Args:
        filename (str): input file name
        mol_container:  MolecularContainer object
        stream: optional input file handle. If None, then open `filename` as a local file for reading.
    """

What do you think?

@sobolevnrm
Copy link
Collaborator

Very nice, thank you @IAlibay .

@sobolevnrm / @speleo3 do you want to have a quick look at the PR? (Disclosure: I am biased, I want the feature to work again as soon as possible...)

I probably won't be able to look at it until tonight. However, I'm confident in the assessment of you and @speleo3. Thanks!

@IAlibay
Copy link
Contributor Author

IAlibay commented Jul 15, 2020

Looks pretty good to me, but I have one comment: I think the argument logic of read_molecule_file can be streamlined. Instead of having the filename either in the first or the third argument, and allowing the first argument to be either a path or a stream, I suggest this API:

def read_molecule_file(filename: str, mol_container, stream=None):
    """
    Args:
        filename (str): input file name
        mol_container:  MolecularContainer object
        stream: optional input file handle. If None, then open `filename` as a local file for reading.
    """

What do you think?

That is a lot cleaner :) It makes the try/except block a lot nicer.

@speleo3 speleo3 self-requested a review July 15, 2020 17:23
Copy link
Collaborator

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor questions re: tests – that's all.

I'll let it sit for a day in case @sobolevnrm wants to have a look or merge it; otherwise I'll merge sometime later tomorrow.

tests/test_streamio.py Show resolved Hide resolved
tests/test_streamio.py Show resolved Hide resolved
@sobolevnrm
Copy link
Collaborator

This looks awesome -- thank you, all!

@sobolevnrm sobolevnrm merged commit d16257a into jensengroup:master Jul 16, 2020
@IAlibay IAlibay deleted the streams branch July 16, 2020 01:29
@IAlibay IAlibay mentioned this pull request Jul 16, 2020
orbeckst pushed a commit that referenced this pull request Jul 17, 2020
* Fixes #82
* Changes:
    * input arguments now reflects the changes made to read_molecule_file in #84
    * Writing of pKa file is now optional (default behaviour has been kept). This will be particularly 
       useful downstream where we would just want to have access to the MoleculeContainer object.
     * new test_run file specific for testing run.
* add tests
* add docs
@orbeckst orbeckst mentioned this pull request Jul 17, 2020
2 tasks
orbeckst added a commit to Becksteinlab/propka that referenced this pull request Jul 18, 2020
- docs use versioneer-based propka.__version__
- also added @IAlibay to authors (forgotten in previous
  PRs jensengroup#84 and jensengroup#85)
- generate a sitemap (add sphinx_sitemap to requirements.txt)
orbeckst added a commit that referenced this pull request Jul 18, 2020
- fix #87
- user versioneer for version management
   - use tag "vMAJOR.MINOR.PATCH" to indicate release number
   - exclude generated files from coverage;
      also exclude tests from coverage reporting;
      allow use of "# pragma: no cover" to exclude lines of code
      from coverage
   - configure coverage with entries in setup.cfg (removed
      commandline config from workflows/python-package.yml)
- related doc updates
   - make docs automatically use current version
      (docs use versioneer-based propka.__version__)
   - also added @IAlibay to authors (forgotten in previous
      PRs #84 and #85)
   - generate a sitemap (add sphinx_sitemap to requirements.txt)
- add test_version
   Note: Versioneer-generated version is “0-untagged” on the branch where
   it is tested so need to add it to a valid result.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing a stream no longer works
4 participants