Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stdin bug #158

Merged
merged 8 commits into from
Mar 28, 2024
Merged

Fix stdin bug #158

merged 8 commits into from
Mar 28, 2024

Conversation

rhpvorderman
Copy link
Collaborator

@rhpvorderman rhpvorderman commented Mar 27, 2024

Fixes #157

I also added a whole lot of tests to ensure piped reading never breaks again! (Hopefully)

@rhpvorderman rhpvorderman requested a review from marcelm March 27, 2024 16:07
Copy link
Collaborator

@marcelm marcelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not so sure about this fix. Explicitly checking all the cases in which we should not re-open the file in _file_is_socket_or_pipe feels fragile. Is it correct that the two open() calls occur in _open_stdin_or_out? Then an alternative solution would be to just not use open() there, but to just return the already open sys.stdout.buffer or sys.stdin.buffer. Would that work?

@rhpvorderman
Copy link
Collaborator Author

I agree. When allowing file-like objects, my first design was to convert everything into a binary stream, then work from there.
Unfortunately, this does not work properly with context managers. gzip.open, bz2.open etc all return an object that does not close the file, when the paramater passed is a file-like object rather than a path. Hence I need to pass a path to those functions if that was provided in the xopen.xopen() function.

Multiple calls to the _open_as_binary_stream function solved this, but caused this problem.

However I think I might alleviate this by using a cache. That would be much less flaky than this current solution.

@rhpvorderman
Copy link
Collaborator Author

rhpvorderman commented Mar 28, 2024

A cache makes the code more messy. Pipes that are passed as files on linux (such as "/dev/stdin") need to be treated differently than regular files. The only way to do so is to have the stat code.
The stat.S_ISREG was used for this in xopen 1.9.0. Format detection was simply skipped for pipes. This was not fragile in that it only allowed regular files. (Are symlinks regular files?)

By doing it this way the format detection can be kept. I agree that it feels fragile in that not all edge cases are covered, but on the other hand checking for ports, sockets and pipes should be pretty much all the use cases.

EDIT: I added an explicit "/dev/stdin" test first, then started to mess around to see if there was a better way, but that kept breaking the /dev/stdin test, so I have given up on finding a better way :-).

@rhpvorderman
Copy link
Collaborator Author

I have replaced all the stat.S_ISFIFO checks with a not stat.S_ISREG check. Symbolic links that point to regular files are also ISREG, so there is no problem there. This check has been in xopen for a very long time, so I trust this will work well.
I think with all the added tests in this PR we can be quite confident that the problem should be fixed long-term.

Copy link
Collaborator

@marcelm marcelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome with the extensive tests!

@rhpvorderman rhpvorderman merged commit 3306269 into main Mar 28, 2024
18 of 19 checks passed
@rhpvorderman rhpvorderman deleted the fixbug branch March 28, 2024 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xopen 2.0.0 does not read stdin fully
2 participants