xopen.xopen(zst_file): results in OSError: b'/*stdin*\\ : Read error (39) : premature end \n' (exit code 1) #160

Closed
gcflymoto opened this issue Jun 4, 2024 · 3 comments

Comments

@gcflymoto

Hi, with xopen 2.0.1 and zstandard 0.22.0, I sometimes see the following error when opening a .zst file:

with xopen.xopen(file_path) as fp:
  File "/usr/pkgs/python3/3.11.1/modules/r2/lib/python3.11/site-packages/xopen/__init__.py", line 352, in close
    self._raise_if_error(check_allowed_code_and_message, stderr_message)
  File "/usr/pkgs/python3/3.11.1/modules/r2/lib/python3.11/site-packages/xopen/__init__.py", line 413, in _raise_if_error
    raise OSError(f"{stderr_message!r} (exit code {retcode})")
OSError: b'/*stdin*\\ : Read error (39) : premature end \n' (exit code 1)

Do you know how I can debug this? I modified the invocation of the script that calls xopen to pipe in /dev/null as stdin, which seems to bypass the error.
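
For reference, a minimal sketch of the call pattern involved (the path and the per-line handling are illustrative, not the actual script):

import xopen

file_path = "data/reads.txt.zst"  # hypothetical path to a zstd-compressed file

# The OSError above is raised from close(), i.e. when the "with" block is left.
with xopen.xopen(file_path) as fp:
    for line in fp:
        print(len(line))  # placeholder for the real per-line processing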

@marcelm (Collaborator) commented Jun 4, 2024

Thanks! This can reliably be triggered with a compressed file that is larger than 128 KiB (131072 bytes).

As a workaround, you can add threads=0 to your xopen.xopen invocation for now.
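
For example, a minimal sketch of the workaround (the file name is illustrative):

import xopen

# threads=0 makes xopen decompress with the zstandard Python module instead
# of piping the data through an external zstd process, which avoids the
# premature-end error described above.
with xopen.xopen("largefile.txt.zst", threads=0) as fp:
    data = fp.read(1000)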

marcelm added a commit that referenced this issue Jun 4, 2024
Previously, only the uncompressed data was "large", but issue #160 is only
triggered if the *compressed* data is large (in this case, larger than
128 kB), which apparently exceeds some input buffer.

See #160
marcelm added a commit that referenced this issue Jun 4, 2024
There was already a test for this, but only the uncompressed data was
"large". Issue #160 is only triggered if the *compressed* data is
large (in this case, larger than 128 kB), which apparently exceeds
some input buffer.

See #160
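
A hedged sketch of what such a test could look like (not the actual test from the repository; it assumes pytest's tmp_path fixture and that .zst files can be written):

import os
import xopen

def test_partial_read_of_file_with_large_compressed_size(tmp_path):
    path = tmp_path / "large.zst"
    # Random bytes are essentially incompressible, so the *compressed*
    # file also ends up well above 128 kB.
    with xopen.xopen(path, "wb") as f:
        f.write(os.urandom(512 * 1024))
    # Read only a small part, then close before reaching the end of the
    # input; before the fix this raised OSError ("premature end").
    with xopen.xopen(path, "rb") as f:
        f.read(1000)
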
@marcelm (Collaborator) commented Jun 4, 2024

What we do in xopen is equivalent to something like this:

$ head -c 1000 largefile.txt.zst | zstd -dc -
/*stdin*\ : Read error (39) : premature end

That is, we stop sending input data to the zstd process, which it detects and complains about.

Here is what appears to happen:

  1. xopen.xopen() instantiates a _PipedCompressionProgram
    • which spawns an external zstd process for decompression
    • to supply the zstd process with data to decompress, a "feeder" thread is started that repeatedly reads 128 KiB chunks from the input file and sends them to the stdin of the process
  2. No data, or only some of the data, is read by the calling program
  3. _PipedCompressionProgram.close() is called before the end of the input file has been reached:
    • The feeder thread is told to stop.
    • The feeder thread closes its output, that is, the stdin of the zstd process.
    • This happens while zstd is in the middle of decoding compressed data.
    • zstd complains as seen above (see the sketch after this list).
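
To make this concrete without going through xopen, here is a standalone sketch of the same failure mode using subprocess directly (it assumes a zstd binary on PATH and a compressed file larger than 128 KiB; this is not xopen's actual feeder code):

import subprocess

proc = subprocess.Popen(
    ["zstd", "-dc", "-"],
    stdin=subprocess.PIPE,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.PIPE,
)
# Send a single 128 KiB chunk, then close stdin mid-stream, just like the
# feeder thread does when close() is called before the input is exhausted.
with open("largefile.txt.zst", "rb") as f:
    proc.stdin.write(f.read(128 * 1024))
proc.stdin.close()
print(proc.stderr.read().decode(), proc.wait())
# Expected: "/*stdin*\ : Read error (39) : premature end" and exit code 1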

@marcelm (Collaborator) commented Jun 20, 2024

Fix merged in #161, so I think this can be closed now. Thanks again for reporting!

marcelm closed this as completed on Jun 20, 2024