no `flush()` on `PipedGzipWriter` #8

pohutukawa · 2018-05-08T02:28:10Z

It looks like the file-like object returned by xopen() for gzip-compressed files lacks a flush() method.

AttributeError: 'PipedGzipWriter' object has no attribute 'flush'

This would be very helpful to have, as well as easy to implement, as gzip.open's file-like object supports the flush() operation.
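
For reference, here is a minimal sketch of the behaviour being asked for, using only the standard-library gzip module rather than xopen. It shows that after flush() the data written so far can be recovered from the file on disk while the writer is still open. The file name and the helper `read_partial_gzip` are made up for this illustration.

```python
import gzip
import os
import tempfile
import zlib


def read_partial_gzip(path):
    """Decompress whatever gzip data is on disk, even if the stream is unfinished.

    Hypothetical helper for this illustration, not part of xopen or gzip.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # MAX_WBITS | 16 tells zlib to expect a gzip header.
    return zlib.decompressobj(zlib.MAX_WBITS | 16).decompress(raw)


path = os.path.join(tempfile.mkdtemp(), "example.gz")  # made-up file name
writer = gzip.open(path, "wb")
writer.write(b"first record\n")
writer.flush()  # pushes the compressed data written so far out to the file

# The writer is still open, but the flushed data is already recoverable.
visible_after_flush = read_partial_gzip(path)

writer.write(b"second record\n")
writer.close()

with gzip.open(path, "rb") as reader:
    final_content = reader.read()
```

This is the property that is hard to get from an external gzip or pigz process, since the pipe offers no way to request such a flush.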

marcelm · 2018-05-08T09:09:15Z

It’s actually not that easy. xopen doesn’t use gzip.open if it can avoid it, but runs an external gzip or pigz process to get better speed. I would need a way to tell the running gzip/pigz to flush its currently processed block(s) to disk.

The only method I can think of would be to close the output file (which will flush everything implicitly) and then to re-open it in append mode. This would work since gzip files are allowed to be concatenated, but the problem is that the resulting file will not be the same as it would be if flush() had not been called.

At the moment, I’m inclined to just leave it as it is. I’d accept a PR, but then the method would probably need to be called reopen() instead of flush(). I’m open to discussion, though.
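
The close-and-reopen idea can be sketched with the standard-library gzip module standing in for the external process. This is only an illustration under that assumption, not xopen's implementation. The class `ReopeningGzipWriter` and the file names are hypothetical.

```python
import gzip
import os
import tempfile


class ReopeningGzipWriter:
    """Hypothetical writer whose reopen() finishes the current gzip member
    and starts a new one by reopening the file in append mode."""

    def __init__(self, path):
        self._path = path
        self._file = gzip.open(path, "wb")

    def write(self, data):
        return self._file.write(data)

    def reopen(self):
        # Closing writes the gzip trailer, so everything so far is on disk.
        self._file.close()
        # Append mode adds a new gzip member after the existing one.
        self._file = gzip.open(self._path, "ab")

    def close(self):
        self._file.close()


# Same file name in two directories, so the gzip headers store the same name.
plain_path = os.path.join(tempfile.mkdtemp(), "data.gz")
reopened_path = os.path.join(tempfile.mkdtemp(), "data.gz")

with gzip.open(plain_path, "wb") as f:
    f.write(b"first line\n")
    f.write(b"second line\n")

writer = ReopeningGzipWriter(reopened_path)
writer.write(b"first line\n")
writer.reopen()
writer.write(b"second line\n")
writer.close()

with gzip.open(plain_path, "rb") as f:
    plain_content = f.read()
with gzip.open(reopened_path, "rb") as f:
    reopened_content = f.read()

plain_size = os.path.getsize(plain_path)
reopened_size = os.path.getsize(reopened_path)
```

Both files decompress to the same content, but the reopened file holds two gzip members, each with its own header and trailer, so the files on disk are not the same.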

pohutukawa · 2018-05-08T21:15:57Z

Thanks for the context. I wasn't aware of the details behind it and just (blindly) assumed that Python's core gzip module was at work in the background, with xopen just abstracting its usage.
It's obvious the files won't be the same, as compression is in the mix, and (smaller) increments won't compress nearly as well as larger batches. In my own experience, a decompress/compress cycle on such files yields a significant improvement.
For my current use case I need the option to flush, but as file I/O is not the bottleneck, I don't need the benefit of the faster pigz process, so I'll go with the vanilla gzip module for now (the benefit over uncompressed file output is still large enough to take this path).
I agree that under these circumstances reopen() is probably the better name, with the documentation noting that it can act as a flush() workaround. As I have no immediate need for it and only limited time available, I won't be able to provide a PR for this, though. Sorry ... :-(
BTW, nonetheless many thanks for providing this module in the first place. I've learned to like and use it more and more.
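
The point about small increments compressing worse than larger batches can be checked with a short sketch. It uses only the standard-library gzip module, and the sample records are invented for the illustration.

```python
import gzip

# Invented sample data: many short, similar records.
lines = [f"record {i}: some repetitive payload\n".encode() for i in range(500)]

# One gzip member per line, as if the file had been closed and reopened
# after every single write.
many_members = b"".join(gzip.compress(line) for line in lines)

# A decompress/compress cycle turns them into a single member.
content = gzip.decompress(many_members)
single_member = gzip.compress(content)

print(len(many_members), len(single_member))
```

Each member carries its own header and trailer and cannot reuse patterns from earlier members, which is why the recompressed single-member version comes out smaller.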

marcelm · 2018-05-09T08:50:59Z

Thanks a lot for taking the time to write a detailed reply! I really appreciate learning which use cases exist.

I think one other option is to allow specifying that xopen should not use a pipe to an external program. I think this would be good to have anyway. The code already exists (as a fallback when gzip isn’t available); it would just need to be exposed. Let’s leave this issue open until someone finds the time to implement it.

rhpvorderman · 2023-10-18T11:22:42Z

> I think one other option is to allow specifying that xopen should not use a pipe to an external program. I think this would be good to have anyway.

As far as I know we have solved this issue. Using threads=0 will always open in the main thread with open, gzip.open, lzma.open etc.

Furthermore, the threaded option included in #131 also supports the flush method, so eventually it will become available for gzip threads as well.

rhpvorderman · 2024-02-16T10:55:49Z

Currently flush does nothing:

xopen/src/xopen/__init__.py

Line 398 in d98ee23

def flush(self) -> None:

What can be done for writing is to send EOF to the program, which will then terminate the compression block. The file can then be opened again in append mode and writing can resume. This only works for formats that support concatenated blocks, though, such as gzip. I think xz, zst and bzip2 also support those, but that would need more investigation.

rhpvorderman · 2024-09-25T07:22:15Z

xz, zst and bzip2 also support multiple compressed members in one file.
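
A quick way to check the multi-member claim for the formats that Python's standard library covers is sketched below. It concatenates two independently finished streams and decompresses the result in one call. zst is not covered here, and the sample payloads are invented.

```python
import bz2
import gzip
import lzma

first = b"written before the reopen\n"
second = b"written after the reopen\n"

codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    # Two independently finished streams, glued together byte for byte.
    concatenated = compress(first) + compress(second)
    results[name] = decompress(concatenated) == first + second

print(results)
```

This only exercises the Python modules. Whether the command-line tools that xopen pipes to behave the same way when a file is reopened in append mode would still need to be checked separately.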

marcelm added a commit that referenced this issue May 8, 2018

Try to implement PipedGzipWriter.flush()

d7b56e3

This doesn’t work. See #8
