Clean up README slightly and document the xopen function parameters
marcelm committed Jan 15, 2024
1 parent ee5b270 commit bf347b1
Showing 1 changed file with 100 additions and 31 deletions.
131 changes: 100 additions & 31 deletions README.rst
@@ -17,43 +17,22 @@
xopen
=====

This Python module provides an ``xopen`` function that works like the
This Python module provides an ``xopen`` function that works like Python’s
built-in ``open`` function but also transparently deals with compressed files.
Supported compression formats are currently gzip, bzip2, xz and optionally Zstandard.

``xopen`` selects the most efficient method for reading or writing a compressed file.
For gzip files this means falling back on the threaded methods of the
``python-isal`` library if supported. Alternatively a pipe can be opened to
an external tool, such as `pigz <https://zlib.net/pigz/>`_, which is a parallel
version of ``gzip``.

If ``threads=0`` is passed to ``xopen()``, no external process is used.
For gzip files, this will then use `python-isal
<https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
it is installed (since ``python-isal`` is a dependency of ``xopen``,
this should always be the case).
``python-isal`` does not support compression levels
greater than 3, so if no external tool is available or ``threads`` has been set to 0,
Python’s built-in ``gzip.open`` is used.

For xz files, a pipe to the ``xz`` program is used because it has built-in support for multithreaded compression.

For bz2 files, `pbzip2 (parallel bzip2) <http://compression.ca/pbzip2/>`_ is used.
Supported compression formats are:

``xopen`` falls back to Python’s built-in functions
(``gzip.open``, ``lzma.open``, ``bz2.open``)
if none of the other methods can be used.

The file format to use is determined from the file name if the extension is recognized
(``.gz``, ``.bz2``, ``.xz`` or ``.zst``).
When reading a file without a recognized file extension, xopen attempts to detect the format
by reading the first couple of bytes from the file.
- gzip (``.gz``)
- bzip2 (``.bz2``)
- xz (``.xz``)
- Zstandard (``.zst``) (optional)

``xopen`` is compatible with Python versions 3.8 and later.


Usage
-----
Example usage
-------------

Open a file for reading::

@@ -72,6 +51,96 @@ and avoid using an external process::
f.write(b"Hello")


The ``xopen`` function
----------------------

The ``xopen`` module offers a single function named ``xopen`` with the following
signature::

    xopen(
        filename: str | bytes | os.PathLike,
        mode: Literal["r", "w", "a", "rt", "rb", "wt", "wb", "at", "ab"] = "r",
        compresslevel: Optional[int] = None,
        threads: Optional[int] = None,
        *,
        encoding: str = "utf-8",
        errors: Optional[str] = None,
        newline: Optional[str] = None,
        format: Optional[str] = None,
    ) -> IO

The function opens the file using a function suitable for the detected
file format and returns an open file-like object.

When writing, the file format is chosen based on the file name extension:
``.gz``, ``.bz2``, ``.xz``, ``.zst``. This can be overridden with ``format``.
If the extension is not recognized, no compression is used.

When reading, the format is detected from the file name extension
if one is available.
Otherwise, the format is detected from the file contents.
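
The detection rules above can be sketched with the standard library alone.
This is only an illustration, not xopen's actual implementation: the helper
name ``guess_format`` and its ``reading`` flag are hypothetical, and the
content check assumes the usual magic numbers that begin gzip, bzip2, xz and
Zstandard files.

```python
import os

# Extensions named in this README, mapped to the values accepted by ``format``.
EXTENSIONS = {".gz": "gz", ".bz2": "bz2", ".xz": "xz", ".zst": "zst"}

# Magic numbers found at the start of each compressed file format.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zst",
}


def guess_format(path, reading=True):
    """Hypothetical helper: return "gz", "bz2", "xz", "zst" or None."""
    # The file name extension is consulted first, for reading and writing.
    extension = os.path.splitext(os.fsdecode(path))[1]
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]
    if not reading:
        # Writing with an unrecognized extension: no compression.
        return None
    # Reading without a recognized extension: inspect the first bytes.
    with open(path, "rb") as f:
        head = f.read(6)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return None
```

For example, ``guess_format("reads.fastq.gz")`` returns ``"gz"`` without
touching the file, while a gzip-compressed file named ``reads`` is only
recognized by its first two bytes.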

Parameters
~~~~~~~~~~

**filename** (str, bytes, or `os.PathLike <https://docs.python.org/3/library/os.html#os.PathLike>`_):
    Name of the file to open.

    If set to ``"-"``, standard output (in mode ``"w"``) or
    standard input (in mode ``"r"``) is returned.

**mode**, **encoding**, **errors**, **newline**:
    These parameters have the same meaning as in Python’s built-in
    `open function <https://docs.python.org/3/library/functions.html#open>`_
    except that the default encoding is always UTF-8 instead of the
    preferred locale encoding.
    ``encoding``, ``errors`` and ``newline`` are only used when opening a file in text mode.

**compresslevel**:
    The compression level for writing to gzip, xz and Zstandard files.
    If set to None, a default depending on the format is used:
    gzip: 6, xz: 6, Zstandard: 3.

    This parameter is ignored for other compression formats.

**format**:
    Override the autodetection of the input or output format.
    Possible values are: ``"gz"``, ``"xz"``, ``"bz2"``, ``"zst"``.

**threads**:
    If multi-threaded compression or decompression is available,
    this parameter can be used to override the number of threads
    used. It is ignored otherwise.

    Set threads to 0 to force opening the file without using a subprocess.
    For some compression levels,
    compressed files are by default read or written
    using a pipe to a subprocess running an external tool such as
    ``pbzip2`` or ``xz``.
    With *threads* set to 0, a normal function call is used instead.


Backends
--------

Opening of gzip files is delegated to one of these programs or libraries:

* `python-isal <https://github.com/pycompression/python-isal>`_.
  Supports multiple threads and compression levels up to 3.
* zlib-ng
* `pigz <https://zlib.net/pigz/>`_ (a parallel version of ``gzip``)

For xz files, a pipe to the ``xz`` program is used because it has
built-in support for multithreaded compression.

For bz2 files, `pbzip2 (parallel bzip2) <http://compression.ca/pbzip2/>`_ is used.

``xopen`` falls back to Python’s built-in functions
(``gzip.open``, ``lzma.open``, ``bz2.open``)
if none of the other methods can be used.
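
As a rough illustration of that fallback, the sketch below dispatches to the
three standard-library functions named above. It is not xopen's own code: the
helper ``open_with_stdlib`` and its ``fmt`` parameter are hypothetical, and
Zstandard is left out because this sketch only covers the three functions
listed here.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Standard-library openers named in this README as the fallback.
FALLBACK_OPENERS = {"gz": gzip.open, "xz": lzma.open, "bz2": bz2.open}


def open_with_stdlib(path, mode="r", fmt=None, encoding="utf-8"):
    """Hypothetical helper: open ``path`` in-process, without a subprocess."""
    # Unknown or missing format: treat the file as uncompressed.
    opener = FALLBACK_OPENERS.get(fmt, open)
    if "b" in mode:
        return opener(path, mode)
    # gzip.open, lzma.open and bz2.open treat a bare "r", "w" or "a" as
    # binary, so text mode has to be requested explicitly.
    text_mode = mode if "t" in mode else mode + "t"
    return opener(path, text_mode, encoding=encoding)


# Round trip through a temporary xz file.
with tempfile.TemporaryDirectory() as tmpdir:
    example_path = os.path.join(tmpdir, "example.txt.xz")
    with open_with_stdlib(example_path, "w", fmt="xz") as f:
        f.write("Hello\n")
    with open_with_stdlib(example_path, "r", fmt="xz") as f:
        round_trip = f.read()
```

Like ``xopen`` itself, the sketch treats a bare ``"r"`` or ``"w"`` as text
mode with UTF-8 encoding, which differs from the binary default of
``gzip.open``, ``lzma.open`` and ``bz2.open``.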


Reproducibility
---------------

@@ -270,7 +339,7 @@ Credits
-------

The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of
`utils.h file that is part of
BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.

Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
@@ -284,7 +353,7 @@ Maintainers

* Marcel Martin
* Ruben Vorderman
* For a list of contributors, see <https://github.com/pycompression/xopen/graphs/contributors>
* See also the `full list of contributors <https://github.com/pycompression/xopen/graphs/contributors>`_.


Links