This is a DuckDB extension that adds support for reading files from within zip archives and other archive formats such as tar.
Load from the community extensions repository:
INSTALL zipfs FROM community;
LOAD zipfs;
To read a file:
SELECT * FROM 'zip://examples/a.zip/a.csv';
To read a file from Azure Blob Storage (or another file system):
SELECT * FROM 'zip://az://yourstorageaccount.blob.core.windows.net/yourcontainer/examples/a.zip/a.csv';
To read the table of contents of a zip file:
SELECT * FROM archive_contents('examples/a.zip');

| URL quick reference | Description |
|---|---|
| zip://a.zip/*.csv | Local zip file named a.zip, containing csv files. |
| zip://http://example.com/a.zip/*.csv | Web hosted zip file named a.zip, containing csv files. |
| archive://a.tar.gz!!*.csv | Local archive file named a.tar.gz, containing csv files. |
| compressed://a.jsonl.bz2 | Local compressed ndjson file a.jsonl.bz2. |

| Function | Description |
|---|---|
| zip_contents | Read the table of contents of a zip file |
| archive_contents | Read the table of contents of an archive file |

File names passed into the zip:// URL scheme are expected to end with .zip, which indicates the end of the zip file name. The path after
that is taken to be the file path within the zip archive.
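The splitting rule above can be sketched in a few lines of Python. This is an illustration only, not the extension's implementation: the helper name `split_zip_url` is hypothetical, and it assumes the first `.zip` in the path marks the end of the archive name, which the extension may handle differently.

```python
def split_zip_url(url: str) -> tuple:
    """Split 'zip://<archive>.zip/<inner path>' into (archive, inner path).

    Assumption: the first '.zip' in the path ends the archive name.
    """
    prefix = "zip://"
    if not url.startswith(prefix):
        raise ValueError("expected a zip:// URL")
    rest = url[len(prefix):]

    marker = ".zip"
    pos = rest.find(marker)
    if pos == -1:
        raise ValueError("archive name does not contain .zip")

    end = pos + len(marker)
    archive = rest[:end]
    # Everything after the archive name is the path inside the archive.
    inner = rest[end:].lstrip("/")
    return archive, inner


print(split_zip_url("zip://examples/a.zip/a.csv"))
```

Under this assumption, an archive whose own path contains `.zip` before the real suffix would be split in the wrong place, which is one reason a custom separator can be useful.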
Globbing within the zip archive is supported, but see below for performance limitations. A glob query looks like:
SELECT * FROM 'zip://examples/a.zip/*.csv';
Globbing for multiple zip files:
SELECT * FROM 'zip://examples/*.zip/*.csv';
You may set the zipfs_split option to turn off the .zip suffix rule and instead split the path on a string of your choosing:
SET zipfs_split = "!!";
SELECT * FROM 'zip://examples/a.zip!!b.csv';
Using zipfs_split also means you can read other archives supported by libarchive (note the different URL scheme; libarchive is not available on Windows):
SET zipfs_split = "!!";
SELECT * FROM 'archive://examples/a.tar.gz!!b.csv';
It is also possible to read from a variety of compressed file formats directly:
SELECT * FROM read_json('compressed://examples/a.jsonl.bz2');
This extension supports both zip files and archive files. Zip file support uses miniz; archive file support uses libarchive, which supports a wider range of compression algorithms and container formats. libarchive is not available on Windows, and using archive file support there results in an error.
This extension is intended more for convenience than high performance. It does not implement a file metadata cache as tarfs (on which this
extension is based) does. As such, operations which require the central directory (index) of the zip file, such as globbing files, must
reread the central directory multiple times, once for the glob and once for each file to open.
The selected file will be read entirely into memory, not streamed. Therefore it cannot be used to read files which are larger than memory when uncompressed.
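Because a zip file's central directory records each member's uncompressed size, it is possible to check ahead of time whether a member is likely to fit in memory. The sketch below does this outside DuckDB with Python's standard `zipfile` module; the helper name `uncompressed_sizes` and the `limit` value are hypothetical, and the archive is built in memory only to keep the example self-contained.

```python
import io
import zipfile


def uncompressed_sizes(zip_bytes: bytes) -> dict:
    """Return {member name: uncompressed size in bytes} from the central directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


# Build a small in-memory archive so the sketch needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.csv", "id,name\n" + "1,x\n" * 1000)

sizes = uncompressed_sizes(buf.getvalue())
limit = 1_000_000  # hypothetical memory budget in bytes
for name, size in sizes.items():
    status = "ok" if size <= limit else "too large"
    print(name, size, status)
```

The compressed size on disk can be far smaller than the uncompressed size, so checking the archive's file size alone is not a reliable guide.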
First, install vcpkg into a local vcpkg directory:
git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake
Then:
GEN=ninja make release
make test_release
duckdb-zipfs Copyright 2025 Isaac Brodsky. Licensed under the MIT License.
DuckDB Copyright 2018-2022 Stichting DuckDB Foundation (MIT License)
miniz Copyright 2013-2014 RAD Game Tools and Valve Software Copyright 2010-2014 Rich Geldreich and Tenacious Software LLC (MIT License)
DuckDB extension-template Copyright 2018-2022 DuckDB Labs BV (MIT License)
duckdb_tarfs (MIT license)
libarchive Copyright 2003-2018 Tim Kientzle (varying licenses, see repo)