# absorb

`absorb` makes it easy to 1) collect, 2) manage, 3) query, and 4) customize datasets from nearly any data source
🚧 this is a preview release of beta software, and it is still under active development 🚧
- limitless dataset library: access to millions of datasets across 20+ diverse data sources
- intuitive cli+python interfaces: collect or query any dataset in a single line of code
- maximal modularity: built on open standards for frictionless integration with other tools
- easy extensibility: add new datasets or data sources with just a few lines of code
```bash
# basic installation
uv tool install paradigm_absorb

# install with all extras
uv tool install 'paradigm_absorb[test,datasources,interactive]'

# install from source
git clone [email protected]:paradigmxyz/absorb.git
cd absorb
uv tool install --editable '.[test,datasources,interactive]'
```
```bash
# collect dataset and save as local files
absorb collect kalshi

# list datasets that are collected or available
absorb ls

# show the schema of a dataset
absorb schema kalshi

# create a new custom dataset
absorb new custom_dataset

# upload a custom dataset
absorb upload custom_dataset
```
```python
import absorb

# collect dataset and save as local files
absorb.collect('kalshi.metrics')

# get the schema of a dataset
schema = absorb.get_schema('kalshi.metrics')

# query dataset eagerly, as a polars DataFrame
df = absorb.query('kalshi.metrics')

# query dataset lazily, as a polars LazyFrame
lf = absorb.query('kalshi.metrics', lazy=True)

# upload a custom dataset
absorb.upload('source.table')
```
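The lazy result composes with ordinary polars operations and only executes when you call `.collect()`. A minimal sketch, using a hypothetical column name:

```python
import absorb
import polars as pl

# build up a lazy query; nothing runs until .collect()
lf = absorb.query('kalshi.metrics', lazy=True)

# 'volume' is a hypothetical column name, used only for illustration
result = lf.filter(pl.col('volume') > 0).select(pl.col('volume').sum()).collect()
```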
🚧 under construction 🚧
`absorb` collects data from each of these sources:
- `4byte`: function and event signatures
- `allium`: crypto data platform
- `bigquery`: crypto ETL datasets
- `binance`: trades and OHLC candles on the Binance CEX
- `blocknative`: Ethereum mempool archive
- `chain_ids`: chain IDs
- `coingecko`: token prices
- `cryo`: EVM datasets
- `defillama`: DeFi data
- `dune`: tables and queries
- `fred`: federal macroeconomic data
- `git`: commits, authors, and file diffs of a repo
- `growthepie`: L2 metrics
- `kalshi`: prediction market metrics
- `l2beat`: L2 metrics
- `mempool_dumpster`: Ethereum mempool archive
- `snowflake`: generalized data platform
- `sourcify`: verified contracts
- `tic`: US Treasury Department data
- `tix`: price feeds
- `vera`: verified contract archives
- `xatu`: many Ethereum datasets
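Each source name above becomes the first component of a dataset name, following the `source.table` pattern used throughout this README. As a sketch (the second table name below is a hypothetical example, not a confirmed dataset):

```python
import absorb

# 'kalshi.metrics' is the dataset used in the examples above
absorb.collect('kalshi.metrics')

# hypothetical table name, shown only to illustrate the naming scheme
absorb.collect('defillama.tvl')
```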
To list all available datasets and data sources, run `absorb ls` on the command line. To display a dataset's schema and other metadata, run `absorb help <DATASET>`.
`absorb` uses the filesystem as its database. Each dataset is stored as a collection of parquet files, either on local disk or in the cloud. Datasets can be stored in any location on your disks, and `absorb` will use symlinks to organize those files in the `ABSORB_ROOT` tree.
The `ABSORB_ROOT` filesystem directory is organized as:

```
{ABSORB_ROOT}/
    datasets/
        <source>/
            tables/
                <datatype>/
                    {filename}.parquet
                    table_metadata.json
    repos/
        {repo_name}/
    absorb_config.json
```
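Because each dataset is just parquet files on disk, you can also read it directly with polars and bypass `absorb`'s query layer. A minimal sketch, assuming the root defaults to `~/absorb` (an assumption) and using the `kalshi.metrics` layout from above:

```python
import os
import pathlib

import polars as pl

# the default root location here is an assumption; set ABSORB_ROOT to override
root = pathlib.Path(os.environ.get('ABSORB_ROOT', '~/absorb')).expanduser()

# lazily scan every parquet file of one dataset, per the layout above
lf = pl.scan_parquet(str(root / 'datasets' / 'kalshi' / 'tables' / 'metrics' / '*.parquet'))
df = lf.collect()
```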
`absorb` uses a config file to specify which datasets to track.

Schema of `absorb_config.json`:
```python
{
    'version': str,
    'tracked_tables': list[TableDict],
    'use_git': bool,
    'default_bucket': {
        'rclone_remote': str | None,
        'bucket_name': str | None,
        'path_prefix': str | None,
        'provider': str | None,
    },
}
```
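For illustration, a freshly initialized config might look like the following. All values here are hypothetical, and each `tracked_tables` entry is a `TableDict` (assumed here to match the `dataset_config.json` schema below):

```python
# hypothetical absorb_config.json contents, shown as a python literal
{
    'version': '0.1.0',
    'tracked_tables': [],       # filled in as datasets are tracked
    'use_git': False,
    'default_bucket': {
        'rclone_remote': None,  # e.g. an rclone remote name
        'bucket_name': None,
        'path_prefix': None,
        'provider': None,
    },
}
```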
Schema of `dataset_config.json`:
```python
{
    'source_name': str,
    'table_name': str,
    'table_class': str,
    'parameters': dict[str, JSONValue],
    'table_version': str,
}
```
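And a hypothetical filled-in instance of this schema, using the `kalshi.metrics` dataset from the examples above (the class name and version are invented for illustration):

```python
# hypothetical dataset_config.json contents, shown as a python literal
{
    'source_name': 'kalshi',
    'table_name': 'metrics',
    'table_class': 'KalshiMetrics',  # hypothetical class name
    'parameters': {},                # per-dataset parameters, if any
    'table_version': '1.0.0',
}
```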