target-adbc

A Singer target for loading data into ADBC-compatible databases.

ADBC (Arrow Database Connectivity) is a database access API that uses Apache Arrow for data interchange, providing efficient columnar data transfer between applications and databases.

Features

  • Universal Database Support: Works with any ADBC-compatible database driver (DuckDB, SQLite, PostgreSQL, etc.)
  • High Performance: Uses Apache Arrow for efficient columnar data transfer
  • Flexible Configuration: Supports various connection methods and driver-specific options
  • Singer Specification Compliant: Fully compatible with the Singer ecosystem
  • Metadata Tracking: Optional Stitch-style metadata columns for data lineage

Installation

# Clone the repository
git clone https://github.com/edgarrmondragon/target-adbc.git
cd target-adbc

# Install target-adbc
uv tool install --editable .

# Install ADBC drivers (see "Supported Databases" section below)
# We recommend using dbc for driver management:
curl -LsSf https://dbc.columnar.tech/install.sh | sh
dbc install duckdb  # or sqlite, postgresql, etc.

# For development
uv sync --all-extras
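
Since the target is built on the Meltano Singer SDK, it should expose the SDK's standard introspection flag, which is a quick way to verify the install:

# Print target metadata and supported settings
target-adbc --about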

Supported Databases

Any database with an ADBC driver is supported. Popular options include:

| Database   | Driver Name in Config | Install with dbc       |
|------------|-----------------------|------------------------|
| DuckDB     | duckdb                | dbc install duckdb     |
| SQLite     | sqlite                | dbc install sqlite     |
| PostgreSQL | postgresql            | dbc install postgresql |
| Flight SQL | flightsql             | dbc install flightsql  |
| Snowflake  | snowflake             | dbc install snowflake  |

Installing ADBC Drivers

Use dbc to install ADBC drivers. It provides pre-built binaries and simplifies cross-platform driver management, so you can get started without worrying about driver-specific package dependencies:

# Install dbc (one-time setup)
curl -LsSf https://dbc.columnar.tech/install.sh | sh

# Install the drivers you need
dbc install duckdb
dbc install postgresql
dbc install sqlite

Why dbc? The ADBC driver manager package (adbc-driver-manager) uses a single API regardless of the database you're connecting to. While users could install driver packages separately, dbc makes this experience seamless by handling driver manifests and pre-built binaries automatically.
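
For example, here is a minimal sketch of that single-API model in Python. It assumes adbc-driver-manager and pyarrow are installed from PyPI, and that your adbc-driver-manager release is recent enough to resolve the name "duckdb" from a dbc-installed driver manifest:

# Minimal sketch: the same DB-API code path works for any installed driver.
import adbc_driver_manager.dbapi

with adbc_driver_manager.dbapi.connect(driver="duckdb") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 42 AS answer")
        print(cur.fetch_arrow_table())  # results arrive as a pyarrow.Table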

See the ADBC documentation for a full list of available drivers.

Usage

Basic Example

The target accepts Singer messages on stdin and loads data into the configured database:

tap-something | target-adbc --config config.json

Configuration

Create a config.json file with your database connection details:

DuckDB Example

{
  "driver": "duckdb",
  "duckdb": {
    "path": "my_database.duckdb"
  },
  "batch_size": 10000,
  "overwrite_behavior": "append"
}

SQLite Example

{
  "driver": "sqlite",
  "sqlite": {
    "uri": "my_database.sqlite"
  },
  "batch_size": 5000
}

PostgreSQL Example

{
  "driver": "postgresql",
  "postgresql": {
    "uri": "postgresql://myuser:mypass@localhost:5432/mydb"
  },
  "default_target_schema": "public",
  "batch_size": 10000
}

Configuration Options

| Setting                | Required | Default | Description                                                        |
|------------------------|----------|---------|--------------------------------------------------------------------|
| driver                 | Yes      | -       | ADBC driver name (e.g., duckdb, sqlite, postgresql)                |
| duckdb.path            | No*      | -       | Path to DuckDB database file (required when driver is duckdb)      |
| sqlite.uri             | No*      | -       | Path to SQLite database file (required when driver is sqlite)      |
| postgresql.uri         | No*      | -       | PostgreSQL connection string (required when driver is postgresql)  |
| default_target_schema  | No       | -       | Default schema for tables                                          |
| table_prefix           | No       | ""      | Prefix to add to all table names                                   |
| table_suffix           | No       | ""      | Suffix to add to all table names                                   |
| overwrite_behavior     | No       | append  | How to handle existing tables: append, replace, or fail            |
| batch_size             | No       | 10000   | Number of rows to process per batch                                |
| add_record_metadata    | No       | true    | Add metadata columns (_sdc_*) to tables                            |
| varchar_length         | No       | 255     | Default VARCHAR length when not specified                          |

*Driver-specific settings are required based on the selected driver.
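
Putting several of these options together, a hypothetical DuckDB config with tuned batching, naming, and metadata settings might look like the following (all values are illustrative):

{
  "driver": "duckdb",
  "duckdb": {
    "path": "warehouse.duckdb"
  },
  "default_target_schema": "raw",
  "table_prefix": "src_",
  "batch_size": 50000,
  "overwrite_behavior": "replace",
  "add_record_metadata": false,
  "varchar_length": 512
}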

Overwrite Behaviors

  • append (default): Add new data to existing tables
  • replace: Drop and recreate tables before loading
  • fail: Raise an error if the table already exists

Metadata Columns

When add_record_metadata is enabled (default), the following columns are added:

  • _sdc_extracted_at: Timestamp when the record was extracted from the source
  • _sdc_received_at: Timestamp when the record was received by the target
  • _sdc_batched_at: Timestamp when the record was batched for loading
  • _sdc_sequence: Sequence number for ordering
  • _sdc_table_version: Table version number

Using with Meltano

Install ADBC Drivers First

Before using target-adbc with Meltano, install the required ADBC driver(s):

# Install dbc
curl -LsSf https://dbc.columnar.tech/install.sh | sh

# Install the driver(s) you need
dbc install duckdb

Add to meltano.yml

Add the target to your meltano.yml:

plugins:
  loaders:
  - name: target-adbc
    namespace: target_adbc
    pip_url: -e .
    executable: target-adbc
    settings:
    - name: driver
      kind: options
      description: ADBC driver name (e.g., duckdb, sqlite, postgresql)
      options:
      - label: DuckDB
        value: duckdb
      - label: SQLite
        value: sqlite
      - label: PostgreSQL
        value: postgresql

    # DuckDB settings
    - name: duckdb.path
      kind: string
      description: Path to the DuckDB database file

    # SQLite settings
    - name: sqlite.uri
      kind: string
      description: URI to the SQLite database file

    # PostgreSQL settings
    - name: postgresql.uri
      kind: string
      description: PostgreSQL connection string

    # General settings
    - name: default_target_schema
      kind: string
      description: Default schema to use for tables if not specified in stream name
    - name: table_prefix
      kind: string
      description: Prefix to add to all table names
    - name: table_suffix
      kind: string
      description: Suffix to add to all table names
    - name: batch_size
      kind: integer
      description: Maximum number of rows to process in a single batch
    - name: overwrite_behavior
      kind: options
      description: Behavior when table already exists
      options:
      - label: Append
        value: append
      - label: Replace
        value: replace
      - label: Fail
        value: fail
    - name: add_record_metadata
      kind: boolean
      description: Add metadata columns to the output tables
    - name: varchar_length
      kind: integer
      description: Default length for VARCHAR columns when not specified

  # Configured variant for DuckDB
  - name: target-adbc-duckdb
    inherit_from: target-adbc
    config:
      driver: duckdb
      overwrite_behavior: replace
      duckdb:
        path: ${MELTANO_PROJECT_ROOT}/output/warehouse.duckdb

  # Configured variant for SQLite
  - name: target-adbc-sqlite
    inherit_from: target-adbc
    config:
      driver: sqlite
      overwrite_behavior: replace
      sqlite:
        uri: ${MELTANO_PROJECT_ROOT}/output/warehouse.sqlite

Run with Meltano

meltano run tap-something target-adbc
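
Or run one of the configured variants defined above, which bundle their own driver settings:

meltano run tap-something target-adbc-duckdb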

Examples

Example 1: Load CSV to DuckDB

# Create a file of Singer messages to stand in for a tap
cat << 'EOF' > sample_data.jsonl
{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}, "email": {"type": "string"}}, "type": "object"}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Alice", "email": "[email protected]"}}
{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Bob", "email": "[email protected]"}}
EOF

# Create config
cat << 'EOF' > config.json
{
  "driver": "duckdb",
  "duckdb": {
    "path": "users.duckdb"
  }
}
EOF

# Load data
cat sample_data.jsonl | target-adbc --config config.json

# Query the result
duckdb users.duckdb -c "SELECT * FROM users"

Example 2: Stream Data from API to PostgreSQL

# Configure PostgreSQL target
cat << 'EOF' > pg_config.json
{
  "driver": "postgresql",
  "postgresql": {
    "uri": "postgresql://postgres:secret@localhost:5432/analytics"
  },
  "default_target_schema": "raw_data",
  "overwrite_behavior": "append"
}
EOF

# Run with any Singer tap
tap-github --config tap_config.json | target-adbc --config pg_config.json

Development

Setup

# Clone the repository
git clone https://github.com/edgarrmondragon/target-adbc
cd target-adbc

# Install in development mode
uv sync --all-extras

# Install ADBC drivers for testing
dbc install duckdb
dbc install sqlite

Testing

# Run tests
uv run pytest

# Run with coverage
uv run coverage run -m pytest

# Type checking
uv run mypy target_adbc tests

# Linting
uv run ruff check target_adbc tests
uv run ruff format target_adbc tests

Architecture

The target follows the Singer specification and uses the Meltano SDK:

  1. Target (target.py): Main entry point that orchestrates the data loading process
  2. Sink (sinks.py): Handles batch processing and ADBC interactions
  3. Settings (settings.py): Configuration schema and validation

Data Flow

Singer Messages → Target → Sink → ADBC Connection → Database
                           ↓
                    PyArrow Tables

The sink:

  1. Receives batches of records from the target
  2. Converts records to PyArrow tables using the Singer schema
  3. Uses ADBC's adbc_ingest for efficient bulk loading (sketched below)
  4. Handles table creation, schema evolution, and error handling
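
In code, the core of that flow looks roughly like the sketch below. This is not the target's actual internals: the records batch, the schema, and the table name users are illustrative, and it assumes pyarrow and adbc-driver-manager are installed with a dbc-provided duckdb driver available:

import pyarrow as pa
import adbc_driver_manager.dbapi

# Illustrative batch; in the real target these come from Singer RECORD messages.
records = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]

# Convert the batch to a PyArrow table (schema derived from the Singer schema).
schema = pa.schema([("id", pa.int64()), ("name", pa.string()), ("email", pa.string())])
table = pa.Table.from_pylist(records, schema=schema)

# Bulk-load via ADBC's ingest API. "create_append" creates the table if it is
# missing and appends otherwise; "replace" would map to overwrite_behavior.
with adbc_driver_manager.dbapi.connect(driver="duckdb") as conn:
    with conn.cursor() as cur:
        cur.adbc_ingest("users", table, mode="create_append")
    conn.commit()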

Singer Specification Compliance

This target implements the Singer specification:

  • Accepts SCHEMA, RECORD, and STATE messages
  • Outputs STATE messages for checkpoint management (example below)
  • Handles schema evolution
  • Supports batch processing for performance
  • Validates configuration
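
For reference, a STATE message is a single JSON line on stdout; the value payload is tap-defined, and the bookmark below is purely illustrative:

{"type": "STATE", "value": {"bookmarks": {"users": {"replication_key_value": "2024-01-01T00:00:00Z"}}}}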

Performance Tips

  1. Batch Size: Increase batch_size for larger datasets (e.g., 50000-100000 rows)
  2. Metadata: Disable add_record_metadata if you don't need lineage tracking
  3. Schema: Specify schema explicitly to avoid inference overhead
  4. Indexes: Create indexes after loading large datasets, not before

Troubleshooting

Connection Issues

If you encounter connection errors:

  1. Ensure the driver is installed with dbc install <driver-name> (e.g., dbc install duckdb)
  2. Check your connection parameters match the driver's requirements
  3. Test the connection separately using the ADBC Python API (see the sketch below)
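
As a quick standalone check, something along these lines isolates driver problems from target problems. It reuses the PostgreSQL URI from the earlier example (substitute your own credentials) and assumes the driver was installed with dbc install postgresql:

import adbc_driver_manager.dbapi

with adbc_driver_manager.dbapi.connect(
    driver="postgresql",
    db_kwargs={"uri": "postgresql://postgres:secret@localhost:5432/analytics"},
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())  # (1,) means the driver and credentials work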

Schema Errors

If you see schema-related errors:

  1. Ensure your Singer schema is valid JSON Schema
  2. Check for unsupported data types
  3. Set default_target_schema for databases that require it (PostgreSQL)

Performance Issues

If loading is slow:

  1. Increase batch_size in your config
  2. Disable metadata columns with "add_record_metadata": false
  3. Use overwrite_behavior: "replace" instead of truncating manually

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Resources

  • ADBC documentation: https://arrow.apache.org/adbc/
  • Singer specification: https://github.com/singer-io/getting-started
  • Meltano Singer SDK: https://sdk.meltano.com

Acknowledgements

Built with the Meltano Singer SDK and Apache Arrow (PyArrow).
