A Singer target for loading data into ADBC-compatible databases.
ADBC (Arrow Database Connectivity) is a database access API that uses Apache Arrow for data interchange, providing efficient columnar data transfer between applications and databases.
- Universal Database Support: Works with any ADBC-compatible database driver (DuckDB, SQLite, PostgreSQL, etc.)
- High Performance: Uses Apache Arrow for efficient columnar data transfer
- Flexible Configuration: Supports various connection methods and driver-specific options
- Singer Specification Compliant: Fully compatible with the Singer ecosystem
- Metadata Tracking: Optional Stitch-style metadata columns for data lineage

```bash
# Clone the repository
git clone https://github.com/yourusername/target-adbc.git
cd target-adbc

# Install target-adbc
uv tool install --editable ./target-adbc

# Install ADBC drivers (see "Supported Databases" section below)
# We recommend using dbc for driver management:
curl -LsSf https://dbc.columnar.tech/install.sh | sh
dbc install duckdb  # or sqlite, postgresql, etc.

# For development
uv sync --all-extras
```

Any database with an ADBC driver is supported. Popular options include:

| Database | Driver Name in Config | Install with dbc |
|---|---|---|
| DuckDB | `duckdb` | `dbc install duckdb` |
| SQLite | `sqlite` | `dbc install sqlite` |
| PostgreSQL | `postgresql` | `dbc install postgresql` |
| Flight SQL | `flightsql` | `dbc install flightsql` |
| Snowflake | `snowflake` | `dbc install snowflake` |

Use dbc to install ADBC drivers. The dbc tool provides pre-built binaries and simplifies driver management across platforms, making it straightforward to get started without worrying about driver-specific package dependencies:

```bash
# Install dbc (one-time setup)
curl -LsSf https://dbc.columnar.tech/install.sh | sh

# Install the drivers you need
dbc install duckdb
dbc install postgresql
dbc install sqlite
```

Why dbc? The ADBC driver manager package (`adbc-driver-manager`) uses a single API regardless of the database you're connecting to. While users could install driver packages separately, dbc makes this experience seamless by handling driver manifests and pre-built binaries automatically.

See the ADBC documentation for a full list of available drivers.
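
Because every driver shares that single API, the only thing that changes between databases is the driver name and its connection options. Below is a minimal sketch using the `adbc_driver_manager` DB-API wrapper; it assumes the drivers were installed with dbc so that the bare driver names resolve via the installed driver manifests, and the file paths and `users` table are purely illustrative:

```python
import adbc_driver_manager.dbapi as dbapi


def row_count(driver: str, db_kwargs: dict, table: str) -> int:
    """Same code path for every ADBC driver; only the name and options change."""
    with dbapi.connect(driver=driver, db_kwargs=db_kwargs) as conn:
        with conn.cursor() as cur:
            cur.execute(f"SELECT COUNT(*) FROM {table}")  # illustrative table name
            return cur.fetchone()[0]


# The same helper against two different databases; the option keys mirror
# this target's config ("path" for DuckDB, "uri" for SQLite).
print(row_count("duckdb", {"path": "my_database.duckdb"}, "users"))
print(row_count("sqlite", {"uri": "my_database.sqlite"}, "users"))
```
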
The target accepts Singer messages on stdin and loads data into the configured database:

```bash
tap-something | target-adbc --config config.json
```

Create a `config.json` file with your database connection details. For example, for DuckDB:

```json
{
  "driver": "duckdb",
  "duckdb": {
    "path": "my_database.duckdb"
  },
  "batch_size": 10000,
  "overwrite_behavior": "append"
}
```

For SQLite:

```json
{
  "driver": "sqlite",
  "sqlite": {
    "uri": "my_database.sqlite"
  },
  "batch_size": 5000
}
```

For PostgreSQL:

```json
{
  "driver": "postgresql",
  "postgresql": {
    "uri": "postgresql://myuser:mypass@localhost:5432/mydb"
  },
  "default_target_schema": "public",
  "batch_size": 10000
}
```

Available settings:

| Setting | Required | Default | Description |
|---|---|---|---|
| `driver` | Yes | - | ADBC driver name (e.g., `duckdb`, `sqlite`, `postgresql`) |
| `duckdb.path` | No* | - | Path to the DuckDB database file (required when `driver` is `duckdb`) |
| `sqlite.uri` | No* | - | Path to the SQLite database file (required when `driver` is `sqlite`) |
| `postgresql.uri` | No* | - | PostgreSQL connection string (required when `driver` is `postgresql`) |
| `default_target_schema` | No | - | Default schema for tables |
| `table_prefix` | No | `""` | Prefix to add to all table names |
| `table_suffix` | No | `""` | Suffix to add to all table names |
| `overwrite_behavior` | No | `append` | How to handle existing tables: `append`, `replace`, or `fail` |
| `batch_size` | No | `10000` | Number of rows to process per batch |
| `add_record_metadata` | No | `true` | Add metadata columns (`_sdc_*`) to tables |
| `varchar_length` | No | `255` | Default VARCHAR length when not specified |

\*Driver-specific settings are required based on the selected driver.

The `overwrite_behavior` setting accepts three values:

- `append` (default): Add new data to existing tables
- `replace`: Drop and recreate tables before loading
- `fail`: Raise an error if the table already exists

When `add_record_metadata` is enabled (the default), the following columns are added:

- `_sdc_extracted_at`: Timestamp when the record was extracted from the source
- `_sdc_received_at`: Timestamp when the record was received by the target
- `_sdc_batched_at`: Timestamp when the record was batched for loading
- `_sdc_sequence`: Sequence number for ordering
- `_sdc_table_version`: Table version number
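
A quick way to see these columns is to query a loaded table directly. This small sketch uses the DuckDB Python package and the `users.duckdb` database produced by the DuckDB example further below; the file, table, and column names are simply taken from that example:

```python
import duckdb

# Assumes data was loaded into users.duckdb with add_record_metadata enabled (the default)
con = duckdb.connect("users.duckdb")
rows = con.sql(
    "SELECT id, name, _sdc_extracted_at, _sdc_sequence FROM users"
).fetchall()
print(rows)
con.close()
```
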
Before using target-adbc with Meltano, install the required ADBC driver(s):

```bash
# Install dbc
curl -LsSf https://dbc.columnar.tech/install.sh | sh

# Install the driver(s) you need
dbc install duckdb
```

Add the target to your `meltano.yml`:

```yaml
plugins:
  loaders:
  - name: target-adbc
    namespace: target_adbc
    pip_url: -e .
    executable: target-adbc
    settings:
    - name: driver
      kind: options
      description: ADBC driver name (e.g., duckdb, sqlite, postgresql)
      options:
      - label: DuckDB
        value: duckdb
      - label: SQLite
        value: sqlite
      - label: PostgreSQL
        value: postgresql
    # DuckDB settings
    - name: duckdb.path
      kind: string
      description: Path to the DuckDB database file
    # SQLite settings
    - name: sqlite.uri
      kind: string
      description: URI to the SQLite database file
    # PostgreSQL settings
    - name: postgresql.uri
      kind: string
      description: PostgreSQL connection string
    # General settings
    - name: default_target_schema
      kind: string
      description: Default schema to use for tables if not specified in stream name
    - name: table_prefix
      kind: string
      description: Prefix to add to all table names
    - name: table_suffix
      kind: string
      description: Suffix to add to all table names
    - name: batch_size
      kind: integer
      description: Maximum number of rows to process in a single batch
    - name: overwrite_behavior
      kind: options
      description: Behavior when table already exists
      options:
      - label: Append
        value: append
      - label: Replace
        value: replace
      - label: Fail
        value: fail
    - name: add_record_metadata
      kind: boolean
      description: Add metadata columns to the output tables
    - name: varchar_length
      kind: integer
      description: Default length for VARCHAR columns when not specified

  # Configured variant for DuckDB
  - name: target-adbc-duckdb
    inherit_from: target-adbc
    config:
      driver: duckdb
      overwrite_behavior: replace
      duckdb:
        path: ${MELTANO_PROJECT_ROOT}/output/warehouse.duckdb

  # Configured variant for SQLite
  - name: target-adbc-sqlite
    inherit_from: target-adbc
    config:
      driver: sqlite
      overwrite_behavior: replace
      sqlite:
        uri: ${MELTANO_PROJECT_ROOT}/output/warehouse.sqlite
```

Then run your pipeline:

```bash
meltano run tap-something target-adbc
```

For a quick end-to-end example with DuckDB:

```bash
# Create a simple tap
cat << 'EOF' > sample_data.jsonl
{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}, "email": {"type": "string"}}, "type": "object"}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Alice", "email": "[email protected]"}}
{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Bob", "email": "[email protected]"}}
EOF
# Create config
cat << 'EOF' > config.json
{
"driver": "duckdb",
"duckdb": {
"path": "users.duckdb"
}
}
EOF
# Load data
cat sample_data.jsonl | target-adbc --config config.json
# Query the result
duckdb users.duckdb -c "SELECT * FROM users"
```

To load into PostgreSQL instead:

```bash
# Configure PostgreSQL target
cat << 'EOF' > pg_config.json
{
"driver": "postgresql",
"postgresql": {
"uri": "postgresql://postgres:secret@localhost:5432/analytics"
},
"default_target_schema": "raw_data",
"overwrite_behavior": "append"
}
EOF
# Run with any Singer tap
tap-github --config tap_config.json | target-adbc --config pg_config.json
```

To set up a development environment:

```bash
# Clone the repository
git clone https://github.com/yourusername/target-adbc
cd target-adbc
# Install in development mode
uv sync --all-extras
# Install ADBC drivers for testing
dbc install duckdb sqlite

# Run tests
uv run pytest
# Run with coverage
uv run coverage run -m pytest
# Type checking
uv run mypy target_adbc tests
# Linting
uv run ruff check target_adbc tests
uv run ruff format target_adbc tests
```

The target follows the Singer specification and uses the Meltano SDK:

- Target (`target.py`): Main entry point that orchestrates the data loading process
- Sink (`sinks.py`): Handles batch processing and ADBC interactions
- Settings (`settings.py`): Configuration schema and validation

```
Singer Messages → Target → Sink → ADBC Connection → Database
                            ↓
                     PyArrow Tables
```

The sink:

- Receives batches of records from the target
- Converts records to PyArrow tables using the Singer schema
- Uses ADBC's `adbc_ingest` for efficient bulk loading (see the sketch below)
- Handles table creation, schema evolution, and error handling
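
A condensed sketch of that flow, using PyArrow and the ADBC DB-API cursor's `adbc_ingest` method. The records, table name, and DuckDB connection options are illustrative and not the target's actual internals:

```python
import pyarrow as pa
import adbc_driver_manager.dbapi as dbapi

# A small batch of records, as if already validated against the stream's Singer schema
records = [
    {"id": 1, "name": "Alice", "email": "[email protected]"},
    {"id": 2, "name": "Bob", "email": "[email protected]"},
]

# Row-oriented records become a columnar Arrow table
table = pa.Table.from_pylist(records)

with dbapi.connect(driver="duckdb", db_kwargs={"path": "users.duckdb"}) as conn:
    with conn.cursor() as cur:
        # Bulk-load the Arrow table; "create_append" creates the table if it
        # does not exist yet (older bindings may only offer "create"/"append").
        cur.adbc_ingest("users", table, mode="create_append")
    conn.commit()
```
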
This target implements the Singer specification:

- Accepts `SCHEMA`, `RECORD`, and `STATE` messages
- Outputs `STATE` messages for checkpoint management (see the example below)
- Handles schema evolution
- Supports batch processing for performance
- Validates configuration
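
When running outside Meltano, those `STATE` messages arrive on the target's stdout and can be fed back to the tap on the next run. A sketch of that loop follows; the tap name and its `--state` flag are illustrative, and Meltano handles this bookkeeping for you automatically:

```bash
# First run: capture everything the target prints, i.e. its STATE messages
tap-something --config tap_config.json | target-adbc --config config.json > state_output.jsonl

# Keep only the most recent checkpoint for the next incremental run
tail -n 1 state_output.jsonl > state.json
tap-something --config tap_config.json --state state.json | target-adbc --config config.json > state_output.jsonl
```
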
To improve loading performance (a tuned example config follows this list):

- Batch Size: Increase `batch_size` for larger datasets (e.g., 50,000-100,000 rows)
- Metadata: Disable `add_record_metadata` if you don't need lineage tracking
- Schema: Specify the schema explicitly to avoid inference overhead
- Indexes: Create indexes after loading large datasets, not before
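
For instance, a config tuned for a large one-off bulk load into DuckDB might look like this; the values are illustrative starting points rather than universal recommendations:

```json
{
  "driver": "duckdb",
  "duckdb": {
    "path": "warehouse.duckdb"
  },
  "batch_size": 50000,
  "add_record_metadata": false,
  "overwrite_behavior": "replace"
}
```
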
If you encounter connection errors:

- Ensure the driver is installed with `dbc install <driver-name>` (e.g., `dbc install duckdb`)
- Check that your connection parameters match the driver's requirements
- Test the connection separately using the ADBC Python API (see the snippet below)
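
For the last point, a minimal connectivity probe can look like the following. Reuse the driver name and options from your own target config; the PostgreSQL URI here is the one from the example config above, and the `uri`/`path` option keys are assumed to match what your driver expects:

```python
import adbc_driver_manager.dbapi as dbapi

# Use the same driver name and options as in your target-adbc config.json
with dbapi.connect(
    driver="postgresql",
    db_kwargs={"uri": "postgresql://myuser:mypass@localhost:5432/mydb"},
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print("Connection OK:", cur.fetchone())
```
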
If you see schema-related errors:

- Ensure your Singer schema is valid JSON Schema
- Check for unsupported data types
- Set `default_target_schema` for databases that require it (e.g., PostgreSQL)

If loading is slow:

- Increase `batch_size` in your config
- Disable metadata columns with `"add_record_metadata": false`
- Use `overwrite_behavior: "replace"` instead of truncating manually

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with:
- Singer SDK by Meltano
- Apache Arrow ADBC by the Apache Arrow community