Loading data into the Lakehouse using JSON configuration and utilities for ETL tasks.

flynn3103/loadhouse-toolkit

Loadhouse

A powerful ETL (Extract, Transform, Load) tool designed for data lakehouse architectures with JSON-based configuration.

Overview

Loadhouse is a flexible data processing tool that drives ETL operations from JSON configuration. It reads from file-based, JDBC, SQL, and DataFrame sources and transforms the data with Apache Spark.

Features

  • Configurable Data Sources

    • File-based (CSV, Delta, etc.)
    • JDBC connections
    • SQL queries
    • DataFrame operations
  • Data Transformations

    • Expression filtering
    • Custom transformations
    • Data quality validation
  • Multiple Output Formats

    • Delta Lake
    • File formats (CSV, Parquet, etc.)
    • Console output for debugging
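
To make the feature list concrete, here is a minimal sketch of what a JSON-driven pipeline definition could look like, together with a small check that each stage uses one of the kinds listed above. Every key name (`input`, `transforms`, `output`, `type`, `expression`), every type label, and both paths are assumptions made for illustration; they are not taken from Loadhouse's actual configuration schema. The sketch uses only the Python standard library and does not call Spark or Loadhouse.

```python
import json

# Hypothetical configuration: key names, type labels, and paths are
# illustrative assumptions, not Loadhouse's real schema.
CONFIG_JSON = """
{
  "input": {"type": "file", "format": "csv", "path": "data/orders.csv"},
  "transforms": [
    {"type": "expression_filter", "expression": "amount > 0"}
  ],
  "output": {"type": "delta", "path": "lakehouse/orders"}
}
"""

# Allowed kinds per stage, mirroring the feature list above.
INPUT_TYPES = {"file", "jdbc", "sql", "dataframe"}
TRANSFORM_TYPES = {"expression_filter", "custom", "data_quality"}
OUTPUT_TYPES = {"delta", "file", "console"}


def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the config passed."""
    problems = []

    input_type = config.get("input", {}).get("type")
    if input_type not in INPUT_TYPES:
        problems.append(f"unknown input type: {input_type!r}")

    for index, step in enumerate(config.get("transforms", [])):
        step_type = step.get("type")
        if step_type not in TRANSFORM_TYPES:
            problems.append(f"unknown transform type at step {index}: {step_type!r}")

    output_type = config.get("output", {}).get("type")
    if output_type not in OUTPUT_TYPES:
        problems.append(f"unknown output type: {output_type!r}")

    return problems


config = json.loads(CONFIG_JSON)
problems = validate_config(config)
print(problems if problems else "config looks valid")
```

Checking the configuration before any Spark job starts is one possible design choice: a misspelled source or output kind then fails fast with a readable message instead of surfacing midway through a run.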
