go-polars

go-polars provides Go bindings for the Polars data manipulation library.

πŸ»β€β„οΈ What is Polars?

Polars is an open-source library for data manipulation, known for being one of the fastest data processing solutions on a single machine. It features a well-structured, typed API that is both expressive and easy to use.

https://github.com/pola-rs/polars

📦 Installation

Note

Build Process & Security Considerations

The GitHub Actions runners cannot compile the Polars Rust bindings due to resource constraints, so binaries are currently compiled on a local development machine. While this isn't ideal from a security perspective, we've implemented several measures to ensure transparency and verifiability:

  • πŸ” Reproducible builds: All build scripts are available in ./scripts for review
  • πŸ” Checksum verification: Each binary release includes SHA256 and MD5 checksums
  • πŸ“‹ Build transparency: Release notes include build environment details and dependency versions
  • πŸ—οΈ Self-compilation option: You can always build from source using ./build.sh

To verify a binary download:

# Download the binary and its .sha256 file into the same directory, then verify
sha256sum -c libpolars_go-linux-amd64-v0.1.0.so.sha256

Quick Start

For the easiest setup experience, use our setup script that downloads both the package and precompiled binary:

curl -sSL https://raw.githubusercontent.com/jordandelbar/go-polars/main/scripts/setup.sh | sh

This script will:

  • Download and set up the polars package in your project
  • Download the precompiled binary for your platform
  • Configure your Go module with the necessary replace directives
  • Create an example file to test your installation

Example

package main

import (
    "fmt"
    "github.com/jordandelbar/go-polars/polars"
)

func main() {
    df, err := polars.ReadCSV("data.csv")
    if err != nil {
        panic(err)
    }
    fmt.Println(df.String())
}

Pre-compiled Binaries

✅ Available for:

  • Linux x86_64

🚧 Coming soon:

  • macOS x86_64 and ARM64
  • Windows x86_64
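To check from Go whether the current platform is covered, something like the following could be used. It is a minimal sketch: hasPrebuilt is a hypothetical helper, not part of go-polars, and it hard-codes the list above (Go reports x86_64 as "amd64").

```go
package main

import (
	"fmt"
	"runtime"
)

// hasPrebuilt reports whether a precompiled binary is listed as available
// for the given GOOS/GOARCH pair. Only Linux x86_64 (linux/amd64) is listed
// at the time of writing; this list is an assumption copied from the README.
func hasPrebuilt(goos, goarch string) bool {
	return goos == "linux" && goarch == "amd64"
}

func main() {
	if hasPrebuilt(runtime.GOOS, runtime.GOARCH) {
		fmt.Println("a precompiled binary is available for this platform")
		return
	}
	fmt.Printf("no precompiled binary for %s/%s; build from source\n",
		runtime.GOOS, runtime.GOARCH)
}
```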

Alternative: Build from Source

If pre-compiled binaries aren't available for your platform:

Prerequisites:

  • Rust: Install from rustup.rs
  • Build tools: build-essential (Ubuntu) or equivalent

git clone https://github.com/jordandelbar/go-polars
cd go-polars
./build.sh

✨ Features

Expression Operations

go-polars supports a comprehensive set of expression operations for data manipulation:

Comparison Operations

  • Gt(value) - Greater than
  • Lt(value) - Less than
  • Eq(value) - Equal to
  • Ne(value) - Not equal to
  • Ge(value) - Greater than or equal to
  • Le(value) - Less than or equal to

Mathematical Operations

  • Add(expr) / AddValue(value) - Addition
  • Sub(expr) / SubValue(value) - Subtraction
  • Mul(expr) / MulValue(value) - Multiplication
  • Div(expr) / DivValue(value) - Division

Logical Operations

  • And(expr) - Logical AND
  • Or(expr) - Logical OR
  • Not() - Logical NOT

GroupBy and Aggregation Operations

go-polars provides powerful GroupBy functionality for data aggregation:

GroupBy Operations

  • GroupBy(columns...) - Group data by one or more columns
  • Count() - Count rows per group
  • Sum(column) - Sum values per group
  • Mean(column) - Calculate mean per group
  • Min(column) - Find minimum per group
  • Max(column) - Find maximum per group
  • Std(column) - Calculate standard deviation per group
  • Agg(expressions...) - Custom aggregations with multiple expressions
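For readers new to group-by semantics, the result of GroupBy("department").Mean("salary") can be pictured with a plain-Go sketch. It does not use go-polars and says nothing about how the library computes the result internally; the row type and groupMean function are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical record type used only for this illustration.
type row struct {
	Department string
	Salary     float64
}

// groupMean returns the mean salary per department: one output value
// per distinct group key, computed from all rows sharing that key.
func groupMean(rows []row) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, r := range rows {
		sums[r.Department] += r.Salary
		counts[r.Department]++
	}
	means := make(map[string]float64, len(sums))
	for dept, sum := range sums {
		means[dept] = sum / float64(counts[dept])
	}
	return means
}

func main() {
	rows := []row{
		{"eng", 100},
		{"eng", 120},
		{"ops", 90},
	}
	means := groupMean(rows)

	// Sort the keys so the printed order is deterministic.
	depts := make([]string, 0, len(means))
	for d := range means {
		depts = append(depts, d)
	}
	sort.Strings(depts)
	for _, d := range depts {
		fmt.Printf("%s: %.1f\n", d, means[d])
	}
}
```

The other per-group operations (Count, Sum, Min, Max, Std) follow the same pattern: rows are bucketed by key, then each bucket is reduced to a single value.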

Aggregation Expressions

  • Col("column").Sum() - Sum aggregation expression
  • Col("column").Mean() - Mean aggregation expression
  • Col("column").Min() - Minimum aggregation expression
  • Col("column").Max() - Maximum aggregation expression
  • Col("column").Std() - Standard deviation aggregation expression
  • Count() - Count aggregation expression

Basic Usage Examples

import "github.com/jordandelbar/go-polars/polars"

// Load data
df, err := polars.ReadCSV("data.csv")
if err != nil {
    panic(err)
}

// Comparison operations
filtered := df.Filter(polars.Col("age").Gt(25))
equals := df.Filter(polars.Col("score").Eq(100))

// Mathematical operations
df = df.WithColumns(
    polars.Col("price").MulValue(1.1).Alias("price_with_tax"),
    polars.Col("length").Add(polars.Col("width")).Alias("length_plus_width"),
)

// Logical operations
qualified := df.Filter(
    polars.Col("age").Gt(18).And(polars.Col("score").Ge(80)),
)

// Chaining operations
result := df.
    Filter(polars.Col("age").Gt(18).And(polars.Col("score").Ge(80))).
    WithColumns(polars.Col("salary").MulValue(1.05).Alias("new_salary")).
    Select(polars.Col("name"), polars.Col("new_salary"))

// GroupBy operations
groupedData := df.GroupBy("department")
countResult := groupedData.Count()
avgSalary := groupedData.Mean("salary")

// Complex aggregations
stats := df.GroupBy("department").Agg(
    polars.Col("salary").Mean().Alias("avg_salary"),
    polars.Col("salary").Max().Alias("max_salary"),
    polars.Col("salary").Min().Alias("min_salary"),
    polars.Count().Alias("employee_count"),
)

🚀 Examples & Quick Start

Basic Example

Get started with simple DataFrame operations:

make run-basic-example

Expression Example

Run the full-featured example with complex operations:

make run-expressions-example

GroupBy Example

Run the GroupBy and aggregation operations demo:

make run-groupby-example

Available Make Commands

  • make local-build - Build the library from source (smart build)
  • make force-build - Force rebuild even if up to date
  • make quick-build - Smart build (only rebuilds if needed)
  • make run-basic-example - Run basic DataFrame demo
  • make run-expressions-example - Run expression operations demo
  • make run-groupby-example - Run GroupBy and aggregation demo
  • make run-all-examples - Run all examples

🧪 Testing

# Run all tests
make test

# Quick test run
make test-short

# Test specific functionality
make test-groupby

# Performance benchmarks
make test-bench

# Generate coverage report
make test-coverage

# View coverage in browser
make view-coverage

# Development cycle (quick build + short tests)
make dev

📋 To do

  • Join operations
  • Data type conversions: Cast()
  • Schema inspection
  • Null handling: IsNull(), IsNotNull(), FillNull()
  • Advanced Aggregations: Median(),...
  • Window functions
  • Pivot & Reshape options
  • Additional I/O Formats: ReadJSON(), WriteJSON(),...
  • When/Otherwise logic
  • Data Quality & Validation: IsEmpty(),...

🤝 Contributing

  1. Fork the repository
  2. Build locally: ./build.sh
  3. Test your changes: make test
  4. Submit a pull request

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.