AutoThink

Throw any data, get a working model.

One-click AutoML for tabular data.
Auto-detects task type • Engineers features • Trains LightGBM + XGBoost + CatBoost • Optimizes blend weights
All in a single function call.

Quickstart

pip install autothink

import pandas as pd
from autothink import fit

df = pd.read_csv("train.csv")
model = fit(df, target="price")
predictions = model.predict(pd.read_csv("test.csv"))

That's it: three lines after the imports.

How It Works

Step	What happens
Task detection	Determines binary, multiclass, or regression from the target column
Data validation	Checks for leakage, class imbalance, and quality issues
Preprocessing	Handles missing values, one-hot / target-encodes categoricals, scales numerics
Feature engineering	Learns optimal split thresholds and feature interactions from data
Ensemble training	Trains LightGBM, XGBoost, and CatBoost with adaptive hyperparameters
Blend optimization	Finds optimal ensemble weights via scipy on out-of-fold predictions
Calibration	Platt scaling for well-calibrated probabilities
Verification	Post-training diagnostics: fold variance, leakage, feature importance
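The blend optimization step is described above only as "scipy on out-of-fold predictions". The following is a minimal sketch of one common way to do that, not AutoThink's actual implementation. It assumes a squared-error objective, the SLSQP solver, and non-negative weights that sum to one. The function name `optimize_blend_weights` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def optimize_blend_weights(oof_preds, y_true):
    """Return non-negative weights (summing to 1) that minimize blended MSE.

    oof_preds: array of shape (n_samples, n_models), one column of
               out-of-fold predictions per model.
    """
    n_models = oof_preds.shape[1]

    def loss(w):
        blended = oof_preds @ w
        return np.mean((y_true - blended) ** 2)

    start = np.full(n_models, 1.0 / n_models)  # start from an equal blend
    result = minimize(
        loss,
        start,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    )
    return result.x


# Synthetic stand-ins for three models with increasing noise (illustrative only).
rng = np.random.default_rng(0)
y = rng.normal(size=500)
oof = np.column_stack([y + rng.normal(scale=s, size=500) for s in (0.3, 0.6, 1.0)])

weights = optimize_blend_weights(oof, y)
blend_mse = np.mean((y - oof @ weights) ** 2)
best_single_mse = min(np.mean((y - oof[:, i]) ** 2) for i in range(oof.shape[1]))
```

Fitting the weights on out-of-fold predictions rather than in-sample predictions keeps the blend from rewarding whichever model overfits the training rows most.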

Installation

From PyPI:

pip install autothink

From source:

git clone https://github.com/ranausmanai/autothink.git
cd autothink
pip install -e .

With optional extras:

pip install autothink[dev]   # pytest
pip install autothink[api]   # FastAPI serving
pip install autothink[onnx]  # ONNX export

API Reference

`fit(df, target, **kwargs)`

One-line AutoML. Returns a fitted AutoThinkV4 instance.

Parameter	Type	Default	Description
`df`	`DataFrame`	required	Training data (features + target)
`target`	`str`	required	Name of the target column
`time_budget`	`int`	`600`	Maximum training time in seconds
`verbose`	`bool`	`True`	Log progress to console
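As a sketch of how the optional parameters in the table might be combined, the hypothetical helper below (`quick_fit` is not part of AutoThink) forwards a shorter time budget and a quiet flag to `fit`. It uses only the `time_budget` and `verbose` keyword arguments documented above and imports AutoThink lazily so the helper can be defined without the package installed.

```python
def quick_fit(df, target, seconds=120, quiet=True):
    """Hypothetical wrapper: train with a shorter budget and optional silence."""
    # Lazy import so this module loads even where autothink is not installed.
    from autothink import fit

    # time_budget and verbose are the keyword arguments listed in the table.
    return fit(df, target=target, time_budget=seconds, verbose=not quiet)
```

Calling `quick_fit(df, "price")` would then be equivalent to `fit(df, target="price", time_budget=120, verbose=False)`.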

`AutoThinkV4`

from autothink import AutoThinkV4

model = AutoThinkV4(time_budget=300, verbose=True)
model.fit(df, target_col="price")
preds = model.predict(test_df)

Attributes after fitting:

Attribute	Description
`model.cv_score`	Mean cross-validation score
`model.cv_std`	CV score standard deviation
`model.task_info`	Detected task type, metric, class info
`model.verification_report`	Post-training diagnostics
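A small sketch of reading these attributes after training. The helper name `summarize_fit` is hypothetical, and a stand-in object with placeholder values is used so the snippet runs without a trained model. The structure of `task_info` is not specified here, so it is printed as-is.

```python
from types import SimpleNamespace


def summarize_fit(model):
    """Format the documented post-fit attributes as a one-line summary."""
    return (
        f"CV score: {model.cv_score:.4f} +/- {model.cv_std:.4f} "
        f"| task info: {model.task_info}"
    )


# Stand-in with the documented attribute names; values are placeholders,
# not real results, and the task_info layout is an assumption.
stand_in = SimpleNamespace(cv_score=0.9, cv_std=0.01, task_info={"task": "binary"})
summary = summarize_fit(stand_in)
print(summary)
```

With a real fitted `AutoThinkV4` instance, `summarize_fit(model)` would be called in the same way.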

Logging

AutoThink uses Python's logging module. The library is silent by default.

import autothink
autothink.setup_logging()  # Enable INFO-level output to stderr

Alternatively, pass `verbose=True` (the default), which configures a console handler automatically.
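For finer control, the standard `logging` module can be configured directly. The sketch below assumes AutoThink's loggers live under the `"autothink"` namespace, which is the usual `logging.getLogger(__name__)` convention but is not confirmed here.

```python
import logging

# Assumption: the library's loggers are children of "autothink".
logger = logging.getLogger("autothink")
logger.setLevel(logging.WARNING)  # hide INFO progress, keep warnings and errors

handler = logging.StreamHandler()  # StreamHandler writes to stderr by default
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
)
logger.addHandler(handler)
```

If this is used, it is probably best paired with `verbose=False` so that a second console handler is not attached.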

Benchmarks

Benchmark run date: February 16, 2026.

Credible Matrix (3 seeds x 3 budgets)

Seeds: 42, 1337, 2025
Budgets: 10s, 30s, 60s
Total fits: 81 (3 datasets x 3 budgets x 3 seeds x 3 tools)
All runs completed: 81/81
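The run count follows directly from the matrix dimensions. A short sketch that enumerates it, using the dataset and tool names from the results tables (the variable names are illustrative):

```python
from itertools import product

datasets = ["Heart", "Loan", "House"]
budgets_s = [10, 30, 60]
seeds = [42, 1337, 2025]
tools = ["AutoThink V4", "FLAML", "AutoGluon"]

# One fit per (dataset, budget, seed, tool) combination.
runs = list(product(datasets, budgets_s, seeds, tools))

# One comparison cell per (dataset, budget) pair, aggregated over seeds.
cells = list(product(datasets, budgets_s))

print(len(runs), len(cells))
```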

Artifacts:

Raw runs: benchmarks/credible/benchmark_raw.csv
Aggregated summary: benchmarks/credible/benchmark_summary.csv
Full markdown report: benchmarks/credible/benchmark_report.md
Pareto chart (time vs quality): benchmarks/credible/pareto_by_budget.png

Winner Counts (by dataset-budget cell)

Tool	Wins (out of 9)
AutoThink V4	4
AutoGluon	3
FLAML	2

Detailed Means (+/-95% CI)

Budget	Dataset	AutoThink V4	FLAML	AutoGluon
10s	Heart (AUC ↑)	0.95299 +/- 0.00445 (9.32s)	0.95245 +/- 0.00596 (10.07s)	0.95245 +/- 0.00524 (7.95s)
10s	Loan (AUC ↑)	0.91236 +/- 0.01778 (16.30s)	0.90902 +/- 0.02191 (10.46s)	0.91165 +/- 0.01683 (10.49s)
10s	House (RMSE ↓)	31627.97 +/- 321.55 (11.39s)	30917.18 +/- 260.50 (10.26s)	30589.44 +/- 381.04 (6.57s)
30s	Heart (AUC ↑)	0.95299 +/- 0.00445 (9.71s)	0.95254 +/- 0.00518 (30.06s)	0.95245 +/- 0.00524 (8.78s)
30s	Loan (AUC ↑)	0.91236 +/- 0.01778 (15.18s)	0.91191 +/- 0.01884 (30.42s)	0.91165 +/- 0.01683 (11.23s)
30s	House (RMSE ↓)	31627.97 +/- 321.55 (11.49s)	30743.65 +/- 474.66 (30.99s)	30589.44 +/- 381.04 (6.76s)
60s	Heart (AUC ↑)	0.95299 +/- 0.00445 (11.20s)	0.95335 +/- 0.00478 (60.24s)	0.95245 +/- 0.00524 (10.02s)
60s	Loan (AUC ↑)	0.91236 +/- 0.01778 (14.46s)	0.91387 +/- 0.01779 (60.58s)	0.91165 +/- 0.01683 (10.97s)
60s	House (RMSE ↓)	31627.97 +/- 321.55 (11.64s)	30623.40 +/- 476.28 (61.98s)	30589.44 +/- 381.04 (6.76s)

Reproduce with: `python benchmark_matrix.py --budgets 10,30,60 --seeds 42,1337,2025 --outdir benchmarks/credible`
Note: AutoGluon ran without FastAI extras (`autogluon.tabular[fastai]`), so optional NN models were skipped.
Note: Some tools may slightly exceed nominal budget due to setup/cleanup and internal training loops.
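The winner counts can be re-derived from the means in the table above. The sketch below copies those means by hand and picks the best tool per dataset-budget cell (highest AUC, lowest RMSE). It compares means only and ignores the confidence intervals, many of which overlap, so a "win" here does not imply a statistically significant difference.

```python
from collections import Counter

TOOLS = ("AutoThink V4", "FLAML", "AutoGluon")

# (budget, dataset, higher_is_better, means in TOOLS order), from the table above.
ROWS = [
    ("10s", "Heart", True, (0.95299, 0.95245, 0.95245)),
    ("10s", "Loan", True, (0.91236, 0.90902, 0.91165)),
    ("10s", "House", False, (31627.97, 30917.18, 30589.44)),
    ("30s", "Heart", True, (0.95299, 0.95254, 0.95245)),
    ("30s", "Loan", True, (0.91236, 0.91191, 0.91165)),
    ("30s", "House", False, (31627.97, 30743.65, 30589.44)),
    ("60s", "Heart", True, (0.95299, 0.95335, 0.95245)),
    ("60s", "Loan", True, (0.91236, 0.91387, 0.91165)),
    ("60s", "House", False, (31627.97, 30623.40, 30589.44)),
]

wins = Counter()
for budget, dataset, higher_is_better, means in ROWS:
    pick = max if higher_is_better else min
    best = pick(range(len(TOOLS)), key=lambda i: means[i])
    wins[TOOLS[best]] += 1

print(dict(wins))
```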

Examples

See the examples/ directory:

Example	Description
`quickstart.py`	Minimal 15-line fit/predict on sklearn data
`kaggle_competition.py`	Full Kaggle pipeline with CLI and submission output
`benchmark.py`	Compare AutoThink against FLAML

Project Structure

autothink/
  __init__.py            # Public API: fit(), setup_logging()
  core/
    autothink_v4.py      # Main engine (TaskDetector, IntelligentEnsemble, AutoThinkV4)
    autothink_v3.py      # V3 engine (Kaggle-optimized)
    autothink_v2.py      # V2 engine (meta-learning)
    preprocessing.py     # IntelligentPreprocessor, FeatureEngineer
    feature_engineering_general.py  # Adaptive, data-driven feature engineering
    validation.py        # DataValidator, LeakageDetector
    meta_learning.py     # MetaLearningDB, dataset fingerprinting
    production.py        # ModelExporter, ModelCard, DriftDetector, APIGenerator
    advanced.py          # CausalAutoML, ExplanationEngine, SmartEnsemble
    kaggle_beast.py      # Competition-grade ensemble mode
    kaggle_fast.py       # Fast Kaggle mode
tests/                   # 25 tests (pytest)
examples/                # Quickstart, Kaggle, benchmark

Contributing

Contributions are welcome! Please open an issue or submit a PR.

# Development setup
git clone https://github.com/ranausmanai/autothink.git
cd autothink
pip install -e ".[dev]"
pytest tests/

License

Apache 2.0 — see LICENSE.

Built with scikit-learn, LightGBM, XGBoost, and CatBoost.
