Here we share examples for common use cases of TabArena, a benchmarking framework for tabular data.
You can use TabArena for several benchmarking tasks, such as benchmarking TabArena's models on new data (including private offline data) or benchmarking your own models on TabArena's data.
- Folder: benchmarking/
- Use Cases:
  - run_quickstart_tabarena.py - Reproduce running LightGBM and RealMLP on three datasets from TabArena.
  - custom_tabarena_model/ - Implement your own model for TabArena and benchmark it on TabArena-Lite.
  - run_get_tabarena_datasets_from_openml.py - Get the data used by TabArena from OpenML, without the TabArena framework.
  - run_quickstart_tabarena_custom_datasets.py - Benchmarking models with TabArena on your custom (private) datasets.
  - run_quickstart_tabarena_one_datasets.py - Benchmarking and evaluating models with TabArena on one dataset.
  - run_beta_tabpfn_end_to_end.py - An example of how to load and compare new results artifacts using TabArena.
All models in TabArena are open-source and can be used directly on your own data. They are implemented in production-ready code and can be easily integrated into your ML pipelines.
- Folder: running_tabarena_models/
- Use Cases:
  - run_default_model.py - Minimal example for running a default model (with cross-validation bagging).
  - run_tuned_ensemble_model.py - Minimal example for running a tuned (+ ensembled) model.
  - run_tabarena_realmlp.py - Simple example for running TabArena's version of RealMLP.
  - run_autogluon_on_openml_task.py - Minimal example for running AutoGluon on any OpenML task.
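
For readers unfamiliar with the term, the idea behind cross-validation bagging can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the implementation used by TabArena or by `run_default_model.py`: the "model" is a hypothetical one-feature least-squares line, the folds are assigned round-robin, and the helper names (`fit_line`, `predict`, `cv_bagging`) are made up for this example.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (toy model)."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to a list of inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def cv_bagging(xs, ys, test_xs, n_folds=3):
    """Train one model per fold and average the fold models' test predictions."""
    n = len(xs)
    oof = [None] * n  # out-of-fold predictions, one per training row
    test_preds = []
    for fold in range(n_folds):
        # Round-robin fold assignment (an assumption made for simplicity).
        val_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Each row is predicted only by the model that did not see it.
        for i, pred in zip(val_idx, predict(model, [xs[i] for i in val_idx])):
            oof[i] = pred
        test_preds.append(predict(model, test_xs))
    # The bagged prediction is the mean over all fold models.
    bagged = [fmean(column) for column in zip(*test_preds)]
    return oof, bagged


# Synthetic data generated from y = 2x + 1, purely for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
oof, bagged = cv_bagging(xs, ys, test_xs=[6.0, 7.0])
```

The out-of-fold predictions give a validation estimate that covers every training row, while the averaged fold models act as a small ensemble at prediction time.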
The metadata generated by prior TabArena experiments can be used for more than just comparing new models. You can inspect the rich metadata we collected for each dataset and use it for insightful studies or meta-learning.
- Folder: meta/
- Use Cases:
  - inspect_processed_data.py - Inspect the processed data from prior benchmarks.
  - inspect_raw_data.py - Inspect the raw data from prior benchmarks.
  - run_download_raw.py - Download all the raw data from prior benchmarks.
To inspect the results of TabArena benchmarks, we can use various plots and leaderboards. We share code to generate these visualizations from the raw benchmark results.
- Folder: plots/
- Use Cases:
  - run_generate_main_leaderboard.py - Generate the main leaderboard from TabArena results.
  - run_generate_paper_figures.py - Generate the figures used in the TabArena NeurIPS'2025 paper.
  - run_plot_pareto_over_tuning_time.py - Generate the plots showing trade-offs of predictive performance and efficiency.
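
To make the performance/efficiency trade-off concrete, the following sketch shows one way to pick out Pareto-optimal configurations from (time, error) pairs. It is not the logic of `run_plot_pareto_over_tuning_time.py`; the function name `pareto_front`, the tuple layout, and the numbers are all hypothetical and chosen only for illustration.

```python
def pareto_front(points):
    """Return the points not dominated in (time, error); lower is better for both."""
    front = []
    best_error = float("inf")
    # After sorting by time (then error), a point is on the front only if it
    # improves on the best error reached by any faster-or-equal point.
    for name, time_s, error in sorted(points, key=lambda p: (p[1], p[2])):
        if error < best_error:
            front.append((name, time_s, error))
            best_error = error
    return front


# Made-up values purely for illustration; these are not TabArena results.
points = [
    ("config_a", 10.0, 0.30),
    ("config_b", 60.0, 0.25),
    ("config_c", 90.0, 0.28),
    ("config_d", 600.0, 0.20),
]
front = pareto_front(points)
```

Here `config_c` is dropped because `config_b` is both faster and more accurate, while the other three each offer a trade-off that no other point beats on both axes.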
To locally reproduce individual configurations and compare with the TabArena results of
those configurations, refer to benchmarking/run_quickstart_tabarena.py.
To locally reproduce all tables and figures in the paper using the raw results data,
run plots/run_generate_paper_figures.py.
To locally generate the latest results leaderboard, run
plots/run_generate_main_leaderboard.py.
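
If you prefer to trigger these scripts from Python, a small wrapper along the following lines may help. It is a sketch under assumptions: the script paths are taken from this README, but the task keys, the helper names (`build_command`, `run_task`), and the idea that each script runs without arguments from the folder containing this README are assumptions, so check each script before relying on it.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as listed in this README; the dictionary keys are hypothetical labels.
REPRODUCTION_SCRIPTS = {
    "individual_configs": "benchmarking/run_quickstart_tabarena.py",
    "paper_figures": "plots/run_generate_paper_figures.py",
    "leaderboard": "plots/run_generate_main_leaderboard.py",
}


def build_command(task, examples_dir="."):
    """Build the command that runs the script for a reproduction task."""
    script = Path(examples_dir) / REPRODUCTION_SCRIPTS[task]
    # Use the current interpreter so the script runs in the same environment.
    return [sys.executable, str(script)]


def run_task(task, examples_dir="."):
    """Run the script; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(task, examples_dir), check=True)


# Only builds the command; call run_task("leaderboard") to execute it.
command = build_command("leaderboard")
```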