Machine Learning Benchmarks contains implementations of machine learning algorithms across data analytics frameworks. Scikit-learn_bench can be extended to add new frameworks and algorithms. It currently supports the scikit-learn, daal4py, cuML, and XGBoost frameworks for commonly used machine learning algorithms.
We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis. Here are our latest blogs:
- Intel Gives Scikit-Learn the Performance Boost Data Scientists Need
- From Hours to Minutes: 600x Faster SVM
- Improve the Performance of XGBoost and LightGBM Inference
- Accelerate Kaggle Challenges Using Intel AI Analytics Toolkit
- Accelerate Your scikit-learn Applications
- Optimizing XGBoost Training Performance
- Accelerate Linear Models for Machine Learning
- Accelerate K-Means Clustering
- Fast Gradient Boosting Tree Inference
- Prerequisites
- How to create conda environment for benchmarking
- How to enable daal4py patching for scikit-learn benchmarks
- Running Python benchmarks with runner script
- Supported algorithms
- Algorithms parameters
Create a suitable conda environment for each framework you want to test. The commands below create an appropriate conda environment for each supported framework.
- scikit-learn: conda create -n bench -c intel python=3.7 scikit-learn daal4py pandas
- daal4py: conda create -n bench -c intel python=3.7 scikit-learn daal4py pandas
- cuml: conda create -n bench -c rapidsai -c conda-forge python=3.7 cuml pandas cudf
- xgboost: conda create -n bench -c conda-forge python=3.7 xgboost pandas
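Once an environment is created, activate it before running any benchmarks (the environment name bench matches the commands above):

conda activate bench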
To launch benchmarks, run:

python runner.py --configs configs/config_example.json [--output-file result.json --verbose INFO --report]
Runner options:
- configs: paths to configuration files
- no-intel-optimized: use the non-Intel-optimized version. Currently available for scikit-learn benchmarks only. The Intel-optimized version is the default.
- output-file: output file name for the benchmark results. Default is result.json.
- report: create an Excel report based on the benchmark results. Requires the openpyxl library.
- dummy-run: run the configuration parser and dataset generation without running the benchmarks.
- verbose: logging level (WARNING, INFO, DEBUG); controls how much information is printed while the benchmarks run. Default is INFO.
Level | Description |
---|---|
DEBUG | Detailed information, typically of interest only when diagnosing problems. At this level the logging output is usually so low-level that it is not useful to users who are not familiar with the software’s internals. |
INFO | Confirmation that things are working as expected. |
WARNING | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected. |
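For example, the following invocation combines the flags described above to write results to a custom file, generate an Excel report, and print debug-level output (the output file name is illustrative):

python runner.py --configs configs/config_example.json --output-file my_results.json --report --verbose DEBUG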
Benchmarks currently support the following frameworks:
- scikit-learn
- daal4py
- cuml
- xgboost
The benchmark configuration lets you select which frameworks to run, which datasets to measure, and the parameters of the algorithms.
You can configure benchmarks by editing a config file. Check the config.json schema for more details.
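For illustration, a minimal config sketch is shown below. The field names (common, cases, algorithm, dataset, and so on) and values are assumptions based on a typical case definition; treat config.json schema and configs/config_example.json as the authoritative reference.

```json
{
  "common": {
    "lib": ["sklearn"],
    "data-format": ["pandas"],
    "data-order": ["F"],
    "dtype": ["float64"]
  },
  "cases": [
    {
      "algorithm": "kmeans",
      "dataset": [
        {
          "source": "synthetic",
          "type": "blobs",
          "n_clusters": 10,
          "n_features": 50,
          "training": { "n_samples": 100000 }
        }
      ],
      "n-clusters": [10]
    }
  ]
}
```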
algorithm | benchmark name | sklearn | daal4py | cuml | xgboost |
---|---|---|---|---|---|
DBSCAN | dbscan | ✅ | ✅ | ✅ | ❌ |
RandomForestClassifier | df_clfs | ✅ | ✅ | ✅ | ❌ |
RandomForestRegressor | df_regr | ✅ | ✅ | ✅ | ❌ |
pairwise_distances | distances | ✅ | ✅ | ❌ | ❌ |
KMeans | kmeans | ✅ | ✅ | ✅ | ❌ |
KNeighborsClassifier | knn_clsf | ✅ | ❌ | ✅ | ❌ |
LinearRegression | linear | ✅ | ✅ | ✅ | ❌ |
LogisticRegression | log_reg | ✅ | ✅ | ✅ | ❌ |
PCA | pca | ✅ | ✅ | ✅ | ❌ |
Ridge | ridge | ✅ | ✅ | ✅ | ❌ |
SVM | svm | ✅ | ✅ | ✅ | ❌ |
train_test_split | train_test_split | ✅ | ❌ | ✅ | ❌ |
GradientBoostingClassifier | gbt | ❌ | ❌ | ❌ | ✅ |
GradientBoostingRegressor | gbt | ❌ | ❌ | ❌ | ✅ |
You can launch benchmarks for each algorithm separately. To do this, go to the directory with the benchmark:
cd <framework>
Run the following command:
python <benchmark_file> --dataset-name <path to the dataset> <other algorithm parameters>
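As a hypothetical example (the directory name sklearn and file name kmeans.py are assumptions; check the framework directory for the actual benchmark files), running the scikit-learn KMeans benchmark could look like:

cd sklearn
python kmeans.py --dataset-name <path to the dataset> <other algorithm parameters>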
You can find the list of supported parameters for each algorithm here: