daal4py - A Convenient Python API to the Intel(R) oneAPI Data Analytics Library

daal4py is a simplified Python API to the Intel(R) oneAPI Data Analytics Library, aimed at data scientists and machine learning users who want to start using the library quickly. It provides an abstraction over the Intel(R) oneAPI Data Analytics Library for direct use or for integration into your own framework, and it extends this with drop-in patching for scikit-learn.

Running full scikit-learn test suite with daal4py optimization patches:

when applied to scikit-learn from PyPi
when applied to build from master branch

👀 Follow us on Medium

We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis with the help of daal4py. Here are our latest blogs:

🔗 Important links

💬 Support

Report issues, ask questions, and provide suggestions using:

You may reach out to project maintainers privately at [email protected]

🛠 Installation

daal4py is available on the Python Package Index and on Anaconda Cloud, in both the Conda-Forge channel and the Intel channel.

# PyPi
pip install daal4py

# Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
conda install daal4py -c conda-forge

# Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python)
conda install daal4py -c intel
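
After installing, it can be handy to confirm that the packages are visible to the interpreter you actually run. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of daal4py.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Report on the packages used by the examples in this README.
for name in ("daal4py", "sklearn", "numpy"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `daal4py` is reported as missing, check that the `pip` or `conda` command above was run in the same environment as this interpreter.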

ℹ️ Supported configurations

📦 PyPi channel

OS / Python version	Python 3.6	Python 3.7	Python 3.8	Python 3.9
Linux	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]
Windows	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]
OsX	[CPU]	[CPU]	[CPU]	❌

📦 Anaconda Cloud: Conda-Forge channel

OS / Python version	Python 3.6	Python 3.7	Python 3.8	Python 3.9
Linux	[CPU]	[CPU]	[CPU]	[CPU]
Windows	[CPU]	[CPU]	[CPU]	[CPU]
OsX	❌	❌	❌	❌

📦 Anaconda Cloud: Intel channel

OS / Python version	Python 3.6	Python 3.7	Python 3.8	Python 3.9
Linux	❌	[CPU, GPU]	❌	❌
Windows	❌	[CPU, GPU]	❌	❌
OsX	❌	[CPU]	❌	❌

You can build daal4py from sources as well.

⚡️ Get Started

Accelerate scikit-learn with the core functionality of daal4py without changing the code.

Intel CPU optimizations patching

import numpy as np
from daal4py.sklearn import patch_sklearn
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
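
If the same script must also run on machines where daal4py is not installed, the patch call can be wrapped so that the code falls back to stock scikit-learn. This is a sketch under that assumption; `try_patch_sklearn` is a hypothetical helper, and only `patch_sklearn` comes from daal4py. As in the example above, it should run before the scikit-learn estimators are imported.

```python
def try_patch_sklearn() -> bool:
    """Apply the daal4py patches if daal4py is importable; return whether they were applied."""
    try:
        from daal4py.sklearn import patch_sklearn
    except ImportError:
        # daal4py (or scikit-learn itself) is not available: keep stock scikit-learn.
        return False
    patch_sklearn()
    return True


patched = try_patch_sklearn()
print("daal4py patches applied" if patched else "using stock scikit-learn")
```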

Intel CPU/GPU optimizations patching

import numpy as np
from daal4py.sklearn import patch_sklearn
from daal4py.oneapi import sycl_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

🚀 Scikit-learn patching

Speedups of daal4py-powered Scikit-learn over the original Scikit-learn

Technical details: float type: float64; HW: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.23.1, Intel® oneDAL (2021.1 Beta 10)

daal4py patching affects the performance of the specific Scikit-learn functionality listed below. When unsupported parameters are used, daal4py falls back to stock Scikit-learn. These limitations are described below. If the patching does not cover your scenarios, submit an issue on GitHub.

⚠️ We support optimizations for the last four versions of scikit-learn. The latest release of daal4py-2021.1 supports scikit-learn 0.21.X, 0.22.X, 0.23.X and 0.24.X.
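
The supported release series above can be checked programmatically before patching. The sketch below hard-codes the four series listed for daal4py-2021.1; the helper `is_supported_sklearn` is hypothetical, and the check is a plain string comparison on the major and minor version, not something daal4py performs itself.

```python
# Release series listed above for daal4py-2021.1.
SUPPORTED_SERIES = {(0, 21), (0, 22), (0, 23), (0, 24)}


def is_supported_sklearn(version: str) -> bool:
    """Return True if a version string such as '0.23.1' belongs to a listed series."""
    parts = version.split(".")
    try:
        series = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Unparseable strings are treated as unsupported.
        return False
    return series in SUPPORTED_SERIES


for candidate in ("0.23.1", "0.20.4"):
    print(candidate, is_supported_sklearn(candidate))
```

In practice the argument would be `sklearn.__version__`.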

🔥 Applying the daal4py patch will impact the following existing scikit-learn algorithms:

Task	Functionality	Parameters support	Data support
Classification	SVC	All parameters except `kernel` = 'poly' and 'sigmoid'.	No limitations.
	RandomForestClassifier	All parameters except `warm_start` = True, `ccp_alpha` != 0 and `criterion` != 'gini'.	Multi-output and sparse data are not supported.
	KNeighborsClassifier	All parameters except `metric` != 'euclidean' or `minkowski` with `p` = 2.	Multi-output and sparse data is not supported.
	LogisticRegression / LogisticRegressionCV	All parameters except `solver` != 'lbfgs' or 'newton-cg', `class_weight` != None, `sample_weight` != None.	Only dense data is supported.
Regression	RandomForestRegressor	All parameters except `warm_start` = True, `ccp_alpha` != 0 and `criterion` != 'mse'.	Multi-output and sparse data are not supported.
	KNeighborsRegressor	All parameters except `metric` != 'euclidean' or `minkowski` with `p` = 2.	Sparse data is not supported.
	LinearRegression	All parameters except `normalize` != False and `sample_weight` != None.	Only dense data is supported, `#observations` should be >= `#features`.
	Ridge	All parameters except `normalize` != False, `solver` != 'auto' and `sample_weight` != None.	Only dense data is supported, `#observations` should be >= `#features`.
	ElasticNet	All parameters except `sample_weight` != None.	Multi-output and sparse data is not supported, `#observations` should be >= `#features`.
	Lasso	All parameters except `sample_weight` != None.	Multi-output and sparse data is not supported, `#observations` should be >= `#features`.
Clustering	KMeans	All parameters except `precompute_distances` and `sample_weight` != None.	No limitations.
	DBSCAN	All parameters except `metric` != 'euclidean' or `minkowski` with `p` = 2.	Only dense data is supported.
Dimensionality reduction	PCA	All parameters except `svd_solver` != 'full'.	No limitations.
	TSNE	All parameters except `metric` != 'euclidean' or `minkowski` with `p` = 2.	Sparse data is not supported.
Unsupervised	NearestNeighbors	All parameters except `metric` != 'euclidean' or `minkowski` with `p` = 2.	Sparse data is not supported.
Other	train_test_split	All parameters are supported.	Only dense data is supported.
	assert_all_finite	All parameters are supported.	Only dense data is supported.
	pairwise_distances	Only `metric` = 'cosine' and 'correlation' are supported.	Only dense data is supported.
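
Several rows in the table (KNeighborsClassifier, KNeighborsRegressor, DBSCAN, TSNE, NearestNeighbors) share the same metric restriction: the optimized path applies only when the distance is Euclidean, expressed either as `metric='euclidean'` or as `metric='minkowski'` with `p=2`. The sketch below spells out that reading of the table; `is_euclidean_metric` is a hypothetical helper, not daal4py's actual dispatch logic.

```python
def is_euclidean_metric(metric: str = "euclidean", p: float = 2) -> bool:
    """Return True if (metric, p) describes Euclidean distance as the table requires."""
    if metric == "euclidean":
        return True
    # Minkowski distance with p = 2 is the Euclidean distance.
    return metric == "minkowski" and p == 2


print(is_euclidean_metric("minkowski", p=2))
print(is_euclidean_metric("manhattan"))
```

Any other combination, for example `metric='minkowski'` with `p=1`, would be expected to fall back to stock Scikit-learn.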

Scenarios that are only available in the master branch (not released yet):

Task	Functionality	Parameters support	Data support
Other	roc_auc_score	Parameters `average`, `sample_weight`, `max_fpr` and `multi_class` are not supported.	No limitations.

📜 scikit-learn verbose

To find out which implementation of the algorithm is currently used (daal4py or stock Scikit-learn), set the environment variable:

On Linux and Mac OS: export IDP_SKLEARN_VERBOSE=INFO
On Windows: set IDP_SKLEARN_VERBOSE=INFO

For example, for DBSCAN you get one of these print statements depending on which implementation is used:

INFO: sklearn.cluster.DBSCAN.fit: uses Intel(R) oneAPI Data Analytics Library solver
INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver
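
The variable can also be set from inside a Python script, and the resulting messages can be classified by matching the two wordings shown above. This is a sketch that assumes the variable is honored when set in-process before scikit-learn is patched and used; `solver_from_log` is a hypothetical helper that relies only on the message text quoted in this section.

```python
import os

# Assumption: setting the variable here, before patching, has the same effect as `export`.
os.environ["IDP_SKLEARN_VERBOSE"] = "INFO"


def solver_from_log(line: str) -> str:
    """Classify a verbose log line as 'daal4py', 'scikit-learn', or 'unknown'."""
    if "uses Intel(R) oneAPI Data Analytics Library solver" in line:
        return "daal4py"
    if "uses original Scikit-learn solver" in line:
        return "scikit-learn"
    return "unknown"


example = "INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver"
print(solver_from_log(example))
```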
