A simplified API to Intel(R) oneAPI Data Analytics Library that lets data scientists and machine learning users get started with the framework quickly. It provides an abstraction over Intel(R) oneAPI Data Analytics Library for direct use or for integration into your own framework, and extends this with drop-in patching for scikit-learn.
Running full scikit-learn test suite with daal4py optimization patches:
We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis with the help of daal4py. Here are our latest blogs:
- From Hours to Minutes: 600x Faster SVM
- Improve the Performance of XGBoost and LightGBM Inference
- Accelerate Kaggle Challenges Using Intel AI Analytics Toolkit
- Accelerate Your scikit-learn Applications
- Accelerate Linear Models for Machine Learning
- Accelerate K-Means Clustering
- Documentation
- scikit-learn API and patching
- Building from Sources
- About Intel(R) oneAPI Data Analytics Library
Report issues, ask questions, and provide suggestions using:
You may reach out to project maintainers privately at [email protected]
daal4py is available from the Python Package Index and from Anaconda Cloud in the Conda-Forge and Intel channels.
# PyPI
pip install daal4py
# Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
conda install daal4py -c conda-forge
# Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python)
conda install daal4py -c intel
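To confirm that one of the install commands above took effect, a small check like the following may help. It is a minimal sketch that relies only on the standard library's `importlib.metadata` (Python 3.8 or newer); the helper name `installed_version` is hypothetical, and it assumes the package registers its metadata under the name `daal4py`.

```python
from importlib import metadata


def installed_version(package: str = "daal4py"):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is an illustrative helper, not part of daal4py.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("daal4py is not installed in this environment")
    else:
        print(f"daal4py {version} is installed")
```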
ℹ️ Supported configurations
OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
---|---|---|---|---|
Linux | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] |
Windows | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] |
OsX | [CPU] | [CPU] | [CPU] | ❌ |
OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
---|---|---|---|---|
Linux | [CPU] | [CPU] | [CPU] | [CPU] |
Windows | [CPU] | [CPU] | [CPU] | [CPU] |
OsX | ❌ | ❌ | ❌ | ❌ |
OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
---|---|---|---|---|
Linux | ❌ | [CPU, GPU] | ❌ | ❌ |
Windows | ❌ | [CPU, GPU] | ❌ | ❌ |
OsX | ❌ | [CPU] | ❌ | ❌ |
You can build daal4py from sources as well.
Accelerate scikit-learn with the core functionality of daal4py without changing the code.
Intel CPU optimizations patching
import numpy as np
from daal4py.sklearn import patch_sklearn
patch_sklearn()
from sklearn.cluster import DBSCAN
X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
Intel CPU/GPU optimizations patching
import numpy as np
from daal4py.sklearn import patch_sklearn
from daal4py.oneapi import sycl_context
patch_sklearn()
from sklearn.cluster import DBSCAN
X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
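If the same script also has to run on machines where daal4py is not installed, the patch call can be made optional. The sketch below is one possible pattern, not something the project prescribes; `try_patch_sklearn` is a hypothetical helper name, and only `patch_sklearn` from the examples above is assumed to exist.

```python
def try_patch_sklearn() -> bool:
    """Apply the daal4py patches when daal4py is importable.

    Returns True if the patch was applied, False if daal4py is missing
    and stock scikit-learn will be used instead.
    (Hypothetical helper for illustration; not part of daal4py.)
    """
    try:
        from daal4py.sklearn import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


if __name__ == "__main__":
    patched = try_patch_sklearn()
    print("daal4py patches applied" if patched else "using stock scikit-learn")
```

As in the examples above, call the helper before importing estimators from `sklearn`.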
daal4py patching affects the performance of the specific scikit-learn functionality listed below. When unsupported parameters are used, daal4py falls back to stock scikit-learn. These limitations are described below. If the patching does not cover your scenarios, submit an issue on GitHub.
🔥 Applying the daal4py patch will impact the following existing scikit-learn algorithms:
Task | Functionality | Parameters support | Data support |
---|---|---|---|
Classification | SVC | All parameters except kernel = 'poly' and 'sigmoid'. | No limitations. |
Classification | RandomForestClassifier | All parameters except warm_start = True and ccp_alpha != 0, criterion != 'gini'. | Multi-output and sparse data are not supported. |
Classification | KNeighborsClassifier | All parameters except metric != 'euclidean' or minkowski with p = 2. | Multi-output and sparse data are not supported. |
Classification | LogisticRegression / LogisticRegressionCV | All parameters except solver != 'lbfgs' or 'newton-cg', class_weight != None, sample_weight != None. | Only dense data is supported. |
Regression | RandomForestRegressor | All parameters except warm_start = True and ccp_alpha != 0, criterion != 'mse'. | Multi-output and sparse data are not supported. |
Regression | KNeighborsRegressor | All parameters except metric != 'euclidean' or minkowski with p = 2. | Sparse data is not supported. |
Regression | LinearRegression | All parameters except normalize != False and sample_weight != None. | Only dense data is supported, #observations should be >= #features. |
Regression | Ridge | All parameters except normalize != False, solver != 'auto' and sample_weight != None. | Only dense data is supported, #observations should be >= #features. |
Regression | ElasticNet | All parameters except sample_weight != None. | Multi-output and sparse data are not supported, #observations should be >= #features. |
Regression | Lasso | All parameters except sample_weight != None. | Multi-output and sparse data are not supported, #observations should be >= #features. |
Clustering | KMeans | All parameters except precompute_distances and sample_weight != None. | No limitations. |
Clustering | DBSCAN | All parameters except metric != 'euclidean' or minkowski with p = 2. | Only dense data is supported. |
Dimensionality reduction | PCA | All parameters except svd_solver != 'full'. | No limitations. |
Dimensionality reduction | TSNE | All parameters except metric != 'euclidean' or minkowski with p = 2. | Sparse data is not supported. |
Unsupervised | NearestNeighbors | All parameters except metric != 'euclidean' or minkowski with p = 2. | Sparse data is not supported. |
Other | train_test_split | All parameters are supported. | Only dense data is supported. |
Other | assert_all_finite | All parameters are supported. | Only dense data is supported. |
Other | pairwise_distances | Only with metric = 'cosine' and 'correlation'. | Only dense data is supported. |
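To make the "Parameters support" column concrete, here is one reading of the DBSCAN row expressed as a predicate. This is only an illustrative sketch: `dbscan_is_accelerated` is a hypothetical helper, it is not how daal4py actually decides whether to dispatch, and it assumes the row means "the metric is 'euclidean', or 'minkowski' with p = 2, and the input is dense".

```python
from typing import Optional


def dbscan_is_accelerated(metric: str = "euclidean",
                          p: Optional[float] = None,
                          sparse_input: bool = False) -> bool:
    """Return True if the DBSCAN row of the table suggests the patched path applies.

    Hypothetical helper encoding one reading of the table; the real
    dispatch logic lives inside daal4py and may differ.
    """
    if sparse_input:
        # The table lists only dense data as supported for DBSCAN.
        return False
    if metric == "euclidean":
        return True
    # 'minkowski' is listed as supported only with p = 2.
    return metric == "minkowski" and p == 2


if __name__ == "__main__":
    print(dbscan_is_accelerated())                     # default euclidean metric
    print(dbscan_is_accelerated("minkowski", p=2))     # equivalent to euclidean
    print(dbscan_is_accelerated("manhattan"))          # falls back per the table
```

Under this reading, any other metric, or sparse input, would be handled by stock scikit-learn.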
Scenarios that are only available in the master branch (not released yet):
Task | Functionality | Parameters support | Data support |
---|---|---|---|
Other | roc_auc_score | Parameters average, sample_weight, max_fpr and multi_class are not supported. | No limitations. |
To find out which implementation of the algorithm is currently used (daal4py or stock Scikit-learn), set the environment variable:
- On Linux and Mac OS:
export IDP_SKLEARN_VERBOSE=INFO
- On Windows:
set IDP_SKLEARN_VERBOSE=INFO
For example, for DBSCAN you get one of these print statements depending on which implementation is used:
INFO: sklearn.cluster.DBSCAN.fit: uses Intel(R) oneAPI Data Analytics Library solver
INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver
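The same check can be scripted. The sketch below sets the variable from Python and classifies a captured log line; both helper names are hypothetical. It assumes the variable must be set before daal4py is imported for it to take effect, and that the messages keep exactly the format of the two example lines above.

```python
import os
import re

# Pattern derived from the two example lines above; the exact format is an assumption.
_SOLVER_LINE = re.compile(r"^INFO: (?P<target>\S+): uses (?P<solver>.+) solver$")

# Map the solver text in the message to a short label.
_SOLVER_LABELS = {
    "Intel(R) oneAPI Data Analytics Library": "daal4py",
    "original Scikit-learn": "stock",
}


def enable_verbose(level: str = "INFO") -> None:
    """Set IDP_SKLEARN_VERBOSE for this process (assumed to be read at import time)."""
    os.environ["IDP_SKLEARN_VERBOSE"] = level


def which_implementation(line: str):
    """Return (target, 'daal4py' | 'stock') for a verbose line, or None if unrecognized."""
    match = _SOLVER_LINE.match(line.strip())
    if match is None:
        return None
    label = _SOLVER_LABELS.get(match.group("solver"))
    if label is None:
        return None
    return match.group("target"), label


if __name__ == "__main__":
    enable_verbose()
    sample = "INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver"
    print(which_implementation(sample))
```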