Commit e2f7924

Merge pull request #613 from yzhao062/development
V2.0.3
2 parents 64c7f05 + b919bac commit e2f7924

53 files changed: 944 additions, 74 deletions

Diff for: CHANGES.txt

+3-1
@@ -196,4 +196,6 @@ v<2.0.2>, <07/01/2024> -- Add AE1SVM.
196196
v<2.0.2>, <07/04/2024> -- Moving from TF to Torch -- reimplement ALAD.
197197
v<2.0.2>, <07/04/2024> -- Moving from TF to Torch -- reimplement anogan.
198198
v<2.0.2>, <07/06/2024> -- Complete of removing all Tensorflow and Keras code.
199-
v<2.0.2>, <07/21/2024> -- Add DevNet.
199+
v<2.0.2>, <07/21/2024> -- Add DevNet.
200+
v<2.0.3>, <09/06/2024> -- Add Reject Option in Unsupervised Anomaly Detection (#605).
201+
v<2.0.3>, <12/20/2024> -- Massive documentation polish.

Diff for: README.rst

+37-10
@@ -1,5 +1,5 @@
1-
Python Outlier Detection (PyOD)
2-
===============================
1+
Python Outlier Detection (PyOD) V2
2+
==================================
33

44
**Deployment & Documentation & Stats & License**
55

@@ -57,26 +57,34 @@ Python Outlier Detection (PyOD)
5757
Read Me First
5858
^^^^^^^^^^^^^
5959

60-
Welcome to PyOD, a comprehensive but easy-to-use Python library for detecting anomalies in multivariate data. Whether you're tackling a small-scale project or large datasets, PyOD offers a range of algorithms to suit your needs.
60+
Welcome to PyOD, a well-developed and easy-to-use Python library for detecting anomalies in multivariate data. Whether you are working with a small-scale project or large datasets, PyOD provides a range of algorithms to fit your needs.
6161

62-
* **For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
62+
**PyOD Version 2 is now available** (`Paper <https://www.arxiv.org/abs/2412.12154>`_) [#Chen2024PyOD]_, featuring:
6363

64-
* **For graph outlier detection**, please use `PyGOD <https://pygod.org/>`_.
64+
* **Expanded Deep Learning Support**: Integrates 12 modern neural models into a single PyTorch-based framework, bringing the total number of outlier detection methods to 45.
65+
* **Enhanced Performance and Ease of Use**: Models are optimized for efficiency and consistent performance across different datasets.
66+
* **LLM-based Model Selection**: Automated model selection guided by a large language model reduces manual tuning and assists users who may have limited experience with outlier detection.
6567

66-
* **Performance Comparison & Datasets**: We have a 45-page, comprehensive `anomaly detection benchmark paper <https://openreview.net/forum?id=foA_SFQ9zo0>`_. The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.
68+
**Additional Resources**:
6769

68-
* **Learn more about anomaly detection** at `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_
70+
* **NLP Anomaly Detection**: `NLP-ADBench <https://github.com/datamllab/tods>`_ provides both NLP anomaly detection datasets and algorithms
71+
* **Time-series Outlier Detection**: `TODS <https://github.com/datamllab/tods>`_
72+
* **Graph Outlier Detection**: `PyGOD <https://pygod.org/>`_
73+
* **Performance Comparison & Datasets**: Our 45-page `anomaly detection benchmark paper <https://openreview.net/forum?id=foA_SFQ9zo0>`_ and `ADBench <https://github.com/Minqi824/ADBench>`_, comparing 30 algorithms on 57 datasets
74+
* **PyOD on Distributed Systems**: `PyOD on Databricks <https://www.databricks.com/blog/2023/03/13/unsupervised-outlier-detection-databricks.html>`_
75+
* **Learn More**: `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_
6976

70-
* **PyOD on Distributed Systems**: you can also run `PyOD on databricks <https://www.databricks.com/blog/2023/03/13/unsupervised-outlier-detection-databricks.html>`_.
77+
**Check out our latest research in 2025 on LLM-based anomaly detection** [#Yang2024ad]_: `AD-LLM: Benchmarking Large Language Models for Anomaly Detection <https://arxiv.org/abs/2412.11142>`_.
7178

7279
----
7380

81+
7482
About PyOD
7583
^^^^^^^^^^
7684

7785
PyOD, established in 2017, has become a go-to **Python library** for **detecting anomalous/outlying objects** in multivariate data. This exciting yet challenging field is commonly referred to as `Outlier Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.
7886

79-
PyOD includes more than 50 detection algorithms, from classical LOF (SIGMOD 2000) to the cutting-edge ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and commercial products with more than `22 million downloads <https://pepy.tech/project/pyod>`_. It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including `Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_, `KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and `Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.
87+
PyOD includes more than 50 detection algorithms, from classical LOF (SIGMOD 2000) to the cutting-edge ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and commercial products with more than `26 million downloads <https://pepy.tech/project/pyod>`_. It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including `Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_, `KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and `Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.
8088

8189
**PyOD is featured for**:
8290

@@ -106,7 +114,18 @@ Alternatively, explore `MetaOD <https://github.com/yzhao062/MetaOD>`_ for a data
106114

107115
**Citing PyOD**:
108116

109-
`PyOD paper <http://www.jmlr.org/papers/volume20/19-011/19-011.pdf>`_ is published in `Journal of Machine Learning Research (JMLR) <http://www.jmlr.org/>`_ (MLOSS track). If you use PyOD in a scientific publication, we would appreciate citations to the following paper::
117+
If you use PyOD in a scientific publication, we would appreciate citations to the following paper(s):
118+
119+
`PyOD 2: A Python Library for Outlier Detection with LLM-powered Model Selection <https://arxiv.org/abs/2412.12154>`_ is available as a preprint. If you use PyOD in a scientific publication, we would appreciate citations to the following paper::
120+
121+
@article{zhao2024pyod2,
122+
author = {Chen, Sihan and Qian, Zhuangzhuang and Siu, Wingchun and Hu, Xingcan and Li, Jiaqi and Li, Shawn and Qin, Yuehan and Yang, Tiankai and Xiao, Zhuo and Ye, Wanghao and Zhang, Yichi and Dong, Yushun and Zhao, Yue},
123+
title = {PyOD 2: A Python Library for Outlier Detection with LLM-powered Model Selection},
124+
journal = {arXiv preprint arXiv:2412.12154},
125+
year = {2024}
126+
}
127+
128+
`PyOD paper <http://www.jmlr.org/papers/volume20/19-011/19-011.pdf>`_ is published in `Journal of Machine Learning Research (JMLR) <http://www.jmlr.org/>`_ (MLOSS track).::
110129

111130
@article{zhao2019pyod,
112131
author = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
@@ -123,6 +142,7 @@ or::
123142

124143
Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of machine learning research (JMLR), 20(96), pp.1-7.
125144

145+
126146
For a broader perspective on anomaly detection, see our NeurIPS papers `ADBench: Anomaly Detection Benchmark Paper <https://arxiv.org/abs/2206.09426>`_ and `ADGym: Design Choices for Deep Anomaly Detection <https://arxiv.org/abs/2309.15376>`_::
127147

128148
@article{han2022adbench,
@@ -211,6 +231,7 @@ The full API Reference is available at `PyOD Documentation <https://pyod.readthe
211231
* **predict(X)**: Determine whether a sample is an outlier or not as binary labels using the fitted detector.
212232
* **predict_proba(X)**: Estimate the probability of a sample being an outlier using the fitted detector.
213233
* **predict_confidence(X)**: Assess the model's confidence on a per-sample basis (applicable in predict and predict_proba) [#Perini2020Quantifying]_.
234+
* **predict_with_rejection(X)**\ : Allow the detector to reject (i.e., abstain from making) highly uncertain predictions (output = -2) [#Perini2023Rejection]_.
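
The abstention behaviour described for ``predict_with_rejection`` can be pictured with a toy sketch. This is not PyOD's algorithm (the cited Perini and Davis paper defines the actual method). The function name ``label_with_rejection`` and the fixed ``margin`` rule are hypothetical, introduced only for illustration. The only detail taken from the README is that a rejected prediction is reported as ``-2``.

```python
from typing import List, Sequence

REJECT = -2  # value the README says marks a rejected (abstained) prediction


def label_with_rejection(scores: Sequence[float], threshold: float,
                         margin: float) -> List[int]:
    """Toy rule (hypothetical): 1 = outlier, 0 = inlier, -2 = abstain.

    A score within `margin` of the decision threshold is treated as too
    uncertain to label. PyOD's real method is more principled than this.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    labels = []
    for score in scores:
        if abs(score - threshold) <= margin:
            labels.append(REJECT)   # too close to the boundary: abstain
        elif score > threshold:
            labels.append(1)        # clearly above the threshold: outlier
        else:
            labels.append(0)        # clearly below the threshold: inlier
    return labels


scores = [0.10, 0.45, 0.70, 0.90]
labels = label_with_rejection(scores, threshold=0.5, margin=0.1)
print(labels)
```

Downstream code then needs to handle three label values instead of two, for example by routing the ``-2`` samples to manual review.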
214235

215236
**Key Attributes of a fitted model**:
216237

@@ -517,6 +538,8 @@ Reference
517538
518539
.. [#Cook1977Detection] Cook, R.D., 1977. Detection of influential observation in linear regression. Technometrics, 19(1), pp.15-18.
519540
541+
.. [#Chen2024PyOD] Chen, S., Qian, Z., Siu, W., Hu, X., Li, J., Li, S., Qin, Y., Yang, T., Xiao, Z., Ye, W. and Zhang, Y., 2024. PyOD 2: A Python Library for Outlier Detection with LLM-powered Model Selection. arXiv preprint arXiv:2412.12154.
542+
520543
.. [#Fang2001Wrap] Fang, K.T. and Ma, C.X., 2001. Wrap-around L2-discrepancy of random sampling, Latin hypercube and uniform designs. Journal of complexity, 17(4), pp.608-624.
521544
522545
.. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\ , pp.59-63.
@@ -567,6 +590,8 @@ Reference
567590
568591
.. [#Perini2020Quantifying] Perini, L., Vercruyssen, V., Davis, J. Quantifying the confidence of anomaly detectors in their example-wise predictions. In *Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)*, 2020.
569592
593+
.. [#Perini2023Rejection] Perini, L., Davis, J. Unsupervised anomaly detection with rejection. In *Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems (NeurIPS)*, 2023.
594+
570595
.. [#Ramaswamy2000Efficient] Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. *ACM Sigmod Record*\ , 29(2), pp. 427-438.
571596
572597
.. [#Rousseeuw1999A] Rousseeuw, P.J. and Driessen, K.V., 1999. A fast algorithm for the minimum covariance determinant estimator. *Technometrics*\ , 41(3), pp.212-223.
@@ -587,6 +612,8 @@ Reference
587612
588613
.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.
589614
615+
.. [#Yang2024ad] Yang, T., Nian, Y., Li, S., Xu, R., Li, Y., Li, J., Xiao, Z., Hu, X., Rossi, R., Ding, K. and Hu, X., 2024. AD-LLM: Benchmarking Large Language Models for Anomaly Detection. arXiv preprint arXiv:2412.11142.
616+
590617
.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
591618
592619
.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.

Diff for: docs/conf.py

+4-3
@@ -50,8 +50,6 @@
5050
'sphinx.ext.imgmath',
5151
'sphinx.ext.viewcode',
5252
'sphinxcontrib.bibtex',
53-
# 'sphinx.ext.napoleon',
54-
# 'sphinx_rtd_theme',
5553
]
5654

5755
bibtex_bibfiles = ['zreferences.bib']
@@ -173,4 +171,7 @@
173171
# -- Options for intersphinx extension ---------------------------------------
174172

175173
# Example configuration for intersphinx: refer to the Python standard library.
176-
intersphinx_mapping = {'https://docs.python.org/': None}
174+
intersphinx_mapping = {
175+
'python': ('https://docs.python.org/3', None)
176+
}
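
The hunk above moves ``intersphinx_mapping`` from a URL-keyed entry to a named entry whose value is a ``(target, inventory)`` tuple. A minimal sketch of that shape difference follows. The helper ``uses_named_format`` is hypothetical and not part of Sphinx. It checks only the structure of the dictionary, not whether the inventory can be fetched.

```python
def uses_named_format(mapping: dict) -> bool:
    """Hypothetical check: every key is a short name (not a URL) and every
    value is a (target_url, inventory) 2-tuple, as in the new conf.py."""
    for name, value in mapping.items():
        # A URL used as the key is the shape being replaced in this commit.
        if not isinstance(name, str) or name.startswith(("http://", "https://")):
            return False
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        target, _inventory = value
        if not isinstance(target, str):
            return False
    return True


old_style = {'https://docs.python.org/': None}               # removed line
new_style = {'python': ('https://docs.python.org/3', None)}  # added lines

print(uses_named_format(old_style), uses_named_format(new_style))
```

With the named form, the key (``python`` here) identifies the external project and the ``None`` inventory leaves Sphinx to look up the inventory file at its default location under the target URL.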
177+
