Merge pull request #77 from pygod-team/dev

major refactor for 0.4.0
pygod-team · May 12, 2023 · 4d9b473
2 parents d72fec7 + 2334b7c
commit 4d9b473

Showing 122 changed files with 7,349 additions and 9,182 deletions.
diff --git a/.github/workflows/testing-cron.yml b/.github/workflows/testing-cron.yml
@@ -16,7 +16,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
-        python-version: ["3.7", "3.8", "3.9", "3.10"]
+        python-version: ["3.8", "3.9", "3.10"]
 
     steps:
     - uses: actions/checkout@v3
@@ -28,8 +28,9 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install -r requirements_ci.txt
-        pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cpu.html
+        pip install torch --index-url https://download.pytorch.org/whl/cpu
+        pip install torch_geometric
+        pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cpu.html
         pip install pytest
         pip install coverage
         pip install coveralls

diff --git a/.github/workflows/testing.yml b/.github/workflows/testing.yml
@@ -21,7 +21,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
-        python-version: ["3.7", "3.8", "3.9", "3.10"]
+        python-version: ["3.8", "3.9", "3.10"]
 
     steps:
     - uses: actions/checkout@v3
@@ -33,8 +33,9 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install -r requirements_ci.txt
-        pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cpu.html
+        pip install torch --index-url https://download.pytorch.org/whl/cpu
+        pip install torch_geometric
+        pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cpu.html
         pip install pytest
         pip install coverage
         pip install coveralls

diff --git a/.gitignore b/.gitignore
@@ -82,6 +82,8 @@ instance/
 # Sphinx documentation
 docs/_build/
 docs/tutorials/
+docs/html/
+generated/
 
 # PyBuilder
 .pybuilder/

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -3,7 +3,11 @@ version: 2
 sphinx:
   configuration: docs/conf.py
 
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
+
 python:
-  version: 3.8
   install:
     - requirements: docs/requirements.txt
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
@@ -1,7 +1,7 @@
 Contribute to PyGOD
 ===================
 
-This guide will tell how to contribute to pyGOD at the beginning stage.
+This guide will tell how to contribute to PyGOD at the beginning stage.
 This guide may change subject to the development process.
 
 
@@ -45,26 +45,8 @@ Development Environment
 
 To prevent the problems induced by inconsistent versions of dependencies, following requirements are suggested.
 
-- python>=3.6
-- torch>=1.10.1
-- torch_geometry>=2.0.3
+- python>=3.8
+- torch>=2.0.0
+- torch_geometry>=2.3.0
 
 Please follow the `installation guide <https://docs.pygod.org/en/latest/install.html>`_ for more details.
-
-
-Contributing New Models
------------------------
-
-To contribute a new model, simply
-
-1. Make a new file with the name of your model (say ``awesome-gnn.py``) within the directory ``pygod/models``.
-
-2. Populate it with your work, a minimal example file to demonstrate its effectiveness, such like `dominant example <https://docs.pygod.org/en/latest/tutorials/intro.html#sphx-glr-tutorials-intro-py>`_.
-
-3. Add a corresponding test file. See `test repo <https://github.com/pygod-team/pygod/tree/main/pygod/test>`_ for example.
-
-4. Run the entire test folder to make sure nothing is broken locally.
-
-5. Make a pull request once you are done to the **dev branch**. Brief explain your development.
-
-6. We will review your PR if the tests are successful :)
diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
 BSD 2-Clause License
 
-Copyright (c) 2021, pygod-team
+Copyright (c) 2023, pygod-team
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without

diff --git a/README.rst b/README.rst
diff --git a/benchmark/README.md b/benchmark/README.md
@@ -1,6 +1,6 @@
 # PyGOD Benchmark
 
-Official implementation of paper [BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs](https://arxiv.org/abs/2206.10071). Our datasets are publicly available in the [data repository](https://github.com/pygod-team/data). **Please star, watch, and fork us for the active updates!**
+Official implementation of paper [BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs](https://proceedings.neurips.cc/paper_files/paper/2022/hash/acc1ec4a9c780006c9aafd595104816b-Abstract-Datasets_and_Benchmarks.html). Our datasets are publicly available in the [data repository](https://github.com/pygod-team/data). **Please star, watch, and fork us for the active updates!**
 
 ## Usage
 
@@ -52,21 +52,23 @@ optional arguments:
 
 For DGraph, we are not able to load the dataset automatically, because of the authors' restrictions. To reproduce the results, the dataset is publicly available [here](https://dgraph.xinye.com/dataset), and we detect the outliers on the whole graph and evaluate only on the test set. As for the GPU memory consumption experiments, we use pytorch_memlab to measure the peak of the active bytes. See [pytorch_memlab](https://github.com/Stonesjtu/pytorch_memlab) for more details.
 
-## Citing us
+## Cite us
 
-Our [paper](https://arxiv.org/abs/2206.10071) is available on arxiv. If you use PyGOD in a scientific publication, we would appreciate citations to the following paper:
+Our [benchmark paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/acc1ec4a9c780006c9aafd595104816b-Abstract-Datasets_and_Benchmarks.html) is publicly available. If you use BOND in a scientific publication, we would appreciate citations to the following paper:
 
 ```
-@article{liu2022bond,
-  author  = {Liu, Kay and Dou, Yingtong and Zhao, Yue and Ding, Xueying and Hu, Xiyang and Zhang, Ruitong and Ding, Kaize and Chen, Canyu and Peng, Hao and Shu, Kai and Sun, Lichao and Li, Jundong and Chen, George H. and Jia, Zhihao and Yu, Philip S.},
-  title   = {BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs},
-  journal = {arXiv preprint arXiv:2206.10071},
-  year    = {2022},
-}
+    @article{liu2022bond,
+      title={Bond: Benchmarking unsupervised outlier node detection on static attributed graphs},
+      author={Liu, Kay and Dou, Yingtong and Zhao, Yue and Ding, Xueying and Hu, Xiyang and Zhang, Ruitong and Ding, Kaize and Chen, Canyu and Peng, Hao and Shu, Kai and Sun, Lichao and Li, Jundong and Chen, George H. and Jia, Zhihao and Yu, Philip S.},
+      journal={Advances in Neural Information Processing Systems},
+      volume={35},
+      pages={27021--27035},
+      year={2022}
+    }
 ```
 
 or:
 
 ```
-Liu, K., Dou, Y., Zhao, Y., Ding, X., Hu, X., Zhang, R., Ding, K., Chen, C., Peng, H., Shu, K., Sun, L., Li, J., Chen, G.H., Jia, Z., and Yu, P.S. 2022. BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs. arXiv preprint arXiv:2206.10071.
+Liu, K., Dou, Y., Zhao, Y., Ding, X., Hu, X., Zhang, R., Ding, K., Chen, C., Peng, H., Shu, K. and Sun, L., Li, J., Chen, G.H., Jia, Z., and Yu, P.S. 2022. Bond: Benchmarking unsupervised outlier node detection on static attributed graphs. Advances in Neural Information Processing Systems, 35, pp.27021-27035.
 ```
diff --git a/benchmark/main.py b/benchmark/main.py
@@ -2,7 +2,7 @@
 import torch
 import argparse
 import warnings
-from pygod.metrics import *
+from pygod.metric import *
 from pygod.utils.utility import load_data
 from utils import init_model
 
@@ -24,7 +24,7 @@ def main(args):
         y = data.y.bool()
         k = sum(y)
 
-        if np.isnan(score).any():
+        if torch.isnan(score).any():
             warnings.warn('contains NaN, skip one trial.')
             continue
 
@@ -35,11 +35,15 @@ def main(args):
     print(args.dataset + " " + model.__class__.__name__ + " " +
           "AUC: {:.4f}±{:.4f} ({:.4f})\t"
           "AP: {:.4f}±{:.4f} ({:.4f})\t"
-          "Recall: {:.4f}±{:.4f} ({:.4f})".format(np.mean(auc), np.std(auc),
-                                                  np.max(auc), np.mean(ap),
-                                                  np.std(ap), np.max(ap),
-                                                  np.mean(rec), np.std(rec),
-                                                  np.max(rec)))
+          "Recall: {:.4f}±{:.4f} ({:.4f})".format(torch.mean(auc),
+                                                  torch.std(auc),
+                                                  torch.max(auc),
+                                                  torch.mean(ap),
+                                                  torch.std(ap),
+                                                  torch.max(ap),
+                                                  torch.mean(rec),
+                                                  torch.std(rec),
+                                                  torch.max(rec)))
 
 
 if __name__ == '__main__':

diff --git a/benchmark/time.py b/benchmark/time.py
@@ -4,9 +4,8 @@
 import shutil
 import argparse
 import warnings
-import numpy as np
 from utils import init_model
-from pygod.metrics import eval_roc_auc
+from pygod.metric import eval_roc_auc
 from pygod.utils import load_data
 from torch_geometric.utils import remove_isolated_nodes
 
@@ -37,7 +36,7 @@ def main(args):
         y = data.y.bool()[mask]
         auc = eval_roc_auc(y, score)
 
-        if np.isnan(score).any():
+        if torch.isnan(score).any():
             warnings.warn('contains NaN, skip one trial.')
             continue
 

diff --git a/benchmark/type.py b/benchmark/type.py
@@ -2,7 +2,7 @@
 import torch
 import argparse
 import warnings
-from pygod.metrics import *
+from pygod.metric import *
 from pygod.utils import load_data
 from utils import init_model
 
@@ -27,7 +27,7 @@ def main(args):
         ys = data.y >> 1 & 1
         kc, ks = sum(yc), sum(ys)
 
-        if np.isnan(score).any():
+        if torch.isnan(score).any():
             warnings.warn('contains NaN, skip one trial.')
             continue
 
@@ -44,12 +44,12 @@ def main(args):
           "AP: {:.4f}±{:.4f} ({:.4f})\tRecall: {:.4f}±{:.4f} ({:.4f})\n"
           "Structural: AUC: {:.4f}±{:.4f} ({:.4f})\t"
           "AP: {:.4f}±{:.4f} ({:.4f})\tRecall: {:.4f}±{:.4f} ({:.4f})"
-          .format(np.mean(aucc), np.std(aucc), np.max(aucc),
-                  np.mean(apc), np.std(apc), np.max(apc),
-                  np.mean(recc), np.std(recc), np.max(recc),
-                  np.mean(aucs), np.std(aucs), np.max(aucs),
-                  np.mean(aps), np.std(aps), np.max(aps),
-                  np.mean(recs), np.std(recs), np.max(recs)))
+          .format(torch.mean(aucc), torch.std(aucc), torch.max(aucc),
+                  torch.mean(apc), torch.std(apc), torch.max(apc),
+                  torch.mean(recc), torch.std(recc), torch.max(recc),
+                  torch.mean(aucs), torch.std(aucs), torch.max(aucs),
+                  torch.mean(aps), torch.std(aps), torch.max(aps),
+                  torch.mean(recs), torch.std(recs), torch.max(recs)))
 
 
 if __name__ == '__main__':
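
Note on the `pygod.metrics` → `pygod.metric` rename seen in `benchmark/main.py`, `benchmark/time.py`, and `benchmark/type.py`: a script that has to run against both the old and the new layout could resolve the module at import time. The sketch below is one possible approach, not part of this commit; `import_first` is a hypothetical helper name, and it uses only the standard library's `importlib.import_module`.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that can be imported.

    Hypothetical helper (not part of PyGOD): tries each dotted module
    path in order and raises ImportError only if all of them fail.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


# Assumed usage in a benchmark script: prefer the new module path and
# fall back to the pre-0.4.0 one. Left commented out because it needs
# PyGOD to be installed.
# metric = import_first(["pygod.metric", "pygod.metrics"])
# eval_roc_auc = metric.eval_roc_auc
```

The order of the candidate list decides which layout wins when both are importable, so the new path is listed first.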

diff --git a/benchmark/utils.py b/benchmark/utils.py
@@ -1,6 +1,7 @@
 from random import choice
-from pygod.models import *
+from pygod.detector import *
 from pyod.models.lof import LOF
+from torch_geometric.nn import MLP
 from sklearn.ensemble import IsolationForest
 
 
@@ -103,14 +104,14 @@ def init_model(args):
                     batch_size=batch_size,
                     num_neigh=num_neigh)
     elif model_name == 'gcnae':
-        return GCNAE(hid_dim=choice(hid_dim),
-                     weight_decay=weight_decay,
-                     dropout=choice(dropout),
-                     lr=choice(lr),
-                     epoch=epoch,
-                     gpu=gpu,
-                     batch_size=batch_size,
-                     num_neigh=num_neigh)
+        return GAE(hid_dim=choice(hid_dim),
+                   weight_decay=weight_decay,
+                   dropout=choice(dropout),
+                   lr=choice(lr),
+                   epoch=epoch,
+                   gpu=gpu,
+                   batch_size=batch_size,
+                   num_neigh=num_neigh)
     elif model_name == 'guide':
         return GUIDE(a_hid=choice(hid_dim),
                      s_hid=choice([4, 5, 6]),
@@ -124,13 +125,14 @@ def init_model(args):
                      num_neigh=num_neigh,
                      cache_dir='./tmp')
     elif model_name == "mlpae":
-        return MLPAE(hid_dim=choice(hid_dim),
-                     weight_decay=weight_decay,
-                     dropout=choice(dropout),
-                     lr=choice(lr),
-                     epoch=epoch,
-                     gpu=gpu,
-                     batch_size=batch_size)
+        return GAE(hid_dim=choice(hid_dim),
+                   weight_decay=weight_decay,
+                   dropout=choice(dropout),
+                   lr=choice(lr),
+                   epoch=epoch,
+                   gpu=gpu,
+                   batch_size=batch_size,
+                   backbone=MLP)
     elif model_name == 'lof':
         return LOF()
     elif model_name == 'if':

diff --git a/docs/_templates/detector.rst b/docs/_templates/detector.rst
@@ -0,0 +1,15 @@
+.. role:: hidden
+    :class: hidden-section
+.. currentmodule:: {{ module }}
+
+{{ name | underline}}
+
+{% if objname == "ANOMALOUS" or objname == "ONE" or objname == "Radar" or objname == "SCAN"%}
+.. autoclass:: {{ name }}
+    :show-inheritance:
+    :members: fit, predict
+{% else %}
+.. autoclass:: {{ name }}
+    :show-inheritance:
+    :members: fit, predict, emb
+{% endif %}
diff --git a/docs/_templates/nn.rst b/docs/_templates/nn.rst
@@ -0,0 +1,9 @@
+.. role:: hidden
+    :class: hidden-section
+.. currentmodule:: {{ module }}
+
+{{ name | underline}}
+
+.. autoclass:: {{ name }}
+    :show-inheritance:
+    :members: forward, loss_func, process_graph
diff --git a/docs/api_cc.rst b/docs/api_cc.rst
@@ -1,37 +1,40 @@
 API CheatSheet
 ==============
 
-The following APIs are applicable for all detector models for easy use.
+The following APIs are applicable for all detectors for easy use.
 
-* :func:`pygod.models.base.BaseDetector.fit`: Fit detector. y is ignored in unsupervised methods.
-* :func:`pygod.models.base.BaseDetector.decision_function`: Predict raw anomaly scores of PyG Graph G using the fitted detector
+* :func:`pygod.detector.Detector.fit`: Fit detector.
+* :func:`pygod.detector.Detector.decision_function`: Predict raw anomaly scores of PyG data using the fitted detector
 
-Key Attributes of a fitted model:
+Key Attributes of a fitted detector:
 
-* :attr:`pygod.models.base.BaseDetector.decision_scores_`: The outlier scores of the training data. The higher, the more abnormal.
-  Outliers tend to have higher scores.
-* :attr:`pygod.models.base.BaseDetector.labels_`: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
+* :attr:`pygod.detector.Detector.decision_score_`: The outlier scores of the input data. Outliers tend to have higher scores.
+* :attr:`pygod.detector.Detector.label_`: The binary labels of the input data. 0 stands for inliers and 1 for outliers.
 
 For the inductive setting:
 
-* :func:`pygod.models.base.BaseDetector.predict`: Predict if a particular sample is an outlier or not using the fitted detector.
-* :func:`pygod.models.base.BaseDetector.predict_proba`: Predict the probability of a sample being outlier using the fitted detector.
-* :func:`pygod.models.base.BaseDetector.predict_confidence`: Predict the model's sample-wise confidence (available in predict and predict_proba).
-
+* :func:`pygod.detector.Detector.predict`: Predict if a particular sample is an outlier or not using the fitted detector.
 
 **Input of PyGOD**: Please pass in a `PyTorch Geometric (PyG) <https://www.pyg.org/>`_ data object.
 See `PyG data processing examples <https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#data-handling-of-graphs>`_.
 
-* :func:`pygod.models.base.BaseDetector.process_graph` (you do not need to call this explicitly): Process the raw PyG data object into a tuple of sub data objects needed for the underlying model.
-
 
-See base class definition below:
+Base Detector
+-------------
 
-pygod.models.base module
-------------------------
+``Detector`` is the abstract class for all detectors:
 
-.. automodule:: pygod.models.base
+.. autoclass:: pygod.detector.Detector
     :members:
     :undoc-members:
     :show-inheritance:
-    :inherited-members:
+    :inherited-members:
+
+Deep Detector
+-------------
+
+By inherit ``Detector`` class, we also provide base deep detector class for deep learning based detectors to ease the implementation.
+
+.. autoclass:: pygod.detector.DeepDetector
+    :members: emb, init_model, forward_model, process_graph
+    :undoc-members: fit, decision_function, predict
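
Note on the cheat sheet above: the contract it describes (`fit`, `decision_function`, `predict`, plus the fitted attributes `decision_score_` and `label_`) can be illustrated with a toy class. Everything below is an assumption made for illustration: `ToyDetector` is a hypothetical stand-in rather than PyGOD code, it takes a plain list of floats instead of a PyG data object, and the `contamination`-based threshold is a rule chosen for this sketch, not one confirmed by the diff.

```python
from typing import List, Sequence


class ToyDetector:
    """Hypothetical stand-in that mirrors the names in the cheat sheet.

    Not PyGOD code: the score is the absolute deviation from the mean
    of the fitted data, and the top `contamination` fraction of scores
    is labeled as outliers (an assumed rule for this sketch).
    """

    def __init__(self, contamination: float = 0.1) -> None:
        self.contamination = contamination

    def fit(self, values: Sequence[float]) -> "ToyDetector":
        self.mean_ = sum(values) / len(values)
        # Higher score means more abnormal, as in the cheat sheet.
        self.decision_score_ = self.decision_function(values)
        n_out = max(1, round(self.contamination * len(values)))
        self.threshold_ = sorted(self.decision_score_, reverse=True)[n_out - 1]
        # 0 stands for inliers and 1 for outliers.
        self.label_ = self.predict(values)
        return self

    def decision_function(self, values: Sequence[float]) -> List[float]:
        return [abs(v - self.mean_) for v in values]

    def predict(self, values: Sequence[float]) -> List[int]:
        scores = self.decision_function(values)
        return [int(s >= self.threshold_) for s in scores]


detector = ToyDetector(contamination=0.2).fit([1.0, 1.1, 0.9, 1.0, 9.0])
print(detector.label_)  # [0, 0, 0, 0, 1]
```

Calling `predict` on new values reuses the mean and threshold learned in `fit`, which is the sense in which the cheat sheet lists `predict` under the inductive setting.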