Releases: ubc-provenance/PIDSMaker
PIDSMaker 2.1.1 - KDD'26 paper
State of the PIDSMaker repo at the time of paper acceptance to KDD'26 D&B.
Includes an improvement to Velox: removing x_is_tuple makes Velox perform better on average.
PIDSMaker 2.1.0
Release Notes
- Add datasets: Carbanak v2 and Atlas v2
- Add download script for datasets
- Fix Docker installation permission errors
- Fix wandb integration with new API key format by upgrading to [wandb 0.24.1](b69ad97)
- Refine MAGIC architecture to more closely match the original paper
- Minor code formatting improvements
PIDSMaker 2.0.0
Changelog
Support for FIVEDIRECTIONS and TRACE datasets (E3/E5)
These new datasets can now be used in PIDSMaker.
Best hyperparameters
We provide the hyperparameters we found through grid-search tuning for the main PIDSs. Due to instability, these do not guarantee good results, but they give users a starting point for their own hyperparameter tuning.
Pipeline stages simplification (comes with breaking changes)
Stages were previously addressed as tasks and subtasks (e.g., detection.gnn_training). We have removed prefixes such as detection, evaluation, and preprocessing, as they brought no value to the framework.
The overall pipeline has been simplified to:
config/
├── orthrus.yml
├── kairos.yml
└── ...
pidsmaker/
├── main.py
├── config/
│ ├── config.py
│ └── pipeline.py
├── tasks/
│ ├── construction.py
│ ├── transformation.py
│ ├── featurization.py
│ ├── batching.py
│ ├── training.py
│ ├── evaluation.py
│ └── triage.py

Renaming of argument paths:
| Before | After |
|---|---|
| preprocessing.build_graphs | construction |
| preprocessing.transformation | transformation |
| featurization.feat_training | featurization |
| detection.graph_preprocessing | batching |
| detection.gnn_training | training |
| detection.evaluation | evaluation |
| detection.triage | triage |
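For configs or command-line overrides written before 2.0.0, the renaming in the table above can be applied mechanically. The following is a minimal sketch and not part of PIDSMaker: the dictionary mirrors the table, while the helper `migrate_arg_path` and the nested argument name `lr` are hypothetical and used only for illustration.

```python
# Old task.subtask prefixes mapped to the new flat stage names
# (copied from the renaming table in these release notes).
RENAMED_PREFIXES = {
    "preprocessing.build_graphs": "construction",
    "preprocessing.transformation": "transformation",
    "featurization.feat_training": "featurization",
    "detection.graph_preprocessing": "batching",
    "detection.gnn_training": "training",
    "detection.evaluation": "evaluation",
    "detection.triage": "triage",
}


def migrate_arg_path(path: str) -> str:
    """Rewrite a pre-2.0.0 argument path to its 2.0.0 equivalent.

    Paths that do not start with a renamed prefix are returned unchanged.
    (Hypothetical helper, not a PIDSMaker function.)
    """
    for old, new in RENAMED_PREFIXES.items():
        # Match the prefix itself, or the prefix followed by a nested key.
        if path == old or path.startswith(old + "."):
            return new + path[len(old):]
    return path


if __name__ == "__main__":
    # "lr" is an assumed nested argument name, for illustration only.
    print(migrate_arg_path("detection.gnn_training.lr"))    # training.lr
    print(migrate_arg_path("featurization.feat_training"))  # featurization
```

Matching on the full old prefix (rather than on the first path component) keeps already-migrated paths such as `featurization.<something>` untouched.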
Docstrings
Some documentation has been added within the code for better understanding.
Docs
Added details to the docs, notably on the pipeline, provenance basics, and instability.
Also changed the logo.
Fixes
We fixed a parsing error (#28) and an important error on the OpTC datasets (#22).
PIDSMaker 1.0.1
Changelog
Fix non-determinism in graph construction and in word2vec (used by Orthrus and Velox)
Before this release, running the same system twice with the exact same config could lead to different results.
The first reason was that build_default_graphs sorted edges by timestamp, and a few edges shared the same timestamp while differing in their attributes.
Sorting with such ties is non-deterministic, so some edges could be swapped between runs, leading to radical changes in accuracy after multiple epochs of training. This kind of instability is the most important limitation of current architectures and should be addressed in future research.
Regarding word2vec, we were using num_workers>1 before this release, which led to non-determinism due to multiprocessing orchestration. Setting only one worker makes the embedding training deterministic.
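The tie-breaking issue described above can be illustrated with a small, self-contained sketch. It does not reproduce build_default_graphs; the edge tuples and both helper functions are hypothetical, and the tie-breaking key PIDSMaker actually uses is not stated in these notes. The point is only that a sort key with ties leaves the final order dependent on how the edges arrived, whereas a key that totally orders the edges does not.

```python
# Hypothetical edges: (timestamp, source, destination, edge_type).
# The first three share a timestamp but differ in their attributes.
EDGES = [
    (100, "proc_a", "file_x", "write"),
    (100, "proc_b", "file_y", "read"),
    (100, "proc_a", "sock_z", "connect"),
    (101, "proc_c", "file_x", "read"),
]


def sort_by_timestamp_only(edges):
    # Tied edges keep whatever relative order the input had, so the
    # result depends on the order in which the edges were loaded.
    return sorted(edges, key=lambda e: e[0])


def sort_with_tie_breakers(edges):
    # Using the full tuple as the key defines a total order on distinct
    # edges, so the result no longer depends on the input order.
    return sorted(edges)


if __name__ == "__main__":
    # Simulate the same edges arriving in a different order.
    reordered = EDGES[::-1]

    # Timestamp-only sorting gives different sequences for the two inputs.
    print(sort_by_timestamp_only(EDGES) == sort_by_timestamp_only(reordered))  # False

    # Sorting on the full tuple gives the same sequence for both.
    print(sort_with_tie_breakers(EDGES) == sort_with_tie_breakers(reordered))  # True
```

Python's `sorted` is stable, so here the variation comes from the input order; with an unstable sorting algorithm, tied elements may be reordered even for identical input.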
Add support for the reapr labels for the E3-CADETS and E3-THEIA datasets
This release adds the reapr labels.
On the E3-CADETS and E3-THEIA datasets, these labels can now be used instead of those from Orthrus.
Support for installation with Apptainer
Installation is now also possible with Apptainer (formerly Singularity) instead of Docker.
Add dataset preprocessing scripts
We now provide the scripts that preprocess the raw DARPA files into the postgres databases, for transparency and to support the integration of new datasets.
Remove tuned configurations
The --tuned yml files for existing PIDSs were obsolete due to the instability and non-determinism present in the framework. We removed them to make it clearer that users must run their own hyperparameter tuning to get an optimal model.
README update
We added more concrete examples to the README of how to use PIDSMaker to update existing systems.