apriori-rs

An implementation of the Apriori algorithm for association rule mining, written in Rust 🦀 with Python bindings.

Installation

First, install Rust:

curl https://sh.rustup.rs -sSf | sh -s -- -y

Then install the package:

pip install git+https://github.com/remykarem/apriori-rs.git

To compile the module yourself (macOS):

cargo rustc --release -- -C link-arg=-undefined -C link-arg=dynamic_lookup && mv target/release/libapriori.dylib ./apriori.so

Usage

Generating frequent itemsets

Prepare the data as a list of sets of strings.

>>> from apriori import generate_frequent_itemsets

>>> transactions = [
...    set(["bread", "milk", "cheese"]),
...    set(["bread", "milk"]),
...    set(["milk", "cheese", "bread"]),
...    set(["milk", "cheese", "bread"]),
...    set(["milk", "cheese", "yoghurt"]),
...    set(["milk", "bread"])]

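Then generate the frequent itemsets. The first return value maps each itemset length to the itemsets of that length (as frozensets of item IDs) and the number of transactions containing them; the second maps item IDs back to item names.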

>>> itemsets, id2item = generate_frequent_itemsets(transactions, min_support=0.5, max_length=3)

>>> itemsets[1]
{frozenset({2}): 4, frozenset({0}): 5, frozenset({1}): 6}

>>> itemsets[2]
{frozenset({0, 1}): 5, frozenset({1, 2}): 4, frozenset({0, 2}): 3}

>>> itemsets[3]
{frozenset({0, 1, 2}): 3}
 
>>> id2item
{2: 'cheese', 0: 'bread', 3: 'yoghurt', 1: 'milk'}
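
Because the itemsets are keyed by item ID, it is often handy to translate them back to item names. The following is a minimal sketch, assuming the return values behave like the plain dictionaries printed above; `decode_itemsets` is a hypothetical helper and not part of the package.

```python
# Values copied from the example output above.
itemsets = {
    1: {frozenset({2}): 4, frozenset({0}): 5, frozenset({1}): 6},
    2: {frozenset({0, 1}): 5, frozenset({1, 2}): 4, frozenset({0, 2}): 3},
    3: {frozenset({0, 1, 2}): 3},
}
id2item = {2: "cheese", 0: "bread", 3: "yoghurt", 1: "milk"}


def decode_itemsets(itemsets, id2item):
    """Hypothetical helper: replace item IDs with item names at every length."""
    return {
        length: {
            frozenset(id2item[i] for i in ids): count
            for ids, count in by_ids.items()
        }
        for length, by_ids in itemsets.items()
    }


named = decode_itemsets(itemsets, id2item)
print(named[2][frozenset({"bread", "milk"})])  # 5
```

Note that "yoghurt" appears in id2item but in none of the itemsets: it occurs in only one of the six transactions, which is below min_support=0.5.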

Use generate_frequent_itemsets_id if your items are indices.

Association rules

>>> from apriori import apriori  # assumed to be exported by the same module
>>> rules, counts = apriori(
...     transactions, 
...     min_support=0.3, 
...     min_confidence=0.2, 
...     max_length=3)
>>> rules
[{"cheese", "bread"} -> {"milk"},
 {"cheese"} -> {"milk"},
 {"bread"} -> {"milk"},
 {"milk"} -> {"bread"},
 {"milk", "cheese"} -> {"bread"},
 {"cheese"} -> {"bread", "milk"},
 {"cheese"} -> {"bread"},
 {"milk"} -> {"cheese"},
 {"bread", "milk"} -> {"cheese"},
 {"bread"} -> {"milk", "cheese"},
 {"bread"} -> {"cheese"},
 {"milk"} -> {"cheese", "bread"}]

Each rule exposes its confidence and lift as attributes.

>>> rules[0]
{"bread", "cheese"} -> {"milk"}

>>> rules[0].confidence
1.0

>>> rules[0].lift
1.0
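
To see where these numbers come from, the sketch below recomputes confidence and lift for the same rule in plain Python, using the standard definitions: confidence(A -> B) = support(A ∪ B) / support(A) and lift(A -> B) = confidence(A -> B) / support(B). The helper functions are illustrative only and are not part of the package.

```python
transactions = [
    {"bread", "milk", "cheese"},
    {"bread", "milk"},
    {"milk", "cheese", "bread"},
    {"milk", "cheese", "bread"},
    {"milk", "cheese", "yoghurt"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B)."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


print(confidence({"bread", "cheese"}, {"milk"}, transactions))  # 1.0
print(lift({"bread", "cheese"}, {"milk"}, transactions))        # 1.0
```

The lift is exactly 1.0 because milk appears in every transaction, so knowing that a basket contains bread and cheese does not make milk any more likely than it already is.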

Benchmarks

Time taken (in seconds) to generate frequent itemsets from the Online Retail II dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/00502/) for a given minimum support and maximum itemset length.

| Min support | Max length | apriori-rs | efficient-apriori | mlxtend | apyori |
|---|---|---|---|---|---|
| 0.100 | 1 | 0.2s | 0.1s | 0.1s | 0.29s |
| 0.100 | 2 | 0.2s | 0.1s | 0.1s | 0.26s |
| 0.100 | 3 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.100 | 4 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.100 | 5 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.050 | 1 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.050 | 2 | 0.2s | 0.2s | 0.1s | 0.25s |
| 0.050 | 3 | 0.2s | 0.2s | 0.1s | 0.25s |
| 0.050 | 4 | 0.2s | 0.2s | 0.1s | 0.25s |
| 0.050 | 5 | 0.2s | 0.2s | 0.2s | 0.25s |
| 0.010 | 1 | 0.2s | 0.1s | 0.1s | 0.32s |
| 0.010 | 2 | 16s | 261s | 73s | 2.1s |
| 0.010 | 3 | 15s | 272s | 79s | 2.3s |
| 0.010 | 4 | 17s | 284s | 78s | 2.4s |
| 0.010 | 5 | 14s | 279s | 92s | 2.4s |
| 0.005 | 1 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.005 | 2 | 76s | 1190s | 327s | 5.7s |
| 0.005 | 3 | 68s | 1278s | 643s | 20s |
| 0.005 | 4 | 81s | 1168s | 638s | 39s |
| 0.005 | 5 | 70s | 1217s | 643s | 41s |

Benchmarks were run on macOS Big Sur (11.6) with a 2.7 GHz quad-core Intel Core i7 and Python 3.8.11.
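
Timings like these could be collected with a small harness along the following lines. This is only a sketch and not the script used to produce the table: `time_once` is a hypothetical helper, and `count_items` is a stand-in workload that would be replaced by a call such as generate_frequent_itemsets(transactions, min_support=..., max_length=...).

```python
import time


def time_once(fn, *args, **kwargs):
    """Hypothetical helper: run `fn` once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def count_items(transactions):
    """Stand-in workload: count how many transactions contain each item."""
    counts = {}
    for transaction in transactions:
        for item in transaction:
            counts[item] = counts.get(item, 0) + 1
    return counts


transactions = [{"bread", "milk"}, {"milk", "cheese"}] * 1000
result, seconds = time_once(count_items, transactions)
print(f"{seconds:.4f}s")
```

A single run is noisy for sub-second measurements, so repeating each configuration several times and reporting the minimum or median would give steadier numbers.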