[WIP] Add Top-K Nearest Neighbors for Normalized Matrix Profile #592 #595

Merged
merged 453 commits into from
Nov 9, 2022
4ec3c5a
Added test function to test TopK scrump in AB_join
NimaSarajpoor Jun 14, 2022
40132d4
Refactored
NimaSarajpoor Jun 16, 2022
b0132ca
Added definition of parameter k to docstring
NimaSarajpoor Jun 16, 2022
fdfdf07
Improved docstring
NimaSarajpoor Jun 16, 2022
7d9c76a
Removed trailing colon
NimaSarajpoor Jun 16, 2022
2674988
Cleaned code
NimaSarajpoor Jun 16, 2022
a1855a0
Avoided allocating new memory in inner for-loop
NimaSarajpoor Jun 16, 2022
5b561ff
Fixed typos
NimaSarajpoor Jun 16, 2022
3d02bf4
Improved comments
NimaSarajpoor Jun 17, 2022
551d223
Avoided allocating new memory in each iteration
NimaSarajpoor Jun 17, 2022
0de3a28
Same ndim in output regardless of value of k
NimaSarajpoor Jun 17, 2022
d1e95f6
Revised docstrings
NimaSarajpoor Jun 17, 2022
bfc4c8e
Enhanced function to perform shift left as well
NimaSarajpoor Jun 17, 2022
bbcb71f
Enhanced test function to test newly added functionality
NimaSarajpoor Jun 17, 2022
ec889b4
Fixed format
NimaSarajpoor Jun 17, 2022
f7ef962
Fixed format
NimaSarajpoor Jun 17, 2022
9288916
Removed/Renamed intermediate variables
NimaSarajpoor Jun 17, 2022
163a775
Renamed variable for the sake of consistency
NimaSarajpoor Jun 17, 2022
cf3748d
Avoided shape mismatch by reshaping ndarray
NimaSarajpoor Jun 17, 2022
467f4a3
Refactored
NimaSarajpoor Jun 17, 2022
d0f5956
Fixed comment
NimaSarajpoor Jun 17, 2022
80b8594
Refacored and Minor restructuring of lines
NimaSarajpoor Jun 17, 2022
33a96c6
Modified stimp after changing output shape in scrump
NimaSarajpoor Jun 17, 2022
41007f6
Add pragma no cover
NimaSarajpoor Jun 17, 2022
68efe20
Revised Docstrings
NimaSarajpoor Jun 17, 2022
7cbeae9
Fixed docstring
NimaSarajpoor Jun 17, 2022
2a38dbb
Revised docstring
NimaSarajpoor Jun 23, 2022
616332e
Removed unnecessary dangling else
NimaSarajpoor Jun 23, 2022
97a17ce
Removed unnecessary comment
NimaSarajpoor Jun 23, 2022
e55ee07
Revised structure of test function
NimaSarajpoor Jun 23, 2022
b177136
Replaced ravel with flatten to get copy of array
NimaSarajpoor Jun 23, 2022
fc7c210
Changed the type of input parameter and revised docstring
NimaSarajpoor Jun 23, 2022
7a4b46e
Update the value of parameter to match its type
NimaSarajpoor Jun 23, 2022
2622d13
Update the value of parameter to match its type
NimaSarajpoor Jun 23, 2022
fc3be79
Correct format
NimaSarajpoor Jun 23, 2022
6411b7a
Changed output structure of naive.scrump
NimaSarajpoor Jun 23, 2022
71d68c8
Correct format
NimaSarajpoor Jun 23, 2022
3e42343
Add test function for scrump_plus_plus for TopK
NimaSarajpoor Jun 23, 2022
e512a63
Add naive version to merge peason profiles
NimaSarajpoor Jun 24, 2022
382bda2
Add test function for merging pearson profiles
NimaSarajpoor Jun 24, 2022
b0a56f7
Corret format
NimaSarajpoor Jun 24, 2022
b5a4e15
Add performant function to merge pearson profiles
NimaSarajpoor Jun 24, 2022
d4d28fe
Optimize function
NimaSarajpoor Jun 24, 2022
99f2a57
Avoid creating new memory
NimaSarajpoor Jun 24, 2022
855c429
Improve docstring
NimaSarajpoor Jun 24, 2022
9e02bac
Refactored
NimaSarajpoor Jun 24, 2022
be44ab0
Avoid creating new memory in for-loop
NimaSarajpoor Jun 24, 2022
e0ad42a
Update test function
NimaSarajpoor Jun 24, 2022
33c2151
Revise function to make it parallelizable
NimaSarajpoor Jun 24, 2022
4a995d1
Full test and coverage in 1hr
NimaSarajpoor Jun 25, 2022
2c55c88
Revise docstrings
NimaSarajpoor Jun 25, 2022
07b83ab
Revise docstrings
NimaSarajpoor Jun 25, 2022
5be2cf6
Optimize function
NimaSarajpoor Jun 25, 2022
4896fe8
Optimize function
NimaSarajpoor Jun 25, 2022
d9a0a20
Rename variable to improve readability
NimaSarajpoor Jun 25, 2022
eabe0fb
Revise comments
NimaSarajpoor Jun 25, 2022
e298fd3
Improve comments and docstrings
NimaSarajpoor Jun 25, 2022
9afba6c
Correct naive implementation
NimaSarajpoor Jun 25, 2022
05945a0
Enhance naive function to support top matrix profile
NimaSarajpoor Jun 25, 2022
a72aeb7
Enhace performant function to support topk matrix profile
NimaSarajpoor Jun 25, 2022
c7ffdac
Update existing test functions
NimaSarajpoor Jun 25, 2022
34941c2
Correct format
NimaSarajpoor Jun 25, 2022
8cbe308
Fix shape of array
NimaSarajpoor Jun 25, 2022
2b5038d
Fix shape of array
NimaSarajpoor Jun 25, 2022
9cc800a
Add kind keyword for sorting
NimaSarajpoor Jun 25, 2022
5b0b3fe
Fix bugs
NimaSarajpoor Jun 25, 2022
0335294
Remove ineffective inner prange
NimaSarajpoor Jun 26, 2022
6c05e30
Temporarily added parameter k to avoid decorator failure
NimaSarajpoor Jun 26, 2022
23a54ba
Improve comments
NimaSarajpoor Jun 26, 2022
5af6ec0
Improve comments
NimaSarajpoor Jun 26, 2022
26cec6e
Improve docstring
NimaSarajpoor Jun 26, 2022
b177c84
Add KNN test function for stumpi
NimaSarajpoor Jun 26, 2022
c5d2345
Fix shape of output for KNN test
NimaSarajpoor Jun 26, 2022
cdc11c8
Full test and coverage 1 hr
NimaSarajpoor Jun 26, 2022
fa7fa4a
Avoid using searchsort when k is 1
NimaSarajpoor Jun 27, 2022
6a6757c
merge main and resolve conflict
NimaSarajpoor Jun 29, 2022
d41a2e9
Revise code according to top k matrix profile structure
NimaSarajpoor Jun 29, 2022
38f4c1d
Remove if condition
NimaSarajpoor Jun 30, 2022
e4fd875
Improve dosctrings
NimaSarajpoor Jul 1, 2022
0afb3ec
Avoid allocating new memory
NimaSarajpoor Jul 1, 2022
13da458
Avoid allocating new memory
NimaSarajpoor Jul 1, 2022
93b5708
Improve comments
NimaSarajpoor Jul 6, 2022
114c0cc
Remove numpy.where to avoid copying unchanged values
NimaSarajpoor Jul 6, 2022
719aefd
Remove unnecessary trailing colon
NimaSarajpoor Jul 6, 2022
528bf12
Replace negative np.inf with np.NINF
NimaSarajpoor Jul 6, 2022
ce58a59
delete a wrong file
NimaSarajpoor Jul 6, 2022
ba4986b
Avoid advance indexing by using chain slicing so it can be run by njit
NimaSarajpoor Jul 6, 2022
2f0f53c
Improve docstring
NimaSarajpoor Jul 6, 2022
6b49de8
Added gpu_searchsorted checks when GPUs unavailable
seanlaw Jul 6, 2022
0d1e482
Added error checks and pytest ignore warning
seanlaw Jul 6, 2022
9222ab6
Merge branch 'TopK_MatrixProfile' of https://github.com/NimaSarajpoor…
NimaSarajpoor Jul 6, 2022
f72ca7a
Improve docstrings
NimaSarajpoor Jul 7, 2022
af40906
minor changes in if-block and dosctring
NimaSarajpoor Jul 7, 2022
42ec617
Improve docstrings
NimaSarajpoor Jul 7, 2022
0d8f5de
Improve comments
NimaSarajpoor Jul 7, 2022
fe9c4db
minor changes
NimaSarajpoor Jul 7, 2022
9aba6d2
Correct format
NimaSarajpoor Jul 7, 2022
2565c91
Improve docstrings
NimaSarajpoor Jul 7, 2022
283c31e
Merge branch 'main' into TopK_MatrixProfile
NimaSarajpoor Jul 10, 2022
9012235
resolve conflicts and merge main
NimaSarajpoor Jul 10, 2022
0bd70aa
Merge branch 'main' into TopK_MatrixProfile
NimaSarajpoor Jul 10, 2022
a89e214
optimize functions
NimaSarajpoor Jul 10, 2022
9b845b1
Remove redundant import
NimaSarajpoor Jul 10, 2022
85f1226
minor change
NimaSarajpoor Jul 11, 2022
fb6ed07
Revise docstrings
NimaSarajpoor Jul 11, 2022
a2129c7
Merge branch 'main' into TopK_MatrixProfile
seanlaw Jul 11, 2022
9f7b6d8
Fixed black formatting after conflict resolution
seanlaw Jul 11, 2022
8da2920
Resolve conflicts and merge changes
NimaSarajpoor Jul 11, 2022
54d1d1f
Correct docstring
NimaSarajpoor Jul 11, 2022
6735167
Revise docstrings
NimaSarajpoor Jul 16, 2022
0112989
minor change
NimaSarajpoor Jul 16, 2022
ff322a0
Revise comments
NimaSarajpoor Jul 16, 2022
598fcf4
Avoid redundant allocation of memory
NimaSarajpoor Jul 16, 2022
54643e2
Revise docstrings and comments
NimaSarajpoor Jul 16, 2022
9433499
rename variables
NimaSarajpoor Jul 16, 2022
902d7ab
minor correction
NimaSarajpoor Jul 16, 2022
3c16d33
Fix indexing
NimaSarajpoor Jul 18, 2022
9bb8b16
Add new test function
NimaSarajpoor Jul 18, 2022
33c6112
Modify test function
NimaSarajpoor Jul 18, 2022
7993cc2
merge main and resolve conflicts
NimaSarajpoor Jul 18, 2022
30f4bcf
Avoid dumplicate in naive prescrump
NimaSarajpoor Jul 19, 2022
77e56f7
Add parameter assume_unique to handle duplicates
NimaSarajpoor Jul 19, 2022
7a93a7c
Add test function to test for duplicates in topk_merge
NimaSarajpoor Jul 19, 2022
fefcaa9
Add parameter assume_unique to performant merge_topk
NimaSarajpoor Jul 19, 2022
5e9c5fc
fix test function
NimaSarajpoor Jul 19, 2022
9685e44
Fix bug
NimaSarajpoor Jul 19, 2022
9dd452b
Revise prescrump to avoid duplicates
NimaSarajpoor Jul 19, 2022
3d68ae7
Avoid duplocates in scrump
NimaSarajpoor Jul 19, 2022
4c17119
Revise test function to consider new parameter
NimaSarajpoor Jul 19, 2022
2c662a9
Fix bug
NimaSarajpoor Jul 19, 2022
3a0f4da
Revise naive scrump to avoid duplicates
NimaSarajpoor Jul 19, 2022
d8728c9
Add comment
NimaSarajpoor Jul 19, 2022
561b428
minor optimization
NimaSarajpoor Jul 19, 2022
44b85a8
Correct style
NimaSarajpoor Jul 19, 2022
dbdc7c9
Correct style
NimaSarajpoor Jul 19, 2022
19129ab
increase threshold
NimaSarajpoor Jul 19, 2022
5d96bbd
Specifiy kind in sort
NimaSarajpoor Jul 19, 2022
d3a9b31
minor change
NimaSarajpoor Jul 19, 2022
970efc7
specify kind in sort
NimaSarajpoor Jul 20, 2022
cd7fe1a
minor changes
NimaSarajpoor Jul 20, 2022
6ca36d0
De-otpimize if condition
NimaSarajpoor Jul 20, 2022
b1baa76
merge so far solved conflicts
NimaSarajpoor Jul 20, 2022
5d930b2
Update scrump
NimaSarajpoor Jul 20, 2022
4b58765
minor changes
NimaSarajpoor Jul 20, 2022
aaa8ff7
add new test function
NimaSarajpoor Jul 20, 2022
6c8eddc
optimize if condition
NimaSarajpoor Jul 20, 2022
5bb6879
Give priority to PA in case of ties between IA and IB
NimaSarajpoor Jul 21, 2022
fc10e8a
Remove trailing colon
NimaSarajpoor Jul 21, 2022
ef1309b
update test function
NimaSarajpoor Jul 21, 2022
c2fe4d2
revise function to avoid adding new parameter
NimaSarajpoor Jul 21, 2022
99806a9
Update module scrump and improvee its readability
NimaSarajpoor Jul 21, 2022
b57c691
Fix syntax
NimaSarajpoor Jul 21, 2022
e499057
update test functions
NimaSarajpoor Jul 21, 2022
b811319
minor fix
NimaSarajpoor Jul 21, 2022
11ee8de
correct format
NimaSarajpoor Jul 21, 2022
7925119
Improve docstring
NimaSarajpoor Jul 23, 2022
c3b82dd
Avoid overlap while merging matrix profiles
NimaSarajpoor Jul 26, 2022
f073d6c
Add function to find overlapping values
NimaSarajpoor Jul 26, 2022
a514943
replace numpy function with our implementation
NimaSarajpoor Jul 26, 2022
edb62a2
Avoid unnecessary call of a function
NimaSarajpoor Jul 26, 2022
ec020e0
Revise docsting and comment
NimaSarajpoor Jul 26, 2022
7ab480e
Improve test function
NimaSarajpoor Jul 26, 2022
b2bc500
Remove comment
NimaSarajpoor Jul 26, 2022
c156530
Add test function to ensure duplicates are avoided
NimaSarajpoor Jul 26, 2022
5f1acae
Improve comments
NimaSarajpoor Jul 26, 2022
5f9c537
Enhance naive version to avoid duplicates while merging
NimaSarajpoor Jul 26, 2022
f37bc29
Add test function and revise naive version
NimaSarajpoor Jul 26, 2022
dc97a12
Improve code readability and comment
NimaSarajpoor Jul 26, 2022
fa340ba
Update top-k profile by getting insertion index
NimaSarajpoor Jul 28, 2022
d9a997d
Merge nested if statements into one
NimaSarajpoor Jul 28, 2022
a52564f
Remove blank lines
NimaSarajpoor Jul 28, 2022
526618c
Fix typo
NimaSarajpoor Jul 28, 2022
3607711
Improve comment
NimaSarajpoor Jul 28, 2022
1b19a45
Improve comments
NimaSarajpoor Jul 28, 2022
bba35e1
Improve docstring
NimaSarajpoor Jul 28, 2022
6d4d127
Remove unnecessary comments
NimaSarajpoor Jul 28, 2022
e531385
passing copy of variable as input
NimaSarajpoor Jul 28, 2022
8e28aeb
minor change in test functions
NimaSarajpoor Jul 28, 2022
be1d1e7
Correct style
NimaSarajpoor Jul 29, 2022
956fc31
Revise comment
NimaSarajpoor Jul 29, 2022
9b3daef
Remove comment
NimaSarajpoor Jul 29, 2022
6abd601
Revise comment
NimaSarajpoor Jul 29, 2022
680ed2a
Merge branch 'main' into TopK_MatrixProfile
NimaSarajpoor Jul 29, 2022
ff2c06c
Merge branch 'main' into TopK_MatrixProfile
seanlaw Aug 5, 2022
355c8e5
Fix format
NimaSarajpoor Aug 9, 2022
04685c7
Remove unnecessary newline
NimaSarajpoor Aug 9, 2022
ba7b6ca
Return 1D array for matrix profile when `k` is 1
NimaSarajpoor Aug 9, 2022
36a7fcb
Remove unnecessary flattening operatiton on array
NimaSarajpoor Aug 9, 2022
4f1b2dc
Fix comments
NimaSarajpoor Aug 9, 2022
aa52529
Make matrix profile and mp index 1D when k=1
NimaSarajpoor Aug 9, 2022
ab22972
Revise tests functions
NimaSarajpoor Aug 9, 2022
249d928
Improve Docstrings
NimaSarajpoor Aug 9, 2022
5e515c4
Make prescrump output 1D when k is one
NimaSarajpoor Aug 28, 2022
752a22c
minor change
NimaSarajpoor Aug 28, 2022
e1f49af
update test functions
NimaSarajpoor Aug 28, 2022
6e541ea
Modify merge_topk to support 1D input
NimaSarajpoor Aug 28, 2022
354d96f
minor change and fix conflict
NimaSarajpoor Aug 28, 2022
0bff1ae
Fix merge_topk
NimaSarajpoor Aug 28, 2022
39e5ea3
Fix shape of variables in test functions
NimaSarajpoor Aug 28, 2022
e9fd14c
Remove unnecessary flatten operation
NimaSarajpoor Aug 28, 2022
d385829
Update test function for case k=1
NimaSarajpoor Aug 28, 2022
90ab9e3
revise comment
NimaSarajpoor Aug 29, 2022
0b163eb
Avoid using return in the middle of code
NimaSarajpoor Aug 29, 2022
bf6df9b
Add new private function to get 2D ouput when k=1
NimaSarajpoor Aug 30, 2022
e6a05d6
Remove check for 1D in merge_topk
NimaSarajpoor Aug 30, 2022
fe905d2
Revise test functions
NimaSarajpoor Aug 30, 2022
8e8d48b
Revise docstring to provide description for 1D case
NimaSarajpoor Aug 30, 2022
3bebc47
Add overlap check in merge_topk with 1D input
NimaSarajpoor Aug 30, 2022
4fcf797
Add overlap check in 1D and revise docstring
NimaSarajpoor Aug 31, 2022
41097a7
Add separate test function for _merge_topk 1D case
NimaSarajpoor Aug 31, 2022
948d674
Add preprocessing function for prescrump
NimaSarajpoor Aug 31, 2022
391c97d
Update test function
NimaSarajpoor Aug 31, 2022
4d7cccf
fix missing argument
NimaSarajpoor Aug 31, 2022
e8814cf
Fix Docstring
NimaSarajpoor Aug 31, 2022
eb56346
Resolved Merge conflict
NimaSarajpoor Aug 31, 2022
3946915
Put back the missing decorator
NimaSarajpoor Aug 31, 2022
eff9ca4
Add preprocessing function in prescraamp
NimaSarajpoor Sep 1, 2022
666b93e
Revise naive function
NimaSarajpoor Sep 1, 2022
eee6d75
Fix value of imprecision in test functions
NimaSarajpoor Sep 1, 2022
27d229b
create overlaps randomly for test merge_topk in 1D case
NimaSarajpoor Sep 1, 2022
6449d4b
Merge main into this branch
NimaSarajpoor Sep 1, 2022
3b9d1de
Merge main and Resolve conflict
NimaSarajpoor Sep 1, 2022
03f19d8
Revise docstrings
NimaSarajpoor Sep 3, 2022
5907f8b
Merge branch 'main' into TopK_MatrixProfile
NimaSarajpoor Sep 14, 2022
34a5f2d
Merge branch 'TopK_MatrixProfile' of https://github.com/NimaSarajpoor…
NimaSarajpoor Sep 14, 2022
d35de3e
Fix docstrings
NimaSarajpoor Sep 14, 2022
dbf2524
merge main
NimaSarajpoor Oct 13, 2022
0c80852
minor changes
NimaSarajpoor Oct 13, 2022
2e3af6a
minor fix
NimaSarajpoor Oct 13, 2022
a646034
change variable name
NimaSarajpoor Oct 15, 2022
d6a0a3d
change variables names
NimaSarajpoor Oct 15, 2022
6ae95ec
convert attr to property attr to get 1D when k is 1
NimaSarajpoor Oct 15, 2022
73ebe40
avoid calling performant function in a naive function
NimaSarajpoor Oct 15, 2022
4719e2f
minor modification on z_norm functions
NimaSarajpoor Oct 15, 2022
308af69
merge from remote branch
NimaSarajpoor Oct 15, 2022
63b2828
fix function
NimaSarajpoor Oct 15, 2022
3787776
update local branch
NimaSarajpoor Oct 15, 2022
d1f3119
revise docstrings
NimaSarajpoor Oct 18, 2022
4a94c0e
change variable name
NimaSarajpoor Oct 18, 2022
34361f7
Relocate comment
NimaSarajpoor Oct 18, 2022
8d0258a
minor changes
NimaSarajpoor Oct 18, 2022
1b64959
Update branch
NimaSarajpoor Nov 6, 2022
428ef8c
pull latest changes and resolve conflict
NimaSarajpoor Nov 7, 2022
9ad63d3
update local branch
NimaSarajpoor Nov 8, 2022
abb4518
fix uint
NimaSarajpoor Nov 8, 2022
329889e
fixed uint
NimaSarajpoor Nov 8, 2022
6864f11
merge remote branch
NimaSarajpoor Nov 9, 2022
c0e9f74
fixed test function
NimaSarajpoor Nov 9, 2022
27c05c3
fixed calling function
NimaSarajpoor Nov 9, 2022
c45b8a4
Removed redundant return statement
NimaSarajpoor Nov 9, 2022
8 changes: 7 additions & 1 deletion stumpy/aamp.py
Expand Up @@ -255,7 +255,8 @@ def _aamp(
return np.power(P[0, :, :], 1.0 / p), I[0, :, :]


def aamp(T_A, m, T_B=None, ignore_trivial=True, p=2.0):
def aamp(T_A, m, T_B=None, ignore_trivial=True, p=2.0, k=1):
# function needs to be changed to return top-k matrix profile
"""
Compute the non-normalized (i.e., without z-normalization) matrix profile

Expand All @@ -282,6 +283,11 @@ def aamp(T_A, m, T_B=None, ignore_trivial=True, p=2.0):
p : float, default 2.0
The p-norm to apply for computing the Minkowski distance.

k : int, default 1
The number of top `k` smallest distances used to construct the matrix profile.
Note that this will increase the total computational time and memory usage
when k > 1.

Returns
-------
out : numpy.ndarray
Expand Down
8 changes: 7 additions & 1 deletion stumpy/aamped.py
Expand Up @@ -12,7 +12,8 @@
logger = logging.getLogger(__name__)


def aamped(dask_client, T_A, m, T_B=None, ignore_trivial=True, p=2.0):
def aamped(dask_client, T_A, m, T_B=None, ignore_trivial=True, p=2.0, k=1):
# function needs to be revised to return top-k matrix profile
"""
Compute the non-normalized (i.e., without z-normalization) matrix profile

Expand Down Expand Up @@ -46,6 +47,11 @@ def aamped(dask_client, T_A, m, T_B=None, ignore_trivial=True, p=2.0):
p : float, default 2.0
The p-norm to apply for computing the Minkowski distance.

k : int, default 1
The number of top `k` smallest distances used to construct the matrix profile.
Note that this will increase the total computational time and memory usage
when k > 1.

Returns
-------
out : numpy.ndarray
Expand Down
14 changes: 13 additions & 1 deletion stumpy/aampi.py
Expand Up @@ -8,6 +8,7 @@


class aampi:
# needs to be enhanced to support top-k matrix profile
"""
Compute an incremental non-normalized (i.e., without z-normalization) matrix profile
for streaming data
Expand All @@ -28,6 +29,11 @@ class aampi:
p : float, default 2.0
The p-norm to apply for computing the Minkowski distance.

k : int, default 1
The number of top `k` smallest distances used to construct the matrix profile.
Note that this will increase the total computational time and memory usage
when k > 1.

Attributes
----------
P_ : numpy.ndarray
Expand Down Expand Up @@ -62,7 +68,7 @@ class aampi:
Note that we have extended this algorithm for AB-joins as well.
"""

def __init__(self, T, m, egress=True, p=2.0):
def __init__(self, T, m, egress=True, p=2.0, k=1):
"""
Initialize the `aampi` object

Expand All @@ -81,6 +87,12 @@ def __init__(self, T, m, egress=True, p=2.0):

p : float, default 2.0
The p-norm to apply for computing the Minkowski distance.


k : int, default 1
The number of top `k` smallest distances used to construct the matrix
profile. Note that this will increase the total computational time and
memory usage when k > 1.
"""
self._T = core._preprocess(T)
core.check_window_size(m, max_size=self._T.shape[-1])
Expand Down
216 changes: 214 additions & 2 deletions stumpy/core.py
Expand Up @@ -209,6 +209,20 @@ def _gpu_aamp_stimp_driver_not_found(*args, **kwargs): # pragma: no cover
driver_not_found()


def _gpu_searchsorted_left_driver_not_found(*args, **kwargs): # pragma: no cover
"""
Dummy function to raise CudaSupportError driver not found error.
"""
driver_not_found()


def _gpu_searchsorted_right_driver_not_found(*args, **kwargs): # pragma: no cover
"""
Dummy function to raise CudaSupportError driver not found error.
"""
driver_not_found()


def get_pkg_name(): # pragma: no cover
"""
Return package name.
Expand Down Expand Up @@ -240,7 +254,7 @@ def rolling_window(a, window):
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


def z_norm(a, axis=0):
def z_norm(a, axis=0, threshold=config.STUMPY_STDDEV_THRESHOLD):
"""
Calculate the z-normalized input array `a` by subtracting the mean and
dividing by the standard deviation along a given axis.
Expand All @@ -253,13 +267,16 @@ def z_norm(a, axis=0):
axis : int, default 0
NumPy array axis

threshold : float, default config.STUMPY_STDDEV_THRESHOLD
Any non-NaN standard deviation smaller than `threshold` is replaced with 1.0

Returns
-------
output : numpy.ndarray
An array with z-normalized values computed along a specified axis.
"""
std = np.std(a, axis, keepdims=True)
std[std == 0] = 1
std[np.less(std, threshold, where=~np.isnan(std))] = 1.0

return (a - np.mean(a, axis, keepdims=True)) / std
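The thresholded replacement above can be tried in isolation. The following is a minimal sketch, not the PR's code: `z_norm_sketch` and the value of `STDDEV_THRESHOLD` are assumptions made for illustration (the PR reads the default from `config.STUMPY_STDDEV_THRESHOLD`). It passes an explicit, pre-filled `out` array to `np.less`, because positions excluded by `where` are otherwise left uninitialized in the result.

```python
import numpy as np

# Assumed value, for illustration only; the PR takes its default from
# config.STUMPY_STDDEV_THRESHOLD.
STDDEV_THRESHOLD = 1e-7


def z_norm_sketch(a, axis=0, threshold=STDDEV_THRESHOLD):
    """Z-normalize `a`, treating near-constant slices as having std 1.0."""
    std = np.std(a, axis, keepdims=True)
    # Start from an all-False mask so that positions skipped by `where`
    # (NaN standard deviations) have a defined value and stay False.
    too_small = np.less(
        std, threshold, out=np.zeros(std.shape, dtype=bool), where=~np.isnan(std)
    )
    std[too_small] = 1.0
    return (a - np.mean(a, axis, keepdims=True)) / std


a = np.array([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
z = z_norm_sketch(a, axis=1)
# The constant second row has std 0, which is replaced by 1.0, so the row
# normalizes to zeros instead of triggering a division by zero.
```

A NaN standard deviation is left untouched by the mask, so NaN inputs still propagate to the output as before.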

Expand Down Expand Up @@ -2559,6 +2576,201 @@ def _select_P_ABBA_value(P_ABBA, k, custom_func=None):
return MPdist


@njit
def _merge_topk_PI(PA, PB, IA, IB):
    """
    Merge two top-k matrix profiles `PA` and `PB`, and update `PA` (in place).

    When the inputs are 1D arrays, PA[i] is updated if it is greater than PB[i] and
    IA[i] != IB[i]. In that case, PA[i] and IA[i] are replaced with PB[i] and IB[i],
    respectively. (Note that IA[i] == IB[i] can occur while PA[i] != PB[i] because
    of slight imprecision in numerical calculations. In that case, PA[i] and IA[i]
    are not updated. Updating them would be harmless, but it is avoided so as to be
    consistent with the merging process when the inputs are 2D arrays.)

    When the inputs are 2D arrays, the values of `PA` are always prioritized over
    the values of `PB` in case of ties (i.e., values from `PB` are always inserted
    to the right of values from `PA`), and `IA` is updated accordingly. When
    IA[i] and IB[i] share values, the shared ones in IB[i] (and their
    corresponding values in PB[i]) are ignored throughout the updating process
    of IA[i] (and PA[i]).

    Unlike `_merge_topk_ρI`, where `top-k` largest values are kept, this function
    keeps `top-k` smallest values.

    Parameters
    ----------
    PA : numpy.ndarray
        A (top-k) matrix profile where values in each row are sorted in ascending
        order. `PA` must be 1- or 2-dimensional.

    PB : numpy.ndarray
        A (top-k) matrix profile where values in each row are sorted in ascending
        order. `PB` must have the same shape as `PA`.

    IA : numpy.ndarray
        A (top-k) matrix profile indices corresponding to `PA`

    IB : numpy.ndarray
        A (top-k) matrix profile indices corresponding to `PB`

    Returns
    -------
    None
    """
    if PA.ndim == 1:
        mask = (PB < PA) & (IB != IA)
        PA[mask] = PB[mask]
        IA[mask] = IB[mask]
    else:
        k = PA.shape[1]
        tmp_P = np.empty(k, dtype=np.float64)
        tmp_I = np.empty(k, dtype=np.int64)
        for i in range(PA.shape[0]):
            overlap = set(IB[i]).intersection(set(IA[i]))
            aj, bj = 0, 0
            idx = 0
            # 2 * k iterations are required to traverse both A and B if needed.
            for _ in range(2 * k):
                if idx >= k:
                    break
                if bj < k and PB[i, bj] < PA[i, aj]:
                    if IB[i, bj] not in overlap:
                        tmp_P[idx] = PB[i, bj]
                        tmp_I[idx] = IB[i, bj]
                        idx += 1
                    bj += 1
                else:
                    tmp_P[idx] = PA[i, aj]
                    tmp_I[idx] = IA[i, aj]
                    idx += 1
                    aj += 1

            PA[i] = tmp_P
            IA[i] = tmp_I
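To make the 2D merge rule concrete, here is a small sketch of the same two-pointer walk applied to a single row, written in plain NumPy without `njit`. The helper name `merge_topk_row` and the sample values are hypothetical and only illustrate the tie-breaking and overlap-skipping behaviour described in the docstring; unlike `_merge_topk_PI`, it returns new arrays rather than updating in place.

```python
import numpy as np


def merge_topk_row(pa, ia, pb, ib):
    """Merge two ascending top-k rows, keeping the k smallest distances."""
    k = len(pa)
    overlap = set(ib.tolist()) & set(ia.tolist())
    out_p = np.empty(k, dtype=np.float64)
    out_i = np.empty(k, dtype=np.int64)
    aj = bj = idx = 0
    while idx < k:
        # The strict `<` resolves ties in favour of row A.
        if bj < k and pb[bj] < pa[aj]:
            if ib[bj] not in overlap:  # skip neighbours already present in A
                out_p[idx], out_i[idx] = pb[bj], ib[bj]
                idx += 1
            bj += 1
        else:
            out_p[idx], out_i[idx] = pa[aj], ia[aj]
            idx += 1
            aj += 1
    return out_p, out_i


pa = np.array([0.5, 1.0, 2.0])
ia = np.array([7, 3, 9])
# Index 3 appears in both rows with slightly different distances (1.0 vs 0.999),
# mimicking numerical imprecision; the copy from row B is ignored.
pb = np.array([0.4, 0.999, 1.5])
ib = np.array([4, 3, 8])

out_p, out_i = merge_topk_row(pa, ia, pb, ib)
print(out_p, out_i)  # [0.4 0.5 1. ] [4 7 3]
```

Because every step that advances `aj` also advances `idx`, `aj` never runs past the end of row A while output slots remain to be filled.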


@njit
def _merge_topk_ρI(ρA, ρB, IA, IB):
    """
    Merge two top-k Pearson profiles `ρA` and `ρB`, and update `ρA` (in place).

    When the inputs are 1D arrays, ρA[i] is updated if it is less than ρB[i] and
    IA[i] != IB[i]. In that case, ρA[i] and IA[i] are replaced with ρB[i] and IB[i],
    respectively. (Note that IA[i] == IB[i] can occur while ρA[i] != ρB[i] because
    of slight imprecision in numerical calculations. In that case, ρA[i] and IA[i]
    are not updated. Updating them would be harmless, but it is avoided so as to be
    consistent with the merging process when the inputs are 2D arrays.)

    When the inputs are 2D arrays, the values of `ρA` are always prioritized over
    the values of `ρB` in case of ties (i.e., values from `ρB` are always inserted
    to the left of values from `ρA`), and `IA` is updated accordingly. When
    IA[i] and IB[i] share values, the shared ones in IB[i] (and their
    corresponding values in ρB[i]) are ignored throughout the updating process
    of IA[i] (and ρA[i]).

    Unlike `_merge_topk_PI`, where `top-k` smallest values are kept, this function
    keeps `top-k` largest values.

    Parameters
    ----------
    ρA : numpy.ndarray
        A (top-k) Pearson profile where values in each row are sorted in ascending
        order. `ρA` must be 1- or 2-dimensional.

    ρB : numpy.ndarray
        A (top-k) Pearson profile where values in each row are sorted in ascending
        order. `ρB` must have the same shape as `ρA`.

    IA : numpy.ndarray
        A (top-k) matrix profile indices corresponding to `ρA`

    IB : numpy.ndarray
        A (top-k) matrix profile indices corresponding to `ρB`

    Returns
    -------
    None
    """
    if ρA.ndim == 1:
        mask = (ρB > ρA) & (IB != IA)
        ρA[mask] = ρB[mask]
        IA[mask] = IB[mask]
    else:
        k = ρA.shape[1]
        tmp_ρ = np.empty(k, dtype=np.float64)
        tmp_I = np.empty(k, dtype=np.int64)
        last_idx = k - 1
        for i in range(len(ρA)):
            overlap = set(IB[i]).intersection(set(IA[i]))
            aj, bj = last_idx, last_idx
            idx = last_idx
            # 2 * k iterations are required to traverse both A and B if needed.
            for _ in range(2 * k):
                if idx < 0:
                    break
                if bj >= 0 and ρB[i, bj] > ρA[i, aj]:
                    if IB[i, bj] not in overlap:
                        tmp_ρ[idx] = ρB[i, bj]
                        tmp_I[idx] = IB[i, bj]
                        idx -= 1
                    bj -= 1
                else:
                    tmp_ρ[idx] = ρA[i, aj]
                    tmp_I[idx] = IA[i, aj]
                    idx -= 1
                    aj -= 1

            ρA[i] = tmp_ρ
            IA[i] = tmp_I
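The reason one merge keeps the largest values while the other keeps the smallest can be checked numerically. For two subsequences of length `m`, the z-normalized Euclidean distance `d` and the Pearson correlation `ρ` satisfy `d = sqrt(2 * m * (1 - ρ))`, so the largest correlations correspond to the smallest distances. The sketch below uses made-up random data and is only an illustration of that identity; this section of the diff does not show where the PR performs the conversion.

```python
import numpy as np


def z_norm_1d(x):
    """Z-normalize a 1D array using the population standard deviation."""
    return (x - x.mean()) / x.std()


rng = np.random.default_rng(0)
m = 16
q = rng.normal(size=m)                 # a query subsequence
candidates = rng.normal(size=(5, m))   # five candidate neighbours

rho = np.array([np.corrcoef(q, c)[0, 1] for c in candidates])
dist = np.array([np.linalg.norm(z_norm_1d(q) - z_norm_1d(c)) for c in candidates])

# Identity between z-normalized distance and Pearson correlation.
identity_holds = np.allclose(dist, np.sqrt(2 * m * (1 - rho)))

# The k nearest neighbours are the same under either ranking.
k = 3
topk_by_dist = np.argsort(dist)[:k]    # k smallest distances
topk_by_rho = np.argsort(-rho)[:k]     # k largest correlations
```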


@njit
def _shift_insert_at_index(a, idx, v, shift="right"):
    """
    If `shift=right` (default), all elements in `a[idx:]` are shifted to the right by
    one element, `v` is inserted at index `idx`, and the last element is discarded.
    If `shift=left`, all elements in `a[:idx]` are shifted to the left by one element,
    `v` is inserted at index `idx-1`, and the first element is discarded. In both
    cases, `a` is updated in place and its length remains unchanged.

    Note that all unrecognized `shift` inputs will default to `shift=right`.

    Parameters
    ----------
    a : numpy.ndarray
        A 1D array

    idx : int
        The index at which the value `v` should be inserted. This can be any
        integer from `0` to `len(a)`. When `idx=len(a)` and `shift="right"`,
        or when `idx=0` and `shift="left"`, the input array `a` is left
        unchanged.

    v : float
        The value that should be inserted into array `a` at index `idx`

    shift : str, default "right"
        The value that indicates whether the shifting of elements should be towards
        the right or left. If `shift="right"` (default), all elements in `a[idx:]`
        are shifted to the right by one element. If `shift="left"`, all elements
        in `a[:idx]` are shifted to the left by one element.

    Returns
    -------
    None
    """
    if shift == "left":
        if 0 < idx <= len(a):
            a[: idx - 1] = a[1:idx]
            # elements were shifted to the left, thus the insertion index becomes
            # `idx-1`
            a[idx - 1] = v
    else:
        if 0 <= idx < len(a):
            a[idx + 1 :] = a[idx:-1]
            a[idx] = v
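A fixed-length shift-insert like this pairs naturally with `np.searchsorted` to maintain a running list of the `k` smallest values. The sketch below is a hypothetical usage example in plain NumPy: `shift_insert_right` re-implements only the `shift="right"` branch without `njit`, and the stream of distances is made up.

```python
import numpy as np


def shift_insert_right(a, idx, v):
    """Insert `v` at `idx`, shifting a[idx:] right and dropping the last element."""
    if 0 <= idx < len(a):
        a[idx + 1 :] = a[idx:-1]
        a[idx] = v


# Running top-3 smallest distances, initialised to +inf and kept in ascending order.
topk = np.full(3, np.inf)
for d in [2.5, 0.7, 3.1, 1.2, 0.9]:
    # side="right" places a new value after any equal values already stored.
    idx = np.searchsorted(topk, d, side="right")
    # When idx == len(topk) the value is too large to enter the top-k and the
    # call leaves the array unchanged.
    shift_insert_right(topk, idx, d)

print(topk)  # [0.7 0.9 1.2]
```

Keeping the array length fixed means no memory is allocated per insertion, which matches the intent of several commits in this PR that avoid allocating inside loops.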


def _check_P(P, threshold=1e-6):
"""
Check if the 1-dimensional matrix profile values are too small and
Expand Down
9 changes: 8 additions & 1 deletion stumpy/gpu_aamp.py
Expand Up @@ -339,7 +339,9 @@ def _gpu_aamp(
return profile_fname, indices_fname


def gpu_aamp(T_A, m, T_B=None, ignore_trivial=True, device_id=0, p=2.0):
def gpu_aamp(T_A, m, T_B=None, ignore_trivial=True, device_id=0, p=2.0, k=1):
# function needs to be revised to return (top-k) matrix profile and
# matrix profile indices
"""
Compute the non-normalized (i.e., without z-normalization) matrix profile with one
or more GPU devices
Expand Down Expand Up @@ -375,6 +377,11 @@ def gpu_aamp(T_A, m, T_B=None, ignore_trivial=True, device_id=0, p=2.0):
p : float, default 2.0
The p-norm to apply for computing the Minkowski distance.

k : int, default 1
The number of top `k` smallest distances used to construct the matrix profile.
Note that this will increase the total computational time and memory usage
when k > 1.

Returns
-------
out : numpy.ndarray
Expand Down