
[WIP] Fix #639 Top-K Nearest Neighbors to Matrix Profile (normalize=False) #714

Merged

Conversation

@NimaSarajpoor (Collaborator) commented Nov 11, 2022

This PR addresses #639. It is a follow-up to PR #595.

We are going to add top-k support for the following modules:

  • aamp
  • aamped
  • gpu_aamp
  • scraamp
  • aampi
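
As a rough illustration of what top-k support means for these modules, here is a naive pure-NumPy sketch of a top-k non-normalized matrix profile (the function `naive_topk_aamp` is hypothetical and for illustration only; it ignores exclusion zones and all of STUMPY's optimizations):

```python
import numpy as np

def naive_topk_aamp(T, m, k=1, p=2.0):
    """Naive top-k non-normalized matrix profile sketch (no exclusion zone)."""
    n = len(T) - m + 1
    subs = np.lib.stride_tricks.sliding_window_view(T, m)
    P = np.empty((n, k))
    for i in range(n):
        # p-norm distance from subsequence i to every other subsequence
        d = np.linalg.norm(subs - subs[i], ord=p, axis=1)
        d[i] = np.inf  # exclude the trivial self-match
        P[i] = np.sort(d)[:k]  # keep the k smallest distances, ascending
    return P
```

The real modules additionally return top-k index arrays and apply an exclusion zone for self-joins; this sketch only conveys the shape of the output (one sorted row of k distances per subsequence).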

Issues Tracker (so that we do not forget)

  • In scraamp (and scrump), the following if-block is (probably?) never executed:

    ```python
    if P_NORM[thread_idx, i, 0] == np.inf:  # pragma: no cover
        I[thread_idx, i, 0] = -1
        continue
    ```

  • In scraamp, we could add an if-continue in the block below to avoid computing the rest of the code:

    ```python
    if not T_A_subseq_isfinite[i]:  # pragma: no cover
        p_norm_profile[:] = np.inf
        continue  # NEW
    ```

@codecov-commenter commented Nov 11, 2022

Codecov Report

Base: 99.90% // Head: 99.90% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (f6cf8bc) compared to base (a6a82c2).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #714    +/-   ##
========================================
  Coverage   99.90%   99.90%            
========================================
  Files          80       80            
  Lines       12289    12719   +430     
========================================
+ Hits        12277    12707   +430     
  Misses         12       12            
| Impacted Files | Coverage Δ |
|---|---|
| stumpy/aamp.py | 100.00% <100.00%> (ø) |
| stumpy/aamped.py | 100.00% <100.00%> (ø) |
| stumpy/aampi.py | 100.00% <100.00%> (ø) |
| stumpy/core.py | 100.00% <100.00%> (ø) |
| stumpy/gpu_aamp.py | 100.00% <100.00%> (ø) |
| stumpy/gpu_stump.py | 100.00% <100.00%> (ø) |
| stumpy/scraamp.py | 100.00% <100.00%> (ø) |
| stumpy/stump.py | 100.00% <100.00%> (ø) |
| stumpy/stumpi.py | 100.00% <100.00%> (ø) |
| tests/naive.py | 100.00% <100.00%> (ø) |

... and 10 more


@NimaSarajpoor (Collaborator, Author) commented Nov 11, 2022

@seanlaw
I realized that aamped had already been mostly changed while I was working on aamp, so I tried to finalize both of them. (Nothing here is new! The changes are basically based on what we did in the top-k normalized version.)

After you review these changes and give the green light, I will push the next set of commits for another module.

(Please ignore scraamp.py, as its changes have not been finalized yet.)

@seanlaw (Contributor) left a comment

@NimaSarajpoor I quickly glanced over it but didn't see anything that stood out. I trust that there isn't much that will be tricky here, but please draw my attention to specific parts if you want me to really turn on my "eagle eyes". Otherwise, I'm going to rely on the unit tests to guide me (of course, your changes to naive.py could affect things).

@NimaSarajpoor (Collaborator, Author)

@seanlaw

I trust that there isn't much that will be tricky here but please draw my attention to specific parts if you want me to really turn on my "eagle eyes"

Sure. I will keep that in mind.

Otherwise, I'm going to rely on the unit tests to guide me (of course, your changes to the naive.py could affect things).

I will try to be careful, and I always open the normalized modules to make sure we are consistent in the changes we make for top-k support. If I notice something unique to the non-normalized version, I will let you know for sure.

@NimaSarajpoor (Collaborator, Author) commented Nov 12, 2022

@seanlaw
I've added the top-k feature to scraamp. Again, everything is the same as in the normalized matrix profile.


There is one small concern that I would like to share with you...

In some of the new test functions (related to top-k) in test_scraamp, we have three for-loops that are nested:

  • one for-loop changes param k (NEW)
  • one for-loop changes param p (the p-norm)
  • one for-loop changes param s

(e.g. see tests/test_scraamp::test_prescraamp_self_join_KNN)

Note that in the normalized version (i.e., test_scrump), we do not have param p, and thus the nested loop there consists of only two for-loops.

I was wondering if this is okay with you. As an alternative approach, I could break this nested for-loop down as follows:

```python
p = 2
# two for-loops to change `k` and `s` (similar to the normalized test function)

p = 3
# two for-loops to change `k` and `s` (similar to the normalized test function)
```
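
For concreteness, the triple nested loop in question has this shape (the parameter values below are illustrative only, not the actual test fixtures):

```python
# Hypothetical sketch of the triple nested loop structure in the new
# top-k tests; the value lists are placeholders, not the real test data.
param_combos = []
for k in [1, 2, 3]:          # NEW: number of nearest neighbors
    for p in [1.0, 2.0, 3.0]:  # p-norm (absent in the normalized tests)
        for s in [1, 2]:       # sampling interval
            param_combos.append((k, p, s))
```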

@seanlaw (Contributor) left a comment

@NimaSarajpoor While the triple for-loop isn't my favorite, it seems okay. Please proceed.

@NimaSarajpoor (Collaborator, Author)

@seanlaw
FYI: This PC does not have a GPU. I need to push to check whether it passes the tests. Sorry for the inconvenience.

@seanlaw (Contributor) commented Nov 18, 2022

Up to you. What do you think? Minimally, we need full coverage.

@NimaSarajpoor (Collaborator, Author)

@seanlaw
I prefer to write a test to be consistent with the other functions in core.py. It can also help us make sure that these two functions work properly (so we do not need to worry about them later if we face a bug).

@NimaSarajpoor (Collaborator, Author)

I prefer to write a test to be consistent with the other functions in core.py. It can also help us make sure that these two functions work properly (so we do not need to worry about them later if we face a bug)

Apparently, we already had a test for _gpu_searchsorted, located in test_gpu_stump 😃. I just updated the test functions.

I will now start comparing the computing time.
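
As an aside, the row-wise insertion that a searchsorted-style helper enables (placing a new distance into an ascending, length-k profile row) can be sketched in plain NumPy; `insert_topk` and its argument names are hypothetical illustrations, not STUMPY's actual `_gpu_searchsorted` API:

```python
import numpy as np

def insert_topk(P_row, I_row, d, i):
    """Insert distance d (with index i) into an ascending top-k row, in place.

    P_row holds the current k smallest distances (padded with np.inf);
    I_row holds the matching subsequence indices (padded with -1).
    """
    if d < P_row[-1]:  # only insert if d beats the current k-th distance
        pos = np.searchsorted(P_row, d, side="right")
        P_row[:] = np.insert(P_row, pos, d)[:-1]  # shift larger entries right
        I_row[:] = np.insert(I_row, pos, i)[:-1]
```

On the GPU, the same search must be written by hand (e.g., as a binary search in a CUDA kernel), which is roughly the role `_gpu_searchsorted` plays.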

@NimaSarajpoor (Collaborator, Author)

@seanlaw
Before I forget, it might be worthwhile to improve top-k related docstrings by mentioning that:

"Each of the top-k neighbors of a sequence s is not a trivial match of s when ignore_trivial==True. However, it should be noted that the top-k neighbors themselves might be trivial matches of each other, regardless of the value of ignore_trivial."

Please ignore this suggestion if it makes the docstrings more confusing :)

@seanlaw (Contributor) commented Nov 20, 2022

Before I forget, it might be worthwhile to improve top-k related docstrings by mentioning that:

I get your point. How about this:

Note that for self-joins (i.e., ignore_trivial==True), the top-k neighbors for each subsequence are guaranteed to be located outside of the reference subsequence's exclusion zone. However, the top-k neighbors themselves may be located directly adjacent to/near each other, as long as they are all located outside of the reference subsequence's exclusion zone.
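
The behavior described above can be sketched on a single (hypothetical) distance profile; `topk_outside_excl_zone` is an illustrative simplification, not STUMPY's implementation:

```python
import numpy as np

def topk_outside_excl_zone(dist_profile, idx, excl_zone, k):
    """Top-k neighbor indices for subsequence idx, honoring its exclusion zone."""
    d = np.asarray(dist_profile, dtype=float).copy()
    lo = max(0, idx - excl_zone)
    hi = min(len(d), idx + excl_zone + 1)
    d[lo:hi] = np.inf  # mask the reference subsequence's exclusion zone
    # the k smallest remaining distances; note they may be adjacent to each other
    return np.argsort(d, kind="stable")[:k]
```

Notice that the returned neighbors can sit right next to one another; only the reference subsequence's own exclusion zone is enforced.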

@seanlaw (Contributor) commented Nov 20, 2022

Apparently, we already had a test for _gpu_searchsorted, located in test_gpu_stump 😃. I just updated the test functions.

Yeah, I was thinking about that the other day but wasn't in front of my computer to check. It makes sense though that we'll simply need to move the tests over. This is good. It means that we've been proactive and "Future Nima thanks past Nima for thinking ahead and making his life easier" 😄 You are developing good habits and you'll have confidence in your own development approach and you'll find that it is more and more rare for there to be a huge mistake/bug if you've protected yourself sufficiently. More importantly, it means that it'll be harder for other people to break things.

@NimaSarajpoor (Collaborator, Author)

I get your point. How about this:

Note that for self-joins (i.e., ignore_trivial==True), the top-k neighbors for each subsequence are guaranteed to be located outside of the reference subsequence's exclusion zone. However, the top-k neighbors themselves may be located directly adjacent to/near each other, as long as they are all located outside of the reference subsequence's exclusion zone.

Yes... your suggestion is cleaner, as it specifically addresses the case ignore_trivial==True.

This is good. It means that we've been proactive and "Future Nima thanks past Nima for thinking ahead and making his life easier" 😄

😄

You are developing good habits and you'll have confidence in your own development approach and you'll find that it is more and more rare for there to be a huge mistake/bug if you've protected yourself sufficiently. More importantly, it means that it'll be harder for other people to break things.

True! Since we take care of things up front, we will face fewer issues. Even if we do face one, it will be easier to resolve :) 👍

@NimaSarajpoor (Collaborator, Author)

A sneak peek at the computing time :)

| aamp | n=10_000, m=50 | n=20_000, m=50 | n=50_000, m=50 |
|---|---|---|---|
| MAIN_BRANCH | 0.8 | 3.1 | 20.2 |
| TOP-K BRANCH (k==1) | 0.84 | 3.2 | 19.9 |

@NimaSarajpoor (Collaborator, Author) commented Nov 22, 2022

@seanlaw

In the following tables, the subsequence length, m, is set to 50.

aamp: ok

| aamp, p=2 (integer) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main branch | 0.002 | 0.13 | 3.03 | 13.05 |
| top-k, k==1 | 0.013 | 0.21 | 3.37 | 13.5 |

aamped: ok

| aamped, p=2 (integer) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main branch | 0.032 | 0.16 | 3.11 | 13.7 |
| top-k, k==1 | 0.05 | 0.27 | 3.73 | 14.6 |

(!!!) gpu_aamp: NOT ok (see n=50k, 100k)

| gpu_aamp, p=2 (integer) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main branch | 0.27 | 2.02 | 9.28 | 35.02 |
| top-k, k==1 | 0.34 | 3.15 | 67.33 | 267.2 |

prescraamp: ok

| prescraamp, p=2 (integer) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main branch | 0.002 | 0.07 | 1.99 | 8.63 |
| top-k, k==1 | 0.006 | 0.12 | 2.22 | 9.39 |

scraamp_update: Not ok ?!

| scraamp with two updates, p=2 (integer) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main branch | 0.005 | 0.05 | 0.34 | 0.88 |
| top-k, k==1 | 0.014 | 0.15 | 0.62 | 1.36 |

aampi: ok

| aampi with one update, p=2 (integer) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main branch | 0.011 | 0.21 | 3.43 | 14.12 |
| top-k, k==1 | 0.022 | 0.3 | 3.87 | 14 |

I will investigate gpu_aamp and provide the performance code.

@NimaSarajpoor NimaSarajpoor force-pushed the TopK_MatrixProfile_Non_Normalized branch from 919ec76 to e58dffd Compare December 1, 2022 21:01
@NimaSarajpoor NimaSarajpoor force-pushed the TopK_MatrixProfile_Non_Normalized branch from 32810fb to 7d74c45 Compare December 2, 2022 04:37
@NimaSarajpoor (Collaborator, Author) commented Dec 2, 2022

@seanlaw
I fixed gpu_aamp. I ran all the modules again and measured their computing time. It seems everything is okay. Also, you may want to pay attention to the last one, for which I have provided a short note.


aamp

| aamp; p=2 (int) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.002 | 0.12 | 3.18 | 14.14 |
| top-k (k==1) | 0.008 | 0.17 | 3.18 | 13.43 |
```python
# performance code
import time

import numpy as np
import stumpy

seed = 0
np.random.seed(seed)
m = 50
p = 2
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)
stumpy.aamp(T, m, p=p, k=k)  # warm-up call (triggers JIT compilation)
lst = []
for _ in range(n_iter):
    tic = time.time()
    stumpy.aamp(T, m, p=p, k=k)
    toc = time.time()
    lst.append(toc - tic)

np.mean(lst)
```

aamped

| aamped; p=2 (int) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.03 | 0.16 | 3.21 | 14.47 |
| top-k (k==1) | 0.04 | 0.21 | 3.96 | 14.96 |
```python
# performance code
import time

import numba
import numpy as np
import stumpy
from dask.distributed import Client

seed = 0
np.random.seed(seed)
m = 50
p = 2
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)

with Client(n_workers=1, threads_per_worker=numba.config.NUMBA_NUM_THREADS) as client:
    stumpy.aamped(client, T, m, p=p, k=k)  # warm-up call (triggers JIT compilation)

    lst = []
    for _ in range(n_iter):
        tic = time.time()
        stumpy.aamped(client, T, m, p=p, k=k)
        toc = time.time()

        lst.append(toc - tic)

np.mean(lst)
```

gpu_aamp

| gpu_aamp; p=2.0 (float) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.28 | 1.91 | 9.84 | 34.02 |
| top-k (k==1) | 0.36 | 2.59 | 12.53 | 34.03 |
```python
# performance code
import time

import numpy as np
import stumpy

seed = 0
np.random.seed(seed)
m = 50
p = 2.0
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)
stumpy.gpu_aamp(T, m, p=p, k=k)  # warm-up call (triggers JIT compilation)
lst = []
for _ in range(n_iter):
    tic = time.time()
    stumpy.gpu_aamp(T, m, p=p, k=k)
    toc = time.time()
    lst.append(toc - tic)

np.mean(lst)
```

aampi(egress=True) with 5 updates

| aampi; p=2 (int) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.12 | 0.21 | 4 | 14.82 |
| top-k (k==1) | 0.019 | 0.27 | 3.65 | 13.54 |
```python
# performance code
import time

import numpy as np
import stumpy

seed = 0
np.random.seed(seed)
m = 50
p = 2
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)

seed = 0
np.random.seed(seed)
t_vals = np.random.rand(5)  # for 5 updates

stream = stumpy.aampi(T, m, egress=True, p=p, k=k)
stream.update(t_vals[0])  # warm-up (triggers JIT compilation)

lst = []
for _ in range(n_iter):
    tic = time.time()
    stream = stumpy.aampi(T, m, egress=True, p=p, k=k)
    for t in t_vals:
        stream.update(t)
    toc = time.time()

    lst.append(toc - tic)

np.mean(lst)
```

aampi(egress=False) with 5 updates

| aampi; p=2 (int) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.012 | 0.21 | 3.56 | 15.14 |
| top-k (k==1) | 0.024 | 0.33 | 3.9 | 14.89 |
```python
# performance code
import time

import numpy as np
import stumpy

seed = 0
np.random.seed(seed)
m = 50
p = 2
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)

seed = 0
np.random.seed(seed)
t_vals = np.random.rand(5)  # for 5 updates

stream = stumpy.aampi(T, m, egress=False, p=p, k=k)
stream.update(t_vals[0])  # warm-up (triggers JIT compilation)

lst = []
for _ in range(n_iter):
    tic = time.time()
    stream = stumpy.aampi(T, m, egress=False, p=p, k=k)
    for t in t_vals:
        stream.update(t)
    toc = time.time()

    lst.append(toc - tic)

np.mean(lst)
```

prescraamp

| prescraamp; p=2 (int) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.002 | 0.07 | 2.06 | 9.31 |
| top-k (k==1) | 0.006 | 0.12 | 2.21 | 9.19 |
```python
# performance code
import time

import numpy as np
from stumpy.scraamp import prescraamp  # prescraamp is defined in stumpy.scraamp

seed = 0
np.random.seed(seed)
m = 50
p = 2
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)

seed = 1
np.random.seed(seed)
prescraamp(T, m, p=p, k=k)  # warm-up call (triggers JIT compilation)

lst = []
for _ in range(n_iter):
    np.random.seed(seed)
    tic = time.time()
    prescraamp(T, m, p=p, k=k)
    toc = time.time()

    lst.append(toc - tic)

np.mean(lst)
```

scraamp (prescraamp==False) with 5 updates

| scraamp; p=2 (int) | n=1000 | n=10_000 | n=50_000 | n=100_000 |
|---|---|---|---|---|
| main | 0.01 | 0.11 | 0.73 | 1.96 |
| top-k, k==1 | 0.03 | 0.3 | 1.6 | 3.54 |
| top-k, k==1 (with prange in _aamp; *see note below) | 0.01 | 0.07 | 0.47 | 1.27 |
```python
# performance code
import time

import numpy as np
import stumpy

seed = 0
np.random.seed(seed)
m = 50
p = 2
k = 1
n_iter = 3

n = 1000  # options: 1000, 10k, 50k, 100k
T = np.random.rand(n)

seed = 1
np.random.seed(seed)
approx = stumpy.scraamp(T, m, p=p, k=k)
approx.update()  # warm-up (triggers JIT compilation)

lst = []
for _ in range(n_iter):
    np.random.seed(seed)
    tic = time.time()
    approx = stumpy.scraamp(T, m, p=p, k=k)
    for _ in range(5):  # 5 updates
        approx.update()
    toc = time.time()

    lst.append(toc - tic)

np.mean(lst)
```

NOTE:
The .update() method in scraamp calls _aamp. To track down the source of the overhead in the top-k version, I replaced core._merge_topk_PI with a prange-based alternative in _aamp. As you can see, the computing time decreased. We can work on this in another PR.
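
For context, a merge along the lines of what core._merge_topk_PI performs (combining two row-wise-sorted top-k profiles while keeping the k smallest distances per row) can be sketched in plain NumPy; `merge_topk` below is a hypothetical simplification, not STUMPY's actual implementation:

```python
import numpy as np

def merge_topk(PA, IA, PB, IB):
    """Merge two (n, k) top-k profiles, keeping the k smallest distances per row.

    Each row of PA and PB is assumed sorted ascending; IA/IB hold the
    matching subsequence indices.
    """
    k = PA.shape[1]
    P = np.concatenate([PA, PB], axis=1)
    I = np.concatenate([IA, IB], axis=1)
    order = np.argsort(P, axis=1, kind="stable")[:, :k]  # k smallest per row
    rows = np.arange(P.shape[0])[:, None]
    return P[rows, order], I[rows, order]
```

A full-array merge like this does more work than a prange loop that inserts one candidate at a time, which is consistent with the overhead observed above.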


I think everything is good :) Please feel free to review.

@seanlaw (Contributor) commented Dec 2, 2022

I fixed gpu_aamp

Can you please describe what the issue was? I am curious whether it was simply a bug or whether we had to do something drastically different from the normalized version. My main concern is maintaining consistency across the normalized and non-normalized methods.

@seanlaw (Contributor) commented Dec 2, 2022

@NimaSarajpoor So far, everything looks good to me and nothing obvious stood out. Is there anything else left to be done?

@NimaSarajpoor (Collaborator, Author) commented Dec 2, 2022

@seanlaw

I fixed gpu_aamp

Can you please describe what the issue was? I am curious if it was simply a bug or did we have to do something drastically different from the normalized version. My main concern is maintaining consistency across normalized and non-normalized methods.

I reviewed the code to point you to the GPU-related parts that were fixed by my recent commits.

@NimaSarajpoor So far, everything looks good to me and nothing obvious stood out. Is there anything else left to be done?

I think everything is good. The only thing you may want to do is scan the docstrings to make sure everything is correct.

@seanlaw (Contributor) commented Dec 2, 2022

The only thing you may want to do is to just scan the docstrings to make sure everything is correct.

I can do that!

@seanlaw (Contributor) left a comment

Just a minor suggestion

@seanlaw seanlaw merged commit 8da71ec into TDAmeritrade:main Dec 3, 2022
@seanlaw (Contributor) commented Dec 3, 2022

@NimaSarajpoor Thanks again for this contribution!

@NimaSarajpoor NimaSarajpoor deleted the TopK_MatrixProfile_Non_Normalized branch September 4, 2023 03:26