Pattern matching on GPU #718

@pavlexander

Question

Is it possible to utilize the GPU for pattern matching tasks?

I am interested in the following two APIs:

  • stumpy.mass - returns distance profiles
  • stumpy.match - returns matches
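To make the two terms concrete, here is a minimal pure-Python sketch of what a z-normalized distance profile and a list of matches are. It is a naive O(n·m) illustration, not stumpy's implementation (stumpy.mass uses a faster FFT-based approach), and the names `znorm`, `naive_distance_profile`, and `naive_matches` are hypothetical helpers introduced only for this example.

```python
import math


def znorm(x):
    """Z-normalize a sequence (zero mean, unit standard deviation)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)  # constant window: treat as all zeros
    return [(v - mean) / std for v in x]


def naive_distance_profile(query, series):
    """Z-normalized Euclidean distance from `query` to every window of `series`."""
    m = len(query)
    q = znorm(query)
    profile = []
    for i in range(len(series) - m + 1):
        w = znorm(series[i:i + m])
        profile.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w))))
    return profile


def naive_matches(profile, max_distance):
    """Return (distance, index) pairs at or below `max_distance`, closest first."""
    return sorted((d, i) for i, d in enumerate(profile) if d <= max_distance)


# Toy data: the query shape appears at index 1 and, scaled by 10, at index 5.
query = [1.0, 3.0, 2.0]
series = [0.0, 1.0, 3.0, 2.0, 5.0, 10.0, 30.0, 20.0]

profile = naive_distance_profile(query, series)
matches = naive_matches(profile, max_distance=1e-6)
print([index for _, index in matches])
```

Because each window is z-normalized before comparison, the scaled copy of the query counts as a match just like the exact copy. The fixed `max_distance` threshold is a simplification chosen for this sketch.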

I get the following execution time measurements:

stumpy.mass --- 0.9919984340667725 seconds ---
stumpy.match --- 0.05500030517578125 seconds ---
stumpy.stump --- 6.415998935699463 seconds ---
stumpy.gpu_stump --- 0.030998945236206055 seconds ---

Conclusion: generating the full matrix profile on the GPU (0.03 seconds) takes roughly half the time of finding matches on the CPU (0.055 seconds)! Shouldn't pattern matching require less computation than computing a full matrix profile?

The above tests were run on a dataset with n = 55 and m = 5. I am planning to run pattern matching on a dataset with n ≈ 500_000, so the execution time will grow proportionally to unreasonably high numbers. :)

This makes me think that the GPU is currently not used by either of the APIs (stumpy.mass, stumpy.match).

Therefore, my question is: is a "pattern matching on GPU" feature planned or already present? If not, could I propose that it be implemented or added to the agenda? :)

Thank You!

Code used in tests

import time

import numpy as np
import stumpy

def start_timer():
    # Record the current wall-clock time.
    return time.time()

def stop_timer(start_time, adder=''):
    # Print the time elapsed since start_time, then restart the timer.
    print(adder, "--- %s seconds ---" % (time.time() - start_time))
    return start_timer()

repeating_data = [219.14, 219.25, 219.15, 219.07, 219.28]

older_data = [218.24, 218.79, 218.69, 218.84, 219.21, 219.0, 218.93, 219.01, 219.0,
              218.31, 218.89, 218.21, 218.47, 218.7, 218.52, 218.24, 218.03, 218.03, 218.04, 218.04, 218.21, 218.25,
              218.54, 218.44, 218.29]

newer_data = [218.29, 218.42, 218.29, 218.51, 218.87, 218.56, 218.67, 218.58, 218.58, 218.79, 218.81, 218.81,
              218.78, 218.84, 218.82, 218.82, 218.69, 218.75, 218.73, 218.98]

data = []
data += older_data
data += repeating_data
data += newer_data
data += repeating_data

m = 5

data_to_find = np.array(repeating_data)
data = np.array(data)

print('data_to_find length ', len(data_to_find))
print('data length ', len(data))

timer2 = start_timer()
distance_profile = stumpy.mass(data_to_find, data)
timer2 = stop_timer(timer2, 'stumpy.mass')

matches = stumpy.match(data_to_find, data)
timer2 = stop_timer(timer2, 'stumpy.match')

mp = stumpy.stump(data, m)
timer2 = stop_timer(timer2, 'stumpy.stump')

mp = stumpy.gpu_stump(data, m)
timer2 = stop_timer(timer2, 'stumpy.gpu_stump')
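
A possible caveat for the measurements above: stumpy relies on Numba, so the first call to each function may include one-time JIT compilation, and each function here is timed only once. Below is a minimal sketch of a timing helper that performs a warm-up call before measuring. The names `time_call` and `sum_of_squares` are hypothetical and exist only for illustration; the stand-in workload keeps the sketch runnable without stumpy installed.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time of fn(*args, **kwargs) after one warm-up call."""
    fn(*args, **kwargs)  # warm-up: absorbs one-time costs such as JIT compilation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload (an assumption for this sketch, not a stumpy function).
def sum_of_squares(n):
    return sum(i * i for i in range(n))


elapsed = time_call(sum_of_squares, 10_000)
print("sum_of_squares --- %s seconds ---" % elapsed)
```

With the script above, the same helper could presumably be used as `time_call(stumpy.mass, data_to_find, data)`, and likewise for the other three functions, so that all four numbers exclude first-call overhead.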

Software and HW

Windows x64, 6-core Ryzen CPU, GTX 1080 Ti GPU

stumpy v1.11.1
