Why Correlation Approximation?
The correlation approximation engine is a Spark-based implementation of the better-known Google Correlate.
When analyzing a new time series, you may want to compare it against a bank of existing time series data to discover possible relationships in the data. You may also want to compare all time series in a data set to one another to find correlations.
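As a point of reference, the direct comparison described above can be sketched in a few lines of plain Python. This is only an illustration, not the engine's code: the function names `pearson` and `rank_against_bank` and the sample series are hypothetical, and Pearson correlation is assumed as the similarity measure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treated as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_against_bank(query, bank):
    """Return (name, correlation) pairs, most correlated first."""
    scores = [(name, pearson(query, series)) for name, series in bank.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical bank of existing series and a new series to compare.
bank = {
    "rising": [1.0, 2.0, 3.0, 4.0, 5.0],
    "falling": [5.0, 4.0, 3.0, 2.0, 1.0],
    "flat_then_spike": [0.0, 0.0, 0.0, 0.0, 10.0],
}
query = [2.0, 4.0, 6.0, 8.0, 10.0]
ranking = rank_against_bank(query, bank)
```

Each query touches every series in the bank and every point in each series, so the cost grows with both the number of series and their length.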
Comparing each time series directly against every other series may work for small or moderately sized data sets, but with large data sets and long vectors the operation can take longer than a user is willing to wait. A scalable approximation technique lets you answer these kinds of correlation queries on very large data sets quickly.
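To give a feel for how an approximation can cut the work, here is a minimal sketch of one common approach: reduce each series to a handful of numbers with a random projection, shortlist candidates using the cheap reduced vectors, then re-score only the shortlist exactly. This is an assumption chosen for illustration and is not necessarily the technique this engine uses; all names (`normalize`, `make_projection`, `approximate_top`) and parameters (`k`, `shortlist_size`) are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(series):
    """Center to zero mean and scale to unit length, so the dot product of
    two normalized series equals their Pearson correlation."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    norm = math.sqrt(dot(centered, centered))
    return [v / norm for v in centered] if norm > 0 else centered

def make_projection(length, k, rng):
    """k random Gaussian directions used to shrink a series to k numbers."""
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(k)]

def project(series, projection):
    """Reduce a normalized series to k numbers, rescaled to unit length."""
    reduced = [dot(row, series) for row in projection]
    norm = math.sqrt(dot(reduced, reduced))
    return [v / norm for v in reduced] if norm > 0 else reduced

def approximate_top(query, bank, projection, shortlist_size):
    """Shortlist by reduced-space similarity, then re-score exactly."""
    normalized = {name: normalize(s) for name, s in bank.items()}
    # In a real system the reduced bank would be precomputed once and reused.
    reduced = {name: project(s, projection) for name, s in normalized.items()}
    q_norm = normalize(query)
    q_small = project(q_norm, projection)
    candidates = sorted(
        bank, key=lambda name: dot(q_small, reduced[name]), reverse=True
    )[:shortlist_size]
    exact = [(name, dot(q_norm, normalized[name])) for name in candidates]
    return sorted(exact, key=lambda pair: pair[1], reverse=True)

rng = random.Random(42)
length, k = 200, 16
query = [rng.random() for _ in range(length)]
bank = {f"noise_{i}": [rng.random() for _ in range(length)] for i in range(50)}
bank["scaled_copy"] = [2.0 * v + 3.0 for v in query]  # perfectly correlated
projection = make_projection(length, k, rng)
results = approximate_top(query, bank, projection, shortlist_size=5)
```

With the reduced bank precomputed, each query compares `k` numbers per series instead of the full vector, and only the shortlist pays for an exact comparison. The trade-off is that a weakly correlated series can be missed if it does not make the shortlist.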
For more information on the origins of correlation approximation, see Google Correlate:
This implementation is currently much simpler than Google Correlate. We've started with a simple system that reads input from local or HDFS files and writes correlation results to local or HDFS files. We've also included a simple interactive command-line interface.