How can we find patterns and anomalies in a tensor, or multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives each time step? Finding patterns and anomalies in a tensor is a crucial problem with many applications, including building safety monitoring, patient health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard PARAFAC and Tucker decomposition results are not directly interpretable. Although a few sampling-based methods have previously been proposed towards better interpretability, they need to be made faster, more memory efficient, and more accurate.
In this paper, we propose CTD, a fast, accurate, and directly interpretable tensor decomposition method based on sampling. CTD-S, the static version of CTD, provably guarantees a high accuracy that is up to 11x more accurate than that of the state-of-the-art method experimentally. Also, CTD-S is made up to 2.3x faster, and up to 24x more memory-efficient than the state-of-the-art method by removing redundancy. CTD-D, the dynamic version of CTD, is the first interpretable dynamic tensor decomposition method ever proposed. Also, it is made up to 82x faster than already fast CTD-S by exploiting factors at previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in the online distributed denial of service (DDoS) attack detection and online troll detection.
Please use the following citation for CTD:
Lee, J., Choi, D., & Sael, L. (2018). CTD: Fast, accurate, and interpretable method for static and dynamic tensor decompositions. PloS One, 13(7), e0200579. paper
The source codes used in the paper are available.
- CTD-v1.0: [download]
Comparison of our proposed CTD and the existing tensor-CUR. The static method CTD-S outperforms the state of-the-art tensor-CUR in terms of time, memory usage, and accuracy. The dynamic method CTD-D is the fastest.
Tensor-CUR (Existing) | CTD-S [Proposed] | CTD-D [Proposed] | |
---|---|---|---|
Interpretability | O | O | O |
Time | Fast | Faster | Fastest |
Memory Usage | Low | Lower | Low |
Accuracy | Low | High | High |
Online | O |
Name | Structure | Size | Nonzero | Download |
---|---|---|---|---|
Facebook-wall | User 1 - User 2 - Time | 63,891 × 63,890 × 1,504 | 738,485 | [Down] |
Facebook-wall (synthetic) | User 1 - User 2- Time | 63,891 × 63,890 × 1,504 | 1,169,656 | [Down] |
Hyperspectral Image | X - Y - Frequency | 538 × 323 × 148 | 25,715,854 | [Down] |
Infectious | Person 1 - Person 2 - Time | 407 × 410 × 1,392 | 17,298 | [Down] |
Hypertext 2009 | Attendee 1 - Attendee 2 - Time | 112 × 113 × 5,246 | 20,818 | [Down] |
Haggle | Person 1 - Person 2 - Time | 77 × 274 × 1,567 | 27,972 | [Down] |
CAIDA | Source IP - Destination IP - Time | 189 × 189 × 1,000 | 20,511 | [Down] |
CAIDA | Source IP - Destination IP - Time | 189 × 189 × 1,000 | 46,102 | [Down] |
Jungwoo Lee (Seoul National University)
Dongjin Choi (Seoul National University)
Lee Sael (Seoul National University)