CepstralCoefficients.jl

Introduction

This package provides different methods of calculating cepstral coefficients. Three implemented methods are as follows:

Cepstral coefficients based on Auto Regressive Moving Average (ARMA) coefficients
Cepstral coefficients based on Auto-Regressive (AR) coefficients
Real cepstral coefficients

Cepstral coefficients based on ARMA coefficients

Considering the following $ARMA(p, q)$ process:

$${X_t} = \sum\limits_{i = 1}^p {{\phi _r}{X_{t - r}} + {\epsilon_t} + \sum\limits_{i = 1}^q {{\theta _r}} } {\epsilon_{t - r}}$$

where ${\phi _r}{\rm{, }}r = 1,2,...,p$ are the autoregressive (AR) parameters, ${\theta _r}{\rm{, }}r = 1,2,...,q$ are the moving average (MA) parameters and ${{\varepsilon _t}}$, is a white noise process. The spectral density of an $ARMA(p, q)$ process is defined as

$${f_x}(\omega ) = {{{\sigma ^2}} \over {2\pi }}\left| {{{1 - \sum\limits_{h - 1}^p {{\phi _h}{e^{ihw}}} } \over {1 - \sum\limits_{h - 1}^q {{\theta _h}{e^{ihw}}} }}} \right|$$

where ${{\sigma ^2}}$ is the variance of ${{\varepsilon _t}}$. The logarithm of an estimated spectral density function can be approximated using an exponential form for the log spectral density function, namely,

$${\lambda _x}(\omega ) = \log {f_x}(\omega ) = {{{\sigma ^2}} \over {2\pi }}\exp (2\sum\limits_{h - 1}^p {{\psi _h}\cos (h\omega )} )$$

where $0 < \omega < \pi $, and where ${\sigma ^2}$ and ${\psi _1},{\psi _2},...,{\psi _p}$ are unknown parameters. The following approximation of the log of the log spectral density function, namely, the spectrum of the log spectral density function, the cepstrum of ${X_t}$ is introduced:

$$CP(\omega ) = \log {\lambda _x}(\omega ) = {\psi _0} + 2\sum\limits_{h - 1}^p {{\psi _k}\cos (2\pi h)}$$

where ${\psi _0} = \int\limits_0^1 {\log {\lambda_x}(\omega )d} \omega$ is the logarithm of the variance of the white noise process ${{\varepsilon _t}}$. Under the absolute integrability on (0,1) of $\log {\lambda _x}(\omega )$, the Fourier coefficients of the expansion of $\log {\lambda _x}(\omega )$ are defined by:

$${\psi _k} = \int\limits_0^1 {\log {\lambda _x}(\omega )\cos (2\pi k)d} \omega {\rm{ }}$$

for $k = 0,1,2,...$ and are referred to as the cepstral coefficients. Due to the convergence in mean square of $\log {f_x}(\omega )$ with increasing $p$, only a small number of cepstral coefficients can describe the second-order characteristics of a time series [1].

Cepstral coefficients based on AR coefficients

Consider a time–series $X_t$ defined by an $AR(p)$ model $X_t+\alpha_1X_{t-1}+\dots+\alpha_pX_{t-p}=\epsilon_t$ where $\alpha_1+\dots+\alpha_p$ are the auto-regression coefficients and $\epsilon_t$ is white noise with mean $0$ and certain non-zero variance. Note that for every ARIMA model, there exists an equivalent AR model, that can be obtained from the ARIMA model by polynomial division.
The cepstral coefficients for an $AR(p)$ time–series can be derived from the auto-regression coefficients [2]:

$${c_n} = \left\{ {\begin{array}{*{20}{c}} { - {\alpha _1},}&{{\text{if n = 1}}} \\ { - {\alpha _n} - \sum\nolimits_{m = 1}^{n - 1} {\left( {1 - \frac{m}{n}} \right){\alpha _m}{c_{n - m}},} }&{{\text{if }}1 < n \leqslant p} \\ { - \sum\nolimits_{m = 1}^p {\left( {1 - \frac{m}{n}} \right){\alpha _m}{c_{n - m}},} }&{{\text{if }}p < n} \end{array}} \right.$$

Real cepstral coefficients

The (real) cepstrum is defined as the inverse Fourier transform of the (real) logarithm of the Fourier transform of the time series.

How to use

To use the implementation, one should pursue the following steps:

Install the package by running the following command in the Julia REPL:

using Pkg
Pkg.add(url="https://github.com/shayandavoodii/CepstralCoefficients.jl.git")

Import the package using the following command in the Julia REPL:
```
using CepstralCoefficients
```

Afterward, one should use the cc function as the primary function of the implementation. The function above calculates the first n number of cepstral coefficients according to the given time series.

Example

A sequence of time series is required to test the implementation. In this case, the following assets are used: ["MSFT", "AAPL", "GOOG", "AMZN", "FB", "TSLA", "NVDA", "INTC", "CSCO", "ADBE"]. In this example, the first 5 cepstral coefficients are calculated. The lag of the $AR$ process is set to be 3.

Fetch data

using YFinance
tickers = ["MSFT", "AAPL", "GOOG", "AMZN", "META", "TSLA", "NVDA", "INTC", "CSCO", "ADBE"]
querry = [get_prices(ticker, startdt="2019-01-01", enddt="2020-01-01")["adjclose"] for ticker in tickers]
prices = stack(querry, dims=1);

Afterward, the cc function is employed to calculate the cepstral coefficients.

Calculate cepstral coefficients

The calculation method (ARMACepstral or ARCepstral or RealCepstral) should be passed as the first argument to the cc function, the time series should be passed as the second argument in order to calculate the cepstral coefficients of the passed time series, and the number of coefficients should be specificed as the third argument. For example, the following code calculates the first 5 cepstral coefficients of the given time series prices using the ARCepstral(3) method:

n=5
p=3
model = ARCepstral(p)
cepscoefs = cc(model, prices, n)
# 5×10 Matrix{Float64}:
#  -0.895194  -0.960283  -0.987055  -1.0       -0.968447  -1.0       -0.992712  -1.0       -1.0       -0.964975
#   0.438019   0.504086   0.623425   0.671433   0.475873   0.551948   0.563323   0.556987   0.577228   0.553053
#  -0.409617  -0.422937  -0.586731  -0.638229  -0.317215  -0.373129  -0.468687  -0.368036  -0.464358  -0.48796
#   0.313867   0.336202   0.509322   0.56959    0.233927   0.291145   0.386832   0.286327   0.384007   0.402437
#  -0.257968  -0.286627  -0.482991  -0.557164  -0.184029  -0.241863  -0.3434    -0.236681  -0.341143  -0.359296

The result is a n×m matrix, where n is the number of cepstral coefficients and m is the number of time series. The next step is to perform clustering on the calculated cepstral coefficients.

Extra tools

Regarding this field of study, an extra tool has been provided in this package that is shiped as an extension. Kalpakis et al. (2001) have used cepstral coefficients in field of time series analysis. They have used Partitioning Around Medoids (PAM) clustering method (AKA K-Medoids) to find similar time series regarding their cepstral coefficient values. The result of cc function should be passed to the cepsclustering function in order to perform PAM method on the cepstral coefficient in order to perform clustering. Hence, this extension can be referred to as an implementation of the aforementioned study, in Julia.

PAM Clustering

In this regard, one can use the cepsclustering function to perform Partition Around Medoids (PAM) clustering on the calculated cepstral coefficients. The function takes the cepstral coefficients as the first argument and the maximum number of clusters to be examined (in order to find the optimal number of clusters) as the second argument.

using Clustering
k = 4
clusters = cepsclustering(cepscoefs, k)
# 2-element Vector{Vector{Int64}}:
#  [1, 3, 5, 7, 9]
#  [2, 4, 6, 8, 10]

The result indicates that the 1st, 3rd, 5th, 7th, and 9th time series are in the first cluster and the rest are in the second cluster. In other words, the first cluster contains "MSFT", "GOOG", "META", "NVDA", and "CSCO" and the second cluster contains "AAPL", "AMZN", "FB", "TSLA", and "INTC". Note that, the result may vary each time due to the random nature of the PAM algorithm.

Plotting

In order to probe the results, it is better to visualize it. In this subsection, the time series are plotted in two different color tones each of which represents a cluster.

colortones = ["#E46262" "#2C8EF6" "#CD6969" "#408BDA" "#B57070" "#5387BF" "#9E7777" "#6784A3" "#877E7E" "#7A8188"]
linestyle = [:solid :dash :solid :dash :solid :dash :solid :dash :solid :dash]
plot(
  prices',
  label=permutedims(tickers),
  title="Stock prices\nAnalysed data range: $(startdt) to $(enddt)",
  xlabel="Date",
  ylabel="Price",
  legend=:outerright,
  dpi=300,
  color=colortones,
  size=(1000, 400),
  linestyle=linestyle,
  bottom_margin=6mm,
  left_margin=6mm,
)

The results are not satisfactory, which is expected since the PAM clustering is inaccurate due to its random initialization. The random initialization may result in a nonoptimal solution. As seen in the figure above, the 'ABDE' and 'NVDA' series follow similar patterns but are in different clusters; this is surprising because the opposite was expected. Furthermore, the result of PAM clustering is not deterministic. Thus, the output may differ in each run, which is another drawback.

Name		Name	Last commit message	Last commit date
Latest commit History 107 Commits
.github/workflows		.github/workflows
assets		assets
data		data
example		example
ext		ext
src		src
test		test
.gitignore		.gitignore
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CepstralCoefficients.jl

Table of contents

Introduction

How to use

Example

Fetch data

Calculate cepstral coefficients

Extra tools

PAM Clustering

Plotting

About

Languages

shayandavoodii/CepstralCoefficients.jl

Folders and files

Latest commit

History

Repository files navigation

CepstralCoefficients.jl

Table of contents

Introduction

How to use

Example

Fetch data

Calculate cepstral coefficients

Extra tools

PAM Clustering

Plotting

About

Topics

Resources

Stars

Watchers

Forks

Languages