
[enhancement] Enhance positional encoding and introduce dynamic multi-scale patching for improved forecasting #233

Killer3048 opened this issue Feb 5, 2025 · 3 comments

@Killer3048

I propose the following concise enhancements:

  1. Enhanced Positional Encoding:
    Replace the standard sinusoidal positional encodings with relative or learnable encodings—such as Fourier features or RoPE—to better capture the relationships between tokens. This adjustment will allow the model to more effectively learn cyclic and seasonal patterns, independent of absolute positions.

  2. Dynamic/Multi-Scale Patching:
    Instead of using a fixed patch size, implement dynamic patching that adapts to local variability, or design a multi-scale framework that processes both small (fine-grained) and large (coarse-grained) patches in parallel. Merging these representations (via an attention-based fusion layer) will enable the model to capture both detailed local fluctuations and broad long-term trends.

These improvements are aimed at strengthening the model's ability to capture complex temporal dependencies. A rough sketch of the multi-scale patching idea is given below.
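To make the second point concrete, here is a minimal sketch of how two patch scales could be embedded in parallel and merged with an attention-based fusion layer. This is illustrative PyTorch only; the module and parameter names are my own and not taken from this repository:

```python
import torch
import torch.nn as nn


class MultiScalePatchEmbed(nn.Module):
    """Embed a series at two patch scales and fuse them with attention (sketch)."""

    def __init__(self, patch_sizes=(8, 32), d_model=128, n_heads=4):
        super().__init__()
        self.patch_sizes = patch_sizes
        # one linear projection per scale: patch_len -> d_model
        self.projections = nn.ModuleList([nn.Linear(p, d_model) for p in patch_sizes])
        # attention-based fusion: fine-scale tokens attend to coarse-scale tokens
        self.fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len), a single univariate channel for simplicity;
        # seq_len is assumed divisible by every patch size
        tokens = []
        for p, proj in zip(self.patch_sizes, self.projections):
            patches = x.unfold(dimension=-1, size=p, step=p)   # (batch, n_patches, p)
            tokens.append(proj(patches))                        # (batch, n_patches, d_model)
        fine, coarse = tokens
        fused, _ = self.fusion(query=fine, key=coarse, value=coarse)
        return fine + fused                                     # residual fusion of the two scales


x = torch.randn(4, 512)                  # 4 series of length 512
out = MultiScalePatchEmbed()(x)
print(out.shape)                         # torch.Size([4, 64, 128])
```

The fine-scale tokens keep local detail, while the cross-attention lets them pull in context from the coarse-scale tokens; other fusion choices (concatenation, gating) would fit the same skeleton.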

@iganggang

Could you suggest some papers and code about these ideas, such as dynamic/multi-scale patching?

@Killer3048

@iganggang

  1. Dynamic/Multi-scale patching.
  • DRFormer:
    The paper proposes a multi-scale transformer that employs a dynamic tokenizer to extract patches at different granularities, capturing diverse receptive fields for long-term time-series forecasting. The GitHub repository offers a PyTorch implementation of these dynamic patching strategies.
    GitHub – DRFormer
  • Pathformer:
    This work introduces adaptive pathways for multi-scale patching, where patch sizes are dynamically adjusted to capture both local and global temporal features in time series. The code on GitHub demonstrates adaptive multi-resolution patch extraction within a transformer framework.
    GitHub – Pathformer
  2. Enhanced positional encodings.
  • RoFormer:
    The paper introduces rotary positional embeddings (RoPE), a method that encodes absolute and relative positions using rotation matrices, enabling flexible sequence-length extrapolation and improved attention. The paper (available on arXiv) details the mathematical formulation and how RoPE is incorporated into transformers; a minimal sketch is included after this list.
    RoFormer on arXiv
  • Rotary position embedding for ViT:
    This repository applies RoPE to Vision Transformers, enhancing the handling of spatial positional information in image data. The provided code demonstrates how to integrate rotary position embeddings into ViT models to improve performance on vision tasks.
    GitHub – RoPE-ViT
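
For the positional-encoding side, the RoPE sketch mentioned above is roughly the following. It uses the common "rotate-half" variant (pairing dimension i with i + head_dim/2 rather than interleaving), and the function and tensor names are illustrative, not from any of the repositories above:

```python
import torch


def apply_rope(x, base=10000.0):
    # x: (batch, n_heads, seq_len, head_dim); head_dim must be even
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    # per-pair rotation frequencies, following the RoFormer formulation
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()          # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    # rotate each (x1, x2) coordinate pair by a position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Rotate q and k before the dot product; the resulting attention scores then
# depend on the relative offset between positions rather than absolute indices.
q = torch.randn(2, 4, 96, 64)      # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 4, 96, 64)
scores = apply_rope(q) @ apply_rope(k).transpose(-2, -1)
print(scores.shape)                # torch.Size([2, 4, 96, 96])
```

Because the rotation makes the query-key dot product depend only on relative offsets, this matches the "independent of absolute positions" goal in the original proposal.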

@iganggang

iganggang commented Feb 10, 2025

Thank you very much for your quick, detailed reply. These papers look too complicated for me. I need to spend some time reading.

Besides, I am trying to change the attention mechanism (scaled dot-product attention) in PatchTST to causal attention (masked self-attention). However, I find the results are worse than PatchTST on ETTh1.csv: the validation loss decreases by 0.0011 during training, but the test loss increases from 0.4148351252 to 0.4190029799938202. Could you give some suggestions about this?
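
For context, the change I am describing is essentially adding an upper-triangular mask to the attention scores before the softmax, roughly like this (a simplified sketch, not the actual PatchTST code; tensor names are illustrative):

```python
import math
import torch
import torch.nn.functional as F


def causal_attention(q, k, v):
    # q, k, v: (batch, n_heads, num_patches, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    n = q.size(-2)
    # upper-triangular mask: patch i may only attend to patches j <= i
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```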
