
[enhancement] Enhance positional encoding and introduce dynamic multi-scale patching for improved forecasting #233

Killer3048 opened this issue Feb 5, 2025 · 3 comments

@Killer3048

I propose the following concise enhancements:

  1. Enhanced Positional Encoding:
    Replace the standard sinusoidal positional encodings with relative or learnable encodings—such as Fourier features or RoPE—to better capture the relationships between tokens. This adjustment will allow the model to more effectively learn cyclic and seasonal patterns, independent of absolute positions.

  2. Dynamic/Multi-Scale Patching:
    Instead of using a fixed patch size, implement dynamic patching that adapts to local variability, or design a multi-scale framework that processes both small (fine-grained) and large (coarse-grained) patches in parallel. Merging these representations (via an attention-based fusion layer) will enable the model to capture both detailed local fluctuations and broad long-term trends.

These improvements are aimed at strengthening the model's ability to capture complex temporal dependencies. A rough sketch of the multi-scale patching idea is given below.
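To make the second point concrete, here is a minimal sketch of how two patch scales could be embedded in parallel and merged with an attention-based fusion layer. This is illustrative PyTorch only; the module and parameter names are my own and not taken from this repository:

```python
import torch
import torch.nn as nn


class MultiScalePatchEmbed(nn.Module):
    """Embed a series at two patch scales and fuse them with attention (sketch)."""

    def __init__(self, patch_sizes=(8, 32), d_model=128, n_heads=4):
        super().__init__()
        self.patch_sizes = patch_sizes
        # one linear projection per scale: patch_len -> d_model
        self.projections = nn.ModuleList([nn.Linear(p, d_model) for p in patch_sizes])
        # attention-based fusion: fine-scale tokens attend to coarse-scale tokens
        self.fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len), a single univariate channel for simplicity;
        # seq_len is assumed divisible by every patch size
        tokens = []
        for p, proj in zip(self.patch_sizes, self.projections):
            patches = x.unfold(dimension=-1, size=p, step=p)   # (batch, n_patches, p)
            tokens.append(proj(patches))                        # (batch, n_patches, d_model)
        fine, coarse = tokens
        fused, _ = self.fusion(query=fine, key=coarse, value=coarse)
        return fine + fused                                     # residual fusion of the two scales


x = torch.randn(4, 512)                  # 4 series of length 512
out = MultiScalePatchEmbed()(x)
print(out.shape)                         # torch.Size([4, 64, 128])
```

The fine-scale tokens keep local detail, while the cross-attention lets them pull in context from the coarse-scale tokens; other fusion choices (concatenation, gating) would fit the same skeleton.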

@iganggang

Could you suggest some papers and code about these ideas, such as dynamic/multi-scale patching?

@Killer3048

@iganggang

  1. Dynamic/Multi-scale patching.
  • DRFormer:
    The paper proposes a multi-scale transformer that employs a dynamic tokenizer to extract patches at different granularities, capturing diverse receptive fields for long-term time-series forecasting. The GitHub repository offers a PyTorch implementation of these dynamic patching strategies.
    GitHub – DRFormer
  • Pathformer:
    This work introduces adaptive pathways for multi-scale patching, where patch sizes are dynamically adjusted to capture both local and global temporal features in time series. The code on GitHub demonstrates adaptive multi-resolution patch extraction within a transformer framework.
    GitHub – Pathformer
  2. Enhanced positional encodings.
  • RoFormer:
    The paper introduces rotary positional embeddings (RoPE), a method that encodes absolute and relative positions using rotation matrices, enabling flexible sequence-length extrapolation and improved attention. The paper (available on arXiv) details the mathematical formulation and how RoPE is incorporated into transformers; a minimal sketch is included after this list.
    RoFormer on arXiv
  • Rotary position embedding for ViT:
    This repository applies RoPE to Vision Transformers, enhancing the handling of spatial positional information in image data. The provided code demonstrates how to integrate rotary position embeddings into ViT models to improve performance on vision tasks.
    GitHub – RoPE-ViT
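
For the positional-encoding side, the RoPE sketch mentioned above is roughly the following. It uses the common "rotate-half" variant (pairing dimension i with i + head_dim/2 rather than interleaving), and the function and tensor names are illustrative, not from any of the repositories above:

```python
import torch


def apply_rope(x, base=10000.0):
    # x: (batch, n_heads, seq_len, head_dim); head_dim must be even
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    # per-pair rotation frequencies, following the RoFormer formulation
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()          # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    # rotate each (x1, x2) coordinate pair by a position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Rotate q and k before the dot product; the resulting attention scores then
# depend on the relative offset between positions rather than absolute indices.
q = torch.randn(2, 4, 96, 64)      # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 4, 96, 64)
scores = apply_rope(q) @ apply_rope(k).transpose(-2, -1)
print(scores.shape)                # torch.Size([2, 4, 96, 96])
```

Because the rotation makes the query-key dot product depend only on relative offsets, this matches the "independent of absolute positions" goal in the original proposal.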

@iganggang

iganggang commented Feb 10, 2025

Thank you very much for your quick, detailed reply. These papers look too complicated for me. I need to spend some time reading.

Besides, I am trying to change the attention mechanism (scaled dot-product attention) in PatchTST to causal attention (masked self-attention). However, I find the results are worse than PatchTST on ETTh1.csv: the validation loss decreases by 0.0011 during training, but the test loss increases from 0.4148351252 to 0.4190029799938202. Could you give some suggestions about this?
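
For context, the change I am describing is essentially adding an upper-triangular mask to the attention scores before the softmax, roughly like this (a simplified sketch, not the actual PatchTST code; tensor names are illustrative):

```python
import math
import torch
import torch.nn.functional as F


def causal_attention(q, k, v):
    # q, k, v: (batch, n_heads, num_patches, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    n = q.size(-2)
    # upper-triangular mask: patch i may only attend to patches j <= i
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```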
