
Various parts of the spacetimeformer embedding #11

Open
qAp opened this issue Jan 25, 2022 · 4 comments

qAp commented Jan 25, 2022

Overall, the spacetimeformer embedding, spacetimeformer_model.nn.embed.SpacetimeformerEmbedding, takes inputs of shape (N, L, d_x) and (N, L, d_y), and outputs several embeddings of shape (N, L * d_y, d_model).

  • N is the batch size.
  • L is the sequence length. For the encoder, this is context_points; for the decoder, it's target_points.
  • d_x is the number of input features.
  • d_y is the number of output features.

The overall embedding consists of several embeddings: x_emb, y_emb, var_emb, and given_emb.
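
For concreteness, here is a shape-only sketch of these dimensions (all sizes are hypothetical):

```python
import torch

# Hypothetical sizes, for illustration only.
N, L, d_x, d_y, d_model = 2, 10, 3, 14, 32

x = torch.randn(N, L, d_x)  # time features per step
y = torch.randn(N, L, d_y)  # target values per step

# Spatiotemporal flattening: each (timestep, variable) pair becomes its
# own token, so a length-L sequence with d_y variables yields L * d_y
# tokens, each eventually embedded into d_model dimensions.
print(L * d_y)                # 140 tokens per sequence
print((N, L * d_y, d_model))  # (2, 140, 32): shape of each output embedding
```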

qAp commented Jan 25, 2022

x_emb
This is the time2vec embedding, which maps a timestamp to a vector. For more details, see: #9 (comment).

As there are only L timestamps, there are only L distinct time2vec vectors per sequence, so they are repeated d_y times to cover all L * d_y tokens:

t2v_emb = self.x_emb(x).repeat(1, d_y, 1)  # (N, L, dim) -> (N, L * d_y, dim)
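
A minimal, self-contained sketch of this repeat (a random tensor stands in for the time2vec output; all sizes are hypothetical):

```python
import torch

N, L, d_y, time_dim = 2, 10, 14, 6  # hypothetical sizes

# Stand-in for self.x_emb(x): one time2vec vector per timestamp.
t2v = torch.randn(N, L, time_dim)

# Tile along the sequence dimension so each of the L * d_y flattened
# tokens carries the embedding of its timestamp.
t2v_emb = t2v.repeat(1, d_y, 1)
print(t2v_emb.shape)  # torch.Size([2, 140, 6])
```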

qAp commented Jan 25, 2022

y_emb is for embedding the target values y.

There are L * d_y target values, each with an associated timestamp, so each value is concatenated with its time2vec embedding vector. The result is a vector one element longer than the time2vec vector, which is then passed through y_emb (in fact a linear layer) to become an embedding of length d_model.

For the whole batch, this gives a tensor of shape (N, L * d_y, d_model).
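
A sketch of this step, assuming a variable-major flattening consistent with the repeat above (names and sizes are illustrative; the linear layer stands in for y_emb):

```python
import torch
import torch.nn as nn

N, L, d_y, time_dim, d_model = 2, 10, 14, 6, 32  # hypothetical sizes

y = torch.randn(N, L, d_y)
t2v_emb = torch.randn(N, L * d_y, time_dim)  # x_emb output, repeated d_y times

# Flatten the targets variable-major so token k*L + t holds variable k
# at time t, matching the layout of the repeated time embeddings.
y_flat = torch.cat(y.chunk(d_y, dim=-1), dim=1)  # (N, L * d_y, 1)

# Attach each value's timestamp embedding, then project to d_model.
y_emb = nn.Linear(time_dim + 1, d_model)
val_time_emb = y_emb(torch.cat([y_flat, t2v_emb], dim=-1))
print(val_time_emb.shape)  # torch.Size([2, 140, 32])
```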

qAp commented Jan 25, 2022

var_emb embeds the target variables themselves, not their values.

For example, if there are 14 target variables, they can be indexed from 0 to 13; var_emb then maps index 0 to one vector, index 1 to another vector, and so on.

Or, in this competition, 'Bitcoin' is mapped to one vector, 'Binance' to another, etc.

In general, there are d_y embedding vectors, one for each target variable.

A variable's identity is independent of time, so each of the d_y embedding vectors is repeated L times.

For the batch, this gives a tensor of embeddings of shape (N, L * d_y, d_model).
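
A sketch of this with nn.Embedding (sizes are hypothetical; the index layout follows the variable-major flattening above):

```python
import torch
import torch.nn as nn

N, L, d_y, d_model = 2, 10, 14, 32  # hypothetical sizes

# One learned vector per target variable.
var_embedding = nn.Embedding(num_embeddings=d_y, embedding_dim=d_model)

# Variable-major layout: index 0 for the first L tokens, index 1 for
# the next L, and so on; identical for every sequence in the batch.
var_idx = torch.arange(d_y).repeat_interleave(L)  # (L * d_y,)
var_idx = var_idx.unsqueeze(0).expand(N, -1)      # (N, L * d_y)

var_emb = var_embedding(var_idx)
print(var_emb.shape)  # torch.Size([2, 140, 32])
```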

qAp commented Jan 25, 2022

given_emb embeds whether a target value y is available or not.

Here, "available or not" does not concern missing values in the original data, but rather whether the target value is meant to be predicted (not available) or to be used for prediction (available).

Being available is mapped to an embedding vector, while being unavailable is mapped to another.

For the entire batch, the resulting tensor is of shape (N, L * d_y, d_model).

This embedding tensor is summed with the embedding tensor returned by y_emb.
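
A sketch of the two-entry embedding and the final sum (the all-ones mask here is illustrative, e.g. an encoder sequence where every value is given):

```python
import torch
import torch.nn as nn

N, L, d_y, d_model = 2, 10, 14, 32  # hypothetical sizes

# Two learned vectors: one for "given" tokens (used for prediction),
# one for "not given" tokens (to be predicted).
given_embedding = nn.Embedding(num_embeddings=2, embedding_dim=d_model)

# Encoder-style mask: every target value is given (index 1);
# decoder target positions would instead carry index 0.
given = torch.ones(N, L * d_y, dtype=torch.long)
given_emb = given_embedding(given)

val_time_emb = torch.randn(N, L * d_y, d_model)  # stand-in for y_emb output
emb = val_time_emb + given_emb                   # summed, as noted above
print(emb.shape)  # torch.Size([2, 140, 32])
```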
