Shape order for the input passed to the transformer #33

Open
Abner77 opened this issue May 4, 2020 · 1 comment

Abner77 commented May 4, 2020

Hi Kirill, thanks for the great work! It's great to have this in Keras!

I'm trying to use the transformer, but I'm not sure I'm getting the shapes right. My data consists of sequences of 10 steps of 40-dimensional vectors (something like 10 timesteps, each with 40 features), so I'm using a Keras Model whose input layer is `inputNiveles = Input(shape=(10, 40), dtype='float', name="input_niveles")`. If I purposely set a wrong number of heads in the transformer, the error I get is this one:
"The size of the last dimension of the input (40) must be evenly divisible by the number of the attention heads 11"
But aren't the heads supposed to act at the level of the sequence rather than the features, so that the error would say something like "the input (10) must be..."?
Is the transformer expecting the number of steps (the sequence length) to be the last dimension?
I'm also using the coordinate embedding layer.
```python
add_coordinate_embedding2 = TransformerCoordinateEmbedding(
    transformer_depth, name='coordinate_embedding2')

transformer_block2 = TransformerBlock(
    name='transformer2', num_heads=10, residual_dropout=0.0,
    attention_dropout=0.0, use_masking=True)

nivelesOut = inputNiveles

for step in range(transformer_depth):
    nivelesOut = transformer_block2(
        add_coordinate_embedding2(nivelesOut, step=step))

nivelesOut = Flatten(name="aplane_niveles")(nivelesOut)
```

Thank you very much, Kirill.

EreaxQ commented Apr 26, 2021

No, the heads act on the features, so the number of heads must evenly divide the number of features. See "Attention Is All You Need" for more details.
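
For illustration, here is a minimal NumPy sketch (not the library's internals) of how multi-head attention splits the feature axis, which is why 40 features work with 10 heads but not with 11:

```python
import numpy as np

batch, timesteps, d_model = 2, 10, 40   # matches Input(shape=(10, 40))
num_heads = 10                          # valid because 40 % 10 == 0

x = np.random.randn(batch, timesteps, d_model)

# Each head attends over the full sequence but only sees a slice of the features.
assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
head_dim = d_model // num_heads         # 4 features per head

# Split the last (feature) axis into heads; the timestep axis (10) is untouched.
heads = x.reshape(batch, timesteps, num_heads, head_dim)
print(heads.shape)                      # (2, 10, 10, 4)
```

With num_heads=11 that split would fail, because 40 features cannot be divided into 11 equal slices; the sequence length (10) never enters that check.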
