Hello. I read your paper on speech-driven facial animation, and it's great that you've released code showing the overall architecture of the model.
I have some questions about the sequence discriminator described in your paper. You mention that the frame at each time step is encoded using a CNN and fed into a two-layer GRU. Is this CNN identical to the Identity Encoder used in the Generator?
You also mentioned adding the audio as a conditional input to the network. How is this audio encoded, and how is it added to the input?
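For context, here is how I currently picture the sequence discriminator. This is just a minimal PyTorch sketch of my understanding; the layer sizes, the frame-encoder layout, and the concatenation-based audio conditioning are all my own assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class SequenceDiscriminator(nn.Module):
    def __init__(self, frame_dim=128, audio_dim=128, hidden_dim=256):
        super().__init__()
        # Per-frame CNN encoder (this is the part I'm asking about:
        # is it the same network as the Identity Encoder?).
        # Layer sizes here are placeholders.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, frame_dim),
        )
        # Two-layer GRU over the per-frame encodings. I'm guessing the
        # audio condition is concatenated to each frame encoding, but
        # that's exactly what I'd like to confirm.
        self.gru = nn.GRU(frame_dim + audio_dim, hidden_dim,
                          num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, frames, audio_feats):
        # frames:      (batch, time, 3, H, W)
        # audio_feats: (batch, time, audio_dim) -- per-step audio encoding
        b, t = frames.shape[:2]
        enc = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(torch.cat([enc, audio_feats], dim=-1))
        return self.classifier(out[:, -1])  # real/fake score from last step
```

Is this roughly the right picture, or does the audio enter the discriminator some other way (e.g. encoded once for the whole clip, or fed to the GRU's initial hidden state)?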
Thanks