Questions for the shape of the audio_embeddings #43

zypsjtu · 2024-06-19T08:08:34Z

V-Express/pipelines/v_express_pipeline.py:

        frame_audio_embeddings = []
        for frame_idx in range(video_length):
            start_sample = frame_idx
            end_sample = frame_idx + 2 * num_pad_audio_frames

            frame_audio_embedding = audio_embeddings[2 * start_sample:2 * (end_sample + 1), :]  # [2*num_pad+1, dim]
            frame_audio_embeddings.append(frame_audio_embedding)
        audio_embeddings = torch.stack(frame_audio_embeddings, dim=0)  # [vid_len, 2*num_pad+1, dim]

In my tests, the shape of audio_embeddings is [vid_len, 2*(2*num_pad+1), dim], not the [vid_len, 2*num_pad+1, dim] stated in the code comments.
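The window length can be checked without running the full pipeline. The sketch below is a rough reproduction, not the repository's code: it copies the slicing logic from the snippet above but uses plain Python lists in place of the tensor, and the values of `video_length`, `num_pad_audio_frames`, and `dim`, as well as the row count of the stand-in `audio_embeddings`, are assumptions chosen only so that every window fits.

```python
# Hypothetical sizes, chosen only for illustration.
num_pad_audio_frames = 2
video_length = 5
dim = 3

# Stand-in for the embeddings: a list of rows, assumed long enough
# that the last frame's window is not truncated by the slice.
num_rows = 2 * (video_length + 2 * num_pad_audio_frames)
audio_embeddings = [[0.0] * dim for _ in range(num_rows)]

frame_audio_embeddings = []
for frame_idx in range(video_length):
    start_sample = frame_idx
    end_sample = frame_idx + 2 * num_pad_audio_frames
    # Same slice bounds as in the quoted pipeline code.
    window = audio_embeddings[2 * start_sample:2 * (end_sample + 1)]
    frame_audio_embeddings.append(window)

# Every window has 2 * (end_sample + 1) - 2 * start_sample rows.
window_lengths = {len(w) for w in frame_audio_embeddings}
shape = (len(frame_audio_embeddings), len(frame_audio_embeddings[0]), dim)
print(shape, window_lengths)
```

With these sizes each slice spans 2*(end_sample + 1) - 2*start_sample = 2*(2*num_pad_audio_frames + 1) rows, which matches the shape I observe rather than the one in the comments.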
