Parsing WebVTT cue-identifiers fails #90

Open
scottsidwell opened this issue May 8, 2023 · 1 comment

scottsidwell commented May 8, 2023

I'm dealing with WebVTT files produced by third-party services that attempt to provide speaker diarization via cue identifiers.

This means we have a WebVTT file that looks like:

Speaker 1
00:02:00.000 --> 00:03:00.000
Hey there

Speaker 2
00:04:00.000 --> 00:05:00.000
Hi

Currently, this fails because the parser expects a timestamp at that point (instead of a cue id):
https://github.com/gsantiago/subtitle.js/blob/master/src/Parser.ts#L125-L127
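To make the failure concrete: in WebVTT, a cue identifier is a line that does not contain "-->" and sits directly above the timing line. A minimal sketch of that check follows, assuming a cue block has already been split into lines. `readCueHeader` and `CueHeader` are hypothetical names for illustration and are not part of subtitle.js.

```typescript
// Hypothetical helper, not part of subtitle.js.
interface CueHeader {
  id: string | null; // cue identifier, or null when the block starts with the timing line
  timing: string;    // the raw "start --> end" line
}

// Inspect the first two lines of a cue block and decide which one is the timing line.
function readCueHeader(lines: string[]): CueHeader | null {
  const [first, second] = lines;
  if (first !== undefined && first.indexOf("-->") !== -1) {
    // No identifier: the block opens with the timing line.
    return { id: null, timing: first };
  }
  if (first !== undefined && second !== undefined && second.indexOf("-->") !== -1) {
    // Any non-timing first line is treated as the identifier, numeric or not.
    return { id: first.trim(), timing: second };
  }
  // Neither of the first two lines is a timing line, so this is not a cue block.
  return null;
}
```

With this shape, "Speaker 1" is accepted as an identifier in the same way a purely numeric one would be.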

How hard do you think adding support for this would be? Do you have any pointers to help me get started?

@scottsidwell scottsidwell changed the title Parsing non-numerical cue-identifiers fails Parsing WebVTT cue-identifiers fails May 8, 2023
gsantiago (Owner) commented

The parser still doesn't support named IDs, only numeric ones.

If you don't want to update Parser.ts to support it, a temporary workaround is to strip the "Speaker" prefix from the cue IDs so that only the number remains:

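import { parseSync } from "subtitle";

// `source` is the raw WebVTT string. The regex is anchored to whole lines
// (m flag) so that "Speaker N" inside cue text is left untouched.
const re = /^Speaker (\d+)$/gm;
const newSource = source.replace(re, "$1");

const cues = parseSync(newSource);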
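The replacement above discards the speaker label and only works for IDs of the form "Speaker N". A more general sketch of the same idea is shown below: it swaps every non-numeric identifier for a running number and returns the original labels in cue order. `stripCueIdentifiers` and `StrippedVtt` are hypothetical names, not part of subtitle.js, and the sketch assumes cue blocks are separated by blank lines.

```typescript
// Hypothetical pre-processing helper, not part of subtitle.js.
interface StrippedVtt {
  source: string;          // WebVTT text with identifiers replaced by 1, 2, 3, ...
  ids: (string | null)[];  // original identifier per cue, in document order
}

function stripCueIdentifiers(vtt: string): StrippedVtt {
  const lines = vtt.split(/\r?\n/);
  const out: string[] = [];
  const ids: (string | null)[] = [];

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const isTiming = line.indexOf("-->") !== -1;
    const next = i + 1 < lines.length ? lines[i + 1] : "";
    // A block starts at the top of the file or right after a blank line.
    const startsBlock = i === 0 || lines[i - 1].trim() === "";

    if (startsBlock && isTiming) {
      // Cue without an identifier: keep it and record the gap.
      ids.push(null);
      out.push(line);
    } else if (startsBlock && line.trim() !== "" && next.indexOf("-->") !== -1) {
      // Identifier line: remember the label, emit the cue number instead.
      ids.push(line.trim());
      out.push(String(ids.length));
    } else {
      out.push(line);
    }
  }
  return { source: out.join("\n"), ids };
}
```

The returned `source` can then be passed to `parseSync`, and the label for the i-th parsed cue looked up as `ids[i]`. That pairing assumes the parser returns cues in document order.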
