Parsing WebVTT cue-identifiers fails #90

scottsidwell · 2023-05-08T04:20:20Z

I'm dealing with WebVTT files output by third-party services that attempt to provide speaker diarization via cue identifiers.

This means we have a WebVTT file that looks like:

Speaker 1
00:02:00.000 --> 00:03:00.000
Hey there

Speaker 2
00:04:00.000 --> 00:05:00.000
Hi

Currently, this fails because the parser expects a timestamp where the cue identifier appears:
https://github.com/gsantiago/subtitle.js/blob/master/src/Parser.ts#L125-L127

How hard do you think adding support for this would be, and do you have any pointers to help me get started?


gsantiago · 2023-05-08T12:04:16Z

Named IDs still aren't supported, only digits.

If you don't want to update Parser.ts to support them, a temporary workaround is to strip the "Speaker" prefix from the cue IDs:

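import { parseSync } from "subtitle";

// Anchor to whole lines so "Speaker N" inside cue text is left untouched.
const re = /^Speaker (\d+)$/gm;
const newSource = source.replace(re, "$1");

const cues = parseSync(newSource);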
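
If the identifiers aren't always of the form "Speaker N", or you want to keep the speaker names around, a more general pre-processing step might look like the sketch below. It is only an illustration under a few assumptions: `normalizeCueIds` and `NormalizedVtt` are hypothetical names (not part of subtitle.js), and an identifier is taken to be any non-empty line that follows a blank line (or the start of the file) and directly precedes a timing line containing `-->`.

```typescript
// Hypothetical helper, not part of subtitle.js.
// Replaces every cue identifier with a sequential number (the parser only
// accepts digits) and remembers the original label for later lookup.
interface NormalizedVtt {
  source: string;   // WebVTT text whose cue identifiers are all numeric
  labels: string[]; // labels[n - 1] is the identifier that was replaced by n
}

function normalizeCueIds(source: string): NormalizedVtt {
  const lines = source.split(/\r?\n/);
  const labels: string[] = [];

  const out = lines.map((line, i) => {
    const prev = i > 0 ? lines[i - 1] : "";
    const next = lines[i + 1] ?? "";

    // Assumption: an identifier is a non-empty line without "-->" that
    // follows a blank line and sits directly above a timing line.
    const isIdentifier =
      line.trim() !== "" &&
      !line.includes("-->") &&
      prev.trim() === "" &&
      next.includes("-->");

    if (!isIdentifier) return line;

    labels.push(line.trim());
    return String(labels.length);
  });

  return { source: out.join("\n"), labels };
}
```

The returned `source` can then be passed to `parseSync` as in the snippet above, and `labels` can be used to map the numeric IDs back to speaker names. Cues that have no identifier are left as they are and get no entry in `labels`.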

scottsidwell changed the title ~~Parsing non-numerical cue-identifiers fails~~ Parsing WebVTT cue-identifiers fails May 8, 2023

anoblet mentioned this issue Oct 8, 2024

fix(cxl-ui): cxl-jw-player handle tracks which are improperly formatted conversionxl/aybolit#435

Merged
