manage styles in the output #3

Open
wants to merge 12 commits into
base: master
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
/venv
/.idea
*.vtt
*.srt
/build/
/vtt_to_srt.egg-info
55 changes: 43 additions & 12 deletions README.md
@@ -1,23 +1,54 @@
[![asciicast](https://asciinema.org/a/234035.svg)](https://asciinema.org/a/234035)
# vtt-to-srt subtitle converter

# Python 3
## Packages
A simple Python module to convert vtt files to srt files **_with colours_**!

- pysrt
- webvtt-py
## ❓ Why this fork?

Some streaming platforms (e.g. Auvio) use vtt subtitles with colours.

- [upstream](https://github.com/lbrayner/vtt-to-srt) doesn't convert vtt to srt subtitles when there are colours.
see [issue #2](https://github.com/lbrayner/vtt-to-srt/issues/2)


- other tools like `ffmpeg` don't work either
`ffmpeg -i captions.vtt captions.srt`
see ticket [subtitles: vtt conversion to srt creates an empty output](https://trac.ffmpeg.org/ticket/9609)


- the library used is `webvtt-py`
it can't convert vtt files with colours to srt files
see [pull request #39](https://github.com/glut23/webvtt-py/pull/39)

## 💾 Installation

```
pip3 install pysrt webvtt-py
pip3 install .
```

# Synopsis
## 🖥️ Command line interface

```
./vtt-to-srt FILE...
usage: vtt_to_srt [-h] [-s] [file [file ...]]

vtt_to_srt is a command line tool to convert vtt subtitles to srt files

positional arguments:
  file        a file. The command accepts zero, one or more files as arguments. For each .vtt, a .srt will be
              generated in the same folder. Any other extension is ignored.

optional arguments:
  -h, --help  show this help message and exit
  -s          strip all tags in output srt
```

# Description
## 📙 Usage

The command accepts zero, one or more files as arguments.
For each _.vtt_, a _.srt_ will be generated in the same folder.
Any other extension is ignored.
If you specify no file, it will convert all _.vtt_ files in the directory.
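
The selection rules above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `select_vtt_files` is a hypothetical helper name, and the sorted directory listing is an assumption added only to make the result deterministic.

```python
import os
import sys


def select_vtt_files(paths, directory="."):
    """Hypothetical helper: return (input, output) pairs for the files to convert."""
    if not paths:
        # No arguments given: fall back to every .vtt file in the directory.
        paths = [name for name in sorted(os.listdir(directory)) if name.endswith(".vtt")]
    selected = []
    for path in paths:
        base, extension = os.path.splitext(path)
        if extension.lower() != ".vtt":
            # Any other extension is ignored, with a note on stderr.
            sys.stderr.write("Skipping %s.\n" % path)
            continue
        # The .srt is written next to the .vtt it was generated from.
        selected.append((path, base + ".srt"))
    return selected


print(select_vtt_files(["episode.vtt", "notes.txt"]))
# prints [('episode.vtt', 'episode.srt')] and writes "Skipping notes.txt." to stderr
```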

## Other projects

The command accepts one or more files as arguments.
For each _.vtt_, a _.srt_ will be generated in the same folder.
Any other extension is ignored.
`vtt_to_srt` is used in a Docker image for youtube-dl with vtt to srt conversion:
https://github.com/darodi/docker-youtube-dl-vtt-to-srt
3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
pysrt
webvtt-py
argparse
21 changes: 21 additions & 0 deletions setup.py
@@ -0,0 +1,21 @@
from setuptools import setup, find_packages

setup(
    name='vtt_to_srt',
    version='0.1',
    url='',
    license='LGPLv2',
    python_requires='>=3.4',
    author='lbrayner',
    author_email='[email protected]',
    description='vtt_to_srt',
    py_modules=['vtt_to_srt'],
    install_requires=[
        "pysrt",
        "webvtt-py",
        "argparse"
    ],
    entry_points={
        'console_scripts': ['vtt_to_srt=vtt_to_srt:main']
    }
)
39 changes: 0 additions & 39 deletions vtt-to-srt

This file was deleted.

85 changes: 85 additions & 0 deletions vtt_to_srt.py
@@ -0,0 +1,85 @@
#!/usr/bin/env python3
import argparse
import html
import os
import re
import sys

from pysrt.srtitem import SubRipItem, SubRipTime
from webvtt import WebVTT


def replace_colors(raw_text, colours_arg, tag_name):
    result = raw_text
    for k, v in colours_arg.items():
        regex_string = "<" + tag_name + "(?:\\..*?)?\\." + str(k) + "(?:\\..*?)?>(.*?)</" + tag_name + ">"
        if re.search(regex_string, result) is not None:
            result = re.sub(regex_string, lambda x: replace_color(x, tag_name, v), result)
    return result


def replace_color(x, tag_name, v):
    return ("" if tag_name == "c" else ("<" + tag_name + ">")) \
           + "<font color=\"" + v + "\">" \
           + html.unescape(x.group(1)) \
           + "</font>" \
           + ("" if tag_name == "c" else ("</" + tag_name + ">"))


COLOURS_PATTERN = re.compile(r'::cue\(\.([^)]+)\)\s*{.*?color:(.*?);.*?}')


def main():
    parser = argparse.ArgumentParser(
        description='vtt_to_srt is a command line tool to convert vtt subtitles to srt files')
    parser.add_argument('file', nargs='*',
                        help='a file. The command accepts zero, one or more files as arguments.\n'
                             'For each .vtt, a .srt will be generated in the same folder.\n'
                             'Any other extension is ignored.')
    parser.add_argument('-s', dest='strip', action='store_true', help='strip all tags in output srt')
    args = parser.parse_args()

    if len(args.file) == 0:
        for file in os.listdir():
            if file.endswith(".vtt"):
                args.file.append(file)

    for file in args.file:
        index = 0

        file_name, file_extension = os.path.splitext(file)

        if not file_extension.lower() == ".vtt":
            sys.stderr.write("Skipping %s.\n" % file)
            continue

        srt = open(file_name + ".srt", "w", encoding='utf-8')

        read = WebVTT().read(file)

        colours = dict()
        if args.strip is False:
            for style in read.styles:
                colours_found = COLOURS_PATTERN.findall(style.text)
                colours_classes = list(map(lambda x: x[0], colours_found))
                colours_values = list(map(lambda x: x[1].replace(" ", ""), colours_found))
                colours = dict(zip(colours_classes, colours_values))

        for caption in read.captions:
            index += 1
            start = SubRipTime(0, 0, caption.start_in_seconds)
            end = SubRipTime(0, 0, caption.end_in_seconds)
            caption_text = caption.raw_text
            no_tag_found = True
            if args.strip is False:
                for tag in ['c', 'i', 'b', 'u']:
                    if re.search("<" + tag + "\\..*?>.*?</" + tag + ">", caption_text) is not None:
                        caption_text = replace_colors(caption_text, colours, tag)
                        no_tag_found = False
            if no_tag_found:
                caption_text = html.unescape(caption.text)
            srt.write(SubRipItem(index, start, end, caption_text).__str__() + "\n")


if __name__ == "__main__":
    main()
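
For reviewers who want to try the colour handling without pysrt or webvtt-py installed, the stdlib-only sketch below reproduces the two regex steps from `vtt_to_srt.py`: extracting class-to-colour pairs from `::cue(.class)` rules, then rewriting `<tag.class>` spans as `<font color="...">`. The names `parse_colours` and `colourise` are hypothetical and do not appear in the diff. Two deliberate differences are assumptions about the intended behaviour, not the PR's code: the sketch merges the pairs from every style block (in the diff, `colours` is reassigned for each style block, so only the last block's classes remain), and it passes the class name through `re.escape`.

```python
import html
import re

# Same pattern as in vtt_to_srt.py: captures (class name, colour value).
COLOURS_PATTERN = re.compile(r'::cue\(\.([^)]+)\)\s*{.*?color:(.*?);.*?}')


def parse_colours(style_texts):
    """Hypothetical helper: merge class -> colour pairs from every style block."""
    colours = {}
    for text in style_texts:
        for cls, value in COLOURS_PATTERN.findall(text):
            colours[cls] = value.replace(" ", "")
    return colours


def colourise(raw_text, colours, tag_name):
    """Rewrite <tag.class>...</tag> as <font color=...>, following replace_colors()."""
    # The <c> wrapper is dropped; <i>, <b> and <u> are kept around the font tag.
    open_tag = "" if tag_name == "c" else "<" + tag_name + ">"
    close_tag = "" if tag_name == "c" else "</" + tag_name + ">"
    result = raw_text
    for cls, colour in colours.items():
        pattern = ("<" + tag_name + r"(?:\..*?)?\." + re.escape(cls)
                   + r"(?:\..*?)?>(.*?)</" + tag_name + ">")

        def to_font(match, colour=colour):
            # Entities such as &amp; are unescaped, as in replace_color().
            return (open_tag + '<font color="' + colour + '">'
                    + html.unescape(match.group(1)) + "</font>" + close_tag)

        result = re.sub(pattern, to_font, result)
    return result


styles = ["::cue(.yellow) { color: yellow; }", "::cue(.cyan) { color: #00ffff; }"]
colours = parse_colours(styles)
print(colours)
# prints {'yellow': 'yellow', 'cyan': '#00ffff'}
print(colourise("<c.yellow>Hello &amp; welcome</c>", colours, "c"))
# prints <font color="yellow">Hello & welcome</font>
print(colourise("<i.cyan>aside</i>", colours, "i"))
# prints <i><font color="#00ffff">aside</font></i>
```

One limitation carries over from the pattern itself: the lazy `.*?color:` also matches inside `background-color:`, so a rule that declares a background colour before the text colour yields the background value.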