Skip to content

Commit

Permalink
Merge branch 'main' into return_best_text
Browse files Browse the repository at this point in the history
  • Loading branch information
TheoBoyer authored Dec 18, 2023
2 parents f677284 + 8bc8860 commit bf2612f
Show file tree
Hide file tree
Showing 20 changed files with 7,461 additions and 2,343 deletions.
33 changes: 30 additions & 3 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,38 @@ on:
pull_request:
branches:
- main

jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Fetch base branch
run: git fetch origin ${{ github.base_ref }}
- uses: actions/setup-python@v4
with:
python-version: "3.8"
architecture: x64
- name: Get pip cache dir
id: pip-cache
run: |
echo "dir=$(pip cache dir)" >> $GITHUB_OUTPUT
- name: pip/pre-commit cache
uses: actions/cache@v3
with:
path: |
${{ steps.pip-cache.outputs.dir }}
~/.cache/pre-commit
key: ${{ runner.os }}-pip-pre-commit-${{ hashFiles('**/.pre-commit-config.yaml') }}
restore-keys: |
${{ runner.os }}-pip-pre-commit
- name: pre-commit
run: |
pip install -U pre-commit
pre-commit install --install-hooks
pre-commit run --all-files
whisper-test:
needs: pre-commit
runs-on: ubuntu-latest
strategy:
matrix:
Expand All @@ -23,7 +53,4 @@ jobs:
- uses: actions/checkout@v3
- run: echo "$CONDA/envs/test/bin" >> $GITHUB_PATH
- run: pip install .["dev"]
- run: black --check --diff -t py38 --include '(\.pyi?)$' .
- run: isort --check --diff .
- run: flake8 --ignore E203,W503,W504,E501,E731,E741 .
- run: pytest --durations=0 -vv -k 'not test_transcribe or test_transcribe[tiny] or test_transcribe[tiny.en]' -m 'not requires_cuda'
28 changes: 28 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.0.1
hooks:
- id: check-json
- id: end-of-file-fixer
types: [file, python]
- id: trailing-whitespace
types: [file, python]
- id: mixed-line-ending
- id: check-added-large-files
args: [--maxkb=4096]
- repo: https://github.com/psf/black
rev: 23.7.0
hooks:
- id: black
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
args: ["--profile", "black", "-l", "88", "--trailing-comma", "--multi-line", "3"]
- repo: https://github.com/pycqa/flake8.git
rev: 6.0.0
hooks:
- id: flake8
types: [python]
args: ["--max-line-length", "88", "--ignore", "E203,E501,W503,W504"]
40 changes: 40 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,45 @@
# CHANGELOG

## [v20231117](https://github.com/openai/whisper/releases/tag/v20231117)

* Relax triton requirements for compatibility with pytorch 2.1 and newer ([#1802](https://github.com/openai/whisper/pull/1802))

## [v20231106](https://github.com/openai/whisper/releases/tag/v20231106)

* large-v3 ([#1761](https://github.com/openai/whisper/pull/1761))

## [v20231105](https://github.com/openai/whisper/releases/tag/v20231105)

* remove tiktoken pin ([#1759](https://github.com/openai/whisper/pull/1759))
* docs: Disambiguation of the term "relative speed" in the README ([#1751](https://github.com/openai/whisper/pull/1751))
* allow_pickle=False while loading of mel matrix IN audio.py ([#1511](https://github.com/openai/whisper/pull/1511))
* handling transcribe exceptions. ([#1682](https://github.com/openai/whisper/pull/1682))
* Add new option to generate subtitles by a specific number of words ([#1729](https://github.com/openai/whisper/pull/1729))
* Fix exception when an audio file with no speech is provided ([#1396](https://github.com/openai/whisper/pull/1396))

## [v20230918](https://github.com/openai/whisper/releases/tag/v20230918)

* Add .pre-commit-config.yaml ([#1528](https://github.com/openai/whisper/pull/1528))
* fix doc of TextDecoder ([#1526](https://github.com/openai/whisper/pull/1526))
* Update model-card.md ([#1643](https://github.com/openai/whisper/pull/1643))
* word timing tweaks ([#1559](https://github.com/openai/whisper/pull/1559))
* Avoid rearranging all caches ([#1483](https://github.com/openai/whisper/pull/1483))
* Improve timestamp heuristics. ([#1461](https://github.com/openai/whisper/pull/1461))
* fix condition_on_previous_text ([#1224](https://github.com/openai/whisper/pull/1224))
* Fix numba depreceation notice ([#1233](https://github.com/openai/whisper/pull/1233))
* Updated README.md to provide more insight on BLEU and specific appendices ([#1236](https://github.com/openai/whisper/pull/1236))
* Avoid computing higher temperatures on no_speech segments ([#1279](https://github.com/openai/whisper/pull/1279))
* Dropped unused execute bit from mel_filters.npz. ([#1254](https://github.com/openai/whisper/pull/1254))
* Drop ffmpeg-python dependency and call ffmpeg directly. ([#1242](https://github.com/openai/whisper/pull/1242))
* Python 3.11 ([#1171](https://github.com/openai/whisper/pull/1171))
* Update decoding.py ([#1219](https://github.com/openai/whisper/pull/1219))
* Update decoding.py ([#1155](https://github.com/openai/whisper/pull/1155))
* Update README.md to reference tiktoken ([#1105](https://github.com/openai/whisper/pull/1105))
* Implement max line width and max line count, and make word highlighting optional ([#1184](https://github.com/openai/whisper/pull/1184))
* Squash long words at window and sentence boundaries. ([#1114](https://github.com/openai/whisper/pull/1114))
* python-publish.yml: bump actions version to fix node warning ([#1211](https://github.com/openai/whisper/pull/1211))
* Update tokenizer.py ([#1163](https://github.com/openai/whisper/pull/1163))

## [v20230314](https://github.com/openai/whisper/releases/tag/v20230314)

* abort find_alignment on empty input ([#1090](https://github.com/openai/whisper/pull/1090))
Expand Down
7 changes: 3 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,7 @@ pip install setuptools-rust

## Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
Expand All @@ -70,9 +69,9 @@ There are five model sizes, four with English-only versions, offering speed and

The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.

Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by languages of the Fleurs dataset using the `large-v2` model (The smaller the numbers, the better the performance). Additional WER scores corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4. Meanwhile, more BLEU (Bilingual Evaluation Understudy) scores can be found in Appendix D.3. Both are found in [the paper](https://arxiv.org/abs/2212.04356).
Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of `large-v3` and `large-v2` models by language, using WERs (word error rates) or CER (character error rates, shown in *Italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

![WER breakdown by language](https://raw.githubusercontent.com/openai/whisper/main/language-breakdown.svg)
![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)



Expand Down
Loading

0 comments on commit bf2612f

Please sign in to comment.