Merge branch 'main' into return_best_text

openai · Dec 18, 2023 · bf2612f
2 parents f677284 + 8bc8860

Showing 20 changed files with 7,461 additions and 2,343 deletions.
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -6,8 +6,38 @@ on:
   pull_request:
     branches:
       - main
+
 jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Fetch base branch
+        run: git fetch origin ${{ github.base_ref }}
+      - uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+          architecture: x64
+      - name: Get pip cache dir
+        id: pip-cache
+        run: |
+          echo "dir=$(pip cache dir)" >> $GITHUB_OUTPUT
+      - name: pip/pre-commit cache
+        uses: actions/cache@v3
+        with:
+          path: |
+            ${{ steps.pip-cache.outputs.dir }}
+            ~/.cache/pre-commit
+          key: ${{ runner.os }}-pip-pre-commit-${{ hashFiles('**/.pre-commit-config.yaml') }}
+          restore-keys: |
+            ${{ runner.os }}-pip-pre-commit
+      - name: pre-commit
+        run: |
+          pip install -U pre-commit
+          pre-commit install --install-hooks
+          pre-commit run --all-files
   whisper-test:
+    needs: pre-commit
     runs-on: ubuntu-latest
     strategy:
       matrix:
@@ -23,7 +53,4 @@ jobs:
       - uses: actions/checkout@v3
       - run: echo "$CONDA/envs/test/bin" >> $GITHUB_PATH
       - run: pip install .["dev"]
-      - run: black --check --diff -t py38 --include '(\.pyi?)$' .
-      - run: isort --check --diff .
-      - run: flake8 --ignore E203,W503,W504,E501,E731,E741 .
       - run: pytest --durations=0 -vv -k 'not test_transcribe or test_transcribe[tiny] or test_transcribe[tiny.en]' -m 'not requires_cuda'
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,28 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.0.1
+    hooks:
+      - id: check-json
+      - id: end-of-file-fixer
+        types: [file, python]
+      - id: trailing-whitespace
+        types: [file, python]
+      - id: mixed-line-ending
+      - id: check-added-large-files
+        args: [--maxkb=4096]
+  - repo: https://github.com/psf/black
+    rev: 23.7.0
+    hooks:
+      - id: black
+  - repo: https://github.com/pycqa/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+        name: isort (python)
+        args: ["--profile", "black", "-l", "88", "--trailing-comma", "--multi-line", "3"]
+  - repo: https://github.com/pycqa/flake8.git
+    rev: 6.0.0
+    hooks:
+      - id: flake8
+        types: [python]
+        args: ["--max-line-length", "88", "--ignore", "E203,E501,W503,W504"]
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,45 @@
 # CHANGELOG
 
+## [v20231117](https://github.com/openai/whisper/releases/tag/v20231117)
+
+* Relax triton requirements for compatibility with pytorch 2.1 and newer ([#1802](https://github.com/openai/whisper/pull/1802))
+
+## [v20231106](https://github.com/openai/whisper/releases/tag/v20231106)
+
+* large-v3 ([#1761](https://github.com/openai/whisper/pull/1761))
+
+## [v20231105](https://github.com/openai/whisper/releases/tag/v20231105)
+
+* remove tiktoken pin ([#1759](https://github.com/openai/whisper/pull/1759))
+* docs: Disambiguation of the term "relative speed" in the README ([#1751](https://github.com/openai/whisper/pull/1751))
+* allow_pickle=False while loading of mel matrix IN audio.py ([#1511](https://github.com/openai/whisper/pull/1511))
+* handling transcribe exceptions. ([#1682](https://github.com/openai/whisper/pull/1682))
+* Add new option to generate subtitles by a specific number of words ([#1729](https://github.com/openai/whisper/pull/1729))
+* Fix exception when an audio file with no speech is provided ([#1396](https://github.com/openai/whisper/pull/1396))
+
+## [v20230918](https://github.com/openai/whisper/releases/tag/v20230918)
+
+* Add .pre-commit-config.yaml ([#1528](https://github.com/openai/whisper/pull/1528))
+* fix doc of TextDecoder ([#1526](https://github.com/openai/whisper/pull/1526))
+* Update model-card.md ([#1643](https://github.com/openai/whisper/pull/1643))
+* word timing tweaks ([#1559](https://github.com/openai/whisper/pull/1559))
+* Avoid rearranging all caches ([#1483](https://github.com/openai/whisper/pull/1483))
+* Improve timestamp heuristics. ([#1461](https://github.com/openai/whisper/pull/1461))
+* fix condition_on_previous_text ([#1224](https://github.com/openai/whisper/pull/1224))
+* Fix numba depreceation notice ([#1233](https://github.com/openai/whisper/pull/1233))
+* Updated README.md to provide more insight on BLEU and specific appendices ([#1236](https://github.com/openai/whisper/pull/1236))
+* Avoid computing higher temperatures on no_speech segments ([#1279](https://github.com/openai/whisper/pull/1279))
+* Dropped unused execute bit from mel_filters.npz. ([#1254](https://github.com/openai/whisper/pull/1254))
+* Drop ffmpeg-python dependency and call ffmpeg directly. ([#1242](https://github.com/openai/whisper/pull/1242))
+* Python 3.11 ([#1171](https://github.com/openai/whisper/pull/1171))
+* Update decoding.py ([#1219](https://github.com/openai/whisper/pull/1219))
+* Update decoding.py ([#1155](https://github.com/openai/whisper/pull/1155))
+* Update README.md to reference tiktoken ([#1105](https://github.com/openai/whisper/pull/1105))
+* Implement max line width and max line count, and make word highlighting optional ([#1184](https://github.com/openai/whisper/pull/1184))
+* Squash long words at window and sentence boundaries. ([#1114](https://github.com/openai/whisper/pull/1114))
+* python-publish.yml: bump actions version to fix node warning ([#1211](https://github.com/openai/whisper/pull/1211))
+* Update tokenizer.py ([#1163](https://github.com/openai/whisper/pull/1163))
+
 ## [v20230314](https://github.com/openai/whisper/releases/tag/v20230314)
 
 * abort find_alignment on empty input ([#1090](https://github.com/openai/whisper/pull/1090))

diff --git a/README.md b/README.md
@@ -57,8 +57,7 @@ pip install setuptools-rust
 
 ## Available models and languages
 
-There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed. 
-
+There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.
 
 |  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
 |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
@@ -70,9 +69,9 @@ There are five model sizes, four with English-only versions, offering speed and
 
 The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
 
-Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by languages of the Fleurs dataset using the `large-v2` model (The smaller the numbers, the better the performance). Additional WER scores corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4. Meanwhile, more BLEU (Bilingual Evaluation Understudy) scores can be found in Appendix D.3. Both are found in [the paper](https://arxiv.org/abs/2212.04356). 
+Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of `large-v3` and `large-v2` models by language, using WERs (word error rates) or CER (character error rates, shown in *Italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.
 
-![WER breakdown by language](https://raw.githubusercontent.com/openai/whisper/main/language-breakdown.svg)
+![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)