Commit

Use ndimage.median_filter instead of signal.medfilter (openai#812)
For a 30-second audio file that contained no silence, ndimage.median_filter took 7 s, whereas signal.medfilt took 30 s.
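To check the timing claim on your own machine, a minimal sketch along these lines may help. The array shape, the filter width, and the `time_call` helper are assumptions made for illustration (the notebook filters real attention weights of shape layers * heads * tokens * frames), so the absolute numbers will differ from the 7 s and 30 s reported above, and the ratio may vary with the SciPy version.

```python
import time

import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import medfilt


def time_call(fn, *args, **kwargs):
    """Hypothetical helper: run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
# Stand-in for the attention weights: (layers, heads, tokens, frames).
# These sizes are made up and much smaller than a real 30-second clip.
weights = rng.standard_normal((4, 6, 20, 300)).astype(np.float32)
medfilt_width = 7  # assumed odd width; medfilt requires odd kernel sizes
kernel = (1, 1, 1, medfilt_width)  # filter along the frame axis only

out_signal, t_signal = time_call(medfilt, weights, kernel)
out_ndimage, t_ndimage = time_call(median_filter, weights, kernel)

print(f"scipy.signal.medfilt:         {t_signal:.4f} s")
print(f"scipy.ndimage.median_filter:  {t_ndimage:.4f} s")
```

Timing a single call is noisy; repeating each call several times and taking the minimum would give a steadier comparison.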

Co-authored-by: Umar Farooqi
Co-authored-by: Jong Wook Kim
3 people authored Jan 17, 2023
1 parent a84191f commit f0083e7
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions notebooks/Multilingual_ASR.ipynb
@@ -874,7 +874,7 @@
"from IPython.display import display, HTML\n",
"from whisper.tokenizer import get_tokenizer\n",
"from dtw import dtw\n",
-"from scipy.signal import medfilt\n",
+"from scipy.ndimage import median_filter\n",
"\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = \"retina\""
@@ -3610,7 +3610,7 @@
"\n",
" weights = torch.cat(QKs) # layers * heads * tokens * frames \n",
" weights = weights[:, :, :, : duration // AUDIO_SAMPLES_PER_TOKEN].cpu()\n",
-"        weights = medfilt(weights, (1, 1, 1, medfilt_width))\n",
+"        weights = median_filter(weights, (1, 1, 1, medfilt_width))\n",
" weights = torch.tensor(weights * qk_scale).softmax(dim=-1)\n",
" \n",
" w = weights / weights.norm(dim=-2, keepdim=True)\n",
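One point worth checking when making this swap: the two functions handle array edges differently by default. `scipy.signal.medfilt` zero-pads the input, while `scipy.ndimage.median_filter` defaults to `mode="reflect"`, so the outputs can differ in the first and last few frames. The sketch below uses a made-up 1-D signal (not data from the notebook) to show the difference, and how `mode="constant", cval=0.0` reproduces the zero-padded result if exact parity with the old behavior is wanted.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import medfilt

# Made-up 1-D signal standing in for one row of attention weights.
x = np.array([5.0, 1.0, 4.0, 2.0, 3.0, 9.0, 8.0])
width = 3  # assumed odd filter width

# Old call: medfilt zero-pads beyond the array edges.
zero_padded = medfilt(x, width)

# New call as written in the diff: default boundary mode is "reflect".
reflected = median_filter(x, size=width)

# Same function, but padding with zeros to mimic medfilt at the edges.
matched = median_filter(x, size=width, mode="constant", cval=0.0)

print("medfilt (zero-padded):      ", zero_padded)
print("median_filter (reflect):    ", reflected)
print("median_filter (constant 0): ", matched)
```

Here the interior values agree across all three calls and only the first element differs between the zero-padded and reflected results. Whether that edge difference matters for the alignment in the notebook is not stated in the commit.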