Replies: 7 comments 2 replies
-
Different hardware, so no direct comparison, but I am using faster-whisper on a 1070 Ti: about 500 ms to transcribe an everyday sentence with large-v2, which is very nice for me.
-
Hi, I don't have access to a Mac at this time, so hopefully someone else can share some insights. If someone runs a comparison, make sure to compare with equivalent options. Here are the options to match the default values in [email protected]:

segments, _ = model.transcribe(
    audio,
    language="en",
    beam_size=1,
    best_of=2,
    temperature=[0.0, 0.4, 0.8],
    suppress_tokens=None,
)
segments = list(segments)

Note that we now also support 8-bit computation on Apple Silicon. It could be added to the comparison:

model = WhisperModel(…, compute_type="int8", cpu_threads=6)
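For reference, here is a minimal end-to-end sketch of an int8 run on the CPU with those same options. The model path and audio file name are assumptions for illustration, not values from this thread.

import faster_whisper

# Assumption: path to a converted CTranslate2 Whisper model on disk.
model = faster_whisper.WhisperModel(
    "models/whisper-large-v2-ct2",
    device="cpu",
    compute_type="int8",
    cpu_threads=6,
)

segments, _ = model.transcribe(
    "audio.wav",  # assumption: any local audio file
    language="en",
    beam_size=1,
    best_of=2,
    temperature=[0.0, 0.4, 0.8],
    suppress_tokens=None,
)
segments = list(segments)  # the result is a generator; consume it to run the full decoding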
-
I measured it with the whisper.cpp sample audio, gb1.wav (198.7 sec).
-
Thank you for these results! It might also be interesting to get some numbers for a larger beam size (the Whisper paper recommends 5). One thing we should improve in Apple Silicon builds is multithreading. For example, the fp32 run is always using a single core at the moment, and the int8 run only uses multiple threads for the quantized matrix multiplication. See this issue OpenNMT/CTranslate2#1137 where some users tried to compile CTranslate2 with OpenMP on macOS ARM64. I will update this thread if we have something new.
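As a side note for anyone reproducing this, a small hedged sketch for checking what the installed CTranslate2 build supports before benchmarking (the output depends on how CTranslate2 was built for your platform):

import os
import ctranslate2

# Compute types the installed CTranslate2 build can run on the CPU,
# e.g. whether int8 is available on this machine.
print(ctranslate2.get_supported_compute_types("cpu"))

# Handy when choosing cpu_threads for WhisperModel.
print("logical cores:", os.cpu_count())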
-
I just released a new version. @neurostar, could you rerun the benchmark when you have some time? I also suggest disabling the temperature fallback to avoid any irregular transcription times:

import faster_whisper

model = faster_whisper.WhisperModel(..., device="cpu", cpu_threads=8, compute_type="float32")

segments, _ = model.transcribe(
    "gb1.wav",
    language="en",
    beam_size=1,
    temperature=0,
    suppress_tokens=None,
)
segments = list(segments)

It would also be cool to get the results for a beam size of 5. Thanks!
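For anyone collecting these numbers, a small hedged sketch of how the beam size 1 vs. 5 timings could be gathered. The model path is a placeholder; the other settings follow the suggestion above.

import time
import faster_whisper

model = faster_whisper.WhisperModel(
    "models/whisper-large-v2-ct2",  # assumption: path to a converted model
    device="cpu",
    cpu_threads=8,
    compute_type="float32",
)

for beam_size in (1, 5):
    start = time.perf_counter()
    segments, _ = model.transcribe(
        "gb1.wav",
        language="en",
        beam_size=beam_size,
        temperature=0,  # disable the temperature fallback
        suppress_tokens=None,
    )
    list(segments)  # consume the generator so decoding actually runs
    print(f"beam_size={beam_size}: {time.perf_counter() - start:.1f} s")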
-
The same gb1.wav was used as previously. Since the M2 Air 8G was swapping memory for large fp32 (~6.4 GB, with a short peak at 11.3 GB), I did not include fp32 for the large model. Performance of 0.51 seems better with bs1 but worse with bs5. The previous faster-whisper measurements did not include the model loading time. In the whisper.cpp ANE build only the encoder runs on the ANE, because the decoder is slower there, so the bs5 setting significantly increases the total time. I've read somewhere that the whisper.cpp decoder is not multithreaded, which would explain why the slowdown with bs5 is larger. For the large model, faster-whisper 0.51 int8 with bs5 is faster than whisper.cpp ANE with bs5.
-
Thanks for all these numbers. The next step for us is to explore FP16 execution on the CPU, which could improve performance on ARM CPUs: OpenNMT/CTranslate2#1153
-
@guillaumekln Hello! I am developing a real-time ASR system that runs on both macOS and Windows. Is faster-whisper faster than whisper.cpp with CoreML support on macOS?