Replies: 7 comments 2 replies
-
Different hardware, so no direct comparison, but I am using faster-whisper on a 1070 Ti: about 500 ms to transcribe an everyday sentence with large-v2, which is very nice for me.
-
Hi, I don't have access to a Mac at this time, so hopefully someone else can share some insights. If someone runs a comparison, make sure to compare with equivalent options. Here are the options to match the default values in [email protected]:

segments, _ = model.transcribe(
    audio,
    language="en",
    beam_size=1,
    best_of=2,
    temperature=[0.0, 0.4, 0.8],
    suppress_tokens=None,
)
segments = list(segments)

Note that we now also support 8-bit computation on Apple Silicon. It could be added to the comparison:

model = WhisperModel(…, compute_type="int8", cpu_threads=6)
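For reference, here is a minimal end-to-end sketch of an int8 run on the CPU with those same options. The model path and audio file name are assumptions for illustration, not values from this thread.

import faster_whisper

# Assumption: path to a converted CTranslate2 Whisper model on disk.
model = faster_whisper.WhisperModel(
    "models/whisper-large-v2-ct2",
    device="cpu",
    compute_type="int8",
    cpu_threads=6,
)

segments, _ = model.transcribe(
    "audio.wav",  # assumption: any local audio file
    language="en",
    beam_size=1,
    best_of=2,
    temperature=[0.0, 0.4, 0.8],
    suppress_tokens=None,
)
segments = list(segments)  # the result is a generator; consume it to run the full decoding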
-
I measured it with the whisper.cpp sample audio, gb1.wav (198.7 sec).
-
Thank you for these results! It might also be interesting to get some numbers for a larger beam size (the Whisper paper recommends 5). One thing we should improve in Apple Silicon builds is multithreading. For example, the fp32 run is always using a single core at the moment, and the int8 run only uses multiple threads for the quantized matrix multiplication. See this issue OpenNMT/CTranslate2#1137 where some users tried to compile CTranslate2 with OpenMP on macOS ARM64. I will update this thread if we have something new.
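As a side note for anyone reproducing this, a small hedged sketch for checking what the installed CTranslate2 build supports before benchmarking (the output depends on how CTranslate2 was built for your platform):

import os
import ctranslate2

# Compute types the installed CTranslate2 build can run on the CPU,
# e.g. whether int8 is available on this machine.
print(ctranslate2.get_supported_compute_types("cpu"))

# Handy when choosing cpu_threads for WhisperModel.
print("logical cores:", os.cpu_count())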
-
I just released a new version. @neurostar, could you rerun the benchmark when you have some time? I also suggest disabling the temperature fallback to avoid any irregular transcription times:

import faster_whisper

model = faster_whisper.WhisperModel(..., device="cpu", cpu_threads=8, compute_type="float32")

segments, _ = model.transcribe(
    "gb1.wav",
    language="en",
    beam_size=1,
    temperature=0,
    suppress_tokens=None,
)
segments = list(segments)

It would also be cool to get the results for a beam size of 5. Thanks!
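For anyone collecting these numbers, a small hedged sketch of how the beam size 1 vs. 5 timings could be gathered. The model path is a placeholder; the other settings follow the suggestion above.

import time
import faster_whisper

model = faster_whisper.WhisperModel(
    "models/whisper-large-v2-ct2",  # assumption: path to a converted model
    device="cpu",
    cpu_threads=8,
    compute_type="float32",
)

for beam_size in (1, 5):
    start = time.perf_counter()
    segments, _ = model.transcribe(
        "gb1.wav",
        language="en",
        beam_size=beam_size,
        temperature=0,  # disable the temperature fallback
        suppress_tokens=None,
    )
    list(segments)  # consume the generator so decoding actually runs
    print(f"beam_size={beam_size}: {time.perf_counter() - start:.1f} s")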
-
The same gb1.wav was used as previously. Since the M2 Air 8G was swapping memory for large fp32 (~6.4 GB, with a short peak at 11.3 GB), I did not include fp32 for the large model. Performance of 0.51 seems better with bs1 but worse with bs5. The previous faster-whisper measurements did not include the model loading time. In the whisper.cpp ANE build only the encoder runs on the ANE, because the decoder is slower there, so the bs5 setting significantly increases the total time. I've read somewhere that the whisper.cpp decoder is not multithreaded, which would explain why the slowdown with bs5 is larger. For the large model, faster-whisper 0.51 int8 with bs5 is faster than whisper.cpp ANE with bs5.
-
Thanks for all these numbers. The next step for us is to explore FP16 execution on the CPU, which could improve performance on ARM CPUs: OpenNMT/CTranslate2#1153
-
@guillaumekln Hello! I am developing a real-time ASR system that runs on both macOS and Windows. Is faster-whisper faster than whisper.cpp with CoreML support on macOS?