Symbols are taken as plain text ("at" instead of @) #2320
When I provide an email address, symbols like @ are transcribed as plain text ("at"). For example, the audio contains "My email is data@test.com". How can I get the output to show the symbols instead of plain text?
Replies: 3 comments 1 reply
In addition to the options suggested in this thread, there's also the obvious have-the-compute-don't-care answer... As you have perhaps realized by now, the problem with prompting is that Whisper doesn't know the context as well as you do. Honestly, I don't like this method very much (it seems excessive), but who cares if it gets the job done.
@Rumeysakeskin Hello, you can use my tool https://github.com/gongouveia/Whisper-Synthetic-ASR-Dataset-Generator to fine-tune the model with just a few dozen examples. Record some audio samples of people saying email addresses, transcribe them with Whisper, and correct the transcripts by hand. Then fine-tune Whisper on the corrected transcripts.
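I don't know the exact output format of that tool, so this is only a generic sketch of the "save the corrected transcripts" step, assuming you fine-tune with Hugging Face `datasets`, whose `audiofolder` loader reads a `metadata.jsonl` file with a `file_name` column placed next to the audio files. The function `write_metadata` and the `transcription` column name are hypothetical choices, not part of the tool above.

```python
import json
from pathlib import Path


def write_metadata(audio_dir: str, corrected: dict[str, str]) -> Path:
    """Write metadata.jsonl pairing each audio file with its hand-corrected transcript.

    `corrected` maps an audio file name (relative to audio_dir) to the edited text,
    e.g. {"sample1.wav": "My email is data@test.com"}.
    """
    out = Path(audio_dir) / "metadata.jsonl"
    with out.open("w", encoding="utf-8") as f:
        # Sorted for a stable, reproducible file; one JSON object per line.
        for file_name, text in sorted(corrected.items()):
            record = {"file_name": file_name, "transcription": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out
```

The folder can then be loaded with `datasets.load_dataset("audiofolder", data_dir=audio_dir)`, which should yield an `audio` column plus the `transcription` column for a fine-tuning script.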
Hi, you might be able to do it just by postprocessing the result (via scripting or whatever).
This means you would look for signs of a domain name that comes after the word "at".
The complication is that domain names vary a lot nowadays; it's not just plain old .com.
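A minimal sketch of that postprocessing idea in Python, using only the standard `re` module. Everything here (the function name `restore_emails`, the regex, and the small `KNOWN_TLDS` allowlist) is my own assumption, not part of Whisper. It handles spoken forms like "data at test dot com" and stray spaces like "data@test. com". Requiring a known top-level domain is one crude way around the domain-variety problem; it will still misfire on phrases like "looked at example dot com", so treat it as a starting point.

```python
import re

# Hypothetical, deliberately small allowlist of top-level domains. Real domains are
# far more varied, so extend it for your data. Requiring a known TLD keeps ordinary
# sentences like "see you at home. Then ..." from being rewritten.
KNOWN_TLDS = ("com", "org", "net", "edu", "gov", "io")

# Separator between domain labels: "." (possibly with stray spaces, as in "test. com")
# or the spoken word "dot".
_SEP = r"(?:\s*\.\s*|\s+dot\s+)"

_SPOKEN_EMAIL = re.compile(
    r"\b([A-Za-z0-9._%+-]+)"               # local part, e.g. "data" or "john.doe"
    r"(?:\s*@\s*|\s+at\s+)"                # "@" or the spoken word "at"
    r"((?:[A-Za-z0-9-]+" + _SEP + r")+)"   # domain labels, each followed by a separator
    r"(" + "|".join(KNOWN_TLDS) + r")\b",  # a known top-level domain
    re.IGNORECASE,
)
_SEP_RE = re.compile(_SEP, re.IGNORECASE)


def _rebuild(match: re.Match) -> str:
    """Turn the groups ('data', 'test dot ', 'com') into 'data@test.com'."""
    local, labels, tld = match.groups()
    domain = _SEP_RE.sub(".", labels) + tld
    return f"{local}@{domain.lower()}"


def restore_emails(text: str) -> str:
    """Replace spoken-form email addresses in a transcript with real addresses."""
    return _SPOKEN_EMAIL.sub(_rebuild, text)
```

You would run it on Whisper's output text, e.g. `restore_emails(result["text"])`.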
Other than that, there's prompting or fine-tuning. There are older posts about both.
I've used prompting myself with varying success; it's worth trying out IMHO (no coding required!).
Basically, you just give Whisper some examples of what you want the text to look like.
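For example, with the `openai-whisper` Python package you can pass example text through the `initial_prompt` argument of `model.transcribe()`. A rough sketch follows; the example sentences and the helpers `build_initial_prompt` and `transcribe_with_prompt` are hypothetical. The prompt only nudges Whisper toward that style and doesn't guarantee it, and it's best kept short, since Whisper truncates long prompts.

```python
# Hypothetical example sentences: write ones that look like the output you want.
PROMPT_EXAMPLES = [
    "Please email me at jane.doe@example.com.",
    "My address is support@test.org.",
]


def build_initial_prompt(examples: list[str]) -> str:
    """Join non-empty example sentences into a single prompt string."""
    return " ".join(s.strip() for s in examples if s.strip())


def transcribe_with_prompt(audio_path: str, model_name: str = "base") -> str:
    """Transcribe audio with openai-whisper, biased by the example prompt."""
    import whisper  # imported lazily so build_initial_prompt works without Whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=build_initial_prompt(PROMPT_EXAMPLES))
    return result["text"]
```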
Note that my use case is very different and it doesn't involve any email addresses,
so perhaps someone else is able to give you better a…