
[AI][Speech Recognition] Run speech recognition locally on CPU with Object Pascal


csukuangfj:
Hi all,

Just want to share an open-source project, https://github.com/k2-fsa/sherpa-onnx, with you. It provides Object Pascal APIs
for speech recognition that runs locally on CPU.

Currently, it supports the following models

- Whisper
- Zipformer
- Paraformer
- SenseVoice
- TeleSpeech ASR

It has been tested on the following platforms
- Linux
- macOS
- Windows

You can find the documentation at
https://k2-fsa.github.io/sherpa/onnx/pascal-api/index.html


We also provide an example using Lazarus that generates subtitles.
The documentation is at
https://k2-fsa.github.io/sherpa/onnx/lazarus/generate-subtitles.html



Attached are some screenshots of the subtitle generator running on different platforms.

Pre-built Lazarus APPs for generating subtitles can be found at
https://k2-fsa.github.io/sherpa/onnx/lazarus/download-generated-subtitles.html




TRon:
I am posting because you also replied in this thread about text-to-speech.

I had a few minutes to spare and tried TTS (the instructions were buried deep in the documentation), and my results were very poor.

To me it did not even sound like human speech, but rather like a balloon being inflated, with a consonant here and there (I assume that the German model that I downloaded was sufficiently pre-trained).

My poor results might be because I do not fully understand how to set things up correctly or which parameters to provide. I would love to discuss this further, but at the moment I have very little time to invest.

Just letting you know in case you are not aware.

csukuangfj:
Could you describe in detail what commands you have used?

We have a Hugging Face space where you can try it from within your browser:
https://huggingface.co/spaces/k2-fsa/text-to-speech

You don't need to install anything to use it.

The quality you get from the above Hugging Face space is the same as what you would get once we wrap it for Object Pascal.

TRon:

--- Quote from: csukuangfj on August 20, 2024, 10:07:37 am ---Could you describe in detail what commands you have used?

--- End quote ---
Sure.

Btw, I was mistaken about the language (I tested several and got confused). It was actually the Dutch voice that seemed to act strangely for me.



--- Code: Bash ---
mkdir sherpa && cd sherpa
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.10.22/sherpa-onnx-v1.10.22-linux-x64-static.tar.bz2
tar xf sherpa-onnx-v1.10.22-linux-x64-static.tar.bz2

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-nl_NL-mls-medium.tar.bz2
tar xf vits-piper-nl_NL-mls-medium.tar.bz2

./sherpa-onnx-v1.10.22-linux-x64-static/bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-piper-nl_NL-mls-medium/nl_NL-mls-medium.onnx \
  --vits-tokens=./vits-piper-nl_NL-mls-medium/tokens.txt \
  --vits-data-dir=./vits-piper-nl_NL-mls-medium/espeak-ng-data \
  --output-filename=./hallo.wav \
  "hallo wereld"
--- End code ---

Resulting wav attached (you probably need to log in to be able to see and download the attachment; this sometimes happens when the forum is experiencing issues).

You would have to verify with a native Dutch speaker to make sure that the output is wrong, but phonetically the wav file makes no sense to me (I know only a little Dutch).

csukuangfj:
I see. It turns out that the models whose filenames contain `mls` don't perform well.

Could you try another model, e.g.,
vits-piper-nl_BE-nathalie-medium

I am deleting the `mls` models for Dutch.
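
A minimal sketch of how the suggested model could be tried with the same steps quoted earlier in the thread. It only prints the commands rather than running them. The helper name `print_tts_commands` is hypothetical, and two things are assumptions made by analogy with the `mls` commands: that the archive lives under the same `tts-models` release URL, and that the `.onnx` file inside it is named after the voice (`nl_BE-nathalie-medium.onnx`). Check the extracted directory before relying on either.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the download and synthesis
# commands for a Piper voice, following the pattern used earlier in the thread.
print_tts_commands() {
  model="$1"                     # e.g. vits-piper-nl_BE-nathalie-medium
  text="$2"                      # text to synthesize
  voice="${model#vits-piper-}"   # ASSUMPTION: the .onnx file is named after the voice
  # Binary path taken from the v1.10.22 static Linux archive used above.
  bin=./sherpa-onnx-v1.10.22-linux-x64-static/bin/sherpa-onnx-offline-tts
  # ASSUMPTION: all TTS model archives share this release URL prefix.
  url=https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models

  printf 'wget %s/%s.tar.bz2\n' "$url" "$model"
  printf 'tar xf %s.tar.bz2\n' "$model"
  printf '%s --vits-model=./%s/%s.onnx --vits-tokens=./%s/tokens.txt --vits-data-dir=./%s/espeak-ng-data --output-filename=./hallo.wav "%s"\n' \
    "$bin" "$model" "$voice" "$model" "$model" "$text"
}

print_tts_commands vits-piper-nl_BE-nathalie-medium "hallo wereld"
```

Printing the commands first makes it easy to compare them with the `mls` run and to correct a file name before anything is downloaded.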
