Author Topic: Speech recognition?  (Read 10538 times)

JJJ

  • New Member
  • *
  • Posts: 20
Speech recognition?
« on: April 12, 2018, 06:04:43 pm »
After more than 15 years, I am back to Pascal programming with Lazarus.

Does anybody know whether there is a speech recognition engine that works with Lazarus on Windows 10?
The purpose is to add voice commands to my app.

Thanks,

JJJ

Thaddy

  • Hero Member
  • *****
  • Posts: 16162
  • Censorship about opinions does not belong here.
Re: Speech recognition?
« Reply #1 on: April 12, 2018, 06:16:11 pm »
Speech recognition comes with Windows and has a COM interface. A web search will turn up the details.
It is therefore free and easy to use.
If I smell bad code it usually is bad code and that includes my own code.

Trenatos

  • Hero Member
  • *****
  • Posts: 537
    • MarcusFernstrom.com
Re: Speech recognition?
« Reply #2 on: April 12, 2018, 06:29:54 pm »
https://github.com/r1me/TPocketSphinx

Thaddy

Re: Speech recognition?
« Reply #3 on: April 12, 2018, 06:33:29 pm »
Quote from: Trenatos
> https://github.com/r1me/TPocketSphinx
There is really no need for that when Microsoft already provides something vastly superior as standard in every Windows version.
It only needs a type library import and some knowledge about COM.
Note that the current versions can also use the cloud and are even more powerful, but you need a license for that.
Also note that the old versions are limited to just a few languages, but free.
« Last Edit: April 12, 2018, 06:36:11 pm by Thaddy »

rvk

  • Hero Member
  • *****
  • Posts: 6577
Re: Speech recognition?
« Reply #4 on: April 12, 2018, 06:40:25 pm »
I remember a topic back in 2015 for the SpeechToText API.
https://forum.lazarus.freepascal.org/index.php/topic,28952.0.html

I already had the SpeechLib_TLB.zip in that topic and I could make a working example for English.

Quote from: Thaddy
> Note that the current versions can also use the cloud and are even more powerful, but you need a license for that.
Neat  8-)
https://cloud.google.com/speech-to-text/
« Last Edit: April 12, 2018, 06:44:26 pm by rvk »

JJJ

Re: Speech recognition?
« Reply #5 on: April 12, 2018, 07:42:38 pm »
Thanks guys!

I'm not familiar with Windows, so I had no idea that a speech engine already exists.

rvk: I checked the topic and tested your example, but I got an EOleSysError on CoSpSharedRecoContext.Create.
I will have to play with that later.

Trenatos

Re: Speech recognition?
« Reply #6 on: April 12, 2018, 07:45:21 pm »
Is the speech recognition bundled with Windows really vastly superior when used local-only (no network, server, or cloud)?

Thaddy

Re: Speech recognition?
« Reply #7 on: April 12, 2018, 09:15:32 pm »
Note that there are at least three APIs that work well:
- Amazon Alexa
- Google Home
- Microsoft Cortana (and the built-in, old-school offline engine)
The MS API also works offline most of the time, but Google works best, then Cortana (MS), and Amazon (Alexa) is not bad either. They also all work on, for example, a Raspberry Pi as a poor man's IoT controller.
You can do amazing things with them when connected online.

The MS Windows offline solution still works with Rik's code. All are limited to a couple of languages. English works best, and I only tested English.

All these work with FPC.
« Last Edit: April 12, 2018, 09:21:17 pm by Thaddy »

Trenatos

Re: Speech recognition?
« Reply #8 on: April 12, 2018, 09:20:39 pm »
Last time I checked, the Google thing was a hack to mimic what the browser voice recognition does. I am not sure about the Amazon and Google online APIs; I will definitely check those out.

I started a project last year or so where I wanted voice recognition on Linux using only an offline engine. That is how I ended up at PocketSphinx.

Thaddy

Re: Speech recognition?
« Reply #9 on: April 12, 2018, 09:22:15 pm »
Quote from: Trenatos
> Last time I checked, the Google thing was a hack to mimic what the browser voice recognition does. I am not sure about the Amazon and Google online APIs; I will definitely check those out.
>
> I started a project last year or so where I wanted voice recognition on Linux using only an offline engine. That is how I ended up at PocketSphinx.

It is not a hack: the browser interfaces are built on top of that API. It is separate, and FPC comes with the necessary interfaces by default. These work on Linux too.
If you mean that you need to be online: true. PocketSphinx is limited, and if your platform is Windows, use the MS APIs offline.
Also note that there was already crude but working SR software for the Commodore 64.

It is not rocket science to experiment with a DFT using fixed length math (an FFT is not necessary and is too slow on a 6510), plus a look-up and a trainer that fills the look-up so it matches correctly (average weight).
It becomes rocket science to me when you try to match current speech recognition. I have not even been able to comprehend that.... %)
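
To make the DFT-plus-look-up idea concrete, here is a minimal sketch in Python rather than 6510 assembly. It is not taken from any actual C64 program. It reads "fixed length math" as fixed-point integer arithmetic, which is an assumption, and the names SCALE, make_tables, dft_energy, train, match and tone are made up for illustration. The sine and cosine tables are built with floating point once; everything per frame is integer-only, and only a handful of bins are computed directly, which is why no FFT is needed.

```python
import math

SCALE = 256  # assumed fixed-point scale: 1.0 is stored as the integer 256


def make_tables(n):
    """Precompute integer cosine and sine tables for an n-sample frame."""
    cos_t = [round(math.cos(2 * math.pi * i / n) * SCALE) for i in range(n)]
    sin_t = [round(math.sin(2 * math.pi * i / n) * SCALE) for i in range(n)]
    return cos_t, sin_t


def dft_energy(frame, bins, cos_t, sin_t):
    """Direct DFT over a few bins only, using integer arithmetic throughout."""
    n = len(frame)
    features = []
    for k in bins:
        re = im = 0
        for i, x in enumerate(frame):
            idx = (k * i) % n  # table index for the angle 2*pi*k*i/n
            re += x * cos_t[idx]
            im -= x * sin_t[idx]
        # |re| + |im| is a cheap stand-in for the true magnitude (no sqrt).
        features.append((abs(re) + abs(im)) // (SCALE * n))
    return features


def train(table, word, features):
    """Add one example to the look-up; sums plus a count give the average."""
    count, sums = table.get(word, (0, [0] * len(features)))
    table[word] = (count + 1, [s + f for s, f in zip(sums, features)])


def match(table, features):
    """Return the trained word whose averaged features are closest."""
    best_word, best_dist = None, None
    for word, (count, sums) in table.items():
        dist = sum(abs(f - s // count) for f, s in zip(features, sums))
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    return best_word


def tone(n, k, amp=100):
    """Synthetic test frame: n integer samples of a sine wave at DFT bin k."""
    return [round(amp * math.sin(2 * math.pi * k * i / n)) for i in range(n)]
```

Calling train with a few frames per word and then match on a fresh frame returns the closest trained word. Real speech would need many frames per word and some form of time alignment, both of which this sketch leaves out.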
« Last Edit: April 12, 2018, 09:39:11 pm by Thaddy »

cpicanco

  • Hero Member
  • *****
  • Posts: 655
  • Behavioral Scientist and Programmer
    • Portfolio
Re: Speech recognition?
« Reply #10 on: September 23, 2023, 04:14:43 pm »
Quote from: Thaddy
> Quote from: Trenatos
> > Last time I checked, the Google thing was a hack to mimic what the browser voice recognition does. I am not sure about the Amazon and Google online APIs; I will definitely check those out.
> >
> > I started a project last year or so where I wanted voice recognition on Linux using only an offline engine. That is how I ended up at PocketSphinx.
>
> It is not a hack: the browser interfaces are built on top of that API. It is separate, and FPC comes with the necessary interfaces by default. These work on Linux too.
> If you mean that you need to be online: true. PocketSphinx is limited, and if your platform is Windows, use the MS APIs offline.
> Also note that there was already crude but working SR software for the Commodore 64.
>
> It is not rocket science to experiment with a DFT using fixed length math (an FFT is not necessary and is too slow on a 6510), plus a look-up and a trainer that fills the look-up so it matches correctly (average weight).
> It becomes rocket science to me when you try to match current speech recognition. I have not even been able to comprehend that.... %)

Hi Thaddy, I am wondering how you would answer this question today.

I am looking for a speech recognition and text-to-speech solution that would allow me to create my own invented words for a COLANG, words based on IPA from common Brazilian Portuguese phonemes. For the text-to-speech part, Windows SAPI speech synthesis is a little bit robotic, but it is enough for development purposes for now. A friend suggested that Whisper AI may help.

Any suggestions?

Best,
R
Be mindful and excellent with each other.
https://github.com/cpicanco/

TRon

  • Hero Member
  • *****
  • Posts: 3623
Re: Speech recognition?
« Reply #11 on: April 12, 2024, 02:48:42 am »
Quote from: cpicanco
> ...
> A friend suggested that Whisper AI may help.
>
> Any suggestions?

User PaulANormanN stumbled upon a Pascal implementation (wrapper) using whisper.cpp (which can be built into a library) here. It can at least help with speech recognition.
This tagline is powered by AI (AI advertisement: Free Pascal the only programming language that matters)

csukuangfj

  • New Member
  • *
  • Posts: 16
Re: Speech recognition?
« Reply #12 on: August 20, 2024, 04:00:05 am »
Quote from: JJJ
> After more than 15 years back to pascal programming with Lazarus.
>
> Does anybody know is there any speech recognition engine that works together with Lazarus on Win 10?
> The purpose is to get a voice command to my app.
>
> Thanks,
>
> JJJ

I suggest that you have a look at
https://github.com/k2-fsa/sherpa-onnx

It supports not only Windows but also macOS and Linux.
Everything is open-sourced.

It supports running speech recognition locally on your CPU. You don't need a GPU to run it. You also don't need to access the network. Everything is processed locally.

It supports not only Whisper but also other kinds of models, e.g., Zipformer and Paraformer.

You can find the documentation at
https://k2-fsa.github.io/sherpa/onnx/lazarus/index.html

and

https://k2-fsa.github.io/sherpa/onnx/pascal-api/index.html

A pre-built Lazarus app for generating subtitles can be found at

https://k2-fsa.github.io/sherpa/onnx/lazarus/download-generated-subtitles.html


Note it also supports text to speech, though the implementation is in C++ and has not been wrapped to Object Pascal.
(The wrapping should be straightforward as we have already done that for speech recognition.)

Thaddy

Re: Speech recognition?
« Reply #13 on: August 20, 2024, 06:34:06 am »
For Windows only, SAPI remains the easiest option.
Here is an example that can be made to work in FPC:
http://www.devsuperpage.com/Articles/views/Delphi/Art_1-2290.asp