
Topic: Speech recognition?

JJJ

  • New Member
  • *
  • Posts: 20
Speech recognition?
« on: April 12, 2018, 06:04:43 pm »
After more than 15 years, I am back to Pascal programming, now with Lazarus.

Does anybody know whether there is a speech recognition engine that works with Lazarus on Windows 10?
The goal is to add voice commands to my app.

Thanks,

JJJ

Thaddy

  • Hero Member
  • *****
  • Posts: 14159
  • Probably until I exterminate Putin.
Re: Speech recognition?
« Reply #1 on: April 12, 2018, 06:16:11 pm »
A speech recognition engine comes with Windows and has a COM interface. Google it.
So it is free and easy to use.
Specialize a type, not a var.

Trenatos

  • Hero Member
  • *****
  • Posts: 533
    • MarcusFernstrom.com
Re: Speech recognition?
« Reply #2 on: April 12, 2018, 06:29:54 pm »

Thaddy

  • Hero Member
  • *****
  • Posts: 14159
  • Probably until I exterminate Putin.
Re: Speech recognition?
« Reply #3 on: April 12, 2018, 06:33:29 pm »
https://github.com/r1me/TPocketSphinx
There is really no need for that when Microsoft already provides something vastly superior as standard in every Windows version.
It only needs a type library import and some knowledge about COM.
Note that the current versions can also use the cloud and are even more powerful, but you need a license for that.
Also note that the old versions are limited to just a few languages, but free.
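
As a rough illustration of the COM route, here is a minimal sketch that lists the speech recognizers installed on a machine. It uses late binding through CreateOleObject instead of the type library import mentioned above, so it needs no generated SpeechLib_TLB unit. It assumes Windows with SAPI 5 installed; the program and variable names are made up for the example.

```pascal
program ListRecognizers;
{$mode objfpc}{$H+}

uses
  SysUtils, ComObj, Variants;

var
  Recognizer, Tokens, Token: OleVariant;
  NoFilter, Description: WideString;
  I, TokenCount: Integer;
begin
  NoFilter := '';  // empty attribute filter: ask for every recognizer
  try
    // In-process recognizer object from the SAPI automation layer.
    Recognizer := CreateOleObject('SAPI.SpInprocRecognizer');
    Tokens := Recognizer.GetRecognizers(NoFilter, NoFilter);
    TokenCount := Tokens.Count;
    for I := 0 to TokenCount - 1 do
    begin
      Token := Tokens.Item(I);
      Description := Token.GetDescription(0);  // 0 = default locale
      WriteLn(I, ': ', Description);
    end;
    if TokenCount = 0 then
      WriteLn('No speech recognizers are installed.');
  except
    on E: EOleError do
      WriteLn('SAPI is not available: ', E.Message);
  end;
end.
```

The list usually shows one entry per installed recognition language, which is a quick way to see which languages the off-line engine supports on a given PC.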
« Last Edit: April 12, 2018, 06:36:11 pm by Thaddy »
Specialize a type, not a var.

rvk

  • Hero Member
  • *****
  • Posts: 6056
Re: Speech recognition?
« Reply #4 on: April 12, 2018, 06:40:25 pm »
I remember a topic back in 2015 for the SpeechToText API.
https://forum.lazarus.freepascal.org/index.php/topic,28952.0.html

I already had the SpeechLib_TLB.zip in that topic and I could make a working example for English.

Quote from: Thaddy
"Note that the current versions can also use the cloud and are even more powerful, but you need a license for that."
Neat  8-)
https://cloud.google.com/speech-to-text/
« Last Edit: April 12, 2018, 06:44:26 pm by rvk »

JJJ

  • New Member
  • *
  • Posts: 20
Re: Speech recognition?
« Reply #5 on: April 12, 2018, 07:42:38 pm »
Thanks guys!

I'm not familiar with Windows, so I had no idea that a speech engine already exists.

rvk: I checked the topic and tested your example, but I got an EOleSysError on CoSpSharedRecoContext.Create.
I'll have to play with that later.
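
One way to narrow down an error like this is to create the same COM object through late binding and print the HRESULT that comes back. The sketch below is only a diagnostic aid, not a fix. It assumes that the ProgID 'SAPI.SpSharedRecoContext' maps to the same class that CoSpSharedRecoContext.Create instantiates; the program name is made up.

```pascal
program CheckRecoContext;
{$mode objfpc}{$H+}

uses
  SysUtils, ComObj, Variants;

var
  Context: OleVariant;
begin
  try
    // Ask COM for the shared recognition context by ProgID.
    Context := CreateOleObject('SAPI.SpSharedRecoContext');
    WriteLn('Shared recognition context created.');
    Context := Unassigned;  // release the COM object again
  except
    on E: EOleSysError do
      // ErrorCode holds the raw HRESULT, which is easier to search for
      // than the translated message text.
      WriteLn('COM error 0x', IntToHex(Cardinal(E.ErrorCode), 8), ': ', E.Message);
  end;
end.
```

If the object cannot be created here either, the problem is likely in the Windows speech setup rather than in the imported type library unit.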





Trenatos

  • Hero Member
  • *****
  • Posts: 533
    • MarcusFernstrom.com
Re: Speech recognition?
« Reply #6 on: April 12, 2018, 07:45:21 pm »
Is the speech recognition bundled with Windows really vastly superior when used locally only (no network/server/cloud)?

Thaddy

  • Hero Member
  • *****
  • Posts: 14159
  • Probably until I exterminate Putin.
Re: Speech recognition?
« Reply #7 on: April 12, 2018, 09:15:32 pm »
Note that there are at least three APIs that work well:
- Amazon Alexa
- Google Home
- Microsoft Cortana (and the built-in, old-school off-line engine)
The MS API also works off-line most of the time, but Google works best, then Cortana (MS), and Amazon (Alexa) is not bad either. They all also work on e.g. a Raspberry Pi as a poor man's IoT controller.
You can do amazing things with them when connected on-line.

The MS Windows off-line solution still works with Rik's code. All are limited to a couple of languages. English works best, and it is the only language I tested.

All these work with FPC.
« Last Edit: April 12, 2018, 09:21:17 pm by Thaddy »
Specialize a type, not a var.

Trenatos

  • Hero Member
  • *****
  • Posts: 533
    • MarcusFernstrom.com
Re: Speech recognition?
« Reply #8 on: April 12, 2018, 09:20:39 pm »
Last time I checked, the Google thing was a hack to mimic what the browser voice recognition does. I'm not sure about the Amazon and Google online APIs; I'll definitely check those out.

I started a project last year or so where I wanted voice recognition on Linux using only an offline engine; that's how I ended up at PocketSphinx.

Thaddy

  • Hero Member
  • *****
  • Posts: 14159
  • Probably until I exterminate Putin.
Re: Speech recognition?
« Reply #9 on: April 12, 2018, 09:22:15 pm »
It is not a hack: the browser interfaces are built on top of it. It is a separate API, and FPC comes with the necessary interfaces by default. These work on Linux too.
If you mean that you need to be on-line: true. PocketSphinx is limited, and if your platform is Windows, use the MS APIs off-line.
Also note that there was already crude but working SR software for the Commodore 64.

It is not rocket science to experiment with a DFT using fixed-length math (an FFT is not necessary and is too slow on a 6510), plus a look-up table and a trainer that fills the look-up table with matching values (average weight).
It becomes rocket science to me when you try to match current speech recognition. I have not even been able to comprehend that. %)
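
To make the DFT idea concrete, here is a toy sketch in Free Pascal. It reads "fixed length math" as integer fixed-point arithmetic, the "look-up" as a table of one averaged feature vector per word, and the "trainer" as a running average; all three readings are assumptions, as are the constants and names. Pure tones stand in for recorded speech frames, and the sine table is built with floating point once at start-up, where an 8-bit machine would use a precomputed table.

```pascal
program FixedPointDftSketch;
{$mode objfpc}{$H+}

const
  FrameLen = 32;  // samples per frame (hypothetical)
  NumBins  = 8;   // DFT bins 1..8 are kept as features
  NumWords = 2;   // entries in the look-up table
  FixShift = 8;   // Q8 fixed point: 256 stands for 1.0

type
  TFrame    = array[0..FrameLen - 1] of LongInt;
  TFeatures = array[0..NumBins - 1] of LongInt;
  TTemplate = record
    Mean: TFeatures;  // running average of the training features
    Count: LongInt;   // number of training frames seen so far
  end;

var
  SinTab: array[0..FrameLen - 1] of LongInt;  // one sine period in Q8
  Lookup: array[0..NumWords - 1] of TTemplate;

procedure InitSinTab;
var
  I: LongInt;
begin
  for I := 0 to FrameLen - 1 do
    SinTab[I] := Round(Sin(2 * Pi * I / FrameLen) * (1 shl FixShift));
end;

// Test signal: a pure tone that falls exactly on DFT bin "Bin".
procedure MakeTone(Bin, Amplitude: LongInt; out Frame: TFrame);
var
  I: LongInt;
begin
  for I := 0 to FrameLen - 1 do
    Frame[I] := (Amplitude * SinTab[(Bin * I) mod FrameLen]) div (1 shl FixShift);
end;

// Direct integer DFT; magnitude is approximated by |Re| + |Im|.
procedure ExtractFeatures(const Frame: TFrame; out Feat: TFeatures);
var
  K, N, Idx, Re, Im: LongInt;
begin
  for K := 0 to NumBins - 1 do
  begin
    Re := 0;
    Im := 0;
    for N := 0 to FrameLen - 1 do
    begin
      Idx := ((K + 1) * N) mod FrameLen;
      // The cosine is the sine table shifted by a quarter period.
      Re := Re + Frame[N] * SinTab[(Idx + FrameLen div 4) mod FrameLen];
      Im := Im - Frame[N] * SinTab[Idx];
    end;
    Feat[K] := (Abs(Re) + Abs(Im)) div (1 shl FixShift);
  end;
end;

// Trainer: fold one more example into the running average of a word.
procedure Train(WordIdx: LongInt; const Feat: TFeatures);
var
  K: LongInt;
begin
  for K := 0 to NumBins - 1 do
    Lookup[WordIdx].Mean[K] :=
      (Lookup[WordIdx].Mean[K] * Lookup[WordIdx].Count + Feat[K]) div
      (Lookup[WordIdx].Count + 1);
  Inc(Lookup[WordIdx].Count);
end;

// Matcher: index of the trained template with the smallest distance, or -1.
function Recognize(const Feat: TFeatures): LongInt;
var
  W, K, Dist, Best: LongInt;
begin
  Result := -1;
  Best := High(LongInt);
  for W := 0 to NumWords - 1 do
  begin
    if Lookup[W].Count = 0 then
      Continue;
    Dist := 0;
    for K := 0 to NumBins - 1 do
      Dist := Dist + Abs(Feat[K] - Lookup[W].Mean[K]);
    if Dist < Best then
    begin
      Best := Dist;
      Result := W;
    end;
  end;
end;

var
  Frame: TFrame;
  Feat: TFeatures;
begin
  InitSinTab;
  // Word 0 is trained on tones at bin 2, word 1 on tones at bin 5.
  MakeTone(2, 100, Frame); ExtractFeatures(Frame, Feat); Train(0, Feat);
  MakeTone(2, 110, Frame); ExtractFeatures(Frame, Feat); Train(0, Feat);
  MakeTone(5, 100, Frame); ExtractFeatures(Frame, Feat); Train(1, Feat);
  MakeTone(5, 110, Frame); ExtractFeatures(Frame, Feat); Train(1, Feat);
  // A quieter bin-5 tone should still land closest to word 1.
  MakeTone(5, 90, Frame);
  ExtractFeatures(Frame, Feat);
  WriteLn('Recognized word index: ', Recognize(Feat));
end.
```

Real speech would need many frames per word and some form of time alignment, which is where the hard part starts.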
« Last Edit: April 12, 2018, 09:39:11 pm by Thaddy »
Specialize a type, not a var.

cpicanco

  • Hero Member
  • *****
  • Posts: 618
  • Behavioral Scientist and Programmer
    • Portfolio
Re: Speech recognition?
« Reply #10 on: September 23, 2023, 04:14:43 pm »
Hi Thaddy, I am wondering how you would answer this question today.

I am looking for a speech recognition and text-to-speech solution that would let me create my own invented words for a COLANG, words based on IPA transcriptions of common Brazilian Portuguese phonemes. For the text-to-speech part, Windows SAPI speech synthesis is a little bit robotic, but it is enough for development purposes for now. A friend suggested that Whisper AI may help.
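
For reference, the SAPI text-to-speech part can be sketched in a few lines of Free Pascal with late binding. This assumes Windows with SAPI 5 and the default installed voice. The FPU exception mask change is a commonly cited workaround for SIGFPE errors when SAPI is called from FPC, so treat it as an assumption rather than a requirement. The spoken text and the program name are placeholders.

```pascal
program SapiSpeak;
{$mode objfpc}{$H+}

uses
  SysUtils, ComObj, Variants;

var
  Voice: OleVariant;
  SpeechText: WideString;
  SavedCW: Word;
begin
  SpeechText := 'Hello from Free Pascal';  // placeholder text
  try
    Voice := CreateOleObject('SAPI.SpVoice');
    // Mask the FPU zero-divide exception while SAPI runs, then restore it.
    SavedCW := Get8087CW;
    try
      Set8087CW(SavedCW or $4);
      Voice.Speak(SpeechText, 0);  // 0 = default flags, call returns when done
    finally
      Set8087CW(SavedCW);
    end;
  except
    on E: EOleError do
      WriteLn('SAPI error: ', E.Message);
  end;
end.
```

This only drives the installed voices with ordinary text; it does not by itself cover invented words or phoneme-level input.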

Any suggestions?

Best,
R
Be mindful and excellent with each other.
https://github.com/cpicanco/

 
