Author Topic: Any successful experience with SpeechToText API on Windows?  (Read 18457 times)

typo

Re: Any successful experience with SpeechToText API on Windows?
« Reply #30 on: July 03, 2015, 07:59:45 pm »
Well, now we have a successful experience with this.

typo

« Reply #31 on: July 12, 2015, 06:58:59 pm »
I built FPC from the SVN source and tried to rebuild the Listening project in 64-bit Windows/Lazarus, but could not get it working.

rvk

« Reply #32 on: July 13, 2015, 11:41:56 am »
Quote from: typo
"I built FPC from the SVN source and tried to rebuild the Listening project in 64-bit Windows/Lazarus, but could not get it working."

No problems here with Lazarus 1.4 64-bit.
(Which project are you using? My last listening.zip worked without problems.)

Lazarus 1.4.0 r48774 FPC 2.6.4 x86_64-win64-win32/win64


(I'm not sure how I would build a 64bit Lazarus from the sources)

typo

« Reply #33 on: July 13, 2015, 12:06:19 pm »
Do you actually run this project without problems? Don't you get any Access Violation at runtime?

rvk

« Reply #34 on: July 13, 2015, 12:15:39 pm »
Quote from: typo
"Do you actually run this project without problems? Don't you get any Access Violation at runtime?"

Yep. I had no problems whatsoever compiling and running this under Lazarus 1.4 64-bit.

typo

« Reply #35 on: July 13, 2015, 12:44:37 pm »
I gave up; I am back to 32-bit Lazarus.

rvk

« Reply #36 on: July 13, 2015, 01:48:58 pm »
I just adjusted my script to install the win64 version of FPC/Lazarus.
I had no trouble compiling the project with that either.

I did have one problem, though: where can I find GDB.EXE for 64-bit?

In binutils for win32 there is a 32-bit GDB.EXE.
But in binutils for win64 it's nowhere to be found.

I could take the one from the Lazarus 64-bit installation (in mingw\x86_64-win64\bin), but isn't the 64-bit gdb.exe downloadable separately, as the 32-bit version is?

(And I never got fpcup64 and fpcup working because of a "doskey" error. That's why I created my own script, so I'm not sure where fpcup gets its gdb binaries.)


Edit: Ok, I found it... It's here:
http://svn2.freepascal.org/svn/lazarus/binaries/x86_64-win64/gdb/bin/
Nicely tucked away in the root instead of binutils :)

(now I can adjust my script to get it from there)
« Last Edit: July 13, 2015, 01:52:03 pm by rvk »

PawelO

« Reply #37 on: March 09, 2024, 04:26:06 pm »
Quote from: typo
"I gave up; I am back to 32-bit Lazarus."

Just in case anyone trying to get SAPI to work finds this topic and has the same error as I did yesterday (Access violation on creating the ActiveX object): I had to install the English language in Windows Settings and then activate Windows Speech Recognition. My Windows was set to a different language, which Speech Recognition does not support, and apparently you cannot use the English Speech Recognition engine with a non-English Windows user interface, which could be quite a problem for an application's end user. After that the error goes away and the example from this topic starts working.

jamie

« Reply #38 on: March 09, 2024, 04:35:11 pm »
I have code for the other way round: TextToSpeech!

Thaddy

« Reply #39 on: March 09, 2024, 05:55:32 pm »
Me too. I wrote an article about SAPI for UNDU in the late '90s or early 2000s. It used the parrot.
By the way, Pawel, I am not aware that Polish was ever supported.

Microsoft provides a REST API for speech-to-text and text-to-speech, and that does support Polish.
It requires a key and an Internet connection, and is ultimately a paid service.
« Last Edit: March 09, 2024, 05:59:27 pm by Thaddy »

PawelO

« Reply #40 on: March 09, 2024, 07:09:13 pm »
Yes, I know SAPI has no Polish speech recognition. The problem is that even if I want to use English recognition in my application, I cannot do that unless I switch the Windows language to English.

Thaddy

« Reply #41 on: March 10, 2024, 07:48:55 am »
@PawelO

I am investigating VOSK, which is open source, runs locally and supports Polish and many other languages, including Dutch.
You can simply install it for Python using: pip3 install vosk
The idea is that, because we can embed a Python engine, we can also work with it in FreePascal.
That is, for lack of direct Pascal bindings at the moment.
For a speech-to-text engine that can work locally it is quite lightweight (but by nature still big).
I suggest you have a go at it too... ;D :D
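
For anyone who wants to try the VOSK route from Python first, here is a minimal sketch, not a tested recipe. It assumes that a VOSK model for the chosen language has already been downloaded and unpacked, and that the input is a 16-bit mono PCM WAV file. The function names, the script name and the paths in the usage line are made up for illustration; Model, KaldiRecognizer, AcceptWaveform, Result and FinalResult come from the vosk package.

```python
import json
import sys
import wave


def collect_text(chunks, recognizer):
    """Feed audio chunks to a VOSK-style recognizer and join the recognized text."""
    pieces = []
    for chunk in chunks:
        # AcceptWaveform returns a true value when an utterance has just ended.
        if recognizer.AcceptWaveform(chunk):
            pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # FinalResult flushes whatever is still buffered after the last chunk.
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def transcribe_wav(wav_path, model_dir):
    """Transcribe a 16-bit mono PCM WAV file with the VOSK model unpacked in model_dir."""
    # Imported here so that the rest of the module loads even without vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        # Read 4000 frames at a time until readframes returns an empty buffer.
        chunks = iter(lambda: wav.readframes(4000), b"")
        return collect_text(chunks, recognizer)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(transcribe_wav(sys.argv[1], sys.argv[2]))
    else:
        print("usage: python transcribe.py recording.wav path/to/vosk-model")
```

Keeping the chunk loop in its own function means a program that embeds Python could feed it audio from any source, not only from a file.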
« Last Edit: March 10, 2024, 07:56:20 am by Thaddy »

PawelO

« Reply #42 on: March 10, 2024, 01:21:10 pm »
Thanks, I will check it.

PawelO

« Reply #43 on: March 22, 2024, 04:23:43 pm »
One more update after some playing with SAPI.

Actually, switching the Windows language to English is only needed if you work with SpSharedRecoContext. That is the engine shared across the system: it opens the Windows Speech Recognition app at the top of the screen, which at the same time recognizes and interprets everything you say (e.g. "close" will close your app's window). That doesn't make much sense to me.

When you use SpInProcRecoContext, the engine that works only inside your app, it simply works after installing a language pack with speech recognition; no change of Windows language is needed.

However, Windows speech recognition often makes errors out of the box (at least for me, but it also has problems understanding the default Windows TTS voices). It improved significantly after "Speech Recognition Voice Training", which, of course, can only be launched when you switch Windows to English. Anyway, it works well enough for me.

Below is a simple example of SpInProcRecoContext working in Command mode (recognition of commands defined in XML).

Main unit:

Code: Pascal
unit uMain;

{$mode objfpc}{$H+}

// NOTE - to use SAPI:
// 1. Install and use in project LazActiveX package
// 2. Create SpeechLib_5_4_TLB (dll header): Tools -> Import Type Library ...:
//    - browse for dll
//    - select Create visual component (creates TAxc objects)
//    - OK, save TLB file in project directory
// 3. Install EN-US (or other) language pack with Speech Recognition in Windows Settings
//
// typical path to SAPI dll files:
// C:\Windows\SysWOW64\Speech\Common\sapi.dll - 32bit
// C:\Windows\System32\Speech\Common\sapi.dll - 64bit

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls, LCLType, SpeechLib_5_4_TLB;

type

  TfrmMain = class(TForm)
    memLog: TMemo;
    procedure FormCreate(Sender: TObject);
  private
    FSpInProcRecoContext: TAxcSpInProcRecoContext;
    FRecoGrammar: ISpeechRecoGrammar;
    procedure FOnSoundStart(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant);
    procedure FOnSoundEnd(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant);
    procedure FOnHypothesis(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; Result: ISpeechRecoResult);
    procedure FOnRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; RecognitionType: SpeechRecognitionType; Result: ISpeechRecoResult);
    procedure FOnFalseRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; Result: ISpeechRecoResult);
  public

  end;

var
  frmMain: TfrmMain;

implementation

{$R *.lfm}

procedure TfrmMain.FormCreate(Sender: TObject);
var
  recoContext: ISpeechRecoContext;
  category: TAxcSpObjectTokenCategory;
  audioToken: TAxcSpObjectToken;
const
  // NOTE: instead of defining SAPI_AudioInId, a const from SAPI .dll should be used, but it's not imported to TLB
  SAPI_AudioInId = 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput';
begin
  try
    //mic audio in setup
    category := TAxcSpObjectTokenCategory.Create(Self);
    category.OleServer.SetId(SAPI_AudioInId, False);
    category.Visible := False; //to avoid appearing TAxc object on Form
    audioToken := TAxcSpObjectToken.Create(Self);
    audioToken.OleServer.SetId(category.OleServer.Default, '', False);
    audioToken.Visible := False;

    //creating recognition object and setting audio in
    FSpInProcRecoContext := TAxcSpInProcRecoContext.Create(Self);
    FSpInProcRecoContext.Visible := False;
    recoContext := FSpInProcRecoContext.OleServer;
    recoContext.EventInterests := SREAllEvents;
    recoContext.Recognizer.AudioInput := audioToken.OleServer;
    //event handling
    FSpInProcRecoContext.OnSoundStart  := @FOnSoundStart;  //start of sound detected
    FSpInProcRecoContext.OnSoundEnd    := @FOnSoundEnd;    //end of sound detected
    FSpInProcRecoContext.OnHypothesis  := @FOnHypothesis;  //recognition hypothesis
    FSpInProcRecoContext.OnRecognition := @FOnRecognition; //recognition result - successful recognition
    FSpInProcRecoContext.OnFalseRecognition := @FOnFalseRecognition; //recognition result - false recognition (not enough confidence)

    //loading grammar file
    FRecoGrammar := recoContext.CreateGrammar(0);
    try
      FRecoGrammar.CmdLoadFromFile('grammar.xml', SLOStatic);
      FRecoGrammar.CmdSetRuleState('', SGDSActive); //set all top-level rules active
    except
      Application.MessageBox('Loading grammar file failed. Check if file exists and contains no errors.', 'Error', MB_ICONERROR);
    end;
  except
    Application.MessageBox('Cannot initialize SAPI. Check if Speech Recognition is installed in Windows.', 'Error', MB_ICONERROR);
  end;
end;

procedure TfrmMain.FOnSoundStart(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant);
begin
  memLog.Lines.Add('OnSoundStart');
end;

procedure TfrmMain.FOnSoundEnd(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant);
begin
  memLog.Lines.Add('OnSoundEnd');
end;

procedure TfrmMain.FOnHypothesis(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; Result: ISpeechRecoResult);
begin
  memLog.Lines.Add('OnHypothesis: ' + Result.PhraseInfo.GetText(0, -1, True));
end;

procedure TfrmMain.FOnRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; RecognitionType: SpeechRecognitionType; Result: ISpeechRecoResult);
begin
  memLog.Lines.Add('OnRecognition: ' + Result.PhraseInfo.GetText(0, -1, True) + ' (Rule: ' + Result.PhraseInfo.Rule.Name + ')');
end;

procedure TfrmMain.FOnFalseRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; Result: ISpeechRecoResult);
begin
  memLog.Lines.Add('OnFalseRecognition: ' + Result.PhraseInfo.GetText(0, -1, True));
end;

end.

grammar.xml:

Code: XML
<!-- 409 = EN-US -->
<GRAMMAR LANGID="409">
  <DEFINE>
    <ID NAME="RID_color" VAL="1"/>
    <ID NAME="RID_shape" VAL="2"/>
  </DEFINE>

  <RULE NAME="colors">
    <L>
      <P>red</P>
      <P>blue</P>
      <P>green</P>
      <P>yellow</P>
      <P>white</P>
      <P>black</P>
    </L>
  </RULE>

  <RULE NAME="shapes">
    <L>
      <P>square</P>
      <P>triangle</P>
      <P>circle</P>
      <P>rectangle</P>
    </L>
  </RULE>

  <RULE NAME="color" ID="RID_color" TOPLEVEL="ACTIVE">
    <P>color</P>
    <RULEREF NAME="colors" />
    <O>please</O>
  </RULE>

  <RULE NAME="shape" ID="RID_shape" TOPLEVEL="ACTIVE">
    <P>shape</P>
    <RULEREF NAME="shapes" />
    <O>please</O>
  </RULE>
</GRAMMAR>
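
To see which spoken commands the grammar above is meant to accept, the rules can be expanded by hand or with a small script. The sketch below is a stand-alone Python helper and not part of the Lazarus project. It assumes the usual reading of the SAPI XML tags used here: L is a list of alternatives, P a phrase, O an optional part, and RULEREF a reference to another rule. It embeds a compact copy of the grammar without the DEFINE block; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Compact copy of the grammar.xml posted above (same rules and phrases).
GRAMMAR = """<GRAMMAR LANGID="409">
  <RULE NAME="colors">
    <L><P>red</P><P>blue</P><P>green</P><P>yellow</P><P>white</P><P>black</P></L>
  </RULE>
  <RULE NAME="shapes">
    <L><P>square</P><P>triangle</P><P>circle</P><P>rectangle</P></L>
  </RULE>
  <RULE NAME="color" ID="RID_color" TOPLEVEL="ACTIVE">
    <P>color</P><RULEREF NAME="colors" /><O>please</O>
  </RULE>
  <RULE NAME="shape" ID="RID_shape" TOPLEVEL="ACTIVE">
    <P>shape</P><RULEREF NAME="shapes" /><O>please</O>
  </RULE>
</GRAMMAR>"""


def expand(node, rules):
    """Return every word sequence (list of words) that the element can match."""
    if node.tag == "RULEREF":  # reference to another rule by name
        return expand(rules[node.get("NAME")], rules)
    if node.tag == "L":  # list: exactly one of the children
        return [seq for child in node for seq in expand(child, rules)]
    # RULE, P and O: the element's own text followed by its children, in order.
    sequences = [(node.text or "").split()]
    for child in node:
        sequences = [head + tail for head in sequences for tail in expand(child, rules)]
    if node.tag == "O":  # optional: may also be skipped entirely
        sequences = [[]] + sequences
    return sequences


def accepted_phrases(xml_text):
    """Map each active top-level rule name to the phrases it can match."""
    root = ET.fromstring(xml_text)
    rules = {rule.get("NAME"): rule for rule in root.iter("RULE")}
    return {
        name: [" ".join(words) for words in expand(rule, rules)]
        for name, rule in rules.items()
        if rule.get("TOPLEVEL") == "ACTIVE"
    }


if __name__ == "__main__":
    for rule_name, phrases in accepted_phrases(GRAMMAR).items():
        print(rule_name + ":", ", ".join(phrases))
```

Only the two rules marked TOPLEVEL="ACTIVE" are listed; "colors" and "shapes" contribute phrases solely through RULEREF.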

 
