Author Topic: Text-to-Speech and Speech Recognition : SAPI / Ole automation problems  (Read 12907 times)

Lexxy

  • New member
  • *
  • Posts: 9
I am working on a project involving Speech Recognition and Text-to-speech on a Windows CE handheld device (WinCE 5.0, arm). I am using the latest 0.9.29 build of Lazarus from SVN.

The application I am building is a client/server application. I have a server (.Net) and a handheld client which connects to the server using a (wireless) socket connection (Wifi). This is working nicely. I'm using LNet on the client side for this.

Regarding the Text-to-Speech and Voice Recognition, I am considering two ways to implement this at the moment :

1. Handle the actual TTS and SR on the server, and send WAV data over the socket connection. This is the easiest to implement, since I can use SAPI on the server, and the Win CE handheld client only needs to play server-generated WAV files and send WAV files it recorded from its headset back to the server. The downside to this approach is heavy network traffic (over Wi-Fi). Because of that, I don't think this approach will work out in the end. I have a proof of concept for this running on the server, and it works fine. I would still have to figure out how to play and record WAV files on Win CE using Lazarus.

2. Handle the TTS and SR on the Win CE handheld device itself, so only plain text (very small amounts of data) is transmitted over the wireless client-server connection. This way, server load will be lower, and much less wireless network bandwidth will be needed. However, I have to figure out how to implement TTS and SR on Win CE 5.0 (arm CPU).
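To put rough numbers on the traffic difference between the two approaches, here is a small sketch. The audio format (16 kHz, 16-bit, mono) and the 40-byte text prompt are assumptions chosen for illustration, not figures from the project, and `PcmBytesPerSecond` is a hypothetical helper.

```pascal
program PcmBandwidth;
{$mode objfpc}{$H+}

// Hypothetical helper: bytes per second of uncompressed PCM audio.
function PcmBytesPerSecond(Rate, Bits, Chans: LongInt): LongInt;
begin
  Result := Rate * (Bits div 8) * Chans;
end;

const
  // Assumed speech-quality format; not taken from the project.
  AssumedSampleRate    = 16000;
  AssumedBitsPerSample = 16;
  AssumedChannels      = 1;
  // Assumed size of one short text prompt in approach 2.
  AssumedTextBytes     = 40;
var
  AudioBytes: LongInt;
begin
  AudioBytes := PcmBytesPerSecond(AssumedSampleRate, AssumedBitsPerSample,
    AssumedChannels);
  WriteLn('PCM bytes per second:  ', AudioBytes);
  WriteLn('PCM kbit per second:   ', (AudioBytes * 8) div 1000);
  WriteLn('Text bytes per prompt: ', AssumedTextBytes);
end.
```

Under these assumptions every second of uncompressed audio costs 32000 bytes (256 kbit/s) in each direction, while a text prompt is a few dozen bytes in total. A different sample rate or a compressed format would change the figures but probably not the overall picture.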

For approach 2, my current idea is to use SAPI. I am still kind of new to Win CE programming, but I think OLE automation is the way to go for SAPI on Win CE 5.0. I have tried this, using the following code as a test :

Code:
procedure TForm1.Button2Click(Sender: TObject);
var
  objOle: Variant;
begin
  objOle := CreateOleObject('SAPI.SpVoice');
  // objOle.Speak call here
end;

This code fails right at the CreateOleObject call. An exception is raised, but the exception message is blank. To see whether the problem lies in the SAPI object or in the CreateOleObject() implementation itself, I have tried creating other OLE objects, such as InternetExplorer.Application, ADODB.Connection and ADOCE.RecordSet. For every OLE object I try to create this way, I get the same blank exception. Because of that, I think the problem might not be in my code, but in the combination Lazarus 0.9.29 / WinCE / arm.
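Since the message is blank, the numeric error code may say more. Below is a minimal sketch of the same handler, assuming that FPC's ComObj unit raises EOleSysError from CreateOleObject (it goes through OleCheck) and that the exception exposes the HRESULT as ErrorCode. Whether the WinCE build behaves the same way is not confirmed here.

```pascal
uses
  ComObj, SysUtils, Dialogs;

procedure TForm1.Button2Click(Sender: TObject);
var
  objOle: Variant;
begin
  try
    objOle := CreateOleObject('SAPI.SpVoice');
    // objOle.Speak call here
  except
    // OleCheck failures carry the HRESULT even when the text is empty
    on E: EOleSysError do
      ShowMessage('OLE error $' + IntToHex(LongWord(E.ErrorCode), 8) +
        ': ' + E.Message);
    // Anything else: at least show which exception class was raised
    on E: Exception do
      ShowMessage(E.ClassName + ': ' + E.Message);
  end;
end;
```

A code such as $800401F3 (CO_E_CLASSSTRING, "Invalid class string") would mean the ProgID is not registered on the device, while $80040154 (REGDB_E_CLASSNOTREG) would mean the class is not registered.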

I guess my questions are :

* Are there known problems in the ComObj unit, regarding the use of CreateOleObject() for WinCE 5.0 / arm ? Any solutions / workarounds for this ?
* Is OLE automation + SAPI the way to go for Text-to-Speech and Speech Recognition on Win CE 5.0 (arm) ?
* Does anyone have other approaches for TTS and SR on Win CE 5.0 (arm) ? Preferably free or low-cost solutions :)

I would appreciate any help in this a lot...

Troodon

  • Sr. Member
  • ****
  • Posts: 484
« Reply #1 on: August 08, 2010, 01:47:29 pm »
I assume you checked that you have SAPI on WinCE 5.0. Any idea what the CPU requirements for speech recognition are? On my dual core 2.7 GHz system CPU usage is 20%-30% with speech recognition activated and drops to around 2% without it.

Lexxy

  • New member
  • *
  • Posts: 9
« Reply #2 on: August 09, 2010, 04:16:35 pm »
I think I have SAPI available on the handheld, although I am not quite sure how to test this without actually implementing some OLE code to test it. And creating this OLE code is what is causing me problems.

I think the stuff you say about CPU utilization is interesting too though. It might be too CPU intensive to actually run on the handheld.

Anyone got some more ideas about this question ?

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3541
« Reply #3 on: August 09, 2010, 05:03:03 pm »
I tried it in the emulator, and the message box says "Invalid class string". Strange that no exception message appears on your device.

Looking at the FPC sources you can see that CreateOleObject is implemented as:

   function CreateOleObject(const ClassName : string) : IDispatch;
     var
       id : TCLSID;
     begin
        id:=ProgIDToClassID(ClassName);
        OleCheck(CoCreateInstance(id,nil,CLSCTX_INPROC_SERVER or CLSCTX_LOCAL_SERVER,IDispatch,result));
     end;

So one way to test this better would be to call the WinAPI CoCreateInstance directly. Docs are here: http://msdn.microsoft.com/en-us/library/ms863893.aspx

I tried a simple example and it crashes between steps 4 and 5. Maybe you have a better idea of what is going on, as I don't really use OLE Automation =)

Code:
uses
  Windows, ActiveX, Ole2;

{$R *.lfm}

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  unkXl: IUnknown;
  dispXl: IDispatch;
  clsidXl: TCLSID;
  did: TDispID;
  lname: POleStr;
  dps: DISPPARAMS;
  didNamed: TDispID;
  param: VARIANTARG;
  hres: HRESULT;
begin
  Memo1.Text := 'Step1';
  // initialize COM
  CoInitializeEx(nil, 0);
  Memo1.Text := 'Step2';
  // get CLSID of InternetExplorer
  CLSIDFromProgID('InternetExplorer.Application', clsidXl);
  Memo1.Text := 'Step3';
  // create InternetExplorer instance
  CoCreateInstance(clsidXl, nil, CLSCTX_LOCAL_SERVER, IID_IUnknown, unkXl);
  Memo1.Text := 'Step4';
  // get IDispatch interface of received object
  unkXl.QueryInterface(IID_IDispatch, dispXl);
  Memo1.Text := 'Step5';

  // DO SOME STUFF WITH InternetExplorer HERE
  // for example, we'll just show it ;)
  lname:= 'Visible';
  param.vt:= VT_BOOL;
  param.vbool:= true;
  dps.cArgs:= 1;
  dps.rgvarg:= @param;
  didNamed:= DISPID_PROPERTYPUT;
  dps.cNamedArgs:= 1;
  dps.rgdispidNamedArgs:= @didNamed;

  dispXl.GetIDsOfNames(GUID_NULL, @lname, 1, LOCALE_SYSTEM_DEFAULT, @did);
  hres:= dispXl.Invoke(did, GUID_NULL, LOCALE_SYSTEM_DEFAULT, DISPATCH_PROPERTYPUT,
    dps, nil, nil, nil);

  // release unused objects
  dispXl._Release;
  unkXl._Release;
  // deinitialize COM
  CoUninitialize;
end;
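One possible explanation for the crash between steps 4 and 5, offered as a guess rather than a confirmed diagnosis: none of the HRESULT return values are checked, so if CLSIDFromProgID or CoCreateInstance fails, unkXl stays nil and the QueryInterface call dereferences a nil interface. The sketch below checks each step and reports the failing HRESULT instead. `TryCreateDispatch` is a hypothetical helper, not part of FPC. It assumes COM has already been initialized with CoInitializeEx and uses the ActiveX unit signatures from the desktop FPC RTL, which may differ on WinCE.

```pascal
uses
  Windows, ActiveX, SysUtils;

// Hypothetical helper: creates an automation object by ProgID and returns
// False with an error text, instead of leaving a nil interface behind.
function TryCreateDispatch(const ProgID: WideString; out Disp: IDispatch;
  out ErrMsg: string): Boolean;
var
  clsid: TCLSID;
  hr: HRESULT;
begin
  Result := False;
  Disp := nil;
  ErrMsg := '';

  // Look up the class ID; fails if the ProgID is not registered
  hr := CLSIDFromProgID(PWideChar(ProgID), clsid);
  if hr < 0 then  // negative HRESULT values signal failure
  begin
    ErrMsg := 'CLSIDFromProgID failed: $' + IntToHex(LongWord(hr), 8);
    Exit;
  end;

  // Same class context and IID that CreateOleObject passes
  hr := CoCreateInstance(clsid, nil,
    CLSCTX_INPROC_SERVER or CLSCTX_LOCAL_SERVER, IDispatch, Disp);
  if hr < 0 then
  begin
    ErrMsg := 'CoCreateInstance failed: $' + IntToHex(LongWord(hr), 8);
    Exit;
  end;

  Result := True;
end;
```

It could be called as `if not TryCreateDispatch('SAPI.SpVoice', disp, msg) then Memo1.Text := msg;`. Note also that interface variables such as IUnknown and IDispatch are reference counted by the compiler, so explicit _Release calls on them are normally unnecessary and can lead to a double release when the variables go out of scope.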

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3541
« Reply #4 on: August 09, 2010, 05:15:26 pm »
Also, how do you know which strings to use? I searched on Google, but I couldn't find any list of the OLE Automation servers that are available on a standard Win CE phone, for example which string to use to call Internet Explorer on WinCE.

Lexxy

  • New member
  • *
  • Posts: 9
« Reply #5 on: August 10, 2010, 12:17:31 pm »
Hi felipemdc,

In another forum post, I read about someone who got SAPI working on a CE device using the 'SAPI.SpVoice' OLE object. The other ones I used in my tests are guesses. I think these OLE objects should be available, but I am not 100% sure.

I might focus my attention on server-side SAPI processing first, so I can get at least something working.

Do you (or someone else) know how to capture audio, from a microphone/headset, on WinCE (arm) ? I found documentation about the PlaySound() function in the MMSystem unit, but I can't find anything about capturing audio.

I hope someone can point me in the right direction for this...

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3541
« Reply #6 on: August 10, 2010, 01:20:47 pm »
Sorry, no idea. But it would be good if you could document whatever tips you find here: http://wiki.freepascal.org/WinCE_Programming_Tips

Then the next people searching for the same thing will have it easier.