Author Topic: [AI][Speech Recognition]Run speech recognition locally on CPU with object pascal  (Read 5230 times)

csukuangfj

  • New Member
  • *
  • Posts: 16
By the way, are you able to run the examples? Have you encountered any usage issues?
It took me some time to get familiar with your repository.

Resulting build & run log-file attached.

The log shows that it failed on the examples making use of portaudio. That is because I forgot to install the libportaudio dev-package before running the tests  :-X

BTW, regarding the MLS model(s): I experimented a bit more with them, and when you provide longer sequences of text the model seems to pick up on pronunciation after several words/sentences.

@VisualLab:
I do not know for sure what the culprit might be there (I am (also) new when it comes to sherpa-onnx), but better voice training usually yields better results. How you can do that can, for example, be seen in this video (that channel is interesting anyway if you are interested in this kind of software).

That speed and volume differ might be caused by training as well. In case you did not already realize it: some languages are spoken much faster/louder than you might be used to (or slower/softer in case you are used to a fast-paced language). Some engines handle that better (e.g. automatically) than others (where you have to change things manually).

The API allows for setting details about the voice model/output that can't be set by the command-line programs.

Great to know you are able to run our object pascal API examples.

Hope that you can get the portaudio-related tests to run once portaudio is installed.

TRon

  • Hero Member
  • *****
  • Posts: 3302
@csukuangfj
I am currently toying with another audio output library instead, but I am sure the portaudio examples work, as the pre-compiled C examples from the static repository archive that play audio worked for me as well :-)

Do you happen to know if it is possible to retrieve (additional) information about individual speakers of a voice model ?

It would, for example, be very helpful if it were possible to know "the gender" of a speaker in order to be able to make a selection (e.g. libritts-r alone has 904 speakers).

It would also be nice to be able to use an actual name instead of "speaker with ID <insert ID number here> says:" in order to be able to distinguish individual speakers or be able to select a name from a list instead of a number.

Is there something in particular I (c/sh)ould look for in the original source-tree of sherpa-onnx to figure out if it is even possible to obtain information like that ?

TIA

Code: Pascal
  procedure RotateSpeakerIDs;
  var
    AudioDriver   : TAudioDriver;
    VoiceModel    : TSSTVoiceModel;
    VoiceID       : integer = 0;
    VoiceSPD      : single  = 1.0;
    Txt           : AnsiString = 'The right word of the day like determined yesterday by the radiation-frequency of the transmitter: When it rained coliflower.';
    SpeakersCount : integer;
    VoiceModelDir : string;
  begin
    VoiceModelDir := FetchVoiceModel(GB_URLS[3]);
    if VoiceModelDir = '' then
    begin
      writeln('PANIC: invalid voice model name. emergency exit of program.');
      exit;
    end;

    AudioDriver := TAudioDriver.Create;
    VoiceModel  := TSSTVoiceModel.Create(VoiceModelDir);
    try
      writeln('Voice samplerate = ', VoiceModel.TTS.GetSampleRate);
      SpeakersCount := VoiceModel.TTS.GetNumSpeakers;
      writeLn('Voice has ', SpeakersCount, ' speakers');

      for VoiceID := 0 to SpeakersCount-1
        do VoiceModel.Speak(VoiceID, VoiceSPD, Txt, AudioDriver);

    finally
      VoiceModel.Free;
      AudioDriver.Free;
    end;
  end;

csukuangfj

  • New Member
  • *
  • Posts: 16
> Do you happen to know if it is possible to retrieve (additional) information about individual speakers of a voice model ?

Please have a look at
https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/libritts_r/medium

Each model has a corresponding `json` file containing the meta info.
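
For what it is worth, here is a minimal sketch of how such a `json` file could be read from Free Pascal using the `fpjson` and `jsonparser` units that ship with FPC. It assumes the file contains a `speaker_id_map` object that maps a speaker name to the numeric speaker ID; that field name is my understanding of the Piper voice files and should be checked against the actual file. The embedded sample and its speaker names are made up for illustration, and a real file has many more fields.

```pascal
program ListPiperSpeakers;

{$mode objfpc}{$H+}

uses
  fpjson, jsonparser; // jsonparser registers the parser used by GetJSON

const
  // Hypothetical, trimmed-down sample; not copied from a real model file.
  SampleConfig =
    '{ "num_speakers": 2,' +
    '  "speaker_id_map": { "3922": 0, "8699": 1 } }';

// Prints every entry of the (assumed) speaker_id_map object.
procedure ListSpeakers(const JSONText: string);
var
  Root    : TJSONData;
  MapData : TJSONData;
  Map     : TJSONObject;
  i       : integer;
begin
  Root := GetJSON(JSONText);
  if Root = nil then
  begin
    writeln('Empty JSON document.');
    exit;
  end;
  try
    MapData := Root.FindPath('speaker_id_map');
    if not (MapData is TJSONObject) then
    begin
      writeln('No speaker_id_map found in this file.');
      exit;
    end;
    Map := TJSONObject(MapData);
    for i := 0 to Map.Count - 1 do
      writeln('speaker name "', Map.Names[i], '" -> ID ', Map.Items[i].AsInteger);
  finally
    Root.Free;
  end;
end;

begin
  ListSpeakers(SampleConfig);
end.
```

To use it on a downloaded model, load the contents of the model's `json` file into a string (or pass a `TFileStream` to `GetJSON`) instead of using the embedded sample. The numeric IDs found this way would be the values to pass as the speaker ID, assuming the model was exported with the same mapping.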

 
