
Topic: Whisper (video to) audio to text with FreePascal

PaulANormanNZ

« on: April 11, 2024, 08:52:56 am »
Hi,

Looking at a speech recognition question over at https://forum.lazarus.freepascal.org/index.php/topic,40886.msg492181.html
I thought this warranted a new thread, since people may not know of the following yet.

I have been exploring simply running this very good CLI as an external process and monitoring it:

GitHub - ‘Purfview/whisper-standalone-win: Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python’

https://github.com/Purfview/whisper-standalone-win

It is genuinely multi-platform; I have tested it under the Windows command shell and it works very well.

Then this turned up in a search:
‘Whisper for FreePascal’
which looks like it could give tighter/faster(?) integration with a Lazarus project.

Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?

Any experience or how-tos to share?

‘1. Whisper for FreePascal with 2. ggerganov whisper.cpp’

1. whisper-pas/ at master · Kagamma/whisper-pas · GitHub

https://github.com/Kagamma/whisper-pas/tree/master

2. ‘whisper.cpp’

https://github.com/ggerganov/whisper.cpp

TIA

Paul

TRon

« Reply #1 on: April 11, 2024, 11:27:22 pm »
Quote
Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?
Why pollute it with Lazarus? There is not really anything to explore there.

Windows? Nah, I'd rather not :P

Quote
Any experience or how-tos to share?
First of all, the original whisper.cpp project contains all sorts of examples, which can be used to your advantage.

Other than that:
Code: Bash
mkdir work
cd work

# Set up whisper.cpp: fetch the sources, build the tools and the shared library
wget https://github.com/ggerganov/whisper.cpp/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
pushd whisper.cpp-master
make
make libwhisper.so
popd

# Set up whisper-pas: fetch the bindings and build the Pascal CLI example
wget https://github.com/Kagamma/whisper-pas/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
# Point the binding at ./libwhisper.so so the library is loaded from the current directory
sed -i 's/libwhisper.so/.\/libwhisper.so/g' whisper-pas-master/src/whisper.pas
pushd whisper-pas-master/whisper-cli
fpc -B -Fu../src -Mobjfpc -Sh whisper_cli.lpr
popd

# Set up the whispering Pascal project: executable, library, sample audio and model together
mkdir whispering
pushd whispering
cp ../whisper-pas-master/whisper-cli/whisper_cli .
cp ../whisper.cpp-master/libwhisper.so .
cp ../whisper.cpp-master/samples/jfk.wav .
# The URL is quoted so the shell does not treat '?' as a glob character
wget --no-config --quiet --show-progress -O ggml-base.en.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin?download=true"
./whisper_cli -m ggml-base.en.bin -i jfk.wav
popd

Code:
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   16.39 MB
whisper_init_state: compute buffer (encode) =  132.07 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB
 And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =   111.28 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    35.86 ms
whisper_print_timings:   sample time =    26.79 ms /     1 runs (   26.79 ms per run)
whisper_print_timings:   encode time = 44312.43 ms /     1 runs (44312.43 ms per run)
whisper_print_timings:   decode time =  1756.38 ms /    27 runs (   65.05 ms per run)
whisper_print_timings:   batchd time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 46248.65 ms
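
When whisper_cli is driven from a script or another program, the transcript has to be separated from the diagnostics in output like the above. A minimal sketch, assuming every diagnostic line starts with the `whisper_` prefix as it does in this run; the function name `extract_transcript` is made up for illustration and is not part of either project:

```shell
# Hypothetical helper: keep only the transcript lines from whisper_cli output.
# Assumption: diagnostic lines always begin with "whisper_", as in the log above.
extract_transcript() {
    awk '!/^whisper_/ {
        sub(/^[ \t]+/, "")          # trim leading whitespace
        if (length($0)) print       # skip lines that are now empty
    }'
}
```

It could be used as `./whisper_cli -m ggml-base.en.bin -i jfk.wav 2>&1 | extract_transcript`. Merging stderr into the pipe with `2>&1` means the filter works no matter which stream the diagnostics are written to.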

It works just like any normal library: you call its functions in the order you need (as per the documentation/examples) and get back one or more results. The only difference, I guess, is that some function calls take a bit longer to return :D
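
For the "video to audio" step in the thread title: the whisper.cpp README notes that its example program expects 16-bit WAV input and suggests converting with ffmpeg to 16 kHz mono. A minimal sketch, assuming ffmpeg is installed and that whisper_cli has the same input requirement; `to_wav`, `DRY_RUN` and the file names are made up for illustration:

```shell
# Hypothetical helper: extract a 16 kHz, mono, 16-bit PCM WAV track from a media file.
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
to_wav() {
    in=$1
    out=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ffmpeg -i $in -ar 16000 -ac 1 -c:a pcm_s16le $out"
    else
        ffmpeg -i "$in" -ar 16000 -ac 1 -c:a pcm_s16le "$out"
    fi
}
```

Something like `to_wav talk.mp4 talk.wav && ./whisper_cli -m ggml-base.en.bin -i talk.wav` would then chain the two steps.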
All software is open source (as long as you can read assembler)

 
