Whisper (video to) audio to text with FreePascal

PaulANormanNZ:
Hi,

While looking at a speech recognition question over at: https://forum.lazarus.freepascal.org/index.php/topic,40886.msg492181.html
I thought this warranted a new thread, as people may not yet know of the following ...

I have been exploring simply running and monitoring an external process, using this very good CLI:

GitHub - ‘Purfview/whisper-standalone-win: Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python’

https://github.com/Purfview/whisper-standalone-win

It is genuinely multi-platform; I have tested it under the Windows command shell and it works very well.

Then I found this in a search:
‘Whisper for FreePascal’
which looks like it could give tighter/faster(?) integration with a Lazarus project.

Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?

Any experience or how-tos to share?

‘1. Whisper for FreePascal with 2. ggerganov whisper.cpp’

1. whisper-pas/ at master · Kagamma/whisper-pas · GitHub

https://github.com/Kagamma/whisper-pas/tree/master

2. ‘whisper.cpp’

https://github.com/ggerganov/whisper.cpp

TIA

Paul

TRon:

--- Quote from: PaulANormanNZ on April 11, 2024, 08:52:56 am ---Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?

--- End quote ---
Why pollute it with Lazarus? There isn't really anything to explore there.

Windows? Nah, I'd rather not  :P


--- Quote ---Any experience or how-tos to share?

--- End quote ---
First of all, the original whisper.cpp project contains all sorts of examples, which you can use to your advantage.

Other than that:

--- Code: Bash ---mkdir work
cd work

# setup whisper.cpp
wget https://github.com/ggerganov/whisper.cpp/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
pushd whisper.cpp-master
make
make libwhisper.so
popd

# setup whisper.pas
wget https://github.com/Kagamma/whisper-pas/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
sed -i 's/libwhisper.so/.\/libwhisper.so/g' whisper-pas-master/src/whisper.pas
pushd whisper-pas-master/whisper-cli
fpc -B -Fu../src -Mobjfpc -Sh whisper_cli.lpr
popd

# setup whispering Pascal project
mkdir whispering
pushd whispering
cp ../whisper-pas-master/whisper-cli/whisper_cli .
cp ../whisper.cpp-master/libwhisper.so .
cp ../whisper.cpp-master/samples/jfk.wav .
wget --no-config --quiet --show-progress -O ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin?download=true
./whisper_cli -m ggml-base.en.bin -i jfk.wav
popd

--- End code ---

--- Code: ---whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   16.39 MB
whisper_init_state: compute buffer (encode) =  132.07 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB
 And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =   111.28 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    35.86 ms
whisper_print_timings:   sample time =    26.79 ms /     1 runs (   26.79 ms per run)
whisper_print_timings:   encode time = 44312.43 ms /     1 runs (44312.43 ms per run)
whisper_print_timings:   decode time =  1756.38 ms /    27 runs (   65.05 ms per run)
whisper_print_timings:   batchd time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 46248.65 ms

--- End code ---

It is just like any normal library. You call its functions in the order you want (as per the documentation/examples) and get back one or more results. The only difference, I guess, is that some function calls take a bit longer to return  :D
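
For the "(video to) audio" part of the thread title, which the setup script in this post does not cover, here is a rough sketch rather than a tested recipe. It assumes ffmpeg is installed (not mentioned anywhere in this thread) and that whisper_cli wants the same kind of input as the whisper.cpp examples, i.e. 16 kHz mono 16-bit PCM WAV. The run and transcribe helpers and the DRY_RUN switch are made-up names for illustration only; the whisper_cli invocation itself is the one from the setup script.

```shell
#!/usr/bin/env bash
# Sketch: extract audio from a media file with ffmpeg, then transcribe it.
# Assumptions (not from the thread): ffmpeg is on the PATH, and this is run
# from the "whispering" directory created by the setup script.

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: transcribe <media-file> [model-file]
transcribe() {
  local input="$1"
  local model="${2:-ggml-base.en.bin}"
  # Separate output name, so a .wav input is never overwritten in place.
  local wav="${input%.*}.16k.wav"

  # Drop the video stream (-vn) and resample to 16 kHz mono 16-bit PCM.
  run ffmpeg -y -i "$input" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || return 1

  # Same invocation as in the setup script, pointed at the converted file.
  run ./whisper_cli -m "$model" -i "$wav"
}
```

Something like DRY_RUN=1 transcribe clip.mp4 only prints the two commands, so you can check them before spending the encode time; without DRY_RUN it actually runs them.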
