Whisper (video to) audio to text with FreePascal
PaulANormanNZ:
Hi,
While looking at a speech recognition question over at: https://forum.lazarus.freepascal.org/index.php/topic,40886.msg492181.html
I thought this warranted a new thread, since people may not know of the following yet ...
I have been exploring simply running this very good CLI as an external process and monitoring it:
GitHub - ‘Purfview/whisper-standalone-win: Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python’
https://github.com/Purfview/whisper-standalone-win
It is genuinely multi-platform; I have tested it under the Windows command shell and it works very well.
Then I found this in a search:
‘Whisper for FreePascal’
Which looks like it could give tighter/faster(?) integration with a Lazarus project.
Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?
Any experience or how-tos to share?
‘1. Whisper for FreePascal with 2. ggerganov whisper.cpp’
1. whisper-pas/ at master · Kagamma/whisper-pas · GitHub
https://github.com/Kagamma/whisper-pas/tree/master
2. ‘whisper.cpp’
https://github.com/ggerganov/whisper.cpp
TIA
Paul
TRon:
--- Quote from: PaulANormanNZ on April 11, 2024, 08:52:56 am ---Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?
--- End quote ---
Why pollute it with Lazarus? There isn't really anything to explore there.
Windows? Nah, I'd rather not :P
--- Quote ---Any experience or how-tos to share?
--- End quote ---
First of all, the original whisper.cpp project contains all sorts of examples, which you can use to your advantage.
Other than that:
--- Code: Bash ---
mkdir work
cd work

# setup whisper.cpp
wget https://github.com/ggerganov/whisper.cpp/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
pushd whisper.cpp-master
make
make libwhisper.so
popd

# setup whisper.pas
wget https://github.com/Kagamma/whisper-pas/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
sed -i 's/libwhisper.so/.\/libwhisper.so/g' whisper-pas-master/src/whisper.pas
pushd whisper-pas-master/whisper-cli
fpc -B -Fu../src -Mobjfpc -Sh whisper_cli.lpr
popd

# setup whispering Pascal project
mkdir whispering
pushd whispering
cp ../whisper-pas-master/whisper-cli/whisper_cli .
cp ../whisper.cpp-master/libwhisper.so .
cp ../whisper.cpp-master/samples/jfk.wav .
wget --no-config --quiet --show-progress -O ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin?download=true
./whisper_cli -m ggml-base.en.bin -i jfk.wav
popd
--- End code ---
--- Code: ---
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 111.28 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 35.86 ms
whisper_print_timings: sample time = 26.79 ms / 1 runs ( 26.79 ms per run)
whisper_print_timings: encode time = 44312.43 ms / 1 runs (44312.43 ms per run)
whisper_print_timings: decode time = 1756.38 ms / 27 runs ( 65.05 ms per run)
whisper_print_timings: batchd time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 46248.65 ms
--- End code ---
It works just like a normal library. You call its functions in the order you want (as per the documentation/examples) and get back one or several results. The only difference, I guess, is that some function calls take a bit longer to return :D
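The setup script above calls wget, unzip, make, sed and fpc, so it may be worth confirming they are all on the PATH before starting. A minimal sketch of such a check follows. The `missing_tools` function is a hypothetical helper written for this illustration and is not part of whisper.cpp or whisper-pas.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of whisper.cpp or whisper-pas):
# prints, one per line, every command in the argument list
# that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools invoked by the setup script in this thread.
missing=$(missing_tools wget unzip make sed fpc)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing >&2
else
  echo "All required tools found."
fi
```

This only checks that the commands exist. Building whisper.cpp with make also needs a working C/C++ compiler, which the sketch does not test for.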