Has anyone explored integrating this multi-platform work into a Lazarus GUI project, under Windows for starters?
Why pollute it with Lazarus? There is not really anything to explore there.
Windows? Nah, I'd rather not.
Any experience or how-tos to share?
First of all, the original C++ project contains all sorts of examples. Those can be used to your advantage.
Other than that:
mkdir work
cd work
# setup whisper.cpp
wget https://github.com/ggerganov/whisper.cpp/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
pushd whisper.cpp-master
make
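# build the shared library that the Pascal binding loads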
make libwhisper.so
popd
# setup whisper.pas
wget https://github.com/Kagamma/whisper-pas/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
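# patch the binding so it loads libwhisper.so from the working directory instead of the system library path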
sed -i 's/libwhisper.so/.\/libwhisper.so/g' whisper-pas-master/src/whisper.pas
pushd whisper-pas-master/whisper-cli
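# -B rebuild all, -Fu add unit search path, -Mobjfpc select the ObjFPC dialect, -Sh use AnsiStrings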
fpc -B -Fu../src -Mobjfpc -Sh whisper_cli.lpr
popd
# setup whispering Pascal project
mkdir whispering
pushd whispering
cp ../whisper-pas-master/whisper-cli/whisper_cli .
cp ../whisper.cpp-master/libwhisper.so .
cp ../whisper.cpp-master/samples/jfk.wav .
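# fetch the base English model (~148 MB) from Hugging Face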
wget --no-config --quiet --show-progress -O ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin?download=true
./whisper_cli -m ggml-base.en.bin -i jfk.wav
popd
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 111.28 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 35.86 ms
whisper_print_timings: sample time = 26.79 ms / 1 runs ( 26.79 ms per run)
whisper_print_timings: encode time = 44312.43 ms / 1 runs (44312.43 ms per run)
whisper_print_timings: decode time = 1756.38 ms / 27 runs ( 65.05 ms per run)
whisper_print_timings: batchd time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 46248.65 ms
It is just like a normal library: you call its functions in the order you want (as per the documentation/examples) and get back one or more results. The only difference, I guess, is that some function calls take a bit longer to return.
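For illustration, here is a minimal sketch of that calling sequence in Pascal. It assumes the whisper.pas unit mirrors the C API names (whisper_init_from_file_with_params, whisper_full, whisper_full_get_segment_text, ...); the type names and the audio-loading step are placeholders, so check the binding's source for the exact declarations.

program WhisperSketch;

{$mode objfpc}{$H+}

uses
  whisper; // the whisper.pas binding unit

var
  Ctx: PWhisperContext;        // assumed pointer type; see whisper.pas
  Params: TWhisperFullParams;  // assumed record type; see whisper.pas
  Samples: array of Single;    // 16 kHz mono PCM in the range -1.0 .. 1.0
  I: Integer;
begin
  // Load the model once; this is one of the slower calls.
  Ctx := whisper_init_from_file_with_params('ggml-base.en.bin',
    whisper_context_default_params());
  if Ctx = nil then
    Halt(1);

  // Stand-in: one second of silence; replace with real WAV decoding.
  SetLength(Samples, 16000);

  // Run the full transcription pipeline; this blocks until it is done.
  Params := whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
  if whisper_full(Ctx, Params, @Samples[0], Length(Samples)) = 0 then
    for I := 0 to whisper_full_n_segments(Ctx) - 1 do
      WriteLn(whisper_full_get_segment_text(Ctx, I));

  whisper_free(Ctx);
end.

In a Lazarus GUI you would simply run the whisper_full call in a TThread so the long encode step does not freeze the main form; other than that there is nothing special to integrate.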