I've just released PasLLM, an LLM inference engine written entirely in Object Pascal. It lets you run models such as Llama 3.x, Qwen 2.5, Qwen 3, Phi-3, Mixtral, Gemma 1, DeepSeek R1 and others locally, without Python or external dependencies at inference time.

It works with Delphi 11.2+ and FreePascal 3.3.1+ on all major modern operating system targets. I've implemented custom 4-bit quantization formats that get very close to full-precision quality while keeping model sizes manageable. CLI and GUI versions are included (FMX, VCL, LCL). Pre-quantized models are available for download. PasLLM can also be integrated as a unit directly into your own Object Pascal projects.

Right now it's CPU-only. GPU acceleration via PasVulkan is planned but will take significant time. I mainly test 64-bit builds; compiling for 32-bit might work, but it isn't officially supported and may run into memory limitations with larger models.

The repository is at https://github.com/BeRo1985/pasllm (synced from my private server, where development takes place). It's licensed under AGPL 3.0 for open-source use, with commercial licenses available if needed.
It is better to directly call Llama.dll