
Author Topic: PasLLM - LLM Inference Engine in Pure Pascal  (Read 1793 times)

BeRo

  • Jr. Member
  • **
  • Posts: 50
    • My site
PasLLM - LLM Inference Engine in Pure Pascal
« on: November 20, 2025, 01:39:24 pm »
I've just released PasLLM, an LLM inference engine written completely in Object Pascal. It allows you to run models such as Llama 3.x, Qwen 2.5, Qwen 3, Phi-3, Mixtral, Gemma 1, DeepSeek R1 and others locally, without Python or external dependencies at inference time.

It works with Delphi 11.2+ and FreePascal 3.3.1+ on all major modern operating system targets. I've implemented custom 4-bit quantization formats that get very close to full precision quality while keeping model sizes manageable. CLI and GUI versions are included (FMX, VCL, LCL). Pre-quantized models are available for download. PasLLM can also be integrated as a unit directly into your own Object Pascal projects.
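For those curious how 4-bit quantization works in general, here is a simplified blockwise sketch in Python. This is illustrative only: it is not the actual q40nl format (PasLLM itself is pure Object Pascal), just the common scale-per-block idea.

```python
import numpy as np

def quantize_block(block: np.ndarray):
    """Quantize a block of float weights to signed 4-bit integers plus one scale."""
    scale = float(np.max(np.abs(block))) / 7.0  # signed 4-bit range is [-8, 7]
    if scale == 0.0:
        return np.zeros(block.shape, dtype=np.int8), 0.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

# Round-trip a small random weight block and measure the worst-case error.
rng = np.random.default_rng(0)
weights = rng.standard_normal(32).astype(np.float32)
q, s = quantize_block(weights)
restored = dequantize_block(q, s)
max_err = float(np.max(np.abs(weights - restored)))
print(f"max abs error: {max_err:.4f} (at most half a quantization step: {s / 2:.4f})")
```

Real formats pack two 4-bit codes per byte and tune the block size and scale encoding, which is where the quality differences between schemes come from.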

Right now it's CPU-only. GPU acceleration via PasVulkan is planned but will take significant time. I mainly test 64-bit builds; compiling for 32-bit might work, but it isn't officially supported and may run into memory limitations with larger models.

The repository is at https://github.com/BeRo1985/pasllm (synced from my private server where development takes place). It's AGPL 3.0 licensed for open-source usage, with commercial licenses available if needed.

« Last Edit: November 20, 2025, 01:51:46 pm by BeRo »

LeP

  • Full Member
  • ***
  • Posts: 124
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #1 on: November 20, 2025, 02:44:44 pm »
First of all, congratulations on your work.
I have a few questions for you:
1) Do you have any more information about the commercial license (terms, conditions, prices)?
2) The LLM models you offer don't have their own license, with no indication of their origin. Is there a license for this? How can I update the models?

BeRo

  • Jr. Member
  • **
  • Posts: 50
    • My site
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #2 on: November 20, 2025, 03:47:59 pm »
First of all, congratulations on your work.

Thank you :-)

I have a few questions for you:
1) Do you have any more information about the commercial license (terms, conditions, prices)?

Regarding the commercial license: it's not something I have publicly detailed yet, since until at least Q2 2026 I'll be focused on another, bigger project (which also uses PasLLM). However, I am open to discussing terms, conditions, and pricing on a case-by-case basis.

2) The LLM models you offer don't have their own license, with no indication of their origin. Is there a license for this? How can I update the models?

The LLM models I offer are sourced from various Hugging Face repositories, and each model comes with its own license. For updating the models, I recommend checking the original repositories for any updates or newer versions. Here are some Hugging Face links for reference:

- Qwen 2.5 0.5B Instruct
- Qwen 2.5 1.5B Instruct
- Qwen 2.5 3B Instruct
- Qwen 2.5 7B Instruct
- Qwen 3 0.6B
- Qwen 3 1.7B
- Qwen 3 4B
- Qwen 3 14B
- Qwen 3 4B Instruct 2507
- Qwen 3 4B Thinking 2507
- Qwen 3 30B A3B Instruct 2507
- Qwen 3 30B A3B Thinking 2507
- Qwen 3 Coder 30B A3B Instruct
- SmolLM2 135M Instruct
- SmolLM2 360M Instruct
- SmolLM2 1.7B Instruct
- SmolLM3 3B
- Llama 3 series
- Llama 3.1 series
- Llama 3.2 series
- Mistral Mixtral 8x7B Instruct v0.1
- and many others, just search on Hugging Face.

You can convert them to PasLLM format using the provided conversion script convert.py in the PasLLM repository.

Example:

Code: Bash  [Select]
cd ${modelpath}
python ${pasllmbasepath}/tools/convert.py --config config.json --tokenizer tokenizer.json --models model*.safetensors --dtype q40nl --cpu ${pasllmbasepath}/bin/models/${modelname}_q40nl.safetensors

But don't worry: Python is not required to run PasLLM; it's only needed for model conversion.

If you have any more questions or need further assistance, please don't hesitate to ask.

« Last Edit: November 20, 2025, 03:51:05 pm by BeRo »

avra

  • Hero Member
  • *****
  • Posts: 2582
    • Additional info
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #3 on: November 20, 2025, 05:35:49 pm »
Nice!    :D 8-)  :D
ct2laz - Conversion between Lazarus and CodeTyphon
bithelpers - Bit manipulation for standard types
pasettimino - Siemens S7 PLC lib

domasz

  • Hero Member
  • *****
  • Posts: 616
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #4 on: November 20, 2025, 05:39:12 pm »
Do you know which model is "the smartest"? And do they take as much RAM as they take disk space?
« Last Edit: November 20, 2025, 07:03:32 pm by domasz »

domasz

  • Hero Member
  • *****
  • Posts: 616
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #5 on: November 20, 2025, 08:21:53 pm »
This is AMAZING work! qwen3_8b_q40nl works pretty nice.

BeRo

  • Jr. Member
  • **
  • Posts: 50
    • My site
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #6 on: November 20, 2025, 09:17:20 pm »
Do you know which model is "the smartest"? And do they take as much RAM as they take disk space?

Among these models, the Qwen3 line is the most advanced and generally considered the smartest. Regarding memory usage, PasLLM uses memory mapping to load models, which means it primarily uses disk space rather than RAM for the model weights themselves. This allows it to handle larger models without requiring a significant amount of RAM. However, the actual RAM usage varies with the context size currently in use: the bigger the context, the more RAM is needed for processing.
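To illustrate the memory-mapping idea in general terms, here is a minimal Python sketch of the OS-level technique (PasLLM itself does this in Object Pascal through the platform APIs; the dummy file below is just a stand-in for a model file):

```python
import mmap
import os
import tempfile

# Create a dummy "model file" on disk (stand-in for a .safetensors file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"WEIGHTS" + b"\x00" * 4096)

# Memory-map the file: the OS pages data in from disk on demand instead of
# reading the whole file into RAM up front, so resident memory stays small
# until a region is actually touched.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        chunk = mm[4096:4103]  # touching this range faults in only those pages
        print(chunk)           # prints b'WEIGHTS'

os.unlink(path)
```

The same mechanism is why a multi-gigabyte model can be opened instantly: only the weight pages that inference actually reads get pulled into physical memory, and the OS can evict them again under pressure.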

The Qwen3 4B 2507 models and the Qwen3 30B A3B 2507 models are currently the best. Qwen3 4B 2507 is newer than the Qwen3 models without 2507 in their names, so it is generally considered better; for example, it can outperform the several-months-older Qwen3 8B, though not necessarily in all cases, depending on the task. Qwen3 30B A3B 2507 is better than all Qwen3 models with fewer parameters, even though it activates only 3B of its 30B total parameters, which makes it more efficient, since it is a mixture-of-experts (MoE) model. Fun fact: at https://chat.qwen.ai/ you can use exactly the same models, such as Qwen3 30B A3B, so the locally runnable Qwen models are just as smart as the online ones, because they are identical. And if you had enough RAM, you could also convert and run the Qwen3 235B A22B Instruct model locally, the biggest Qwen model available, which uses 22B active parameters out of 235B total. For that you would need far over 128GB of RAM and over 1TB of disk space, so roughly speaking a high-end server setup. But it would be more or less on the same level as ChatGPT, Gemini, and other top models, depending on the task.

So in summary: Qwen3 models are currently the smartest among the PasLLM-supported models, with Qwen3 4B 2507 and Qwen3 30B A3B 2507 being the best options. They use disk space for the model weights thanks to memory mapping, while RAM usage depends on the context size used during inference.
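To put rough numbers on the sizes mentioned above (assuming about 4.5 bits per weight for a 4-bit format once block scales are included; the exact q40nl overhead may differ):

```python
def approx_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough on-disk size of a quantized model: parameters * bits / 8 bytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Total vs. active parameters for the Qwen3 models discussed above.
models = [
    ("Qwen3 4B 2507", 4, 4),
    ("Qwen3 30B A3B 2507", 30, 3),
    ("Qwen3 235B A22B Instruct", 235, 22),
]
for name, total, active in models:
    print(f"{name}: ~{approx_size_gb(total):.1f} GB at 4-bit, "
          f"{active}B of {total}B params active per token")
```

Note that for an MoE model the disk/mapping footprint is driven by the total parameter count, while the per-token compute is driven by the active parameter count, which is why 30B A3B runs much faster than a dense 30B model of the same size.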

PascalDragon

  • Hero Member
  • *****
  • Posts: 6311
  • Compiler Developer
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #7 on: November 20, 2025, 09:40:39 pm »
The repository is at https://github.com/BeRo1985/pasllm (synced from my private server where development takes place). It's AGPL 3.0 licensed for open-source usage, with commercial licenses available if needed.

Nice to see that you decided to open source it 😍 I already loved your demonstration at the Pascal Conference and I'm looking forward to what is coming next (like the PasVulkan GPU acceleration). 😁

Thausand

  • Sr. Member
  • ****
  • Posts: 458
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #8 on: November 21, 2025, 02:16:43 am »
I've just released PasLLM, an LLM inference engine written completely in Object Pascal. It allows you to run models such as Llama 3.x, Qwen 2.5, Qwen 3, Phi-3, Mixtral, Gemma 1, DeepSeek R1 and others locally, without Python or external dependencies at inference time.

It works with Delphi 11.2+ and FreePascal 3.3.1+ on all major modern operating system targets. I've implemented custom 4-bit quantization formats that get very close to full precision quality while keeping model sizes manageable. CLI and GUI versions are included (FMX, VCL, LCL). Pre-quantized models are available for download. PasLLM can also be integrated as a unit directly into your own Object Pascal projects.
Congratulations, BeRo. Nice work!

One observation about the README:
Code: [Select]
$ ./bin/pasllmcli --model bin/models/qwen3_0.6b_gabliterated_q40nl.safetensors
An unhandled exception occurred at $00000000004028C4:
Exception: Model file not found: llama32_1b_instruct_abliterated_q40nl.safetensors
  $00000000004028C4
Verifying that the file exists:
Code: [Select]
$ test -f bin/models/llama32_1b_instruct_abliterated_q40nl.safetensors && echo "$FILE exists."
exists.
Note that I tested qwen, but the error mentions llama. Both the qwen and llama models were downloaded from Mega.

BeRo

  • Jr. Member
  • **
  • Posts: 50
    • My site
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #9 on: November 21, 2025, 12:07:37 pm »
One observation about the README:
Code: [Select]
$ ./bin/pasllmcli --model bin/models/qwen3_0.6b_gabliterated_q40nl.safetensors
An unhandled exception occurred at $00000000004028C4:
Exception: Model file not found: llama32_1b_instruct_abliterated_q40nl.safetensors
  $00000000004028C4
Verifying that the file exists:
Code: [Select]
$ test -f bin/models/llama32_1b_instruct_abliterated_q40nl.safetensors && echo "$FILE exists."
exists.
Note that I tested qwen, but the error mentions llama. Both the qwen and llama models were downloaded from Mega.

Oops, it should be "-model=" (single dash, with an equals sign and no space) instead of "--model" followed by a space. Fixed :D Thanks.

Thausand

  • Sr. Member
  • ****
  • Posts: 458
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #10 on: November 21, 2025, 10:20:24 pm »
Oops, it should be "-model=" (single dash, with an equals sign and no space) instead of "--model" followed by a space. Fixed :D Thanks.
I can confirm it now works as the README describes.

Many thanks for the fix/update, BeRo 👍

hshatti

  • New Member
  • *
  • Posts: 14
  • Be kind!
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #11 on: January 05, 2026, 03:05:51 pm »
Wow! What a great project!

I recently got an NVIDIA DGX Spark and tried to run your project on it (even though it runs on CPU only and doesn't leverage the full DGX power). After a little tinkering, it works like a charm!

I have been working on a tensor manipulation library in Pascal (and just Pascal) for computer vision, LLMs, and other AI work, for both inference and training. It optionally relies on NVIDIA CUDA and cuDNN, or Intel's MKL/oneDNN, when the platform is detected, and falls back to a native Pascal CPU implementation otherwise. Unfortunately, despite some success with object detection models, I had to abandon it for a while. This project shows how Pascal is still more than capable and powerful in modern times, amid the noise of newly emerged languages. Now I'm just jealous! If you are looking for a contribution on GPU integration (especially NVIDIA), please let me know.

Spectacular job! keep it up!
H
Comrades! Unpython the world


matthius

  • Full Member
  • ***
  • Posts: 186
  • Creating VRAD...
    • LIBERLOG - Développement rapide
Re: PasLLM - LLM Inference Engine in Pure Pascal
« Reply #13 on: January 15, 2026, 01:41:45 pm »
Here are the full PasLLM sources, uploaded:
https://archive.org/download/pasllm/pasllm_archive.torrent
M. GIROUX
13 rue Tanguy PRIGENT
35000 RENNES - France
http://liberlog.fr

 
