
Author Topic: Which quantized model works well for local AI ? (Pascal programming)  (Read 1165 times)

Jorg3000

  • Jr. Member
  • **
  • Posts: 83
Hello,
I’m using LM Studio and was wondering if anyone can recommend a distilled quantized model that performs well for generating Pascal code and works on a standard laptop (local inference).
I’d like to have AI assistance for routine tasks while on the go, without relying on online services.

Is there already AI-optimized documentation (for RAG) for FreePascal/Lazarus components, e.g. in Markdown format, that can be provided to a local model as a knowledge base (to reduce hallucinations)?
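
To make it concrete, here is a minimal sketch of what I have in mind: a FreePascal program that stuffs one Markdown file into the system prompt of LM Studio's OpenAI-compatible local server (default port 1234). The file name, question, and model id are placeholders; proper RAG would of course select only the relevant chunks instead of pasting the whole file.

Code: Pascal

program AskWithDocs;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, fphttpclient, fpjson;

// Loads one Markdown file to use as grounding context.
function LoadDoc(const FileName: string): string;
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile(FileName);
    Result := SL.Text;
  finally
    SL.Free;
  end;
end;

var
  Body, Msg: TJSONObject;
  Msgs: TJSONArray;
  Client: TFPHTTPClient;
begin
  Msgs := TJSONArray.Create;

  // System message: paste the documentation as grounding context.
  Msg := TJSONObject.Create;
  Msg.Add('role', 'system');
  Msg.Add('content', 'Answer using only this documentation:' +
    LineEnding + LoadDoc('lazarus-notes.md')); // placeholder file name
  Msgs.Add(Msg);

  // User message: the actual question.
  Msg := TJSONObject.Create;
  Msg.Add('role', 'user');
  Msg.Add('content', 'How do I do a GET request with TFPHTTPClient?');
  Msgs.Add(Msg);

  Body := TJSONObject.Create;
  Body.Add('model', 'qwen2.5-coder-14b-instruct'); // whatever id LM Studio shows
  Body.Add('messages', Msgs);

  Client := TFPHTTPClient.Create(nil);
  try
    Client.AddHeader('Content-Type', 'application/json');
    Client.RequestBody := TStringStream.Create(Body.AsJSON);
    // Prints the raw JSON response; the answer text is in
    // choices[0].message.content.
    WriteLn(Client.Post('http://localhost:1234/v1/chat/completions'));
  finally
    Client.RequestBody.Free;
    Client.Free;
    Body.Free; // fpjson frees the nested messages with the body
  end;
end.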
Jörg
« Last Edit: April 05, 2026, 12:21:51 am by Jorg3000 »

Wallaby

  • Guest
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #1 on: April 05, 2026, 01:14:53 am »
My experience with most local models was rather bad, even with a high-end 4090 GPU.

It'd be interesting to hear if it worked well for someone else.

Thaddy

  • Hero Member
  • *****
  • Posts: 18918
  • Glad to be alive.
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #2 on: April 05, 2026, 07:25:40 am »
I believe BeRo's PasLLM can run existing, downloadable LLM models locally, with configurable optimizations. It doesn't have GPU optimizations yet, but he is working on them.
It should run workably on 16 cores or fewer with 16 GB of RAM (quite light for A.I.).

https://forum.lazarus.freepascal.org/index.php/topic,72801.0.html

https://github.com/BeRo1985/pasllm

This one is on my to-do list; I haven't explored it myself yet, but I have read very good reviews.

For your purpose you could also use schuler's code, which creates its own LLM. It takes some considerable effort, since you need to train it yourself (well, that is more or less your plan anyway), but once running, it works great.
See: https://forum.lazarus.freepascal.org/index.php/topic,71128.0.html
I don't know if his latest codebase is available, though: I am running some of his older code that he developed during and shortly after he finished his Ph.D. in A.I., and it greatly differs from the above.

Both authors are forum members and both code bases are very well documented.

Both projects are of professional standard; they are not toys, and they are just as good as what is commercially available.
« Last Edit: April 05, 2026, 08:01:57 am by Thaddy »
Recovered from removal of tumor in tongue following tongue reconstruction with a part from my leg.

paweld

  • Hero Member
  • *****
  • Posts: 1593
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #3 on: April 05, 2026, 08:37:27 am »
A while back, I tested several models in LM Studio to see how they performed when coding with Lazarus. In my opinion, the “Qwen2.5 Coder 15B Instruct” model performed the best: it was the top choice in terms of both processing speed and accuracy of responses. Smaller models [Qwen (3b and 7b), Deepseek 7b, Mistral 7b] produce nonsense with Pascal code. I tried “Deepseek Coder v2 16b” and “Devstral small 2505 24b”, but my laptop is too weak for them: the response speed was terrible.
The issues I raised were related to database management and networking (HTTP/TCP/SMTP/IMAP).

I also ran a few tests for C# and JavaScript, and even the smaller models (3b) perform quite well in this case.

Hardware specs I used for testing: i7-11850 / 64GB RAM / NVIDIA RTX A2000 4GB VRAM
Best regards / Pozdrawiam
paweld

gidesa

  • Full Member
  • ***
  • Posts: 237
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #4 on: April 08, 2026, 11:55:59 am »
I have a graphics card with 12 GB of VRAM. Qwen3 8B VL "instruct" works well, quantized Q4_K_M. I can reach a context length of around 65000. It also has very powerful image understanding; I am especially impressed by its OCR.
The "thinking" models are terrible: they "eat" a big part of the context with "thinking" details.
Now I am trying Qwen3 14B Q4_K_M; it works, but the context length is reduced to around 16000, which is small.
But note that these models don't work with agents, because even a 65000 context length is too small.
« Last Edit: April 08, 2026, 12:00:42 pm by gidesa »

LeP

  • Full Member
  • ***
  • Posts: 224
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #5 on: April 08, 2026, 02:12:04 pm »
Quote from: gidesa on April 08, 2026, 11:55:59 am
I have a graphics card with 12 GB of VRAM. Qwen3 8B VL "instruct" works well, quantized Q4_K_M. I can reach a context length of around 65000. It also has very powerful image understanding; I am especially impressed by its OCR.
The "thinking" models are terrible: they "eat" a big part of the context with "thinking" details.
Now I am trying Qwen3 14B Q4_K_M; it works, but the context length is reduced to around 16000, which is small.
But note that these models don't work with agents, because even a 65000 context length is too small.
Try Qwen3.5 35b

EDIT:

Memory usage:
- 7 GB of VRAM
- 20 GB of RAM

Model data:
    parameters          36.0B
    context length      262144
    embedding length    2048
    quantization        Q4_K_M
    requires            0.17.1

  Capabilities
    completion
    vision
    tools
    thinking
« Last Edit: April 08, 2026, 02:17:46 pm by LeP »
Un Sistema per domarli, un IDE per trovarli, un codice per ghermirli e nel framework incatenarli.
One System to tame them, one IDE to find them, one code to catch them, and in the framework bind them.

gidesa

  • Full Member
  • ***
  • Posts: 237
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #6 on: April 08, 2026, 02:36:18 pm »
Quote from: LeP on April 08, 2026, 02:12:04 pm
Try Qwen3.5 35b

EDIT:

Memory usage:
- 7 GB of VRAM
- 20 GB of RAM

Model data:
    parameters          36.0B
    context length      262144
    embedding length    2048
    quantization        Q4_K_M
    requires            0.17.1

  Capabilities
    completion
    vision
    tools
    thinking

So, if I understand correctly, you have only 7 GB of graphics VRAM and still managed to load this 35B model, using the full 262000-token context length?

LeP

  • Full Member
  • ***
  • Posts: 224
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #7 on: April 08, 2026, 04:19:04 pm »
Quote from: gidesa on April 08, 2026, 02:36:18 pm
So, if I understand correctly, you have only 7 GB of graphics VRAM and still managed to load this 35B model, using the full 262000-token context length?

The model uses 7 GB out of 8 GB of VRAM on an Nvidia 4070 (plus 16 GB of shared memory optionally available) and 20 GB of CPU RAM (Intel i9 14th gen, 64 GB available).

I haven't tried the full context length, but I think it should work. Why not?
Un Sistema per domarli, un IDE per trovarli, un codice per ghermirli e nel framework incatenarli.
One System to tame them, one IDE to find them, one code to catch them, and in the framework bind them.

gidesa

  • Full Member
  • ***
  • Posts: 237
Re: Which quantized model works well for local AI ? (Pascal programming)
« Reply #8 on: April 08, 2026, 06:14:16 pm »
Quote from: LeP on April 08, 2026, 04:19:04 pm
The model uses 7 GB out of 8 GB of VRAM on an Nvidia 4070 (plus 16 GB of shared memory optionally available) and 20 GB of CPU RAM (Intel i9 14th gen, 64 GB available).

I haven't tried the full context length, but I think it should work. Why not?

Thank you, I am downloading the model now and will try it. But I am doubtful, because I have only 16 GB of RAM :-)
The context takes up a big chunk of VRAM, subtracting it from the network layers, which are then moved to system RAM. And the more layers in RAM, the lower the speed.
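
As a rough back-of-envelope for that trade-off: an FP16 KV cache needs about 2 (K and V) x layers x KV heads x head dim x 2 bytes per token. The shape below is a made-up GQA-style configuration, not the real Qwen3 one, and runtimes can also quantize the KV cache, but it shows why long contexts eat VRAM:

Code: Pascal

program KvCacheEstimate;
{$mode objfpc}{$H+}

const
  // Illustrative transformer shape, NOT the actual Qwen3 values.
  Layers        = 40;    // transformer blocks
  KvHeads       = 8;     // grouped-query attention KV heads
  HeadDim       = 128;   // dimension per head
  BytesPerValue = 2;     // FP16 cache; Q8/Q4 KV caches shrink this
  ContextLen    = 65536; // tokens kept in the KV cache

var
  Bytes: Int64;
begin
  // Two tensors (K and V) per layer, one vector per token per KV head.
  Bytes := Int64(2) * Layers * KvHeads * HeadDim * BytesPerValue * ContextLen;
  WriteLn('KV cache at ', ContextLen, ' tokens: ',
          Bytes div (1024 * 1024), ' MiB');
  // With these numbers: ~10 GiB, which is why long contexts push
  // network layers out of VRAM into (slower) system RAM.
end.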

 
