Do you know which model is "the smartest"? And do they take as much RAM as they take disk space?
Among the supported models, the Qwen3 family is considered the smartest and most advanced line. Regarding memory usage, PasLLM uses memory mapping to load models, which means the model weights primarily occupy disk space rather than RAM. This allows it to handle larger models without requiring a significant amount of RAM. However, the actual RAM usage varies with the context size in use: the bigger the context, the more RAM is needed for processing.
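To illustrate the idea behind memory mapping, here is a generic Python sketch (not PasLLM's actual Pascal implementation): the OS pages mapped file contents into RAM on demand, so untouched weights stay on disk.

```python
import mmap
import os
import tempfile

# Create a dummy "weights" file standing in for a model checkpoint.
path = os.path.join(tempfile.gettempdir(), "dummy_weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (16 * 1024 * 1024))  # 16 MiB of zeroed weights

# Memory-map the file: the OS lends pages on demand instead of
# copying the whole file into the process's RAM up front.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reading a slice faults in only the pages backing that slice;
    # the rest of the file stays on disk until actually accessed.
    first_bytes = mm[:8]
    mm.close()

os.remove(path)
```

This is why a mapped model "costs" disk space rather than RAM: resident memory grows only with the pages actually touched, plus whatever the inference run needs for the context.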
The Qwen3 4B 2507 and Qwen3 30B A3B 2507 models are currently the best. Qwen3 4B 2507 is newer than the Qwen3 models without 2507 in their names, so it is generally considered better; for example, it can outperform the several months older Qwen3 8B, though not necessarily in all cases, since it depends on the task. And Qwen3 30B A3B 2507 is better than all Qwen3 models with fewer parameters: because it is a mixture-of-experts (MoE) model, it activates only 3B of its 30B total parameters per token, which makes it more efficient. A fun fact: at https://chat.qwen.ai/ you can use exactly the same models, such as Qwen3 30B A3B. This means the locally runnable Qwen models are just as smart as the online Qwen models, because they are identical. And if you had enough RAM, you could also convert and run the Qwen3 235B A22B Instruct model locally, the biggest available Qwen model, which uses 22B active parameters out of 235B total. But for that you would need a lot of RAM and disk space, well over 128GB of RAM and over 1TB of disk, so roughly a high-end server setup. In return it would be more or less on the same level as ChatGPT, Gemini, and other top models, depending on the task.
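A quick back-of-envelope calculation makes the size claims concrete. These are generic estimates (bits-per-weight and overhead depend on the quantization format PasLLM actually uses, which is an assumption here), but they show the scale:

```python
def model_gigabytes(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size in GB for a quantized model.

    Ignores embedding/metadata overhead; purely weights * bit width.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Qwen3 30B A3B: 30B total weights, but only ~3B active per token (MoE),
# so per-token compute touches roughly 10% of the weights.
size_30b_q4 = model_gigabytes(30, 4)      # ~15 GB at 4-bit quantization
active_fraction = 3 / 30                  # 0.1

# Qwen3 235B A22B Instruct: the biggest Qwen model.
size_235b_q4 = model_gigabytes(235, 4)    # ~117.5 GB at 4-bit
size_235b_f16 = model_gigabytes(235, 16)  # ~470 GB at 16-bit

print(f"30B @ 4-bit:  {size_30b_q4:.1f} GB, active per token: {active_fraction:.0%}")
print(f"235B @ 4-bit: {size_235b_q4:.1f} GB, @ 16-bit: {size_235b_f16:.1f} GB")
```

Even heavily quantized, the 235B model's weights alone exceed 100GB, and conversion typically requires holding a higher-precision copy as well, which is where the "over 1TB of disk" figure comes from.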
So in summary, the Qwen3 models are currently the smartest among the PasLLM-supported models, with Qwen3 4B 2507 and Qwen3 30B A3B 2507 being the best options. Thanks to memory mapping, the model weights live in disk space rather than RAM, while RAM usage depends on the context size used during inference.