I have a graphics card with 12 GB of VRAM. Qwen3 8B VL "Instruct" works well quantized to Q4_K_M; I can reach a context length of around 65,000 tokens. It also has very powerful image understanding; I'm especially impressed by its OCR.
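For anyone who wants to try a similar setup, here is a minimal sketch using llama-cpp-python; the model path is just a placeholder for whatever Q4_K_M GGUF you downloaded, and image input would additionally need the separate mmproj file and a multimodal-capable runtime, which I'm leaving out here:

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The model file name is a placeholder; adjust n_ctx to trade context
# length against VRAM, which is exactly the 65k-vs-16k tradeoff above.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-VL-8B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=65536,       # context length; lower this if you run out of VRAM
    n_gpu_layers=-1,   # offload all layers to the 12 GB GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this text in one sentence: ..."}]
)
print(out["choices"][0]["message"]["content"])
```

The same script with a 14B GGUF is where I had to drop `n_ctx` to roughly 16384 to stay within 12 GB.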
The "thinking" models are terrible, they "eat" a big part of context with "thinking" details.
Now I trying Qwen3 14B Q4-K-M, it works, but context length is reduced around 16000, that is small.
But note that these models don't work with agents, because also 65000 context length is too small.