Got #qwen3coder 30B #LLM to work on #Framework Desktop. I tested it with prompts like "Write an Angular login page for an OAuth IdP server. Use best practices. Use HttpOnly cookies." and a few others. It thinks for a few seconds and writes the code in about 30 seconds. The code looked okay. I'm satisfied with what I got.
And it's quiet - even while thinking! The monster multi-GPU AI machines look outdated when this costs about as much as a single 5090.
I followed github.com/pablo-ross/strix-ha… to install #llamacpp. Changes I made along the way:
- Skipped the kernel update since I'm on the newer Ubuntu 25.10.
- Hit a sudo group error from distrobox; removed --group-add sudo from distrobox create.
- Tweaked the run parameters by trial and error. Looks like I can increase the context size a lot.
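For anyone hitting the same sudo group error: the create step without the offending flag looks roughly like this. The image name and device flags here are placeholders (use the ones from the guide); only the container name matches my setup.

```shell
# Sketch of the distrobox create step, minus --group-add sudo which errored for me.
# Image and --additional-flags are placeholders - take the real values from the guide.
distrobox create \
  --name llama-rocm-7rc-rocwmma \
  --image docker.io/rocm/dev-ubuntu-24.04:latest \
  --additional-flags "--device /dev/dri --device /dev/kfd"
```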
Current command to run Qwen is:
```
distrobox enter llama-rocm-7rc-rocwmma -- ~/llama.cpp/build/bin/llama-cli -m ~/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf --no-mmap -ngl 99 --ctx_size 16384 -n 20000
```
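For reference, the same command with the flags annotated as I understand them (check `llama-cli --help` to confirm):

```shell
distrobox enter llama-rocm-7rc-rocwmma -- \
  ~/llama.cpp/build/bin/llama-cli \
  -m ~/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf \
  --no-mmap \         # load the model fully into memory instead of memory-mapping it
  -ngl 99 \           # offload up to 99 layers to the GPU, i.e. all of them for this model
  --ctx_size 16384 \  # context window size in tokens - this is what I can apparently raise a lot
  -n 20000            # max tokens to generate per response
```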
#homelab #AI
