Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090


in reply to ☆ Yσɠƚԋσʂ ☆

The irony. Before llama.cpp, the only way to run LLaMA was through Python, and only on Nvidia GPUs. Then llama.cpp expanded to other models, introduced GGUF, and added backends to run on GPUs — and now we're talking about running Qwen using just Python on a single Nvidia GPU. The ouroboros is complete.