

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence


in reply to ☆ Yσɠƚԋσʂ ☆

865GB? I can’t run that locally. I want something like 30 specialized 100GB models that I can run locally and load/unload as needed. Inference would take longer, but things have gotten good enough to set it and forget it.
in reply to monkeyslikebananas2

It looks like you can run a low-quant version on a 125GB machine, and apparently performance is still really good. github.com/makepad/llama_antir…
in reply to ☆ Yσɠƚԋσʂ ☆

On OpenCode Go, DeepSeek V4 Flash is crazy cheap, and a lot of people are saying they're getting good results from it. V4 Pro is said to be competitive with Kimi K2.6 and GLM 5.1, and it's also a lot cheaper, at least for now.
in reply to ☆ Yσɠƚԋσʂ ☆

Holy shit, I barely learned what the quadratic cost of attention was like 2 weeks ago. Can we hit the brakes a bit, before we start optimizing the shit out of everything? I am going to get lost in the layers of abstraction.
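For anyone in the same spot, here is a minimal sketch of where the "quadratic cost" comes from. It assumes plain dense self-attention, where every query token is scored against every key token, so one head in one layer builds an n × n score matrix. It only counts entries and says nothing about how DeepSeek-V4 itself reduces this cost. The function name is made up for illustration.

```python
def attention_score_entries(n_tokens: int) -> int:
    # Dense self-attention scores every query against every key,
    # so a single head in a single layer produces n_tokens * n_tokens scores.
    return n_tokens * n_tokens


for n in (1_000, 10_000, 1_000_000):
    # 10x more tokens means 100x more scores: that is the quadratic part.
    print(f"{n:>9,} tokens -> {attention_score_entries(n):,} scores per head per layer")
```

At a million tokens, that is 10^12 scores per head per layer, which is why million-token context work focuses so heavily on making attention cheaper.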