DeepSeek V4—almost on the frontier, a fraction of the price
Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two …
Simon Willison’s Weblog

fubarx
in reply to ☆ Yσɠƚԋσʂ ☆
Simon may want to randomize his Pelican/Bicycle test.
There is a long tradition in tech of firms tweaking their outputs to get higher scores on well-known tests. The ultimate example is VW Dieselgate.
But in AI, it's easy to game benchmarks by adding the best answers to the training set for the next version.
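
For what it's worth, randomizing could be as simple as the rough sketch below. It assumes the test is a single prompt along the lines of "Generate an SVG of a pelican riding a bicycle"; the word lists and the `random_prompt` helper are made up for illustration and are not taken from the actual test.

```python
import random

# Hypothetical word lists, invented for this sketch.
ANIMALS = ["pelican", "walrus", "heron", "badger", "capybara"]
VEHICLES = ["bicycle", "unicycle", "skateboard", "kayak", "tractor"]


def random_prompt(rng: random.Random) -> str:
    """Build one prompt from a randomly chosen animal and vehicle."""
    animal = rng.choice(ANIMALS)
    vehicle = rng.choice(VEHICLES)
    return f"Generate an SVG of a {animal} riding a {vehicle}"


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so the pairing differs between runs
    print(random_prompt(rng))
```

With 5 animals and 5 vehicles there are only 25 combinations, so a real version would want much longer lists to make memorizing the answers impractical.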