Pelicans for Opus 4.6 and Codex 5.3 - to be honest, I don't have much interesting to say about these models yet. They're both incremental improvements on their predecessors and very capable. simonwillison.net/2026/Feb/5/t…
in reply to Simon Willison

It could be that the guys at #Anthropic know you and your "pelican on a bicycle" test, since you are a well-known AI blogger.
in reply to Simon Willison

"I've been having trouble finding tasks that those previous models couldn't handle but the new ones are able to ace." Ask it to write assembly. Gemini 3 and Opus 4.5 were the first I could get to write non-trivial assembly programs, though they both failed to write "life" with sixel graphics.
in reply to Simon Willison

Haven't tried Opus 4.6 yet, but 4.5 couldn't generate emails in the Beefree simple schema JSON with a design that actually looked great.
docs.beefree.io/beefree-sdk/da…
in reply to Simon Willison

"I've had a bit of preview access to both of these models and to be honest I'm finding it hard to find a good angle to write about them"

How about having different instances rate their own and each other's completed code?

In my experience, Claude was still much worse at overall planning. Web search on Claude also seemed much worse than on GPT 5.2.