Claude 4 models are meaningfully better on code benchmarks than o3, the best reasoning model so far.
And Anthropic is bundling web search, a Python execution sandbox and a files API 🤯
As far as dev-ex for agents goes, I think Anthropic has pulled ahead now!