Claude 3 Opus: A Strong Competitor to GPT-4 in AI Testing
Included is our research on Claude 3 Opus. I think it's the most extensive hand-crafted test suite yet run on Claude (meaning the model can't have been overfit to it yet): https://x.com/aparnadhinak/status/1766161976529711298?s=20

The takeaway: Claude 3 is the first real competitor to GPT-4. It is an incredibly solid model. The haystack tests were complex generations requiring retrieval, arithmetic, rounding, and formatting. Claude 2.1 wasn't even close to usable, whereas Claude 3 Opus was highly competitive with GPT-4. We also ran a suite of evals covering code generation, summarization, etc.

----

There is a Twitter hot take that Claude 3 is better than GPT-4. Given what we see, I wouldn't jump to conclusions. Form your own opinion and run evals on your own datasets.
