Extra Spicy Morning Paper Review: New Benchmark Challenges LRMs

Jun 18, 2025 03:12 PM

👋 This morning's paper read will be served Extra Spicy by Dylan C. and Parth S. 🔥 A new paper from researchers at Apple challenges today’s evaluation methods and introduces a new benchmark: synthetic puzzles with controllable complexity and clean logic. Their findings? LRMs show surprising failure modes, including a complete collapse on high-complexity tasks and a decline in reasoning effort as problems get harder. BUT THEN someone at Anthropic published a response aptly titled "The Illusion of the Illusion of Thinking," which argues that the Apple paper's findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Here's a direct link to join at 10:00am PT 👉 https://arize.zoom.us/j/89593430181

