Discussion on Claude Plays Pokémon: Insights for LLM Benchmarking
Vibhu S. · Mar 29, 2025 09:13 PM

Anyone have thoughts on Claude Plays Pokémon (https://www.twitch.tv/claudeplayspokemon / https://x.com/AnthropicAI/status/1894419011569344978) and what they would want to see measured if something like this became a general LLM / agent benchmark?