Hey everyone - we re-ran the AnthropicAI tests with the prompting guidance the Anthropic team sent us! The results improved: total misses dropped from 165 to 74, a reduction of about 55%.
Key Takeaway - Small differences in prompts can lead to drastically different outcomes for some LLMs, so evaluating the prompt you’re using for your task is critical.
Check out the results here:
https://twitter.com/aparnadhinak/status/1736809013864472954