Yes, but that's a regular eval I loaded in with LLM-as-a-judge (clearly the rules aren't very accurate 😅). I want to go in and add my own eval of how accurate each rule is based on my human opinion, because the LLM-as-a-judge isn't so good.
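Roughly what I mean, as a sketch (the trace IDs and scores are made up, and this is just me computing it by hand, not a feature of the platform):

```python
# Compare the LLM judge's verdicts against my own labels to see how
# accurate the judge actually is. All data below is invented for illustration.
judge_scores = {"t1": 1, "t2": 1, "t3": 0, "t4": 1}  # LLM-as-a-judge output
human_scores = {"t1": 1, "t2": 0, "t3": 0, "t4": 0}  # my labels

agreement = sum(
    judge_scores[t] == human_scores[t] for t in judge_scores
) / len(judge_scores)
print(f"judge/human agreement: {agreement:.0%}")  # 50% in this toy example
```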
I want to do some human-driven evals, and annotations seem like the right way to go. Is there a way to see aggregated results of annotations, similar to how we see them for evals?
I added some annotations, but I don't see any high-level summary.
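Something like this is what I'm after, as a minimal sketch of the summary I'd compute myself if I could export the annotations (the trace_id/name/score schema is my assumption, not the real export format):

```python
import pandas as pd

# Hypothetical export: one row per human annotation on a trace.
# Column names are assumptions, not the platform's actual schema.
annotations = pd.DataFrame([
    {"trace_id": "t1", "name": "rule_accuracy", "score": 1},
    {"trace_id": "t2", "name": "rule_accuracy", "score": 0},
    {"trace_id": "t3", "name": "rule_accuracy", "score": 1},
])

# Aggregate per annotation name, the way eval results get summarized:
# number of annotations and mean score.
summary = annotations.groupby("name")["score"].agg(["count", "mean"])
print(summary)
```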
What's weird is that when you sign up with Google you get a legacy token that works with the headers (a simple token), but when you create an account with email and password it takes you to the new portal with the new key structure.
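For concreteness, the two shapes look roughly like this; the endpoint, header name, and key formats here are my guesses, not the real API:

```python
import requests

BASE_URL = "https://api.example.com/v1/data"  # placeholder endpoint

# Legacy token (Google sign-up): passed as a plain header value (assumed name).
legacy_headers = {"Authorization": "my-legacy-token"}

# New portal key (email/password sign-up): assumed Bearer-style format.
new_headers = {"Authorization": "Bearer sk-new-portal-key"}

# Build (but don't send) both requests to compare the header shapes.
legacy_req = requests.Request("GET", BASE_URL, headers=legacy_headers).prepare()
new_req = requests.Request("GET", BASE_URL, headers=new_headers).prepare()
print(legacy_req.headers["Authorization"])
print(new_req.headers["Authorization"])
```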