Thank you, Mikyo and Jason. I have worked on several use cases. Sometimes I need ranges and other times I need classifications. Explanations help quite a lot, yes, but not as volume increases. Then we need humans (or me). I like your idea of “mini” versions and will explore more in that direction.
I have been struggling with LLM judges: variance, inconsistency, latency, cost, etc. Many problems. Has anyone here figured it out and is using LLM judges like a breeze?