Hey John L., thanks for joining! Great question. We see some teams doing what you mentioned (running these against another LLM), but it really depends on the use case. For internal document summarization, teams use task-based metrics coupled with peer reviews to establish a ground truth. For chatbots interacting with users, teams look for positive entities in responses and direct user feedback (a thumbs up or a smiley face).