Clarifying Evaluation Process for LLM Conversations

John L.

·Nov 07, 2023 05:33 PM

Hi Amber R., as a follow-up on the conversation eval: I totally follow the retrieval of data to perform an eval, but I was curious how the actual eval would take place. For example, would you run these against another LLM after establishing ground-truth responses, to see whether the end of the conversation achieved the user's initial request?

2 comments

Amber R.
Hey John L., thanks for joining! Great question. We see some teams doing what you mentioned (running these against another LLM), but it really depends on the use case. For internal document summarization, there are task-based metrics coupled with peer reviews to get a ground truth, but for chatbots interacting with users, teams are looking for positive entities in responses and direct user feedback (a thumbs up or a smiley face).
Amber R.
Let me know if you have additional questions or use cases in mind!
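
A minimal sketch of the two approaches mentioned in this thread: an LLM judge checked against a ground-truth response, and direct user feedback. Everything named here (`Conversation`, `judge_resolution`, `feedback_rate`, the prompt wording) is a hypothetical illustration and not an Arize API. The `judge` argument is an assumption too: it stands in for whatever function sends a prompt to your judge model and returns its text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Conversation:
    user_request: str                 # what the user initially asked for
    turns: List[str]                  # messages in order, oldest first
    thumbs_up: Optional[bool] = None  # direct user feedback, None if not given


# Hypothetical judge signature: prompt text in, raw model text out.
Judge = Callable[[str], str]

JUDGE_TEMPLATE = (
    "Initial user request:\n{request}\n\n"
    "Conversation:\n{transcript}\n\n"
    "Reference (ground-truth) response:\n{reference}\n\n"
    "Did the conversation achieve the request? "
    "Answer with one word: resolved or unresolved."
)


def judge_resolution(conv: Conversation, reference: str, judge: Judge) -> bool:
    """Ask a judge model whether the conversation achieved the initial request."""
    prompt = JUDGE_TEMPLATE.format(
        request=conv.user_request,
        transcript="\n".join(conv.turns),
        reference=reference,
    )
    verdict = judge(prompt).strip().lower()
    # "unresolved" does not start with "resolved", so this check is unambiguous.
    return verdict.startswith("resolved")


def feedback_rate(convs: List[Conversation]) -> Optional[float]:
    """Share of rated conversations that received a thumbs up."""
    rated = [c.thumbs_up for c in convs if c.thumbs_up is not None]
    return sum(rated) / len(rated) if rated else None


if __name__ == "__main__":
    conv = Conversation(
        user_request="Reset my password",
        turns=["How do I reset my password?", "Use the 'Forgot password' link."],
        thumbs_up=True,
    )
    # Stub judge for demonstration only; swap in a real model call.
    print(judge_resolution(conv, "Point the user to the reset link.", lambda p: "resolved"))
    print(feedback_rate([conv]))
```

Keeping the judge as a plain callable means the same scoring code can be exercised with a stub first and pointed at a real model later; which signal to trust more still depends on the use case, as noted above.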
