Is there a way to implement a custom evaluator that doesn't use an LLM? One way I want to evaluate retrieval is whether a certain document appears in the results for a certain query. I have a dataset of query --> expected document to use for evaluation, and want to run retrieval for the queries in this dataset and measure how often the results include the expected document, and score based on what position the expected document was found. How would I do this using phoenix and attach this custom eval to a span?
In general you can create and run any Evals you want. Our LLM Evals are a helpful toolbox, but you can also send your own: they can be sent to Phoenix offline, as a job that pushes into the server. We are also rolling out "inline" Evals that let you run an Eval as a callback as the spans are sent. Is this more hand-crafted, ad hoc analysis you want to send after the fact? Or do you know how to run the check inline and want it to run as spans are being traced?
Here is the PR to follow for the inline Evals: https://github.com/Arize-ai/phoenix/pull/3240
Hey Hain-Lee H. - yeah, your evals don't need to be LLM-based. You just need a dataframe of labels or scores with a name, and you can annotate the trace with it: https://docs.arize.com/phoenix/tracing/how-to-tracing/llm-evaluations
Jason, I don't need it to run "inline" or at the same time as spans are generated; it's more hand-crafted, ad hoc analysis. Mikyo, thanks for that clarification! So as long as I have a function that can generate the dataframe linked to span_ids, I just have to log it into Phoenix; is that right? Since I'm trying to generate an eval metric for the set of retrieval results, rather than score individual retrieved documents, would I use SpanEvaluations to log into Phoenix? I also saw TraceEvaluations but I'm not sure what that is. Thanks again for your help!
Hey Hain-Lee H. that's exactly right - evals are just label, score, and explanation, and they're a type of annotation on any part of our tracing. The subject of the annotation can be a document, a span, or a trace (trace-level evals are still not visible in the UI). So if you are scoring a span rather than a document, I would use SpanEvaluations.
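To make the above concrete, here's a minimal sketch of the non-LLM eval described in this thread: score each retrieval by the rank at which the expected document appears (reciprocal rank, 0 if absent), keyed by span_id. The data, span ids, and the eval name are all illustrative; the commented-out logging call follows the `SpanEvaluations` pattern from the docs linked above.

```python
# Non-LLM retrieval eval sketch: did the expected doc appear, and at what rank?
# All ids and names below are made up for illustration.

def position_score(results, expected):
    """Reciprocal-rank style score: 1/rank if the expected doc is in the
    ordered results list, else 0.0 (rank is 1-based)."""
    try:
        return 1.0 / (results.index(expected) + 1)
    except ValueError:
        return 0.0

# Toy data: ordered retrieval results keyed by the retriever span's span_id,
# plus the expected document for each query/span.
results_by_span = {
    "span-1": ["doc-9", "doc-3", "doc-7"],  # expected doc found at rank 2
    "span-2": ["doc-4", "doc-1"],           # expected doc missing
}
expected_by_span = {"span-1": "doc-3", "span-2": "doc-8"}

rows = []
for span_id, results in results_by_span.items():
    score = position_score(results, expected_by_span[span_id])
    rows.append({
        "context.span_id": span_id,
        "label": "found" if score > 0 else "missed",
        "score": score,
    })

# To log into Phoenix, build a dataframe indexed by context.span_id and
# send it as a span-level eval (per the Phoenix docs linked above):
#
#   import pandas as pd
#   import phoenix as px
#   from phoenix.trace import SpanEvaluations
#
#   eval_df = pd.DataFrame(rows).set_index("context.span_id")
#   px.Client().log_evaluations(
#       SpanEvaluations(eval_name="expected_doc_rank", dataframe=eval_df)
#   )
```

The label/score columns match the shape Mikyo describes (label, score, explanation), so the same dataframe works for any hand-rolled metric, not just rank.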
