To use `llm_classify` with experiments in Phoenix, the workflow is: upload your dataset (a pandas DataFrame), define a task function, wrap `llm_classify` in an evaluator function, and pass everything to `run_experiment`. For example:
```python
import pandas as pd

import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.experiments import run_experiment

# Upload your dataset (a pandas DataFrame named df) to Phoenix
dataset = px.Client().upload_dataset(
    dataset_name="my-dataset",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)

# Define your task function; `input` is bound to the example's input dict
def task(input):
    return my_llm(input["question"])

# Define an evaluator that wraps llm_classify and returns a single label
def evaluator(input, output):
    eval_df = pd.DataFrame([{"question": input["question"], "answer": output}])
    result = llm_classify(
        dataframe=eval_df,
        template=your_template,
        model=OpenAIModel(model="gpt-4"),
        rails=["correct", "incorrect"],
    )
    return result["label"].iloc[0]

# Run the experiment
experiment = run_experiment(dataset, task, evaluators=[evaluator])
```
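For reference, the `your_template` placeholder is a prompt string whose curly-brace variables must match column names in the dataframe passed to `llm_classify`, and whose requested output words must match the `rails` list. A minimal sketch (the template wording and column names here are illustrative assumptions, not from the Phoenix docs):

```python
# Hypothetical classification template for llm_classify. The {question} and
# {answer} placeholders are filled from dataframe columns of the same names;
# the allowed output words must match rails=["correct", "incorrect"].
CORRECTNESS_TEMPLATE = """\
You are evaluating whether an answer to a question is correct.

Question: {question}
Answer: {answer}

Respond with a single word: "correct" or "incorrect".
"""
```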
This approach is documented in <https://github.com/Arize-ai/phoenix/blob/main/docs/datasets-and-experiments/how-to-experiments/run-experiments.md|the Phoenix experiments how-to> and <https://github.com/Arize-ai/phoenix/blob/main/docs/evaluation/how-to-evals/bring-your-own-evaluator.md|the bring-your-own-evaluator guide>.
Would you like more detail or a more specific example for a particular LLM SDK?
Sources:
- <https://github.com/Arize-ai/phoenix/blob/main/docs/datasets-and-experiments/how-to-experiments/run-experiments.md|Run Experiments>
- <https://github.com/Arize-ai/phoenix/blob/main/docs/evaluation/how-to-evals/bring-your-own-evaluator.md|Bring Your Own Evaluator>