To use llm_classify with experiments in Phoenix, you first prepare your dataset (as a pandas DataFrame), define a task, and then wrap llm_classify in an evaluator passed to run_experiment. The typical workflow is: upload your dataset, define a task function, define evaluators (using llm_classify), and run the experiment. For example:

```python
import pandas as pd
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.experiments import run_experiment

# Upload your pandas DataFrame as a Phoenix dataset
# (dataset_name is required by upload_dataset)
dataset = px.Client().upload_dataset(
    dataset_name="my-dataset",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)

# Define your task function; Phoenix binds the example's
# input dict to the parameter named `input`
def task(input):
    return my_llm(input["question"])

# Define an evaluator that wraps llm_classify; Phoenix binds the
# `input` and `output` arguments by parameter name
def evaluator(input, output) -> bool:
    row = pd.DataFrame([{"question": input["question"], "answer": output}])
    result = llm_classify(
        dataframe=row,
        template=your_template,
        model=OpenAIModel(model="gpt-4"),
        rails=["correct", "incorrect"],
    )
    return result["label"].iloc[0] == "correct"

# Run the experiment
experiment = run_experiment(dataset, task, evaluators=[evaluator])
```

This approach is documented in the Phoenix experiments how-to and the llm_classify usage guide. Would you like more detail or a more specific example for a particular LLM SDK?
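The `your_template` referenced above is a classification prompt you supply yourself. Here is a minimal sketch of what one might look like; the exact wording is illustrative, and the `{question}` and `{answer}` placeholder names are assumptions chosen to match the DataFrame columns built in the evaluator:

```python
# Hypothetical classification template; llm_classify fills the
# {question} and {answer} placeholders from the DataFrame columns,
# and the model's output must land on one of the rails
your_template = """
You are evaluating whether an answer correctly responds to a question.

Question: {question}
Answer: {answer}

Respond with exactly one word: "correct" or "incorrect".
"""
```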
Sources:
- Run Experiments
- Bring Your Own Evaluator