To use llm_classify with experiments in Phoenix, you first prepare your dataset (as a pandas DataFrame), define a task, and then wrap llm_classify in an evaluator passed to run_experiment. The typical workflow is: upload your dataset, define a task function, define evaluators (using llm_classify), and run the experiment. For example:

```python
import pandas as pd
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.experiments import run_experiment

# Upload your pandas DataFrame as a Phoenix dataset
# (dataset_name is required by upload_dataset)
dataset = px.Client().upload_dataset(
    dataset_name="my-dataset",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)

# Define your task function; Phoenix binds the example's
# input dict to the parameter named `input`
def task(input):
    return my_llm(input["question"])

# Define an evaluator that wraps llm_classify; Phoenix binds the
# `input` and `output` arguments by parameter name
def evaluator(input, output) -> bool:
    row = pd.DataFrame([{"question": input["question"], "answer": output}])
    result = llm_classify(
        dataframe=row,
        template=your_template,
        model=OpenAIModel(model="gpt-4"),
        rails=["correct", "incorrect"],
    )
    return result["label"].iloc[0] == "correct"

# Run the experiment
experiment = run_experiment(dataset, task, evaluators=[evaluator])
```

This approach is documented in the Phoenix experiments how-to and the llm_classify usage guide. Would you like more detail or a more specific example for a particular LLM SDK?
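The `your_template` referenced above is a classification prompt you supply yourself. Here is a minimal sketch of what one might look like; the exact wording is illustrative, and the `{question}` and `{answer}` placeholder names are assumptions chosen to match the DataFrame columns built in the evaluator:

```python
# Hypothetical classification template; llm_classify fills the
# {question} and {answer} placeholders from the DataFrame columns,
# and the model's output must land on one of the rails
your_template = """
You are evaluating whether an answer correctly responds to a question.

Question: {question}
Answer: {answer}

Respond with exactly one word: "correct" or "incorrect".
"""
```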
Sources:
- Run Experiments
- Bring Your Own Evaluator