RunLLM is it possible to only log custom experiment runs' results to a dataset in phoenix? I have custom logic for evaluations, all i wanna do is log the results and have them be collated by dataset on phoenix ?
Yes, you can log custom experiment runs' results to a dataset in Phoenix. You can use the run_experiment function to execute your custom logic on each example in a dataset and log the results. The results from each experiment, including evaluations, are stored in the Phoenix database for analysis and comparison. You can define your own evaluators and tasks to suit your custom logic, and the results will be associated with the dataset used in the experiment.

Would you like more detail on how to set up the run_experiment function or define custom evaluators?

Sources:
- https://github.com/Arize-ai/phoenix/pull/4521
- https://arize-ai.slack.com/archives/C04R3GXC8HK/p1725295210.398799
💡 Hint: Mention RunLLM in the thread for followups.
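For reference, a minimal sketch of what a custom task and evaluator for run_experiment could look like. This assumes the arize-phoenix experiments API; the function names and dataset fields are placeholders, and the import and final call are commented out so the sketch does not require a running Phoenix server.

```python
# Assumes the arize-phoenix package:
# from phoenix.experiments import run_experiment

def task(input):
    """Hypothetical task: your custom logic, producing one output per example.
    Phoenix passes the example's input dict to a parameter named `input`."""
    return input["question"].strip().lower()

def exact_match(output, expected):
    """Hypothetical evaluator: 1.0 if the output equals the golden answer."""
    return float(output == expected["answer"].strip().lower())

# With a dataset already uploaded to Phoenix, you would then run:
# run_experiment(dataset, task, evaluators=[exact_match], experiment_name="custom-eval")
```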
To log your custom experiment results to Phoenix without using run_experiment, you can directly use the px.Client().log_evaluations() method. This method allows you to log evaluation results to Phoenix by creating SpanEvaluations or DocumentEvaluations objects from your results and associating them with the relevant dataset in Phoenix.

Would you like more detail on how to format your results for logging or how to use the log_evaluations() method?

Sources:
- update notebooks for px.Client().log_evaluations()
To log specific experiment results to a dataset in Phoenix, you can use the px.Client().log_evaluations() method. First, ensure your results are in a pandas DataFrame with columns for span_id and any evaluation metrics like label or score. Then, create SpanEvaluations or DocumentEvaluations objects and log them using the Phoenix client. Here's a basic example:

```python
from phoenix.trace import SpanEvaluations
import phoenix as px

# Create a Phoenix client
client = px.Client()

# Assume results_df is your DataFrame with evaluation results
results_df["label"] = results_df["label"].astype(str)
results_df["score"] = results_df["score"].astype(int)

# Log evaluations to Phoenix
client.log_evaluations(
    SpanEvaluations(eval_name="Your Eval Name", dataframe=results_df)
)
```

Would you like more detail on preparing your DataFrame or using the Phoenix client?

Sources:
- Upload evaluations to Phoenix
The connection to the dataset in Phoenix is established through the span_id in your DataFrame. The span_id acts as a unique identifier that links each evaluation to the corresponding trace or example in the dataset. When you log evaluations using px.Client().log_evaluations(), Phoenix uses these span_id values to associate the evaluations with the correct dataset entries.

If you need to specify a dataset explicitly, ensure that the span_id values in your DataFrame match those in the dataset you uploaded to Phoenix. This way, the evaluations are correctly mapped to the dataset entries.

Would you like more detail on how to ensure your span_id values are correctly set up?

Sources:
- Log Evaluation Results
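As an illustration of the span_id linkage described above, here is a small sketch of an evaluations DataFrame keyed by span ID. The span IDs are made up, and the `context.span_id` index name is based on the Phoenix docs' convention for SpanEvaluations; verify it against your Phoenix version.

```python
import pandas as pd

# Hypothetical evaluation results; the span IDs must match spans that
# already exist in Phoenix for the evaluations to attach correctly.
results_df = pd.DataFrame(
    {
        "context.span_id": ["0f8c30b2a9c4d1e7", "7a1d9e0b3c5f2468"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
    }
).set_index("context.span_id")

# Then, with a Phoenix client:
# client.log_evaluations(SpanEvaluations(eval_name="Your Eval Name", dataframe=results_df))
```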
Yeah exactly, just wanted Phoenix to act as a source of truth for experiment results on top of the datasets we've added. But want to log experiment results manually instead of using the run_experiment function
I could probably refactor my task to be used by run_experiment, but wanted to see if there was something quicker
Hmm, let me think about this for a moment. I think you may be able to; what format are your experiment results in?
So I have a list of result items. Each result item is made up of the id, input, golden output, and metadata of the example, plus an output response field and output metadata fields
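The result items described here could be modeled roughly as follows. This is a hypothetical sketch of the shape, not code from the thread; all field names are assumed from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ResultItem:
    """One custom experiment result, mirroring the fields described above."""
    id: str
    input: Dict[str, Any]          # example input
    golden_output: Dict[str, Any]  # expected/golden output of the example
    metadata: Dict[str, Any]       # example metadata
    output_response: str           # your system's response
    output_metadata: Dict[str, Any] = field(default_factory=dict)

# Example instance
item = ResultItem(
    id="ex-1",
    input={"question": "What is 2+2?"},
    golden_output={"answer": "4"},
    metadata={"split": "test"},
    output_response="4",
    output_metadata={"latency_ms": 12},
)
```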
Swaraj R. you might try submitting manually via the API; here's a sketch:
```python
from typing import cast

import httpx

from phoenix.experiments.types import Experiment, ExperimentRun
from phoenix.utilities.json import jsonify

httpx_client = httpx.Client()  # configure to point to the phoenix server

dataset = ...  # given a dataset

# Create the experiment via the REST API
payload = {
    "version_id": dataset.version_id,
    "name": experiment_name,
    "description": experiment_description,
    "metadata": experiment_metadata,
    "repetitions": 1,
}
experiment_response = httpx_client.post(
    f"/v1/datasets/{dataset.id}/experiments",
    json=payload,
)
experiment_response.raise_for_status()
exp_json = experiment_response.json()["data"]
project_name = exp_json["project_name"]
experiment = Experiment(
    dataset_id=dataset.id,
    dataset_version_id=dataset.version_id,
    repetitions=1,  # match the payload above
    id=exp_json["id"],
    project_name=project_name,
)

# Then, for each result item, post an ExperimentRun.
# _decode_unix_nano and _str_trace_id here are Phoenix-internal helpers for
# converting OpenTelemetry span data; if you aren't tracing, pass datetimes
# for start_time/end_time and omit the trace_id instead.
exp_run = ExperimentRun(
    start_time=_decode_unix_nano(cast(int, span.start_time)),
    end_time=_decode_unix_nano(cast(int, span.end_time)),
    experiment_id=experiment.id,
    dataset_example_id=example.id,
    repetition_number=repetition_number,
    output=output,
    error=repr(error) if error else None,
    trace_id=_str_trace_id(span.get_span_context().trace_id),  # type: ignore[no-untyped-call]
)
resp = httpx_client.post(f"/v1/experiments/{experiment.id}/runs", json=jsonify(exp_run))
resp.raise_for_status()
```

Awesome, will try this and get back to you today
