RunLLM is it possible to only log custom experiment runs' results to a dataset in phoenix? I have custom logic for evaluations, all i wanna do is log the results and have them be collated by dataset on phoenix ?
Yes, you can log custom experiment runs' results to a dataset in Phoenix. You can use the run_experiment function to execute your custom logic on each example in a dataset and log the results. The results from each experiment, including evaluations, are stored in the Phoenix database for analysis and comparison. You can define your own evaluators and tasks to suit your custom logic, and the results will be associated with the dataset used in the experiment.

Would you like more detail on how to set up the run_experiment function or define custom evaluators?

Sources:
- https://github.com/Arize-ai/phoenix/pull/4521
- https://arize-ai.slack.com/archives/C04R3GXC8HK/p1725295210.398799
💡 Hint: Mention RunLLM in the thread for followups.
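For reference, a minimal sketch of what a custom task and evaluator for run_experiment could look like. This assumes the arize-phoenix experiments API; the function names and dataset fields are placeholders, and the import and final call are commented out so the sketch does not require a running Phoenix server.

```python
# Assumes the arize-phoenix package:
# from phoenix.experiments import run_experiment

def task(input):
    """Hypothetical task: your custom logic, producing one output per example.
    Phoenix passes the example's input dict to a parameter named `input`."""
    return input["question"].strip().lower()

def exact_match(output, expected):
    """Hypothetical evaluator: 1.0 if the output equals the golden answer."""
    return float(output == expected["answer"].strip().lower())

# With a dataset already uploaded to Phoenix, you would then run:
# run_experiment(dataset, task, evaluators=[exact_match], experiment_name="custom-eval")
```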
To log your custom experiment results to Phoenix without using run_experiment, you can directly use the px.Client().log_evaluations() method. This method allows you to log evaluation results to Phoenix by creating SpanEvaluations or DocumentEvaluations objects from your results and associating them with the relevant dataset in Phoenix.

Would you like more detail on how to format your results for logging or how to use the log_evaluations() method?

Sources:
- update notebooks for px.Client().log_evaluations()
To log specific experiment results to a dataset in Phoenix, you can use the px.Client().log_evaluations() method. First, ensure your results are in a pandas DataFrame with columns for span_id and any evaluation metrics like label or score. Then, create SpanEvaluations or DocumentEvaluations objects and log them using the Phoenix client. Here's a basic example:

```python
from phoenix.trace import SpanEvaluations
import phoenix as px

# Create a Phoenix client
client = px.Client()

# Assume results_df is your DataFrame with evaluation results
results_df["label"] = results_df["label"].astype(str)
results_df["score"] = results_df["score"].astype(int)

# Log evaluations to Phoenix
client.log_evaluations(
    SpanEvaluations(eval_name="Your Eval Name", dataframe=results_df)
)
```

Would you like more detail on preparing your DataFrame or using the Phoenix client?

Sources:
- Upload evaluations to Phoenix
The connection to the dataset in Phoenix is established through the span_id in your DataFrame. The span_id acts as a unique identifier that links each evaluation to the corresponding trace or example in the dataset. When you log evaluations using px.Client().log_evaluations(), Phoenix uses these span_id values to associate the evaluations with the correct dataset entries.

If you need to specify a dataset explicitly, ensure that the span_id values in your DataFrame match those in the dataset you uploaded to Phoenix. This way, the evaluations are correctly mapped to the dataset entries.

Would you like more detail on how to ensure your span_id values are correctly set up?

Sources:
- Log Evaluation Results
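As an illustration of the span_id linkage described above, here is a small sketch of an evaluations DataFrame keyed by span ID. The span IDs are made up, and the `context.span_id` index name is based on the Phoenix docs' convention for SpanEvaluations; verify it against your Phoenix version.

```python
import pandas as pd

# Hypothetical evaluation results; the span IDs must match spans that
# already exist in Phoenix for the evaluations to attach correctly.
results_df = pd.DataFrame(
    {
        "context.span_id": ["0f8c30b2a9c4d1e7", "7a1d9e0b3c5f2468"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
    }
).set_index("context.span_id")

# Then, with a Phoenix client:
# client.log_evaluations(SpanEvaluations(eval_name="Your Eval Name", dataframe=results_df))
```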
Yeah exactly, just wanted Phoenix to act as a source of truth for experiment results on top of the datasets we've added. But want to log experiment results manually instead of using the run_experiment function
I could probably refactor my task to be used by run_experiment, but wanted to see if there was something quicker
Hmm, let me think about this for a moment. I think you may be able to; what format are your experiment results in?
So I have a list of result items. Each result item is made up of the id, input, golden output, and metadata of the example, plus an output response field and output metadata fields
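The result items described here could be modeled roughly as follows. This is a hypothetical sketch of the shape, not code from the thread; all field names are assumed from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ResultItem:
    """One custom experiment result, mirroring the fields described above."""
    id: str
    input: Dict[str, Any]          # example input
    golden_output: Dict[str, Any]  # expected/golden output of the example
    metadata: Dict[str, Any]       # example metadata
    output_response: str           # your system's response
    output_metadata: Dict[str, Any] = field(default_factory=dict)

# Example instance
item = ResultItem(
    id="ex-1",
    input={"question": "What is 2+2?"},
    golden_output={"answer": "4"},
    metadata={"split": "test"},
    output_response="4",
    output_metadata={"latency_ms": 12},
)
```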
Swaraj R. you might try submitting manually via the API; here's a sketch:
```python
from typing import cast

import httpx

from phoenix.experiments.types import Experiment, ExperimentRun
from phoenix.utilities.json import jsonify

httpx_client = httpx.Client()  # configure to point to the phoenix server

dataset = ...  # given a dataset

# Create the experiment via the REST API
payload = {
    "version_id": dataset.version_id,
    "name": experiment_name,
    "description": experiment_description,
    "metadata": experiment_metadata,
    "repetitions": 1,
}
experiment_response = httpx_client.post(
    f"/v1/datasets/{dataset.id}/experiments",
    json=payload,
)
experiment_response.raise_for_status()
exp_json = experiment_response.json()["data"]
project_name = exp_json["project_name"]
experiment = Experiment(
    dataset_id=dataset.id,
    dataset_version_id=dataset.version_id,
    repetitions=1,  # match the payload above
    id=exp_json["id"],
    project_name=project_name,
)

# Then, for each result item, post an ExperimentRun.
# _decode_unix_nano and _str_trace_id here are Phoenix-internal helpers for
# converting OpenTelemetry span data; if you aren't tracing, pass datetimes
# for start_time/end_time and omit the trace_id instead.
exp_run = ExperimentRun(
    start_time=_decode_unix_nano(cast(int, span.start_time)),
    end_time=_decode_unix_nano(cast(int, span.end_time)),
    experiment_id=experiment.id,
    dataset_example_id=example.id,
    repetition_number=repetition_number,
    output=output,
    error=repr(error) if error else None,
    trace_id=_str_trace_id(span.get_span_context().trace_id),  # type: ignore[no-untyped-call]
)
resp = httpx_client.post(f"/v1/experiments/{experiment.id}/runs", json=jsonify(exp_run))
resp.raise_for_status()
```

Awesome, will try this and get back to you today
