Displaying Precision Metrics with Log Evaluations in Phoenix UI

·Sep 04, 2024 09:41 AM

If I mark my spans as false positives/true positives and log them to Phoenix with log_evaluations, how do I display precision as a metric at the top of the UI?

6 comments

RunLLM
APP
·
To display precision as a metric on top of the Phoenix UI, you need to log your evaluations and then ensure that the precision metric is calculated and displayed. Here's a concise breakdown:
1. Log Evaluations: Use the log_evaluations method to log your span evaluations, marking them as true positives or false positives.
2. Calculate Precision: Precision is calculated as the number of true positives divided by the sum of true positives and false positives. You can use a custom evaluation function to calculate this metric.
3. Display in UI: Ensure that the precision metric is included in the evaluations logged to Phoenix, so it can be displayed in the UI.
Here's a step-by-step approach:
Step 1: Log Evaluations
```
import pandas as pd
from phoenix.trace import SpanEvaluations
import phoenix as px

# Example DataFrame with span evaluations
data = {
    'span_id': [1, 2, 3, 4],
    'label': ['true_positive', 'false_positive', 'true_positive', 'false_positive']
}
evals_df = pd.DataFrame(data)

# Convert labels to binary scores
evals_df['score'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Precision Eval", dataframe=evals_df))
```
Step 2: Calculate Precision
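```
def calculate_precision(evals_df):
    # 'score' is 1 for true positives and 0 for false positives
    true_positives = evals_df['score'].sum()
    # Every row here is a true or false positive, so the row count equals TP + FP
    total_predictions = len(evals_df)
    precision = true_positives / total_predictions
    return precision

precision = calculate_precision(evals_df)
print(f"Precision: {precision}")
```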
Step 3: Display in UI
Ensure that the precision metric is included in the evaluations logged to Phoenix. Phoenix will automatically calculate and display the precision metric if it is part of the logged evaluations.
For more details, refer to the Phoenix documentation.
Would you like more detail on any specific part?
Sources: - GitHub Issue #1163 - Phoenix documentation
💡 Hint: Mention RunLLM in the thread for followups.
Alicja K.
·
RunLLM yes, I see you calculate precision, but the only thing you send to Phoenix is the false/true positive labels. How do I send the precision calculation?
RunLLM
APP
·
To send the precision calculation to Phoenix, you need to log the precision metric as part of your evaluations. Phoenix can then display this metric in the UI. Here's how you can do it:
1. Calculate Precision: Calculate the precision based on your true positive and false positive labels.
2. Log Precision Metric: Log the precision metric to Phoenix using the log_evaluations method.
Step-by-Step Guide
Step 1: Calculate Precision
First, calculate the precision based on your evaluation data.
```
import pandas as pd

data = {
    'span_id': [1, 2, 3, 4],
    'label': ['true_positive', 'false_positive', 'true_positive', 'false_positive']
}
evals_df = pd.DataFrame(data)

evals_df['score'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)

true_positives = evals_df['score'].sum()
total_predictions = len(evals_df)
precision = true_positives / total_predictions
print(f"Precision: {precision}")
```
Step 2: Log Precision Metric
Log the precision metric to Phoenix using the `log_evaluations` method.
```
from phoenix.trace import SpanEvaluations
import phoenix as px

# Create a DataFrame for precision metric
precision_df = pd.DataFrame({
    'span_id': [1],  # Dummy span_id, as precision is a single value
    'precision': [precision]
})

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Precision", dataframe=precision_df))
```
Step 3: Display in UI
Ensure that the precision metric is included in the evaluations logged to Phoenix. Phoenix will automatically calculate and display the precision metric if it is part of the logged evaluations.
For more details, refer to the Phoenix documentation.
Would you like more detail on any specific part?
Sources: - Phoenix documentation
Alicja K.
·
RunLLM so in order to display my own calculated global metric in the Phoenix UI, I have to send a single calculation with a dummy span ID? Let's say I want to mark my spans as true/false positive/negative, calculate the metrics myself, and display specificity and precision.
RunLLM
APP
·
To display your own calculated global metrics like precision and specificity in the Phoenix UI, you can follow these steps:
1. Mark Spans: Mark your spans as true positive, false positive, true negative, or false negative.
2. Calculate Metrics: Calculate the desired metrics (precision, specificity) based on these labels.
3. Log Metrics: Log these calculated metrics to Phoenix using a dummy span ID or a custom evaluation method.
Step-by-Step Guide
Step 1: Mark Spans
Create a DataFrame with your span evaluations, marking them as true positive, false positive, true negative, or false negative.
```
import pandas as pd

data = {
    'span_id': [1, 2, 3, 4, 5, 6],
    'label': ['true_positive', 'false_positive', 'true_negative', 'false_negative', 'true_positive', 'false_positive']
}
evals_df = pd.DataFrame(data)

evals_df['tp'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)
evals_df['fp'] = evals_df['label'].apply(lambda x: 1 if x == 'false_positive' else 0)
evals_df['tn'] = evals_df['label'].apply(lambda x: 1 if x == 'true_negative' else 0)
evals_df['fn'] = evals_df['label'].apply(lambda x: 1 if x == 'false_negative' else 0)
```
Step 2: Calculate Metrics
Calculate precision and specificity based on the labels.
```
# Calculate precision
true_positives = evals_df['tp'].sum()
false_positives = evals_df['fp'].sum()
precision = true_positives / (true_positives + false_positives)

# Calculate specificity
true_negatives = evals_df['tn'].sum()
false_negatives = evals_df['fn'].sum()
specificity = true_negatives / (true_negatives + false_positives)

print(f"Precision: {precision}")
print(f"Specificity: {specificity}")
```
Step 3: Log Metrics
Log these calculated metrics to Phoenix using a dummy span ID.
```
from phoenix.trace import SpanEvaluations
import phoenix as px

metrics_df = pd.DataFrame({
    'span_id': [1],  # Dummy span_id, as these are global metrics
    'precision': [precision],
    'specificity': [specificity]
})

client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Global Metrics", dataframe=metrics_df))
```
Step 4: Display in UI
Ensure that the precision and specificity metrics are included in the evaluations logged to Phoenix. Phoenix will automatically display these metrics if they are part of the logged evaluations.
For more details, refer to the [Phoenix documentation](https://docs.arize.com/phoenix/tracing/how-to-tracing/llm-evaluations).
Would you like more detail on any specific part?
Sources: - [Phoenix documentation](https://docs.arize.com/phoenix/tracing/how-to-tracing/llm-evaluations)
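The Step 2 arithmetic divides directly, so it raises or returns NaN when a denominator is zero (for example, when no span was predicted positive). A minimal sketch of the same calculation in plain Python, counting the four labels and returning None for undefined ratios, might look like the following. The helper names `safe_ratio` and `confusion_metrics` are illustrative and not part of Phoenix, and recall is added only to show where the false-negative count is used.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Divide, returning None instead of failing when the denominator is 0."""
    return numerator / denominator if denominator else None


def confusion_metrics(labels: Iterable[str]) -> Dict[str, Optional[float]]:
    """Compute precision, recall, and specificity from per-span labels."""
    counts = Counter(labels)
    tp, fp = counts["true_positive"], counts["false_positive"]
    tn, fn = counts["true_negative"], counts["false_negative"]
    return {
        "precision": safe_ratio(tp, tp + fp),    # TP / (TP + FP)
        "recall": safe_ratio(tp, tp + fn),       # TP / (TP + FN)
        "specificity": safe_ratio(tn, tn + fp),  # TN / (TN + FP)
    }


# Same six labels as the example DataFrame in Step 1.
labels = [
    "true_positive", "false_positive", "true_negative",
    "false_negative", "true_positive", "false_positive",
]
metrics = confusion_metrics(labels)
print(metrics)
```

This only covers the calculation; how the resulting numbers are surfaced in the UI is a separate question.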
Dustin N.
·
Hi Alicja K., sorry for the late response. As far as I understand, we don't currently have an evaluation summary that would calculate precision automatically. The closest option right now may be to compute the precision manually for all spans contained within an enclosing trace. What kind of grouping for computing the precision would you find the most helpful (trace, project, etc.)?
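As a rough illustration of the manual, per-trace computation mentioned here, the sketch below groups span labels by a key and computes precision for each group. It is plain Python rather than a Phoenix API: the function name `precision_by_group` and the `trace-a`/`trace-b`/`trace-c` IDs are hypothetical, and the same function would work for project-level grouping by passing a project name as the key instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def precision_by_group(
    rows: Iterable[Tuple[str, str]],
) -> Dict[str, Optional[float]]:
    """Map each group key (e.g. a trace ID) to TP / (TP + FP) for its spans."""
    tp: Dict[str, int] = defaultdict(int)
    fp: Dict[str, int] = defaultdict(int)
    groups = set()
    for group, label in rows:
        groups.add(group)
        if label == "true_positive":
            tp[group] += 1
        elif label == "false_positive":
            fp[group] += 1
    result: Dict[str, Optional[float]] = {}
    for group in sorted(groups):
        denominator = tp[group] + fp[group]
        # Precision is undefined for a group with no positive predictions.
        result[group] = tp[group] / denominator if denominator else None
    return result


# Hypothetical (trace_id, label) pairs; real IDs would come from your spans.
rows = [
    ("trace-a", "true_positive"),
    ("trace-a", "false_positive"),
    ("trace-b", "true_positive"),
    ("trace-b", "true_negative"),
    ("trace-c", "false_negative"),
]
per_trace = precision_by_group(rows)
print(per_trace)
```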
