If I mark my spans as false positives/true positives and log them to Phoenix with log_evaluations, how can I display precision as a metric at the top of the UI?
To display precision as a metric on top of the Phoenix UI, you need to log your evaluations and then ensure that the precision metric is calculated and displayed. Here's a concise breakdown:
1. Log Evaluations: Use the log_evaluations method to log your span evaluations, marking them as true positives or false positives.
2. Calculate Precision: Precision is calculated as the number of true positives divided by the sum of true positives and false positives. You can use a custom evaluation function to calculate this metric.
3. Display in UI: Ensure that the precision metric is included in the evaluations logged to Phoenix, so it can be displayed in the UI.
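The formula in item 2 can be sketched in plain Python. This is a minimal illustration, not something Phoenix provides: the helper name `precision_from_labels` and the label strings are assumptions. It returns `None` when there are no positive predictions, since precision is undefined in that case.

```python
def precision_from_labels(labels):
    """Return TP / (TP + FP), or None when there are no positive predictions."""
    tp = sum(1 for label in labels if label == "true_positive")
    fp = sum(1 for label in labels if label == "false_positive")
    if tp + fp == 0:
        return None  # precision is undefined without positive predictions
    return tp / (tp + fp)


example = ["true_positive", "false_positive", "true_positive", "false_positive"]
print(precision_from_labels(example))  # computes 2 / (2 + 2)
```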
Here's a step-by-step approach:
Step 1: Log Evaluations
```
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Example DataFrame with span evaluations.
# The span IDs are placeholders; replace them with the span IDs (strings) of your own spans.
data = {
    'span_id': ['span-id-1', 'span-id-2', 'span-id-3', 'span-id-4'],
    'label': ['true_positive', 'false_positive', 'true_positive', 'false_positive'],
}
evals_df = pd.DataFrame(data)

# Convert labels to binary scores
evals_df['score'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Precision Eval", dataframe=evals_df))
```
Step 2: Calculate Precision
```
def calculate_precision(evals_df):
    # Every row is either a true positive (score 1) or a false positive (score 0),
    # so the row count equals TP + FP
    true_positives = evals_df['score'].sum()
    total_predictions = len(evals_df)
    precision = true_positives / total_predictions
    return precision

precision = calculate_precision(evals_df)
print(f"Precision: {precision}")
```
Step 3: Display in UI
Phoenix displays the labels and scores that you log; it does not compute precision from true-positive/false-positive labels on its own. To see the precision value in the UI, log the calculated value as an evaluation as well.
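One property of the 1/0 score encoding from Step 1 is worth spelling out: when every scored span is a positive prediction, the mean of the scores equals the precision. The sketch below checks this in plain Python and shows how the equality breaks once negatives are scored as 0 too. The score lists are made-up data, and whether your Phoenix version shows a mean score for an evaluation is something to verify rather than assume.

```python
def mean_score(scores):
    """Arithmetic mean of a non-empty list of 0/1 scores."""
    return sum(scores) / len(scores)


# 1 = true positive, 0 = false positive (only positive predictions are scored)
positive_only = [1, 0, 1, 0]
tp = sum(positive_only)
fp = len(positive_only) - tp
precision = tp / (tp + fp)

# Two extra spans (true/false negatives) also scored 0: the mean drops,
# but precision is unchanged because negatives are not part of TP + FP
with_negatives = positive_only + [0, 0]

print(precision, mean_score(positive_only), mean_score(with_negatives))
```

If the logged evaluation mixes in negative predictions, keep them out of the score column (or log them under a separate evaluation name) so that an averaged score still reads as precision.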
For more details, refer to the Phoenix documentation.
Would you like more detail on any specific part?
Sources: - GitHub Issue #1163 - Phoenix documentation
💡 Hint: Mention RunLLM in the thread for followups.
To send the precision calculation to Phoenix, you need to log the precision metric as part of your evaluations. Phoenix can then display this metric in the UI. Here's how you can do it:
1. Calculate Precision: Calculate the precision based on your true positive and false positive labels.
2. Log Precision Metric: Log the precision metric to Phoenix using the log_evaluations method.
Step-by-Step Guide
**Step 1: Calculate Precision**
First, calculate the precision based on your evaluation data.
```
import pandas as pd

# Example evaluation data; the span IDs are placeholders for your own span IDs (strings)
data = {
    'span_id': ['span-id-1', 'span-id-2', 'span-id-3', 'span-id-4'],
    'label': ['true_positive', 'false_positive', 'true_positive', 'false_positive'],
}
evals_df = pd.DataFrame(data)

# Convert labels to binary scores
evals_df['score'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)

# Every row is a true or false positive, so the row count equals TP + FP
true_positives = evals_df['score'].sum()
total_predictions = len(evals_df)
precision = true_positives / total_predictions
print(f"Precision: {precision}")
```
**Step 2: Log Precision Metric**
Log the precision metric to Phoenix using the `log_evaluations` method.
```
from phoenix.trace import SpanEvaluations
import phoenix as px

# Create a DataFrame for the precision metric.
# Precision is a single value, so it is attached to one placeholder span;
# replace the ID with a real span ID (a string) from your project.
precision_df = pd.DataFrame({
    'span_id': ['span-id-1'],
    'score': [precision],  # the value goes in the 'score' column
})

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Precision", dataframe=precision_df))
```
**Step 3: Display in UI**
Phoenix displays the evaluation values that you log, so the precision appears in the UI only after it has been logged as in Step 2. Phoenix does not calculate precision from the labels automatically.
For more details, refer to the Phoenix documentation.
Would you like more detail on any specific part?
Sources: - Phoenix documentation
To display your own calculated global metrics like precision and specificity in the Phoenix UI, you can follow these steps:
1. Mark Spans: Mark your spans as true positive, false positive, true negative, or false negative.
2. Calculate Metrics: Calculate the desired metrics (precision, specificity) based on these labels.
3. Log Metrics: Log these calculated metrics to Phoenix using a dummy span ID or a custom evaluation method.
Step-by-Step Guide
**Step 1: Mark Spans**
Create a DataFrame with your span evaluations, marking them as true positive, false positive, true negative, or false negative.
```
import pandas as pd

# Example evaluation data; the span IDs are placeholders for your own span IDs (strings)
data = {
    'span_id': ['span-id-1', 'span-id-2', 'span-id-3', 'span-id-4', 'span-id-5', 'span-id-6'],
    'label': ['true_positive', 'false_positive', 'true_negative', 'false_negative', 'true_positive', 'false_positive'],
}
evals_df = pd.DataFrame(data)

# One indicator column per outcome
evals_df['tp'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)
evals_df['fp'] = evals_df['label'].apply(lambda x: 1 if x == 'false_positive' else 0)
evals_df['tn'] = evals_df['label'].apply(lambda x: 1 if x == 'true_negative' else 0)
evals_df['fn'] = evals_df['label'].apply(lambda x: 1 if x == 'false_negative' else 0)
```
**Step 2: Calculate Metrics**
Calculate precision and specificity based on the labels.
```
# Calculate precision: TP / (TP + FP)
true_positives = evals_df['tp'].sum()
false_positives = evals_df['fp'].sum()
precision = true_positives / (true_positives + false_positives)
# Calculate specificity: TN / (TN + FP)
true_negatives = evals_df['tn'].sum()
false_negatives = evals_df['fn'].sum()  # counted for completeness; not used by these two metrics
specificity = true_negatives / (true_negatives + false_positives)
print(f"Precision: {precision}")
print(f"Specificity: {specificity}")
```
**Step 3: Log Metrics**
Log these calculated metrics to Phoenix using a dummy span ID.
```
from phoenix.trace import SpanEvaluations
import phoenix as px

client = px.Client()
# These are global metrics, so each one is attached to a single placeholder span;
# replace the ID with a real span ID (a string) from your project.
# Each metric is logged under its own eval_name, with its value in the 'score' column.
for name, value in {'Precision': precision, 'Specificity': specificity}.items():
    metric_df = pd.DataFrame({'span_id': ['span-id-1'], 'score': [value]})
    client.log_evaluations(SpanEvaluations(eval_name=name, dataframe=metric_df))
```
**Step 4: Display in UI**
Ensure that the precision and specificity metrics are included in the evaluations logged to Phoenix. Phoenix will automatically display these metrics if they are part of the logged evaluations.
For more details, refer to the [Phoenix documentation](https://docs.arize.com/phoenix/tracing/how-to-tracing/llm-evaluations).
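As a complement to the pandas steps above, here is a dependency-free sketch that derives the metrics straight from the label strings and avoids a division by zero when a denominator is empty. The helper names `safe_ratio` and `confusion_metrics` are illustrative assumptions, and recall is included only because it is the metric that uses the false-negative count.

```python
from collections import Counter


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def confusion_metrics(labels):
    """Compute precision, recall, and specificity from outcome labels."""
    counts = Counter(labels)  # missing labels count as 0
    tp, fp = counts["true_positive"], counts["false_positive"]
    tn, fn = counts["true_negative"], counts["false_negative"]
    return {
        "precision": safe_ratio(tp, tp + fp),
        "recall": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


labels = [
    "true_positive", "false_positive", "true_negative",
    "false_negative", "true_positive", "false_positive",
]
metrics = confusion_metrics(labels)
print(metrics)
```

A `None` value signals that the metric is undefined for the given labels, which is easier to spot than a NaN or an exception when the result is later written to a score column.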
Would you like more detail on any specific part?
**Sources:**
- [Phoenix documentation](https://docs.arize.com/phoenix/tracing/how-to-tracing/llm-evaluations)
Hi Alicja K., sorry for the late response. As far as I understand, we don't currently have an evaluation summary that would calculate precision automatically. The closest option may be an enclosing trace, with the precision computed manually over all the spans contained within that trace. What kind of grouping for computing precision would you find most helpful (trace, project, etc.)?
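To make the grouping question concrete, here is a rough sketch of computing precision manually per group. It is plain Python and not a Phoenix feature: the `(trace_id, label)` pairs, the trace names, and the helper `precision_by_group` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (trace_id, label) pairs, one per evaluated span
span_labels = [
    ("trace-a", "true_positive"),
    ("trace-a", "false_positive"),
    ("trace-a", "true_positive"),
    ("trace-b", "false_positive"),
    ("trace-b", "true_negative"),
]


def precision_by_group(pairs):
    """Return {group: TP / (TP + FP)}, with None where a group has no positives."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0})
    for group, label in pairs:
        group_counts = counts[group]  # registers the group even if it has no positives
        if label == "true_positive":
            group_counts["tp"] += 1
        elif label == "false_positive":
            group_counts["fp"] += 1
    return {
        group: (c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else None)
        for group, c in counts.items()
    }


per_trace = precision_by_group(span_labels)
print(per_trace)
```

Grouping by project instead of by trace would only change the key: pass the project name as the first element of each pair.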
