If I mark my spans as false positives/true positives and log them to Phoenix with log_evaluations, how can I display precision as a metric at the top of the UI?
To display precision as a metric on top of the Phoenix UI, you need to log your evaluations and then ensure that the precision metric is calculated and displayed. Here's a concise breakdown:

1. Log Evaluations: Use the log_evaluations method to log your span evaluations, marking them as true positives or false positives.
2. Calculate Precision: Precision is calculated as the number of true positives divided by the sum of true positives and false positives. You can use a custom evaluation function to calculate this metric.
3. Display in UI: Ensure that the precision metric is included in the evaluations logged to Phoenix, so it can be displayed in the UI.

Here's a step-by-step approach:

Step 1: Log Evaluations

```python
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Example DataFrame with span evaluations
data = {
    'span_id': [1, 2, 3, 4],
    'label': ['true_positive', 'false_positive', 'true_positive', 'false_positive']
}
evals_df = pd.DataFrame(data)

# Convert labels to binary scores
evals_df['score'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)

# SpanEvaluations expects the dataframe to be indexed by span_id
evals_df = evals_df.set_index('span_id')

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Precision Eval", dataframe=evals_df))
```

Step 2: Calculate Precision

```python
def calculate_precision(evals_df):
    # With only true/false positives labeled, precision is TP / (TP + FP),
    # which here equals the share of rows scored 1
    true_positives = evals_df['score'].sum()
    total_predictions = len(evals_df)
    return true_positives / total_predictions

precision = calculate_precision(evals_df)
print(f"Precision: {precision}")
```

Step 3: Display in UI
Ensure that the precision metric is included in the evaluations logged to Phoenix. Phoenix will automatically calculate and display the precision metric if it is part of the logged evaluations.

For more details, refer to the Phoenix documentation. Would you like more detail on any specific part?

Sources:
- GitHub Issue #1163
- Phoenix documentation
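As a sanity check, the evaluation dataframe from Step 1 can be built and inspected with pandas alone, no Phoenix server required. This sketch assumes (as the examples above do) that SpanEvaluations wants a span_id index plus label/score columns; the span IDs here are made up, as real ones come from your traces:

```python
import pandas as pd

# Hypothetical span annotations; real span IDs come from your traces
evals_df = pd.DataFrame({
    "span_id": ["a1", "b2", "c3", "d4"],
    "label": ["true_positive", "false_positive", "true_positive", "false_positive"],
})

# Vectorized alternative to the apply() used above
evals_df["score"] = (evals_df["label"] == "true_positive").astype(int)
evals_df = evals_df.set_index("span_id")

print(evals_df["score"].tolist())  # [1, 0, 1, 0]
print(evals_df.index.name)         # span_id
```

The boolean comparison plus `astype(int)` does the same work as the lambda, just in one vectorized step.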
💡 Hint: Mention RunLLM in the thread for followups.
To send the precision calculation to Phoenix, you need to log the precision metric as part of your evaluations. Phoenix can then display this metric in the UI. Here's how you can do it:

1. Calculate Precision: Calculate the precision based on your true positive and false positive labels.
2. Log Precision Metric: Log the precision metric to Phoenix using the log_evaluations method.

Step-by-Step Guide

Step 1: Calculate Precision
First, calculate the precision based on your evaluation data.

```python
import pandas as pd

# Example DataFrame with span evaluations
data = {
    'span_id': [1, 2, 3, 4],
    'label': ['true_positive', 'false_positive', 'true_positive', 'false_positive']
}
evals_df = pd.DataFrame(data)

# Convert labels to binary scores
evals_df['score'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)

# Calculate precision: TP / (TP + FP); with only these two labels,
# that is the share of rows scored 1
true_positives = evals_df['score'].sum()
total_predictions = len(evals_df)
precision = true_positives / total_predictions
print(f"Precision: {precision}")
```

Step 2: Log Precision Metric
Log the precision metric to Phoenix using the log_evaluations method.

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# Create a single-row DataFrame for the precision metric.
# SpanEvaluations expects a span_id index and a 'score' column.
precision_df = pd.DataFrame({
    'span_id': [1],  # Dummy span_id, as precision is a single global value
    'score': [precision]
}).set_index('span_id')

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Precision", dataframe=precision_df))
```

Step 3: Display in UI
Ensure that the precision metric is included in the evaluations logged to Phoenix. Phoenix will automatically calculate and display the precision metric if it is part of the logged evaluations.

For more details, refer to the Phoenix documentation. Would you like more detail on any specific part?

Sources:
- Phoenix documentation
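One edge case worth guarding before logging: if a batch happens to contain no positive predictions at all, the division above raises ZeroDivisionError. A hedged sketch of a safer helper (plain Python; the function name is illustrative, not a Phoenix API):

```python
def safe_precision(labels):
    """Precision over 'true_positive'/'false_positive' labels.

    Returns None when there are no positive predictions at all,
    rather than dividing by zero.
    """
    tp = sum(1 for label in labels if label == "true_positive")
    fp = sum(1 for label in labels if label == "false_positive")
    if tp + fp == 0:
        return None
    return tp / (tp + fp)

print(safe_precision(
    ["true_positive", "false_positive", "true_positive", "false_positive"]
))  # 0.5
print(safe_precision([]))  # None
```

Returning None (rather than 0) makes "no data" distinguishable from "all predictions wrong" when you later log the value.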
To display your own calculated global metrics like precision and specificity in the Phoenix UI, you can follow these steps:

1. Mark Spans: Mark your spans as true positive, false positive, true negative, or false negative.
2. Calculate Metrics: Calculate the desired metrics (precision, specificity) based on these labels.
3. Log Metrics: Log these calculated metrics to Phoenix using a dummy span ID or a custom evaluation method.

Step-by-Step Guide

Step 1: Mark Spans
Create a DataFrame with your span evaluations, marking them as true positive, false positive, true negative, or false negative.

```python
import pandas as pd

# Example DataFrame with span evaluations
data = {
    'span_id': [1, 2, 3, 4, 5, 6],
    'label': ['true_positive', 'false_positive', 'true_negative',
              'false_negative', 'true_positive', 'false_positive']
}
evals_df = pd.DataFrame(data)

# Convert labels to indicator columns for precision and specificity calculations
evals_df['tp'] = evals_df['label'].apply(lambda x: 1 if x == 'true_positive' else 0)
evals_df['fp'] = evals_df['label'].apply(lambda x: 1 if x == 'false_positive' else 0)
evals_df['tn'] = evals_df['label'].apply(lambda x: 1 if x == 'true_negative' else 0)
evals_df['fn'] = evals_df['label'].apply(lambda x: 1 if x == 'false_negative' else 0)
```

Step 2: Calculate Metrics
Calculate precision and specificity based on the labels.

```python
# Precision: TP / (TP + FP)
true_positives = evals_df['tp'].sum()
false_positives = evals_df['fp'].sum()
precision = true_positives / (true_positives + false_positives)

# Specificity: TN / (TN + FP)
true_negatives = evals_df['tn'].sum()
specificity = true_negatives / (true_negatives + false_positives)

print(f"Precision: {precision}")
print(f"Specificity: {specificity}")
```

Step 3: Log Metrics
Log these calculated metrics to Phoenix using a dummy span ID. Since SpanEvaluations expects a span_id index and a 'score' column, log each metric as its own evaluation.

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# One single-row DataFrame per metric, each with a 'score' column
precision_df = pd.DataFrame(
    {'span_id': [1], 'score': [precision]}  # Dummy span_id; this is a global metric
).set_index('span_id')
specificity_df = pd.DataFrame(
    {'span_id': [1], 'score': [specificity]}
).set_index('span_id')

# Log evaluations to Phoenix
client = px.Client()
client.log_evaluations(
    SpanEvaluations(eval_name="Precision", dataframe=precision_df),
    SpanEvaluations(eval_name="Specificity", dataframe=specificity_df),
)
```

Step 4: Display in UI
Ensure that the precision and specificity metrics are included in the evaluations logged to Phoenix. Phoenix will automatically display these metrics if they are part of the logged evaluations.

For more details, refer to the Phoenix documentation. Would you like more detail on any specific part?

Sources:
- Phoenix documentation
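The per-metric logging in Step 3 generalizes naturally: given a dict of global metrics, build one single-row, span_id-indexed frame per metric. This is a pure-pandas sketch (the helper name and dummy span ID are illustrative, not Phoenix API), producing frames shaped the way the examples above assume SpanEvaluations wants them:

```python
import pandas as pd

def metric_frames(metrics, dummy_span_id="global"):
    """Build one single-row DataFrame per metric, with a span_id
    index and a 'score' column, ready to wrap in SpanEvaluations."""
    frames = {}
    for name, value in metrics.items():
        frames[name] = pd.DataFrame(
            {"span_id": [dummy_span_id], "score": [value]}
        ).set_index("span_id")
    return frames

frames = metric_frames({"Precision": 0.5, "Specificity": 0.75})
print(sorted(frames))                         # ['Precision', 'Specificity']
print(frames["Precision"]["score"].iloc[0])   # 0.5
```

Each frame would then become its own `SpanEvaluations(eval_name=name, dataframe=frame)` call, keeping one metric per named evaluation.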
Hi Alicja K., sorry for the late response. As far as I understand, we don't currently have an evaluation summary that would calculate precision automatically. Maybe the closest thing is, if there's an enclosing trace, computing the precision manually over all spans contained within that trace. What kind of grouping for computing the precision would you find most helpful (trace, project, etc.)?
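The per-trace grouping suggested above can already be computed client-side with pandas: group the span annotations by trace_id and take the mean of the binary scores, which, with only TP/FP labels, is exactly per-trace precision. A sketch with made-up IDs:

```python
import pandas as pd

# Hypothetical span annotations; trace/span IDs are illustrative
evals_df = pd.DataFrame({
    "trace_id": ["t1", "t1", "t1", "t2", "t2"],
    "span_id": ["a", "b", "c", "d", "e"],
    "label": ["true_positive", "true_positive", "false_positive",
              "true_positive", "false_positive"],
})
evals_df["score"] = (evals_df["label"] == "true_positive").astype(int)

# Per-trace precision: mean of the TP indicator within each trace
per_trace = evals_df.groupby("trace_id")["score"].mean()
print(per_trace.round(2).to_dict())  # {'t1': 0.67, 't2': 0.5}
```

Swapping `"trace_id"` for a project-level key would give the project grouping mentioned above; the groupby is the same either way.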
