Hi, I am checking the docs. Where can I see the latest version? For example: https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/hallucinations
It seems likely that the latest documentation for Phoenix, including sections on running pre-tested evaluations such as hallucinations, can be found on the Arize Phoenix documentation site. That page provides detailed guidance on using the hallucination evaluation template to detect whether a model's output is a hallucination given the contextual data. Would you like more detail on how to navigate or use specific features in the documentation? Sources: - Arize Phoenix documentation site
Yes, I would like to define a custom evaluator myself. I define the 'TEMPLATE' and then try to use 'run_evals', but I get this error:

Exception in worker on attempt 1: raised KeyError('input') Requeuing...
Exception in worker on attempt 2: raised KeyError('input') Requeuing...
Exception in worker on attempt 3: raised KeyError('input') Requeuing...
Exception in worker on attempt 4: raised KeyError('input') Requeuing...
Exception in worker on attempt 5: raised KeyError('input') Requeuing...
Exception in worker on attempt 6: raised KeyError('input') Requeuing...
Exception in worker on attempt 7: raised KeyError('input') Requeuing...
Exception in worker on attempt 8: raised KeyError('input') Requeuing...
Exception in worker on attempt 9: raised KeyError('input') Requeuing...
Exception in worker on attempt 10: raised KeyError('input') Requeuing...
Retries exhausted after 11 attempts: 'input'
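A minimal sketch (plain Python, no Phoenix dependency; `render` and `TEMPLATE` are hypothetical names, not the Phoenix API) of why this kind of KeyError can surface: eval runners substitute dataframe columns into the template's {placeholders}, so every placeholder must match a column name exactly.

```python
# Hypothetical sketch: templates are filled from a row's column values,
# so a placeholder with no matching column raises KeyError at format time.
TEMPLATE = "Examine the text: {input}\nAnswer VALID or INVALID."

def render(template: str, row: dict) -> str:
    """Substitute row values into the template; a missing key raises KeyError."""
    return template.format(**row)

good_row = {"input": "hola"}
print(render(TEMPLATE, good_row))  # formats cleanly

bad_row = {"text": "hola"}  # column name does not match the {input} placeholder
try:
    render(TEMPLATE, bad_row)
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'input'
```

In other words, the error usually means the runner could not find a value named `input` for some row it tried to render, even if a column by that name appears to exist.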
Yes, the dataframe includes the input column:

Evaluation DataFrame:
input
0  assistant: {\n"message": "Mucho gusto, Mark. ¿...
GUIDELINES_PROMPT_TEMPLATE = """You are examining written text content. Here is the text:
[BEGIN DATA]
************
[Text]: {input}
************
[END DATA]

Examine the text and determine whether it is VALID or INVALID. A response is considered VALID if it meets these requirements:

1. Format Check:
- Contains all required JSON fields: "message", "extracted_answers", "remaining_questions"
- JSON is properly formatted

2. Content Check:
- Message is in Spanish
- Message is polite and professional
- Message asks about missing information

3. Data Handling:
- Extracts any provided information correctly
- Only includes information that was actually given
- Lists remaining unanswered questions

Your response must be a single word, either "VALID" or "INVALID", and should not contain any text or characters aside from that word. "VALID" means that the text meets all requirements. "INVALID" means the text fails to meet one or more requirements."""
I run two evaluators; the Evaluation DataFrame passes the first one but fails on the second (the custom one).
As I mentioned above, 'GUIDELINES_PROMPT_TEMPLATE' includes '{input}'.
Ok, I will try it, thank you so much.
So, what's the difference between 'llm_classify' and 'run_evals'?
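For intuition, here is a conceptual sketch in plain Python (this is NOT the Phoenix API; `classify`, `run_evals_like`, and the judge functions are made-up names): a `llm_classify`-style call runs one template over a dataframe and returns one set of labels, while a `run_evals`-style call fans a list of evaluators out over the same data and collects one result set per evaluator.

```python
# Conceptual illustration only: one template vs. a batch of evaluators.
from typing import Callable

Row = dict

def classify(rows: list, template: str, judge: Callable) -> list:
    """llm_classify-style: one template, one list of labels back."""
    return [judge(template.format(**row)) for row in rows]

def run_evals_like(rows: list, evaluators: list) -> list:
    """run_evals-style: many (template, judge) pairs, one label list per pair."""
    return [classify(rows, template, judge) for template, judge in evaluators]

rows = [{"input": "hola"}, {"input": "hello"}]
spanish = lambda prompt: "VALID" if "hola" in prompt else "INVALID"
print(run_evals_like(rows, [("Text: {input}", spanish)]))
# [['VALID', 'INVALID']]
```

The trade-off this sketches: the single-template style is simpler when you have one question to ask per row, while the evaluator-list style lets several independent checks share one pass over the data.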
