Hi folks, have you experimented with using the LLM response logit distribution to get insights into model behaviour with respect to biases, hallucinations, and other undesired effects? I understand this applies only to open-source models where you have access to the logits. I'm curious whether this direction has potential from your perspective. Let me know if you've done some research, and feel free to link relevant papers if any!
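For anyone reading along, here's a minimal sketch of the kind of logit-level signal being asked about: per-token entropy of the next-token distribution, often used as a rough uncertainty proxy. The logits here are toy numbers, not from any real model; with an open-source model you'd plug in the raw logits it returns per generation step.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution
    implied by a vector of raw logits."""
    # Softmax with max-subtraction for numerical stability
    z = logits - np.max(logits)
    p = np.exp(z) / np.sum(np.exp(z))
    # Small epsilon avoids log(0) for near-zero probabilities
    return float(-np.sum(p * np.log(p + 1e-12)))

# Toy comparison: a confident vs. an uncertain prediction
confident = np.array([10.0, 0.0, 0.0, 0.0])  # mass piled on one token
uncertain = np.zeros(4)                      # uniform over 4 tokens
print(token_entropy(confident) < token_entropy(uncertain))  # True
```

Spikes in per-token entropy (or low max-probability) over a generated span are one heuristic people use to flag tokens where the model may be "guessing", which is one angle on hallucination detection.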
WRT Phoenix evals, we are currently focused on evaluating how good LLMs are as judges - maybe not so much on model evals or model interpretability (where I think Anthropic is leading the charge). So our focus right now is more on human alignment of LLMs with regard to your custom task. https://arxiv.org/abs/2406.18403 We are working with some other frameworks to help you "compile" LLM judges based on human feedback (think DSPy optimizers). I think John might be planning some research on fine-tuned smaller models or using Prometheus. Basically making evals more accurate, aligned, cheaper, and easier to construct. Hope that helps.
This helps, thanks Mikyo!
Ilya B. to Mikyo's point, we're doing some exploration into fine-tuned smaller models vs. SLMs for evaluating responses, but still using the LLM-as-a-judge approach. We're hoping to add some fun augmentation via human-labeled data to further improve performance, but that's the general direction we're moving in.
