Hi folks, have you experimented with using the LLM response logit distribution to get insights into model behaviour with respect to biases, hallucinations, and other undesired effects? I understand this applies only to open-source models where you have access to the logits. I'm curious whether this direction has potential from your perspective. Let me know if you've done some research, and feel free to link relevant papers if any!
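For anyone reading along, here's a minimal sketch of the kind of logit-level signal being asked about: per-token entropy of the next-token distribution, often used as a rough uncertainty proxy. The logits here are toy numbers, not from any real model; with an open-source model you'd plug in the raw logits it returns per generation step.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution
    implied by a vector of raw logits."""
    # Softmax with max-subtraction for numerical stability
    z = logits - np.max(logits)
    p = np.exp(z) / np.sum(np.exp(z))
    # Small epsilon avoids log(0) for near-zero probabilities
    return float(-np.sum(p * np.log(p + 1e-12)))

# Toy comparison: a confident vs. an uncertain prediction
confident = np.array([10.0, 0.0, 0.0, 0.0])  # mass piled on one token
uncertain = np.zeros(4)                      # uniform over 4 tokens
print(token_entropy(confident) < token_entropy(uncertain))  # True
```

Spikes in per-token entropy (or low max-probability) over a generated span are one heuristic people use to flag tokens where the model may be "guessing", which is one angle on hallucination detection.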
WRT Phoenix evals, we are currently focused on evaluating how good LLMs are as judges - maybe not so much on model evals or model interpretability (where I think Anthropic is leading the charge). So our focus right now is more on human alignment of LLMs with regard to your custom task. https://arxiv.org/abs/2406.18403 We are working with some other frameworks to help you "compile" LLM judges based on human feedback (think DSPy optimizers). I think John might be planning some research on fine-tuned smaller models or using Prometheus. Basically making evals more accurate, aligned, cheaper, and easier to construct. Hope that helps.
This helps, thanks Mikyo!
Ilya B. to Mikyo's point, we're doing some exploration into fine-tuned smaller models vs. SLMs for evaluating responses, but still using the LLM-as-a-judge approach. We're hoping to add some fun augmentation via human-labeled data to further improve performance, but that's the general direction we're moving in.
