Phoenix Support

Current Limitations in Token Influence Evaluation for Generation

Mikyo

·Mar 12, 2024 05:32 PM

Vincent A. Interesting question. We do have an evaluation library, but it's focused on response and retrieval evaluations. As for figuring out which tokens in the prompt are influencing the generation, I've seen some research on that which I could dig up, but unfortunately we don't have that capability right now.

4 comments

· Sorted by Oldest

Vincent A.
·
Thanks a lot for the response. It would be great if you could share any resources that would help with this.
Mikyo
·
Vincent A. I'm not sure it will work for your use case with fine-tuning, but importance estimation via embeddings is what I had in mind. https://www.watchful.io/blog/a-surprisingly-effective-way-to-estimate-token-importance-in-llm-prompts
Vincent A.
·
I'll look into it. Thank you.
🙌1
Mikyo
·
This also might be worth a look https://github.com/openai/transformer-debugger
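
For anyone reading along, here is a rough sketch of one way the embedding-based importance idea mentioned above could look: drop each token from the prompt in turn, re-embed the shortened prompt, and treat the cosine distance from the original embedding as that token's importance. This is an illustration under assumptions, not Phoenix functionality and not necessarily the exact method in the linked post. `toy_embed` is a hypothetical stand-in (a hashed bag of words) so the example runs without a model; in practice you would pass in a real text-embedding function, and `token_importance` and `cosine_distance` are names made up for this sketch.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def token_importance(prompt, embed=toy_embed):
    """Score each whitespace-separated token by how far the prompt
    embedding moves when that token is removed (leave-one-out ablation)."""
    tokens = prompt.split()
    base = embed(prompt)
    scores = []
    for i, tok in enumerate(tokens):
        ablated = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append((tok, cosine_distance(base, embed(ablated))))
    return scores


if __name__ == "__main__":
    for tok, score in token_importance("summarize the quarterly sales report"):
        print(f"{tok:>12}  {score:.4f}")
```

Note that this measures influence on the prompt's embedding, not directly on the generated output, so treat the scores as a cheap proxy. Splitting on whitespace is also a simplification; a real setup would likely use the model's own tokenizer.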
