Vincent A. Interesting question. We do have an evaluation library, but it's focused on response and retrieval evaluations. As for figuring out which tokens in the prompt are influencing the generation, I've seen some research on that which I could dig up, but unfortunately we don't have that capability right now.