DSPy does a lot of caching, but it's a bit confusing when cached calls don't show up — you only see half the requests in Phoenix
Thomas, it's that we track actual calls to the LLM, not the DSPy cache, so yeah, that data would be missing right now. Would you be using the same cache as the DSPy cache?
I normally use lm.inspect_history(n=1) to see the last message to/from the LM
That's what I would love to replace with a UI tool like Arize
inspect_history will show me the last prompt/response, even if it was loaded from the cache
E.g., right now I have a weird issue (in my program) where the LM is putting nonsensical SQL into what should be a chain of thought. So I want to see how I accidentally made a bad prompt or few-shot example. But because I've already run it once, it's cached, so it doesn't show up in Arize. I guess I could change a small character in the prompt or delete the cache. But it would be nice if it just showed up in Arize normally. Maybe with a small [cached] tag.
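In case it helps, a rough sketch of the cache-busting workaround without editing the prompt. This is untested — the `cache` flag on `dspy.LM` assumes a recent DSPy, the model name is just a placeholder, and older DSPy versions instead read the `DSP_CACHEDIR` environment variable:

```python
import dspy

# Option 1 (recent DSPy): disable caching for this LM entirely,
# so every call actually hits the provider and shows up in the tracer.
lm = dspy.LM("openai/gpt-4o-mini", cache=False)  # model name is illustrative
dspy.configure(lm=lm)

# Option 2 (older DSPy): point the on-disk cache at a fresh directory
# *before* importing dspy, e.g. in your shell:
#   export DSP_CACHEDIR=/tmp/dspy-fresh-cache
```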
If so, that requires some thought on how to display that information. My initial thought is that we should reserve LLM spans for when an LLM is actually invoked. We could perhaps add a generic "chain" span to represent a cache hit with the prompt and response info you are looking for.
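A toy sketch of that idea — a caching wrapper whose tracer emits a real LLM span on a miss, and a generic "chain"-style span with a cached flag on a hit, so the prompt/response still reach the UI. All names here are hypothetical; this is not the Phoenix or DSPy API:

```python
spans = []  # stand-in for a trace exporter


class CachingLM:
    """Wraps a completion function and records one span per call."""

    def __init__(self, backend):
        self.backend = backend  # the real completion function
        self.cache = {}

    def __call__(self, prompt):
        if prompt in self.cache:
            # Cache hit: no LLM is actually invoked, so instead of an
            # LLM span, emit a generic "chain" span carrying the
            # prompt/response, tagged so the UI can show a [cached] badge.
            response = self.cache[prompt]
            spans.append({"kind": "chain", "cached": True,
                          "prompt": prompt, "response": response})
        else:
            response = self.backend(prompt)
            self.cache[prompt] = response
            spans.append({"kind": "llm", "cached": False,
                          "prompt": prompt, "response": response})
        return response


lm = CachingLM(lambda p: f"echo:{p}")
lm("hello")  # real call -> "llm" span
lm("hello")  # cache hit -> "chain" span tagged cached
```

With this split, filtering the trace view to `kind == "llm"` still shows only real invocations, while the full trace keeps every prompt/response pair visible.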
