Hi guys, I'm working on a LlamaIndex agent and I've exposed two endpoints for chatting with it: one for a buffered response and another for a streamed response. The endpoints are identical apart from how they return their responses. In Phoenix, though, they're displayed differently: the streaming endpoint doesn't group the spans the way the buffered one does, and token counts aren't displayed for the streaming LLM calls either.
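(For context, here's a minimal sketch of the difference between the two endpoints. `MockAgent` is an illustrative stand-in, not my actual code; only the `chat` / `stream_chat` split mirrors llama-index's API.)

```python
# Illustrative sketch only: MockAgent stands in for the real LlamaIndex agent.
from typing import Iterator


class MockAgent:
    def chat(self, message: str) -> str:
        # Buffered endpoint: the whole completion is assembled before returning.
        return "".join(self._generate(message))

    def stream_chat(self, message: str) -> Iterator[str]:
        # Streaming endpoint: tokens are yielded one by one; the caller
        # decides when (or whether) to consume them.
        yield from self._generate(message)

    def _generate(self, message: str) -> Iterator[str]:
        for token in ["Echo", ": ", message]:
            yield token


agent = MockAgent()
buffered = agent.chat("hello")
streamed = "".join(agent.stream_chat("hello"))
assert buffered == streamed  # same content, different delivery
```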
Hi Arthur M., thanks so much for trying out Phoenix! LlamaIndex recently changed their instrumentation interfaces, and we're currently working to get everything sorted out. Grouping spans for streaming calls can be tricky, especially since an LLM streaming response can be consumed at any time, or even left unconsumed. As for token counts: not all of the libraries we instrument expose token counts for LLM calls in every case, though we do our best to surface that information to you. If you find that LlamaIndex returns token counts for streams and we still don't surface them, please let us know and we'll get that information into our tracing as soon as possible! We're also actively considering adding token counting as a separate feature, independent of the libraries we instrument.
Looking at the llama-index source code, this is caused by stream_chat missing a step that bundles the child spans into the same trace (see the arrows in the screenshot below).
Looks like this was fixed over the weekend: https://github.com/run-llama/llama_index/pull/12189
