Hi guys, having a lot of phun ( 😉 ) using Phoenix in our dev workflow. We've been attempting to set up a longitudinal framework that shows us the performance of evals across multiple experiments. This doesn't seem to be supported in the self-hosted Docker version. Is this a premium feature?
Hey Damith S., lots of improvements to experiments are coming down the pipeline: https://github.com/Arize-ai/phoenix/issues/3398 Time series and histograms will be part of that upcoming work. Phoenix doesn't have any feature gates, so these will be available as we ship them! In the meantime, you can download and pull experiments via the API, which can be used to accomplish this right now. Please let us know on the issue if you have any additional ideas and we can prioritize them :)
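To make the "pull experiments via the API" suggestion concrete, here's a minimal sketch of building a longitudinal view by hand. The endpoint path, base URL, and JSON field names (`created_at`, `eval_scores`) are assumptions for illustration only; check the REST API docs on your self-hosted instance for the actual schema.

```python
# Sketch: fetch experiment results from a self-hosted Phoenix instance and
# aggregate eval scores across experiments into a longitudinal series.
# NOTE: the endpoint path and the response fields used below are assumptions;
# adapt them to the schema your Phoenix version actually exposes.
import json
import urllib.request
from collections import defaultdict

PHOENIX_URL = "http://localhost:6006"  # hypothetical self-hosted address


def fetch_experiment(experiment_id: str) -> dict:
    """Fetch one experiment as JSON (endpoint path is an assumption)."""
    url = f"{PHOENIX_URL}/v1/experiments/{experiment_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def longitudinal_scores(experiments: list[dict]) -> dict[str, list[tuple[str, float]]]:
    """Group mean eval scores by eval name, ordered by experiment timestamp.

    Each experiment dict is assumed to carry a `created_at` timestamp and an
    `eval_scores` mapping of eval name -> list of per-example scores.
    """
    series: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for exp in sorted(experiments, key=lambda e: e["created_at"]):
        for name, scores in exp["eval_scores"].items():
            mean = sum(scores) / len(scores)
            series[name].append((exp["created_at"], mean))
    return dict(series)
```

From there, `longitudinal_scores` output drops straight into whatever plotting tool you like (one line per eval name, timestamps on the x-axis) until the built-in time-series views land.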
Thanks Mikyo (sorry for the radio silence). Also, wondering if anyone has used Phoenix for voice evals, e.g. evaluating OpenAI's Realtime API.
Would be a really big plus to have
