Hi. Quick question on UMAP embeddings for the reference and primary data. UMAP is fit on the reference data, and the embeddings are generated and plotted. For the primary data, is the same UMAP model that was fit on the reference data then used to transform the primary data and generate its UMAP embeddings, or is a new UMAP fit for the selected primary days?
Hi Shery. The UMAP result is computed on the fly for each instance by fit-transforming the concatenation of the primary and reference data.
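A minimal sketch of the "concatenate, then fit_transform" approach, assuming the embeddings are NumPy arrays of shape (n, d). `joint_projection` is a hypothetical helper, not Phoenix's API; any reducer exposing `fit_transform` (for example `umap.UMAP` from umap-learn) can be passed in.

```python
import numpy as np


def joint_projection(primary: np.ndarray, reference: np.ndarray, reducer):
    """Fit the reducer on primary + reference together, then split the result.

    Hypothetical helper: because the reducer is refit on the combined data,
    the reference rows are re-projected every time the primary selection changes.
    """
    if primary.shape[1] != reference.shape[1]:
        raise ValueError("primary and reference embeddings must have the same dimension")
    combined = np.vstack([primary, reference])
    projected = np.asarray(reducer.fit_transform(combined))
    n_primary = primary.shape[0]
    # The first n_primary rows belong to the primary set, the rest to the reference set
    return projected[:n_primary], projected[n_primary:]
```

With umap-learn installed, `joint_projection(primary, reference, umap.UMAP(n_components=3))` would return 3-D coordinates for each set. For a fixed pipeline, the alternative raised in the question (fit once on the reference, then call `transform` on each new primary batch) keeps the reference coordinates stable across runs, at the cost of the projection not adapting to the new primary data.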
Hi Roger. So, as I understand it, the reference and primary data are concatenated, and then a UMAP model is fit-transformed on the entire concatenated data. So in this case, the reference data is being fit-transformed every time a new primary data selection is made in the timeseries plot? And the drift is then being calculated between the embeddings of the primary and the reference data?
the reference data is being fit-transformed every time a new primary data selection is made
that’s correct
the drift is then being calculated between the embeddings of the primary and the reference data
that’s correct. It’s separate from UMAP.
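A minimal sketch of a Euclidean drift score computed on the raw embedding vectors rather than the UMAP coordinates. It assumes the score is the distance between the mean (centroid) vectors of the primary and reference sets; that centroid assumption is ours, so check the `metrics.py` link later in this thread for what Phoenix actually computes. `euclidean_drift` is a hypothetical name.

```python
import numpy as np


def euclidean_drift(primary: np.ndarray, reference: np.ndarray) -> float:
    """Euclidean distance between the centroids of two embedding sets.

    Assumption: drift is measured between mean vectors. NaN entries are
    ignored when averaging each dimension.
    """
    primary_centroid = np.nanmean(primary, axis=0)
    reference_centroid = np.nanmean(reference, axis=0)
    # Same value as scipy.spatial.distance.euclidean(primary_centroid, reference_centroid)
    return float(np.linalg.norm(primary_centroid - reference_centroid))


primary = np.array([[0.0, 0.0], [2.0, 0.0]])    # centroid (1, 0)
reference = np.array([[4.0, 4.0], [4.0, 4.0]])  # centroid (4, 4)
print(euclidean_drift(primary, reference))      # 5.0
```

Because the score only depends on the embedding vectors, it does not change when UMAP is refit, which is consistent with drift being separate from the UMAP projection.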
thanks for that 🙂
I would like to dig into the code to understand the drift calculation better. Could you point me in the right direction?
are you referring to Euclidean drift specifically?
Yes, in the timeseries plot.
One challenge I’ve got is turning what I see in Phoenix into a working pipeline. The insights from Phoenix are great, but now I want to operationalise the UMAP and drift, so I’m trying to work out how best to do this.
Here’s something to get you started; let me know if you want more details.
The request hits the server starting here: https://github.com/Arize-ai/phoenix/blob/7f0278c79118c647511ce4013f14313fbbdf8d18/src/phoenix/server/api/types/EmbeddingDimension.py#L210
Then it goes to the timeseries function here: https://github.com/Arize-ai/phoenix/blob/7f0278c79118c647511ce4013f14313fbbdf8d18/src/phoenix/server/api/types/TimeSeries.py#L72
Note that the `metric` is taken from the enum value: https://github.com/Arize-ai/phoenix/blob/7f0278c79118c647511ce4013f14313fbbdf8d18/src/phoenix/server/api/types/TimeSeries.py#L132
There’s only one enum right now, and it’s defined here: https://github.com/Arize-ai/phoenix/blob/7f0278c79118c647511ce4013f14313fbbdf8d18/src/phoenix/server/api/types/VectorDriftMetricEnum.py#L10
The metric is just a wrapper for a scipy function: https://github.com/Arize-ai/phoenix/blob/7f0278c79118c647511ce4013f14313fbbdf8d18/src/phoenix/metrics/metrics.py#L158
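To reproduce a drift timeseries outside Phoenix, one option is to bin the primary records by time and score each bin against a fixed reference. This is a hypothetical sketch, not the logic in `TimeSeries.py`: the function name, the fixed-width windows, and the centroid-distance assumption are ours, and it uses NumPy only.

```python
import numpy as np


def drift_timeseries(timestamps, embeddings, reference, window):
    """Centroid Euclidean drift of primary embeddings vs. a fixed reference, per time window.

    timestamps: 1-D numeric array (e.g. epoch seconds), one entry per embedding row
    window: bin width, in the same units as timestamps
    Returns a list of (window_start, drift) for windows that contain data.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)
    reference_centroid = np.asarray(reference, dtype=float).mean(axis=0)
    start = timestamps.min()
    # Assign each record to a fixed-width bin anchored at the earliest timestamp
    bins = np.floor((timestamps - start) / window).astype(int)
    results = []
    for b in np.unique(bins):  # np.unique returns the bins in ascending order
        centroid = embeddings[bins == b].mean(axis=0)
        drift = float(np.linalg.norm(centroid - reference_centroid))
        results.append((float(start + b * window), drift))
    return results
```

Anchoring bins at the earliest timestamp is the simplest choice; a scheduled job would more likely align windows to calendar days or hours so that results from separate runs line up.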
Much appreciated, Roger... that will be a great help!
np!
