My quick guess is that only text prompts can currently be sent as input to evaluators. If I'm right, do you have any plans to support multimodality? (I could probably help.) 🙂
Hey Ilya! Is this for the same use case we worked on together a while back?
Hi guys, let me give some context. We use multimodal models to generate product tags from both text and images. Here is one of our blog posts about the project; it gives more background on what we do and why. The screenshot shows an image belonging to a given product class and how we extract tags like color, style, or theme. As input we pass an image and a text prompt, and as output we return a list of tag values, so the output has the same format we discussed with you before. We are now building automated evaluation pipelines; for now we plan to run them offline to test different prompts before shipping them to prod. I was wondering whether I can use arize-phoenix with Gemini vision models 🙂
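For reference, a minimal sketch of the kind of call involved, using the `google-generativeai` SDK. The function name `extract_tags`, the model name, the prompt text, and the comma-separated output parsing are my assumptions for illustration, not the exact production setup:

```python
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


def extract_tags(image_path: str, prompt: str) -> list[str]:
    """Send an image plus a text prompt to a Gemini vision model,
    then parse the comma-separated reply into a list of tag values."""
    model = genai.GenerativeModel("gemini-pro-vision")  # hypothetical model choice
    image = Image.open(image_path)
    response = model.generate_content([prompt, image])  # multimodal input: text + image
    return [tag.strip() for tag in response.text.split(",")]


# e.g. extract_tags("shirt.jpg",
#                   "List this product's color, style, and theme as comma-separated tags.")
```

An offline eval pipeline would then run this over a labeled dataset and compare the returned tag lists against ground truth, which is where evaluator support for image inputs would come in.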
