Hi! We're self-hosting a multimodal LLM on vLLM and are looking for an open-source LLM observability tool. Does Phoenix support multimodal image + text inputs, and can you hook it up to self-hosted LLMs?
Hi Tom, thanks for your interest in Phoenix! We currently do not support either evals or tracing for non-text inputs. While we do support self-hosted LLMs for running evals, we do not have auto-instrumentors for tracing them unless you are using them with either LangChain or LlamaIndex as an orchestration framework.
If you're using vLLM as an OpenAI-compatible server, you can try our OpenAI instrumentor on the openai client and see whether it captures the HTTP payloads the way you want. If not, we can enhance it for you.
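A minimal sketch of the setup Roger describes, assuming the openinference OpenAI instrumentor and a Phoenix instance are installed and running; the base URL, project name, and model name are placeholders, not details from this thread:

```python
# Point the standard OpenAI client at the self-hosted vLLM server,
# then let Phoenix's OpenAI instrumentor capture the payloads as traces.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

# Register a tracer provider that exports spans to a running Phoenix instance.
tracer_provider = register(project_name="vlm-classifier")  # placeholder project name
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# base_url is the vLLM OpenAI-compatible endpoint (placeholder address);
# vLLM typically ignores the API key, so any non-empty string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="my-llava-model",  # placeholder model name
    messages=[{"role": "user", "content": "Describe this image."}],
)
```

Because the instrumentor hooks the openai client itself, requests made through httpx directly would not be captured this way.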
Thanks Roger, I'll try this out 👍! Phoenix looks amazing for our use case and would be my first choice, but it's really important for us to be able to view the image inputs as well - is there anything I can do to bump up adding support for this on your roadmap? How much work would this be?
Hey Tom M. - good to see ya! I think we have to dogfood some of these multimodal use cases and see what the payload structure is. If it's base64-encoded small images, we may be able to add a way to opt in to capturing this data. Tom, which multimodal LLM are you using? Are you self-hosting? If so, are you using a Python client? I ask because our observability works on top of existing clients, so we'd probably have to scope it with regard to the tech stack you are using. Feel free to file a ticket so you can follow along as we scope it!
Hey again Mikyo! It's another team in my company working on this that I'm supporting, but from what I can see: yep, it's base64-encoded small images. We're using a customised LLaVA model that we've self-hosted with vLLM on Kubernetes. We're using a Python client - just plain httpx to send the requests; we aren't using any frameworks at the moment (although I think we're open to choosing one). I'll check in with the team to confirm and put this into a ticket, thanks!
Great! So the rest of our tech stack, other than python:
Database: Postgres
Monitoring and Logging: Datadog, Grafana, Prometheus
Containerization and Orchestration: Docker and Kubernetes (EKS)
Infrastructure: Hosted on AWS (use s3 for storage too), SQS for message queues
There are some other components, but nothing else that immediately comes to mind as relevant - let me know if there's anything else you're wondering about that I might have missed!
So with Phoenix we'd look to make use of the latest persistence features in version 4.0+, probably with Postgres. And we'd look to slot the tracing into our production API for classifying images for different types of harmful content, which we send to our visual LLM.
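For reference, Phoenix 4.x can be pointed at Postgres for persistence via an environment variable; a hedged sketch of the deployment config, with hostname and credentials as placeholders:

```shell
# Use Postgres instead of the default SQLite for Phoenix persistence
# (connection string values are placeholders).
export PHOENIX_SQL_DATABASE_URL="postgresql://phoenix:password@postgres-host:5432/phoenix"
python -m phoenix.server.main serve
```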
I think the main area we'd need to understand is what inference looks like for this visual LLM - e.g. what the IO payload is and how the visual image elements are sent / referenced (base64-encoded images? URLs?). Capturing the labeling would probably not be too difficult on our end. Sounds like you would be using custom instrumentation then, since it sounds like you are not using any wrappers to call your LLM. Do you have an evaluation strategy in place for this visual classifier?
We've self-hosted our model on vLLM; its API is OpenAI-compatible. Here's our client for reference - we send images as base64-encoded, and we're not using any wrappers, but we can easily use one if it makes things easier:
import base64

import filetype
import httpx


class VLMClient:
    def __init__(self, vlm_model: str = VLM_MODEL, vllm_url: str = VLLM_URL):
        self._vlm_model = vlm_model
        self._vllm_client = httpx.AsyncClient(base_url=vllm_url)
        if VLLM_HEALTHCHECK:
            wait_for_ready(
                server_url=vllm_url,
                wait_seconds=VLLM_READY_TIMEOUT,
                health_endpoint="health",
            )

    @property
    def vlm_model(self) -> str:
        return self._vlm_model

    async def __call__(
        self,
        prompt: str,
        image_bytes: bytes | None = None,
        image_filetype: filetype.Type | None = None,
        max_tokens: int = 10,
    ) -> str:
        # Assemble the message content
        message_content: list[dict[str, str | dict]] = [
            {
                "type": "text",
                "text": prompt,
            }
        ]
        if image_bytes is not None:
            if image_filetype is None:
                image_filetype = filetype.guess(image_bytes)
                if image_filetype is None:
                    raise ValueError("Could not determine image filetype")
            if image_filetype not in ALLOWED_IMAGE_TYPES:
                raise ValueError(
                    f"Image type {image_filetype} is not supported. "
                    f"Allowed types: {ALLOWED_IMAGE_TYPES}"
                )
            image_b64 = base64.b64encode(image_bytes).decode("utf-8")
            message_content.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{image_filetype.mime};base64,{image_b64}",
                    },
                }
            )
        # Put together the request payload
        payload = {
            "model": self.vlm_model,
            "messages": [{"role": "user", "content": message_content}],
            "max_tokens": max_tokens,
            # "logprobs": True,
            # "top_logprobs": 1,
        }
        response = await self._vllm_client.post("/v1/chat/completions", json=payload)
        response = response.json()
        response_text: str = (
            response.get("choices")[0].get("message", {}).get("content", "").strip()
        )
        return response_text

Re. evaluation strategy, I just asked our team:
Measuring accuracy on known eval sets with wide coverage of policy areas, including edge cases, and reviewing mistakes to look for patterns
Also interested in using LLMs as judges, for example using GPT to say which model output is best
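The first part of that strategy - accuracy on labeled eval sets, broken down by policy area - can be sketched in plain Python; the function name, record shape, and labels here are illustrative, not from the team's actual pipeline:

```python
from collections import defaultdict


def accuracy_by_policy_area(records):
    """records: iterable of (policy_area, expected_label, predicted_label)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for area, expected, predicted in records:
        totals[area] += 1
        if predicted == expected:
            correct[area] += 1
    # Per-area accuracy makes it easy to spot which policy areas the
    # visual classifier struggles with.
    return {area: correct[area] / totals[area] for area in totals}


# Illustrative eval records for a harmful-content classifier.
records = [
    ("violence", "unsafe", "unsafe"),
    ("violence", "safe", "unsafe"),
    ("self-harm", "unsafe", "unsafe"),
    ("self-harm", "safe", "safe"),
]
scores = accuracy_by_policy_area(records)
# scores["violence"] == 0.5, scores["self-harm"] == 1.0
```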
That's exciting Tom M., it means our OpenAI tracing will most likely work for you if you use it. Here's the ticket for the image message parsing: https://github.com/Arize-ai/openinference/issues/495
Great, thanks Mikyo!!
