I ran into an issue connecting to localhost:4317 and would appreciate your help. I've put the full set of steps in the thread to make it easier to follow.
I'm using the LiteLLM framework for testing Phoenix, because that is the framework we use in our codebase. I'm using k8s and Helm charts to create a StatefulSet and a Service for Phoenix. In my Helm chart I'm using the latest Phoenix image, and I've passed the following environment variables to it:
- name: PHOENIX_WORKING_DIR
  value: /mnt/data
- name: PHOENIX_PORT
  value: "6006"
- name: PHOENIX_SQL_DATABASE_URL
  value: {{ .Values.phoenix.database.url | quote }}
- name: PHOENIX_COLLECTOR_ENDPOINT
  value: "http://localhost:6006"
I checked things after the k8s deployment: I can access the Phoenix UI at localhost:6006, and inside the pod arize-phoenix-otel is also installed as instructed. BTW, I port-forwarded the following ports to be accessible: "6006:6006", "4317:4317" (6006 is for Phoenix and 4317 is for OpenTelemetry). Then, as a next step, I went to the pod I use for LLM calls and installed the LiteLLM-related libraries one more time to make sure everything is set up correctly:
pip install openinference-instrumentation-litellm litellm arize-phoenix-otel
Then I opened Python and ran the following code based on the instructions here: https://docs.arize.com/phoenix/tracing/integrations-tracing/litellm
from phoenix.otel import register

# configure the Phoenix tracer
tracer_provider = register(
    project_name="my-llm-app",  # Default is 'default'
    auto_instrument=True,  # Auto-instrument your app based on installed OI dependencies
)
Below is what I get:
🔭 OpenTelemetry Tracing Details 🔭
| Phoenix Project: my-llm-app
| Span Processor: SimpleSpanProcessor
| Collector Endpoint: localhost:4317
| Transport: gRPC
| Transport Headers: {'user-agent': '****'}
|
| Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|
| ⚠️ WARNING: It is strongly advised to use a BatchSpanProcessor in production environments.
|
| `register` has set this TracerProvider as the global OpenTelemetry default.
| To disable this behavior, call `register` with `set_global_tracer_provider=False`.Then, added my OpenAI API key:
import os
os.environ["OPENAI_API_KEY"] = "PASTE_YOUR_API_KEY_HERE"
and used litellm as normal:
import litellm

completion_response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "What's the capital of China?", "role": "user"}],
)
print(completion_response)
Then I'm getting this error:
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 8s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 16s.
I can share my Helm templates as well if that would help.
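Since StatusCode.UNAVAILABLE usually just means the exporter cannot open a TCP connection, a quick stdlib-only check can rule out the network layer before touching the OpenTelemetry setup. This is a minimal sketch; the host and port values are simply the ones from this thread:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Check the two forwarded ports from this thread: Phoenix UI/HTTP and OTLP gRPC.
for p in (6006, 4317):
    print(p, "open" if port_open("localhost", p) else "closed")
```

If 4317 reports closed here, the UNAVAILABLE retries are a plain connectivity problem (port-forward or Service routing), not an exporter misconfiguration.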
Can you try the 6006 port (HTTP) via the following to see if it works?
from phoenix.otel import register
tracer_provider = register(endpoint="http://localhost:6006/v1/traces")
sure
We generally recommend the gRPC port, but in this case it seems like there may be some kind of network problem.
I got the same error. As you can see, the tracing details point to localhost:6006 this time, but litellm still struggles with the connection:
>>> tracer_provider = register(endpoint="http://localhost:6006/v1/traces")
Overriding of current TracerProvider is not allowed
🔭 OpenTelemetry Tracing Details 🔭
| Phoenix Project: default
| Span Processor: SimpleSpanProcessor
| Collector Endpoint: http://localhost:6006/v1/traces
| Transport: HTTP + protobuf
| Transport Headers: {}
|
| Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|
| ⚠️ WARNING: It is strongly advised to use a BatchSpanProcessor in production environments.
|
| `register` has set this TracerProvider as the global OpenTelemetry default.
| To disable this behavior, call `register` with `set_global_tracer_provider=False`.
>>> completion_response = litellm.completion(model="gpt-3.5-turbo",
... messages=[{"content": "What's the capital of China?", "role": "user"}])
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 8s.
That's very strange. Are you able to reach the webpage http://localhost:6006/?
and what if you try it this way?
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# build the tracer provider by hand and point the HTTP exporter at Phoenix
endpoint = "http://127.0.0.1:6006/v1/traces"
tracer_provider = trace_sdk.TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
I made some modifications to my manifest files. It looks like the ports for the HTTP and gRPC connections were not exposed internally. So I added those specifications to my manifest file and also made the modification you proposed, and in combination it works. I still need to dig deeper to see how these things are connected internally.
I'll share my manifest file here in case someone is interested.
{{- if .Values.phoenix.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.phoenix.name }}
  labels:
    app: {{ .Values.phoenix.name }}
    environment: {{ .Values.env.name | quote }}
spec:
  ports:
    - port: 443
      protocol: TCP
      targetPort: 6006
  selector:
    app: {{ .Values.phoenix.name }}
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ .Values.phoenix.name }}
  labels:
    app: {{ .Values.phoenix.name }}
    environment: {{ .Values.env.name | quote }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Values.phoenix.name }}
  serviceName: {{ .Values.phoenix.name }}
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "9090"
        prometheus.io/scrape: "true"
      labels:
        app: {{ .Values.phoenix.name }}
        environment: {{ .Values.env.name | quote }}
    spec:
      # Add init container to wait for PostgreSQL
      initContainers:
        - name: wait-for-postgresql
          image: postgres:17
          command: ['sh', '-c',
            'until pg_isready -h $POSTGRES_HOST -p 5432 -U $POSTGRES_USER;
            do echo waiting for postgresql; sleep 2; done;']
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: {{ include "asc-agents.fullname" . }}-secret
                  key: pg-user
            - name: POSTGRES_HOST
              valueFrom:
                secretKeyRef:
                  name: {{ include "asc-agents.fullname" . }}-secret
                  key: pg-fqdn
      containers:
        - name: {{ .Values.phoenix.name }}
          image: "{{ .Values.phoenix.image | default "docker.io/arizephoenix/phoenix:version-8.26.1" }}"
          args: ["-m", "phoenix.server.main", "serve"]
          env:
            - name: PHOENIX_WORKING_DIR
              value: /mnt/data
            - name: PHOENIX_PORT
              value: "6006"
            - name: PHOENIX_SQL_DATABASE_URL
              value: {{ .Values.phoenix.database.url | quote }}
            - name: PHOENIX_COLLECTOR_ENDPOINT
              value: "http://localhost:6006"
            - name: PHOENIX_ENDPOINT
              value: "http://asc-agents-phoenix:443"
          ports:
            - containerPort: 6006
            - containerPort: 4317
            - containerPort: 9090
          volumeMounts:
            - mountPath: /mnt/data
              name: {{ .Values.phoenix.name }}
          readinessProbe:
            httpGet:
              path: /metrics
              port: 6006
            initialDelaySeconds: 10
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /metrics
              port: 6006
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            requests:
              memory: 1Gi
              cpu: 500m
            limits:
              memory: 4Gi
              cpu: 500m
  volumeClaimTemplates:
    - metadata:
        name: {{ .Values.phoenix.name }}
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: {{ .Values.phoenix.storage | default "8Gi" }}
        storageClassName: {{ .Values.phoenix.storageClass | default "standard" | quote }}
{{- end }}
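Note that the Service block in this template still only maps 443 to 6006, so in-cluster clients have no route to the OTLP gRPC port unless a 4317 entry is also rendered. A minimal sketch of a ports list that exposes both (the name fields are illustrative):

```yaml
ports:
  - name: http
    port: 443
    protocol: TCP
    targetPort: 6006
  - name: grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
```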
The values are the following:
phoenix:
  name: xxx
  image: docker.io/arizephoenix/phoenix:latest
  enabled: true
  storage: 8Gi
  storageClass: standard
  env:
    - name: PHOENIX_COLLECTOR_ENDPOINT
      value: "http://xxx:443"
  spec:
    ports:
      - name: http
        port: 443
        protocol: TCP
        targetPort: 6006
      - name: grpc
        port: 4317
        protocol: TCP
        targetPort: 4317
The above are my local values; the general values are as follows:
phoenix:
  enabled: false
  name: xxx
  image: docker.io/arizephoenix/phoenix:version-8.26.1
  database:
    url: postgresql://anyxxx:anyxxx@api-postgresql.default:5432/phoenix
  storage: 8Gi
  storageClass: standard
I also created a database called phoenix in my Postgres.
