Hi team, I have set up Phoenix traces on my local machine using launch_app(), which works fine on localhost. I have now set up a remote URL: I deployed it as a separate cluster and pulled the Phoenix Docker image from the hub. Now I want my traces to be sent to this URL I have set, instead of localhost. Earlier I had another deployment running on the same cluster, so I was sending my traces via the cluster endpoint and it used to work fine.
Hi Priya. I just checked our Docker image and it's working. Is there anything different in the setup, e.g. whether the port has been forwarded, between the deployment that worked and the one that is not working?
Let's say I have a Phoenix UI deployed and I have a URL, and I want to send traces to that URL. But in my code I have used launch_app() and I'm sending traces to my local URL, and when I try to send the traces to the remote URL I face an issue: it still uses localhost. I want to understand what I should use instead of launch_app() (keeping most of the configs the same).
Can you try setting the remote endpoint using an environment variable as shown below?
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://<remote host>:6006"
I have already deployed a Phoenix UI on a cluster and I am able to access it through a web URL. Now how can I see traces there? Do I have to open a port or something? As of now it runs perfectly on localhost using launch_app(). So is launch_app() preferred for a production environment, or do I have to use the Docker setup? I am a bit confused by this.
On the sender side you can try putting the web URL into the environment variable shown above; the sender will send the traces to that location. By the way, which instrumentation are you using?
langchain
from phoenix.trace.langchain import LangChainInstrumentor
import os

os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://<remote host>:6006"
LangChainInstrumentor().instrument()
You can try adding the environment variable before the instrumentor as shown above
The <remote host> would be the web URL you use for the UI.
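For what it's worth, a tiny helper (hypothetical, not part of Phoenix) can normalize the URL before setting the variable, since a trailing slash or a missing scheme is an easy way to end up with a malformed endpoint:

```python
import os


def set_phoenix_endpoint(url: str) -> str:
    """Set PHOENIX_COLLECTOR_ENDPOINT, normalizing the URL first.

    Hypothetical helper: requires an explicit scheme and strips a
    trailing slash, so "http://host:6006/" and "http://host:6006"
    behave the same.
    """
    if "://" not in url:
        raise ValueError(f"endpoint needs a scheme, e.g. http://...: {url!r}")
    endpoint = url.rstrip("/")
    os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = endpoint
    return endpoint
```

This has to run before the instrumentor is created, since the exporter reads the variable at setup time.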
If I set the remote URL, the traces are not sent anywhere, but it works fine with localhost.
Hm, does it work without the :6006 part?
no
This is my main code where I have implemented tracing:
def submit(self):
    init_logger()
    # TODO: change to support local server and remote server based on the server set
    os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://arize.zscaler.site"
    session = px.launch_app()
    LangChainInstrumentor().instrument()
    ai_dresult = {}
    question_num = 0
    for question, human_answer in self.question_answer_pool.items():
        ai_answer = self.chat_app_run(question)
        ai_dresult[question] = ai_answer
        question_num += 1
    logger.info(f"The evaluation based on dataset: {self.dataset_id}. {question_num} questions to run")
    # log the traces
    logger.info("Getting input, output and reference from traces ...")
    input_output_df = get_qa_with_reference(px.Client())
    input_output_df["correct_answer"] = input_output_df["input"].apply(
        lambda x: self.question_answer_pool[x])
    input_output_df["ai_answer"] = input_output_df["input"].apply(
        lambda x: ai_dresult[x])
    retrieved_documents_df = get_retrieved_documents(px.Client())
    if self.evaluators_qa_with_reference:
        evaluations_list = self.evaluators_qa_with_reference.values()
        logger.info(f"Running evaluations: {evaluations_list}")
        df_results = []
        names = []
        for name in self.evaluators_qa_with_reference.values():
            df_results.append(f"result_of_{name}")
            names.append(name)
        df_results = run_evals(
            dataframe=input_output_df,
            evaluators=self.evaluators_qa_with_reference.keys(),
            provide_explanation=True,
        )
        logger.info(f"Log evaluation results to UI for: {evaluations_list}")
        for index, df_result in enumerate(df_results):
            px.Client().log_evaluations(
                SpanEvaluations(eval_name=names[index], dataframe=df_result)
            )
            self.dashboard_data[names[index]] = df_result.mean(numeric_only=True)["score"]
    if self.evaluators_retrieved_documents:
        evaluations_retrieved = self.evaluators_retrieved_documents.values()
        logger.info(f"Running evaluations: {evaluations_retrieved}")
        relevance_eval_df = run_evals(
            dataframe=retrieved_documents_df,
            evaluators=self.evaluators_retrieved_documents.keys(),
            provide_explanation=True,
        )[0]
        logger.info(f"Log evaluation results to UI: {evaluations_retrieved}")
        px.Client().log_evaluations(
            DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df),
        )
        self.dashboard_data["Relevance"] = relevance_eval_df.mean(numeric_only=True)["score"]
    if self.run_outputtone:
        logger.info("Run evaluation: outputtone")
        output_tone_df = run_output_tone_evaluation(input_output_df)
        logger.debug(f"output_tone_df: {output_tone_df}")
        logger.info("Log evaluation results to UI: outputtone")
        px.Client().log_evaluations(SpanEvaluations(eval_name="output_tone", dataframe=output_tone_df))
    # TODO: implement the function
    self.dashboard_data["timestamp"] = datetime.now().timestamp()
    if self.dataset_id > BENCHMARK_DATASET_ID_MIN and self.project_name != PROJECT_NAMES["RANDOM"]:
        logger.info("Log evaluation scores to BigQuery")
        log_scores_to_bigQuery(self.dashboard_data)
    # TODO: check condition; only do this when running against a local trace server
    import time
    time.sleep(self.job_save_seconds_for_local_run)
This is my usage_example.py:
from llamaas.qa.chatbot_qa.job import JobBuilder
from llamaas.qa.chatbot_qa.utils import load_from_json
from llamaas.model_connector.azure_resources import DevGPT4_32K
from llamaas.qa.chatbot_qa.constants import *
from llamaas.qa.eval_utils.eval_models import get_azure_openai_model
from llamaas.orchestrator.doc_qa.scoped_doc_qa_app import ScopedDocQaApp
from llamaas.qa.chatbot_qa.server import TraceServer
# [required] the dataset used for evaluation; format follows ./questions_answers_template.json
json_questions_answers = load_from_json("./questions_answers_template.json")
# [required] the instance of the app under evaluation
chat_app = ScopedDocQaApp(
    vector_store_path="/Users/priya/datasci/All")
# [required] the method name to get an answer; the evaluation job will pass the question as a parameter
run_method = "get_answer"
# [option] the Azure model used for evaluation; if not set, defaults to DevGPT4_32K, set in set_tasks
eval_model = get_azure_openai_model(DevGPT4_32K)
job = (JobBuilder()
       # [required] the question_answers data set, same format as questions_answers_template.json
       .set_question_answer_pool(json_questions_answers)
       # [option] trace server IP; for a local run set "127.0.0.1"; if not set, defaults to "127.0.0.1"
       .set_server(TraceServer("127.0.0.1"))
       # [required] set the model used for evaluation and the evaluation tasks (True: run the evaluation, False (default): do not run)
       .set_tasks(
           eval_model=eval_model,
           correctness=True,
           hallucination=True,
           toxicity=True,
           groundtruth=True,
           relevance=True,
           outputtone=True)
       # [required] set the chat_app to be evaluated and how to run it; the system will pass in "question" as a parameter
       .set_chat_app_run(chat_app, run_method)
       # [required] set the model used in chat_app (str); used to log to Metric_store for model comparison
       .set_chat_app_model("gpt-4")
       # [required] project name, used for the Metric_store table; the name should be listed in constants.PROJECT_NAMES. For a new project, please add the name to constants.PROJECT_NAMES first
       .set_project_name(PROJECT_NAMES["DOC_GPT"])
       # [option] for a local run, how long (seconds) to keep the job around for checking results; default: 0
       .set_job_save_seconds_for_local_run(1000)
       .build()
       )
job.submit()
This is the error I am facing (I tried both adding and removing the port):
The messages in the screenshot are only warnings and they are unrelated. The warning about phoenix.session will be gone if line 122 (session = px.launch_app()) is removed. Line 122 is launching a second (local) instance of Phoenix which you're not using, because the traces will be sent to the remote instance according to line 121 in the screenshot. I'm not familiar with Zscaler, but it seems this may be a connection issue. Can you run the following code to see if a span called "test" shows up in your remote instance?
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
endpoint = "http://arize.zscaler.site/v1/traces"
tracer_provider = trace_sdk.TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
trace_api.set_tracer_provider(tracer_provider)
if __name__ == "__main__":
    trace_api.get_tracer(__name__).start_span("test").end()
