In case you are wondering about GPT-3.5-Instruct:
We tested it against our Task-Based Evals with very mixed results: performance is highly task dependent. Included below are two widely varying results against a golden test dataset.
Details of how this works are in the docs for our soon-to-be-officially-released Eval library:
https://docs.arize.com/phoenix/concepts/llm-evals