I am hitting the OpenAI API token limit while using the QAEvaluator. I see that there is a MapReducer type of evaluator that handles long contexts by breaking them into chunks and aggregating the results afterward. However, I could not find specific examples of how to use it. Can you point me to an example? If there isn't one, could you give me some pointers on how to perform the same evaluation as the QAEvaluator using the map-reduce technique? TIA! https://github.com/Arize-ai/phoenix/blob/e24d7212ace403f0e396de027a7cfb9bd4a14657/packages/phoenix-evals/src/phoenix/evals/evaluators.py#L270
It seems likely that the MapReducer can be used to handle long contexts by breaking them into smaller chunks, evaluating each chunk, and then aggregating the results. However, specific examples of using MapReducer with a QAEvaluator are not directly available in the data sources. The MapReducer is designed to evaluate data that is too large for a single context window (source). Would you like more detailed guidance on how to implement this approach?

Sources:
- https://github.com/Arize-ai/phoenix/blob/main/packages/phoenix-evals/src/phoenix/evals/evaluators.py
💡 Hint: Mention RunLLM in the thread for followups.
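The chunk-then-aggregate idea described above can be sketched without any framework code. This is a minimal illustration, not the MapReducer API itself: `evaluate_chunk` is a placeholder for whatever per-chunk LLM call you would make (e.g. a QA-correctness prompt over one chunk of the reference text), and `map_reduce_qa_eval` is a hypothetical helper name.

```python
# Minimal map-reduce evaluation sketch. `evaluate_chunk` is a stand-in for a
# per-chunk LLM scoring call; it is NOT a real phoenix-evals function.

def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split the reference text into fixed-size character chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def map_reduce_qa_eval(question, answer, context, evaluate_chunk, max_chars=100_000):
    # Map: score the answer against each chunk independently (0 or 1).
    chunk_scores = [
        evaluate_chunk(question, answer, chunk)
        for chunk in chunk_text(context, max_chars)
    ]
    # Reduce: treat the answer as correct if any chunk supports it.
    return max(chunk_scores)
```

The reduce step here is a simple `max`, but it could equally be a second LLM call that combines the per-chunk verdicts, which is closer to the chunk-evaluate-combine flow the answer above describes.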
Thanks for the reply! I set `max_tokens` to None per the documentation to remove the limit, but it didn't help. Here is the error I am getting. To be fair, I am indeed running the evaluation on a very long body of text.
2025-02-28 08:47:07,694 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
Exception in worker on attempt 5: Error code: 400 - {'error': {'message': "Invalid 'messages[0].content': string too long. Expected a string with maximum length 1048576, but got a string with length 1081807 instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content', 'code': 'string_above_max_length'}}
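The 400 error above is a hard per-message character cap on OpenAI's side (1,048,576 characters), so no `max_tokens` setting will avoid it; the text has to be split before it is sent. A minimal sketch of that pre-splitting, assuming nothing about phoenix-evals: the limit constant is taken from the error message itself, and the overlap size is an arbitrary illustrative choice so that facts spanning a chunk boundary are not lost.

```python
# Keep each request body under the limit reported in the 400 error above.
# Helper name and overlap value are illustrative, not phoenix-evals settings.

OPENAI_MAX_MESSAGE_CHARS = 1_048_576  # from the 'string_above_max_length' error

def split_with_overlap(text: str, limit: int = 1_000_000, overlap: int = 2_000) -> list[str]:
    """Split `text` into pieces of at most `limit` chars, with each piece
    repeating the last `overlap` chars of the previous one."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + limit])
        if start + limit >= len(text):
            break
        start += limit - overlap
    return chunks
```

Each resulting piece can then be evaluated separately and the per-chunk results aggregated, as in the map-reduce approach discussed above.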
