What eval framework is this? Curious how to replicate.
TOPIC FOR DISCUSSION
Limited context size problems? Could virtual memory be the solution?

INTRO
I have recently been asked to solve the problem of the limited context window, for both conversational agents and RAG. Essentially, the finite context window hinders LLM performance in tasks such as extended conversations, multi-session chats, and document analysis.

PROBLEM
Question-Answering: Managing long contexts is crucial for coherent and engaging dialogues in conversational agents. One approach to the limitations of a fixed-length context is recursive summarization, where the LLM generates concise representations of the chat over a sliding window so that they fit within the specified token length. However, this summarization process is inherently lossy and can lead to unintentional information loss. Using a backing-store database to retrieve previous chats is also constrained by the limited token length and may lose the natural conversation order.

Search and retrieval: In the RAG paradigm especially, retrieval QA mechanisms often query external databases or internal conversation logs to provide contextually relevant responses. However, multi-document analysis tasks require drawing connections across multiple lengthy documents (e.g., annual reports or entire books), and the retrieved information can easily exceed the finite context window.

Deep Dive: In LLM-based conversational agents, a significant portion of main-context tokens is typically used to:
- hold a 'system message' that dictates the nature of the interaction to the system
- hold long instructions when the LLM is used to solve complex tasks
- hold few/many in-context examples or retrieved context (RAG)
The remaining tokens can be used to hold conversation data. Thus, the entire context window of many modern LLMs can be exhausted after only a few dozen back-and-forth messages between the user and the system.
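To make the lossiness concrete, here is a minimal sketch of recursive summarization over a sliding window. The `summarize` function is a hypothetical stand-in for an LLM call; to keep the example self-contained and runnable it simply truncates, but the structure (fold older messages into a running summary, keep recent ones verbatim) is the same:

```python
# Illustrative sketch of recursive summarization over a sliding window.
# `summarize` is a hypothetical placeholder for an LLM summarization call;
# here it truncates to a token budget, which makes the loss explicit.

def summarize(text: str, max_tokens: int) -> str:
    """Placeholder for an LLM summarization call (lossy by design)."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

def fit_history(messages: list[str], window: int, budget: int) -> str:
    """Keep the last `window` messages verbatim; recursively summarize the rest."""
    recent = messages[-window:]
    older = messages[:-window]
    summary = ""
    for msg in older:
        # Fold each older message into a running summary, re-compressing
        # every time -- this is where information is irreversibly lost.
        summary = summarize(summary + " " + msg, budget)
    return (summary + "\n" + "\n".join(recent)).strip()

history = [f"message {i}: some chat content" for i in range(10)]
context = fit_history(history, window=3, budget=20)
print(context)
```

Note that once the running summary hits the budget, everything folded in afterwards is silently dropped, which is exactly the unintentional information loss described above.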
PROMISING SOLUTION
Having been puzzling over how to solve the above problem efficiently, I was excited to read the paper on the MemGPT agent that came out just last week!

What is MemGPT? MemGPT is an LLM agent designed to manage the limited context windows of LLMs, giving them the illusion of infinite length.

What can it do? For document analysis, MemGPT can process lengthy texts and multi-document QA beyond LLM context limits. For conversational agents, MemGPT can perform deep memory retrieval, enabling long-term memory and consistency in extended dialogues.

How does it do it? MemGPT uses a memory hierarchy and control flow analogous to a traditional operating system. It introduces virtual context management, inspired by traditional hierarchical virtual memory management systems. I.e., the agent treats the context window as a constrained memory resource and designs a memory hierarchy for LLMs analogous to the memory tiers used in traditional OSes. By calling and chaining functions from its toolkit, the agent manages the control flow between memory management, the LLM processing module, and the user. This design allows repeated context modifications (self-directed editing and retrieval), letting the agent use its limited context more effectively.

Evaluation: The paper evaluates MemGPT on several benchmarks for conversational agents and for retrieval agents, with strong results.

Limitations: MemGPT's function calling relies on the corresponding GPT-4 capability. Performance deteriorates when using other models, e.g., GPT-3.5 or Llama-2, even when those models have been fine-tuned for function calling.

[IMAGE ANALYSIS] In MemGPT (components shaded), a fixed-context LLM is augmented with a hierarchical memory system and functions that let it manage its own memory. The LLM processor takes main context (analogous to OS main memory/RAM) as input, and outputs text interpreted by a parser, resulting in either a yield or a function call.
MemGPT uses functions to move data between main context and external context (analogous to OS disk memory). When the processor generates a function call, it can request control ahead of time to chain functions together. When yielding, the processor is paused until the next external event (e.g., a user message or scheduled interrupt).

CONCLUSION
Since I can use GPT-4, I am keen to spin up MemGPT to solve the finite-context-length problems I am currently facing. It seems to offer a powerful (and generalizable) framework that resolves many of the limitations of simpler solutions like summarization or vectorstore-backed chat retrieval.

DISCUSSION
Does this look like a good solution to the finite context window problem of conversational QA agents? Looking forward to remarks or suggestions based on your personal experience and/or your understanding of this paper.
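The OS-style control flow described above (yield vs. function call, paging between main and external context) can be sketched roughly as follows. These function names and the fixed capacity are hypothetical, for illustration only; this is not the actual MemGPT API:

```python
# Illustrative sketch of the OS-style control flow: a processor either
# chains function calls to edit its own memory, or yields until the next
# external event. All names here are hypothetical, not the MemGPT API.

MAIN_CAPACITY = 3  # tiny "RAM" for demonstration

main_context: list[str] = []           # analogous to OS main memory (RAM)
external_context: dict[int, str] = {}  # analogous to OS disk memory

def archive(slot: int) -> None:
    """Page the oldest main-context entry out to external storage."""
    external_context[slot] = main_context.pop(0)

def step(event: str) -> str:
    """One processor step: ingest an event, self-manage memory, then yield."""
    main_context.append(event)
    while len(main_context) > MAIN_CAPACITY:
        # Instead of yielding, the "processor" chains a function call
        # to free up main context (self-directed editing).
        archive(slot=len(external_context))
    return "yield"  # pause until the next user message or interrupt

for i in range(5):
    status = step(f"user message {i}")

print(main_context)      # most recent messages kept in "RAM"
print(external_context)  # older messages paged out to "disk"
```

A symmetric `recall` function would page archived entries back into main context on demand, which is what enables the deep memory retrieval behavior in long dialogues.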
Reposting this paper with my friend's question to get all of your thoughts: https://arxiv.org/abs/2310.08560#:~:text=Using%20this%20technique%2C%20we%20introduce,between%20itself%20and%20the%20user
ok this works "arize-phoenix==0.0.44" thanks
🤙
Hello team, I have been running this llama_index_tracing_tutorial.ipynb notebook for the last few days and it hasn't given me a problem until today. I assume it has something to do with the new release 0.0.45. I am working on a presentation and would like to use this notebook to demo Phoenix. 🐞 https://github.com/Arize-ai/phoenix/issues/1619
