Unlocking Conversational Memory: Enhancing LangChain Models
We’re going to talk about how to make our large language models remember things better. These models, called LLMs for short, usually don’t retain past conversations, which is especially noticeable when you use them through an API.
Let’s see how you can add memory to these models. The idea is very simple: with every input, we also provide a history of the previous conversation, and that history constitutes the memory for our large language model.
There are four different types of memory supported within LangChain, and each has its own specific use case.
Conversation Buffer Memory
Conversation Buffer Window Memory
Conversation Token Buffer Memory
Conversation Summary Buffer Memory
Conversation Buffer Memory
This memory stores all messages and then extracts them into a variable.
Let’s take a look at using this in a chain by setting “verbose=True” so that we can see the prompt.
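Here is a minimal sketch using the classic LangChain interface; the model choice and the prompts are only illustrative.

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()

# verbose=True prints the full prompt, so you can see the history being injected
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

conversation.predict(input="Hi, my name is Sam.")
conversation.predict(input="What is my name?")  # answered from the stored history

print(memory.buffer)                      # the raw conversation history
print(memory.load_memory_variables({}))  # the same history as a dict
```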
Conversation Buffer Window Memory
Conversation buffer window memory acts like short-term memory. Instead of remembering the entire conversation, it keeps only the most recent exchanges. To use it, we load a ConversationBufferWindowMemory object from LangChain and set a value for ‘k’, which decides how many previous exchanges it should remember. For instance, if ‘k’ is set to 2, it will only recall the last 2 exchanges.
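A short sketch, assuming the same classic LangChain API; the example exchanges are made up.

```python
from langchain.memory import ConversationBufferWindowMemory

# k=2 keeps only the last two human/AI exchanges
memory = ConversationBufferWindowMemory(k=2)

memory.save_context({"input": "Hi"}, {"output": "Hello! How can I help?"})
memory.save_context({"input": "What is LangChain?"}, {"output": "A framework for LLM apps."})
memory.save_context({"input": "Does it support memory?"}, {"output": "Yes, several types."})

# Only the last 2 exchanges remain; the first one has been dropped.
print(memory.load_memory_variables({}))
```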
Conversation Token Buffer Memory
Token Buffer Memory keeps only a fixed number of recent tokens in memory. This matters because when you make API calls to OpenAI, you’re charged based on the number of tokens you send and receive, so limiting the tokens helps manage costs.
Defining this memory works a bit differently. The conversation token buffer memory takes two inputs: the LLM and the maximum token limit. Different LLMs tokenize text differently, so you need to specify which one you’re using so the memory can count tokens correctly.
The second parameter sets the total number of tokens you want to keep in memory. For example, if we set the maximum token limit to 500, the limit is generous enough that the memory retains the recent conversations.
When you access the memory variable, you can see the history, including the human prompt and the AI response. When I reduced the maximum token limit to 70, something interesting happened. We ran the first prompt and got a response, then ran a second prompt. Both prompts are tracked, but because of the 70-token limit, when we access the memory it only retains the most recent AI response and discards the human prompt along with the earlier exchange between the human and the AI.
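The sketch below mirrors this setup; the prompts are placeholders, and exactly how much survives depends on how many tokens your messages actually use.

```python
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationTokenBufferMemory

llm = ChatOpenAI(temperature=0.0)

# The LLM is passed in so the memory can tokenize and count tokens correctly
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=70)

memory.save_context({"input": "AI is what?"}, {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})

# Oldest messages are dropped first, so only the most recent ones that
# fit under the 70-token limit are returned here.
print(memory.load_memory_variables({}))
```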
Conversation Summary Buffer Memory
Conversation Summary Buffer Memory keeps recent interactions and creates summaries of older ones instead of simply forgetting them. It stores these summaries in memory and uses them alongside new interactions. Instead of counting interactions, it uses the token length to decide when to summarize and flush old interactions.
Let me explain how you set up conversation summary buffer memory. First, you import it from LangChain’s memory module. Defining it is quite similar to setting up token buffer memory: you specify the LLM you’re using and the maximum token limit. If that limit is 100, whatever details you provide will be summarized to fit within 100 tokens and stored in the memory buffer.
So the memory keeps a summary of the first conversation, or the story we loaded into it. Then, as we continue with the next conversation, that gets added too. But here’s the thing: when you look at the memory again, it summarizes even the previous prompt. The newest prompt is kept verbatim as long as we haven’t reached the token limit, but everything before it gets summarized. That’s why the history included the summarized question “Did Sam find the stolen necklace?” This type of memory keeps track of all our previous conversations and provides a summary to the model as context.
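A minimal sketch of this setup; the story text and questions are only illustrative stand-ins for the conversation described above.

```python
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory

llm = ChatOpenAI(temperature=0.0)

# Older turns are summarized by the LLM instead of being discarded
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)

story = (
    "Sam is a detective investigating a stolen necklace. "
    "The necklace disappeared from the museum on Friday night."
)
memory.save_context({"input": "Here is the story so far."}, {"output": story})
memory.save_context({"input": "Did Sam find the stolen necklace?"},
                    {"output": "Not yet, but Sam has a prime suspect."})

# Anything beyond the 100-token limit is compressed into a "System" summary,
# while the newest exchange is kept verbatim.
print(memory.load_memory_variables({}))
```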