Long-Term Memory (RAG)

AnyLLM features a long-term memory function based on the RAG (Retrieval-Augmented Generation) approach. In simple terms, it's the agent's ability to "remember" the content of your past sessions and use that knowledge in its current work.

Why is it needed?

Imagine working on a project for several days. Without long-term memory, every new launch is a fresh start. You have to explain the context to the agent all over again.

With the RAG feature, the agent can look into its dialogue archive (episodes.jsonl) and find solutions to similar problems or recall important details you agreed upon earlier. This transforms it from a simple executor into a full-fledged partner that learns as it works on your project.
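To make the retrieval idea concrete, here is a minimal, illustrative Python sketch of a keyword search over an episodes.jsonl-style archive (one JSON record per line). The field names (session, role, content) and the naive all-words matching are assumptions for illustration only, not AnyLLM's actual implementation.

```python
import json

# A tiny stand-in for an episodes.jsonl archive: one JSON object per line.
# Field names here are assumed, not taken from AnyLLM's real schema.
sample = "\n".join(json.dumps(r) for r in [
    {"session": "20260203_151023", "role": "user",
     "content": "Create a User.php class with two parameters"},
    {"session": "20260203_151023", "role": "assistant",
     "content": "TOOL_CALL: write_file(...)"},
    {"session": "20260204_090000", "role": "user",
     "content": "Refactor the router"},
])

def search_history(jsonl_text, query):
    """Return records whose content contains every word of the query."""
    words = query.lower().split()
    hits = []
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        text = record.get("content", "").lower()
        if all(w in text for w in words):
            hits.append(record)
    return hits

# Only the record mentioning both "user" and "class" matches.
print(search_history(sample, "user class"))
```

Real RAG implementations typically rank results by semantic similarity rather than exact keywords, but the flow is the same: scan the archive, keep relevant snippets, surface them to the agent.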

Activation Methods

This feature is disabled by default. You can activate it in one of two ways:

1. Via anyllm.json (Recommended):

Add a rag object to your configuration file with two mandatory parameters: enable and mode.

{
  "rag": {
    "enable": true,
    "mode": "command" 
  }
}
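For automatic mode, the same file would presumably use the other mode value; the exact string "llm" is an assumption inferred from the mode names used in this document, so verify it against your version:

```json
{
  "rag": {
    "enable": true,
    "mode": "llm"
  }
}
```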

2. Via startup flags:

Flags take precedence over the configuration file.

  • --rag-command: Enables RAG in manual mode.
  • --rag-llm: Enables RAG in automatic mode.
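For example, assuming the binary is named anyllm (the binary name is an assumption; the flags are from this document):

```shell
# Manual mode: memory is searched only via /search_history.
anyllm --rag-command

# Automatic mode: relevant snippets are injected before each prompt.
anyllm --rag-llm
```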

If the rag object is present in the config but is missing the enable or mode fields, the application will show an error requiring you to fix the configuration.

Modes of Operation

There are two modes of operation, designed for different workflows and model capabilities.

command Mode (Manual Search)

Recommendation: Ideal for small to medium-sized models (e.g., 7B-13B) and for those who want full control over the context.

In this mode, the agent will not automatically add retrieved information to its prompt. The memory does not affect token consumption in a normal conversation.

Instead, you get a powerful command, /search_history <query>, which allows you to manually search for information in past sessions.

Example: You worked with the agent yesterday, and today you want to check whether you already created a User class.

> /search_history user class
• Searching history for: 'user class'...

Found 1 relevant entries from past sessions:
--------------------------------------------------
Session: 20260203_151023_cfc60b2b (2026-02-03T15:11:50+00:00)
User: Create a User.php class with two parameters
Assistant: TOOL_CALL: write_file({"path":"User.php","content":"..."})
--------------------------------------------------

The result confirms that such a class was indeed created, and you can use this information in your next prompt.

llm Mode (Automatic Search)

Recommendation: For large and powerful models (40B+) that can handle a large context without getting lost in the "noise."

In this mode, before each of your prompts, the agent automatically searches its memory. The most relevant dialogue snippets found are added to its system prompt in a special <relevant_history> block.
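As a rough illustration of how such injection might work, here is a hypothetical Python sketch of assembling a system prompt with retrieved snippets. Only the <relevant_history> tag name comes from this document; everything else (function name, list formatting) is assumed for illustration:

```python
def build_system_prompt(base_prompt, snippets):
    """Append retrieved dialogue snippets inside a <relevant_history> block."""
    if not snippets:
        # Nothing retrieved: the prompt is sent unchanged.
        return base_prompt
    body = "\n".join(f"- {s}" for s in snippets)
    return f"{base_prompt}\n\n<relevant_history>\n{body}\n</relevant_history>"

prompt = build_system_prompt(
    "You are a coding agent.",
    ["User: Create a User.php class with two parameters",
     "Assistant: TOOL_CALL: write_file(...)"],
)
print(prompt)
```

Note that the block is rebuilt before every prompt, which is why this mode adds tokens on each step of the conversation.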

Example: After the session from the example above, you start a new dialogue.

> Did I already create a user class?

// The agent automatically finds the past session in its memory.
// It understands what is being asked but doesn't blindly trust its memory.
// It decides to check the file system to give an accurate answer.

I will check if there is a User class in the project.
🛠  Using tool: search_content
│ Tool Output: Search results for 'class User':
src/Domain/User.php:5:class User

// Only after fact-checking does the agent give you an answer.
Yes, you have already created a User class. It is located in the file `src/Domain/User.php`.

This example demonstrates the key behavior: the agent uses memory to understand the intent, but relies on tools to verify the current state of the project.

Warning: This mode significantly increases the context size sent to the model with each step. This can increase the cost of requests to paid APIs and negatively affect the performance of small models, which can get "confused" by the extra information. Use it with caution.
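To get a feel for the overhead, here is a rough, illustrative estimate of the extra context the automatic mode can add per turn. The 4-characters-per-token heuristic is a common approximation, not an exact tokenizer, and the block format is assumed:

```python
def estimate_extra_tokens(snippets, chars_per_token=4):
    """Approximate the token cost of an injected <relevant_history> block."""
    block = "<relevant_history>\n" + "\n".join(snippets) + "\n</relevant_history>"
    return len(block) // chars_per_token

# Five retrieved snippets of a typical one-line message each.
snippets = ["User: Create a User.php class with two parameters"] * 5
print(estimate_extra_tokens(snippets))
```

Even a handful of short snippets adds a non-trivial number of tokens to every single request, which is why this mode is recommended only for models that handle large contexts well.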

Mode Summary

Mode    | Auto-Context | Manual Command          | Recommendation
command | No           | Yes (/search_history)   | Small & medium models
llm     | Yes          | —                       | Large & powerful models