
AI that actually remembers: memory models explained

The gap between 'AI chatbot' and 'AI companion' is memory. Here's a plain-English explainer of how AI memory systems actually work in 2026.

April 23, 2026 · 10 min read

The raw problem

Large language models have a context window — a fixed number of tokens (words, roughly) they can see at a time. The window is big by historical standards (up to 2 million tokens for Gemini, 200k for Claude Opus), but it's still finite, and more importantly, it's per-request.

Close the tab, and the context is gone. The model doesn't remember anything you said because the model is stateless. Every request starts fresh.

That's the memory problem.
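Statelessness is easy to show in miniature. In the sketch below, `call_model` is a hypothetical stand-in for any LLM API call — the point is that the function sees only the prompt it is handed, and nothing carries over between calls:

```python
# Statelessness in miniature: the model function sees only its argument;
# no hidden state survives from one call to the next.
# `call_model` is a hypothetical stand-in for any LLM API call.

def call_model(prompt: str) -> str:
    # The return value depends only on this prompt, nothing else.
    return f"seen: {prompt!r}"

first = call_model("my name is Sam")
second = call_model("what's my name?")  # the model has no idea who Sam is
```

Anything the second call should "remember" has to be put back into its prompt by the caller — which is exactly what every memory pattern below does.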

How memory gets bolted on

Memory in AI products is always a layer *around* the model, never inside it. The model is stateless; the product wraps it with storage and retrieval. There are three common patterns:

1. Full-context replay

Dump the entire conversation history into every request. Works for short conversations. Falls apart as the history gets long — you hit the context window ceiling, or the model starts getting confused by too much text, or the bill gets huge.
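A minimal sketch of the replay pattern, with `call_model` again standing in for a real LLM API call. Notice that the prompt grows without bound as the conversation continues — that growth is the failure mode described above:

```python
# Pattern 1: full-context replay. Every request carries the whole history.
# `call_model` is a hypothetical stand-in for any LLM API call.

def call_model(prompt: str) -> str:
    return f"(reply to {len(prompt)} chars of prompt)"

history: list[str] = []

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    prompt = "\n".join(history)  # the entire history, every single time
    reply = call_model(prompt)
    history.append(f"AI: {reply}")
    return reply
```

Each turn appends two lines to `history`, so prompt size (and cost) grows linearly until it hits the context ceiling.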

2. Summarization

Periodically generate a summary of the conversation so far, store the summary, and feed the summary into future requests instead of the full history. Cheaper. Loses fine detail. Good for capturing vibe/state, bad for remembering specifics like 'my sister's name is Mira'.

3. Retrieval-augmented (RAG)

Store structured facts about the user, and at query time, retrieve only the relevant facts based on what the user just said. Then feed those facts + recent context into the request. This is what most serious long-memory AI uses, including Brumo.

What 'structured facts' actually means

When Brumo reads your messages, an extraction step pulls out durable facts:

  • People — 'my sister Mira', 'my therapist Dr. Price'
  • Plans and promises — 'i'm cutting sugar starting monday', 'call mom friday'
  • Life events — 'got the promotion', 'mira is moving to LA'
  • Preferences — 'hates vegetables', 'wakes at 6am'
  • Moods — time-stamped emotional state data

Each fact is stored with a type and a timestamp. When a future message comes in, the retrieval step scores facts by relevance and pulls the top N into the prompt.
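A toy version of that store and retrieval step. Scoring here is plain word overlap for readability; a production system would more likely use embedding similarity, but the shape — typed, timestamped facts, scored against the incoming message, top N returned — is the same:

```python
# Toy fact store: typed, timestamped facts with top-N retrieval.
# Word-overlap scoring is an illustrative stand-in for embedding similarity.
import time

facts: list[dict] = []

def add_fact(fact_type: str, text: str) -> None:
    facts.append({"type": fact_type, "text": text, "ts": time.time()})

def score(fact: dict, message: str) -> int:
    # Relevance = how many words the fact shares with the incoming message.
    return len(set(fact["text"].lower().split()) & set(message.lower().split()))

def retrieve(message: str, top_n: int = 3) -> list[dict]:
    ranked = sorted(facts, key=lambda f: score(f, message), reverse=True)
    return [f for f in ranked[:top_n] if score(f, message) > 0]

add_fact("person", "my sister Mira lives in Portland")
add_fact("event", "Mira is moving to LA in May")
add_fact("preference", "wakes at 6am")

hits = retrieve("is mira still moving to LA")
```

The timestamp matters later: it's what lets a system decay stale facts or prefer recent ones when two facts conflict.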

Why retrieval beats summarization

Summarization tells the model 'the user and I had this kind of conversation'. Retrieval tells the model 'the user's sister is named Mira, is moving to LA in May, and the user is stressed about it'. Much more useful, much more specific.

The trade-off: you only get what you retrieve. If the retriever misses a relevant fact, the model replies without knowing it. Good retrieval tuning is most of what makes memory feel like memory rather than 'the AI sometimes remembers stuff'.

How Brumo's memory works, specifically

For every incoming message, the backend:

  • Loads the last day or two of chat history (short-term context)
  • Runs a retrieval query over your stored facts, scoring by relevance to the new message
  • Pulls the top-scoring facts into the prompt alongside the short-term history
  • Generates the reply
  • Runs an extraction step on your new message to add new facts to storage
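The five steps above can be sketched end to end. Every helper here is a hypothetical stand-in (Brumo's actual implementation isn't public); what the sketch pins down is the ordering — retrieve before generating, extract after:

```python
# The per-message pipeline, end to end. All helpers are hypothetical
# stand-ins; the point is the order of operations.

def load_recent_history(user_id: str) -> list[str]:
    return ["User: hey", "AI: hey! how did the interview go?"]  # stubbed

def retrieve_facts(user_id: str, message: str) -> list[str]:
    return ["sister Mira is moving to LA in May"]  # stubbed

def extract_facts(message: str) -> list[str]:
    # Stand-in extractor: treat first-person statements as durable facts.
    return [message] if message.lower().startswith("my ") else []

def call_model(prompt: str) -> str:
    return "(reply)"

fact_store: dict[str, list[str]] = {}

def handle_message(user_id: str, message: str) -> str:
    history = load_recent_history(user_id)      # step 1: short-term context
    facts = retrieve_facts(user_id, message)    # steps 2-3: relevant facts
    prompt = (
        "Known facts:\n" + "\n".join(facts)
        + "\n\n" + "\n".join(history)
        + f"\nUser: {message}\nAI:"
    )
    reply = call_model(prompt)                  # step 4: generate
    fact_store.setdefault(user_id, []).extend(extract_facts(message))  # step 5
    return reply
```

Keying `fact_store` by `user_id` is the scoping move the next paragraph describes: each user's facts live in their own bucket.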

Every user has their own private fact store, scoped to their phone number. You can view it all on the Brain tab of the dashboard and delete anything.

Why most AI 'memory' feels bad

ChatGPT added a memory feature and most users don't think it's very memorable. Reasons:

1. Memory is opt-in and bounded

ChatGPT memory is a small number of fact slots per user, manually triggered. That's by design for privacy, but it means memory feels like a feature you have to manage, not a capability the AI has.

2. Tool framing vs. companion framing

Tools are episodic (one task at a time). Memory is most valuable in relational use, which ChatGPT isn't primarily designed for. The memory feels grafted on because it is.

3. Context vs. retrieval

Some products just stuff everything into context and call it memory. That works until it doesn't — the context window fills, the model gets confused, and the illusion breaks.

What 'good memory' feels like

When memory is working right, the AI brings things up unprompted in context. Three weeks after you mentioned your sister moving, you say 'ugh this weekend', and Brumo asks 'is that the mira goodbye thing?'

That's the signal that retrieval is tuned well. The AI knew what slice of memory to surface without you having to spell it out.

The future

Context windows will keep growing, but memory as a capability will still be an architectural layer on top. The models themselves will stay stateless. The interesting work is in the memory architecture — how facts get extracted, retrieved, decayed, corrected.

The companies that win the long-memory AI race will be the ones who treat memory as the product, not a feature.

Quick questions

Can I export my AI memory?

Depends on the product. Brumo lets you view and delete facts from the Brain tab, and full data export is available via email request.

Does bigger context window = better memory?

Only up to a point. Large context can help for single-session depth but isn't the same as persistent memory. Good memory architecture is separate from context size.

Can memory be hacked or leaked?

Same risk surface as any stored user data — depends on the vendor's security posture. Look for products that document encryption, access controls, and deletion rights.

The best way to experience Brumo is to text him.

Free, no install. Say hi and see what this is actually like in practice.

Talk to Brumo