A Claude Code plugin that captures tool outputs as structured observations, stores them in a local SQLite database, and re-injects relevant context when a new session starts. It's a plugin, not an MCP server — it fires automatically on session events instead of waiting for Claude to call it.

How is this different from CLAUDE.md or auto memory?

CLAUDE.md is static markdown you write by hand and caps out around 200 lines before it stops helping. Auto memory saves unstructured notes at unpredictable moments. Claude-mem compresses every tool call into typed, searchable observations with FTS5 keyword search — automatic capture with structured retrieval.

Does it cost extra to run?

It uses your existing Claude Code authentication for compression calls, so there's no separate subscription. The default model is Haiku, which keeps per-compression cost low (around 400 input + 150 output tokens per tool call).

Is it safe to run on a work machine?

Be careful. The February 2026 audit flagged the HTTP API on port 37777 as unauthenticated — any local process can read observations, view plaintext API keys in settings, or inject memories. Run it on personal dev machines only until those issues are fixed.

Claude-Mem: Giving Claude Code a Memory That Survives Sessions

TL;DR: Claude-mem is a Claude Code plugin that watches your sessions, compresses tool outputs into structured observations, and hands the relevant ones back when you start the next session. It fixes the “re-explain the project every morning” problem. It also has a security audit you should read before putting it on a work machine.

Every Claude Code session used to start the same way. I’d open the project, ask a question, and watch Claude re-discover the same file structure it had already mapped yesterday. Same files opened. Same grep commands. Same “let me understand the codebase” preamble.

Most of my context budget was spent re-learning what the previous session had already figured out.

Claude-mem is a plugin that fixes that. Not perfectly, not without caveats — but meaningfully. Here’s what it does, how it works, and what you should know before installing it.

What claude-mem actually is

It’s a plugin, not an MCP server. That distinction matters.

MCP servers sit idle until Claude decides to call them. They’re reactive.
Plugins fire automatically on lifecycle events — session start, every tool call, session end. They’re ambient.

Claude-mem hooks into five events (SessionStart, UserPromptSubmit, PostToolUse, Stop, SessionEnd), compresses what Claude did, and writes it to a local SQLite database at ~/.claude-mem/claude-mem.db. On the next session, it queries that database and injects the relevant parts back into context.

No cloud. No account. Your observations never leave your machine.

Installation — and one trap to avoid

Two commands inside a Claude Code session:

/plugin marketplace add thedotmack/claude-mem
/plugin install claude-mem

Then restart Claude Code.

Do not run npm install -g claude-mem. That installs the SDK only — no hook registration, no worker activation. The plugin does nothing. I wasted an hour on this before reading the docs properly.

Quick sanity check after install:

curl http://localhost:37777/api/health
# -> {"status":"ok"}

If you see that, hooks are wired and the worker is up. A web viewer also runs at http://localhost:37777 — useful to watch observations stream in while you work.

How the capture actually works

Every time Claude calls a tool, the output goes to a background worker. The worker sends it to a small model (Haiku by default) with a prompt that extracts a structured observation:

Field	What it holds
`type`	`decision`, `bugfix`, `feature`, `refactor`, `discovery`, `change`
`title`	One searchable line
`facts`	~50 tokens — cheap discrete points
`narrative`	~150-500 tokens — prose, loaded on demand
`concepts`	Semantic tags: `how-it-works`, `problem-solution`, `gotcha`, `trade-off`

Per-call capture matters. If your session crashes mid-refactor, tools that only compress at session end lose everything since the last completed session. Claude-mem has every observation up to the last tool call.

The part most “memory” tools get wrong

Dumping every past observation into the next session is the obvious first idea. It’s also useless. You end up with 35,000 tokens of history and maybe 2,000 of it is relevant to the current task. That’s a 6% signal rate, and Claude spends its context window chewing through noise.

Claude-mem uses a three-tier retrieval:

Search — 50-100 tokens per result. A compact index of matching observations.
Timeline — 100-200 tokens per result. Chronological context around a moment.
get_observations — 500-1,000 tokens per result. Full records, fetched by ID only when Claude actually needs the detail.

The idea: start cheap, drill down only when you have to. There’s an MCP tool literally called __IMPORTANT whose entire job is reminding Claude to follow this pattern — because without it, Claude will skip the cheap layers and fetch everything at full detail. Which defeats the architecture.

This is the most interesting design choice in the whole plugin.

Configuration worth knowing

Everything is tunable via http://localhost:37777/Settings or environment variables. The ones I actually touched:

CLAUDE_MEM_MODEL — compression model. Haiku is the default and fine. Switch provider with CLAUDE_MEM_PROVIDER if you want Gemini or OpenRouter instead.
CLAUDE_MEM_CONTEXT_OBSERVATIONS — how many observations get injected at session start. Default 50, range 1-200. I dropped it to 25 during the first week on a new project (see below).
CLAUDE_MEM_CONTEXT_FULL_COUNT — how many of those get expanded to full detail. Default 5.
CLAUDE_MEM_SKIP_TOOLS — excludes specific tools from capture. TodoWrite, AskUserQuestion, BashTool are skipped by default, which is the right call.

Privacy escape hatch: wrap content in <private> tags inside a prompt and it stays out of storage. Useful when you paste something you don’t want sitting in a SQLite file forever.

The security audit you should actually read

A February 2026 community audit flagged the HTTP API on port 37777 as unauthenticated. Any local process on your machine can:

Read every stored observation
View settings — including plaintext API keys
Inject arbitrary memories into Claude’s next context

The default binding was 0.0.0.0 rather than localhost, which meant cloud VMs without firewalls exposed the API to the internet. Path traversal bugs in smart_unfold and smart_outline compound the problem.

My rule: personal dev machines only. Not shared servers. Not cloud dev boxes. Not work laptops with other people’s code on them. The architecture is promising — the security posture isn’t there yet.

There are also reliability issues worth knowing about: an early ChromaDB-backed build leaked subprocesses badly (one user hit 184 orphaned processes chewing 16GB of RAM in 19 hours). The FTS5 backend — SQLite’s built-in full-text search — avoids that entirely. Use FTS5.

What actually changed in my workflow

Three weeks in, a 39 MB database, 6,814 observations across ten projects. What’s different:

No more re-explaining project structure. Claude walks in knowing the shape of the repo.
Debugging paths don’t restart from zero. If we chased a bug down a wrong alley yesterday, today’s session remembers that alley was wrong.
61% of my observations are typed discovery. Claude is mostly capturing what it learns, not what it changes. That was unexpected — I thought the value would be in remembering refactors. It’s actually in remembering comprehension.

One gotcha: the first week on a new project, your context fills faster than normal because every file is new and every tool call produces a discovery-type observation. Lower CLAUDE_MEM_CONTEXT_OBSERVATIONS to 20-30 until the discovery rate flattens. Then raise it back.

The real point

Session-based coding assistants have a blank-slate problem. Every morning, they forget you. The fixes people reach for — CLAUDE.md, auto memory, /compact — are partial. Markdown files rot. Auto memory saves at random. /compact trades your conversation for context room.

Claude-mem is the first approach I’ve used that treats memory as infrastructure instead of a manual habit. Capture happens automatically. Retrieval is tiered so the cheap layers do the work. Privacy stays local.

It’s not production-ready for regulated environments. But for a solo developer on a personal machine, the continuity is real. Three weeks of not re-explaining my own codebases has already paid for the setup time.

Install it on a laptop you own. Watch the observations stream in for a day. Decide for yourself.