Meet EverOS: An Open Source Markdown-First Agent Memory Runtime With Hybrid BM25 + Vector Retrieval and Automation Capabilities

EverMind has released EverOS, an open source memory runtime for AI agents. It is distributed under the Apache 2.0 license. It addresses a problem that agent developers run into early: large language models are stateless. The conversation ends, and the context is gone.
EverOS proposes a different substrate. Instead of locking memory inside a vector database, it writes memory as plain Markdown files. Those files become the source of truth that agents read, edit, and search over time.
The TL;DR
- EverOS stores agent memory as editable Markdown, indexed by SQLite and LanceDB.
- Hybrid retrieval combines BM25, vector search, and scalar filtering in a single query.
- Cases grow into reusable Skills, giving agents procedural memory that evolves.
- Benchmark scores are strong but self-reported by EverMind; verify them on your own workload.
- It is open source under Apache 2.0, with cloud and self-hosted options.
What is EverOS?
EverOS is a Python library and runtime for local memory. It runs as a server with a CLI and a FastAPI HTTP API, and it is async-first throughout. You drop it into an existing agent loop instead of rebuilding your stack.
The design separates two memory tracks. User-side memory holds Profiles, Episodes, Facts, and Foresights. Agent-side memory holds Cases and Skills. Keeping them separate is rare; most libraries focus only on conversational history.
Every record persists as a .md file. You can open, edit, grep, and version it with Git, or view it in Obsidian. EverAlgo, a separate library, handles the extraction algorithms. EverOS orchestrates and persists the results.
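To make the file-first idea concrete, here is a minimal sketch of a memory record written as Markdown and found again with a plain text scan. The directory layout, the front-matter keys, and the helper names `write_memory` and `grep_memory` are assumptions made for illustration; the article does not show EverOS's actual on-disk schema.

```python
import tempfile
from pathlib import Path


def write_memory(root: Path, kind: str, record_id: str, meta: dict, body: str) -> Path:
    """Write one record as Markdown with a simple front-matter header (hypothetical layout)."""
    path = root / kind / f"{record_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    header = "\n".join(f"{key}: {value}" for key, value in meta.items())
    path.write_text(f"---\n{header}\n---\n\n{body}\n", encoding="utf-8")
    return path


def grep_memory(root: Path, needle: str) -> list[Path]:
    """Return every .md file under root whose text contains needle, ignoring case."""
    needle = needle.lower()
    return sorted(
        p for p in root.rglob("*.md")
        if needle in p.read_text(encoding="utf-8").lower()
    )


def demo() -> list[str]:
    """Write two records into a temporary directory and search them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_memory(root, "episodes", "ep-001",
                     {"user_id": "alice", "session_id": "demo-001"},
                     "Alice loves climbing in Yosemite every spring.")
        write_memory(root, "profiles", "alice",
                     {"user_id": "alice"},
                     "Prefers outdoor activities.")
        return [p.name for p in grep_memory(root, "yosemite")]
```

Because the records are ordinary files, the same search could be done with `grep -ril yosemite` from a shell, and any edit shows up in `git diff`.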
The model endpoint stack is compatible with the OpenAI protocol. It connects to OpenAI, OpenRouter, vLLM, Ollama, or DeepInfra by changing the base URL. That keeps integration close to a single configuration change.
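The "change the base URL" idea can be illustrated with the standard library alone. The sketch below only builds a chat-completions request and never sends it. The base URLs are the commonly documented defaults for each provider and should be checked against their docs; the function name, the placeholder key, and the model name in the usage note are hypothetical, and the article does not show how EverOS itself reads this setting.

```python
import json
import urllib.request

# Commonly documented OpenAI-compatible base URLs (assumed defaults, verify before use).
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "deepinfra": "https://api.deepinfra.com/v1/openai",
    "vllm": "http://localhost:8000/v1",
    "ollama": "http://localhost:11434/v1",
}


def build_chat_request(provider: str, model: str, prompt: str,
                       api_key: str = "placeholder-key") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-protocol chat request for the chosen provider."""
    url = BASE_URLS[provider].rstrip("/") + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Calling `build_chat_request("ollama", "llama3", "hello")` and `build_chat_request("openrouter", "llama3", "hello")` yields requests that differ only in the URL, which is the point of a protocol-compatible stack.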
The runtime is local by default. Data does not have to leave your environment, and every layer is auditable. A managed EverOS Cloud option exists for teams that prefer not to self-host. Both share the same SDK, retrieval engine, and memory format.
Architecture – Markdown, SQLite, and LanceDB
EverOS uses a three-piece storage stack. Markdown is the source of truth. SQLite manages state and queues. LanceDB handles vectors, BM25, and scalar filters.
This is intentionally simpler than a typical production memory setup. No MongoDB, Elasticsearch, Milvus, Redis, or Kafka is required. For individual developers and small teams, that lowers operating costs.
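As a rough illustration of why a single SQLite file can stand in for a separate queue service, the sketch below keeps a tiny extraction queue in one table. The table name, columns, and status values are hypothetical and are not EverOS's real schema.

```python
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open the state database and create a hypothetical extraction queue table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS extraction_queue (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending'
        )
        """
    )
    return conn


def enqueue(conn: sqlite3.Connection, session_id: str) -> int:
    """Add a session to the queue and return its job id."""
    cur = conn.execute(
        "INSERT INTO extraction_queue (session_id) VALUES (?)", (session_id,)
    )
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection):
    """Mark the oldest pending job as running and return (id, session_id), or None."""
    row = conn.execute(
        "SELECT id, session_id FROM extraction_queue "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE extraction_queue SET status = 'running' WHERE id = ?", (row[0],)
    )
    conn.commit()
    return row
```

A single-process worker can poll `claim_next` in a loop; a multi-worker setup would need the select and update wrapped in one transaction, which this sketch leaves out.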
Retrieval is hybrid. A single LanceDB query combines BM25 keyword matching, dense vector search, and scalar filtering. EverMind markets this retrieval method as mRAG.
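The three ingredients can be shown in pure Python without touching LanceDB. The sketch below applies a scalar filter, scores the surviving records with Okapi BM25 and with cosine similarity, and merges the two rankings with reciprocal rank fusion. The fusion rule is a stand-in chosen for illustration: the article does not say how LanceDB or EverOS combine the scores, and the record fields and toy two-dimensional vectors are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vec, records, scope, top_k=5, rrf_k=60):
    """Scalar filter first, then fuse BM25 and cosine rankings by reciprocal rank."""
    candidates = [r for r in records if all(r.get(k) == v for k, v in scope.items())]
    if not candidates:
        return []
    keyword = bm25_scores(query, [r["text"] for r in candidates])
    dense = [cosine(query_vec, r["vector"]) for r in candidates]
    fused = [0.0] * len(candidates)
    for scores in (keyword, dense):
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            fused[i] += 1.0 / (rrf_k + rank)
    ranked = sorted(range(len(candidates)), key=lambda i: fused[i], reverse=True)
    return [candidates[i]["id"] for i in ranked[:top_k]]


RECORDS = [
    {"id": "m1", "user_id": "alice", "text": "I love climbing in Yosemite every spring", "vector": [0.9, 0.1]},
    {"id": "m2", "user_id": "alice", "text": "Booked a dentist appointment for Monday", "vector": [0.1, 0.9]},
    {"id": "m3", "user_id": "bob", "text": "Climbing gym membership renewed", "vector": [0.8, 0.2]},
]
```

With these toy records, `hybrid_search("climbing in Yosemite", [1.0, 0.0], RECORDS, {"user_id": "alice"})` ranks `m1` first and never returns `m3`, because the scalar filter removes other users' records before any scoring happens.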
Cascade index synchronization keeps the files and the indexes aligned. Editing a .md file triggers a file watcher that resynchronizes the index, so memory stays searchable without a manual rebuild.
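A polling version of this idea fits in a few lines. The sketch below hashes every .md file, compares the result with the previous snapshot, and updates an in-memory index only for files that changed or disappeared. It is an assumption-laden stand-in: the dictionary index and the function names are hypothetical, and a real watcher would more likely react to operating-system file events than poll.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each .md file (relative POSIX path) to a hash of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*.md")
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (changed_or_added, removed) relative paths."""
    changed = sorted(k for k in new if old.get(k) != new[k])
    removed = sorted(k for k in old if k not in new)
    return changed, removed


def resync(index: dict[str, str], root: Path, old: dict[str, str]) -> dict[str, str]:
    """Bring the index in line with the files on disk and return the new snapshot."""
    new = snapshot(root)
    changed, removed = diff_snapshots(old, new)
    for rel in changed:
        index[rel] = (root / rel).read_text(encoding="utf-8")  # re-index only what changed
    for rel in removed:
        index.pop(rel, None)
    return new
```

Calling `resync` after each edit touches only the files whose hash changed, which is what keeps the index current without a full rebuild.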
Retrieval can also be scoped orthogonally across identifiers. You can scope a search by user_id, agent_id, app_id, project_id, or session_id. That scoping matters for multi-agent and multi-user applications where data isolation is required.
The Evolution of Memory – Cases That Become Skills
A distinguishing feature is procedural memory. EverOS records each completed agent task as a Case. Repeated successful patterns are extracted offline into reusable Skills.
This is the ‘self-evolving’ claim, stated plainly. Skills are shared across the agent team, with no manual curation and no hard-coding. The goal is for agents to improve with use instead of starting over each session.
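One way to picture the Case-to-Skill step is as pattern counting over successful runs. The sketch below promotes a step sequence to a Skill once it has succeeded a minimum number of times for the same task. This frequency rule is a deliberately simple stand-in; the article does not describe how EverOS or EverAlgo actually extract Skills, and the `Case` fields and `min_support` threshold are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    task: str                 # what the agent was asked to do
    steps: tuple[str, ...]    # ordered actions it took
    success: bool             # whether the task completed successfully


def extract_skills(cases: list[Case], min_support: int = 2) -> dict[str, tuple[str, ...]]:
    """Promote the most frequent successful step sequence per task to a Skill."""
    counts = Counter((c.task, c.steps) for c in cases if c.success)
    skills: dict[str, tuple[str, ...]] = {}
    # most_common() yields the highest counts first, so the first hit per task wins.
    for (task, steps), seen in counts.most_common():
        if seen >= min_support and task not in skills:
            skills[task] = steps
    return skills


CASES = [
    Case("deploy", ("run tests", "build image", "release"), True),
    Case("deploy", ("build image", "release"), False),
    Case("deploy", ("run tests", "build image", "release"), True),
    Case("triage", ("read logs", "open ticket"), True),
]
```

Here `extract_skills(CASES)` keeps only the deploy sequence that succeeded twice; the failed shortcut and the one-off triage run stay as Cases.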
Version 1.1.0 added more lifecycle mechanisms. It introduced a Knowledge API for source-backed Markdown pages with taxonomy and topic search. It also added Reflection, an offline process that consolidates sets of episodes and refines profiles and skills between sessions.
The memory model is simple. Episodic memory answers ‘what happened.’ Profile memory answers ‘who is this user.’ Procedural memory answers ‘how to do this task.’
Benchmark
The EverMind team reports 93.05% on LoCoMo, 83.00% on LongMemEval, and 93.04% on HaluMem. It also cites sub-500ms p95 retrieval latency. LoCoMo and LongMemEval measure long-term conversational memory; HaluMem targets memory hallucination. These numbers come from EverMind's own post and have not been independently verified here.
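Since the figures are vendor-reported, the latency claim is the easiest one to check on your own workload. The sketch below times repeated calls to any function and reports the 95th percentile in milliseconds. The helper names are hypothetical, and the callable you pass in (for example, a wrapper around your own search request) is yours to supply.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], runs: int = 50) -> list[float]:
    """Call fn repeatedly and return each call's wall-clock duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95_ms(samples_ms: list[float]) -> float:
    """95th percentile of the samples (needs at least two data points)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]
```

A typical check would be `p95_ms(time_calls(my_search, runs=200))`, where `my_search` is a hypothetical zero-argument function that issues one representative query against your own data.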
The table below compares EverOS to other common approaches in concrete design dimensions:
| Dimension | EverOS | Naive RAG | Full context window | Other memory libraries |
|---|---|---|---|---|
| Source of truth | Plain Markdown .md files | Vector DB records | Prompt only | API or database state |
| Local stack | Markdown + SQLite + LanceDB | Vector DB + application code | None | Often managed services |
| Retrieval | Hybrid BM25 + vector + scalar | Dense vector only | None (no retrieval) | Varies |
| Procedural memory | Cases distilled into Skills | None | None | Rare |
| Multimodal ingestion | PDF, image, Office, URL in one call | Manual pipeline | In-context only | Partial |
| LoCoMo accuracy | 93.05% (EverMind-reported) | – | N/A (context limit) | Varies |
| License | Apache 2.0 | Varies | N/A | Varies / concerning |
Use Cases, With Real Examples
The repository includes working integrations. They show what persistent memory does in real products.
Hive Orchestrator is a browser-native concept for CLI coding agents. Claude Code, Codex, Gemini, and OpenCode run together as real PTY processes through a shared team protocol.
Regrouping applies semantic memory to a search problem with social value. Parents describe what they remember, children describe what they remember, and the system looks for matches.
Other examples span healthcare and hardware, including an Alzheimer’s memory assistant and wearable AI. The wearables listen to everyday life and turn it into memories. A research assistant with self-evolving memory is also among the examples. The wider ecosystem adds a Claude Code plugin and an MCP-based memory layer for coding assistants.
A Five-Minute Code Walkthrough
Installation uses standard Python tools. EverOS requires Python 3.12 or newer. The local demo does not require API keys.
# Requires Python 3.12+
uv pip install everos # or: pip install everos
everos demo # local educational visualizer, no keys
everos init # paste OpenRouter + DeepInfra keys into .env
everos server start # starts the FastAPI server
curl # -> {"status":"ok"}

Adding and searching memory are ordinary HTTP calls. The example below stores a conversation turn, forces extraction, and retrieves it.
# 1) Add a short conversation
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{"session_id":"demo-001","app_id":"default","project_id":"default",
       "messages":[{"sender_id":"alice","role":"user","timestamp":1750000000000,
       "content":"I love climbing in Yosemite every spring."}]}'
# 2) Flush to force extraction (local demo)
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{"session_id":"demo-001","app_id":"default","project_id":"default"}'
# 3) Search it back
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"alice","app_id":"default","project_id":"default",
       "query":"Where do I like to climb?","top_k":5}'

Multimodal ingestion is an optional extra. Installing everos[multimodal] adds parsing of images, PDFs, and audio. Office documents also require LibreOffice, which converts files to PDF before they are parsed.
Try it: Active Memory Demo
The embedded demo below simulates the EverOS loop in your browser. Add messages, watch them get extracted and tagged, then search them back with hybrid retrieval. It is illustrative only and does not connect to a live server.
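For readers who would rather script the same add, flush, and search loop than click through a widget, the request bodies from the walkthrough can be assembled in Python. The sketch below only builds the JSON payloads, using the field names that appear in the curl examples. The function names are hypothetical, and the endpoint URLs are left out on purpose because the walkthrough does not show them.

```python
import json
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert an aware datetime to the millisecond timestamp used in the add call."""
    return round(moment.timestamp() * 1000)


def add_payload(session_id: str, sender_id: str, content: str, moment: datetime,
                app_id: str = "default", project_id: str = "default") -> dict:
    """Body for step 1: add one user message to a session."""
    return {
        "session_id": session_id,
        "app_id": app_id,
        "project_id": project_id,
        "messages": [{
            "sender_id": sender_id,
            "role": "user",
            "timestamp": to_epoch_ms(moment),
            "content": content,
        }],
    }


def flush_payload(session_id: str, app_id: str = "default",
                  project_id: str = "default") -> dict:
    """Body for step 2: force extraction for a session."""
    return {"session_id": session_id, "app_id": app_id, "project_id": project_id}


def search_payload(user_id: str, query: str, top_k: int = 5,
                   app_id: str = "default", project_id: str = "default") -> dict:
    """Body for step 3: search a user's memory."""
    return {"user_id": user_id, "app_id": app_id, "project_id": project_id,
            "query": query, "top_k": top_k}


when = datetime(2025, 6, 15, 15, 6, 40, tzinfo=timezone.utc)
body = json.dumps(add_payload("demo-001", "alice",
                              "I love climbing in Yosemite every spring.", when))
```

Each dictionary can be passed to `json.dumps` and posted with any HTTP client once you know the server's routes. Note that the add call is keyed by `session_id` while the search is keyed by `user_id`, mirroring the curl examples.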



