Up to 100x Token Reduction Drives $98M Round for Stanford AI Spinout
Founded by Stanford researchers, Engram emerged from stealth with a $600 million valuation to replace traditional RAG with continuous neural memory.
On June 23, 2026, Stanford University spinout Engram emerged from stealth with a $98 million funding round to commercialize a continuous neural memory layer for large language models. The architecture pre-processes organizational data into a compressed state, eliminating the need for AI agents to re-read contextual documents with every query. The funding round, led by General Catalyst, Kleiner Perkins, and Sequoia Capital, values the 13-person startup at $600 million.
Prominent individual investors include OpenAI co-founder Andrej Karpathy, Berkeley AI Research co-director Pieter Abbeel, and Wiz CEO Assaf Rappaport.
Decoupling Inference from Memory
Current large language models suffer from what Engram calls the “genius stranger” problem. Models possess high baseline reasoning capabilities but exhibit “context amnesia” across separate sessions. They must be continuously force-fed context through their input windows.
Engram addresses this by architecturally separating the AI’s reasoning capabilities from its memory. The system ingests documents, workflows, and historical decisions, compressing them into a reusable neural memory layer. This memory updates in real time through online continual learning, avoiding the need to retrain the base model from scratch.
For developers evaluating how to add memory to AI agents, this approach offers an alternative to traditional Retrieval-Augmented Generation. Rather than retrieving and injecting raw text chunks on the fly, the model queries a pre-computed neural representation of the organization’s knowledge.
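For contrast, the traditional retrieve-and-inject loop can be sketched in a few lines. This is a toy illustration only: the word-overlap scoring, the `CHUNKS` data, and every function name are assumptions made for the example, and whitespace-separated words stand in for tokens. It does not model Engram's memory layer, whose internals and API are not described here.

```python
# Toy sketch of traditional RAG: retrieve text chunks, then paste them into the prompt.
# All names and data are hypothetical; word overlap stands in for vector search.

CHUNKS = [
    "Refund requests are approved by the finance team within five days.",
    "The on-call rotation changes every Monday at nine.",
    "Vendor contracts above ten thousand dollars need legal review.",
]


def tokenize(text):
    """Crude tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def retrieve(query, chunks, k=2):
    """Return the k chunks sharing the most words with the query."""
    query_words = set(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Inject the retrieved raw text into the prompt, as RAG does on every query."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


def approx_tokens(text):
    """Whitespace word count as a rough proxy for token count."""
    return len(text.split())


query = "who approves refund requests"
prompt = build_prompt(query, CHUNKS, k=1)
print(prompt)
print("question tokens:", approx_tokens(query))
print("prompt tokens:", approx_tokens(prompt))
```

The point of the sketch is that the retrieved text is re-sent with every question, so prompt size tracks the injected context rather than the question itself.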
Token Efficiency and Performance
By keeping static organizational context out of the active prompt, Engram says models can match or exceed frontier system performance while using only 1% to 10% of the token volume, a 10x to 100x reduction. This addresses a major scaling bottleneck for teams trying to reduce LLM API costs in production environments.
| Capability | Traditional RAG | Engram Memory Layer |
|---|---|---|
| Context Injection | In-context text chunks | Compressed neural representations |
| Token Overhead | High per-query cost | 1% to 10% of frontier baselines |
| Knowledge Update | Vector database indexing | Real-time continual learning |
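To make the 1% to 10% figure concrete, the arithmetic can be worked through on a hypothetical workload. The query volume and tokens-per-query numbers below are assumptions chosen for illustration, not figures reported by Engram, and the function name is made up for this sketch.

```python
# Back-of-the-envelope token arithmetic for the 1% to 10% range.
# Workload numbers are hypothetical, not reported figures.

QUERIES_PER_MONTH = 1_000_000   # assumed query volume
TOKENS_PER_QUERY = 8_000        # assumed prompt size with context injected


def tokens_at_percent(baseline_tokens, percent):
    """Token volume when only `percent` percent of the baseline is used."""
    return baseline_tokens * percent // 100


baseline = QUERIES_PER_MONTH * TOKENS_PER_QUERY

for percent in (10, 1):
    reduced = tokens_at_percent(baseline, percent)
    factor = baseline // reduced
    print(f"{percent}% of baseline: {reduced:,} tokens ({factor}x reduction)")
```

Under these assumptions, a baseline of 8 billion prompt tokens per month would fall to between 800 million and 80 million, which is where the 10x to 100x range comes from.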
Stanford Origins and Commercial Integration
Engram was founded in October 2025 by researchers from the Stanford University AI Lab. The core compression method stems from a foundational Stanford research project codenamed Cartridges, led by Engram CTO Sabri Eyuboglu. The founding team also includes CEO Dan Biderman, Stanford professor Chris Ré, and researchers Jessy Lin, Jack Morris, and Scott Linderman.
The startup has already secured integration partnerships with major enterprise software providers. Microsoft is currently testing the memory layer within Microsoft 365 to maintain persistent organizational context across applications. Notion is integrating the technology to build persistent workspaces, and legal AI platform Harvey uses the system to manage complex, multi-document legal reasoning tasks efficiently.
If you build applications that require persistent user or organizational context, evaluate whether your current retrieval strategy relies entirely on context window stuffing. Moving toward architectures that support continuous learning or compressed state representations will become necessary to maintain manageable inference costs as your document volume scales.