Up to 100x Token Reduction Drives $98M Round for Stanford AI Spinout

Founded by Stanford researchers, Engram emerged from stealth with a $600 million valuation to replace traditional RAG with continuous neural memory.

On June 23, 2026, Stanford University spinout Engram emerged from stealth with a $98 million funding round to commercialize a continuous neural memory layer for large language models. The architecture pre-processes organizational data into a compressed state, eliminating the need for AI agents to re-read contextual documents with every query. The funding round, led by General Catalyst, Kleiner Perkins, and Sequoia Capital, values the 13-person startup at $600 million.

Prominent individual investors include OpenAI co-founder Andrej Karpathy, Berkeley AI Research co-director Pieter Abbeel, and Wiz CEO Assaf Rappaport.

Decoupling Inference from Memory

Current large language models suffer from what Engram calls the “genius stranger” problem. Models possess high baseline reasoning capabilities but exhibit “context amnesia” across separate sessions. They must be continuously force-fed context through their input windows.

Engram addresses this by architecturally separating the AI’s reasoning capabilities from its memory. The system ingests documents, workflows, and historical decisions and compresses them into a reusable neural memory layer. This memory updates in real time through online continual learning, so the base model does not need to be retrained from scratch.

For developers evaluating how to add memory to AI agents, this approach offers an alternative to traditional Retrieval-Augmented Generation. Rather than retrieving and injecting raw text chunks on the fly, the model queries a pre-computed neural representation of the organization’s knowledge.
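
The difference is easiest to see in what each request carries. The sketch below is a toy contrast, not Engram's API, which the article does not describe: `rag_prompt`, `MemoryHandle`, `memory_request`, the keyword-overlap retriever, and the whitespace token count are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    """Rough token proxy: whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def rag_prompt(question: str, chunks: list[str], top_k: int = 2) -> str:
    """Traditional RAG: rank chunks, then paste the top ones into the prompt."""
    q_words = set(question.lower().split())
    # Toy retriever: score each chunk by the words it shares with the question.
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"


@dataclass(frozen=True)
class MemoryHandle:
    """Hypothetical reference to a precomputed memory state (an assumption)."""
    memory_id: str


def memory_request(question: str, handle: MemoryHandle) -> dict[str, str]:
    """Memory-layer style: send only the question plus a reference to the state."""
    return {"memory_id": handle.memory_id, "prompt": f"Question: {question}"}


if __name__ == "__main__":
    docs = [
        "The refund policy allows returns within 30 days of purchase.",
        "Office hours are 9 to 5 on weekdays.",
        "Refund requests go through the billing team.",
    ]
    question = "What is the refund policy?"
    rag = rag_prompt(question, docs)
    mem = memory_request(question, MemoryHandle("org-123"))
    print("RAG prompt tokens:   ", count_tokens(rag))
    print("Memory prompt tokens:", count_tokens(mem["prompt"]))
```

In the RAG path the prompt grows with every chunk that gets injected. In the memory path the prompt stays the size of the question, and the knowledge is assumed to live in whatever state the handle points to.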

Token Efficiency and Performance

By keeping static organizational context out of the active prompt, Engram allows models to match or exceed the performance of frontier systems while using only 1% to 10% of the tokens those systems consume, a 10x to 100x reduction. This addresses a major scaling bottleneck for teams trying to reduce LLM API costs in production.

| Capability | Traditional RAG | Engram Memory Layer |
| --- | --- | --- |
| Context Injection | In-context text chunks | Compressed neural representations |
| Token Overhead | High per-query cost | 1% to 10% of frontier baselines |
| Knowledge Update | Vector database indexing | Real-time continuous learning |
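
To put the 1% to 10% figure in cost terms, here is a back-of-the-envelope sketch. The query volume, baseline prompt size, and per-token price are hypothetical placeholders, not figures from Engram or any provider; only the 1% and 10% fractions come from the article. The sketch also ignores output tokens and any cost of building or serving the memory layer.

```python
def monthly_input_cost(
    queries: int, tokens_per_query: float, usd_per_million_tokens: float
) -> float:
    """Input-token spend for one month at a flat per-token price."""
    return queries * tokens_per_query * usd_per_million_tokens / 1_000_000


def cost_at_fraction(baseline_cost: float, token_fraction: float) -> float:
    """Scale a baseline cost by the share of tokens still sent per query."""
    if not 0.0 < token_fraction <= 1.0:
        raise ValueError("token_fraction must be in (0, 1]")
    return baseline_cost * token_fraction


if __name__ == "__main__":
    # All three inputs are illustrative assumptions.
    QUERIES_PER_MONTH = 1_000_000
    BASELINE_TOKENS_PER_QUERY = 20_000
    USD_PER_MILLION_TOKENS = 3.0

    baseline = monthly_input_cost(
        QUERIES_PER_MONTH, BASELINE_TOKENS_PER_QUERY, USD_PER_MILLION_TOKENS
    )
    for fraction in (1.0, 0.10, 0.01):
        cost = cost_at_fraction(baseline, fraction)
        print(f"{fraction:>5.0%} of baseline tokens: ${cost:,.0f} per month")
```

Because input cost is linear in token count under flat pricing, the savings track the token fraction directly; tiered pricing or prompt caching would change the picture.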

Stanford Origins and Commercial Integration

Engram was founded in October 2025 by researchers from the Stanford University AI Lab. The core compression method stems from a foundational Stanford research project codenamed Cartridges, led by Engram CTO Sabri Eyuboglu. The founding team also includes CEO Dan Biderman, Stanford professor Chris Ré, and researchers Jessy Lin, Jack Morris, and Scott Linderman.

The startup has already secured integration partnerships with major enterprise software providers. Microsoft is currently testing the memory layer within Microsoft 365 to maintain persistent organizational context across applications. Notion is integrating the technology to build persistent workspaces, and legal AI platform Harvey uses the system to manage complex, multi-document legal reasoning tasks efficiently.

If you build applications that require persistent user or organizational context, check whether your current retrieval strategy relies entirely on stuffing the context window. As document volume scales, architectures that support continuous learning or compressed state representations are likely to become necessary to keep inference costs manageable.
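
One way to start that evaluation is to measure how much of your injected context is the same text sent again and again. The sketch below is a rough audit under stated assumptions, not a standard metric: it takes a hypothetical log in which each request is the list of context chunks pasted into its prompt, counts tokens by whitespace, and reports the share of context tokens that belong to chunks appearing in more than one request.

```python
from collections import Counter


def repeated_context_share(requests: list[list[str]]) -> float:
    """Fraction of context tokens spent on chunks seen in more than one request."""
    # Count each chunk once per request, so a chunk duplicated inside a single
    # prompt does not look like cross-request reuse.
    seen_in = Counter(chunk for chunks in requests for chunk in set(chunks))

    total = 0
    repeated = 0
    for chunks in requests:
        for chunk in chunks:
            n_tokens = len(chunk.split())  # crude proxy for a real tokenizer
            total += n_tokens
            if seen_in[chunk] > 1:
                repeated += n_tokens
    return repeated / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical log: the same policy text is injected into every request.
    log = [
        ["company travel policy text here", "ticket 101 details"],
        ["company travel policy text here", "ticket 102 extra details"],
        ["company travel policy text here", "ticket 103"],
    ]
    print(f"Repeated context share: {repeated_context_share(log):.0%}")
```

A high share suggests most of each prompt is static context being re-sent, which is the workload where a precomputed or compressed representation is most likely to pay off.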
