100x Token Reduction Drives $98M Round for Stanford AI Spinout

On June 23, 2026, Stanford University spinout Engram emerged from stealth with a $98 million funding round to commercialize a continuous neural memory layer for large language models. The architecture pre-processes organizational data into a compressed state, eliminating the need for AI agents to re-read contextual documents with every query. The funding round, led by General Catalyst, Kleiner Perkins, and Sequoia Capital, values the 13-person startup at $600 million.

Prominent individual investors include OpenAI co-founder Andrej Karpathy, Berkeley AI Research co-director Pieter Abbeel, and Wiz CEO Assaf Rappaport.

Decoupling Inference from Memory

Current large language models suffer from what Engram calls the “genius stranger” problem. Models possess high baseline reasoning capabilities but exhibit “context amnesia” across separate sessions. They must be continuously force-fed context through their input windows.

Engram solves this by architecturally separating the AI’s reasoning capabilities from its memory. The system ingests documents, workflows, and historical decisions, compressing them into a reusable neural memory layer. This memory updates in real time through online continual learning, bypassing the need to retrain the base model from scratch.

For developers evaluating how to add memory to AI agents, this approach offers an alternative to traditional Retrieval-Augmented Generation. Rather than retrieving and injecting raw text chunks on the fly, the model queries a pre-computed neural representation of the organization’s knowledge.
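The difference in what travels through the prompt can be illustrated with a toy comparison. This is a minimal sketch, not Engram's API, which the article does not describe: `MemoryHandle`, `memory_prompt`, and the word-overlap retriever are hypothetical stand-ins, and whitespace word counts stand in for real tokenizer counts.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words.
    return len(text.split())


def rag_prompt(query: str, chunks: list[str], top_k: int = 2) -> str:
    # Toy retrieval: rank chunks by word overlap with the query,
    # then paste the top_k chunks into the prompt as raw text.
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"


@dataclass(frozen=True)
class MemoryHandle:
    # Hypothetical reference to a precomputed, compressed memory.
    # How such a memory is built or attached to a model is not modeled here.
    memory_id: str


def memory_prompt(query: str, memory: MemoryHandle) -> tuple[str, MemoryHandle]:
    # Only the question goes into the prompt; the memory is passed
    # alongside it by reference instead of being re-read as text.
    return f"Question: {query}", memory


if __name__ == "__main__":
    chunks = [
        "The refund policy allows returns within 30 days of purchase.",
        "Quarterly planning happens in the first week of each quarter.",
        "Refund requests above 500 dollars need manager approval.",
    ]
    query = "What is the refund policy?"

    rag = rag_prompt(query, chunks)
    mem, handle = memory_prompt(query, MemoryHandle("org-knowledge-v1"))

    print("RAG prompt tokens:   ", count_tokens(rag))
    print("Memory prompt tokens:", count_tokens(mem), "(memory:", handle.memory_id + ")")
```

In the retrieval path the prompt grows with every chunk injected, on every query. In the memory path the prompt stays the size of the question, and the cost of processing the documents is paid when the memory is built.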

Token Efficiency and Performance

By keeping static organizational context out of the active prompt, Engram allows models to match or exceed frontier system performance while using only 1% to 10% of the token volume, a 10x to 100x reduction. This addresses a major scaling bottleneck for teams trying to reduce LLM API costs in production environments.

Capability	Traditional RAG	Engram Memory Layer
Context Injection	In-context text chunks	Compressed neural representations
Token Overhead	High per-query cost	1% to 10% of frontier baselines
Knowledge Update	Vector database indexing	Real-time continual learning
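
To make the 1% to 10% figure concrete, the arithmetic can be worked through for a sample workload. This is a rough sketch under stated assumptions: the 20,000-token baseline prompt, the one million queries per month, and the $3 per million input tokens are illustrative numbers, not figures from Engram or any provider, and the cost of building and updating the memory is left out.

```python
def reduced_tokens(baseline_tokens: int, fraction: float) -> int:
    # Tokens per query when only `fraction` of the baseline volume is used.
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return round(baseline_tokens * fraction)


def reduction_factor(fraction: float) -> float:
    # A fraction of 0.10 is a 10x reduction; 0.01 is a 100x reduction.
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1 / fraction


def monthly_cost_usd(tokens_per_query: int, queries: int, price_per_million: float) -> float:
    # Input-token cost only; output tokens and memory-building cost are ignored.
    return tokens_per_query * queries / 1_000_000 * price_per_million


if __name__ == "__main__":
    BASELINE_TOKENS = 20_000      # assumed prompt size with context stuffing
    QUERIES_PER_MONTH = 1_000_000  # assumed workload
    PRICE_PER_MILLION = 3.0       # assumed USD price per million input tokens

    for fraction in (1.0, 0.10, 0.01):
        tokens = reduced_tokens(BASELINE_TOKENS, fraction)
        cost = monthly_cost_usd(tokens, QUERIES_PER_MONTH, PRICE_PER_MILLION)
        print(
            f"{fraction:>5.0%} of baseline: {tokens:>6} tokens/query, "
            f"{reduction_factor(fraction):>5.0f}x reduction, ${cost:,.0f}/month"
        )
```

Under these assumptions the monthly input bill falls from $60,000 to $6,000 at 10% of baseline and to $600 at 1%. The headline 100x figure corresponds to the 1% end of the range.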

Stanford Origins and Commercial Integration

Engram was founded in October 2025 by researchers from the Stanford University AI Lab. The core compression method stems from a foundational Stanford research project codenamed Cartridges, led by Engram CTO Sabri Eyuboglu. The founding team also includes CEO Dan Biderman, Stanford professor Chris Ré, and researchers Jessy Lin, Jack Morris, and Scott Linderman.

The startup has already secured integration partnerships with major enterprise software providers. Microsoft is currently testing the memory layer within Microsoft 365 to maintain persistent organizational context across applications. Notion is integrating the technology to build persistent workspaces, and legal AI platform Harvey uses the system to manage complex, multi-document legal reasoning tasks efficiently.

If you build applications that require persistent user or organizational context, evaluate whether your current retrieval strategy relies entirely on context window stuffing. Moving toward architectures that support continual learning or compressed state representations will likely become necessary to keep inference costs manageable as your document volume scales.
