
TII Releases Falcon Perception Open-Source Vision Model

Falcon Perception introduces an early-fusion Transformer architecture that outperforms Meta's SAM 3 in dense image segmentation and OCR-guided grounding.

The Technology Innovation Institute (TII) has released Falcon Perception and Falcon OCR, two open-source vision models built on an early-fusion Transformer architecture. For developers integrating vision into applications, these models eliminate the standard pipeline of routing image data through a frozen vision backbone before passing it to a language decoder. Both models are available on Hugging Face under the permissive TII Falcon License 2.0.

Early-Fusion Architecture

Falcon Perception replaces the traditional modular vision pipeline with an early-fusion Transformer architecture. It processes image patches and text tokens in a shared parameter space from the first layer, using a single unified 0.6-billion-parameter backbone.

The model applies a hybrid attention mask to manage context: image tokens use bidirectional attention to establish global visual context, while prediction tokens use causal attention during autoregressive generation. A lightweight token interface handles continuous spatial outputs, allowing the model to generate parallel high-resolution mask predictions.
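As a rough illustration of the hybrid mask described above, the pattern can be expressed as a boolean legality matrix. This is a minimal sketch of the attention pattern only, not code from the release; the token layout (image tokens first, then text/prediction tokens) is an assumption.

```python
def hybrid_attention_mask(num_image_tokens: int, num_text_tokens: int) -> list[list[bool]]:
    """Build a (q, k) legality matrix: True means query q may attend to key k.

    Image tokens attend bidirectionally among themselves; text/prediction
    tokens see every image token plus earlier text tokens (causal).
    """
    n = num_image_tokens + num_text_tokens
    allowed = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < num_image_tokens:
                # Bidirectional block: image queries see all image keys only.
                allowed[q][k] = k < num_image_tokens
            else:
                # Text queries: full view of the image, causal over text.
                allowed[q][k] = k < num_image_tokens or k <= q
    return allowed
```

In practice this pattern would be realized inside the attention kernel rather than as a dense matrix, but the matrix makes the two attention regimes easy to see.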

Benchmark Performance

Falcon Perception targets dense image segmentation and open-vocabulary grounding driven by natural language instructions. On the SA-Co benchmark, the 0.6B-parameter model scored 68.0 Macro-F1, outperforming Meta’s SAM 3 at 62.3. TII also introduced PBench, a diagnostic benchmark of compositional prompts that test spatial constraints, object relations, and text-reading capabilities. Falcon Perception averaged 57.0 Macro-F1 on PBench, against 44.4 for SAM 3 and 52.7 for the larger Qwen3-VL-30B.
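For reference, Macro-F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. A minimal sketch of the metric itself (not of either benchmark's evaluation harness):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(per_class_counts: list[tuple[int, int, int]]) -> float:
    """Macro-F1: average the per-class F1 scores with equal weight per class."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)
```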

In the PBench Dense split for crowded scenes, Falcon Perception scored 72.6, well ahead of Qwen3-VL-30B’s 8.9. The architecture excels at OCR-guided grounding tasks. It can disambiguate specific objects by reading text directly off them, a task where traditional segmentation models struggle.

Document Intelligence

TII released Falcon OCR alongside the perception model. This 300-million-parameter model focuses entirely on document text recognition, including multi-column layouts. It scored 80.3 on olmOCR and 88.64 on OmniDocBench, and TII reports the highest throughput among currently available open-source OCR models. If you build a RAG application that deals with complex PDF layouts, this model provides an efficient text extraction layer.
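A sketch of how such a model could slot in as the extraction layer of a RAG pipeline. The `ocr` callable stands in for an actual Falcon OCR inference call (not shown here, since the model's exact inference API is not covered in this announcement); the chunk sizes are illustrative.

```python
from typing import Callable

def build_extraction_layer(ocr: Callable[[bytes], str],
                           chunk_chars: int = 800,
                           overlap: int = 80) -> Callable[[list[bytes]], list[str]]:
    """Wrap a page-level OCR function (e.g. a Falcon OCR call) into a
    document -> chunks extractor suitable for feeding a RAG index."""
    def extract(pages: list[bytes]) -> list[str]:
        # Run OCR per page, then split the joined text into overlapping
        # chunks so embeddings retain context across chunk boundaries.
        text = "\n".join(ocr(page) for page in pages)
        step = chunk_chars - overlap
        return [text[i:i + chunk_chars] for i in range(0, len(text), step)]
    return extract
```

Keeping the OCR call behind a plain callable also makes it trivial to swap extraction backends without touching the chunking or indexing code.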

Optimized Inference Stack

The custom attention patterns in Falcon Perception require specific optimizations for AI inference deployments. The release includes code for a vLLM Docker server and an MLX integration for running the models locally on Apple Silicon. The server relies on PyTorch’s FlexAttention to process variable-length sequences efficiently.
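FlexAttention expresses custom masks as a predicate over token indices rather than a materialized dense matrix, which is what makes variable-length hybrid patterns cheap to handle. A sketch of what the hybrid mask above could look like in that style (the image-token count and the predicate itself are assumptions for illustration, not TII's implementation):

```python
NUM_IMAGE = 256  # assumed number of image tokens at the start of the sequence

def hybrid_mask_mod(b, h, q_idx, kv_idx):
    """FlexAttention-style mask predicate: True where attention is allowed.

    Written with bitwise ops so it works on plain ints here and on index
    tensors when handed to FlexAttention.
    """
    image_q = q_idx < NUM_IMAGE
    image_kv = kv_idx < NUM_IMAGE
    text_q = q_idx >= NUM_IMAGE
    # Image queries: bidirectional within the image block.
    # Text queries: see every image token, causal over text.
    return (image_q & image_kv) | (text_q & (image_kv | (kv_idx <= q_idx)))

# With PyTorch >= 2.5 this predicate would be compiled into a block mask,
# roughly (sketch, untested):
#   from torch.nn.attention.flex_attention import create_block_mask
#   block_mask = create_block_mask(hybrid_mask_mod, B=None, H=None,
#                                  Q_LEN=seq_len, KV_LEN=seq_len)
```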

A paged inference engine utilizes virtual page tables to eliminate memory waste from padding. For repeated queries on the same image, an LRU High-Resolution Feature Cache skips redundant upsampling steps. On an NVIDIA H100, latencies measure roughly 100ms for prefill, 200ms for upsampling, and 50ms per instance for decoding. A cached upsample reduces the 200ms step to zero.
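The caching behavior can be sketched as a minimal LRU keyed by image id. The capacity and data layout here are assumptions; the actual engine caches high-resolution features inside the inference stack.

```python
from collections import OrderedDict
from typing import Callable

class FeatureCache:
    """Minimal LRU cache sketch for high-resolution image features."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get_or_compute(self, image_id: str, compute: Callable):
        """Return (features, cache_hit). On a hit, the expensive upsampling
        step represented by `compute` is skipped entirely."""
        if image_id in self._store:
            self._store.move_to_end(image_id)   # mark as most recently used
            return self._store[image_id], True
        features = compute()                     # cold path: run upsampling
        self._store[image_id] = features
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)      # evict least recently used
        return features, False
```

On the figures above, a cold single-instance query costs roughly 100 + 200 + 50 = 350 ms; a repeat query on a cached image skips the 200 ms upsample and lands near 150 ms.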

Integrating these models requires replacing multi-step vision pipelines with a single unified call. Update your inference infrastructure to support FlexAttention and paged KV caching to benefit from the zero-latency upsampling on repeated image queries.
