ChatGPT Images 2.0 Adds Multilingual Text and Thinking Mode
OpenAI released ChatGPT Images 2.0 with the gpt-image-2 model, adding agentic web search, 2K resolution, and non-Latin script rendering capabilities.
OpenAI released ChatGPT Images 2.0 on April 21, 2026, replacing the previous gpt-image-1.5 model. The update introduces the gpt-image-2 model, featuring native web search, multilingual text rendering, and a new agentic reasoning mode that drafts image layouts before rendering. The release coincided with OpenAI’s permanent discontinuation of its Sora video model, marking a strategic shift toward high-utility static visuals and reasoning-based visual workflows.
Technical Architecture and API Pricing
The core of the update is a “Thinking” mode available to Plus, Pro, and Business subscribers. The model can search the web for real-time context and generate up to eight coherent images from a single prompt. It supports wide aspect ratios suitable for infographics and cinematic portraits, with resolutions up to 2K. Higher resolutions are currently available in beta via the API.
| Feature | Details |
|---|---|
| Core Model | gpt-image-2 |
| Output Limit | Up to 8 Images |
| Max Resolution | 2K (Higher in Beta) |
| Input Cost | $8.00 per 1M tokens |
| Output Cost | $30.00 per 1M tokens |
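The table's rates translate directly into per-call dollars. A minimal cost helper (how many output tokens a single image consumes is not published here, so the function takes raw token counts as inputs):

```python
# Rates from the pricing table above.
INPUT_RATE = 8.00 / 1_000_000    # USD per input token  ($8.00 per 1M)
OUTPUT_RATE = 30.00 / 1_000_000  # USD per output token ($30.00 per 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one gpt-image-2 API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
```

For example, a call with 1,000 input tokens and 10,000 output tokens costs about $0.31.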
When formatting requests for these new layout capabilities, standard prompt engineering principles apply, as the model uses text-based reasoning to determine object placement before rendering pixels.
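One way to apply that is to spell the layout out in the prompt and cap the batch size explicitly. The sketch below is hypothetical: the `gpt-image-2` name, the eight-image limit, and 2K output come from the release, while the request field names (`n`, `size`) and the prompt structure are assumptions modeled on OpenAI's current Images API.

```python
def layout_prompt(title: str, sections: list[str], language: str = "English") -> str:
    """Make object placement explicit so the model's text-based
    layout reasoning has a concrete structure to plan against."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sections, start=1))
    return (
        f"Create an infographic titled '{title}'. "
        f"Render all text in {language}. "
        f"Plan the layout first, then place these sections top to bottom:\n{numbered}"
    )

def build_image_request(prompt: str, n: int = 1, size: str = "2048x2048") -> dict:
    """Assemble a request payload; field names are assumed, not documented."""
    if not 1 <= n <= 8:  # the release caps output at eight images per prompt
        raise ValueError("n must be between 1 and 8")
    return {"model": "gpt-image-2", "prompt": prompt, "n": n, "size": size}
```

The same prompt builder works for localized assets: passing `language="Hindi"` or `language="Korean"` leans on the model's non-Latin text rendering rather than post-hoc image editing.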
Multilingual Capabilities and Regional Adoption
The gpt-image-2 model natively renders non-Latin scripts, including Hindi, Bengali, Chinese, Japanese, and Korean. This multilingual text capability drove immediate adoption in India, where the application recorded 5 million downloads during the launch week compared to 2 million in the U.S. Sensor Tower data showed a 3.4% week-over-week increase in daily active users in India.
OpenAI supported this regional growth by piloting a “ChatGPT Go” subscription tier priced at $8 per month, designed to bridge the gap between the free and premium tiers. Users in the region primarily use the tool for high-fidelity avatars, cinematic portraits, and fantasy-themed imagery.
Global reception remains measured. Similarweb data showed only a 1.6% increase in global web traffic following the release. The modest Western response aligns with established competition from Google’s Nano Banana 2, which launched in February 2026 with similar dense text rendering capabilities.
If you integrate image generation into production applications, evaluate the $30 per million output token cost against the reduced need for multi-shot prompting. The model’s ability to plan layouts and verify text structure beforehand reduces the total number of API calls required to achieve a usable infographic or localized asset.
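That evaluation reduces to a break-even count: the planned single call pays off once it replaces enough cheaper retries. A back-of-the-envelope sketch, where all token counts are illustrative assumptions rather than measured figures:

```python
def call_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost at the published $8/1M input and $30/1M output rates."""
    return input_tokens / 1e6 * 8.00 + output_tokens / 1e6 * 30.00

def breakeven_retries(planned_out: int, retry_out: int, input_tokens: int = 500) -> float:
    """Number of quick retries whose combined cost equals one planned,
    layout-verified call that succeeds on the first attempt."""
    return call_cost(input_tokens, planned_out) / call_cost(input_tokens, retry_out)
```

Under these assumed numbers, if a layout-planned call emits roughly twice the output tokens of a quick attempt, it breaks even as soon as it saves about two retries.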