ChatGPT Images 2.0 Adds Multilingual Text and Thinking Mode
OpenAI released ChatGPT Images 2.0 with the gpt-image-2 model, adding agentic web search, 2K resolution, and non-Latin script rendering capabilities.
OpenAI released ChatGPT Images 2.0 on April 21, 2026, replacing the previous gpt-image-1.5 model. The update introduces the gpt-image-2 model, featuring native web search, multilingual text rendering, and a new agentic reasoning mode that drafts image layouts before rendering. The release coincided with OpenAI’s permanent discontinuation of its Sora video model, marking a strategic shift toward high-utility static visuals and reasoning-based visual workflows.
Technical Architecture and API Pricing
The core of the update is a “Thinking” mode available to Plus, Pro, and Business subscribers. The model can search the web for real-time context and generate up to eight coherent images from a single prompt. It supports wide aspect ratios suitable for infographics and cinematic portraits, with resolutions up to 2K. Higher resolutions are currently available in beta via the API.
| Feature | Details |
|---|---|
| Core Model | gpt-image-2 |
| Output Limit | Up to 8 Images |
| Max Resolution | 2K (Higher in Beta) |
| Input Cost | $8.00 per 1M tokens |
| Output Cost | $30.00 per 1M tokens |
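The table's rates translate directly into per-call dollars. A minimal cost helper (how many output tokens a single image consumes is not published here, so the function takes raw token counts as inputs):

```python
# Rates from the pricing table above.
INPUT_RATE = 8.00 / 1_000_000    # USD per input token  ($8.00 per 1M)
OUTPUT_RATE = 30.00 / 1_000_000  # USD per output token ($30.00 per 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one gpt-image-2 API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
```

For example, a call with 1,000 input tokens and 10,000 output tokens costs about $0.31.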
When formatting requests for these new layout capabilities, standard prompt engineering principles apply, as the model uses text-based reasoning to determine object placement before rendering pixels.
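One way to apply that is to spell the layout out in the prompt and cap the batch size explicitly. The sketch below is hypothetical: the `gpt-image-2` name, the eight-image limit, and 2K output come from the release, while the request field names (`n`, `size`) and the prompt structure are assumptions modeled on OpenAI's current Images API.

```python
def layout_prompt(title: str, sections: list[str], language: str = "English") -> str:
    """Make object placement explicit so the model's text-based
    layout reasoning has a concrete structure to plan against."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sections, start=1))
    return (
        f"Create an infographic titled '{title}'. "
        f"Render all text in {language}. "
        f"Plan the layout first, then place these sections top to bottom:\n{numbered}"
    )

def build_image_request(prompt: str, n: int = 1, size: str = "2048x2048") -> dict:
    """Assemble a request payload; field names are assumed, not documented."""
    if not 1 <= n <= 8:  # the release caps output at eight images per prompt
        raise ValueError("n must be between 1 and 8")
    return {"model": "gpt-image-2", "prompt": prompt, "n": n, "size": size}
```

The same prompt builder works for localized assets: passing `language="Hindi"` or `language="Korean"` leans on the model's non-Latin text rendering rather than post-hoc image editing.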
Multilingual Capabilities and Regional Adoption
The gpt-image-2 model natively renders non-Latin scripts, including Hindi, Bengali, Chinese, Japanese, and Korean. This multilingual text capability drove immediate adoption in India, where the application recorded 5 million downloads during the launch week compared to 2 million in the U.S. Sensor Tower data showed a 3.4% week-over-week increase in daily active users in India.
OpenAI supported this regional growth by piloting a “ChatGPT Go” subscription tier priced at $8 per month, designed to bridge the gap between the free and premium tiers. Users in the region primarily use the tool for high-fidelity avatars, cinematic portraits, and fantasy-themed imagery.
Global reception remains measured. Similarweb data showed only a 1.6% increase in global web traffic following the release. The modest Western response aligns with established competition from Google’s Nano Banana 2, which launched in February 2026 with similar dense text rendering capabilities.
If you integrate image generation into production applications, evaluate the $30 per million output token cost against the reduced need for multi-shot prompting. The model’s ability to plan layouts and verify text structure beforehand reduces the total number of API calls required to achieve a usable infographic or localized asset.
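That evaluation reduces to a break-even count: the planned single call pays off once it replaces enough cheaper retries. A back-of-the-envelope sketch, where all token counts are illustrative assumptions rather than measured figures:

```python
def call_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost at the published $8/1M input and $30/1M output rates."""
    return input_tokens / 1e6 * 8.00 + output_tokens / 1e6 * 30.00

def breakeven_retries(planned_out: int, retry_out: int, input_tokens: int = 500) -> float:
    """Number of quick retries whose combined cost equals one planned,
    layout-verified call that succeeds on the first attempt."""
    return call_cost(input_tokens, planned_out) / call_cost(input_tokens, retry_out)
```

Under these assumed numbers, if a layout-planned call emits roughly twice the output tokens of a quick attempt, it breaks even as soon as it saves about two retries.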