ChatGPT’s new Images 2.0 model is surprisingly good at generating text

It used to be easy enough to distinguish between human-made and AI-generated imagery. Just two years ago, you couldn’t use image models to create a menu for a Mexican restaurant without inventing new culinary delights like “enchuita,” “churiros,” “burrto,” and “margartas.”

Now, when I ask the brand new ChatGPT Images 2.0 model for a menu of Mexican food, it creates something that could immediately be used in a restaurant without customers noticing that something’s off. (However, ceviche priced at $13.50 might make me question the quality of the fish.)

Image Credits: ChatGPT Images 2.0

For comparison, here’s the result I got from DALL-E 3 two years ago. (At the time, ChatGPT did not generate images.)

Image Credits: Microsoft Designer (DALL-E 3)

AI image generators have historically struggled to spell because they mostly used diffusion models, which work by reconstructing images from noise.

“The diffusion models […] are reconstructing a given input,” Asmelash Teka Hadgu, founder and CEO of Lesan AI, told TechCrunch in 2024. “We can assume writings on an image are a very, very tiny part, so the image generator learns the patterns that cover more of these pixels.”

Researchers have since explored other mechanisms for image generation, like autoregressive models, which make predictions about what an image should look like and function more like an LLM.

Unfortunately, OpenAI declined to answer a question in a press briefing this week about what kind of model is powering ChatGPT Images 2.0.

The company did, however, explain that the new model has “thinking capabilities,” which give it the ability to search the web, make multiple images from one prompt, and double-check its creations. This allows Images 2.0 to generate marketing assets in various sizes, as well as multi-paneled comic strips.

OpenAI also says that Images 2.0 has a stronger understanding of non-Latin text rendering in languages like Japanese, Korean, Hindi, and Bengali. The model’s knowledge cuts off in December 2025, which could impact how accurately it can generate images for certain prompts involving recent news.

“Images 2.0 brings an unprecedented level of specificity and fidelity to image creation. It can not only conceptualize more sophisticated images, but it actually brings that vision to life effectively, able to follow instructions, preserve requested details, and render the fine-grained elements that often break image models: tiny text, iconography, UI elements, dense compositions, and subtle stylistic constraints, all at up to 2K resolution,” OpenAI said in a press release.

These capabilities mean that image generation isn’t as fast as typing a question to ChatGPT, but generating something complex like a multi-paneled comic still takes just a few minutes.

All ChatGPT and Codex users will be able to access Images 2.0 starting Tuesday; paid users will be able to generate more advanced outputs. The company will also make the gpt-image-2 API available, with pricing dependent on the quality and resolution of outputs.
