See why AI like ChatGPT has gotten so good, so fast

We asked three AI systems to generate content using the same prompt. The results illustrate how quickly the technology has advanced.

AI-generated images

Prompt: This small bird has a pink breast and crown, and black primaries and secondaries


AI-generated image from 2016

A cutting-edge model generates tiny images of birds and flowers. (Reed et al.)


AI-generated image from 2022

A new training method called diffusion helps create images with greater detail, but the perspective still looks flattened. (Stable Diffusion)


AI-generated image from 2023

Dall-E 2 incorporates more breakthroughs and data to produce an image with a shallow depth-of-field characteristic of many bird photos, but does not include a pink breast. (Dall-E 2)

Artificial intelligence has become shockingly capable in the past year. The latest chatbots can conduct fluid conversations, craft poems and even write lines of computer code, while the latest image-makers can create fake “photos” that are virtually indistinguishable from the real thing.

Understanding AI
What is artificial intelligence?
AI is an umbrella term for any form of technology that can perform “intelligent” tasks. For decades, AI has been mostly used for analysis — trawling huge sets of data to find patterns. But a boom in generative AI, which uses this pattern-matching to create words, images and sounds, has opened up new possibilities.
What is generative AI?
The technology backs chatbots such as ChatGPT and image generators, such as Dall-E, which can create words, sounds, images and video, sometimes at a level of sophistication that mimics human creativity. This technology can’t “think” like humans do; it can find patterns and imitate speech, but it can’t interpret meanings.
How does AI learn?
AI can “learn” without a programmer telling it each step, a process called machine learning. It uses neural networks, mathematical systems modeled after the human brain, to find connections in huge data sets. The poems or images it makes may seem creative, but it’s really pattern matching based on which word is most likely to come next.
Is AI dangerous?
The boom in generative AI brings many exciting possibilities, but also concerns that it might cause harm. Chatbots can sometimes spread misinformation or “hallucinate” by producing information that sounds plausible, but is irrelevant, nonsensical or entirely false. The technology can also be used to make fake images of real people, called deepfakes.



It wasn’t always this way. As recently as two years ago, AI created robotic text riddled with errors. Images were tiny, pixelated and lacked artistic appeal. The mere suggestion that AI might one day rival human capability and talent drew ridicule from academics.

A confluence of innovations has spurred growth. Breakthroughs in mathematical modeling, improvements in hardware and computing power, and the emergence of massive high-quality data sets have supercharged generative AI tools.


While artificial intelligence is likely to improve even further, experts say the past two years have been uniquely fertile. Here’s how it all happened so fast.

AI-generated responses

How three AI systems responded to the following text prompt

A clever name for a play about dinosaurs is


to quote the most brilliant of examples : ‘who can say for sure they weren’t aliens or aliens or aliens? they had the appearance of aliens ; they wore black leather pants and they walked with much

This model does not follow the sentence structure. (OpenAI GPT)


Dino-Brawl. Dinosaur-Brawl — which can be viewed in the player’s perspective — starts with Mr. D, a dino-hunter, chasing down a lone

This model gave a name but followed it with a confusing sentence. (GPT-2)


Dino-Mite: The Mesozoic Extravaganza!

This model used a pun for the name and provided a subtitle. (ChatGPT)

A training transformation

Much of this recent growth stems from a new way of building and training AI, called the transformer model. This method allows the technology to process large blocks of language quickly and to test the fluency of the outcome.

It originated in a 2017 Google study that quickly became one of the field’s most influential pieces of research.

To understand how the model works, consider a simple sentence: “The cat went to the litter box.”

Previously, artificial intelligence models would analyze the sentence sequentially, processing the word “the” before moving on to “cat” and so on. This took time, and the software would often forget its earlier learning as it read new sentences, said Mark Riedl, a professor of computing at Georgia Tech.

An arrow between each word in the sentence “The cat went to the litter box.”

The transformers model immediately processes the relationships between words — a method called attention. New AI models can examine “cat” alongside “litter” and “box.”

A matrix showing how closely related each word in the sentence is to each other word. “Cat” and “litter” are closely related.
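For readers who want to see the arithmetic, here is a minimal sketch of how such a relatedness matrix could be computed. It is an illustration under stated assumptions, not the code of any real system: the two-number word vectors in `EMBEDDINGS` are hand-picked so that “cat” and “litter” point in similar directions, and the function `attention_weights` is a hypothetical name. Real models learn their vectors from data and apply additional learned transformations that are left out here.

```python
import math

# Hypothetical two-number "embeddings", hand-picked for illustration only.
# Real models learn vectors with hundreds or thousands of numbers per word.
EMBEDDINGS = {
    "the": [0.1, 0.1],
    "cat": [1.0, 0.2],
    "went": [0.1, 0.9],
    "to": [0.1, 0.2],
    "litter": [0.9, 0.1],
    "box": [0.6, 0.3],
}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(words):
    """Score every word against every other word in the sentence in one pass."""
    dim = len(EMBEDDINGS[words[0]])
    table = []
    for query in words:
        # Dot product of two word vectors, scaled by the vector length.
        scores = [
            sum(q * k for q, k in zip(EMBEDDINGS[query], EMBEDDINGS[key]))
            / math.sqrt(dim)
            for key in words
        ]
        table.append(softmax(scores))
    return table


sentence = ["the", "cat", "went", "to", "the", "litter", "box"]
weights = attention_weights(sentence)

# The row for "cat": how much weight it places on each word in the sentence.
cat_row = dict(zip(sentence, weights[sentence.index("cat")]))
for word, weight in cat_row.items():
    print(f"cat -> {word}: {weight:.2f}")
```

With these made-up vectors, “cat” places more weight on “litter” and “box” than on “went” or “to”, which is the pattern the matrix describes. Nothing in the loop depends on reading the sentence left to right.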

To make sure the AI performs correctly, the transformers model builds in a testing step. It masks a word in the sentence to see if the AI can predict what’s missing. Additionally, companies such as OpenAI have humans rate the quality of the response. For example, if the word “cat” is masked and the computer offers “the dog went to the litter box,” it’s likely to get a thumbs down.

The sentence with a mask over the word “cat.” The replacement word “dog” gets a thumbs down, but the replacement word “cat” gets a thumbs up.
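The masking test can be imitated with a toy program. The sketch below is an assumption-laden stand-in, not how any real chatbot works: it replaces the neural network with simple word counting over a five-sentence, made-up `CORPUS`, and the names `fill_mask`, `thumbs` and the `[MASK]` marker are hypothetical choices for this example.

```python
from collections import Counter

# Tiny made-up corpus, an assumption for illustration.
# Real models train on billions of sentences.
CORPUS = [
    "the cat went to the litter box",
    "the cat sat by the litter box",
    "the dog went to the park",
    "the dog chased the ball",
    "the cat slept in the box",
]


def fill_mask(masked_sentence, corpus=CORPUS, mask="[MASK]"):
    """Rank candidate words by how often they appear alongside the visible words."""
    context = set(masked_sentence.split()) - {mask}
    scores = Counter()
    for line in corpus:
        words = set(line.split())
        shared = len(context & words)
        for candidate in words - context:
            # Credit the candidate once for each visible word in the same sentence.
            scores[candidate] += shared
    return scores.most_common()


def thumbs(guess, hidden_word):
    """Stand-in for the rating step: compare the guess with the hidden word."""
    return "thumbs up" if guess == hidden_word else "thumbs down"


ranking = fill_mask("the [MASK] went to the litter box")
best_word = ranking[0][0]
print(best_word, thumbs(best_word, "cat"))
print("dog", thumbs("dog", "cat"))
```

In this toy corpus, “cat” shares far more surrounding words with the masked sentence than “dog” does, so it ranks first. A real model makes the same kind of guess using learned patterns rather than raw counts.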

The model allows AI tools to ingest billions of sentences and quickly recognize patterns, resulting in more natural-sounding responses.

Another new training method, called diffusion, has also improved AI image generators such as Dall-E and Midjourney, allowing nearly anyone to create hyper-realistic photos with simple, even nonsensical, text prompts, such as: “Draw me a picture of a rabbit in outer space.”

Researchers feed these AI models billions of images, each paired with a text description, teaching the computer to identify relationships between images and words.

The diffusion method then layers “noise” — visual clutter that looks like TV static — over the images. The AI system learns to recognize the noise and subtract it until the image is once again clear.


This process of corrupting and regenerating images teaches the AI to remove imperfections, fine-tuning each response until it is crisp and sharp. It also learns the relationship between neighboring pixels, making the generated image more realistic.
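The corrupt-and-restore idea can be sketched in a few lines. This is a simplified illustration under loud assumptions: the “image” is a single row of five made-up brightness values, the function names `add_noise` and `subtract_noise` and the `keep` setting are hypothetical, and the program is handed the exact static that was added. A real diffusion model has no such shortcut; it must learn to predict the static, which is the hard part.

```python
import math
import random


def add_noise(pixels, keep, rng):
    """Blend an image with random static.

    `keep` near 1 leaves the image mostly intact; near 0 it is mostly static.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(keep) * p + math.sqrt(1.0 - keep) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def subtract_noise(noisy, predicted_noise, keep):
    """Undo the blend, given a guess at the static that was added."""
    return [
        (x - math.sqrt(1.0 - keep) * n) / math.sqrt(keep)
        for x, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
image = [0.0, 0.5, 1.0, 0.5, 0.0]  # one row of made-up pixel brightness values

noisy, true_noise = add_noise(image, keep=0.3, rng=rng)

# Assumption: we pass in the true static to show the arithmetic only.
# A trained network would supply its own prediction here.
restored = subtract_noise(noisy, true_noise, keep=0.3)

print("noisy:   ", [round(v, 2) for v in noisy])
print("restored:", [round(v, 2) for v in restored])
```

When the guess at the static is exact, the original row comes back. The better a model's prediction of the static, the cleaner its restored image.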

AI-generated images

Images that three AI systems generated from the following prompt

A picture of a very clean living room


AI-generated image from 2016

This model generates an image so small the details are impossible to see. (Reed et al.)


AI-generated image from 2022

This model generates an image that resembles a living room, but the furniture and mirror are disfigured. (Stable Diffusion)


AI-generated image from 2023

This model generates an image with clean lines and reflections, but details like the coffee table’s legs are deformed. (Dall-E 2)

The rise of the supercomputer

These bigger, more complicated AI models require advanced computer systems. This hardware, too, has become much more powerful.

Graphics processing units, or “GPUs” — circuits commonly used for video games — offer immense computing power and excel at running tasks simultaneously, a key requirement for training artificial intelligence models.

Researchers can string thousands of GPUs together in supercomputers that operate in the cloud. This lets people access GPUs without building their own supercomputer, an arrangement that makes high-powered computing cheaper and more readily available.

Better quality data

Over the past several years, the data sets used to train AI have gotten bigger and better, too.

Previous data sets had lower-quality content, Riedl said, like addresses or phone numbers, which aren’t helpful for learning natural speech.

Now, data sets include more news articles and book transcripts — content that shows how humans write and talk — which helps teach the AI to “speak” more fluidly.

As AI companies have trained models, they have gotten better at knowing how to create high-quality data, Riedl said. Entire organizations are now dedicated to building these data sets.


Scale AI, a San Francisco company, curates and labels training data sets for AI software. ImageNet, a data set started by Stanford University researchers, contains millions of labeled images. Common Crawl, a nonprofit that creates an open repository of web data, has developed text data sets from more than 12 years of internet scrapes. Some data sets have grown to contain billions of words.

More data sets are multimodal, meaning they combine images with text, allowing artificial intelligence systems to understand the relationship between the two. There also has been a rise in medical imaging data sets to fuel AI’s ability to spot diseases and ailments.

AI-generated responses

How three AI systems responded to the following text prompt

This is a haiku about baseball:


a man who’s going to lose, a man who doesn’t need to lose. he wants the people to win by doing something he wants, rather than because they’re being manipulated. they’re playing the game

This model begins in a somewhat poetic place but quickly stops making sense. (OpenAI GPT)


Fantasy Baseball: In the late 1800s the Yankees and his friends had a tournament where pitcher Francisco Gohara came to San Francisco from Texas at the height of what was called the Great Depression.

This model invents a sentence about a fake baseball player, seemingly ignoring the word “haiku.” (GPT-2)


In summer’s embrace,

Bats crack, balls soar through the air,

Baseball’s timeless grace.

This model’s output follows the traditional 5-7-5 syllable style, though the first and third lines rhyme. (ChatGPT)

What’s next?

Experts say it’s hard to predict how much better AI will get. Major obstacles stand in the way of further development. These models are expensive to run and exact a staggering environmental toll. They confidently churn out wrong, nonsensical and sometimes biased answers, while creating lifelike images that could sow confusion.

As tech giants such as Google and Microsoft race to incorporate AI into their products, a slew of companies are trying to expand AI’s capabilities to generate video and music, and to create detection tools that screen artificially generated content. Most people are likely to interact with this new technology in the near future. But how useful it will be and what impact it will have on society remain to be seen.

About this story

For each AI comparison graphic, we fed AI image and text generators the same prompt and used the first result. The 2016 image model was too old to run ourselves, so we used images from the Reed paper.

The image models were: Reed et al. (2016); Stable Diffusion v1.4 (first released in late 2021 but published in 2022); and Dall-E 2 (first released in 2022 but used in 2023). The text models were OpenAI-GPT (2018); GPT-2 Large (2019); and ChatGPT (first released in 2022 but used in 2023).

Editing by Alexis Sobel Fitts, Reuben Fischer-Baum, Karly Domb Sadof and Kate Rabinowitz.