
ChatGPT can now best the average person when it comes to creative tasks, according to recent research. That being said, if you’re among the most creative humans, your job is probably safe.
Researchers from the University of Montreal ran the largest direct comparison of human and machine creativity to date, pitting 100,000 people against nine of the world’s most advanced AI systems. The results? GPT-4 scored higher than typical humans on a standard creativity test. Google’s Gemini Pro matched average human performance.
While all of that may be a bit distressing for biological beings reading this, it isn’t time to throw in the creativity towel on humanity just yet. When the AI systems were stacked against the top 10% of creative people, every AI model failed to measure up.
The test itself was deceptively simple: name 10 words as different from each other as possible. Someone who writes “car, dog, tree” shows less creative range than someone who comes up with “microscope, volcano, whisper.” The further apart the words are in meaning, the higher the creativity score.
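Scoring works by measuring the semantic distance between every pair of words a person names: the further apart the words sit in meaning space, the higher the score. A minimal sketch of that idea, using toy hand-made word vectors (a real study would use trained embeddings such as GloVe, and the vectors below are purely illustrative):

```python
from itertools import combinations
import math

# Toy 3-dimensional word vectors -- illustrative stand-ins for the
# trained embeddings a real semantic-distance test would rely on.
VECTORS = {
    "car":        (0.9, 0.1, 0.0),
    "dog":        (0.8, 0.3, 0.1),
    "tree":       (0.7, 0.4, 0.2),
    "microscope": (0.1, 0.9, 0.2),
    "volcano":    (0.2, 0.1, 0.9),
    "whisper":    (0.9, 0.0, 0.8),
}

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for identical directions, larger when far apart."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def creativity_score(words):
    """Mean pairwise semantic distance across all word pairs."""
    pairs = list(combinations(words, 2))
    return sum(cosine_distance(VECTORS[a], VECTORS[b]) for a, b in pairs) / len(pairs)

print(creativity_score(["car", "dog", "tree"]))                # similar words: low score
print(creativity_score(["microscope", "volcano", "whisper"]))  # distant words: higher score
```

The first list scores low because its vectors point in nearly the same direction; the second scores higher because its words are spread across meaning space.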
“The persistent gap between the best-performing humans and even the most advanced LLMs indicates that the most demanding creative roles in industry are unlikely to be supplanted by current artificial intelligence systems,” the researchers wrote in their paper, published in Scientific Reports.
The Repetition Problem Nobody Expected
Despite beating average humans overall, GPT-4 kept using the same words over and over. The word “microscope” appeared in 70% of its responses. “Elephant” showed up 60% of the time. GPT-4-turbo was even worse, dropping “ocean” into more than 90% of its answers.
Humans? The most common word was “car” at just 1.4%. Then “dog” at 1.2% and “tree” at 1.0%. Real people naturally avoid repeating themselves. AI tends to fall back on the same high-probability words unless you adjust the settings.
The research team, led by Antoine Bellemare-Pepin and François Lespinasse, tested whether they could fix this. They adjusted something called “temperature,” which is essentially a dial that controls how random or predictable the AI’s word choices are. After the temperature was increased, GPT-4 stopped repeating itself so much. Its creativity scores jumped, reaching a level higher than 72% of all human participants.
That’s useful for anyone trying to get better creative output from ChatGPT. But it also reveals something fundamental: AI creativity is a setting you can turn up or down, not an inherent capability.
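Mechanically, temperature rescales the model’s next-word probabilities before sampling: higher values flatten the distribution, so unlikely words get picked more often. A rough sketch of the math, with made-up logits for three candidate words:

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature.
    Higher temperature -> flatter distribution -> more varied sampling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate words; "ocean" is the model's favorite.
logits = [4.0, 2.0, 1.0]  # ocean, whisper, volcano

low = apply_temperature(logits, 0.5)   # sharpens: "ocean" dominates even more
high = apply_temperature(logits, 1.5)  # flattens: the alternatives gain probability
print(low[0], high[0])
```

At low temperature the top word soaks up nearly all the probability mass, which is exactly the “same high-probability words” behavior described above; at higher temperature the runners-up get real chances.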
When Newer Doesn’t Mean Better
OpenAI released GPT-4-turbo after the original GPT-4, presumably as an improvement. On this creativity test, though, it performed worse. Much worse.
The researchers found that newer versions don’t automatically get more creative; sometimes they get less creative. They suggest this may be because newer versions are optimized for speed and cost, trading creativity for efficiency.
Another noteworthy finding: Vicuna, a smaller open-source model, beat several larger, more expensive commercial alternatives. Bigger doesn’t mean more creative either.
The 100,000-Person Experiment
The study pulled participants from the United States, United Kingdom, Canada, Australia, and New Zealand; all were English speakers, balanced for age and gender. Everyone took the same test: list 10 unrelated words.
Researchers then fed identical instructions to nine different AI models, collecting 500 responses from each. They tested everything from household names like GPT-4 and Claude to lesser-known open-source models like Pythia and StableLM.
The team also pushed beyond simple word lists. They had the AI write haikus, movie synopses, and short fiction stories, then measured how diverse the ideas were. GPT-4 consistently beat GPT-3.5 on creative writing. However, human writers still produced work with greater variety and originality, especially in poetry and plot summaries.
What This Actually Means
If you’re a professional writer, designer, or artist, this research suggests you’re not about to be replaced. AI can match, and sometimes exceed, what an average person produces. But the best human creators operate on a different level entirely.
That gap matters. Most companies don’t hire average creators for their most demanding work. They hire the top performers, the people who can generate truly original ideas. Current AI can’t touch that tier.
For everyone else using ChatGPT to brainstorm or draft content, there’s a practical takeaway: if you want more creative results, tell the AI to increase its temperature setting (usually between 1.0 and 1.5 works well). You’ll get less repetition and more diverse outputs.
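For developers, temperature is set directly on the request rather than in the prompt. A sketch of what that looks like with OpenAI’s chat completions API (the model name and prompt here are illustrative; `temperature` is a real request field, ranging from 0 to 2):

```python
import json

def build_chat_request(prompt, temperature=1.2):
    """Assemble an OpenAI-style chat completions payload.
    The temperature field controls sampling randomness; values around
    1.0-1.5 trade repetition for diversity, per the study's findings."""
    return {
        "model": "gpt-4",  # illustrative model name
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Name 10 words as different from each other as possible.")
print(json.dumps(payload, indent=2))
```

This only builds the request body; sending it requires an API key and an HTTP client, omitted here for brevity.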
Source: https://studyfinds.org/ai-average-human-creativity/