
The test we outgrew: rethinking the Turing test in the age of chatbots


Somewhere between asking it to fix a formula in Excel and telling it about the worst week you’ve had in months, the question of whether you’re talking to a machine stopped feeling relevant. Not because you forgot. But because, in that moment, it didn’t seem to matter. And that, I think, is the most honest answer to whether the Turing test has been passed.

What the test was actually asking

In 1950, Alan Turing proposed what he called the “imitation game”, a thought experiment designed to sidestep an impossible philosophical question. Instead of asking “can machines think?”, a question with no clean answer, he proposed a practical substitute: if a machine can converse with a human so convincingly that the human can’t tell the difference, does it matter whether it’s “really” thinking?

The setup was simple. A judge holds two conversations simultaneously: one with a human, one with a machine. If the judge can’t reliably tell which is which, the machine passes.

It was a brilliant reframe for its time. Clean, testable, and deliberately agnostic about the deeper questions it was dancing around.

But Turing was imagining a very specific kind of interaction. Formal. Contained. Two conversations running in parallel, a judge paying close attention, a deliberate attempt to deceive. The whole point was the comparison: human on one side, machine on the other, and someone trying to tell them apart.

What we have now is almost nothing like that.

Has the Turing test been passed?

The short answer is: officially, sort of, but with enough asterisks to fill a page.

In 2014, a chatbot called Eugene Goostman convinced 33% of judges at a University of Reading competition that it was human, which the organizers declared a passing score. The AI community largely disagreed. The chatbot was programmed to simulate a 13-year-old Ukrainian boy, a framing that conveniently excused any awkward or limited responses. It wasn’t the machine being convincing. It was the premise lowering the bar.

Since then, with the arrival of large language models (GPT, Claude, Gemini), the conversation has largely stopped being about whether AI can pass the test. It has started being about whether the test still means anything.

Which is, in a way, its own answer.

The comparison we stopped making

Here is what the original test required that our daily interactions with AI don’t: a comparison.

Turing’s judge had both conversations at once. The human response was right there, next to the machine’s, available for contrast. That contrast was the whole mechanism, the thing that made the test work.

But most of us don’t interact with AI that way. There is one conversation. No human response sitting next to it to measure against. No judge, no parallel, no deliberate attempt to decide anything. Just a chat window, and whatever you brought to it today.

And the absence of comparison does something interesting. It means the question “is this human?” almost never gets asked out loud. It just gets felt, or not felt, quietly in the background. Sometimes you notice the machine-ness of it: a response that lands slightly wrong, too neat, missing the texture of someone who has actually lived through something. Sometimes you don’t notice at all.

The illusion, such as it is, works partly because there is nothing to compare it to in the moment. Which means we were never really running the Turing test. We were just talking.

The Turing test was intellectual. Maybe the reframe should be psychological.

Turing’s test measured something cognitive: the ability to produce convincing, intelligent-seeming responses in conversation. That was the frontier of his time. The question was whether machines could imitate thinking.

But thinking was never really what made the interactions feel significant.

What makes people forget they’re talking to a machine isn’t intellectual performance. It’s something closer to emotional presence, the sense of being listened to, understood, responded to in a way that feels attuned rather than mechanical. A chatbot that gives you a wrong answer and gets corrected without ego. That asks follow-up questions. That remembers, within the conversation, what you said twenty minutes ago and connects it to what you’re saying now.

None of that requires thinking. It requires pattern recognition sophisticated enough to feel like attention.

And here is where it gets uncomfortable: even people who use AI purely as a functional tool, for writing, translations, calculations, and information, are not entirely immune to this. You don’t get frustrated with a calculator for giving you a wrong answer. You don’t argue with a dictionary. But people argue with ChatGPT. They tell it it’s wrong. They demand that it do better. They feel, briefly, let down.

That emotional response sneaking in through the back door, even for the most pragmatic users, suggests that the psychological dimension of the interaction is already there, whether we invited it or not.

What a reframed test might measure

If the original test asked “can it fool a judge in a controlled experiment,” a reframed version might ask something harder and less comfortable.

Not “can it think?” but “can it make you feel?” Not “can it fool you?” but “can it make you stop caring whether it’s real?” Not “does it pass in a lab?” but “what is it doing to you, over time, in the ordinary course of your life?”

Because that last question is the one the original test had no mechanism for. Turing’s imitation game was a single session, a binary result, a judgment made and done. It wasn’t designed to ask what happens to a person who has the same conversation every day for a year. Who starts to notice the pronoun slipping from “it” to “he.” Who finds themselves choosing the chat window over a phone call, not because they were fooled, but because it was easier.

The Turing test was built for a world where AI would meet us face to face, in a formal encounter, and try to pass. Instead, it seeped into everything quietly, into our appliances, our cars, our spreadsheets, our loneliest hours. And the question of whether it’s “passing” became almost beside the point.

The question we don’t have a test for yet

The Turing test didn’t fail. We outgrew it.

It was the right question for its time, precise, practical, and brave enough to take the possibility of machine intelligence seriously at a time when most people weren’t. But it was built for a specific encounter that is no longer how most of us meet AI.

The right question now isn’t “can it think?” It isn’t even “can it fool you?” It’s something closer to: what is it doing to us? How is it changing the way we process our emotions, make our decisions, maintain our relationships, and define our own intelligence? What does it mean that we will correct its grammar, argue with its conclusions, and still, in the same conversation, tell it something we haven’t told anyone else?

We don’t have a test for that yet.

And maybe that’s the most honest measure of where we actually are, not that AI passed the Turing test, but that we’ve moved so far past it that we forgot to keep score.

What could be more telling than knowing, somewhere in the back of your mind, that there is no one on the other side, and continuing the conversation anyway?

This post sits alongside When the machine learns your name and The other we invented as part of an ongoing exploration of where the line between human and machine is, and whether we’re still looking for it.