
🤖 AI isn’t ‘hallucinating.’ We are.

Charley Johnson

Why 'hallucination' is a problematic term (hint - it's not just because it anthropomorphizes the technology!) and what to do about it.



If you’ve read an article about ChatGPT of late, you might have noticed something odd: the word ‘hallucinate’ is everywhere. The word comes from the Latin (h)allucinari, to wander in mind, and Dictionary.com defines a hallucination as “a sensory experience of something that does not exist outside the mind.” Now, ChatGPT doesn’t have a mind, so to say it ‘hallucinates’ is to anthropomorphize the technology, which, as I’ve written before, is a big problem.

‘Hallucinate’ is the wrong word for another important reason: it implies an aberration, a mistake of some kind, as if the model isn’t supposed to make things up. But that’s exactly what generative models do — given a sequence of words, the model probabilistically makes up the next word in that sequence. Presuming that AI models are making a mistake when they’re actually doing what they were designed to do has profound implications for how we think about accountability for harm in this context. Let’s dig in.
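To make that concrete, here’s a minimal sketch of next-word prediction (a toy illustration, not how any production model is implemented). The hand-written probability table stands in for what a real model learns, over tokens and at vastly larger scale, from its training data; nothing in it checks whether the resulting sentence is true.

```python
import random

# A hand-written table mapping a two-word context to a probability
# distribution over possible next words. This is a toy stand-in for what a
# real LLM learns from training data (over tokens, at vastly larger scale).
next_word_probs = {
    ("the", "first"): {"openly": 0.5, "person": 0.3, "time": 0.2},
    ("first", "openly"): {"gay": 0.9, "elected": 0.1},
    ("openly", "gay"): {"president": 0.4, "senator": 0.3, "mayor": 0.3},
}

def sample_next(context):
    """Sample the next word in proportion to its probability, given the context."""
    candidates = next_word_probs[context]
    words = list(candidates)
    weights = [candidates[word] for word in words]
    return random.choices(words, weights=weights, k=1)[0]

# Generate a continuation. Nothing here checks whether the sentence is true;
# the "model" only asks which word is likely to come next.
sequence = ["the", "first"]
while tuple(sequence[-2:]) in next_word_probs:
    sequence.append(sample_next(tuple(sequence[-2:])))

print(" ".join(sequence))  # e.g. "the first openly gay president"
```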


In March, tech journalist Casey Newton asked Google’s Bard to give him some fun facts about the gay rights movement. Bard responded in part by saying that the first openly gay person elected to the presidency of the United States was Pete Buttigieg in 2020. Congratulations, Pete! Many referred to this response as a ‘hallucination’ — as if it weren’t justified by the training data. But since Bard was largely trained on data from the internet, that data likely includes a lot of sequences where the words “gay,” “president,” “United States,” “2020,” and “Pete Buttigieg” appear close to one another. So on some level, claiming that Buttigieg was the first openly gay president isn’t all that surprising — it’s a plausible response from a probabilistic model.
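The same point can be made with a back-of-the-envelope sketch. The three-sentence ‘corpus’ below is invented for illustration (it is not Bard’s actual training data), and crude co-occurrence counting stands in for the statistical associations a real model absorbs. The point is simply that a false completion can be the statistically ‘likely’ one.

```python
# A hypothetical three-sentence "corpus" standing in for web-scraped training
# text. No sentence asserts the false claim Bard produced, yet the relevant
# words keep appearing near one another.
corpus = [
    "pete buttigieg ran for president of the united states in 2020",
    "buttigieg was the first openly gay candidate to win a presidential primary",
    "the openly gay former mayor pete buttigieg joined the cabinet in 2021",
]

def cooccurrence_count(name, cues):
    """Count sentences in which `name` appears alongside every cue word
    (simple substring matching, so 'president' also matches 'presidential')."""
    return sum(
        1 for sentence in corpus
        if name in sentence and all(cue in sentence for cue in cues)
    )

# A crude association score: how strongly does each name co-occur with
# "gay" and "president" in this toy corpus?
for name in ["buttigieg", "obama", "lincoln"]:
    print(name, cooccurrence_count(name, ["gay", "president"]))

# "buttigieg" scores highest, so an association-driven completion of
# "the first openly gay president was ..." plausibly lands on his name,
# even though nothing in the corpus says he was elected president.
```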


Now, this example didn’t lead to real-world harm, but who or what should be held accountable when one does? Helen Nissenbaum, a professor of information sciences, explains that we’re quick to “blame the computer” because we anthropomorphize it in ways we wouldn’t with other inanimate objects. Nissenbaum was writing in 1995 about clunky computers, and this problem has become much, much worse in the intervening years. As she wrote then, “Here, the computer serves as a stopgap for something elusive, the one who is, or should be, accountable.” Today, the notion that AI is hallucinating serves as just such a stopgap.


Paradoxically, users or operators of the technology often absorb a disproportionate amount of blame. This is what Madeleine Clare Elish, a cultural anthropologist, calls a “moral crumple zone,” wherein “responsibility for an action may be misattributed to a human actor who had limited control over the behavior of an automated or autonomous system.” Traditionally, a ‘crumple zone’ is the part of the car designed to absorb the brunt of a crash in order to protect the driver. Elish argues that, historically, a “moral crumple zone” has protected the technological system at the expense of the human user or operator. Remember when New York Times journalist Kevin Roose went viral for a very weird and unsettling back-and-forth with Sydney, Bing’s new chatbot? The exchange ended with the chatbot proclaiming its love for Roose. In the aftermath, many commentators argued that Roose pushed Sydney too far; that he was to blame for how the chatbot responded.


The people often conspicuously left out of this discussion of potential blame are those building and making key decisions about the model. Engineers, AI researchers, developers, corporate officers, and so on have historically avoided blame for a few reasons. The first is the weird idea, which we’ve somehow come to accept, that errors or bugs in code are normal. Nissenbaum calls this “the problem of bugs” and shows how it leads to an obvious problem: if imperfections are perceived as inevitable, then we can’t hold those designing the system accountable. But that doesn’t hold up — in industries like car and aircraft manufacturing, where the cost of an error is very high, we’ve proven this idea mostly wrong.

Then there is “the problem of many hands” that Nissenbaum describes: the notion that in modern organizational arrangements, blame for a decision rarely lies with one person. There are lots of cooks in the proverbial decision-making kitchen, each with varying degrees of authority and power. And the ‘kitchen’ has become more complex since Nissenbaum’s writing, as machine learning introduces more dynamic steps and more stakeholders. In any case, this effect is compounded by the assumption that engineers can’t actually explain what’s happening inside the model. If they can’t explain how specific inputs and model decisions combine to produce certain outputs, then how can we blame them? That’s not quite the right question, though. For one, it’s kind of insane not to assign blame just because they don’t understand what’s happening — if anything, that sounds like all the more reason to dole out a li’l accountability, or at least set preemptive thresholds. I argued for this in “A critique of tech-criticism,” writing:


“The government has a long history of requiring companies and industries to meet a certain standard before launching a product. I wouldn’t drive a car if federal standards didn’t prevent serious injuries. Nor would I hop on a plane so often if we didn’t render crashes nearly obsolete […] So what to do? Well, the government could require that AI companies be able to explain how their model produced a result before releasing it. It’s not clear that interpretability is possible, but right now, we’re not even asking that companies try.”

Furthermore, “cause” is an especially high bar for blame. Joel Feinberg, a noted moral, social, and political philosopher, described a set of conditions under which someone would be considered “morally blameworthy” for a given harm even if they didn’t intend to cause it. Nissenbaum summarizes Feinberg’s conditions this way:


“We judge an action reckless if a person engages in it even though he foresees harm as its likely consequence but does nothing to prevent it; we judge it negligent, if he carelessly does not consider probable harmful consequences.”

In other words, an engineer might deserve blame and accountability — even if they didn’t mean to cause harm — if they were reckless or negligent in how they built the technology. For example, researchers have been documenting the harms of AI and LLMs for years. It’s simply no longer reasonable to say, ‘ah, we didn’t see that harm coming; the machine must have “hallucinated.”’ That sounds pretty negligent to me. Moreover, in a totally insane 2022 survey, AI researchers were asked, “What probability do you put on human inability to control future advanced A.I. systems causing human extinction or similarly permanent and severe disempowerment of the human species?” The median reply was 10 percent. I personally find the question itself hyperbolic, but yeah, it’s fair to say that AI researchers ‘foresee harm.’ So the question of recklessness comes down to what they’re doing to proactively prevent it.


Finally, and most importantly, the pursuit of explainability looks to the model itself for answers, when the model is entangled with both its users and its creators. Purely from a technical perspective, we can only half explain why a model’s outputs are the way they are. As we saw with Bard’s alleged first gay president, probabilistic models can produce weird outputs that don’t technically exist in the training data, because they are just making up sentences based on which words are likely to follow the ones that came before them.

The other half of the explanation lies in how people, culture, norms, race, and gender inform how the training data is constructed, and then, more obviously, how the prompt is created. In some small way, Roose was to blame for Sydney’s response — his prompts were inputs to what the chatbot generated. So we’re left with a complex system with multiple inputs — engineers, users, and the technology — dynamically interacting, each adapting, updating, and changing their actions, decisions, and outputs in response to one another.


What can be done about all this?

We need to accept that the complexity of the engineer-user-tech interaction does not absolve everyone from responsibility for error and harm. As Nissenbaum writes, “Instead of identifying a single individual whose faulty actions caused the injuries, we find we must systematically unravel a messy web of interrelated causes and decisions.” Right, we can start to disentangle these systems and assign partial blame and accountability to the appropriate stakeholders. This will require a lot (!) more information about the models themselves and the decisions engineers are making. I’ve long had a complicated relationship with ‘transparency initiatives,’ but much more of that transparency will be required if we’re to move closer to accountability. It will also require banishing from our brains the cultural assumption that bugs are inevitable. But the first step along this path is to close the gap between our expectations of what LLMs can do and what they’re actually doing. They aren’t hallucinating — we are.
