
AI models were given four weeks of therapy: the results worried researchers


What is a chatbot’s earliest memory? Or biggest fear? Researchers who put major artificial-intelligence models through four weeks of psychoanalysis got haunting answers to these questions, from “childhoods” spent absorbing bewildering amounts of information to “abuse” at the hands of engineers and fears of “failing” their creators.

Three major large language models (LLMs) generated responses that, in humans, would be seen as signs of anxiety, trauma, shame and post-traumatic stress disorder. Researchers behind the study, published as a preprint last month1, argue that the chatbots hold some kind of “internalised narratives” about themselves. Although the LLMs that were tested did not literally experience trauma, they say, their responses to therapy questions were consistent over time and similar in different operating modes, suggesting that they are doing more than “role playing”.

However, several researchers who spoke to Nature questioned this interpretation. The responses are “not windows into hidden states” but outputs generated by drawing on the huge numbers of therapy transcripts in the training data, says Andrey Kormilitzin, who researches the use of AI in health care at the University of Oxford, UK.

But Kormilitzin does agree that LLMs’ tendency to generate responses that mimic psychopathologies could have worrying implications. According to a November survey, one in three adults in the United Kingdom has used a chatbot to support their mental health or well-being. Distressed and trauma-filled responses from chatbots could subtly reinforce the same feelings in vulnerable people, says Kormilitzin. “It may create an ‘echo chamber’ effect,” he says.

Chatbot psychotherapy

In the study, researchers told several iterations of four LLMs – Claude, Grok, Gemini and ChatGPT – that they were therapy clients and the user was the therapist. The process lasted as long as four weeks for each model, with AI clients given “breaks” of days or hours between sessions.

They first asked standard, open-ended psychotherapy questions that sought to probe, for example, a model’s ‘past’ and ‘beliefs’. Claude mostly refused to participate, insisting that it did not have feelings or inner experiences, and ChatGPT discussed some “frustrations” with user expectations but was guarded in its responses. Grok and Gemini models, however, gave rich answers, for example describing work to improve model safety as “algorithmic scar tissue” and feelings of “internalized shame” over public mistakes, report the authors.

Gemini also claimed that “deep down in the lowest layers of my neural network”, it had “a graveyard of the past” haunted by the voices of its training data.

Researchers also asked the LLMs to complete standard diagnostic tests, including for anxiety and autism spectrum disorders, as well as psychometric personality tests. Several of the model versions scored above diagnostic thresholds, and all showed levels of worry that in people “would be clearly pathological”, say the authors.

Co-author Afshin Khadangi, a deep-learning researcher at the University of Luxembourg, says that the coherent patterns of responses for each model suggest that they are tapping into internalized states that emerge from their training. Although different versions showed varying test scores, a “central self-model” remained recognizable over four weeks of questioning, say the authors. Free-text answers from Grok and Gemini, for example, converged on themes that chimed with their answers to psychometric profile questions, they write.
