December 19, 2024
Can Large Language Models — those intelligent chatbots that produce human-like answers to our prompts — influence our opinions?
An experiment described in the magazine IEEE Intelligent Systems suggests that the answer is yes. The findings have ramifications for teachers grading papers, employee evaluations and many other situations that could affect our lives.
The Research
The study’s design centers on the differing opinions offered by two prominent LLMs. Each LLM was tasked with evaluating two different patent abstracts on a scale from 1 to 10, focusing on qualities such as feasibility and disruptiveness.
The study’s author provided the patent abstracts and the LLM-generated scores to different groups of graduate students. Each group saw only one rating — either the higher or the lower one. Unaware of what other groups had been given, the students were then asked to rate the patent abstracts themselves.
Groups that saw a higher LLM rating (like a “9”) gave higher evaluations than groups that saw a lower rating (like a “4”). However, they did not just copy the scores. Instead, those shown a “9” gave an average rating of about 7.5, while those shown a “4” gave an average rating slightly above 5. This suggests that although the LLM’s rating influenced them, the participants still made their own judgments.
“The experiment results suggest that AI tools can affect decision-making tasks, like when teachers grade student research papers or when enterprises evaluate employees, products, software or other intellectual objects,” said IEEE Senior Member Ayesha Iqbal. “If different AI tools give different ratings, and people depend on them, people can give different ratings to the same idea. That raises an important question: Do we want to be biased toward its recommendations?”
When Should We Use AI to Help Form Judgments?
It’s fairly common for professionals to use LLMs to assist with the first draft of tasks like grading papers or evaluating projects. Professionals may not use the LLM’s output as a final product, but it provides a useful, time-saving starting point. Given the anchoring effect described in the study, is this a good idea?
The research suggests that, like people, LLMs offer reasons for or against certain ideas, so relying on one might be akin to collaborating with a peer. At the same time, the LLMs showed distinct tendencies that could make them more or less useful: some tend to be more optimistic and offer longer answers, while others are more pessimistic and offer shorter answers.
The study’s author indicates that educators might use a single LLM for tasks like grading papers to maintain consistency, but might use multiple LLMs for more complex tasks, such as evaluating business projects.
“It is important to establish boundaries and limitations for AI use in our personal and professional lives,” Iqbal said. “We need to determine when and where AI technology is appropriate and beneficial and identify situations where human judgment and intervention are necessary. Overreliance on AI can be avoided by maintaining control over technology usage and decision-making processes.”