Is ChatGPT on acid?
Written by Nik Janos and Zach Justus
In the new and rapidly evolving lexicon of ChatGPT, “hallucination” has become the name for the phenomenon of ChatGPT providing incorrect and misleading information. Because ChatGPT is not sentient, it’s a bit misleading and anthropomorphic to say that a machine learning algorithm is hallucinating. That said, when Zach asked ChatGPT to write a bio of Nik, it was fair to ask whether ChatGPT was on acid. The gap between its bio of Nik and Nik’s actual bio was wide. That ChatGPT provides false information is a real problem, and it is better to call it what it is: just making shit up.
ChatGPT will make stuff up when given certain types of queries, and this has the potential to shape how students and faculty use the service in learning and writing activities. Currently, ChatGPT is unreliable at providing academic references, with many reports of wholly fabricated citations. At its core, this is a problem of whether people can trust ChatGPT as a reliable narrator. If ChatGPT gives you incorrect or made-up answers, how many times will you get burned before you stop coming back? Do you really want to sit around playing a game of two truths and a lie with an AI?
Ultimately, this is about validity and reliability, which are familiar problems between human beings. The problem affects everything from everyday conversation to witness testimony in court. Generally, for daily life to function we constantly accept what people tell us with varying degrees of accuracy and reliability. But social trust, at both the individual and the societal level, is predicated on how well people read other people’s validity (truthfulness) and reliability (accuracy over time). Take three examples from university work where we might find the same faults in humans as in ChatGPT.
An administrator asks their office assistant, “What time is my meeting, and where?” He gives them the day, time, and location, but when they show up, no one is there; the day was off. The next day the administrator asks again. Again they show up and no one is there. Perhaps they are lenient and give him one more chance, but again the information is incorrect. The administrator then fires the assistant.
A student goes to see the academic advisor to find out which classes to take and in what order. The advisor devises a plan and gives it to the student. Semesters pass, and the student is unable to graduate on time because the advice was invalid. Here the stakes are too high to keep testing the advisor’s accuracy over time, so the student either tries a different advisor or works out the plan herself.
A tenure-track professor asks three full professors in her department: what does it take to get tenure? The three conversations produce answers similar yet divergent enough to leave her puzzled. She decides to sit down and read, word for word, the department standards and the Collective Bargaining Agreement. Afterwards, she has to reconcile the words of her senior colleagues with the legalese of the standards and decide how to synthesize them in a satisfactory way.
The point here is that in each scenario the individual has to assess, and live with the consequences of, a human’s ability or inability to be truthful and reliable. Note that untruthfulness need not be an intentional or conscious act. In each case the individual had to decide either to do the work of verification or to find someone else. With ChatGPT, the situation is similar. ChatGPT gathers information from similar sources, conflates it, and produces a plausible narrative. As the examples indicate, people do this all the time. Because of the problem of making stuff up, using it may require doing the work yourself or adding extra verification, and some will consider avoiding the chat altogether. It is a reliable and unreliable narrator at once. This largely breaks our mold of what a computer is and is not, but that is part of the new paradigm of generative AI.
We have noticed an evolution in the answers to a query we have been asking ChatGPT since we began working with it. The question is lifted from one of Nik’s modules and exams in Environmental Sociology.
Q: According to David Roberts, what are different strategies to solve energy poverty in the developing world? Provide some examples. The module is based on David Roberts’ 2014 piece “What it will take to get electricity to the world’s poor.” We have asked this question, and we have asked ChatGPT to provide citations. Each time we get slightly different answers, and, surprisingly, when asked for citations it lists “Roberts, D. (2016). The big, positive business of energy access. Vox.” along with a URL. But this article does not exist, and ChatGPT (3.5) never lists the foundational article linked above. Moreover, the URLs it gives for the organizations it cites are incorrect or return “404 not found.”
In one of Zach’s queries, ChatGPT provided a similar answer, but when asked for APA format it said it could not provide citations. In another example, the answer was similar but the reference differed from the ones Nik was given. More importantly, the reference is false. We looked it up: there is no David Roberts article titled “Energy poverty is a global problem. Here are 5 things to know” from April 3, 2018. Rather, there is an article titled “Energy poverty is a global problem. Coal is a bogus solution,” published July 27, 2017.
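Part of that verification chore can be automated. Below is a minimal sketch, in Python, of the kind of check we have in mind: it takes a list of URLs pulled from ChatGPT’s citations and reports which ones fail to resolve. The specific URLs are placeholders we made up for illustration, not actual ChatGPT output, and the script assumes you have already copied the cited links into a list by hand.

```python
# Minimal sketch: flag cited URLs that do not resolve.
# The URLs below are placeholders for illustration, not real ChatGPT citations.
import requests

cited_urls = [
    "https://www.vox.com/energy-and-environment/example-article",  # placeholder
    "https://www.example.org/energy-poverty-report",               # placeholder
]

for url in cited_urls:
    try:
        # A HEAD request keeps the check lightweight; some sites only answer GET.
        response = requests.head(url, allow_redirects=True, timeout=10)
        status = response.status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    print(f"{url} -> {status}")
```

Even a clean 200 response only tells you the page exists. A human still has to confirm that the page matches the title, author, and date ChatGPT claimed, since a fabricated citation can point to a real but unrelated URL.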
What are the implications of the “making stuff up” problem? The answer ChatGPT provided on solving energy poverty is solid and appears valid, and despite variation it is reliable, because the core insights on energy poverty remain in each version. The references, however, are a mess and require human verification and extra work. A working best practice is figuring out what kinds of queries it is good for and which it is not. Generally, ChatGPT in its current form (3.5 and 4.0) is useful for generating ideas and synthesizing knowledge about a broadly understood topic, but citations need to be checked. The implications for students and researchers are profound and wide-reaching. As with all things related to this new technology, this will change and evolve. The next iteration may improve in this area, but the accuracy needed for academic research may be a ways off. We are excited to see where this goes, but the pitfalls are real and we have to be aware of them.