Dr. Gantwerker has his students use the free version of ChatGPT (3.5), which is easy to access and helps train them to identify the technology’s limitations and potential drawbacks, such as hallucinations and bias. To illustrate programmed bias, for example, he uses the prompt, “Create a picture of doctors playing games,” which generates a picture of White male doctors. “In education, I can use these limitations of ChatGPT to our advantage,” he said.
To illustrate how rapidly this technology is improving, Dr. Gantwerker shows students the difference between content generated by ChatGPT versions 3.5 and 4.0, with the latter showing demonstrably more depth, creativity, and robustness. “People don’t realize that with subsequent updated models, limitations like hallucinations are going to go away,” he said. He encouraged other physicians to try ChatGPT and cited the paper “Writing with ChatGPT: an illustration of its capacity, limitations & implications for academic writers” (Perspect. Med. Educ. 2023. doi:10.5334/pme.1072) as a good entry point.
ChatGPT for and by Patients
Using ChatGPT to develop patient materials, including translating them into other languages, is another potential role for this technology. Researchers at the University of Kansas Medical Center in Kansas City showed the ability of ChatGPT to generate presurgical educational information for patients undergoing head and neck surgery (Laryngoscope. 2023. doi:10.1002/lary.31243). When compared to the publicly available websites that patients typically access for such information, the study found that ChatGPT content had similar readability, knowledge content, accuracy, thoroughness, and number of medical errors.
Senior author of the study, Andres Bur, MD, an associate professor of otolaryngology–head and neck surgery and director of robotics and minimally invasive head and neck surgery at the University of Kansas Medical Center, called the results powerful but cautioned that they are also very new. “We need more experience with it to know that it’s providing the correct recommendations for our patients so we can start recommending it as a tool for them,” he said.
For patients who use ChatGPT the way some use Google, to ask about an otolaryngologic health concern, a study by Habib G. Zalzal, MD, and his colleagues found that ChatGPT answered with a high degree of accuracy (98.3%) but earned lower patient confidence in its responses (79.8%) (Laryngoscope Investig Otolaryngol. 2024. doi:10.1002/lio2.1193). “It’s important for us physicians to know that at least the public still value our expertise, so we have a duty to not rely on LLM and still serve as a separate knowledgeable entity for educating and treating our patients on their otolaryngologic conditions,” said Dr. Zalzal.
Another study, conducted by Daniel J. Campbell, MD, and his colleagues, found that ChatGPT correctly answered nearly 70% of questions on thyroid nodules (Thyroid. 2023. doi:10.1089/thy.2023.0491). The responses, however, were written at a college reading level, higher than the level recommended for patient education materials, potentially making them more difficult for patients to understand.
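Findings like “college reading level” come from standard readability formulas. The study’s exact metric is not stated here, but the Flesch-Kincaid grade level is one widely used example; the sketch below shows how such a score is computed (the vowel-group syllable count is a rough simplification, not the validated method):

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels as syllables.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Split on terminal punctuation for sentences; letters/apostrophes for words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A score near 13 or above corresponds to college-level text; patient education materials are generally targeted at roughly a sixth-grade level, which longer words and longer sentences quickly exceed.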
When to Adopt ChatGPT in Practice
One of the questions otolaryngologists need to ask themselves is when, and for which tasks, they should adopt ChatGPT into their practices. Relying on traditional studies may not be feasible, given the rapid evolution of the technology. “The hard part of doing a study where you’re comparing recommendations made by a human clinician and those made by AI, for example, is that because the algorithms so rapidly change and adapt and learn, a study we do today will be different from what we do in six months,” said Dr. Iloreta, adding that AI studies, therefore, won’t necessarily be reproducible.
But the time may be close when AI won’t be able to adapt much more, he added, suggesting that its rapid evolution will slow, providing opportunity for clearer assessment.
Watching what others are doing is another way to gauge when it’s time to adopt LLMs. Stat News (www.statnews.com), for instance, offers an online tracker of the real-world use of generative AI and its impact on medicine.
Another approach is to view AI through the lens of the Gartner Hype Cycle, a five-stage model of technology adoption intended to help guide organizations. (See the sidebar “The Gartner Hype Cycle.”) Dr. Bur, whose research focuses on machine learning to personalize care in head and neck oncology, views AI through this lens and sees generative AI at its peak. In the Gartner model, “peak” is used deliberately to describe a relatively new technology that has generated a lot of buzz but has more hype behind it than proof that it can deliver what it claims.
Further help in discerning when and how to adopt AI can be found in a set of broad guiding principles developed and recently published by multidisciplinary experts from around the world (N Engl J Med. 2024. doi:10.1056/AIp2400036). Key recommendations cover policy issues, clinical aspects including the patient-clinician relationship, incorporation of patient data into the training of AI models, patient education about medical advice from AI, and who pays for AI developments. Zak Kohane, MD, PhD, chair of the department of biomedical informatics in the Blavatnik Institute at Harvard Medical School in Boston, editor-in-chief of The New England Journal of Medicine AI, and a longtime advocate of AI’s potential to change medicine, underscored the imperative for caution with the arrival of generative AI tools like ChatGPT, which he called “mind-blowing.”
“Despite their promise,” said Dr. Kohane in a press release, “ChatGPT and tools like it are immature and evolving. We need to figure out how to trust their abilities but verify their output.”