About a year ago, I was caring for a patient with a difficult hospital course: fibula free flap reconstruction for a mandibular osteoradionecrosis defect, poor wound healing, and a tracheostomy. I visited him to perform an ultrasound assessment of his microvasculature as part of a novel research study and felt encouraged as I watched the velocity–time curves of his anastomosed vessels and the excellent perfusion of the skin paddle. He was trying to communicate by occluding his tracheostomy, without success. Frustrated, he wrote on his whiteboard: “Why can’t AI solve this?”
To me, rebuilding a functional and symmetric mandible from a fibula is one of the most striking examples of surgical ingenuity and masterful technique. To him, the hospitalization was a reminder of what modern surgery could not restore. In his eyes, the communication barrier underscored a gap: Could our technology meet a human need at the bedside?
The term artificial intelligence has become omnipresent during my years in residency. I first heard about ChatGPT during my internship, when colleagues were using it to summarize scientific papers for quick interpretation. Since then, nearly everyone I know has folded some form of AI into their daily routine. Even patients often mention it in clinic as an accessible way to research suspected conditions or ask postoperative questions. WebMD and Dr. Google have morphed into ChatGPT.
Clinician caution around AI usually reflects missing validation and oversight, not necessarily aversion to technology. To capture the skepticism AI sometimes meets, consider this perspective from Eric Gantwerker, MD, a pediatric otolaryngologist at Cohen Children’s Medical Center at Northwell Health: “The AI you use today is the worst AI you will ever use—it gets better by the day, and hallucinations and fake references are much improved from even six months ago. People who avoid AI have often not tried the right tools or adapted them to the proper use case for their workflow.”
From my background in computer science, I understand that these systems generate probabilistic outputs from patterns in data, and that those outputs are useful only when performance has been proven in our specific patient populations. In medicine, where decisions must be reproducible and accountable, AI outputs need to be externally validated.
Even if many models are not fully interpretable, that limitation is acceptable when we have transparent performance metrics, documented failure modes, and clinician oversight. The need for proof is echoed by Ian McCulloh, PhD, a computer science professor at Johns Hopkins University, who frames adoption in concrete terms: “Perhaps the biggest barrier to clinical adoption is the uncertainty over whether AI actually improves patient outcomes or is just a trend. If we expect AI to influence care, measure it like any intervention: prospective trials when feasible, clear benchmarks, and routine monitoring.” He points to a simple “Byrne Test”: When stakes and feasibility warrant, run a randomized trial and treat AI like any other clinical intervention, “the same way we test pharmaceuticals and devices.”