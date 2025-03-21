




When Adam Rodman was a second-year medical student in the 2000s, he visited the library for a patient whose illness had left doctors stumped. Rodman searched the catalog, copied research papers, and shared them with the team.

“It made a big difference in that patient’s care,” Rodman said. “Everyone said, ‘This is so great. This is evidence-based medicine.’ But it took two hours. I can do that today in 15 seconds.”

Rodman, now an assistant professor at Harvard Medical School and a doctor at Beth Israel Deaconess Medical Center, these days carries a medical library in his pocket: a smartphone app created after the release of the large language model ChatGPT in 2022.

OpenEvidence, developed in part by Medical School faculty, allows him to query specific diseases and symptoms. It searches the medical literature, drafts a summary of findings, and lists the most important sources for further reading, providing answers while Rodman is still face-to-face with his patient.

Artificial intelligence in various forms has been used in medicine for decades, but not like this.

Experts predict that the adoption of large language models will reshape medicine. Some compare the potential impact with the decoding of the human genome, even the rise of the internet. The impact is expected to show up in doctor-patient interactions, physicians’ paperwork load, hospital and physician practice administration, medical research, and medical education.

Most of these effects are likely to be positive: increasing efficiency, reducing mistakes, easing the nationwide crunch in primary care, bringing data to bear more fully on decision-making, reducing administrative burdens, and creating space for longer, deeper person-to-person interactions.

But there are serious concerns, too.

Current data sets too often reflect societal biases that reinforce gaps in access and quality of care for disadvantaged groups.
Without correction, these data have the potential to cement existing biases into ever-more-powerful AI that will increasingly influence how health care operates.

Another important issue, experts say, is that AIs remain prone to “hallucination,” making up “facts” and presenting them as if they are real.

Then there’s the danger that medicine won’t be bold enough. The latest AI has the potential to remake health care top to bottom, but only if given a chance. The wrong priorities (too much deference to entrenched interests, a focus on money instead of health) could easily reduce the AI “revolution” to an underwhelming exercise in tinkering around the edges.

“I think we’re in this weird space,” Rodman said. “We say, ‘Wow, the technology is really powerful.’ But what do we do with it to actually change things? My worry, as both a clinician and a researcher, is that if we don’t think big, if we don’t try to rethink how we’ve organized medicine, things might not change that much.”

Shoring up the ‘tottering edifice’

Five years ago, when asked about AI in health care, Isaac Kohane responded with frustration. Teenagers tapping away on social media apps were better equipped than many doctors.

The situation today couldn’t be more different, he says.

Kohane, chair of the Medical School’s Department of Biomedical Informatics and editor-in-chief of the New England Journal of Medicine’s new AI initiative, describes the abilities of the latest models as “mind-boggling.”

To illustrate the point, he recalled getting an early look at OpenAI’s GPT-4. He tested it with a complex case, a child born with ambiguous genitalia, that might have stymied even an experienced endocrinologist. Kohane asked GPT-4 about genetic causes, biochemical pathways, next steps in the workup, even what to tell the child’s parents. It aced the test.

“This large language model was not trained to be a doctor; it’s just trained to predict the next word,” Kohane said.
“It could speak as coherently about wine pairings with a vegetarian menu as diagnose a complex patient. It was truly a quantum leap from anything that anybody in computer science who was honest with themselves would have predicted in the next 10 years.”

And none too soon. The U.S. health care system, long criticized as costly, inefficient, and inordinately focused on treatment over prevention, has been showing cracks. Kohane is tired of seeing those cracks up close; he recalled a faculty member new to the department who couldn’t find a primary care physician.

“The medical system, which I have long said is broken, is broken in extremely obvious ways in Boston,” he said. “People worry about equity problems with AI. I’m here to say we have a huge equity problem today. Unless you’re well connected and are willing to pay literally thousands of extra dollars for concierge care, you’re going to have trouble finding a timely primary care visit.”

Early worries that AI would replace physicians have yielded to the realization that the system needs both AI and its human workforce, Kohane said. Teaming nurse practitioners and physician assistants with AI is one among several promising scenarios.

“It is no longer a conversation about, ‘Will AI replace doctors,’ so much as, ‘Will AI, with a set of clinicians who may not look like the clinicians that we’re used to, firm up the tottering edifice that is organized medicine?’”

Building the optimal assistant

The way LLMs were rolled out, to everyone at once, accelerated their adoption, Kohane says. Doctors immediately experimented with eye-glazing yet essential tasks, like writing prior authorization requests to insurers explaining the necessity of specific, usually expensive, treatments.

“People just did it,” Kohane said. “Doctors were tweeting back and forth about all the time they were saving.”

Patients did it too, seeking virtual second opinions, as in the case of the child whose recurring pain was misdiagnosed by 17 doctors over three years.
In the widely publicized case, the boy’s mother entered his medical notes into ChatGPT, which suggested a condition no doctor had mentioned: tethered cord syndrome, in which the spinal cord binds inside the backbone. When the patient moves, rather than sliding smoothly, the spinal cord stretches, causing pain. The diagnosis was confirmed by a neurosurgeon, who then corrected the anatomic anomaly.

One of the perceived benefits of employing AI in the clinic, of course, is to make doctors better the first time around. Greater, faster access to case histories, suggested diagnoses, and other data is expected to improve physician performance. But plenty of work remains, a recent study shows.

Research published in JAMA Network Open in October compared diagnoses delivered by an individual doctor, a doctor using an LLM diagnostic tool, and an LLM alone. The results were surprising, showing an insignificant improvement in accuracy for the physicians using the LLM: 76% versus 74% for the solitary physician. More surprisingly, the LLM by itself did best, scoring 16 percentage points higher than physicians alone.

Rodman, one of the paper’s senior authors, said it’s tempting to conclude that LLMs aren’t that helpful for doctors, but he insisted that it’s important to look deeper at the findings.

Only 10% of the physicians, he said, were experienced LLM users before the study, which took place in 2023, and the rest received only basic training. Consequently, when Rodman later looked at the transcripts, he found that most had used the LLMs for basic fact retrieval.

“The best way a doctor could use it now is for a second opinion, to second-guess themselves when they have a tricky case,” he said. “How could I be wrong? What am I missing? What other questions should I ask?
Those are the ways we know from psychological literature that complement how humans think.”

Among the other potential benefits of AI is the chance to make medicine safer, according to David Bates, co-director of the Center for Artificial Intelligence and Bioinformatics Learning Systems at Mass General Brigham.

A recent study by Bates and colleagues showed that as many as one in four visits to Massachusetts hospitals results in some kind of patient harm. Many of those incidents trace back to adverse drug events.

“AI should be able to look for medication-related issues and identify them much more accurately than we’re able to do right now,” said Bates, who is also a professor of medicine at the Medical School and of health policy and management at the Harvard T.H. Chan School of Public Health.

Another opportunity stems from AI’s growing competence in a mundane area: note-taking and summarization, according to Bernard Chang, dean for medical education at the Medical School.

Systems for “ambient documentation” will soon be able to listen in on patient visits, record everything that is said and done, and generate an organized clinical note in real time. When symptoms are discussed, the AI can suggest diagnoses and courses of treatment. Later, the physician can review the summary for accuracy.

Automation of notes and summaries would benefit health care workers in more than one way, Chang said. It would ease doctors’ paperwork load, often cited as a cause of burnout, and it would reset the doctor-patient relationship. One of patients’ biggest complaints about office visits is the physician sitting at the computer, asking questions and recording the answers. Freed from the note-taking process, doctors could sit face-to-face with patients, opening a path to stronger connections.

“It’s not the most magical use of AI,” Chang said. “We’ve all seen AI do something and said, ‘Wow, that’s amazing.’ This is not one of those things.
But this program is being piloted at different ambulatory practices across the country and the early results are very promising. Physicians who feel overburdened and burnt out are starting to say, ‘You know what, this tool is going to help me.’”

The bias threat

For all their power, LLMs are not ready to be left alone.

“The technology is not good enough to have that safety level where you don’t need a knowledgeable human,” Rodman said. “I can understand where it might have gone aground. I can take a step further with the diagnosis. I can do that because I learned the hard way. In residency you make a ton of mistakes, but you learn from those mistakes.

“Our current system is incredibly suboptimal but it does train your brain. When people in medical school interact with things that can automate those processes, even if they’re on average better than humans, how are they going to learn?”

Doctors and scientists also worry about bad information. Pervasive data bias stems from biomedicine’s roots in wealthy Western nations whose science was shaped by white men studying white men, says Leo Celi, an associate professor of medicine and a physician in the Division of Pulmonary, Critical Care and Sleep Medicine at Beth Israel Deaconess Medical Center.

“You need to understand the data before you can build artificial intelligence,” Celi said. “That gives us a new perspective of the design flaws of legacy systems for health care delivery, legacy systems for medical education. It becomes clear that the status quo is so bad (we knew it was bad and we’ve come to accept that it is a broken system) that all the promises of AI are going bust unless we recode the world itself.”

Celi cited research on disparities in care between English-speaking and non-English-speaking patients hospitalized with diabetes. Non-English speakers are woken up less frequently for blood sugar checks, raising the likelihood that changes will be missed.
That impact is hidden, however, because the data isn’t obviously biased, only incomplete, even though it still contributes to a disparity in care.

“They have one or two blood-sugar checks compared to 10 if you speak English well,” he said. “If you average it, the computers don’t see that this is a data imbalance. There’s so much missing context that experts may not be aware of what we call ‘data artifacts.’ This arises from a social patterning of the data generation process.”

Bates offered additional examples, including a skin cancer device that does a poor job detecting cancer on highly pigmented skin and a scheduling algorithm that wrongly predicted Black patients would have higher no-show rates, leading to overbooking and longer wait times.

“Most clinicians are not aware that every medical device that we have is, to a certain degree, biased,” Celi said. “They don’t work well across all groups because we prototype them and we optimize them on, typically, college-aged, white, male students. They were not optimized for an ICU patient who is 80 years old and has all these comorbidities, so why is there an expectation that the numbers they represent are objective ground truths?”

The exposure of deep biases in legacy systems presents an opportunity to get things right, Celi said. Accordingly, more researchers are pushing to ensure that clinical trials enroll diverse populations from geographically diverse locations.

One example is Beth Israel’s MIMIC database, which reflects the hospital’s diverse patient population. The tool, overseen by Celi, offers investigators de-identified electronic medical records (notes, images, test results) in an open-source format. It has been used in 10,000 studies by researchers all around the world and is set to expand to 14 additional hospitals, he said.

Age of agility

As in the clinic, AI models used in the lab aren’t perfect, but they are opening pathways that hold promise to greatly accelerate scientific progress.
“They provide instant insights at the atomic scale for some molecules that are still not accessible experimentally or that would take a tremendous amount of time and effort to generate,” said Marinka Zitnik, an associate professor of biomedical informatics at the Medical School. “These models provide in-silico predictions that are accurate, that scientists can then build upon and leverage in their scientific work. That, to me, just hints at this incredible moment that we are in.”

Zitnik’s lab recently introduced Procyon, an AI model aimed at closing knowledge gaps around protein structures and their biological roles.

Until recently, it has been difficult for scientists to understand a protein’s shape: how the long molecules fold and twist onto themselves in three dimensions. This is important because the twists and turns expose portions of the molecule and hide others, making those sites easier or harder for other molecules to interact with, which affects the molecule’s chemical properties.

Today, predicting a protein’s shape, down to nearly every atom, from its known sequence of amino acids is feasible, Zitnik said. The major challenge is linking those structures to their functions and phenotypes across various biological settings and diseases. About 20% of human proteins have poorly defined functions, and an overwhelming share of research, 95%, is devoted to just 5,000 well-studied proteins.

“We are addressing this gap by connecting molecular sequences and structures with functional annotations to predict protein phenotypes, helping move the field closer to being able to in-silico predict functions for each protein,” Zitnik said.

A long-term goal for AI in the lab is the development of “AI scientists” that function as research assistants, with access to the entire body of scientific literature, the ability to integrate that knowledge with experimental results, and the capacity to suggest next steps.
These systems could evolve into true collaborators, Zitnik said, noting that some models have already generated simple hypotheses. Her lab used Procyon, for example, to identify domains in the maltase glucoamylase protein that bind miglitol, a drug used to treat type 2 diabetes. In another project, the team showed that Procyon could functionally annotate poorly characterized proteins implicated in Parkinson’s disease.

The tool’s broad range of capabilities is possible because it was trained on massive experimental data sets and the entire scientific literature, resources far exceeding what humans can read and analyze, Zitnik said.

The classroom comes before the lab, and the AI dynamic of flexibility, innovation, and constant learning is also being applied to education. The Medical School has introduced a course dealing with AI in health care; added a Ph.D. track on AI in medicine; is planning a “tutor bot” to provide supplemental material beyond lectures; and is developing a virtual patient on which students can practice before their first nerve-wracking encounter with the real thing.

Meanwhile, Rodman is leading a steering group on the use of generative AI in medical education. These initiatives are a good start, he said. Still, the rapid evolution of AI technology makes it difficult to prepare students for careers that will span 30 years.

“The Harvard view, which is my view as well, is that we can give people the basics, but we just have to encourage agility and prepare people for a future that changes rapidly,” Rodman said. “Probably the best thing we can do is prepare people to expect the unexpected.”






Provided by Harvard University. This story is published courtesy of the Harvard Gazette, Harvard University’s official newspaper. For additional university news, visit Harvard.edu.

Citation: AI is up to the challenge of reducing human suffering, experts say. Are we? (2025, March 21), retrieved 21 March 2025 from https://medicalxpress.com/news/2025-03-ai-human-experts.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.