Generative AI, Like ChatGPT, Shows Great Promise and Risks

Melanie Padgett Powers

In January 2023, ChatGPT seemed to pop up overnight. In staff meetings, casual conversations, and conference sessions, the artificial intelligence (AI) chatbot—a software application that mimics human conversation through text or voice interactions—was the hot, new topic in seemingly every industry. Although ChatGPT was actually released as a free prototype in November 2022 by the AI laboratory OpenAI, which created it, it took a few months to hit the mainstream. Reaction ranged from fascination and high hopes to fear that the “robots” were taking over.

ChatGPT is only one example of a generative AI tool, a type of AI that generates content, including images and text. ChatGPT is built on a large language model, a model trained on large quantities of text that can perform a variety of tasks. The November 2022 release was based on Generative Pre-trained Transformer (GPT)-3.5. In March 2023, OpenAI released GPT-4 as a paid version.

Generative AI tools carry a lot of promise in health care, potentially increasing efficiency and improving patient communication, but experts caution that these are very early days. ChatGPT and similar tools come with risks and warnings about inaccuracy and bias, among other concerns. In addition, clinicians must be sure that the tools they use comply with the Health Insurance Portability and Accountability Act (HIPAA).

“There's a lot of promise for artificial intelligence and machine learning,” said nephrologist Girish N. Nadkarni, MD, MPH, director of The Charles Bronfman Institute for Personalized Medicine and system chief of the Division of Data-Driven and Digital Medicine (D3M) at the Icahn School of Medicine at Mount Sinai in New York. “One is the significant promise in helping to cut down the drudgery of medicine…. More specific to nephrology, if you think about patients on dialysis, there's a lot of documentation burden there.”

Generative AI use cases

Telehealth use and virtual communication with patients skyrocketed during the COVID-19 pandemic, a convenience and patient expectation that is not going away, said nephrologist Karandeep Singh, MD, MMSc, associate chief medical information officer of artificial intelligence at the University of Michigan Medical School. But the increased administrative burden can contribute to clinician burnout, he said. Could chatbots increase productivity, lessening that burden?

ChatGPT and similar tools are good at creating boilerplate text that is tailored to the circumstance or context, Singh said. “Maybe I can reply to double the number of messages if all I need to do is to say the short version of what I want to say and then have this tool turn that short version into something that's coherent and more palatable as a letter to a patient,” he suggested.

Chatbots will likely be integrated as part of electronic health record (EHR) systems. They could be instructed to listen in on an ambient conversation between a patient and clinician and write up a summary of the discussion. Or, a chatbot could draft a letter or follow-up instructions to a patient based on the information in their EHR. In fact, Epic and Microsoft are already piloting a program to draft responses to patients at three sites: University of California San Diego Health, Stanford Health Care in California, and University of Wisconsin Health, Madison (1). It is part of a larger partnership between the two companies to implement AI tools into Epic's EHR system.

Singh also envisions a nephrologist using an AI chatbot for diet management for a variety of patients. For example, if a clinician wants to create a meal plan for a patient who speaks Punjabi, incorporating Indian food that is low in phosphorus and potassium, it could take a lot of research time. But a generative AI tool, such as ChatGPT, could quickly create a sample meal plan based on the parameters that a clinician inputs. The clinician would then fact-check the results. “I think that's the sort of thing where you don't put patient information in there, but you can tailor something to that person in a way that would have been much harder before,” Singh said.

A generative AI tool could also be prompted to translate the text into another language and to write it at a specific reading level, said Daniel Rizk, an MD/MS dual-degree student and mentee of Singh's in his machine learning lab. “I've tested [ChatGPT] with Spanish, and its translation is quite good, but further, you can say, ‘Write this at a sixth-grade level,’” Rizk explained. “You can tailor the language so that it can match the patient even a little bit better, which, I think, is just another component of saving time and meeting the patient where they are.”

In the area of practice management, these tools can be used to create physician schedules. “You can put in your set of constraints—like this person can't work on this day; this person can't work on that day—and come up with a schedule that works for everyone. You can get a reasonable first draft of the schedule,” Singh said. He also believes as the tools advance, software will be consolidated, which can save practices money. “Small practices that are used to paying for a lot of these tools separately are going to be thinking more of ‘What tools am I using that are specific to one task?’ and ‘Could I instead use something more general like a large language model to actually accomplish that task?’”
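
The scheduling constraints Singh describes can be written down explicitly, which also gives a practice a simple way to check a first-draft schedule before relying on it. The following is a minimal sketch in Python, not a tool mentioned by anyone quoted here; the clinician names, days, and the `find_conflicts` helper are all hypothetical.

```python
def find_conflicts(schedule, unavailable):
    """Return (day, clinician) pairs where a clinician is scheduled on a day they cannot work."""
    conflicts = []
    for day, clinician in schedule.items():
        # A clinician with no listed constraints is treated as always available.
        if day in unavailable.get(clinician, set()):
            conflicts.append((day, clinician))
    return conflicts


# Hypothetical constraints: the days each clinician cannot work.
unavailable = {"Dr. A": {"Wed"}, "Dr. B": {"Mon"}}

# Hypothetical first-draft schedule, for example one drafted by a chatbot.
draft = {"Mon": "Dr. A", "Tue": "Dr. B", "Wed": "Dr. A"}

conflicts = find_conflicts(draft, unavailable)
print(conflicts)
```

In this made-up draft, the check flags Wednesday, when Dr. A is scheduled but unavailable. A check like this only verifies the constraints that were written down; it does not judge whether the schedule is fair or workable.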

Other potential use cases include chatbots taking patients through a series of questions to help them schedule an appointment or generating summaries of an existing EHR for a clinician to review, said Lili Chan, MD, associate professor in the Division of Nephrology and D3M at the Icahn School of Medicine at Mount Sinai. As part of a research project, Chan is using AI—not specifically generative AI—to see if it can identify novel risk factors, such as social determinants of health (SDOH), from EHRs of dialysis patients. “The goal is to develop some way of increasing the recognition of these factors to the overall practice,” she said. “So maybe we could pull them from the EHR and do some form of a flag of the patient or reporting to the physician.” Her team is in the early stages, surveying patients about their SDOH, before testing the AI system.

Generative AI concerns

GPT-3.5 is free and open for anyone to try; however, Chan cautioned that many use cases in health care are “still quite a bit away” because there have been no firm validation studies. In particular, there needs to be research on the safety and efficacy of these tools for patient communication, she said. In addition, clinicians must not input any patient information into ChatGPT or similar tools, as they are not HIPAA compliant. Experts agree that clinicians should always fact-check any answers from generative AI. ChatGPT, in particular, is prone to “hallucinations,” in which it does not have the answer and responds with false information. These answers can often sound and look accurate, as with fabricated journal citations or book titles and authors.

“The current crop of models is prone to something called hallucinations. So, they only know what's in their training set,” Nadkarni said. “They obviously can extrapolate a lot, but if there's not something in their training set and you ask [them] a question, because they want to generate a response, they make up stuff…. That might be fine in creative writing… but in medicine, that might be dangerous because [they] can make up a diagnosis.”

Another concern is bias. Generative AI is pulling information from the data sets it was given. It does not know how to discern truth. Instead, it is “predicting” which words come after one another based on the context. “A lot of these models can potentially be generated from data that [are] inherently biased,” Chan said. “And so, when they generate this information, it will also result in data that [are] biased. I think that is quite a large concern.”
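
The idea of “predicting” the next word from context can be illustrated with a toy word-pair counter. This is a deliberately simplified sketch in Python and not how ChatGPT is built (large language models use neural networks trained on vastly more text); the training sentences and function names are hypothetical. It shows only that a model of this kind repeats whatever pattern is most common in its data, whether or not that pattern is true or fair.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical, deliberately skewed training text.
training_text = (
    "the patient is adherent . "
    "the patient is noncompliant . "
    "the patient is noncompliant ."
)

model = train_bigram_model(training_text)
print(predict_next(model, "is"))        # the most common follower in the skewed data
print(predict_next(model, "dialysis"))  # a word that never appeared in the training text
```

Because “noncompliant” follows “is” more often than “adherent” in this made-up text, the toy model predicts it every time. The toy model returns nothing for a word it has never seen, whereas the chatbots discussed here tend to produce a fluent answer anyway.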

Chan pointed to previous research that examined stigmatizing language in the EHR, showing that words such as “noncompliant” and “nonadherent” were used more often for Black patients. “If we're now using an existing EHR to describe a patient, then those biases or stigmatizing language in those notes are going to be perpetuated within that generated text from AI,” she explained.

Regulation needed

Despite all of the attention on AI, there are no federal regulations for the technology. At a Senate committee hearing in May 2023, some Democrats and Republicans said they support creating a federal agency to regulate AI. Even Samuel Altman, chief executive officer of OpenAI, told the committee that he supports regulations, according to news reports (2, 3). For now, Nadkarni said, “Health care [practitioners] and health systems need to have some sort of oversight, ethics, and governance to ensure they are being used safely and efficaciously and appropriately.”

Experts also agreed that health care clinicians need to become educated on how AI works, along with the benefits and risks. Even if clinicians do not seek out AI, it will be incorporated into future tools that they frequently use. Microsoft, for example, announced in March 2023 that it is incorporating its new AI tool, Copilot, into its 365 suite of programs (4). Furthermore, patients will be using AI and bringing information to their doctors, much like they use Google, Singh said.

The use of AI is only going to increase, and it is critical that clinicians keep updated, Rizk added. “There's this kind of wave of AI that's coming for the majority of professions in America and more globally,” he said, “and I think it's really important that doctors are stakeholders in this conversation and deployment.”