Generative Artificial Intelligence in Patient Education: ChatGPT Takes on Hypertension Questions

Introduction
Uncontrolled hypertension significantly contributes to the development and deterioration of various medical conditions, such as myocardial infarction, chronic kidney disease, and cerebrovascular events. Despite being the most common preventable risk factor for all-cause mortality, only a fraction of affected individuals maintain their blood pressure in the desired range. In recent years, reliance on online platforms for medical information has grown. While such platforms offer a convenient source of information, differentiating reliable from unreliable content can be daunting for the layperson, and false information can hinder timely diagnosis and management of medical conditions. The surge in accessibility of generative artificial intelligence (GeAI) technology has led to its increased use in obtaining health-related information. This has sparked debate among healthcare providers about the potential for misuse and misinformation, while also recognizing the role GeAI can play in improving health literacy. This study aims to investigate the accuracy of AI-generated information specifically related to hypertension and to explore the reproducibility of information provided by GeAI.

Method
A nonhuman-subject qualitative study was devised to evaluate the accuracy of information provided by ChatGPT regarding hypertension and its secondary complications. Frequently asked questions on hypertension were compiled by three study staff, internal medicine residents at an ACGME-accredited program, and then reviewed by a physician experienced in treating hypertension, resulting in a final set of 100 questions. Each question was posed to ChatGPT three times, once by each study staff member, and the majority response was then assessed against recommended guidelines. A board-certified internal medicine physician with over eight years of experience further reviewed the responses and categorized them into two classes based on clinical appropriateness: appropriate (in line with clinical recommendations) and inappropriate (containing errors). Descriptive statistical analysis was employed to assess ChatGPT responses for accuracy and reproducibility.

Result
Initially, a pool of 130 questions was gathered, from which a final set of 100 questions was selected for this study. When assessed against acceptable standard responses, ChatGPT responses were found to be appropriate in 92.5% of cases and inappropriate in 7.5%. Furthermore, ChatGPT had a reproducibility score of 93%, meaning that it consistently reproduced answers conveying similar meanings across multiple runs.

Conclusion
ChatGPT showed commendable accuracy in addressing commonly asked questions about hypertension. These results underscore the potential of GeAI in providing valuable information to patients. However, continued research and refinement are essential to further evaluate the reliability and broader applicability of ChatGPT within the medical field.


Introduction
Hypertension is a global epidemic affecting 31.1% of the adult population worldwide and is often regarded as the "silent killer" due to its contribution to the onset and exacerbation of major debilitating conditions such as myocardial infarction, chronic kidney disease, and stroke [1,2]. As the most significant preventable risk factor for all-cause mortality, the management of hypertension is crucial; yet, only 21% of those affected have their blood pressure optimally controlled [2]. Optimizing hypertension control could lead to a 49% decrease in cardiac mortality and a 62% reduction in cerebrovascular mortality [3]. In recent years, with technological advancements and the increased accessibility and availability of online resources, there has been a noticeable surge in reliance on online platforms as a source of medical information [4]. However, accessing online resources for medical information can be a double-edged sword. While these resources offer a convenient gateway to health-related information at the click of a button, the availability of unreliable and false information can hinder patient care and pose a potential threat to timely diagnosis and prompt management of medical conditions [4]. The emergence of generative artificial intelligence (GeAI) technology, coupled with its widespread availability to the public, has ushered in a new era of acquiring medical information. This shift has sparked conflicting opinions within the healthcare community, which must navigate the delicate balance between acknowledging the constructive role that GeAI can play in enhancing health literacy and recognizing the potential for misuse and misinformation [4].

Materials And Methods
In this nonhuman-subject qualitative study, the accuracy and reproducibility of responses offered by GeAI on hypertension questions were assessed. ChatGPT, developed by OpenAI and released in 2022, was selected as the GeAI of choice due to its popularity and accessibility for public use. Three internal medicine residents enrolled in an ACGME-accredited program gathered questions commonly asked by patients on the subject of hypertension, including the risk factors associated with high blood pressure, the management of hypertension, and the complications of uncontrolled blood pressure. To ensure the validity and relevance of the selected questions, a board-certified internal medicine physician with over eight years of experience in diagnosing and managing hypertension evaluated each question, eliminated questions deemed less relevant, and finalized the question list used for this study. Each question then underwent three separate runs through ChatGPT, resulting in three independent sets of responses to each question. Responses generated by ChatGPT were first assessed for reproducibility. For a response to be deemed reproducible, all three answers to the same question needed to convey the same message. Any variance in the message resulted in the classification of that response as non-reproducible. Following the assessment of reproducibility, the study evaluated response accuracy. To evaluate the accuracy of a response, it was checked against recommended guidelines from reputable sources such as the American Heart Association and the National Institutes of Health. The majority response to each question was first selected: an answer repeated at least two out of three times was taken as the majority response and compared against the recommended guidelines. Responses were categorized as appropriate if they aligned with the guidelines or inappropriate if they deviated. As a final step, a board-certified internal medicine physician conducted a final review of each question and answer and evaluated the appropriateness of responses from a clinical standpoint. For the assessment of response accuracy, therefore, each response had two evaluations: one based on the recommended guidelines and one based on the clinical opinion of the board-certified physician. An overall accuracy score was calculated by averaging the accuracy obtained from comparing responses against guidelines and the physician's evaluation of the responses. Descriptive statistics were used to examine the accuracy and reproducibility of the responses generated by ChatGPT.
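The scoring procedure described above can be summarized in a minimal sketch. The data structures, labels, and function names below are illustrative assumptions, not the study's actual tooling; in particular, the manual step of coding free-text ChatGPT answers into equivalence-class labels is presumed to have already been done by the reviewers.

```python
from collections import Counter

def majority_response(answers):
    """Return the answer repeated at least two out of three times, or None.

    `answers` is a list of three equivalence-class labels, one per ChatGPT run.
    """
    label, count = Counter(answers).most_common(1)[0]
    return label if count >= 2 else None

def score_study(coded_runs, guideline_ok, physician_ok):
    """Compute (reproducibility, overall accuracy) per the study's definitions.

    coded_runs:   {question: [label_run1, label_run2, label_run3]}
    guideline_ok: {question: True/False} appropriateness vs. guidelines
    physician_ok: {question: True/False} appropriateness per physician review
    """
    n = len(coded_runs)
    # Reproducible = all three runs convey the same message.
    reproducibility = sum(len(set(runs)) == 1 for runs in coded_runs.values()) / n
    # Overall accuracy = mean of the guideline-based and physician-based rates.
    guideline_rate = sum(guideline_ok.values()) / n
    physician_rate = sum(physician_ok.values()) / n
    return reproducibility, (guideline_rate + physician_rate) / 2
```

With the reported per-question rates of 0.93 (guideline comparison) and 0.92 (physician review), the averaging step yields the overall accuracy of 92.5% given in the Results.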

Results
Initially, 130 frequently asked questions on the subject of hypertension were gathered by three internal medicine residents enrolled in an ACGME-accredited program. These questions were then reviewed for relevance by a board-certified internal medicine attending physician with over eight years of experience in diagnosing and managing hypertension, resulting in a final set of 100 questions selected for this study (Table 1). ChatGPT responses were assessed against recommended guidelines and exhibited robust accuracy, with an appropriateness rate of 93%, while 7% of responses were deemed inappropriate. A parallel evaluation by a board-certified physician revealed a similar degree of appropriateness, with 92% of responses classified as appropriate and 8% as inappropriate (Figure 1). Averaging these two assessments yielded an overall accuracy of 92.5%. In the reproducibility evaluation, each question underwent three ChatGPT runs, generating three sets of responses per question and totaling 300 responses. Analysis of these responses revealed irreproducible answers for 7 of the 100 questions: 93% of the questions had reproducible responses, while for the remaining 7% the responses could not be consistently replicated across multiple ChatGPT runs (Figure 2).

Discussion
This study aligns with the continuous endeavor to assess the precision of health-related content offered by GeAI [4]. These efforts are imperative; identifying reliable online information sources can advance health literacy while limiting medical misinformation. The significance of misinformation in healthcare was demonstrated by Bremner et al. in a study focused on psychological trauma, which found that only 42% of the information patients obtained through online search engines was accurate [28]. In the medical field in particular, even slight misinformation can contribute to misconceptions and inadvertently influence medical decisions [29]. Because GeAI is novel as a patient education tool, direct comparisons in the literature are scarce. Nonetheless, this study's findings align with other studies that explored the potential role GeAI can play in advancing health literacy [4]. While the information provided by GeAI is compatible with current practices, it lacks the human touch. Though this does not affect the quality of the information provided, it could affect patients' utilization of and reliance on GeAI. This is particularly true when patients are at their most vulnerable, for example, when learning about a chronic medical condition such as hypertension; indeed, up to 60% of Americans do not feel comfortable relying on AI for health-related information [30]. There were limitations to the current study. First, the evaluation was based on a predefined set of questions, which may not cover all hypertension-related topics. Second, the questions posed to ChatGPT were exclusively in English, and employing a different language could yield different results. Third, the study focused on qualitative assessment; user interactions and context, which may affect the quality of responses, were not considered. Finally, the ChatGPT responses were assessed by staff who are trained clinicians; further studies, perhaps including patient subjects, are needed to determine how patients receive the information offered by GeAI, including but not limited to the presence of jargon and the clarity of the language.

Conclusions
This study emphasizes the potential of AI-generated information to provide appropriate and readily accessible medical knowledge, serving as a potential patient education tool in the near future. However, overall accuracy and the lack of complete consistency remain areas where GeAI, such as ChatGPT, needs improvement before being readily used for a purpose as sensitive as health education. GeAI, as an accessible resource for health literacy, may eventually facilitate greater patient involvement in care and ultimately improve compliance and long-term outcomes in chronic disorders. Continued research on GeAI in healthcare is nonetheless critical to further validate the accuracy and reproducibility of AI-generated health information and to establish the generalizability of GeAI's role in health literacy. Lastly, considering the sensitivity of health-related information and the importance of accuracy, it is recommended that GeAI be complemented by human supervision and oversight to ensure that accurate and reliable information is offered.

FIGURE 1: Appropriateness of responses provided by ChatGPT to hypertension-related queries

FIGURE 2: Reproducibility of responses provided by ChatGPT to hypertension-related queries

TABLE 1: Evaluation of ChatGPT responses to questions about hypertension
BP: Blood pressure; ER: Emergency room; EKG: Electrocardiogram; CT scan: Computed tomography scan.