Unlocking Health Literacy: The Ultimate Guide to Hypertension Education From ChatGPT Versus Google Gemini

Background Google Gemini (Google, Mountain View, CA) represents the latest advances in the realm of artificial intelligence (AI) and has garnered attention because its capabilities are similar to those of the increasingly popular ChatGPT (OpenAI, San Francisco, CA). Accurate dissemination of information on common conditions such as hypertension is critical for patient comprehension and management. Despite the ubiquity of AI, comparisons between ChatGPT and Gemini remain unexplored. Methods ChatGPT and Gemini were asked 52 questions derived from the American College of Cardiology’s (ACC) frequently asked questions on hypertension, following a specified prompt. Prompts included: no prompting (Form 1), patient-friendly prompting (Form 2), physician-level prompting (Form 3), and prompting for statistics/references (Form 4). Responses were scored as incorrect, partially correct, or correct. Flesch-Kincaid (FK) grade level and word count were recorded. Results Across all forms, scoring frequencies were as follows: 23 (5.5%) incorrect, 162 (38.9%) partially correct, and 231 (55.5%) correct. ChatGPT showed higher rates of partially correct answers than Gemini (p = 0.0346). Physician-level prompts resulted in a higher word count across both platforms (p < 0.001). ChatGPT showed a higher FK grade level (p = 0.033) in physician-friendly prompting. Gemini exhibited a significantly higher mean word count (p < 0.001); however, ChatGPT had a higher FK grade level across all forms (p > 0.001). Conclusion To our knowledge, this study is the first to compare cardiology-related responses from ChatGPT and Gemini, two of the most popular AI chatbots. The grade level for most responses was collegiate, which is above the National Institutes of Health (NIH) recommendation but on par with most online medical information. Both chatbots responded with a high degree of accuracy, and inaccuracies were rare. Therefore, it is reasonable for cardiologists to suggest either chatbot as a source of supplementary education.


Introduction
Hypertension is the most common cardiovascular disease and the leading cause of cardiovascular mortality worldwide [1,2]. Over half of the world's population has been diagnosed with hypertension, and a much greater percentage is suspected to have the disease without yet having been diagnosed [3]. Furthermore, hypertension is a systemic disease that contributes to adverse effects in multiple organ systems and in overall lifestyle. Patient education plays an especially pivotal role in managing hypertension, given that treatment is inherently multifactorial, involving medications, diet, exercise, life stressors, and other factors [4].
According to the National Cancer Institute's (NCI) Health Information National Trends Survey (HINTS), 84.6% of the US adult population used the Internet to look for health or medical information in 2022, a number expected to rise in the coming decade [5]. The prevalence of artificial intelligence (AI) chatbot use for medical education is poorly defined in the literature; however, the use of AI chatbots has continued to rise steadily even after its initial exponential growth.
ChatGPT, an AI chatbot released by OpenAI in November 2022, has quickly gained widespread attention. From its inception, the site took 5 days to reach 1 million users, and within two months it had surpassed 100 million users [6]. In response to the rise of AI chatbots, Google released Gemini, a large language model similar to that of ChatGPT, on May 10, 2023. Within a few months, many speculated that Gemini would be the primary competitor to ChatGPT [7,8]. Given AI's burgeoning popularity and its potential for disseminating health information, evaluating the quality and accuracy of ChatGPT and Gemini is of paramount importance. We aimed to critically assess ChatGPT's and Gemini's responses to queries about one of the world's most common diseases, hypertension. This study focuses on the accuracy, comprehensibility, and appropriateness of using AI responses for patient education. The goal of the study is to guide cardiologists and healthcare professionals in understanding the benefits and potential limitations of AI for patient education.

Materials And Methods
OpenAI's ChatGPT and Google's Gemini chatbots were prompted in four different ways and, under each prompt, asked 52 questions from the American College of Cardiology's (ACC) 2017 frequently asked questions on hypertension [9]. ChatGPT version 3.5 and Gemini version 1.0 (formerly known as Google Bard) were used for all responses. All questions were asked on September 6 and 7, 2023.
Prompts were as follows: no prompt (Form 1), patient-friendly prompt (Form 2), physician-level prompt (Form 3), and prompting for statistics/references (Form 4). The prompts used are in Table 1. Responses were reviewed and scored as incorrect, partially correct, correct, or correct with references (perfect). A response was scored as incorrect if it included any incorrect information or less than 50% of the information in the ACC answer. Partially correct responses had no incorrect information and included 50% to 99% of the information in the ACC answer. Correct responses included all of the information in the ACC answer, with any extra information also being correct. Perfect responses met the criteria for correct responses and also included references and/or statistics. Proportions of responses at differing scores were compared using chi-square analysis. Tests were performed with an alpha set at 0.05.
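The scoring rubric above can be sketched as a small decision function. This is an illustrative sketch only, not the reviewers' actual procedure: the function name `score_response` and its parameters are hypothetical, and it assumes that the share of ACC information covered by a response can be expressed as a fraction between 0 and 1, which the study does not describe in detail.

```python
def score_response(has_incorrect_info: bool,
                   coverage: float,
                   has_references: bool = False) -> str:
    """Classify one chatbot response using the four-level rubric.

    coverage is the (assumed) fraction of the ACC answer's information
    that appears in the response, from 0.0 to 1.0.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0.0 and 1.0")

    # Any incorrect information, or less than 50% coverage, is "incorrect".
    if has_incorrect_info or coverage < 0.5:
        return "incorrect"

    # No incorrect information and 50% to 99% coverage.
    if coverage < 1.0:
        return "partially correct"

    # Full coverage; references and/or statistics upgrade it to "perfect".
    return "perfect" if has_references else "correct"


if __name__ == "__main__":
    print(score_response(False, 0.75))        # partially correct
    print(score_response(True, 1.0))          # incorrect
    print(score_response(False, 1.0, True))   # perfect
```

Checking for incorrect information first mirrors the rubric, in which a single error outweighs any degree of completeness.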
For each response, the numbers of words, sentences, and syllables were collected to compute a Flesch-Kincaid (FK) grade level. This metric estimates the United States educational grade level required to understand the response, with higher grade levels indicating more complex language usage, and is defined as:
\[ \text{FK grade level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59 \]
Values typically range from 0 to 20, with the numerical value corresponding to the reading grade level (e.g., 12 would equal grade level 12). Differences in FK grade level between forms were assessed using a one-way ANOVA. Additionally, response length was recorded and analyzed with a one-way ANOVA. Statistical significance was set at p < 0.05 for all analyses. Statistics were run using Prism 10.0.2.
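As a worked illustration, the FK grade level can be computed directly from the three counts using the standard Flesch-Kincaid formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is not the study's own tooling: the function name `fk_grade_level` and the example counts are hypothetical, and the counts are assumed to have been obtained beforehand.

```python
def fk_grade_level(words: int, sentences: int, syllables: int) -> float:
    """Return the Flesch-Kincaid grade level for one response.

    Uses the standard coefficients: 0.39 for average sentence length,
    11.8 for average syllables per word, and a constant of -15.59.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    if syllables < 0:
        raise ValueError("syllables must not be negative")

    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a single response (not study data).
    grade = fk_grade_level(words=100, sentences=5, syllables=150)
    print(f"FK grade level: {grade:.2f}")
```

Because both ratios enter the formula linearly, longer sentences and more syllables per word each raise the estimated grade level.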

Discussion
From the inception of Google Gemini, comparisons to ChatGPT were made, and there has been much speculation about which AI chatbot would be more accurate [10,11]. To date, few studies have objectively compared the accuracy of responses, with even fewer studies focusing on the medical field [12]. To our knowledge, this is the first study to compare the performance of two of the most popular AI chatbots, ChatGPT and Gemini, on cardiology-related topics.
Overall, both ChatGPT and Gemini provided accurate, but often only partially complete, responses to the ACC's frequently asked questions about hypertension. Even though only about half of the answers were deemed entirely "correct," we view this result positively. The AI chatbots' replies often contained more than 50% of the information, typically lacking just one element from the ACC's answers. Responses that would signify a large deficiency, meaning incorrect or incomplete (greater than 50% missing) information, made up only 5.5% of responses. This result is on par with many other studies that examined AI chatbot responses, which generally reported 1% to 5% incorrect responses [13][14][15][16]. ChatGPT gave more partially correct answers than Gemini, while Gemini exhibited a non-significant trend toward more correct responses than ChatGPT. This could be in part because ChatGPT's responses were, on average, 23 words shorter than Gemini's, leaving less room for information.
ChatGPT had a higher mean reading grade level than Gemini, with an FK score of 15.92 versus 13.50, respectively. Thus, although ChatGPT's answers were more often only partially correct, they were more succinct and written at a higher reading grade level. The National Institutes of Health (NIH) recommends that patient education material be written at an 8th grade reading level, which is lower than both ChatGPT's approximate grade level (grade 15, collegiate level) and Gemini's approximate grade level (grade 13, collegiate level) [17]. However, ChatGPT's and Gemini's grade levels are quite similar to those of many online sources of cardiology material. Academic websites pertaining to atrial fibrillation had a mean grade level of 13.05, while non-academic sites had a mean of 11.64 [18]. This finding was mirrored in other medical specialties' online reading material [19][20][21][22]. Therefore, while the two chatbots responded above the NIH-recommended grade level, their responses were on par with most online resources.
ChatGPT consistently had a lower average word count in its responses compared to Gemini, as noted earlier. Similar trends have been observed in other studies comparing the two chatbots' performance on health literacy, hinting that Gemini may naturally provide lengthier responses [23]. Notably, the word count for both chatbots remained fairly consistent across various query types with the exception of Form 4, which involves requesting statistics or research. This variation is likely due to the nature of the prompt, as requesting data and references typically necessitates the inclusion of more detailed information, such as citations, statistical figures, or mathematical equations.
While this study assesses responses objectively, it has limitations, including the assumption that patient inquiries are accurate. We did not assess how the chatbots respond to questions containing false information. Also, patients have myriad ways to ask questions, potentially leading to responses not reviewed in this study. Future research should broaden the scope of inquiries and analyze the chatbots' handling of erroneous inputs.

Conclusions
The analysis shows that AI chatbots like ChatGPT and Gemini can be valuable tools for augmenting patient education on topics such as hypertension. Both have demonstrated a strong ability to provide accurate answers. They might not include every nuance that the ACC offers, but they generally convey the necessary information with few errors. Therefore, it is sensible for medical professionals to suggest using ChatGPT or Gemini as educational resources. Nevertheless, one should recognize the small possibility of encountering inaccuracies.

FIGURE 1: Correct, Partially Correct, and Incorrect Answers in ChatGPT and Gemini Responses. Each bar shows the total number of correct, partially correct, or incorrect answers across all forms. Abbreviations: ns = no significance. * = p < 0.05.