Analyzing the Performance of ChatGPT About Osteoporosis

Introduction: This study evaluates the knowledge of ChatGPT about osteoporosis. Methods: Osteoporosis-related frequently asked questions (FAQs) were created by examining the websites frequently visited by patients, the official websites of hospitals, and social media. Questions based on these scientific data were prepared in accordance with the National Osteoporosis Guideline Group guidelines. The rater scored all ChatGPT answers between 1 and 4 (1 indicated that the information was completely correct; 2, that the information was correct but insufficient; 3, that although some of the information was correct, the answer contained incorrect information; and 4, that the answer consisted of completely incorrect information). The reproducibility of ChatGPT responses on osteoporosis was assessed by asking each question twice. An answer was considered repeatable if it received the same score both times. Results: ChatGPT responded to 72 FAQs with an accuracy rate of 80.6%. The accuracy of ChatGPT's answers about osteoporosis was highest in the prevention category (91.7%) and in the general knowledge category (85.8%). Only 19 of the 31 (61.3%) questions prepared according to the National Osteoporosis Guideline Group guidelines were answered correctly by ChatGPT, and two answers (6.4%) were categorized as grade 4. The reproducibility rate of ChatGPT answers to the 72 FAQs was 86.1%, and the reproducibility rate of answers to questions based on the National Osteoporosis Guideline Group guidelines was 83.9%. Conclusion: The present study's outcomes showed for the first time that ChatGPT provided adequate answers to more than 80% of FAQs about osteoporosis. However, the accuracy of ChatGPT's answers to inquiries based on the National Osteoporosis Guideline Group guidelines decreased to 61.3%.


Introduction
Osteoporosis is characterized by reduced bone mass, bone tissue deterioration, and pathological changes in bone microarchitecture, which result in a decrease in bone strength and an increase in the risk of bone fractures [1]. Osteoporosis has an economic, medical, and social burden, and previous reports have emphasized the relation between osteoporosis and prolonged and sedentary lifestyles, economic deficiencies, alcohol use, endocrine diseases, and malignant disorders [2]. Salari et al. reviewed 70 studies including 800,457 women and 453,964 men, and the study findings stated that 11.7% of men and 23.1% of women had osteoporosis [3]. On the other hand, new-generation treatment modalities have been developed for osteoporosis, and numerous studies about osteoporosis have demonstrated that patients' awareness and knowledge about osteoporosis are crucial in its prevention and treatment [2,4]. In recent years, web sources have had an indispensable role in raising awareness of public health, and a large number of patients get information about their health problems from online sources including Instagram, Twitter, YouTube, and the generative pre-trained transformer (ChatGPT) [5].
ChatGPT, which was created by OpenAI, is an artificial intelligence application based on natural language processing. While ChatGPT can be used in all areas of life, recent studies have investigated the effectiveness and reliability of using ChatGPT in the field of health [6]. Caglar et al. analyzed the precision, accuracy, and consistency of ChatGPT's responses to questions in pediatric urology, and the authors obtained satisfactory answers to questions related to pediatric urology [7]. Also, Gilson et al. analyzed the performance of ChatGPT on medical school exams, and ChatGPT gave correct answers to 60% of the questions [8]. Moreover, Rao et al. stated that ChatGPT had a satisfactory accuracy rate in the evaluation of radiological findings [9].
Although previous studies have investigated the accuracy of ChatGPT answers in various disorders, to our knowledge, no research has analyzed the precision and consistency of ChatGPT's responses about osteoporosis. The aim of this study was to evaluate the knowledge of ChatGPT about osteoporosis.

Materials And Methods
Osteoporosis-related frequently asked questions (FAQs) were created by examining the websites frequently visited by patients and the official websites of hospitals. Websites affiliated with authoritative and reputable organizations were given preference. These organizations include government health agencies, leading medical institutions, academic research centers, and recognized osteoporosis advocacy groups. In addition, the questions asked by patients on social media platforms such as YouTube, Facebook, and Instagram, as well as patient comments on these platforms, were used while preparing the question list (Table 3, Appendix). Scientific data and questions based on these data were prepared in accordance with the National Osteoporosis Guideline Group (NOGG) guidelines, and all these questions were collected in a separate questionnaire (Table 4, Appendix). Unrealistic questions, repetitive questions, questions containing advertisements, grammatically incorrect questions, and questions requiring personal answers were excluded from the study. All questions in the FAQ form were classified as general information (n = 14), risk factors (n = 10), nonpharmacological treatments (n = 12), pharmacological treatments (n = 24), or prevention (n = 12). The questionnaire based on the NOGG guidelines included 31 questions.
The answers were evaluated by a physiotherapist with eight years of clinical and academic experience in osteoporosis. The rater scored all ChatGPT answers between 1 and 4 (1 indicated that the information was completely correct; 2, that the information was correct but insufficient; 3, that although some of the information was correct, the answer contained incorrect information; and 4, that the answer consisted of completely incorrect information). An answer was defined as completely correct when the experienced physiotherapist considered that no additional contribution was needed to ChatGPT's response to a patient question. For questions related to the NOGG guidelines, guideline information was taken into account when evaluating the accuracy of ChatGPT's answers.
The reproducibility of ChatGPT responses on osteoporosis was assessed by asking each question twice in written form on different days. An answer was considered repeatable if it received the same score both times. Two ChatGPT answers in different score categories, or with different levels of detail, were evaluated negatively in terms of reproducibility. Ethics committee approval was not required as patient data were not used in the present study.
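The repeatability criterion described above (the same grade on both administrations of a question) can be sketched as a simple tally. The function name and the score pairs below are hypothetical illustrations, not the study's actual data.

```python
# Minimal sketch of the repeatability check: each question is asked twice,
# and an answer pair counts as reproducible only if both administrations
# receive the same grade (1-4). Illustrative data only, not the study's scores.

def reproducibility_rate(score_pairs):
    """score_pairs: list of (first_grade, second_grade) tuples, grades 1-4."""
    reproducible = sum(1 for first, second in score_pairs if first == second)
    return 100 * reproducible / len(score_pairs)

# Hypothetical example: 4 of 5 question pairs received identical grades.
pairs = [(1, 1), (1, 1), (2, 2), (1, 3), (3, 3)]
print(round(reproducibility_rate(pairs), 1))  # -> 80.0
```

Under this rule, a pair such as (1, 3) counts against reproducibility even though one of the two answers was completely correct, matching the study's strict definition.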

Statistical analysis
Statistical analysis was performed using Excel Version 16 (Microsoft Corporation, USA). The questions were evaluated separately as FAQs and as questions prepared according to the NOGG guidelines. The scores given to the answers are expressed as percentages.
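As a minimal sketch of this percentage summary, the grade distribution reported in the Results for the 72 FAQs (58 grade-1, eight grade-2, six grade-3, and no grade-4 answers) can be tallied as follows; the Counter-based script is an illustrative assumption, not the authors' actual spreadsheet.

```python
from collections import Counter

# Grade distribution of ChatGPT's answers to the 72 FAQs, as reported in
# the Results section: 58 grade-1, 8 grade-2, 6 grade-3, 0 grade-4 answers.
grades = [1] * 58 + [2] * 8 + [3] * 6

counts = Counter(grades)
total = len(grades)
for grade in (1, 2, 3, 4):
    pct = 100 * counts.get(grade, 0) / total
    print(f"grade {grade}: {pct:.1f}%")
# grade 1: 80.6%, grade 2: 11.1%, grade 3: 8.3%, grade 4: 0.0%
```

The grade-1 share reproduces the 80.6% accuracy rate reported in the abstract (58/72).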

Results
The reasons why some questions about osteoporosis were not included in the study and the flow chart of the study are presented in Figure 1. A total of 139 questions were evaluated; 67 of them did not meet the study criteria (repetitive questions, n = 18; grammatically inadequate questions, n = 15; questions with subjective answers, n = 20; questions related to personal health, n = 14), and 72 questions were included in the study.

FIGURE 1: Flowchart of the study
ChatGPT responded to 72 FAQs with an accuracy rate of 80.6% (58 questions). In total, eight (11.1%) answers given by ChatGPT were graded as 2 and six (8.3%) as grade 3. None of the responses to the 72 FAQs was evaluated as category 4. The accuracy of ChatGPT's answers about osteoporosis was highest in the prevention category (91.7%) and in the general knowledge category (85.8%). On the other hand, the lowest accuracy rate (66.6%) was obtained in the responses related to nonpharmacological treatments. In addition, only 19 of the 31 (61.3%) questions prepared according to the NOGG guidelines were answered correctly by ChatGPT, and two answers (6.4%) were categorized as grade 4. The grading of responses to FAQs about osteoporosis and the grading of answers to questions based on the NOGG guidelines are documented in Table 1 and presented in Figure 2.

Discussion
Artificial intelligence technology is rapidly entering our daily lives, and the opportunities brought by artificial intelligence are used increasingly widely in the field of health. The correct integration of artificial intelligence in healthcare may enable the widespread and effective application of screening tests, earlier diagnosis of diseases, and better patient compliance with follow-up schedules. Despite its advantages, there are many reservations about the use of artificial intelligence in the healthcare field [6-8].
Thus, this study aimed to clarify the knowledge of ChatGPT about osteoporosis, a condition that affects almost one in six of the world's population. The present study revealed that ChatGPT gave completely correct answers to 80.6% of FAQs about osteoporosis. Additionally, the accuracy rate of ChatGPT responses was highest for questions about general information on osteoporosis and questions about the prevention of osteoporosis. On the other hand, ChatGPT answered completely correctly only 61.3% of the questions prepared according to the NOGG guidelines. Lastly, the reproducibility rates of ChatGPT answers were over 80% for both the FAQs and the questions based on the NOGG guidelines.
The accuracy and reliability of web sources about health topics are debatable, and numerous studies have demonstrated that online content contains false and incomplete information. Alsyouf et al. analyzed the content of web sources, including Facebook, Twitter, and Instagram, about prostate cancer, and the authors stated that inaccurate information about prostate cancer in web resources was almost 30 times more common than correct information [10]. In contrast, Caglar et al. analyzed the performance of ChatGPT's answers about pediatric urology and concluded that ChatGPT gave satisfactory answers to questions related to pediatric urology [7]. In another study, Bulck and Moons evaluated the responses of ChatGPT about cardiovascular diseases, and the authors stated that 17 out of 20 answers contained sufficient information for patients [11]. In the present study, the grade of the answers given by ChatGPT to inquiries about osteoporosis was analyzed for the first time, and the findings demonstrated that almost four out of five ChatGPT responses about osteoporosis provided satisfactory and completely correct information. Unlike other web resources, ChatGPT reviews multiple sources when answering questions, and this extensive data usage may be associated with the high accuracy and qualification rates of ChatGPT's answers.
Guidelines are resources that make inferences based on the results of many meta-analyses and studies, have high scientific content, and contain recommendations for daily practice. On the other hand, questions about sources with evidence-based scientific literature may be more complicated to answer [12]. In a study by Antaki et al., ChatGPT took the exam of first-year ophthalmology residents, and 55.8% of the questions were answered completely correctly by ChatGPT, which was similar to the residents' results [13]. Also, Caglar et al. stated that ChatGPT accurately answered 93.6% of the questions based on pediatric urology guidelines [7]. However, the accuracy rate of ChatGPT answers to NOGG guideline-based questions was 61.3% in the present study, which is much lower than the rate for FAQs about osteoporosis.
Although the present study was the first to analyze the accuracy of ChatGPT responses about osteoporosis, it has some limitations. The present study covered a specific period, whereas content about osteoporosis is continuously uploaded to the internet. In addition, the study was conducted only in English; however, English is the most used language in the academic and social areas of the web. ChatGPT's responses about osteoporosis in rarer languages may be the subject of further studies. Additionally, the responses were evaluated by a single clinician. Lastly, we did not evaluate the intelligibility of ChatGPT responses; the intelligibility of ChatGPT for individuals with different educational levels could be analyzed in further research.

Conclusions
The present study's outcomes showed for the first time that ChatGPT provided adequate answers to more than 80% of FAQs about osteoporosis. However, the accuracy of ChatGPT's answers to inquiries based on the NOGG guidelines decreased to 61.3%. Our findings suggest that applying ChatGPT in fields of medicine dealing with osteoporosis may provide patients with better information about osteoporosis.

TABLE 1: Grading the answers to the osteoporosis-related questions according to the question categories
NOGG: National Osteoporosis Guideline Group

The reproducibility rate of ChatGPT answers to the 72 FAQs was 86.1%, and the reproducibility rate of ChatGPT answers to questions based on the NOGG guidelines was 83.9%. The repeatability rate was highest in the answers about general information on osteoporosis (92.8%) and in the responses about risk factors (90.0%) (Table 2).

TABLE 2: The rate of giving the same answers for repetitive questions
NOGG: National Osteoporosis Guideline Group