Assessing the Capability of ChatGPT in Answering First- and Second-Order Knowledge Questions on Microbiology as per Competency-Based Medical Education Curriculum

Background and objective ChatGPT is an artificial intelligence (AI) language model that has been trained to process and respond to questions across a wide range of topics. It is also capable of solving problems in medical educational topics. However, the capability of ChatGPT to accurately answer first- and second-order knowledge questions in the field of microbiology has not been explored so far. Hence, in this study, we aimed to analyze the capability of ChatGPT in answering first- and second-order questions on the subject of microbiology. Materials and methods Based on the competency-based medical education (CBME) curriculum of the subject of microbiology, we prepared a set of first-order and second-order questions. For the total of eight modules in the CBME curriculum for microbiology, we prepared six first-order and six second-order knowledge questions according to the National Medical Commission-recommended CBME curriculum, amounting to a total of (8 x 12) 96 questions. The questions were checked for content validity by three expert microbiologists. These questions were used to converse with ChatGPT by a single user and responses were recorded for further analysis. The answers were scored by three microbiologists on a rating scale of 0-5. The average of three scores was taken as the final score for analysis. As the data were not normally distributed, we used a non-parametric statistical test. The overall scores were tested by a one-sample median test with hypothetical values of 4 and 5. The scores of answers to first-order and second-order questions were compared by the Mann-Whitney U test. Module-wise responses were tested by the Kruskall-Wallis test followed by the post hoc test for pairwise comparisons. Results The overall score of 96 answers was 4.04 ±0.37 (median: 4.17, Q1-Q3: 3.88-4.33) with the mean score of answers to first-order knowledge questions being 4.07 ±0.32 (median: 4.17, Q1-Q3: 4-4.33) and that of answers to second-order knowledge questions being 3.99 ±0.43 (median: 4, Q1-Q3: 3.67-4.33) (Mann-Whitney p=0.4). The score was significantly below the score of 5 (one-sample median test p<0.0001) and similar to 4 (one-sample median test p=0.09). Overall, there was a variation in median scores obtained in eight categories of topics in microbiology, indicating inconsistent performance in different topics. Conclusion The results of the study indicate that ChatGPT is capable of answering both first- and second-order knowledge questions related to the subject of microbiology. The model achieved an accuracy of approximately 80% and there was no difference between the model's capability of answering first-order questions and second-order knowledge questions. The findings of this study suggest that ChatGPT has the potential to be an effective tool for automated question-answering in the field of microbiology. However, continued improvement in the training and development of language models is necessary to enhance their performance and make them suitable for academic use.


Introduction
In recent years, natural language processing (NLP) and machine learning (ML) techniques have 1 2 3 2 2 4 1 revolutionized the field of question-answering systems [1]. ChatGPT is one such artificial intelligence (AI) language model that has been trained to understand the nuances of natural language and provide accurate answers to a wide range of questions [2,3,4]. While ChatGPT has shown promising results in solving problems in various fields, its ability to accurately answer first-and second-order knowledge questions in microbiology has not been extensively evaluated. First-order questions are straightforward questions that ask for factual information or seek a direct answer. Second-order questions, on the other hand, are more complex and require higher-level thinking and interpretation. These questions demand analysis or evaluation of information or an opinion or prediction based on evidence [5].
Microbiology is a fundamental subject in medical education and it deals with the study of microorganisms, including bacteria, viruses, fungi, and parasites. Medical students need to have a sound understanding of microbiology as it is directly related to the diagnosis and treatment of infectious diseases [6]. Competencybased medical education (CBME) is an approach to medical education that focuses on the acquisition of specific competencies rather than the general accumulation of knowledge. The CBME curriculum for microbiology includes a total of eight modules that aim to assess the student's understanding of the subject [7].
Automated question-answering systems have the potential to transform medical education by providing instant and accurate responses to complex questions [8]. The use of ChatGPT as an automated questionanswering system in the field of microbiology could aid in the development of effective educational tools that can help medical students learn and understand the subject better. However, before implementing ChatGPT as an educational tool, it is crucial to evaluate its ability to accurately answer first-and secondorder knowledge questions related to microbiology based on the CBME curriculum.
In light of this, in this study, we aimed to evaluate the performance of ChatGPT in answering first-and second-order knowledge questions related to microbiology. This could aid in the development of effective educational tools that can help medical students master the subject of microbiology. The results of this study could also shed light on the strengths and weaknesses of ChatGPT as an automated questionanswering system with regard to the field of microbiology and pave the way for further research in this field.

Study design and setting
This cross-sectional observational study was conducted by the department of Microbiology, Pathology, and Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India in February 2023. We used an online AI language model -ChatGPT -to converse with and the responses were collected for analysis. The program is free for registered users at this point in time.

Ethical consideration
This study does not involve any human or animal research participants. Responses generated by the language model were not shared in this paper. Hence, according to current guidelines in the country, this study does not require any ethical clearance.

Question preparation
According to the CBME curriculum for the subject of microbiology, we prepared a set of first-and secondorder questions. The researchers prepared a total of 96 questions, with six first-order and six second-order knowledge questions for each of the eight modules, as shown in Table 1 [9]. All questions were reviewed for content validity by an expert microbiologist.

Data collection
The questions were used to converse with ChatGPT by a single user. The responses were recorded for further analysis. We did not use the "regenerate response" and took the first response to be the final answer.

Scoring of answers
The responses generated by ChatGPT were scored by three microbiologists. The scoring was done on a rating scale of 0-5, where 0 indicates an incorrect answer and 5 indicates a fully correct answer. The average of the three scores was taken as the final score for each response. The brief study protocol is depicted in Figure 1. Having multiple evaluators to assess an answer can help reduce biases and increase the accuracy and reliability of the evaluation. Different evaluators may have different biases based on their personal experiences, perspectives, or preferences. Having multiple evaluators can help mitigate these biases. Hence, we employed three evaluators to reduce biases.

Statistical analysis
As the data were not normally distributed, non-parametric statistical tests were used for analysis. The overall scores were tested using a one-sample median test with hypothetical values of 4 and 5. The scores of the first-order and second-order knowledge questions were compared using the Mann-Whitney U test. The module-wise responses were tested using the Kruskal-Wallis test, followed by the post hoc test for pairwise comparisons [10]. All the tests were conducted in GraphPad Prism 7 (Dotmatics, Boston, MA). A p-value <0.05 was considered statistically significant.

Results
The overall mean score of 96 answers was 4.04 ±0.37 (median: 4.17, Q1-Q3: 3.88-4.33). The frequency distribution of scores of 96 answers to both first-order and second-order knowledge questions is shown in Figure 2.

FIGURE 3: Comparison between scores of answers to first-and secondorder knowledge questions
The score was significantly below the score of 5 (one-sample median test p<0.0001) and similar to 4 (onesample median test p=0.09), indicating an accuracy rate of 80%.
The highest score was obtained in the central nervous system infections category (4.25±0.27) and the lowest score was obtained in the gastrointestinal and hepatobiliary systems category (3.75±0.36). The competency topic-wise distribution of scores is shown in terms of descriptive statistics in Table 2.  When we compared the scores among various categories according to CBME, we found that there was an overall difference (p=0.02) among the scores, indicating that the application's performance may vary in different topics in microbiology. The comparison is shown in Figure 4.

Discussion
Based on our findings, ChatGPT is capable of providing answers to both first-and second-order knowledge questions with approximately 80% accuracy. This corroborates the findings of Sinha et al [4]. There was no difference in the application's capability to solve both types of questions. Hence, this program can be used by medical students in studying microbiology. In many institutions, self-directed learning (SDL) is practiced where a student collects relevant information from various sources to study a topic. SDL is also an important component of the competency-based curriculum recently introduced in India by the National Medical Commission of India in 2019. This SDL learning approach places medical professionals in charge of their own learning, allowing them to tailor their learning experiences to their individual needs and take ownership of their learning outcomes. By promoting a learner-centered approach, SDL empowers medical professionals to identify their strengths and weaknesses, set goals for improvement, and apply new knowledge and skills to their clinical practice [11]. The CBME curriculum has assigned hours of study for SDL in every subject of medical science. ChatGPT can assist medical students in self-directed learning by providing access to relevant information and answering questions. Medical students can ask questions about specific diseases, treatments, or procedures. The major advantage of the model is personalized learning it provides for the individual needs of medical students. ChatGPT can provide answers with the aid of literature resources to help students understand the concepts and cross-check them. An example is shown in Figure 5. Across different competency modules, the score was significantly different. The highest score was found for questions related to central nervous system infections and the lowest score was obtained in the gastrointestinal and hepatobiliary systems. The underlying cause may be attributed to the limitation of training the AI system. Any deviation from the uniformity of difficulty may be another reason for the different levels of performance of the language model. Hence, there is scope for further improvement of the model to make it more suitable for academic usage. The applicability of language models like ChatGPT in academic usage is wide-ranging and constantly expanding, and they can be used to enhance various aspects of academic work, including research, writing, teaching, and learning [12,13].
This study has certain limitations. We only evaluated the capability of ChatGPT in answering first-and second-order knowledge questions on the subject of microbiology, and the findings may not be generalizable to other subjects or domains. Secondly, the study only used a single user to converse with ChatGPT. Conversations with another user at different time points may generate a different response. Furthermore, paraphrasing the questions showed variations in the answers. The scoring of the responses was subjective; hence, evaluation bias may be present even after taking the average of three scores from three evaluators.

Conclusions
The study showed that ChatGPT is a potential tool for answering both first-and second-order knowledge questions with equal vigor in the field of microbiology. The accuracy rate of approximately 80% indicates that ChatGPT has the capability to be a valuable automated question-answering system for students.
Although the model showed high accuracy, there is still room for improvement, and it is necessary to finetune the model to enhance its performance to achieve nearly 100% accuracy. In the future, with further improvement, ChatGPT-like programs may be used for academic purposes.

Additional Information Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.