Critical Appraisal of Anesthesiology Educational Research for 2017

Background: Critical appraisals provide a method for establishing the status of an area of study or evaluating the quality of the literature within it. The purpose of this study was to review and appraise studies on medical education in anesthesiology published in 2017 and to provide summaries of the highest-quality medical education research articles in the field.
Methods: Three Ovid MEDLINE databases, Embase.com, the Education Resources Information Center (ERIC), and PsycINFO were searched, followed by a manual review of articles published in the highest-impact-factor journals in the fields of both anesthesiology and medical education. Abstracts were double-screened, and quantitative articles were subsequently scored by three randomly assigned raters. Qualitative studies were scored by two raters. Separate rubrics were used for quantitative and qualitative studies; both allowed scores ranging from 1 to 25.
Results: A total of 864 unique citations were identified through the search. Of those, 62 articles met the inclusion criteria: 59 quantitative and three qualitative. The 10 highest-scoring papers were reported and summarized.
Discussion: As this is the first article to critically review the literature on education in anesthesiology, we hope that it will serve as the first manuscript in an annual series that helps individuals involved in anesthesiology education understand the highest-quality research in the field. Once this process is repeated, trends can be tracked, and the series can serve as a resource for educators and researchers in anesthesiology for years to come.


Introduction
Medical education research is vital to the field of anesthesiology. To ensure the best educational and clinical outcomes, medical education should be based on the best available evidence so that science can shape the nature of our practice [1]. Yet medical education research is generally underfunded [2], and the studies that do exist are often criticized for lacking rigor [3].
The purpose of this study is to review and appraise all studies on medical education in anesthesiology published in 2017 and to provide summaries of the highest-quality medical education research articles in the field. We assert that a regular, critical review of the literature in anesthesiology education will highlight rigorous research being performed in the field, reinforce best practices, and identify areas needing further investigation. In addition, synthesizing key findings for a time-pressed audience may encourage the application of knowledge gained from these studies in daily practice. This critical review may also reveal gaps in anesthesiology education research and literature.
This study is modeled on a series of critical appraisals conducted over the last 10 years in emergency medicine (EM), whose stated purpose is to provide a "valuable resource for EM educators and researchers invested in the scholarship of teaching" [4]. Similarly, we hope that this will be the first article in a yearly series that allows us to track the state of medical education research in anesthesiology.

Article identification
To identify all articles in anesthesiology education, a medical librarian (MM) searched three Ovid MEDLINE databases (MEDLINE, In-Process & Other Non-Indexed Citations, Epub Ahead of Print), Embase.com, the Education Resources Information Center (ERIC; via FirstSearch), and PsycINFO (via EBSCOhost). These databases were selected to cast a suitably wide net over the health sciences, education, and psychology literature. Each search combined a set of anesthesiology terms with a set of education terms. Appropriate controlled vocabulary terms were used in MEDLINE, Embase, and ERIC and were supplemented with title and abstract keyword searches. The PsycINFO search relied entirely on article titles and abstracts. All searches were initially run on January 30, 2018, and rerun on October 3, 2018, to allow time for studies published in 2017 to be indexed in each database. Animal and non-English studies were excluded from the search results, and all searches were limited to publication year 2017, excluding publications that were only pre-printed in 2017. The Ovid MEDLINE search strategy is shown in Table 1.

Table 1: Ovid MEDLINE search strategy
1  (exp anesthesiology/ or exp anesthetists/ or (anesthe* or anaesthe*).tw.) and (exp education/ or education.sh. or (academic* or class or classes or course* or curricul* or educat* or fellow or fellows or fellowship or instruct* or intern or interns or internship or learn or learner or learning or resident or residents or residenc* or school* or student* or teach* or train* or workshop*).ti.) and english.la. not (exp animals/ not humans/)
2  limit 1 to yr="2017"

Also, in November 2018, we manually reviewed the highest-impact-factor journals in the fields of both anesthesiology and medical education, as identified in Journal Citation Reports (Clarivate Analytics), to ensure that our searches had not missed any relevant articles. For medical education, the list included Academic Medicine (Impact Factor: 4.801), Medical Education (Impact Factor: 3.617), Advances in Health Sciences Education (Impact Factor: 1.46), Medical Teacher (Impact Factor: 2.450), and Simulation in Healthcare (Impact Factor: 2.340). For anesthesiology, the list included Anesthesiology (Impact Factor: 5.163), Anesthesia & Analgesia (Impact Factor: 3.827), and British Journal of Anaesthesia (Impact Factor: 6.499). We also included the Journal of Education in Perioperative Medicine in this manual search because it focuses specifically on medical education in anesthesiology.

Inclusion and exclusion criteria
We followed the inclusion and exclusion criteria used by Heitz et al. in their critical appraisal of emergency medicine education research [3]. We included all levels of learners (students, residents/trainees, and practicing clinicians) and articles applicable to both physicians and nurses in anesthesiology. The authors who applied and verified the inclusion criteria included both experts in anesthesiology education and anesthesiologists. Studies were defined as a) hypothesis-testing investigations, b) evaluations of educational interventions, or c) explorations of educational problems. Publications were excluded if they were: a) not studies (e.g., editorials, commentaries); b) short reports that lacked sufficient information to be evaluated; c) not relevant to anesthesiology learners; d) single-site survey studies; or e) studies that examined outcomes limited to an expected learning effect without a comparison group.

Data collection
To create the list of articles to be included in the critical appraisal, one author (LZ) reviewed all abstracts and applied the inclusion and exclusion criteria. Two additional authors (AG, FC) were each assigned half of the abstracts and independently applied the inclusion and exclusion criteria to them. If the initial reviewer (LZ) and the second reviewer (AG or FC) agreed, their shared decision to include or exclude the article was accepted. Disagreements were reconciled by a third reviewer (AG or FC) who had not initially been assigned the abstract. The list of articles and abstracts was maintained in a Microsoft Excel 2010 database (Microsoft Corporation, Washington, United States).
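The double-screening rule described above can be expressed as a small function. This is a minimal sketch with several assumptions. The `screen` function and the `decide` callback are hypothetical. The sketch also gives AG the first half of the abstracts and FC the second half, but the text says only that each reviewer received half, not how the abstracts were divided.

```python
from typing import Callable

# Hypothetical callback: decide(reviewer, abstract_id) returns "include" or "exclude".
Decide = Callable[[str, str], str]


def screen(abstract_ids: list[str], decide: Decide) -> dict[str, str]:
    """Return the final include/exclude decision for each abstract."""
    half = len(abstract_ids) // 2
    final: dict[str, str] = {}
    for i, aid in enumerate(abstract_ids):
        # Assumption: AG screens the first half, FC the second half.
        second = "AG" if i < half else "FC"
        third = "FC" if second == "AG" else "AG"  # reviewer not initially assigned
        first_vote = decide("LZ", aid)  # LZ screens every abstract
        second_vote = decide(second, aid)
        if first_vote == second_vote:
            final[aid] = first_vote  # agreement: the shared decision stands
        else:
            final[aid] = decide(third, aid)  # disagreement: third reviewer adjudicates
    return final


# Usage with made-up votes for two abstracts.
votes = {
    ("LZ", "a1"): "include", ("AG", "a1"): "include",
    ("LZ", "a2"): "exclude", ("FC", "a2"): "include", ("AG", "a2"): "exclude",
}
print(screen(["a1", "a2"], lambda reviewer, aid: votes[(reviewer, aid)]))
# prints {'a1': 'include', 'a2': 'exclude'}
```

In this example, the third reviewer is consulted only for the abstract on which the first two reviewers disagreed.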

Scoring
The quantitative and qualitative scoring rubrics developed by Heitz et al. were used to score each article. We piloted the quantitative scoring rubric by having all authors review five randomly chosen papers from the list of included abstracts. Through a series of conference calls and emails, the authors worked to create a shared mental model, and notes were added to the scoring rubric to help maintain stable definitions for all criteria.
Each quantitative article that met the inclusion criteria was randomly assigned to three authors, with each author independently scoring 23 articles. All scoring data were captured in Qualtrics (2019; Utah, US) and exported into Excel 2010 for analysis. Mean scores were calculated in Excel 2010, and the articles with the top 10 mean scores were selected. Inter-rater reliability was assessed with an intraclass correlation coefficient using a one-way random-effects model in SPSS 25.0 (IBM Corp., Armonk, NY, US). Because this study did not involve human subjects, Institutional Review Board approval was not sought.
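The one-way random-effects intraclass correlation can be sketched using the usual one-way ANOVA formulas. This is an illustrative sketch, not the SPSS implementation. The function name `icc_oneway` and its input layout (one row per article, one column per rating) are assumptions. It requires the same number of ratings for every article (three in this study) and does not compute the 95% confidence interval. Because each article was scored by its own random set of raters, the one-way model folds rater differences into within-article error, so rater identities are not needed.

```python
from __future__ import annotations

from statistics import fmean


def icc_oneway(ratings: list[list[float]]) -> tuple[float, float]:
    """Return (single-measure ICC(1,1), average-measure ICC(1,k)).

    ratings[i][j] is the j-th rating of article i; every article must have
    the same number k >= 2 of ratings, and there must be at least 2 articles.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 articles, each with the same number (>= 2) of ratings")
    grand_mean = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    # Between-article and within-article sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average
```

The single-measure value estimates the reliability of one rater's score. The average-measure value, which is the form reported in the Results, estimates the reliability of the mean of the k ratings.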
Two authors (AG, LZ) with expertise in qualitative research methods scored all qualitative articles. They discussed each item and reached consensus on its score. Table 2 and Table 3 show the scoring rubrics used for the quantitative and qualitative articles, respectively.

Domain / Item                                        Item score
Introduction (select all that apply)                 3
   Appropriate description of background literature  1
   Clearly frame the problem                         1
Maximum total score                                  25

Both rubrics allowed for scores ranging from 1 to 25, with the highest possible score set to 25 to make the scores comparable despite the difference in study type.

Results
A total of 864 unique citations were identified through the search. Of those, 62 articles met the inclusion criteria (59 quantitative and three qualitative; see the Appendix for the full list of articles included in the critical appraisal). For all scored quantitative articles, the average-measures intraclass correlation coefficient was ICC(1) = 0.717 (95% CI, 0.549 to 0.830).
The mean score for all 59 quantitative articles was 15.60 of a possible 25 points, with scores ranging from 6.67 to 21.33. The top 10 articles had a mean score of 20.43, with scores ranging from 19.33 to 21.33.
The mean score for the three qualitative papers was 5.38, with scores ranging from 2 to 8.5. The threshold for inclusion in the top 10 was set at 19.33, the lowest score among the top 10 quantitative papers. Because no qualitative paper reached this threshold, none was included.
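The selection rule described above can be sketched as follows. The function name `select_top` and its input dictionaries are hypothetical. The sketch makes three assumptions: quantitative scores are averaged across raters, each qualitative paper has a single consensus score, and a qualitative paper qualifies if its score is at or above the cutoff. It does not handle ties at the last top-n position.

```python
from statistics import fmean


def select_top(quant: dict[str, list[float]], qual: dict[str, float], n: int = 10):
    """Return (top-n quantitative papers, cutoff score, qualifying qualitative papers)."""
    # Mean of the raters' scores for each quantitative paper.
    means = {paper: fmean(scores) for paper, scores in quant.items()}
    # Rank by mean score, highest first, and keep the top n.
    top = sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]
    # The cutoff is the lowest mean score among the top n quantitative papers.
    cutoff = top[-1][1]
    # A qualitative paper qualifies only if its score reaches the cutoff.
    qualifying = [paper for paper, score in qual.items() if score >= cutoff]
    return top, cutoff, qualifying
```

With this study's figures (a cutoff of 19.33 and a highest qualitative score of 8.5), the list of qualifying qualitative papers would be empty.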

Top 10 papers
An annotated bibliography of the top 10 papers is listed below in alphabetical order by first author.

Description
Using a prospective, randomized controlled design with blinded outcome assessment, this study aimed to determine the impact of simulator-based transesophageal echocardiography (TEE) training on the ability of novice operators to perform and interpret a focused critical care TEE examination.

Significance
One major contribution of this work is the development of an exam-quality scoring tool that assesses both the quality of the acquired images and their interpretation. Such a tool has many potential applications, including the assessment of learners, quality control for practicing clinicians, and further evaluation of training interventions.

Description
This study compared stress levels and non-technical skills, measured by the Anaesthetists' Non-Technical Skills (ANTS) score, between trainees in the "hot-seat" role during simulation-based training and those who were observers. The authors found that stress levels, measured via salivary cortisol, were lower for observers than for hot-seat participants and that "observers of SBT [simulation-based training] achieved an equivalent level of nontechnical performance."

Significance
As the authors note, these findings could make simulation less resource-intensive for institutions to implement and could influence the design of simulation learning experiences. However, further work is needed to replicate these results in other settings.

Challenging authority during an emergency: the effect of a teaching intervention. Crit Care Med, 2017, 45:e814-e820 [7].

Description
This study examined the impact of an educational intervention on residents' ability to intervene when, during a simulated experience, a superior made an incorrect decision that could affect patient safety.

Significance
The hierarchical nature of healthcare makes it difficult for trainees to challenge authority, even when a clear mistake that could affect patient outcomes is about to occur. This study showed that a simple, low-cost educational intervention could improve both how often and how effectively residents challenged an incorrect patient care decision made by a superior.

Description
Using a randomized design, this study primarily sought to determine whether performance differed among residents exposed to varying levels of simulated mortality during training scenarios. Residents in the variable-death group showed improved nontechnical skills, whereas the always-death and never-death groups showed no difference.

Significance
While mortality in simulation remains controversial, this study begins to show how the thoughtful use of mortality, when it is tied to the learner's performance, can improve nontechnical skills without causing higher levels of anxiety. This may help educators make more informed decisions about whether to include patient mortality in simulation.

Description
The authors showed that intraoperative handover training and display of a checklist in the operating room (OR) improved communication by residents and certified registered nurse anesthetists (CRNAs) during intraoperative transfers of anesthesia care.

Significance
Duty-hour restrictions have potentially increased the number of handovers among trainees. This study helps address a gap in the standardization of intraoperative handovers through training and a checklist designed to improve communication. These themes are highly generalizable and have the potential to reduce preventable adverse events. Future studies might explore qualitative handover factors beyond the checklist items, which may offer valuable insight into the retention and clarity of the information transferred.

Description
This randomized controlled study showed that a serious game designed to teach the anesthetic management of orthotopic liver transplantation (OLT) improved resident performance in simulated OLT.

Significance
This study found that adding a serious game to an existing educational curriculum was a feasible and cost-effective way to enhance learning in anesthesiology residents. Serious games could potentially be used to enhance education on any topic in any field, making the findings widely applicable.

Description
This study showed that asking learners to guess (generative retrieval) the answers to questions before the answer was given helped them learn normal cardiovascular ultrasound anatomy through TEE images.

Significance
While this study focuses on learning TEE, the technique of generative retrieval could be used for any subject in anesthesiology and beyond. This has implications for curriculum design: curricula could give learners the opportunity to guess answers even before they are taught new material.

Description
This study [12] showed that a high-fidelity simulation-based study could support the same principal conclusions as a clinical study.

Significance
This study demonstrated that simulation research can be applied to clinical settings when the studies and the simulation experiences are carefully constructed. The authors suggest that studies on human factors, teamwork, and communication lend themselves particularly well to investigation in a simulated environment. Although the study addresses whether an intervention can be tested through simulation, the results also support the connection between simulation and real life, which has implications for the use of simulation in training.

Description
The purpose of this study was to evaluate WOOP (Wish, Outcome, Obstacle, Plan), a validated tool for improving learner self-regulation, as a means of improving the study habits of residents on an intensive care unit (ICU) rotation.

Significance
The WOOP is a free, easy-to-use self-regulation tool that, according to this study, has the potential to help resident learners. Applying the principles of cognitive psychology to education is a frontier for medical education. Future investigations could include using the WOOP in rotations with less well-defined content (e.g., general OR rotations) or evaluating other tools for improving self-regulation.

Description
This study compared two modalities (mannequin-based and computer-based simulation) for teaching lung-protective ventilation with low tidal volumes to anesthesiology residents. The authors found that "mannequin-based simulation seemed more effective than computer-based simulation for improving knowledge and skills related to mechanical ventilation."

Significance
This study provides a methodologically rigorous model for assessing varying modalities of simulation training. Further, it offers insight into training models for mechanical ventilation.

Discussion
To our knowledge, this manuscript is the first to critically review anesthesiology education literature with the goal of quantitatively and qualitatively assessing studies for scientific rigor and academic and clinical merit. We envision this manuscript as the first annual installment to help practitioners better understand the state of research in the field and contribute to the increased application of evidence-based practices in anesthesiology education.
Because this is only the first review of its kind, we cannot establish trends over time; however, several commonalities among the studies we reviewed are noteworthy. First, the category scores on the quantitative rubric show that less than 25% (n=15) of articles included a control group, less than 20% (n=12) included random assignment, and only 24% (n=15) included a power analysis. This suggests that most of the articles reviewed lacked basic elements of rigor. Innovative concepts might require piloting, and sometimes less rigorous methodology, to establish feasibility. However, only 23% (n=14) of articles were scored as an innovative assessment or intervention, so innovation alone cannot account for the lack of rigor. This supports existing concerns about the rigor of medical education research [3]. In addition, none of the three qualitative articles achieved a score high enough to be included in our top 10. Because medical education research seeks to build our understanding of how and why things work, qualitative research could support the fundamental exploration needed to answer these questions.
While great care was taken to ensure rigor in this appraisal, this study is not without limitations. Although rigorous search methods were applied to locate articles relevant to anesthesiology education, the searches may have erroneously omitted or excluded some articles that should have been included. Articles published in journals focused on fields other than anesthesiology or medical education are particularly susceptible to this type of omission. However, the top 10 articles come from nine different journals, indicating variety among the journals represented. In addition, a total of 39 different journals were represented by the 62 articles included in the critical appraisal.
Another potential limitation is the nature of the rating process and the assessment tools. Although we conducted rater training and worked to stabilize the definitions of each criterion in the rubric, some elements were subject to interpretation and may have produced differences in scores. However, because a critical appraisal depends on reviewers' judgment, some bias is unavoidable. Nonetheless, interrater reliability for our assessment was high (ICC(1) = 0.717), considering that ICC(1) values tend to be very low.
In addition, the allocation of points within the quantitative scoring rubric favored studies that included an educational intervention. This systematic bias in the scoring instrument resulted in low scores for some high-quality non-intervention studies. For example, the study by Baker et al. [15] examining retaliation in faculty and trainee evaluations is highly relevant to anesthesiology education and included over 25,000 evaluations. However, it lost points for lacking a control group, not using a pre-/post-test design, and including only one institution. Meanwhile, other studies with very small sample sizes that included those elements scored higher.
As previously stated, we hope to continue this initiative annually. To better ensure that the highest-quality studies are highlighted regardless of study design or methodology, we aim to develop a refined rubric that mitigates the limitations identified here.

Conclusions
As this is the first article to critically review the literature on education in anesthesiology, we hope that it will serve as the first manuscript in an annual series that helps individuals involved in anesthesiology education understand the highest-quality research in the field. Once this process is repeated, trends can be tracked, and the series can serve as a resource for educators and researchers in anesthesiology for years to come.