Minimum United States Medical Licensing Examination and Comprehensive Osteopathic Medical Licensing Examination Scores Often Do Not Align

Background Many residency programs do not accept the Comprehensive Osteopathic Medical Licensing Examination (COMLEX-USA) alone for osteopathic applicants. Furthermore, among those programs that do accept the COMLEX-USA, it is unknown how programs scale their minimum COMLEX-USA scores compared to their minimum United States Medical Licensing Examination (USMLE) scores. Objective Our objective was to examine the variation of relative within-program differences between minimum USMLE Step and COMLEX-USA Level scores required for consideration by United States residency programs. Methods We performed a cross-sectional analysis of the Fellowship and Residency Electronic Interactive Database Access (FREIDA) database from April 2023, including the 10 specialties with the most training spots in 2022. These specialties were internal medicine, family medicine, pediatrics, emergency medicine, psychiatry, surgery, anesthesiology, obstetrics-gynecology, orthopedic surgery, and neurology. Within-program differences were calculated by subtracting the minimum USMLE Step 1 and 2 scores from the converted minimum USMLE Step 1 and 2 scores calculated from the minimum COMLEX-USA Level 1 and 2 scores using two conversion tools. We present differences as medians with interquartile ranges (IQR). Additionally, we report the proportion of programs with greater than 10-point differences for each step (1 and 2). Results Of the 3,364 accredited programs from the examined specialties, we included 1,477 in the Step 1 analysis and 1,227 in the Step 2 analysis with complete data. The median within-program difference between the minimum Step 1 score and the predicted Step 1 score was 12.0 (IQR 2.0, 17.0) using the Barnum and colleagues’ conversion tool and -1.7 (IQR -6.2, 6.3) using the Smith and colleagues’ tool. The median differences for Step 2 were 2.0 (IQR -8.0, 12.0) and -6.5 (IQR -13.9, -1.5) for each tool, respectively. Using the Barnum and Smith conversion tools, 937 (63%) and 435 (29%) programs had a greater than 10-point Step 1 score difference, respectively. Similarly, for Step 2, 564 (46%) and 515 (42%) programs had a greater than 10-point difference with each conversion tool. Conclusion There is wide variation in the within-program differences between minimum USMLE and predicted minimum USMLE (from COMLEX-USA) scores. Many programs have greater than 10-point differences, which may be a source of bias in osteopathic applicant selection.


Introduction
United States (US) postgraduate residency programs often screen applicants using filters in the Electronic Residency Application Service (ERAS) [1,2].Programs use filters to select competitive applicants and reduce the number of applications for review [3].Common filters utilized by programs include minimum board examination scores [1,4].The Comprehensive Osteopathic Medical Licensing Examination (COMLEX-USA) Levels are a series of board examinations required for osteopathic students and graduates to complete training and progress to medical licensure [5].In addition to the COMLEX-USA, osteopathic applicants are eligible to complete the US Medical Licensing Examination (USMLE) Steps, the series of board examinations required for graduates of US allopathic medical schools [6].Therefore, for osteopathic applicants, these minimum board examination filters include both the COMLEX-USA Levels and USMLE Steps for those who take it [1,4].
These minimum board score filters create several barriers for osteopathic applicants applying to residency programs.First, although not needed for medical licensure or admittance to residency, many programs expect osteopathic applicants to complete the USMLE to be considered [1].For example, among 1,095 programs responding to the 2022 National Resident Matching Program (NRMP) Program Director Survey, 56% and 46% required a passing or target score on the USMLE Steps 1 and 2, respectively, for osteopathic applicants [1].Although this expectation allows programs to compare scores between applicants directly, the prerequisite to complete the USMLE for osteopathic applicants creates a significant financial and opportunity cost for those who choose to complete it [7].
Furthermore, for those osteopathic applicants who choose not to complete the USMLE, the COMLEX-USA scores are scaled differently and are not directly comparable to USMLE scores [8].Therefore, tools exist for programs to convert COMLEX-USA scores to predicted USMLE scores facilitating a more direct comparison of board examination performance [8][9][10].However, these tools produce estimates with wide standard errors [9][10][11], and they are fundamentally prone to selection bias given that they are derived from osteopathic applicants who took both examinations but are targeted for use on those who do not take the USMLE [8][9][10].
Nevertheless, COMLEX-USA and USMLE scores correlate well among osteopathic students who take both [8][9][10]12], and conversion tools are an objective method for programs to compare scores between the two examinations [8][9][10].Therefore, COMLEX-USA to USMLE conversion tools are an objective method for programs to select minimum board examination scores that are comparable between the examinations.However, the prevalence of COMLEX-USA to USMLE conversion tool use among postgraduate training programs is unknown.Since the use of these conversion tools would likely result in small within-program differences between the reported minimum USMLE Step scores and predicted minimum Step scores converted from their minimum COMLEX-USA Level scores, our purpose was to examine the variation of these within-program minimum board examination score differences.We hypothesized that within-program minimum board examination score differences would be small given that these conversion tools are freely available and allow programs to compare relative performance between the different board examinations [8][9][10].

Study design and setting
We performed a cross-sectional analysis of the

Participants
We included the 10 medical specialties with the most training spots in 2022, including internal medicine, family medicine, pediatrics, emergency medicine, psychiatry, surgery, anesthesiology, obstetricsgynecology, orthopedic surgery, and neurology [13].We only included these larger specialties to limit potential bias from smaller specialties with fewer osteopathic applicants.We excluded programs in the Step 1 analysis that did not report both minimum Step 1 and Level 1 scores and excluded programs in the Step 2 analysis that did not report both minimum Step 2 and Level 2 scores.

Variables
Variables analyzed included minimum board examination scores (Levels 1 and 2, Steps 1 and 2), specialty, program type (university hospital, community hospital, community-based university-affiliated hospital, military-based, other), percent of osteopathic residents (3-year average), and osteopathic recognition.The primary outcome was the within-program differences between the reported minimum USMLE Step scores and predicted minimum Step scores converted from their minimum COMLEX-USA Level scores.

Statistical analysis
Minimum Step scores were compared to minimum predicted Step scores calculated from Level scores using two conversion tools for each Step examination [8][9][10].Barnum

and colleagues presented a table of
Step/Level ranges that align [8].Therefore, for each minimum Level score, the score was converted to a predicted Step score using a linear scale within each corresponding range.In contrast, Smith and colleagues presented regression formulas that we used to convert each minimum Level score to a predicted Step score [9,10].
We described continuous variables with medians and interquartile ranges (IQR) since all were nonnormal as determined by the Shapiro-Wilk normality test.We described categorical variables with counts and percentages.Confidence intervals (CI) for the median within-program differences between each program's minimum USMLE and predicted USMLE scores were calculated from the standard deviation (SD) of medians from 1,000 bootstrap resamples.Bootstrap methods have the advantage of not assuming a normal distribution [14].We examined the variation of within-program differences between minimum Step and Level scores by plotting the within-program differences between minimum Step and minimum predicted Step scores using histograms.Additionally, we reported the proportion of programs with board score differences greater than various thresholds, but in particular, greater than ten points on the USMLE.We felt this threshold was reasonable to identify potential outliers since the NRMP charting outcome reports stratify USMLE scores by ten-point increments, and a difference of ten points would likely impact match success given the known distributions of matched and unmatched senior osteopathic student applicants by the Step score in the included specialties [15].CIs for the median differences between within-program predicted USMLE scores from the two conversion tools for each Step examination were calculated from the SD of median differences from 1,000 bootstrap resamples.
We did not perform any a priori sample size calculations because our objective was to examine the variation of within-program minimum board score differences and not perform hypothesis testing between reported and predicted USMLE scores.Nevertheless, where reported, CIs were considered significant if they did not include zero.Analyses were performed with R (Version 4.3.02023-04-21, R Foundation for Statistical Computing, Vienna, Austria) [16].Packages and package versions are listed in the appendix.The study protocol was reviewed by Allegheny Health Network Institutional Review Board staff and was not considered human subject research.

Results
Of the 3,364 programs in the FREIDA database from the included specialties,  The ranges of within-program differences between minimum USMLE and predicted USMLE scores were wide (Figure 1).The median within-program difference between the minimum Step 1 score and the predicted Step 1 score was 12.0 (IQR 2.0, 17.0) using the Barnum and colleagues' conversion tool and -1.7 (IQR -6.2, 6.3) using the Smith and colleagues' tool.The median differences for Step 2 were 2.0 (IQR -8.0, 12.0) and -6.5 (IQR -13.9, -1.5) for each tool, respectively (Table 2).The number of Step 1 outliers (difference >10 points) was 937 (63%) using the Barnum and colleagues' conversion tool and 435 (29%) using the Smith and colleagues' tool.The number of Step 2 outliers was 564 (46%) using the Barnum and colleagues' conversion tool and 515 (42%) using the Smith and colleagues' tool (Table 2, Figure 2).Predicted minimum USMLE scores differed between the Barnum and Smith conversion tools, with a median difference of -11.9 (95% CI -13.7, -10.1) for Step 1 and -9.3 (-10.2, -8.5) for Step 2 (Figures 3, 4).The histograms display the distribution of within-program differences between minimum USMLE scores and predicted minimum USMLE scores converted from COMLEX-USA scores using two conversion tools.Step 2 -6.5 (-6.6, -6.4) (-13.9, -1.5) 515 / 1,227 42   The figure displays the difference between the predicted minimum USMLE Step 2 scores from the two conversion tools.Lines connect estimates for each program.Bootstrap methods were used to estimate the median difference between the predicted minimum scores for each tool.

Discussion
Within-program differences between the minimum USMLE and predicted minimum USMLE (from COMLEX-USA) scores varied greatly between programs, and the variation differed by conversion tool (Table 2, Figure 1).For example, up to 63% of programs had a minimum board score difference exceeding ten points (Table 2) [15].However, the median difference between the predicted USMLE scores between the studied conversion tools was about ten for both Steps 1 and 2 (Figures 3, 4); therefore, variable use of different conversion tools between programs may contribute to this variation.Nevertheless, these results suggest that minimum USMLE and COMLEX-USA scores often do not align and have the potential to unfairly influence osteopathic applicant selection in many programs (Figure 1).However, this unfair influence is bidirectional, both favoring and hindering osteopathic applicants without USMLE scores depending on the program (Figure 1).
COMLEX-USA to USMLE conversion tools provide programs with an objective method to establish comparable minimum board scores, facilitating comparisons of academic performance between osteopathic and allopathic applicants [8][9][10].However, these conversion tools are less ideal than judging all applicants by their performance on the same examinations [17].Therefore, many programs require or prefer completion of the USMLE for osteopathic applicants to be considered equally [18,19].Since these COMLEX-USA to USMLE conversion tools have significant limitations [11] and completion of two similar series of board examinations is needless redundancy [7], more equitable solutions regarding minimum board examination scores are necessary to allow fairness in the match process for osteopathic applicants.
Lastly, these results generate further questions regarding how programs interpret applicant COMLEX-USA scores.The wide variability in differences between minimum USMLE and predicted minimum USMLE scores (converted from COMLEX-USA scores) suggests that there may be wide variability between programs regarding how they interpret the full range of applicant COMLEX-USA scores.Furthermore, this variability in interpreting COMLEX-USA scores may contribute to the lower match success for osteopathic students who do not complete the USMLE [20].Stakeholders in the match process must work together to find equitable solutions to this variability [17,21].

Limitations
First, many programs do not report both minimum USMLE and COMLEX-USA scores.Therefore, the results of this study cannot be generalized to all residency programs, including programs that do not accept COMLEX-USA alone or do not disclose minimum board score requirements for both examinations.Similarly, programs were more often from family medicine and community-based, university-affiliated hospitals limiting generalizability.Also, this observational data cannot provide causal conclusions about the use of application filters or COMLEX-USA to USMLE conversion tools and match outcomes, nor can it provide definitive evidence for bias in the match process.
We only evaluated the two most recently published COMLEX-USA to USMLE conversion tools; however, older tools exist [22,23].Therefore, our results may be biased by some programs using older conversion tools.Also, we assessed the conversion tools individually, which may bias the results since not all programs would be expected to use the same conversion tool.Furthermore, we observed significant differences in predicted minimum USMLE scores between the studied conversion tools (Figures 3, 4), suggesting that these tools are not interchangeable.Nevertheless, an ideal COMLEX-USA to USMLE conversion tool does not exist because the tools are derived from candidates who have taken both exams but is intended for use by programs to interpret the COMLEX-USA scores of those osteopathic applicants who have not taken the USMLE [8][9][10].Furthermore, osteopathic candidates who complete both examinations perform better on the COMLEX-USA than those who did not take the USMLE, suggesting they are likely more competitive applicants [8].For example, Barnum and colleagues reported that the mean COMLEX-USA Level 1 score of their study cohort used to derive their tool was 25.4 points greater than the overall Level 1 score [8].Similarly, their cohort's mean Level 2 score was 44.0 points greater than the overall mean [8].However, the impact of these observed within-program minimum board examination score differences may not be practically significant given the fewer applicants that apply to each specialty with scores near common minimum score thresholds [15].Lastly, Step 1 and Level 1 have transitioned to pass/fail scoring; therefore, it is unknown how this transition may impact the use of minimum board examination scores and scaling of minimum USMLE and COMLEX-USA scores by programs in the future [24].

Conclusions
There is wide variation in the within-program differences between minimum USMLE and predicted USMLE (converted from COMLEX-USA) scores.These results suggest that within-program differences between minimum USMLE and COMLEX-USA scores may be a source of bias in osteopathic applicant selection.

Appendices FIGURE 5: R Version and Packages
The figure displays the R version and packages used for the analysis.

FIGURE 1 :
FIGURE 1: Differences Between the Minimum USMLE Score and Predicted Minimum USMLE Score from the COMLEX-USA Score

FIGURE 2 :
FIGURE 2: Outliers By Threshold for Within-Program Differences Between Minimum Board Scores The figure displays the proportion of outliers by score threshold for within-program differences between minimum USMLE scores and predicted minimum USMLE scores calculated from the minimum COMLEX-USA scores.USMLE, United States Medical Licensing Examination; COMLEX-USA, Comprehensive Osteopathic Medical Licensing Examination

FIGURE 4 :
FIGURE 4: Difference Between COMLEX-USA to USMLE Conversion Tools -Step 2 1,477 (43.9%) met inclusion criteria in the Step 1 analysis and 1,227 (36.5%) in the Step 2 analysis with complete minimum board score data.The median minimum COMLEX-USA scores were 458 (IQR 400, 500) for Level 1 and 475 (IQR 420, 510) for Level 2. The median minimum USMLE scores were 210 (IQR 200, 220) for Step 1 and 215 (IQR 209, 224) for Step 2. The most common specialty was family medicine and the most common program type was community-based, university-affiliated hospitals (Table 1).