Enhancing Consistency in Posterior Malleolus Fracture Classification: A Comprehensive Interobserver Reliability Study With 20 Raters Using the Mason & Molloy Classification

Introduction: Over the past decade, there has been growing interest in the identification and treatment of posterior malleolus fragments, driven by a better understanding of their significance. The Mason & Molloy (M&M) classification system has emerged as a valuable tool for systematically categorizing these fractures and assisting clinicians in formulating treatment decisions. We aimed to assess the interobserver reliability of the M&M classification for posterior malleolus fractures using 20 raters. Methodology: The study was conducted at a major foot and ankle referral center in Wythenshawe, Manchester, UK. Thirty-eight computed tomography (CT) scans were evaluated by 20 independent raters: 15 general orthopedic and trauma surgeons and five foot and ankle surgeons. Each rater classified the posterior malleolus fracture according to the M&M classification as type 1, 2A, 2B, or 3, or as not classifiable. Statistical analysis was performed with the R software package and SPSS (v26; IBM Corp., Armonk, NY). Interobserver agreement was assessed with Fleiss' kappa (κ) and its 95% confidence interval (CI). Results: Interobserver agreement was moderate, with a global κ of 0.531 (95% CI: 0.518, 0.544). Agreement was substantial for identifying M&M type 3 fractures (κ = 0.785) and fractures not classifiable under the M&M system (κ = 0.785). Pairwise correlation between raters was strong (Kendall's τb = 0.53-0.59) for all raters except Rater 12. Conclusion: The M&M classification remains a valuable tool to guide the management of patients with these subsets of ankle fractures.


Introduction
Fractures of the posterior malleolus can occur in isolation, although they frequently accompany fractures of the medial and lateral malleoli. Approximately 44%-50% of all ankle fractures involve the posterior malleolus [1][2][3].
Over the past decade, there has been growing interest in the identification and treatment of posterior malleolus fragments, driven by a better understanding of their significance. Traditionally, authors advocated addressing these fragments when they involved more than 25% of the plafond [4]. The treatment suggested at the time was mostly the insertion of screws in an anterior-posterior direction during ankle open reduction and internal fixation (ORIF) [5]. More recently, however, the wider availability of computed tomography (CT) has led to a change in how posterior malleolus fractures are managed [6].
Contemporary strategies for stabilizing the posterior malleolus fragment typically involve direct fixation, often through either a posterolateral or a posteromedial approach [7][8][9]. Direct fixation enables reattachment of the posterior inferior tibiofibular ligament (PITFL) with bone-to-bone union and contributes significantly to the stability of the posterior ankle joint and the syndesmosis [10,11].
A study by Jeyaseelan et al. showed substantially better clinical outcomes, over a minimum follow-up of two years, in 320 patients whose posterior malleolus fractures were treated than in those whose fractures were left untreated. However, the study also found a 10% higher risk of complications and a twofold increase in reoperation rates with treatment, primarily attributed to hardware-related issues [12]. Notable classifications of posterior malleolus fractures were proposed by Haraguchi et al. in 2006 [13], Bartonicek et al. in 2015 [14], and, most recently, Mason & Molloy (M&M) in 2017 [15].
The CT-based M&M classification was derived from the scans of 121 patients and describes four types based on anatomical and pathomechanical considerations: type 1, an extra-articular avulsion of the distal posterior tibial cortex by the PITFL (34% prevalence); type 2A, a primary fragment of the posterolateral Volkmann area extending into the incisura (25% prevalence); type 2B, a primary fragment of the posterolateral Volkmann area extending into the incisura plus a secondary fragment of the posteromedial tibia (21% prevalence); and type 3, a fracture extending across the whole posterior plafond (21% prevalence). The authors proposed a surgical algorithm based on the classification. They reported an interobserver reliability (Cohen's κ) of 0.919 between two fellowship-trained foot and ankle surgeons [15].
The M&M classification system has emerged as a valuable tool for systematically categorizing these fractures and assisting clinicians in making informed treatment decisions. However, whether it can be applied consistently and reliably by a larger number of observers remains an important open question. Our study therefore aims to assess the interobserver reliability of the M&M classification for posterior malleolus fractures, building on the foundation laid by previous research.

Materials And Methods
The study was conducted at a major foot and ankle referral center in Wythenshawe, Manchester, UK. A retrospective search was performed at the institution for patients with ankle fractures treated between January 2022 and January 2023. Inclusion criteria were age 18 years or older and a preoperative ankle CT scan showing a posterior malleolus fracture. Exclusion criteria were a previous history of ankle fracture, infection, tumor, or surgery. Posterior pilon fractures [14] were also excluded.
CT images of 38 patients were evaluated on the Picture Archiving and Communication System (PACS) by 20 independent raters: 15 general orthopedic and trauma surgeons and five foot and ankle surgeons. Before the evaluation, the raters were introduced to the M&M classification of posterior malleolus fractures in a direct briefing session. The evaluation took place during an in-person regional meeting of orthopedic surgeons at our center. Each rater classified the posterior malleolus fracture according to the M&M classification as type 1, 2A, 2B, or 3, or as not classifiable. Only the first response of each rater was used to calculate interobserver reliability.
Statistical analysis was performed with the R software package and SPSS (v26; IBM Corp., Armonk, NY). Interobserver agreement was determined with Fleiss' kappa (κ) and its 95% confidence interval (CI). Levels of agreement were interpreted according to Landis and Koch [16].
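To illustrate the statistic named above, the following Python sketch computes Fleiss' κ from a table of rating counts, with one row per CT scan and one column per category, each row summing to the number of raters. It is not the authors' R or SPSS code, and the function names `fleiss_kappa` and `bootstrap_ci` are hypothetical. The confidence interval here comes from a percentile bootstrap over scans, a method chosen only for illustration: R and SPSS may report asymptotic intervals instead, so the numbers would differ slightly.

```python
import random
from typing import List, Sequence, Tuple


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who placed subject i (one CT scan)
    in category j. Every row must sum to the same number of raters n >= 2.
    """
    N = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    k = len(counts[0])
    # Mean observed agreement: share of agreeing rater pairs per subject.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the overall proportion of ratings in each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def bootstrap_ci(counts: Sequence[Sequence[int]], reps: int = 2000,
                 alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap CI for Fleiss' kappa, resampling subjects (scans).

    Assumption: a bootstrap interval is used for illustration only; it is not
    necessarily the interval method used by R or SPSS.
    """
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(reps):
        sample = [rng.choice(counts) for _ in range(len(counts))]
        try:
            stats.append(fleiss_kappa(sample))
        except ZeroDivisionError:  # every resampled rating fell in one category
            continue
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

For data laid out like this study's, `counts` would be a 38 × 5 table (types 1, 2A, 2B, 3, and not classifiable) whose rows each sum to 20.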
κ values of 0.00 to 0.20 were considered slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement [16]. Agreement between individual pairs of raters was also assessed with Kendall's tau-b (τb) to look for correlation between raters.
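The banding above amounts to a simple lookup. In the sketch below (the function name `landis_koch` is hypothetical), values falling between the published two-decimal bounds, such as 0.205, are assigned to the higher band, and negative values are labeled "poor", as in Landis and Koch's original table; both choices are assumptions about how edge cases are handled.

```python
def landis_koch(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"  # below the 0.00-0.20 band; Landis and Koch list this as poor
    # Upper bound of each band; values between bands go to the higher band.
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


for value in (0.531, 0.785):
    print(value, landis_koch(value))  # 0.531 moderate, then 0.785 substantial
```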

Results
A total of 20 raters evaluated 38 CT scans of posterior malleolus fractures using the M&M classification. Interobserver agreement was moderate, with a global κ of 0.531 (95% CI: 0.518, 0.544). Agreement was substantial for identifying M&M type 3 fractures (κ = 0.785) and fractures not classifiable under the M&M system (κ = 0.785) (Table 1). Pairwise correlation between raters was strong (τb = 0.53-0.59) for all raters except Rater 12 (Table 2).
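Category-specific κ values, such as those reported for type 3 and for unclassifiable fractures, can be obtained with Fleiss' per-category kappa. This statistic compares how often raters split on whether a scan belongs to a category against the split expected from that category's overall frequency. The sketch below assumes this definition; the paper does not state how its category-level values were computed, and `category_kappas` and `MM_CATEGORIES` are hypothetical names. A useful check is that the overall Fleiss' κ equals the average of the category kappas weighted by p_j(1 - p_j).

```python
from typing import Dict, Sequence

# Assumed column order for the count table: one column per M&M category.
MM_CATEGORIES = ("1", "2A", "2B", "3", "not classifiable")


def category_kappas(counts: Sequence[Sequence[int]],
                    labels: Sequence[str] = MM_CATEGORIES) -> Dict[str, float]:
    """Fleiss' category-specific kappa for each column of a count table.

    kappa_j = 1 - sum_i n_ij (n - n_ij) / (N n (n - 1) p_j (1 - p_j)),
    where n_ij is the number of raters putting subject i in category j,
    N the number of subjects, n the raters per subject, and p_j the overall
    proportion of ratings in category j.
    """
    N = len(counts)
    n = sum(counts[0])
    if len(labels) != len(counts[0]):
        raise ValueError("one label per column is required")
    result: Dict[str, float] = {}
    for j, label in enumerate(labels):
        p = sum(row[j] for row in counts) / (N * n)
        if p in (0.0, 1.0):
            result[label] = float("nan")  # undefined if a category is never or always used
            continue
        # Rater pairs that split on category j (one in, one out), summed over subjects.
        disagreement = sum(row[j] * (n - row[j]) for row in counts)
        result[label] = 1 - disagreement / (N * n * (n - 1) * p * (1 - p))
    return result
```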

Discussion
The assessment of interobserver reliability is pivotal in validating the clinical utility of any classification system, as it underpins consistent treatment decisions and patient management. In our study, interobserver agreement for the M&M classification was moderate, with a global κ of 0.531 (95% CI: 0.518, 0.544). This is comparable with previous studies in the literature. Mustafa et al. compared the three classifications mentioned earlier and found moderate interobserver reliability for the M&M classification, with a Fleiss κ of 0.541 (SE 0.098), in a study with nine raters [17]. Morales et al. also reported moderate interobserver reliability for the M&M classification, with a global κ of 0.54 (95% CI: 0.47-0.62) across six raters [18]. Kleinertz et al., with four raters, reported a Fleiss κ of 0.724 (95% CI: 0.674-0.774) [19].
Although these studies reported moderate (two studies) or substantial (one study) interobserver reliability, the number of raters in each is an important consideration: fewer raters may yield higher agreement, and more raters lower agreement. Our study included 20 raters, which, to our knowledge, is the largest number reported for the M&M classification. Even with this many raters, interobserver reliability remained moderate. The M&M classification may not be an ideal system, but it retains a useful role across a larger group of observers.
Previous reliability and reproducibility studies have suggested that observer experience improves the reliability of a classification system [20][21][22]. The complex nature of posterior malleolus fractures, with their diverse fracture patterns, inherently challenges consistent classification [23,24]. This complexity likely contributes to the variability in interpretation observed across observers.
Additionally, the observers' varying levels of familiarity with the M&M classification system may have played a role in the degree of agreement.
Our study found a correlation between rater experience and interobserver agreement for the Mason & Molloy classification, contrary to previous papers on this subject [17,18]. Rater 12, a junior orthopedic surgeon, had the lowest agreement with the other raters. We postulate that training level, exposure, and frequency of use of this classification influence interobserver reliability. In their landmark paper, Mason et al. reported almost perfect interobserver agreement (κ = 0.92) [15]; however, their blinded evaluation was performed by two fellowship-trained senior foot and ankle surgeons.
Another significant finding of our analysis concerns type 1. Interobserver agreement for M&M type 1 was lower than for the other types, whereas the 20 raters showed substantial agreement when identifying type 3 and unclassifiable fractures. The lower agreement for type 1 can be attributed to the lack of a clear definition of this type in its original description [15]. Extra-articular cortical avulsions of the distal posterior tibia also commonly vary in size and number of fragments, and there may be some crossover with type 2A. A likely direction for improving the classification is to subdivide type 1 according to fracture fragment size.
Our study is not without its limitations. The sample was obtained through convenience sampling. Although this simplifies data collection, it may over- or underrepresent certain fracture types and may not reflect the full range of posterior malleolus fracture geometry among the ankle fractures seen at our institution. We also did not compare the M&M classification with the other classification systems for posterior malleolus fractures, namely those by Haraguchi et al. and Bartonicek et al., because our focus was the one classification widely used at our center.
Moving forward, strategies to improve interobserver reliability merit exploration. Standardized educational interventions, coupled with rigorous training in the application of the M&M classification system, have the potential to align observers' interpretations and reduce variability. Additionally, investigating the link between interobserver reliability and patient outcomes is a promising avenue for understanding the clinical implications of classification consistency.

Conclusions
Our study contributes insight into the interobserver reliability of the M&M classification for posterior malleolus fractures among a large number of raters. The moderate agreement observed among the 20 raters underscores the continuing need to refine the classification of posterior malleolus fractures. Until a better classification system is developed, however, the M&M classification remains a valuable tool to guide the management of patients with these subsets of ankle fractures. More broadly, a concerted effort to improve the reliability of classification systems is important, so that treatment decisions rest on accurate, consistent, and easily communicated categorizations, ultimately improving outcomes for patients, especially those with posterior malleolus fractures.

TABLE 2: Inter-rater agreement among 20 raters.
Note: Agreement was assessed with Kendall's tau-b. **Correlation is significant at the 0.01 level (two-tailed). *Correlation is significant at the 0.05 level (two-tailed).
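Table 2 reports pairwise Kendall's tau-b between raters. The sketch below computes tau-b with the usual tie correction, τb = (C - D) / sqrt((n0 - n1)(n0 - n2)), for every pair of raters, where C and D are concordant and discordant scan pairs, n0 the total number of scan pairs, and n1 and n2 the pairs tied for each rater. Tau-b requires ordered values, and the paper does not say how the M&M categories were coded, so the `ORDINAL` mapping (with "not classifiable" coded 0) is purely hypothetical, as are the function names. Significance testing is not shown; `scipy.stats.kendalltau`, which computes tau-b by default, also returns a p-value.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# Hypothetical ordinal coding of the M&M categories (not stated in the paper).
ORDINAL = {"not classifiable": 0, "1": 1, "2A": 2, "2B": 3, "3": 4}


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equal-length sequences, with tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference for rater x
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the difference for rater y
        ties_x += dx == 0
        ties_y += dy == 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def pairwise_tau_b(ratings: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """ratings maps each rater to their category labels, in the same scan order."""
    coded = {rater: [ORDINAL[c] for c in labels] for rater, labels in ratings.items()}
    return {(a, b): kendall_tau_b(coded[a], coded[b])
            for a, b in combinations(sorted(coded), 2)}
```

With 20 raters, `pairwise_tau_b` returns 190 rater pairs, one value per off-diagonal cell of a table like Table 2.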