The fragility of statistical findings in the reverse total shoulder arthroplasty literature: a systematic review of randomized controlled trials

Journal of Shoulder and Elbow Surgery


Background: Reverse total shoulder arthroplasty (RTSA) has seen increasing utilization as an effective intervention for a wide variety of shoulder pathologies. Expansion of its scope and indications is often driven by findings from randomized controlled trials (RCTs) that guide surgical decision-making for RTSA. In this study, we utilized the fragility index (FI), reverse fragility index (rFI), and fragility quotient (FQ) to assess the robustness of outcomes reported in RCTs in the RTSA literature.

Methods: PubMed, Embase, and MEDLINE were queried for RCTs (Jan. 1, 2010-Mar. 31, 2023) in the RTSA literature reporting dichotomous outcomes. The FI and rFI were defined as the number of outcome reversals required to alter statistical significance for significant and nonsignificant outcomes, respectively. The FQ was determined by dividing the FI by the sample size of each study. Subgroup analysis was performed based on outcome category.

Results: One hundred seventy-six RCTs were screened, and 18 studies were included. The median FI across 59 total outcomes was 4 (interquartile range [IQR]: 3-5) with an associated FQ of 0.051 (IQR: 0.029-0.065). Thirteen outcomes were statistically significant with a median FI of 3 (IQR: 1-4) and FQ of 0.033 (IQR: 0.012-0.066). Forty-six outcomes were nonsignificant with a median rFI of 4 (IQR: 3-5) and FQ of 0.055 (IQR: 0.032-0.065). The most fragile outcome category was revision/reoperations with a median FI of 2.50 (IQR: 1.00-3.25), followed by clinical score/outcome (median FI: 3.00), complications (median FI: 4.00), “other” (median FI: 4.00), and radiographic findings (median FI: 5.00). Notably, the number of patients lost to follow-up was greater than or equal to the FI for 59% of outcomes.

Conclusion: The statistical findings in RTSA RCTs are fragile and should be interpreted with caution. Reversal of only a few outcomes, or maintaining postoperative follow-up, may be sufficient to alter the significance of study findings. We recommend standardized reporting of P values with FI and FQ metrics to allow clinicians to effectively assess the robustness of study findings.
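
The FI, rFI, and FQ definitions given in the Methods can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: significance is judged with a two-sided Fisher exact test at alpha = 0.05, and outcomes are reversed one at a time in the arm with fewer events (events added when the result is significant, events removed when it is nonsignificant). The function names `fisher_exact_p`, `fragility`, and `fragility_quotient` are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as probable as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility(events1, n1, events2, n2, alpha=0.05):
    """Return (flips, was_significant) for a two-arm dichotomous outcome.

    Assumed procedure (not confirmed by the abstract): reverse one
    outcome at a time in the arm with fewer events until the result
    crosses alpha. For a significant result the count is the FI; for a
    nonsignificant result it is the rFI. flips is None when significance
    cannot be altered by changing that arm alone.
    """
    def p_value(e1, e2):
        return fisher_exact_p(e1, n1 - e1, e2, n2 - e2)

    significant = p_value(events1, events2) < alpha
    modify_first = events1 <= events2
    step = 1 if significant else -1  # add events (FI) or remove them (rFI)
    e1, e2, flips = events1, events2, 0
    while True:
        if modify_first:
            e1 += step
        else:
            e2 += step
        if not (0 <= e1 <= n1 and 0 <= e2 <= n2):
            return None, significant
        flips += 1
        if (p_value(e1, e2) < alpha) != significant:
            return flips, significant


def fragility_quotient(flips, total_sample_size):
    """FQ: the fragility count divided by the study sample size."""
    return flips / total_sample_size


# Hypothetical toy trial: 0/5 events in one arm versus 5/5 in the other.
fi, was_significant = fragility(0, 5, 5, 5)
fq = fragility_quotient(fi, 10)
```

In this toy trial, two outcome reversals are enough to lose significance, so the FI is 2 and the FQ is 2/10 = 0.2. The numbers are illustrative only and are not drawn from the included RCTs.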

