Introduction
Patient-reported outcome (PRO) measures are increasingly used to assess the impact of interventions on health-related quality of life (HRQOL) in clinical care and in research. However, interpreting meaningful change over time can be challenging. One approach to interpreting score changes is tied to the concept of meaningful score differences (MSD) introduced by the U.S. Food & Drug Administration [1]. In this work, we define an MSD as the smallest change in scores that patients perceive as meaningful [2]. When conceptualizing treatment benefit, MSDs can be used in interventional studies “to evaluate the expected treatment effect for the average patient in a target population” and in observational studies “as a threshold in descriptive analyses that identify individual patients who might have changed by a meaningful amount” (pg. 20) [1].
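The sketch below illustrates the observational-study use quoted above by classifying individual patients against an MSD threshold. It is a minimal sketch under stated assumptions. The function name `classify_change`, the threshold values, and the patient scores are all hypothetical. Improvement and worsening get separate thresholds because MSDs can differ by direction of change. The sketch also assumes PROMIS T-scores, where higher Fatigue and Pain Interference scores indicate worse status and higher Mobility and Physical Activity scores indicate better status.

```python
# Minimal sketch (hypothetical names and values): flag individual patients whose
# PROMIS T-score change meets an MSD threshold, as in descriptive observational use.


def classify_change(t1: float, t2: float, msd_better: float, msd_worse: float,
                    higher_is_better: bool) -> str:
    """Classify a T1 -> T2 change as 'improved', 'worsened', or 'no meaningful change'.

    msd_better and msd_worse are positive magnitudes in T-score points; they are
    kept separate (an assumption) because anchor-based MSDs may differ by direction.
    """
    change = t2 - t1
    # Re-express the change so that a positive value always means improvement.
    directed = change if higher_is_better else -change
    if directed >= msd_better:
        return "improved"
    if directed <= -msd_worse:
        return "worsened"
    return "no meaningful change"


if __name__ == "__main__":
    # Hypothetical Fatigue T-scores (higher = more fatigue) and assumed thresholds.
    fatigue_scores = {"A": (60.0, 55.0), "B": (52.0, 53.0), "C": (48.0, 54.0)}
    for pid, (t1, t2) in fatigue_scores.items():
        print(pid, classify_change(t1, t2, msd_better=3.0, msd_worse=5.0,
                                   higher_is_better=False))
```

With these assumed thresholds, patient A is flagged as improved, B as unchanged, and C as worsened. Because the thresholds are asymmetric, a 4-point change would count as a meaningful improvement but not as a meaningful worsening.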
Methods to identify an MSD often rely on an external anchor variable that is separate from the targeted PRO measure but significantly related to the concept of interest [1, 3]. Differences or changes in PRO scores are then interpreted in relation to differences or values on the anchor variable. Recent recommendations support using multiple anchors and reporting a range of MSD values rather than a single static value [4, 5]. It follows that changing the anchor may change the MSD identified for a PRO measure. In some cases, the difference may be large enough to change the downstream interpretation of scores and the decisions based on them [6]. Thus, the choice and quality of anchor variables are critical to implementing PRO measures to inform decision-making.
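The sketch below makes the anchor-based logic concrete by summarizing PRO change scores within each category of an anchor. It assumes one common estimator: the mean change among patients in an anchor category such as “better” or “worse”. This section does not state the exact estimator used in the study, so treat the sketch as an illustration rather than the study’s method. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def change_by_anchor_group(changes, anchor_labels, group):
    """Return (mean change, SD, n) for patients whose anchor response equals `group`.

    Assumes `changes` (T2 - T1 PRO scores) and `anchor_labels` are aligned lists.
    """
    values = [c for c, a in zip(changes, anchor_labels) if a == group]
    if len(values) < 2:
        raise ValueError(f"need at least two patients in anchor group {group!r}")
    return mean(values), stdev(values), len(values)


# Hypothetical change scores and anchor responses for eight patients.
changes = [-6.0, -4.0, -5.0, 1.0, 0.0, -1.0, 4.0, 6.0]
anchor = ["better", "better", "better", "same", "same", "same", "worse", "worse"]

for group in ("better", "same", "worse"):
    m, sd, n = change_by_anchor_group(changes, anchor, group)
    print(f"{group}: mean change = {m:.1f}, SD = {sd:.2f}, n = {n}")
```

Reporting the SD and n alongside each mean keeps the uncertainty visible. Repeating the calculation for each candidate anchor yields the kind of MSD range that the recommendations above favor.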
The NIH’s Patient-Reported Outcomes Measurement Information System® (PROMIS) measures offer a unique opportunity for exploring changes in HRQOL. PROMIS Pediatric measures are commonly used in pediatric rheumatology research, and increasingly in clinical care, to measure the impact of these chronic conditions on how children feel and function [7]. A larger body of work has explored MSDs for PROMIS measures in adults [2], but few studies have explored thresholds in pediatric populations. Further, the few pediatric studies are limited by the exclusion of patients with rheumatic disease [8] or by small sample sizes [9]. Interpreting meaningful differences is especially important and challenging in pediatric rheumatic diseases because of their chronic nature. For example, juvenile idiopathic arthritis (JIA) and systemic lupus erythematosus (SLE) are chronic, inflammatory conditions characterized by periodic “flares” of disease activity. While many highly effective treatments exist, patients may experience symptoms such as pain and fatigue even after apparent resolution of inflammatory disease activity [27, 28]. Thus, patient-reported impact within meaningful health domains such as physical activity, pain, and fatigue may differ and fluctuate over the course of illness.
The goal of this study was to examine the appropriateness of multiple external anchors for the PROMIS Pediatric measures of Physical Activity, Fatigue, Pain Interference, and Mobility in patients with JIA and SLE. We report MSD ranges for JIA and SLE for the anchors that were significantly related to each PROMIS Pediatric measure. We included and reported results separately for JIA and SLE to explore how estimates varied by domain in two patient populations with different disease manifestations, drug exposures, and underlying sociodemographic characteristics. Further, we discuss heterogeneity across the resulting estimates and compare them with MSDs identified using alternative methods [8, 9]. Because our focus was on meaningful change in patient-reported scores, we proposed that the patient-reported GIC anchor would be best suited for this purpose. We also hypothesized that, compared with the other candidate anchors, it would relate most strongly to changes in the PROMIS Pediatric scores for each domain.
Discussion
In this study, we examined several external anchor variables to evaluate their appropriateness and to identify MSD ranges for several PROMIS Pediatric measures in children with JIA and SLE. We evaluated the quality of each candidate anchor using three a priori criteria:

- a sufficient relationship between the anchor and the change in scores over time, defined as a correlation of at least 0.3;
- a sufficient sample size (n≥10);
- no more than 10% missing responses.

The candidate anchor variables varied in the strength of their relationship with changes in PROMIS scores over time and across measures/domains. Interestingly, the patient-reported GIC for each domain was expected to have the strongest relationship with change scores, yet it fell below the pre-specified anchor criteria in more than half of the scenarios. Particularly for patients with JIA, whose average HRQOL was generally stable over the study period, the GIC did not meet criteria for any PROMIS measure, although it was marginal for Fatigue (r=0.29 vs the 0.3 criterion; Table 4).
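The a priori anchor criteria can be expressed as a simple check. This is a hedged sketch, and the function names are hypothetical. It compares the absolute value of the correlation so that anchors scored in the opposite direction to the PROMIS measure are not penalized, which is an assumption. It also treats the 0.3 threshold as inclusive.

```python
# Hedged sketch of the a priori anchor criteria described in the text:
# |r| >= 0.3 with change scores, n >= 10, and <= 10% missing responses.
# Using |r| and an inclusive 0.3 cutoff are assumptions, not confirmed details.


def missing_fraction(responses):
    """Share of anchor responses that are missing (None)."""
    return sum(r is None for r in responses) / len(responses)


def meets_anchor_criteria(r, n, missing, min_r=0.3, min_n=10, max_missing=0.10):
    """Return True if a candidate anchor satisfies all three criteria."""
    return abs(r) >= min_r and n >= min_n and missing <= max_missing


# Hypothetical anchor with 2 of 20 responses missing.
responses = [1, 2, None, 3, 2, 1, 1, 2, 3, 2, None, 1, 2, 2, 3, 1, 2, 2, 1, 3]
frac = missing_fraction(responses)
print(frac)  # 0.1
print(meets_anchor_criteria(r=0.29, n=18, missing=frac))  # False: r just below 0.3
print(meets_anchor_criteria(r=0.35, n=18, missing=frac))  # True
```

The r=0.29 case mirrors the marginal Fatigue result for the GIC in JIA. The sample size and missingness values here are hypothetical.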
Further complicating the MSD estimates, the observed correlations between the GIC and scores at T2 were typically stronger than those at T1. This suggests that children’s current state at T2 (when the GIC is collected) may have influenced their GIC responses more than change over time in a given domain. This phenomenon has frequently been observed in adult samples [24, 25] and in pediatric studies using daily diary reports [26]. For patients with SLE, the GIC performed better as an anchor than it did in JIA. Even with smaller sample sizes, we were able to estimate MSDs for patients who improved on all four PROMIS measures.
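One way to examine the pattern described above is to compare the anchor’s correlation with T1 scores, T2 scores, and change scores. The sketch below uses hypothetical data in which a GIC rating (coded −2 to 2, higher = better) tracks the T2 Mobility score exactly. This is an extreme case of the current-state influence discussed above. A heuristic sometimes used is that a rating driven purely by change should correlate with T1 and T2 scores with similar magnitude and opposite sign. The helper `pearson_r` and all data values are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical Mobility T-scores (higher = better) and GIC ratings collected at T2.
t1 = [52.0, 44.0, 51.0, 49.0, 55.0]
t2 = [40.0, 45.0, 50.0, 55.0, 60.0]
gic = [-2, -1, 0, 1, 2]
change = [b - a for a, b in zip(t1, t2)]

r_t1 = pearson_r(gic, t1)
r_t2 = pearson_r(gic, t2)
r_change = pearson_r(gic, change)
print(f"r(GIC, T1) = {r_t1:.2f}, r(GIC, T2) = {r_t2:.2f}, r(GIC, change) = {r_change:.2f}")
```

In this toy example, the rating correlates perfectly with T2. Its correlation with T1 is only moderate and has the same sign, not the opposite one. Under the heuristic above, this pattern points to current state rather than change.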
Clinician- and parent-reported anchors also performed differently across the four PROMIS measures and disease groups. The composite clinical disease activity measures performed best. The JADAS met criteria for Fatigue, Mobility, and Pain Interference in children with JIA, and the SLEDAI met criteria for Mobility, Pain Interference, and Physical Activity in children with SLE. The global measures met criteria for some domains. The physician global measure of disease activity met criteria as an anchor for Fatigue in children with SLE and for Mobility and Pain Interference in children with JIA. Changes in the parent global measure of disease activity also met anchor criteria for Mobility and Pain Interference in JIA, but only for Pain Interference in SLE. These differences between JIA and SLE may reflect the salient symptoms that clinicians and parents consider when responding to global questions about disease activity. Active joint count met criteria as an anchor for Mobility in children with JIA, which makes sense clinically.
When calculable, MSD values varied across domains, diagnoses, and direction of change (improvement vs worsening; Table 6). Notably, the standard deviations for all MSD values were quite large, reflecting heterogeneity in the sample, small sample sizes, and low confidence in the point estimates. With the GIC as the anchor for patients with SLE, the MSDs for patients who got “better” exceeded 4.5 points in magnitude. These values are comparable to some of the minimal detectable changes (MDCs) calculated using distribution-based methods (e.g., Fatigue MDC = 5.7 points vs Fatigue MSD = −5.4 points). However, this did not hold in all cases. Variability existed even within a given domain, anchor, and diagnosis, depending on the patient’s direction of change (e.g., $\mathrm{MSD}_{\mathrm{JIA,\,GIC,\,Fatigue}}^{\mathrm{better}} = -0.3$ vs $\mathrm{MSD}_{\mathrm{JIA,\,GIC,\,Fatigue}}^{\mathrm{worse}} = 7.2$). This wide range of MSDs makes it difficult to use these estimates confidently in decision-making.
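The sketch below illustrates the MSD-versus-MDC comparison by computing a distribution-based minimal detectable change. It uses one common formulation: MDC95 = 1.96 × √2 × SEM, with SEM = SD × √(1 − reliability). This section does not give the formula behind the study’s MDCs, so the formula, the function names, and the SD and reliability inputs are assumptions. Only the Fatigue values (MDC = 5.7, MSD = −5.4) come from the text.

```python
from math import sqrt


def mdc95(sd_baseline: float, reliability: float) -> float:
    """Minimal detectable change at 95% confidence (one common formulation, assumed)."""
    sem = sd_baseline * sqrt(1.0 - reliability)  # standard error of measurement
    return 1.96 * sqrt(2.0) * sem


def exceeds_mdc(msd: float, mdc: float) -> bool:
    """Compare an MSD's magnitude with an MDC; direction of change is ignored."""
    return abs(msd) >= mdc


# Hypothetical inputs: baseline SD of 10 T-score points, reliability of 0.75.
print(round(mdc95(10.0, 0.75), 2))  # 13.86
# Fatigue values reported in the text for SLE with the GIC anchor.
print(exceeds_mdc(-5.4, 5.7))  # False
```

The −5.4-point Fatigue MSD is close to the 5.7-point MDC but smaller than it. This is consistent with the text describing the two as comparable rather than describing the MSD as exceeding measurement noise.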
In previous work, Thissen et al. (2016) established potential values for meaningful change in PROMIS Pediatric measures using scale-judgement methods. Their sample was children and adolescents diagnosed with cancer, asthma, sickle cell disease, and nephrotic syndrome. Using this method, a change of approximately 3 points was defined as ‘important’ for the PROMIS Pediatric Depressive Symptoms, Pain Interference, Fatigue, and Mobility scales [8]. For domains where the GIC met a priori criteria as an anchor, the MSDs identified using our method were generally larger in magnitude than those from the scale-judgement method. Another study used standard-setting methodology to identify minimally important differences for patients with JIA [9]. That study team reported variation in thresholds by severity of initial status, domain, and type of measure (clinician, parent, and patient), similar to the variation observed in our data.
Both JIA and SLE are chronic, inflammatory conditions that present with periodic “flares” of disease activity. PROMIS scores offer opportunities for standardized, patient-oriented assessment in clinical care. Without established MSDs, however, it is difficult to interpret changes in scores and to decide on an appropriate response (e.g., starting or stopping a medication). In this study, two factors likely limited the performance of the candidate anchors. The first was the stability of the domains over the 6-month study period, as seen in registry studies where disease is well controlled. The second was small sample sizes, particularly for participants who “worsened” on each domain and for those with SLE. As recommended previously [9], longitudinal qualitative work may be more useful for these purposes and would allow more advanced methods. Such work would identify meaningful changes in a relatively large and diverse set of patients surveyed at more frequent intervals. Ideally, this type of data collection could elicit heterogeneity in how stakeholders define meaningfulness. Further, if data were tied to a known intervention or other use case (e.g., before and after experiencing a flare and receiving treatment), detailed information could be gathered regarding context-specific MSDs. Meaningfulness could also be conceptualized differently in different subgroups of patients and cultures, which was not explored in the current study. It is also possible that the six-month follow-up period affected the accuracy of children’s recall of change in the studied domains [29].
In conclusion, many of the candidate variables in this study performed poorly as anchors. Notably, the GIC variables most often did not meet criteria for use despite strong conceptual overlap with the HRQOL measures, especially for patients with JIA. This is an important contribution to the field, as GICs are often cited as the top anchors [1]. For observational studies with a similar follow-up period, disease activity indices (JADAS and SLEDAI) may be more useful as anchors, as they had the best overall performance. Unsurprisingly, when anchor variables met pre-specified criteria, the choice of anchor had a strong impact on the estimated MSD values. These values differed across PROMIS Pediatric measures (Mobility, Fatigue, Pain Interference, and Physical Activity), diagnoses (SLE vs JIA), and direction of change (better vs worse). The estimated MSDs also differed from those reported in other studies using different methods, and between JIA and SLE, indicating that disease-specific estimates of what is ‘meaningful’ are needed. Researchers and clinicians should carefully consider which anchors (if any) provide information appropriate for their specific context of use, and whether a range of MSDs would be most helpful. Further, researchers and clinicians designing studies to identify MSDs that are meaningful to patients should carefully consider candidate anchor variables and identify sources of heterogeneity in these value judgements.
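The suggestion to consider a range of MSDs can be sketched as follows. For one measure, diagnosis, and direction of change, keep only estimates from anchors that met the pre-specified criteria, then report their minimum and maximum. The function name, anchor names, and values below are hypothetical placeholders, not results from this study.

```python
def msd_range(estimates):
    """Return (min, max) of MSDs from anchors that met criteria, or None if none did.

    `estimates` maps an anchor name to a (msd, met_criteria) pair for a single
    measure, diagnosis, and direction of change.
    """
    kept = [msd for msd, met in estimates.values() if met]
    if not kept:
        return None
    return min(kept), max(kept)


# Hypothetical MSDs (T-score points) for one measure among patients who improved.
estimates = {
    "anchor_A": (-5.0, True),
    "anchor_B": (-3.5, True),
    "anchor_C": (-1.0, False),  # excluded: did not meet anchor criteria
}
print(msd_range(estimates))  # (-5.0, -3.5)
```

Returning None when no anchor qualifies makes explicit that, as in several scenarios here, no defensible MSD may be available.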
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.