Introduction
The development of the Patient-Reported Outcome Measurement Information System (PROMIS®) [1, 2] was supported by a National Institutes of Health initiative to evaluate and monitor physical, mental, and social health. PROMIS measures are constructed following established guidelines for the development and evaluation of patient-reported outcome measures (PROMs) [3]. The PROMIS library includes a variety of measurement options, including numerous domain measures that can be administered individually as fixed-length short forms, as well as pre-packaged forms called PROMIS Profiles. The Profiles assess seven multi-item domains (anxiety, depression, fatigue, pain interference, physical function, sleep disturbance, and ability to participate in social roles and activities) and include a single-item measure of pain intensity. The seven domains are assessed with 4, 6, or 8 items each, resulting in Profile measures of 29, 43, or 57 items [4]. Moreover, the domains can be combined into mental and physical health summary scores [5, 6]. The PROMIS Profiles have seen increased adoption in clinical and health research settings [4, 7] given their efficiency, flexibility, and sensitivity, as well as their ability to characterize specific health domains and generate summary scores [8].
Until recently, the shortest Profile options were the PROMIS-29 and the PROMIS-29 + 2 [2, 4], the latter adding two items for the cognitive function-abilities domain. However, concerns over the burden of overall length may lead some to opt for shorter alternatives such as the Global-10 [9]. While shorter, the Global-10 lacks the domain score specificity of the longer PROMIS-29. In response, Edelen et al. [10] developed the PROMIS-16, an ultra-brief measure spanning the same eight health-related quality of life domains as the PROMIS-29 + 2, using only two items per domain. Because this measure is minimally burdensome and can generate domain-specific scores, physical and mental summary scores, and PROMIS preference scores [6, 10], it has utility in research and routine clinical care. Initial validation work by Zeng et al. [11] found that correlations between corresponding PROMIS-16 and PROMIS-29 + 2 scores were strong and that mean scores were similar. However, no studies have yet compared longitudinal change in PROMIS-16 and PROMIS-29 + 2 domain scores.
In clinical settings, repeated data collection allows for evaluating progress or decrements in each domain. Similarly, in research applications, tracking change as a function of an intervention is critical in determining whether the intervention succeeds in changing a given outcome. The latent growth model (LGM) framework [12, 13] is well suited to examining longitudinal processes and change over time. Briefly, LGMs use repeated measurements to estimate growth factors: an intercept, most commonly the baseline score, and, at a minimum, a slope that characterizes linear change over time. With these parameters, we can identify the average initial status and the direction and magnitude of the average change over time. These methods are regularly applied to PROMIS data in intervention and observational research [14, 15] and in clinical settings [16, 17]. While shorter PROMs are appealing in clinical and research settings, the question remains whether the same observable effects or change can be recovered with the ultra-short PROMIS-16 profile.
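As a sketch of the linear growth model described above, the intercept and slope factors can be written as follows. The notation is generic and assumed here rather than taken from the source; in particular, the symbols and the equally spaced time codes 0, 1, 2 for three waves are illustrative assumptions.

```latex
\begin{align}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}, \qquad \lambda_t \in \{0, 1, 2\},\\
\eta_{0i} &= \alpha_0 + \zeta_{0i},\\
\eta_{1i} &= \alpha_1 + \zeta_{1i}.
\end{align}
```

Here $y_{ti}$ is the domain score of person $i$ at wave $t$, $\alpha_0$ is the average initial status, $\alpha_1$ is the average change per wave, and $\zeta_{0i}$ and $\zeta_{1i}$ are person-specific deviations from those averages.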
Using a sample of respondents with back pain, this work aims to evaluate the validity of the newly developed PROMIS-16 by comparing its mean baseline scores and longitudinal change estimates to those of the commonly used PROMIS-29 + 2 Profile measure. We also assess the predictive validity (i.e., the ability of a test or measurement to predict a future outcome) of the PROMIS-16 by comparing the statistical significance and magnitude of associations between several outcome measures (i.e., overall health rating and two disability indices) and latent growth factors derived from the PROMIS-16 versus the PROMIS-29 + 2. Specifically, the focus of this work is to evaluate the comparability of between-level parameter estimates (i.e., average initial status and average change) from the PROMIS-16 and PROMIS-29 + 2, as well as between-level prediction (i.e., the average change of PROMIS domains predicting average change of outcomes).
Results
Sociodemographic descriptives are presented in Table 1 to characterize the sample. The sample was primarily White (85%), with 9% identifying as Hispanic, and consisted of slightly more females (54%) than males. Over half of the participants (55%) had a bachelor’s degree or higher, and 51% had an annual income of $50,000 or more. The median age was 40 years (interquartile range: 33–51). A complete set of PROMIS-16 and PROMIS-29 + 2 domain means and standard deviations, along with longitudinal outcome descriptives, is presented in Table S1 in Supplemental Materials. We include spaghetti plots of individual trajectories for PROMIS-16 and PROMIS-29 + 2 domains, the Oswestry Disability Index (ODI), the Roland Morris Disability Questionnaire (RMDQ), and the overall health rating in Supplemental Materials.
Table 1
Demographic characteristics of the analytic sample (N = 1137)
Characteristic | n (%), unless otherwise noted |
Age (years, median, IQR) | 40 (33, 51) |
Race | |
White | 966 (85) |
Black or African American | 101 (9) |
Asian or Asian American | 80 (7) |
Native Hawaiian or Pacific Islander, Native American, Other races | 28 (2) |
Multiracial | 37 (3) |
Ethnicity | |
Non-Hispanic | 1031 (91) |
Hispanic | 106 (9) |
Gender | |
Female | 611 (54) |
Male | 519 (46) |
Transgender | 2 (0) |
Do not identify as female, male, or transgender | 5 (0) |
Education | |
Bachelor’s degree or higher | 625 (55) |
Annual income | |
Less than $49,999 | 560 (49) |
$50,000 - $99,999 | 417 (37) |
More than $100,000 | 160 (14) |
Model fit indices for the separate growth models for the PROMIS-16 and PROMIS-29 + 2, by domain, are presented in Table 2. All models fit the data well. We also computed the concordance between PROMIS-16 and PROMIS-29 + 2 random intercept and slope scores for individuals. For random intercepts, the correlations (r) between the PROMIS-16 and PROMIS-29 + 2 were as follows: Anxiety (0.95), Cognitive Function (0.87), Depression (0.98), Fatigue (0.98), Pain Interference (0.98), Physical Function (0.94), Sleep Disturbance (0.89), and Social Roles (0.95). For random slopes, the correlations between the PROMIS-16 and PROMIS-29 + 2 were as follows: Anxiety (0.95), Cognitive Function (0.61), Depression (0.91), Fatigue (0.91), Pain Interference (0.94), Physical Function (0.73), Sleep Disturbance (0.69), and Social Roles (0.68).
Table 2
Model fit indices for latent growth models by domain for both PROMIS-16 (P16) and PROMIS-29 + 2 (P29 + 2)
Domain / measure | χ² (p-value) | RMSEA | CFI | SRMR |
Anxiety | | | | |
P16 | 10.746 (p = .013) | 0.048 | 0.990 | 0.032 |
P29 + 2 | 7.613 (p = .055) | 0.037 | 0.995 | 0.014 |
Cognitive Function | | | | |
P16 | 1.203 (p = .273) | 0.013 | 1 | 0.008 |
P29 + 2 | 0.076 (p = .783) | 0 | 1 | 0.002 |
Depression | | | | |
P16 | 1.248 (p = .264) | 0.015 | 1 | 0.005 |
P29 + 2 | 3.499 (p = .061) | 0.047 | 0.998 | 0.009 |
Fatigue | | | | |
P16 | 3.084 (p = .079) | 0.043 | 0.998 | 0.010 |
P29 + 2 | 10.455 (p = .002) | 0.047 | 0.993 | 0.056 |
Pain Interference | | | | |
P16 | 1.088 (p = .297) | 0.009 | 1 | 0.005 |
P29 + 2 | 0.874 (p = .350) | 0 | 1 | 0.004 |
Physical Function | | | | |
P16 | 2.638 (p = .104) | 0.038 | 0.998 | 0.009 |
P29 + 2 | 15.455 (p = .002) | 0.060 | 0.985 | 0.022 |
Sleep disturbance | | | | |
P16 | 0.455 (p = .500) | 0 | 1 | 0.004 |
P29 + 2 | 0.081 (p = .775) | 0 | 1 | 0.002 |
Social Roles | | | | |
P16 | 0.05 (p = .823) | 0 | 1 | 0.002 |
P29 + 2 | 0.209 (p = .647) | 0 | 1 | 0.003 |
Longitudinal validity
Growth parameters from parallel process models using the PROMIS-16 and PROMIS-29 + 2 scores for each domain are presented in Table 3. Intercepts differed significantly between the PROMIS-16 and PROMIS-29 + 2 in all domains except anxiety, depression, and pain interference. That said, the magnitude of the differences was small. In the T-score metric (i.e., Mean = 50, SD = 10), the difference between intercepts ranged from 0.12 (depression) to 1.29 (physical function). When transformed to Cohen’s d effect sizes, this range was from 0.02 to 0.13, with all but two being < 0.10 and all falling below the 0.2 threshold for a small effect [31]. For the slopes (i.e., average change), on the other hand, there were no significant differences between the PROMIS-16 and PROMIS-29 + 2 in any domain. For all but two domains, there was no significant average change across the three measurement time points (i.e., the slopes were not significantly different from zero). However, on average there was a significant decrease in physical function and pain interference over time.
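To make the effect-size conversion concrete, the following minimal sketch standardizes the physical function intercept difference from Table 3 by the T-score SD of 10. Dividing by 10 is an assumption made for illustration; the source does not state which denominator was used, and the function name is hypothetical.

```python
# Physical function intercepts in the T-score metric, copied from Table 3.
P16_INTERCEPT = 48.271
P29_INTERCEPT = 46.986

# Assumption: standardize by the T-score SD (10); the source does not
# state the denominator actually used for Cohen's d.
T_SCORE_SD = 10.0


def standardized_difference(a, b, sd=T_SCORE_SD):
    """Absolute difference between two means, expressed in SD units."""
    return abs(a - b) / sd


raw_difference = abs(P16_INTERCEPT - P29_INTERCEPT)
d = standardized_difference(P16_INTERCEPT, P29_INTERCEPT)
print(f"raw difference = {raw_difference:.3f}, d = {d:.3f}")
```

Under this assumption, the largest intercept difference (about 1.29 T-score points) corresponds to roughly 0.13 SD, in line with the upper end of the range reported above.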
Table 3
PROMIS-16 (P16) and PROMIS-29 + 2 (P29 + 2) growth parameters for all PROMIS domains and Wald tests of equality constraints between profiles
Domain / measure | Intercept: estimate (SE), p-value | Slope: estimate (SE), p-value |
Anxiety | | |
P16 | 56.161 (0.285), p < .001 | 0.166 (0.117), p = .156 |
P29 + 2 | 55.995 (0.281), p < .001 | 0.083 (0.111), p = .457 |
z-score, p-value | z = 0.41, p = .341 | z = 0.51, p = .610 |
Cognitive Function | | |
P16 | 48.752 (0.236), p < .001 | 0.180 (0.121), p = .135 |
P29 + 2 | 49.511 (0.224), p < .001 | 0.136 (0.122), p = .267 |
z-score, p-value | z = -2.33, p = .019 | z = 0.26, p = .795 |
Depression | | |
P16 | 55.054 (0.292), p < .001 | -0.069 (0.117), p = .553 |
P29 + 2 | 54.936 (0.295), p < .001 | -0.064 (0.114), p = .576 |
z-score, p-value | z = 0.28, p = .779 | z = -0.03, p = .910 |
Fatigue | | |
P16 | 52.650 (0.273), p < .001 | 0.169 (0.116), p = .147 |
P29 + 2 | 53.673 (0.285), p < .001 | 0.166 (0.113), p = .144 |
z-score, p-value | z = -2.59, p = .010 | z = 0.02, p = .976 |
Pain Interference | | |
P16 | 55.265 (0.234), p < .001 | -0.249 (0.115), p = .030 |
P29 + 2 | 55.005 (0.241), p < .001 | -0.222 (0.115), p = .053 |
z-score, p-value | z = 0.77, p = .441 | z = -0.17, p = .865 |
Physical Function | | |
P16 | 48.271 (0.239), p < .001 | -0.539 (0.097), p < .001 |
P29 + 2 | 46.986 (0.255), p < .001 | -0.610 (0.094), p < .001 |
z-score, p-value | z = 3.68, p < .001 | z = 0.53, p = .596 |
Sleep disturbance | | |
P16 | 52.250 (0.240), p < .001 | 0.150 (0.112), p = .180 |
P29 + 2 | 53.187 (0.262), p < .001 | 0.058 (0.114), p = .612 |
z-score, p-value | z = -2.64, p = .008 | z = 0.58, p = .562 |
Social Roles | | |
P16 | 50.029 (0.263), p < .001 | 0.022 (0.116), p = .846 |
P29 + 2 | 50.994 (0.268), p < .001 | -0.111 (0.113), p = .325 |
z-score, p-value | z = -2.57, p = .010 | z = 0.82, p = .412 |
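The comparison rows in Table 3 can be illustrated with a simple z statistic for the difference between two estimates. This is a minimal sketch that treats the two estimates as independent; it is not necessarily how the Wald tests in the source were computed, and the function name is hypothetical. Applied to the physical function intercepts, it gives a z value that rounds to the tabulated 3.68.

```python
import math


def z_test_difference(est_a, se_a, est_b, se_b):
    """z statistic and two-tailed p-value for est_a - est_b.

    Assumption: the two estimates are treated as independent, so the
    standard error of the difference is sqrt(se_a**2 + se_b**2).
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Physical function intercepts and standard errors, copied from Table 3.
z, p = z_test_difference(48.271, 0.239, 46.986, 0.255)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The sign of z depends on the order of subtraction, here PROMIS-16 minus PROMIS-29 + 2.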
Predictive validity
Table 4 presents regression coefficients for changes (i.e., slopes) in pain interference and physical function predicting changes (i.e., slopes) in the ODI, RMDQ, and overall health rating. The regression coefficients did not differ significantly between the PROMIS-16 and PROMIS-29 + 2 in any analysis.
Table 4
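Predictive validity regression coefficients for PROMIS-16 (P16) and PROMIS-29 + 2 (P29 + 2) slopes predicting outcome slopes
Outcome | Pain Interference: P16 | Pain Interference: P29 + 2 | Physical Function: P16 | Physical Function: P29 + 2 |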
Oswestry Disability Index | | | | |
| 2.69 (0.61) p < .001 | 2.63 (0.58) p < .001 | -2.39 (0.54) p < .001 | -2.78 (0.64) p < .001 |
z-score, p-value | z = -0.07, p = .94 | z = -0.46, p = .64 |
Roland Morris Disability Questionnaire | | | | |
| 1.30 (0.43) p = .002 | 1.29 (0.44) p = .003 | -1.03 (0.38) p = .007 | -1.28 (0.54) p = .017 |
z-score, p-value | z = 0.02, p = .98 | z = 0.37, p = .71 |
Overall health rating | | | | |
| -0.14 (0.10) p = .169 | -0.14 (0.10) p = .163 | 0.08 (0.03) p = .009 | 0.11 (0.08) p = .127 |
z-score, p-value | z = -0.01, p = .99 | z = -0.37, p = .71 |
Discussion
This paper expands prior work by longitudinally validating the eight PROMIS-16 domains across three waves of data by comparing performance to the PROMIS-29 + 2 in a sample of individuals with chronic low back pain. Across all eight domains, the PROMIS-16 captured comparable average baseline scores and changes over time. Additionally, random intercept and slope scores for individuals were highly correlated between the PROMIS-16 and PROMIS-29 + 2. Further, mean changes in physical function and pain interference represented as slopes comparably predicted mean change in ODI scores, RMDQ scores, and overall health ratings. Preliminary between-level results support the longitudinal validity and utility of the ultra-short PROMIS-16 profile domain scores for use in clinical settings and research.
Average baseline scores (i.e., intercepts) for the corresponding PROMIS-16 and PROMIS-29 + 2 domains were significantly different in five of the eight domains, but the effect sizes were minimal. Given the relatively large sample size, the significant differences likely reflect high power to detect even trivial differences. Further, there was variability in the magnitude of the discrepancies. For instance, much smaller differences were observed for depression, anxiety, and pain interference than for physical function, fatigue, and social roles. That said, given that PROMIS domain scores are on the T-score metric (Mean = 50, SD = 10), the largest average baseline difference was 0.13 SDs, an arguably ignorable difference. On the other hand, mean change over time was always statistically comparable between the PROMIS-16 and PROMIS-29 + 2, indicating that the two-item PROMIS-16 domain scores were sensitive enough to produce comparable average change estimates over the 6 months.
This is promising for clinical and research settings. Short forms are appropriate and desirable in research settings if a reliable and valid discrete single domain is measured [32]. Prior psychometric work has demonstrated the reliability of PROMIS-16 domain scores (eight discrete domains) [9], and this study builds on that work by providing evidence supporting the longitudinal and predictive validity of the PROMIS-16. Additionally, short forms have a place in clinical screening, in which time burden, cost, and resources are major considerations [30]. The PROMIS-16 is a psychometrically sound fixed-length short profile measure that limits patient or respondent burden and the associated data collection costs while producing domain score change estimates comparable to longer, more time-intensive measures.
It is important to note that while average change estimates were statistically comparable between the PROMIS-16 and PROMIS-29 + 2, this was observed with three time points over 6 months. Research is needed to evaluate and compare average change over a longer period and to consider possible non-linear change. Moreover, this work focused primarily on between-level effects. Future work should examine more nuanced within-person change, perhaps in a clinical sample where more change and individual variation are expected. Additionally, there was limited variability in overall health ratings over time, perhaps because a 5-point scale was used and scores tended to be relatively stable. This may have contributed to the lack of prediction between slopes for PROMIS domains and overall health rating. While results were similar between the PROMIS-16 and PROMIS-29 + 2, longer forms will have greater precision, particularly in the extremes of the distribution. Further, this study used an online nonprobability convenience sample of respondents with chronic low back pain (CLBP). Future work is needed to evaluate the predictive validity of the PROMIS-16 and PROMIS-29 + 2 in different samples, including the general population and those with health conditions other than CLBP. It is also worth noting that while respondents in this sample reported chronic low back pain, they were not necessarily receiving treatment or exposed to an event that might affect pain. This may explain why we did not see mean changes over time in several domains. As such, future work should consider evaluating longitudinal validity in a clinical sample receiving treatment to determine whether changes in other domains are comparable between the PROMIS-16 and PROMIS-29 + 2.