Introduction
Short-term toxicities can arise at any time during cancer treatment, and long-term and late effects can persist or arise beyond active treatment. The purpose of the Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE®) is to capture symptomatic adverse events by patient self-report in cancer clinical trials [1]. PRO-CTCAE is designed to complement clinician adverse event reporting using Common Terminology Criteria for Adverse Events (CTCAE).
A prior study demonstrated that each ordinal response choice for PRO-CTCAE distinguished respondents with meaningfully different symptom experiences [2]. Because each CTCAE grade can inform clinical actions, a prior study used any 1-point score change in PRO-CTCAE as a meaningful change [3]. However, empirical evidence is lacking to support the assumption that a score change of one point is meaningful. The design, analysis, and interpretation of studies using PRO-CTCAE can be enhanced by deriving meaningful change thresholds (MCTs) to identify important changes for individuals.
Change scores in the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (EORTC QLQ-C30), a 30-item instrument for assessing quality of life in cancer patients, have been frequently used as anchors to derive clinically important differences and meaningful change thresholds for other patient-reported outcome measures [4, 5]. An individual-focused approach requires an anchor at the individual level to determine which changes are important for individual patients [6]. Patients can have their own individual thresholds of a minimally important change (MIC) and the mean of these individual MIC thresholds can be conceptualized as the average within-person MIC [7, 8]. Using the EORTC QLQ-C30 as the anchor and the predictive modeling method, in this analysis we derived minimally important individual-level change thresholds for worsening in PRO-CTCAE.
Discussion
In this study, we found that a one-point change reflects meaningful individual-level worsening for all PRO-CTCAE items investigated. Across different symptomatic adverse events, we demonstrated, using two datasets, that PRO-CTCAE MIC estimates for worsening were 0.53 or below, with the upper limits of the 95% confidence intervals at 0.66 or below. Given the ordinal nature of the PRO-CTCAE items, for the subset of PRO-CTCAE items examined in this study, a one-point increase represents a minimally important individual-level change that reflects meaningful worsening. Similarly, across symptomatic adverse events and datasets, the MIC estimates for the composite scores were 0.51 or below, with the upper limits of the 95% confidence intervals at 0.63 or below. Therefore, the minimally important individual-level worsening in a PRO-CTCAE composite score would also be a one-point increase.
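The step from a fractional MIC estimate to a one-point threshold can be sketched as follows. This is an illustrative sketch rather than the authors' procedure: the helper name `smallest_meaningful_step` is hypothetical, and it assumes that observed and composite scores move only in whole-point steps, so that the smallest observable change at or above the MIC is the next whole step.

```python
import math


def smallest_meaningful_step(mic: float, step: float = 1.0) -> float:
    """Return the smallest observable change (a multiple of `step`) that is >= mic.

    Hypothetical helper: assumes the scale changes only in fixed increments.
    """
    if mic <= 0:
        raise ValueError("mic must be positive")
    return math.ceil(mic / step) * step


# Largest point estimates and upper 95% confidence limits quoted in the text.
reported_bounds = {
    "item MIC, largest estimate": 0.53,
    "item MIC, largest upper 95% limit": 0.66,
    "composite MIC, largest estimate": 0.51,
    "composite MIC, largest upper 95% limit": 0.63,
}

for label, value in reported_bounds.items():
    print(f"{label}: {value} -> {smallest_meaningful_step(value)} point(s)")
```

Under this assumption, every reported bound falls below one, so each maps to the same one-point change.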
The magnitudes of the MICs in PRO-CTCAE observed scores and composite scores were somewhat smaller in the PRO-TECT data compared to the validation data, even though the proportion of patients who worsened on anchor measures was very similar between the two datasets, as shown in Figs. 1 and 2. If there were more ceiling effects in the PRO-TECT data, with a higher proportion of respondents reporting extreme symptoms at the top of the scale, there would be little to no room for them to report further decline. However, there was little evidence of ceiling effects in either dataset: in the PRO-TECT data, 0.2% to 7.1% of patients selected the response option indicating the worst symptoms, and in the validation data, the range was 0.1% to 7.0%.
A possible explanation for the variability in MICs could be differences in tumor site, treatment type, performance status, or other sociodemographic characteristics. For instance, the PRO-TECT sample was slightly older on average (61.7 compared to 58.0) and had a higher proportion of female patients (62.8% compared to 56.3%), a higher proportion of white patients (80.0% compared to 72.1%), and a lower proportion of college graduates (24.4% compared to 44.4%). Clinically, all patients in the PRO-TECT trial had metastatic cancer, and there were higher proportions of patients with lung or GI cancer compared to those in the validation study. Treatment characteristics were not directly comparable, because the PRO-TECT trial collected information on lines of systemic cancer treatment, including intravenous and oral delivery methods, whereas the validation study collected information on radiation therapy (RT), surgery, and chemotherapy.
Alternatively, the smaller MICs in the PRO-TECT trial could be due to the study design, which incorporated alerts to patients whenever they reported a concerning symptom. The magnitude of change scores on the anchor measures among the decliners was very similar between the two datasets. For example, the mean score change among decliners on the EORTC QLQ-C30 constipation scale was 38.3 (SD of the change score = 15.6, n = 67) in the PRO-TECT data, which was very similar to 40.6 (SD of the change score = 16.0, n = 143) in the validation data (t = 0.99, two-tailed p = 0.33). Among these patients who meaningfully worsened on the EORTC constipation anchor, those in the PRO-TECT trial had a much smaller average change (0.22) in the PRO-CTCAE constipation (S) item compared to the validation data (0.91). Accordingly, the MIC estimate for the constipation (S) item was small (0.10) in the PRO-TECT data compared to the validation data (0.49). Supplementary Figs. 1 and 2 illustrate the PRO-CTCAE score distributions at baseline and follow-up for both data sources. Notably, there was a higher prevalence of zeros and ones at follow-up in the PRO-TECT trial.
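The comparison of anchor change scores among decliners can be approximately reproduced from the summary statistics quoted above. The sketch below is an assumption-laden illustration: the text does not say which variant of the two-sample t test was run, so Welch's unequal-variance form is used here, and the function name `welch_t_from_summary` is hypothetical. It computes only the test statistic and the Welch-Satterthwaite degrees of freedom, using the rounded means and SDs from the text.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and degrees of freedom from summary statistics.

    Hypothetical helper; inputs are group means, standard deviations, and sizes.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t_stat = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# EORTC QLQ-C30 constipation change scores among decliners, as quoted in the text.
t_stat, df = welch_t_from_summary(38.3, 15.6, 67, 40.6, 16.0, 143)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

With these rounded inputs the statistic comes out close to the reported t = 0.99. A pooled-variance test gives a slightly different value, so the agreement should be read as a plausibility check, not as confirmation of the exact test used.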
Patients in the PRO-TECT trial stayed on study for a median duration of 11.3 months, reporting symptoms weekly based on PRO-CTCAE for remote symptom monitoring, which generated alerts for severe or worsening symptoms. Interestingly, only 10% of these alerts were considered urgent by the respondents, warranting immediate contact with the cancer care team rather than waiting for the next appointment [21]. This might have influenced how patients reported their symptoms, potentially moderating their responses based on their perception of symptom severity and the need for alerts. Consequently, data collected for different purposes, such as validation versus remote symptom monitoring, could impact MIC estimates. This underscores the importance of employing multiple studies, methods, and populations to enhance our understanding and confidence in MIC estimates [6].
Because the ten EORTC QLQ-C30 scales used as anchors in this study comprise one to four items, with each item taking values from 1 to 4, the measurement scale is ordinal, with change increments (single response category movement on an item) ranging from 8.3 to 33.3 [22]. Notably, to be categorized as meaningfully changed, emotional functioning requires a score change of 16.6, corresponding to a two-increment change, while other scales require one change increment, ranging from 11.1 to 33.3. Implementing the 10-point threshold, the smallest change categorized as meaningful was seen in the fatigue scale (11.1), followed by emotional functioning (16.6), cognitive functioning (16.7), nausea/vomiting (16.7), and pain (16.7). This could explain the smaller MICs observed in fatigue (0.19 and 0.24), anxiety (0.30 to 0.37), concentration (0.25 and 0.26), and pain (0.30 to 0.34). For PRO-CTCAE items matched with single-item anchor measures with a larger increment (33.3), MIC estimates in the validation data ranged from 0.35 to 0.53, averaging 0.45. Therefore, anchor measures that allow for more granular increments of change might yield smaller MIC estimates, making the current MIC estimates conservative.
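The increment arithmetic described above can be worked through in a few lines. This sketch assumes the standard linear transformation of a scale's mean item score onto a 0 to 100 range, so that one response-category movement on one item shifts an n-item scale by 100 / (3n). The function names are hypothetical, and the loop runs over item counts rather than named scales.

```python
import math

ITEM_RANGE = 3      # each item is scored 1 to 4, so its range is 4 - 1
THRESHOLD = 10.0    # the >= 10-point change treated as meaningful in the text


def change_increment(n_items: int) -> float:
    """Scale-score change from a one-category movement on a single item."""
    return 100.0 / (n_items * ITEM_RANGE)


def smallest_meaningful_change(n_items: int, threshold: float = THRESHOLD):
    """Return (number of increments, score change) for the smallest change >= threshold."""
    inc = change_increment(n_items)
    steps = math.ceil(threshold / inc)
    return steps, steps * inc


for n_items in (1, 2, 3, 4):
    steps, change = smallest_meaningful_change(n_items)
    print(
        f"{n_items} item(s): increment {change_increment(n_items):.1f}, "
        f"{steps} increment(s) needed, smallest meaningful change {change:.1f}"
    )
```

Only the four-item case needs two increments to reach the 10-point threshold, which matches the description of emotional functioning. The unrounded value is 200/12, about 16.67, so the 16.6 quoted in the text appears to be a matter of rounding or truncation.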
An important caveat of the current study lies in its focus on a subset of PRO-CTCAE items that are correlated with EORTC QLQ-C30 scales. This investigation employed EORTC QLQ-C30 scales and interpreted a ≥ 10-point change as reflecting meaningful worsening. Future studies could explore alternative anchors, such as patients’ global ratings of change or clinical anchors measured at baseline and later follow-up. Future studies should employ other methods and larger samples in different trial contexts to confirm and extend our observations. Lastly, the FDA [23, 24] has recommended applying both qualitative and quantitative research methods to study and triangulate data on what is important to patients. This approach opens the door for further research into eliciting patient definitions or evaluations of meaningful changes on PRO-CTCAE items through interviews, focus groups, or surveys [25–28], ensuring a comprehensive understanding by integrating multiple data sources.