Introduction
Crohn’s disease (CD) is a relapsing type of inflammatory bowel disease (IBD) that affects the gastrointestinal tract [
1,
2]. Several working definitions for CD severity levels have been proposed, generally based on clinical symptoms and mucosal inflammation, but no universal definitions exist [
3]. Common symptoms include abdominal pain, bowel urgency, bowel obstruction, and diarrhea [
1]. Fatigue, an overwhelming sense of continued tiredness, lack of energy, or exhaustion [
4], is also increasingly recognized as a key symptom of CD [
5,
6].
Fatigue is more prevalent in patients with CD (35–86%) than in the general population [
7‐
10]. Around one in ten patients with CD may have persistently high fatigue over time [
11]. Fatigue significantly impacts quality of life and is among the most distressing and burdensome symptoms of CD [
7,
11‐
15]. Specifically, fatigue is associated with lack of motivation, limited social activities, disability, decreased physical function, limited work productivity, and increased stress and disease-related worries in CD [
7,
11,
13‐
15]. Qualitative evidence suggests that fatigue control is among the attributes that patients with IBD value the most in a treatment [
4]. Common risk factors for fatigue in CD include depression, anxiety, sleep disturbances, vitamin/mineral deficiencies, and anemia [
7]. Fatigue may be particularly important in patients with more severe CD, as it is associated with the level of disease activity in IBD [
16].
Measuring fatigue can be complex and subjective as it is variable, multifaceted, and poorly understood [
13,
17‐
20]. The Functional Assessment of Chronic Illness Therapy–Fatigue (FACIT-Fatigue) is a 13-item composite patient-reported outcome (PRO) that assesses the severity and impact of various aspects of fatigue over the previous week [
13,
21]. The FACIT-Fatigue has been used in multiple chronic conditions including IBD, and previous research has evaluated its psychometric performance in CD [
13,
14,
18,
22]. However, evidence in moderately to severely active CD is limited. Previous analyses differed in their target patient population and anchors used or had restricted sample size and study duration, all of which could limit the applicability of psychometric findings to this population. In addition, it is important to assess what constitutes clinically meaningful improvement in fatigue, as not all score changes in a PRO measure imply clinical benefit [
23]. Because the FACIT-Fatigue score change that patients would perceive as meaningful improvement may differ between patient populations [
24], thresholds for meaningful improvement should be studied for each target population.
Aim
The present analysis used Phase 3 study data from patients with moderately to severely active CD, for whom fatigue could be more severe. It expands on previous evidence of the measurement properties of the FACIT-Fatigue, determines score change thresholds representing a meaningful improvement in this population, and provides further support for using the FACIT-Fatigue in CD.
Methods
Study population and design
This study used data from VIVID-1 (NCT03926130), a Phase 3, randomized, double-blind, active- and placebo-controlled study evaluating the safety and efficacy of mirikizumab in moderately to severely active CD. Participants were 18–80 years old and had a confirmed diagnosis of CD for at least 3 months prior to baseline. Participants had moderately to severely active disease defined by baseline average daily stool frequency (SF) ≥ 4 (number of very soft or liquid stools per day per Bristol Stool Form Scale type 6 or 7) and/or average daily abdominal pain (AP) ≥ 2 (scale: 0=“none”, 1=“mild”, 2=“moderate”, and 3=“severe”), as well as endoscopic evidence of mucosal inflammation based on a Simple Endoscopic Score for Crohn’s disease (SES-CD) score of ≥ 7 (or ≥ 4 for those with isolated ileal disease; scale: 0–56, where higher scores indicate more severe disease). Inclusion criteria also required failure on prior biologic and/or non-biologic therapy, defined as inadequate response, loss of response, or intolerance. Participants were randomly assigned in a 6:3:2 ratio to receive mirikizumab, ustekinumab, or placebo for a total of 52 weeks. Consenting participants were trained on using tablet devices (for assessments completed at study visits) and electronic daily diaries (for assessments completed at home). Alarm reminders were used for the latter.
Assessments
FACIT-Fatigue
The FACIT-Fatigue [
13,
21] was completed electronically on a tablet device at the Week 0 (Baseline), Week 12, and Week 52 visits. The recall period was the past 7 days. The 13 items in the FACIT-Fatigue under evaluation (version 4) are: 1 (“I feel fatigued”), 2 (“I feel weak all over”), 3 (“I feel listless [‘washed out’]”), 4 (“I feel tired”), 5 (“I have trouble starting things because I am tired”), 6 (“I have trouble finishing things because I am tired”), 7 (“I have energy”), 8 (“I am able to do my usual activities”), 9 (“I need to sleep during the day”), 10 (“I am too tired to eat”), 11 (“I need help doing my usual activities”), 12 (“I am frustrated by being too tired to do the things I want to do”), and 13 (“I have to limit my social activity because I am tired”).
Individual items are rated on a 5-point scale (0=“not at all” to 4=“very much”). The values for all items except items 7 and 8 (the two positively phrased items) are reversed by subtracting their value from 4 before summing scores for all items. The summed score is then multiplied by 13 and divided by the number of items answered to obtain the total score (range, 0–52, with higher values indicating less severe fatigue). Total scores in VIVID-1 were not calculated if eight or more items were missing.
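As a minimal sketch of the scoring rule described above, the following Python function computes a FACIT-Fatigue total score from 13 item responses. The function name, the use of `None` for unanswered items, and the `max_missing` parameter are illustrative assumptions and not part of the instrument; the default of 7 mirrors the statement that no total score was calculated when eight or more items were missing.

```python
POSITIVE_ITEMS = {7, 8}  # positively phrased items, kept as answered


def facit_fatigue_total(responses, max_missing=7):
    """Return the FACIT-Fatigue total score (0-52), or None if not scorable.

    responses: list of 13 values in 0..4, with None for unanswered items
    (a hypothetical input convention chosen for this sketch).
    """
    if len(responses) != 13:
        raise ValueError("expected 13 item responses")
    answered = 0
    raw_sum = 0
    for item_number, value in enumerate(responses, start=1):
        if value is None:
            continue
        answered += 1
        # Reverse every item except 7 and 8 so that higher = less fatigue.
        raw_sum += value if item_number in POSITIVE_ITEMS else 4 - value
    if 13 - answered > max_missing:
        return None  # too many unanswered items
    # Prorate the sum to the full 13-item scale.
    return raw_sum * 13 / answered
```

For example, a participant answering “not at all” to every negatively phrased item and “very much” to items 7 and 8 would receive the maximum score of 52 under this sketch.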
Patient global rating of severity (PGRS)
The PGRS is a single-item instrument designed to assess the participants’ rating of their overall CD symptom severity (1=“none” to 6=“very severe”) over the past 24 h. The PGRS was completed daily from screening visit to Week 52 using an electronic diary. At Baseline and at Weeks 4, 12, 16, and 52, weekly average scores were calculated by averaging the most recent 7 days in the 12 days prior to each visit, with at least 4 days of non-missing values, before rounding to the nearest integer.
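The weekly averaging rule can be sketched as follows. This is one possible reading of the rule, not the study's actual derivation code: it assumes the diary is a list ordered from oldest to newest that ends on the day before the visit, that `None` marks a missed diary day, that the "most recent 7 days" are the seven most recent non-missing entries within the 12-day window, and that ties are rounded half up (the text does not specify tie handling).

```python
import math


def pgrs_weekly_average(diary, window=12, n_recent=7, min_days=4):
    """Weekly PGRS average rounded to the nearest integer, or None.

    diary: daily PGRS scores (1-6) or None, oldest first, ending on the
    day before the visit (hypothetical input convention).
    """
    recent_window = diary[-window:]
    # Walk backwards from the visit and keep the most recent observed days.
    observed = [s for s in reversed(recent_window) if s is not None][:n_recent]
    if len(observed) < min_days:
        return None  # fewer than 4 non-missing days: no weekly average
    mean_score = sum(observed) / len(observed)
    # Assumed half-up rounding; Python's round() would round half to even.
    return math.floor(mean_score + 0.5)
```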
Patient global impression of change (PGIC)
The PGIC is a single-item instrument assessing the participants’ rating of change in CD symptoms (Likert scale: 1=“very much better,” 4=“no change,” and 7=“very much worse”) at a given timepoint compared to how they were before they started taking the medicine. The PGIC was completed via tablet at the Week 4, 8, 12, and 52 visits.
Inflammatory bowel disease questionnaire (IBDQ)
The IBDQ is a 32-item instrument that measures four domains of participants’ lives over the past 2 weeks: bowel symptoms, systemic symptoms, emotional function, and social function [
25]. Scores (1=“a very severe problem” to 7=“not a problem at all”) are summed into a total score (range, 32–224, with higher scores indicating better quality of life). The IBDQ was completed via tablet at Baseline, Week 12, and Week 52 visits. Analyses included IBDQ total score, IBDQ Bowel Function Domain score (the sum of items 1, 5, 9, 13, 17, 20, 22, 24, 26, and 29, per developer scoring instructions [
25]), and item 2 score (frequency of feeling fatigued or being tired/worn out). A ≥ 16-point increase (improvement) from Baseline in IBDQ total score was defined as IBDQ response, and a total score of ≥ 170 was defined as IBDQ remission [
26,
27].
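The IBDQ scores and derived endpoints described above can be illustrated with a short sketch. The function names and the dictionary input format (item number mapped to a 1–7 score) are assumptions made for this example; only the item list, the 16-point response criterion, and the 170-point remission criterion come from the text.

```python
BOWEL_ITEMS = (1, 5, 9, 13, 17, 20, 22, 24, 26, 29)


def ibdq_total(items):
    """Sum of all 32 item scores; items maps item number -> score (1-7)."""
    if sorted(items) != list(range(1, 33)):
        raise ValueError("expected scores for items 1-32")
    return sum(items.values())


def ibdq_bowel_domain(items):
    """Sum of the ten Bowel Function Domain items."""
    return sum(items[i] for i in BOWEL_ITEMS)


def ibdq_response(baseline_total, followup_total):
    """Response: increase (improvement) of at least 16 points from Baseline."""
    return followup_total - baseline_total >= 16


def ibdq_remission(total):
    """Remission: total score of at least 170."""
    return total >= 170
```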
36-item short form health survey (SF-36)
The SF-36 includes 36 items across eight health domains; scores can be calculated for each domain (range, 0–100, with higher scores indicating better health status). The version used here (SF-36 v2 ‘acute’) has a 1-week recall period. The questionnaire was completed via tablet at Baseline, Week 12, and Week 52 visits. Analyses included the SF-36 Vitality Domain, which provides a general measure of fatigue [
28].
Crohn’s disease activity index (CDAI)
The CDAI is an 8-item instrument comprising three patient-reported items (abdominal pain, stool frequency, and wellbeing) and five clinician-reported/laboratory items. The patient-reported items were administered daily, and the clinician-reported/laboratory items were completed at all study visits except screening and Weeks 2 and 6. CDAI remission was defined as a CDAI score < 150 [
29‐
31] and CDAI response was defined as a CDAI score reduction (improvement) of ≥ 100 points versus Baseline and/or CDAI remission [
32,
33].
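The two CDAI endpoints can be expressed as a small classification function. This is an illustrative sketch only; the function name and return format are assumptions, while the thresholds are those stated above.

```python
def cdai_status(baseline_score, current_score):
    """Classify CDAI remission and response from two CDAI scores."""
    remission = current_score < 150
    # Response: reduction of at least 100 points versus Baseline and/or remission.
    response = (baseline_score - current_score) >= 100 or remission
    return {"remission": remission, "response": response}
```

Note that, under this definition, every participant in remission also counts as a responder, even if the reduction from Baseline is smaller than 100 points.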
SES-CD
The SES-CD is a CD-specific clinician-reported instrument based on four endoscopic variables assessed across five bowel segments [
34]. The SES-CD was scored by central readers based on endoscopy recordings from clinical sites at screening, Week 12, and Week 52.
Inflammation markers
The concentrations of high-sensitivity C-reactive protein (hsCRP) from blood samples and fecal calprotectin from stool samples assessed the level of inflammation at Baseline and at Weeks 4, 8 (only hsCRP), 12, 16, 28, 44, and 52.
Psychometric analyses
The present psychometric analyses used individual participant data from VIVID-1 pooled across treatment groups. Analyses were conducted using observed cases only; developer scoring rules for handling missing data were used (as described in the Assessments section). Continuous variables were summarized with sample size, mean, standard deviation (SD), and range. Categorical variables were summarized with frequencies and percentages.
Internal consistency reliability
As the FACIT-Fatigue is a multi-item measure, the internal consistency of the FACIT-Fatigue items was evaluated at Baseline, Week 12, and Week 52. Cronbach’s alphas were calculated to evaluate internal reliability by assessing the extent to which individual items related to one another (range: 0–1, with values ≥ 0.70 considered acceptable reliability [
35]). Cronbach’s alphas were also calculated for the total score with each individual item removed. In addition, corrected item-to-total score correlations were evaluated using Spearman correlation coefficients between each individual item and the total score, with that item omitted. Correlations were considered weak (r < 0.30), moderate (≥ 0.30 to < 0.70), strong (≥ 0.70 to ≤ 0.90), or very strong (> 0.90) [
36].
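For readers less familiar with these statistics, the sketch below shows how Cronbach’s alpha and the alpha-if-item-deleted values could be computed using the standard formula α = k/(k−1) × (1 − Σ item variances / variance of the total). It is not the study’s analysis code; the function names and input layout are assumptions, and it expects complete data in which negatively phrased items have already been reverse-scored.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item-level data.

    rows: one list of item scores per participant (hypothetical layout),
    with all items already scored in the same direction.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([row[:j] + row[j + 1:] for row in rows])
        for j in range(k)
    ]
```

An item whose removal raises alpha noticeably would be a candidate for closer inspection, which is the logic behind reporting the item-deleted values.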
Convergent and discriminant validity
The convergent and discriminant validity of the FACIT-Fatigue were evaluated at Baseline, Week 12, and Week 52. Convergent validity was assessed by calculating Spearman correlations between the FACIT-Fatigue score and PGRS, IBDQ total score, IBDQ Bowel Function Domain, IBDQ item 2, SF-36 Vitality Domain, and CDAI. Discriminant validity was assessed by calculating Spearman correlations between the FACIT-Fatigue score and SES-CD, hsCRP, and fecal calprotectin. Moderate (|r| = 0.30–0.70) correlations were expected between the FACIT-Fatigue and the PGRS, IBDQ total score, IBDQ Bowel Function Domain, IBDQ item 2, SF-36 Vitality Domain, and CDAI, which are or include patient-reported measures of related concepts. Weak (|r| < 0.30) correlations were expected between the FACIT-Fatigue and SES-CD, hsCRP, and fecal calprotectin, as these are endoscopic or laboratory parameters.
Known-groups validity
The known-groups validity of the FACIT-Fatigue was assessed by comparing FACIT-Fatigue scores between subgroups based on different levels of response on PGRS, IBDQ total score (≤ or > median), and IBDQ item 2 score at Baseline, Week 12, and Week 52. Mean FACIT-Fatigue scores were compared across individual subgroups using analysis of variance (ANOVA) with Scheffé’s correction for post-hoc pairwise comparisons [
37], with the FACIT-Fatigue score as the dependent variable. Patient groups with more severe CD symptoms based on the PGRS, IBDQ total score, and IBDQ item 2 were expected to report greater fatigue severity (lower FACIT-Fatigue scores).
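To illustrate the known-groups comparison, the sketch below computes a one-way ANOVA F statistic and, for each pair of groups, the Scheffé test statistic. It is a simplified illustration with hypothetical function names and made-up inputs, not the study’s code. It stops short of p-values: each pairwise statistic would be compared with the critical value of the F distribution with (k − 1, N − k) degrees of freedom, which is not computed here.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """groups: dict label -> list of scores.

    Returns (F, mean square within, df between, df within).
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    ss_within = sum(
        (s - mean(scores)) ** 2 for scores in groups.values() for s in scores
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, ms_within, k - 1, n - k


def scheffe_statistics(groups):
    """Scheffe statistic for every pair of groups.

    A pair differs at level alpha if its statistic exceeds the critical
    value F(alpha; k - 1, N - k).
    """
    _, ms_within, df_between, _ = one_way_anova(groups)
    results = {}
    for (a, scores_a), (b, scores_b) in combinations(groups.items(), 2):
        diff = mean(scores_a) - mean(scores_b)
        standard_error_sq = ms_within * (1 / len(scores_a) + 1 / len(scores_b))
        results[(a, b)] = diff ** 2 / standard_error_sq / df_between
    return results
```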
Ability to detect change (responsiveness)
Spearman correlations for change scores from Baseline to Week 12 and Week 52 were calculated between the FACIT-Fatigue and potential anchor measures. The responsiveness of the FACIT-Fatigue was assessed by comparing mean changes in FACIT-Fatigue total score from Baseline to Week 12 and 52 with changes in anchor groups in the same period using one-way analysis of covariance (ANCOVA), adjusting for the FACIT-Fatigue score at Baseline, with Scheffé’s correction for post-hoc pairwise comparisons [
37]. The anchor groups were pre-defined by PGRS average change, PGIC categories, IBDQ response, ≥ 1-point increase on IBDQ item 2, CDAI response, or SF-36 Vitality Domain response (≥ 9.3-point increase [
38]), as well as IBDQ remission or CDAI remission.
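The idea of adjusting change scores for the Baseline score can be sketched with the classical equal-slopes ANCOVA formula: each group’s mean change is shifted by the pooled within-group regression slope times the distance of that group’s mean Baseline score from the overall mean. The function name and input layout below are assumptions for illustration, and the sketch covers only the adjusted means, not the F test or the Scheffé comparisons.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """Baseline-adjusted mean change per anchor group.

    groups: dict label -> list of (baseline_score, change_score) pairs
    (hypothetical layout). Assumes a common within-group slope.
    """
    grand_baseline = mean([x for pairs in groups.values() for x, _ in pairs])
    sxx = sxy = 0.0
    group_means = {}
    for label, pairs in groups.items():
        mean_x = mean([x for x, _ in pairs])
        mean_y = mean([y for _, y in pairs])
        group_means[label] = (mean_x, mean_y)
        # Pool sums of squares and cross-products within groups.
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return {
        label: mean_y - slope * (mean_x - grand_baseline)
        for label, (mean_x, mean_y) in group_means.items()
    }
```

The adjustment matters because participants with lower (worse) Baseline scores have more room to improve, so unadjusted mean changes could differ between anchor groups simply because their Baseline scores differ.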
Thresholds for meaningful improvement
An anchor-based approach was used to inform the threshold that would constitute meaningful within-patient improvement, with the PGRS and PGIC used as primary anchors, and IBDQ response, CDAI response, IBDQ item 2 change, and SF-36 Vitality Domain response as secondary anchors. Mean, SD, and percentile groups (10th, 25th [quartile 1], 50th [quartile 2, median], 75th [quartile 3], and 90th) for FACIT-Fatigue score were reported for participants in each of the anchor groups.
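The descriptive summary reported for each anchor group can be sketched as below. The percentile function uses linear interpolation between order statistics, which is one common convention and an assumption here, since the text does not state which percentile definition was used; the function names and input layout are likewise illustrative.

```python
from statistics import mean, stdev


def percentile(values, p):
    """Percentile (p in 0..100) by linear interpolation; other conventions exist."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def summarize_anchor_groups(changes_by_group):
    """changes_by_group: dict anchor-group label -> list of score changes."""
    summary = {}
    for label, changes in changes_by_group.items():
        stats = {"n": len(changes), "mean": mean(changes), "sd": stdev(changes)}
        for p in (10, 25, 50, 75, 90):
            stats[f"p{p}"] = percentile(changes, p)
        summary[label] = stats
    return summary
```

In an anchor-based analysis, the mean or median change in the group reporting a small but meaningful improvement on the anchor is typically the value of most interest when proposing a threshold.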
Cumulative distribution function (CDF) and probability density function (PDF) plots, estimated using kernel density estimation, were used to supplement the anchor-based method in identifying the threshold for meaningful improvement. The cumulative proportion (CDF) or probability density (PDF) was shown across a range of possible responder definitions based on the PGRS and PGIC.
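The quantities behind such plots can be sketched in a few lines. The first function gives the proportion of participants who would count as responders at a candidate threshold, and the second returns a Gaussian kernel density estimate. The Gaussian kernel and the normal-reference rule-of-thumb bandwidth are assumptions made for this sketch; the text does not state which kernel or bandwidth was used.

```python
from math import exp, pi, sqrt
from statistics import stdev


def proportion_at_or_above(changes, threshold):
    """Share of participants whose score change meets a candidate threshold."""
    return sum(c >= threshold for c in changes) / len(changes)


def gaussian_kde(changes, bandwidth=None):
    """Return a function estimating the density of the score changes."""
    n = len(changes)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed, not from the source).
        bandwidth = 1.06 * stdev(changes) * n ** (-1 / 5)

    def density(x):
        kernel_sum = sum(exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in changes)
        return kernel_sum / (n * bandwidth * sqrt(2 * pi))

    return density
```

Evaluating these functions separately for each anchor group across a grid of candidate thresholds yields the curves that are then compared visually; a good threshold separates the improved groups from the unchanged group.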
Discussion
Using Phase 3 data from a large sample of adults with moderately to severely active CD up to study Week 52, this analysis expands on previous evidence of the psychometric properties of the FACIT-Fatigue. Prior studies used different analysis methods, timepoints, and study populations [
13,
14,
18,
22]. Specifically, Tinsley et al. included participants with CD regardless of severity [
18], while Regueiro et al. (2023) and Loftus et al. analyzed data from participants with moderately to severely active CD but only up to Week 2 [
14] or Week 12 [
13]. Regueiro et al. (2022) analyzed data from participants with moderately to severely active CD up to Week 52 using Phase 2 data (and therefore a limited sample) and only evaluated the correlation of change over time between FACIT-Fatigue and other assessments [
22].
There was an increase in the FACIT-Fatigue score (indicating improved/less severe fatigue) during the study period, which is consistent with the trend of improvement in CD symptoms observed in the trial. The full range of responses was observed for each of the individual items in the FACIT-Fatigue at each timepoint, suggesting that the score range is appropriate to capture the varying degrees of fatigue severity. No ceiling effects were observed, but floor effects were observed on some items. This is consistent with previous studies of CD [
13] and other disease areas [
39], in which item 10 (“I am too tired to eat”) also showed floor effects. This item nevertheless remains relevant to the FACIT-Fatigue because it captures extremely severe fatigue and therefore helps characterize the distribution of fatigue severity across the sample/population [
39].
Given the multidimensional nature of fatigue and its variability across individuals [
17,
20], it is appropriate to use a variety of terms and experiences to assess it. The FACIT-Fatigue is a 13-item measure that demonstrated excellent internal consistency reliability in this analysis, and Cronbach’s alpha coefficients were similar to those previously reported for this measure [
13,
18]. While the Cronbach’s alphas for the total score exceeded 0.90 across study visits in this analysis, alphas did not generally increase when any of the items were removed, suggesting that all items in the FACIT-Fatigue contribute to the concept represented by the total score. Internal consistency was also supported by all corrected item-to-total score correlations being in the moderate to strong range, with no extremely high correlations reported.
The construct validity of the FACIT-Fatigue was demonstrated at Baseline, Week 12, and Week 52. As hypothesized, the FACIT-Fatigue score had moderate to strong correlations with measures of similar concepts, supporting its convergent validity. The present work also found weak correlations of the FACIT-Fatigue with endoscopic and laboratory assessments, which were expected to be less strongly correlated with the FACIT-Fatigue than other fatigue-related PROs, thus supporting the discriminant validity of the FACIT-Fatigue. These findings were consistent with previous work demonstrating the construct validity of the FACIT-Fatigue using similar anchor measures to the ones used in this study [
13,
14,
18]. They also align with analyses of Phase 2 data assessing the correlation of fatigue with other trial endpoints, which showed that changes in fatigue in CD correlate with changes in PROs but not with endoscopic and inflammatory biomarker improvements [
40], stressing the importance of PROs for assessing this symptom.
The FACIT-Fatigue showed strong known-groups validity as it was able to discriminate between subgroups based on PRO measures that assess disease severity, quality of life, and fatigue. These findings complement previous work showing that the FACIT-Fatigue can distinguish between subgroups of participants with active vs. inactive CD, as well as between those in remission vs. not in remission based on CDAI and IBDQ total scores [
13,
18].
The FACIT-Fatigue was also responsive to change, with moderate to strong correlations with changes in other measures. Expanding on previous work only reporting change score correlations [
13], the present analysis provides additional evidence supporting the FACIT-Fatigue’s responsiveness by comparing the amount of score change between distinct patient-reported change groups.
Anchor-based analyses of the VIVID-1 data suggested that a threshold of a 6- to 9-point increase in the FACIT-Fatigue score may represent clinically meaningful improvement in adults with moderately to severely active CD. This estimate was supported by CDF/PDF plots. Findings from this study are consistent with previously reported thresholds [
13,
41]. For instance, Loftus et al. concluded that a 7- to 10-point increase in FACIT-Fatigue score represented fatigue improvement in moderately to severely active CD and recommended a threshold of a 9-point change for determining meaningful improvement [
13]. The 9-point change threshold has been applied in clinical research to define meaningful improvement in FACIT-Fatigue in moderately to severely active CD [
42]. Of note, the presented estimates are also consistent with the notion that meaningful improvement may not require absence of the symptom, as even patients with inactive IBD or in remission often present a certain level of fatigue [
10,
14,
17].
Strengths and limitations
The psychometric analyses presented used a large data set derived from a Phase 3 clinical trial, including more participants than previous work [
13,
18,
22], and analyzed data over a long period (52 weeks). However, test-retest reliability could not be assessed because of the long intervals between measurements (although good test-retest reliability has been previously demonstrated for the FACIT-Fatigue [
13,
14,
18]). While classical test theory was used to remain consistent across analyses in submissions to regulatory agencies (e.g., FDA), item response theory may be considered for future research. As data were collected from clinical trial participants, the applicability of the findings in the community setting is unclear. Finally, although patients in the VIVID-1 trial were recruited across multiple countries [
43] and thus represent varied demographics, most study participants were White or Asian. As such, these results may not be generalizable to patients of other races.
Conclusions
This study demonstrates the internal consistency, construct validity, and responsiveness of the FACIT-Fatigue in adults with moderately to severely active CD participating in the VIVID-1 study. These psychometric results add to a growing body of evidence supporting the use of the FACIT-Fatigue to assess patients’ experience of fatigue in CD. Given the prevalence and impact of fatigue in CD, the FACIT-Fatigue provides clinicians and researchers insights into the patient’s fatigue experience and helps support a patient-centered approach to managing and assessing treatment response. The presented anchor-based analyses further support that a threshold range of 6–9 points for FACIT-Fatigue may represent clinically meaningful improvement in patients with moderately to severely active CD.
Acknowledgements
The authors thank Pablo Izquierdo, PhD, of PPD (a Thermo Fisher company) for providing medical writing support, which was funded by Eli Lilly and Company and conducted in accordance with Good Publication Practice (GPP3) guidelines (http://www.ismpp.org/gpp3).
Declarations
Competing interests
MR serves on advisory boards for AbbVie, Janssen, UCB, Takeda, Pfizer, BMS, Organon, Amgen, Genentech, Gilead, Salix, Prometheus, Eli Lilly and Company, TARGET Pharma Solutions, and Trellus. SS, AV, and FD are employees and shareholders of Eli Lilly and Company. XZ is an employee of Syneos Health, which received funding from Eli Lilly and Company in connection with this study. LS, AKK, and CC are employees of Evidera, which received funding from Eli Lilly and Company in connection with this study. VJ has received fees or grant and/or research support and/or served as a consultant and/or speaker for AbbVie, Alimentiv, Arena Pharmaceuticals, Asahi Kasei Pharma, Asieris Pharmaceuticals, AstraZeneca, Bristol Myers Squibb, Celltrion, Eli Lilly and Company, Ferring Pharmaceuticals, Flagship Pioneering, Fresenius Kabi, Galapagos NV, Genentech, Gilead Sciences, GlaxoSmithKline, Janssen, Merck, Metacrine, Mylan, Pandion Therapeutics, Pendopharm, Pfizer, Prometheus Therapeutics and Diagnostics, Protagonist Therapeutics, Reistone Biopharma, Roche, Sandoz, Second Genome, Sorriso Pharmaceuticals, Takeda, Teva, Topivert, Ventyx Biosciences, and Vividion Therapeutics.