Bookmarking is a qualitative method used to assign descriptive labels to ranges of patient-reported outcome (PROM) scores. We aimed to evaluate variability between bookmarking samples and test score ranges where there was variability in expert opinion in previous studies.
Methods
We conducted two bookmarking sessions with patients who experienced orthopaedic fractures (n = 11) and one session with orthopaedic clinicians (n = 10). Participants reviewed vignettes comprised of PROM items and responses that represented hypothetical patients with a range of severity. Vignettes were constructed for PROMIS Upper Extremity Function, Physical Function, and Pain Interference measures. Participants placed bookmarks between vignettes that reflected different levels of severity (e.g., mild, moderate). The score reflecting the midpoint between vignettes was used as the recommended threshold between categories. We evaluated the variability in thresholds across participants, bookmarking panels, and previous studies.
Results
Although patients and clinicians were not unanimous, the majority agreed on thresholds separating levels of severity for PROMIS Upper Extremity (≥ 40 = within normal limits, 30–39 = mild, 23–29 = moderate, < 23 = severe), PROMIS Physical Function (≥ 46 = within normal limits, 38–45 = mild, 26–37 = moderate, < 26 = severe), and PROMIS Pain Interference (≤ 50 = within normal limits, 51–60 = mild, 61–68 = moderate, > 68 = severe).
Conclusion
Testing new vignette scores within the same patient population enables more nuanced testing of score ranges without clear consensus and provides additional evidence for recommending thresholds for severity categories. These thresholds can be utilized to help interpret PROMIS scores from patients receiving orthopaedic care.
Notes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Patient-reported outcome measures (PROMs) are routinely utilized to better understand orthopaedic patients’ function and pain [1]. This includes use in research (e.g., comparing interventions), clinical practice (e.g., assessing a patient’s pain intensity over time), and quality evaluation (e.g., the Centers for Medicare and Medicaid Services’ Patient-Reported Outcome Performance Measures [PRO-PMs]) [2]. All of these uses involve interpreting a PROM’s score. Many strategies are used to facilitate interpretation. These include (1) using a score metric that references a specific group, thus enabling calculation of a respondent’s distance from a known mean (e.g., T-scores with mean = 50 in the general population), (2) equating a score with a percentile for a known population (e.g., a patient is in the 40th percentile for physical function), and (3) comparing a score with reference values (e.g., the mean for others with the same diagnosis and age [3, 4]). Creating interpretable labels describing the level of severity (e.g., within normal limits, mild, moderate, severe) for a specific range of scores can enhance PROM score interpretation. For example, being able to describe the level of severity pre- and post-intervention can enhance clinicians’ understanding, help set patients’ expectations, aid in shared decision making, and facilitate accurate interpretation of research results. A change in a PROM score that reflects moving from the upper boundary of the moderate range to the lower boundary of the moderate range is interpreted differently than moving from the moderate range to the normal range. Standard setting by bookmarking is one method to identify these score ranges.
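As a rough illustration of the first two interpretation strategies above (distance from a reference mean, and percentile in a reference population), the sketch below converts a T-score to standard deviation units and to an approximate percentile. It assumes the conventional T-score metric (mean 50, standard deviation 10) and, as a simplification not stated in this article, a normally distributed reference population. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: T-scores are scaled to mean 50 and SD 10 in the reference
# population, and that population is treated as normally distributed.
REFERENCE = NormalDist(mu=50, sigma=10)


def sd_units_from_mean(t_score):
    """Distance of a T-score from the reference mean, in SD units."""
    return (t_score - REFERENCE.mean) / REFERENCE.stdev


def approximate_percentile(t_score):
    """Approximate percentile of a T-score in the reference population."""
    return 100 * REFERENCE.cdf(t_score)


if __name__ == "__main__":
    t = 47.5
    print(sd_units_from_mean(t))                 # a quarter SD below the mean
    print(round(approximate_percentile(t), 1))   # close to the 40th percentile
```

Under these assumptions, a T-score of 47.5 sits a quarter of a standard deviation below the reference mean, which is roughly the 40th percentile used as an example in the text.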
Standard setting is an approach primarily utilized in educational testing to identify a score that is “good enough for a particular purpose” (e.g., pass/fail) [5]. In one standard setting approach, bookmarking, experts are assembled to identify the score cut point or threshold that indicates the boundary between categories [6]. When bookmarking is used with PROMs, hypothetical patients are described in a series of vignettes comprised of PROM items and responses. Each vignette equates to a given score on a PROM. A set of vignettes representing a range of scores is created. Experts (i.e., patients, clinicians) review this set of vignettes and assign each a label describing the level of severity. They “bookmark” the location between vignettes that is the threshold between a patient they would describe with one label (e.g., mild) from a patient they would describe with a different label (e.g., moderate). Experts then discuss their bookmark locations with each other with a goal of reaching consensus. Bookmarking has been conducted with a range of patient populations including orthopaedic fractures [7], cancer [8, 9], spinal cord injury [10], arthritis [11, 12], multiple sclerosis [13], and acquired cognitive and language disorders [14].
Recently, the Food and Drug Administration has identified bookmarking as a tool that can be used when evaluating medical products [15]. Specifically, bookmarking can facilitate interpreting changes in PROM scores over time. For example, quantifying the number of patients who moved from one category (e.g., severe) to another category (e.g., moderate) can support interpreting that change as meaningful to patients. Because this recommendation may increase the use of bookmarking, it is important to have clearer recommendations for the number of bookmarking panels and vignettes needed to establish threshold recommendations.
One limitation of PROM bookmarking is that only a single set of vignettes anchored to a set of scores (e.g., 40, 45, 50, etc.) is tested. As a result, the only available thresholds are the midpoints between those scores (e.g., 42.5, 47.5, etc.). An untested threshold may have greater expert support. Additionally, many previous PROM bookmarking studies have pressured participants to reach unanimity in bookmark placement. When unanimity was not forced, participants continued to have varying opinions after discussion [7] suggesting some thresholds may have more expert support than others.
In a previous bookmarking study, we tested vignettes with two panels of patients with fractures and two panels of orthopaedic clinicians [7]. Participants reviewed vignettes comprised of items from PROMIS Upper Extremity Function (score range T = 17.5 to T = 47.5), Physical Function (score range T = 17.5 to T = 57.5), and Pain Interference (score range T = 47.5 to T = 77.5) item banks. Participants were largely able to agree on thresholds separating function and pain that was within normal limits, mild, moderate, and severe. However, in some cases, panels were divided between two adjacent thresholds. Specifically, these were the threshold between moderate and severe for Upper Extremity Function (T = 20 or T = 25), the moderate/severe threshold for Physical Function (T = 25 or T = 30), and the within normal limits/mild (T = 50 or T = 55) and moderate/severe (T = 65 or T = 70) threshold for Pain Interference. Consequently, the current study aimed to (1) replicate the recommended thresholds and (2) further evaluate the score ranges where we previously found variability in clinicians’ and patients’ opinions.
Methods
Measures
Patient participants completed a sociodemographics survey that included age, gender, race, ethnicity, education, employment status, type of fracture, date of fracture, and fracture treatment. Clinician participants completed a similar form including age, gender, race, ethnicity, profession, and years in orthopaedic practice. To familiarize themselves with the concepts and types of items used in bookmarking, all participants rated their own functioning and pain on three PROMIS short forms: PROMIS short form v2.1 – Upper Extremity 7a, PROMIS short form v2.0 – Physical Function 10a, and PROMIS short form v1.1 – Pain Interference 8a although these data were not collected. These short forms are comprised of a subset of items from a larger parent item bank. The PROMIS Bank v2.1 – Upper Extremity includes 46 items reflecting tasks that require use of one’s shoulders, arms, and hands. The PROMIS Bank v2.0 – Physical Function includes 173 items assessing one’s capability to do physical activities and instrumental activities of daily living. The PROMIS Bank v1.1 – Pain Interference includes 40 items reflecting the degree to which pain interferes in one’s engagement in social, cognitive, emotional, physical, and recreational activities.
Vignette construction
We constructed vignettes comprised of 6 PROM items and responses selected from the parent item bank. First, we identified what T-scores we wanted to test by identifying areas where there was lack of consensus in previous research. Next, for each of these scores, we identified the most probable response for every item in an item bank. We selected items for a given vignette that reflected the full content of the item bank (e.g., activities of daily living, mobility) and had a range of response options. We scored the included items and responses as though they were completed custom short forms via the HealthMeasures Scoring Service to establish a given vignette’s T-score. We assigned a name (e.g., Ms. Hill) to each vignette. Finally, we had 13 colleagues outside of the vignette construction team rank order the vignettes by level of severity as a test that each vignette conveyed increasing levels of severity. We then identified vignettes that were ranked in the wrong location and evaluated the distance from their correct location. If 25% or more of the testers had a vignette in the wrong location or if more than one tester placed a vignette more than one position away from its true location, we revised the vignette.
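The two revision criteria in the pre-test above can be written as a small check. This is a minimal sketch, not the authors' tooling: the function name, argument layout, and the idea of coding each tester's placement as an integer rank position are all assumptions made for illustration.

```python
def needs_revision(true_position, placements):
    """Apply the two pre-test rules to one vignette.

    true_position: the vignette's correct rank (hypothetical integer coding).
    placements: the rank each tester assigned to this vignette.
    """
    # Rule 1: 25% or more of testers placed the vignette in the wrong location.
    wrong = sum(1 for p in placements if p != true_position)
    too_many_wrong = wrong / len(placements) >= 0.25

    # Rule 2: more than one tester placed it more than one position away.
    far = sum(1 for p in placements if abs(p - true_position) > 1)
    too_many_far = far > 1

    return too_many_wrong or too_many_far


if __name__ == "__main__":
    # Hypothetical placements from 13 testers for a vignette whose true rank is 3.
    print(needs_revision(3, [3] * 11 + [4] * 2))  # 2 of 13 wrong, none far
    print(needs_revision(3, [3] * 9 + [2] * 4))   # 4 of 13 wrong
    print(needs_revision(3, [3] * 11 + [5] * 2))  # 2 testers two positions away
```

In the hypothetical data, only the first vignette passes both rules; the second fails on the share of testers who misplaced it and the third on the number of testers who placed it more than one position away.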
Patient participants
Patients were recruited from an academic orthopaedic clinic in the Southeastern United States. A study team member reviewed the clinic schedule daily and conducted a chart review to determine eligibility. Eligibility criteria included sustaining an orthopaedic fracture confirmed radiographically or by an attending physician, age > 18 years old, proficiency in written and spoken English, ability to send and receive email, and ability to participate in a video conference call. In order to include patients with diverse experiences, for each group we aimed to include at least 1 male and 1 female; at least 2 Hispanic/Latinx identifying people; at least 2 African American, Asian, Native American/Alaskan Native, or multi-racial identifying people; and at least 1 patient each from specific stages of recovery (i.e., < 1 month, 1–2 months, 3–5 months, and 6–12 months post-injury).
Clinician participants
Clinical healthcare providers who treat patients with orthopaedic traumas including orthopaedic surgeons, physiatrists, nurse practitioners, physician assistants, physical therapists, and occupational therapists were eligible. Trainees were ineligible. Information about the study was sent via email to relevant departments in an academic medical center and affiliated hospital-based rehabilitation center (i.e., Departments of Orthopaedics, Physical Therapy). Additionally, study coordinators approached nurse practitioners working within orthopaedic clinics at the academic medical center. Interested clinicians were instructed to contact the study coordinator who provided additional information, determined eligibility, and completed informed consent procedures.
Procedures
We conducted two bookmarking panels with patients and one with clinicians following procedures described in detail by Cook et al. 2019 [6]. Consented participants received a study binder that included materials for the study as well as instructions for joining the videoconference. A study team member met with each patient participant prior to the study session to ensure the participant was able to use the videoconference platform. The bookmarking session began with a brief explanation of the aims of the study and a review of bookmarking procedures. Next, participants engaged in a practice exercise of setting thresholds for “fanciness” of a range of desserts. Afterwards, participants were asked to define the severity labels that would be used during the session – within normal limits, mild, moderate, and severe – to align understanding of these terms. Participants completed the upper extremity short form to become more familiar with the construct and type of questions used in PROMs.
Participants were then asked to review vignettes for upper extremity function (see Fig. 1 for example). Next, they identified which hypothetical patients they would describe as experiencing within normal limits, mild, moderate, or severe problems with upper extremity function by placing a bookmark between vignettes to separate categories of severity (see Fig. 2). They reported the location of their bookmarks in a personalized link via REDCap. The moderator then led a discussion about individuals’ bookmark placement. After discussion, participants were invited to review and revise their bookmark placement and record a final location via a personalized link in REDCap. Study procedures were repeated for physical function and pain interference. Patient participants received $100 and clinician participants $300 for their time.
Fig. 1
Upper Extremity vignette for T = 32.5
Fig. 2
Bookmarking schematic
Data analysis
We calculated thresholds between severity categories for each participant by identifying the midpoint between the vignettes where a participant placed a bookmark. For example, a bookmark between a vignette with a T-score of 16 and a vignette with a T-score of 20 equated to a threshold of T = 18 (the midpoint between 16 and 20). We created frequency distributions for the thresholds participants selected before and after discussion. At the group level, we identified the modal bookmark placement as the group’s threshold. We compared recommended thresholds between panels and with previous bookmarking studies.
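The analysis steps above can be sketched in a few lines: a participant's threshold is the midpoint between the two vignettes on either side of a bookmark, the selected thresholds are tallied into a frequency distribution, and the group threshold is the mode. The function names are hypothetical, and returning every tied mode is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter
from statistics import multimode


def bookmark_threshold(lower_vignette_t, upper_vignette_t):
    """Threshold implied by a bookmark: midpoint of the adjacent vignette T-scores."""
    return (lower_vignette_t + upper_vignette_t) / 2


def frequency_distribution(thresholds):
    """Count how many participants selected each threshold, sorted by threshold."""
    return dict(sorted(Counter(thresholds).items()))


def group_threshold(thresholds):
    """Modal threshold(s) for a panel; all tied modes are returned (assumption)."""
    return sorted(multimode(thresholds))


if __name__ == "__main__":
    # The worked example from the text: a bookmark between T = 16 and T = 20.
    print(bookmark_threshold(16, 20))

    # Hypothetical panel of five participants, for illustration only.
    panel = [18.0, 18.0, 22.0, 18.0, 22.0]
    print(frequency_distribution(panel))
    print(group_threshold(panel))
```

Running the same tally on pre-discussion and post-discussion bookmarks gives the two frequency distributions the analysis compares.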
Results
Vignettes
We created 8 vignettes each for Upper Extremity Function (T = 15.1, 20.8, 24.8, 27.6, 32.2, 36.9, 45.2, and 58.7) and Pain Interference (T = 40.6, 41.1, 53.7, 57.4, 61.7, 65.9, 69.7, and 73.7) and 9 vignettes for Physical Function (T = 19.9, 23.3, 28.9, 32.4, 35.6, 40.9, 44.5, 47.8, and 54.7). As demonstrated in Fig. 3, new vignettes were usually separated by approximately 5 T-score points and reflected previously untested scores in areas of previous disagreement [7]. In pre-testing, only one vignette was mis-ordered by more than 25% of testers. It was consequently revised before being included in the bookmarking study.
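Because a bookmark can only fall between two adjacent vignettes, the vignette scores fix the set of thresholds a participant is able to select. The sketch below lists those candidate midpoints for the Upper Extremity Function vignettes reported above. The function name is hypothetical, and the article does not state how midpoints were rounded to the reported thresholds, so the values are left unrounded beyond two decimals.

```python
# Upper Extremity Function vignette T-scores as reported in this section.
UPPER_EXTREMITY_VIGNETTES = [15.1, 20.8, 24.8, 27.6, 32.2, 36.9, 45.2, 58.7]


def candidate_thresholds(vignette_scores):
    """Midpoints between each pair of adjacent vignette scores."""
    ordered = sorted(vignette_scores)
    return [round((low + high) / 2, 2) for low, high in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for midpoint in candidate_thresholds(UPPER_EXTREMITY_VIGNETTES):
        print(midpoint)
```

The three lowest midpoints (about 17.95, 22.8, and 26.2) fall close to the T = 18, T = 23, and T = 26 values that the Discussion names as the options tested around the moderate/severe boundary.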
Fig. 3
Physical Function vignette scores with severity labels from previous study and new vignettes. Note: WNL = within normal limits. Mod = moderate
Participants
A total of 24 patients expressed interest in the study. Fifteen met eligibility criteria and completed informed consent procedures. Eleven patients attended one of two bookmarking sessions. Participants’ mean age was 43 and they were about evenly divided by gender (see Table 1). Most were white (91%) and 5 (45%) were Hispanic/Latino. The sample was highly educated with a range of employment status. The majority experienced an ankle fracture treated with internal fixation using plate and screws. Participants were a median of 3.2 months post-injury (range 0.4 to 12 months; mean = 4.3, SD = 3.6).
Nineteen potential clinician participants expressed interest in the study, of whom 18 met eligibility criteria. Eight were unable to participate as they were not available at the time of the bookmarking session. Ten clinicians including physical therapists, occupational therapists, a nurse practitioner, and an orthopaedic surgeon completed informed consent procedures and attended the bookmarking session. The clinicians’ mean age was 40 with a mean of 10 years of practice in orthopaedics (range 1 to 36 years). Sixty percent were white and 40% were Hispanic/Latino.
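Table 1
Participant demographics

| Characteristic | Patient Participants: Number | Patient Participants: Percentage | Clinician Participants: Number | Clinician Participants: Percentage |
|---|---|---|---|---|
| Gender | | | | |
| Female | 6 | 55% | 6 | 60% |
| Male | 5 | 45% | 4 | 40% |
| Mean age | 43 (SD = 18) | Range 19–74 | 40 (SD = 10) | Range 28–58 |
| Race | | | | |
| Asian | 0 | 0% | 2 | 20% |
| Black or African American | 1 | 9% | 1 | 10% |
| White | 10 | 91% | 6 | 60% |
| Other | 0 | 0% | 1 | 10% |
| Spanish/Hispanic/Latino | 5 | 45% | 4 | 40% |
| Education | | | | |
| High school or secondary school | 2 | 18% | | |
| College or vocational certificate | 6 | 55% | | |
| Post-graduate degree | 3 | 27% | 10 | 100% |
| Employment status | | | | |
| Full-time employed | 5 | 45% | | |
| Part-time employed | 1 | 9% | | |
| Unemployed | 2 | 18% | | |
| Retired | 3 | 27% | | |
| Type of fracture | | | | |
| Ankle | 8 | 73% | | |
| Tibia | 2 | 18% | | |
| Hip | 1 | 9% | | |
| Type of treatment* | | | | |
| Plate and screws | 6 | 55% | | |
| Intramedullary nail | 3 | 27% | | |
| Closed | 2 | 18% | | |
| External fixation | 1 | 9% | | |
| Median months since injury | 3.2 | Range 0.4–12 | | |
| Profession | | | | |
| Physical Therapist | | | 6 | 55% |
| Occupational Therapist | | | 2 | 18% |
| Nurse practitioner | | | 1 | 9% |
| Orthopaedic Surgeon | | | 1 | 9% |
| Mean years in orthopaedic practice | | | 10 (SD = 11) | Range 1–36 |

*Patients may have more than one type of treatment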
Thresholds between severity categories
For upper extremity, two thresholds (T = 40, T = 30) were established by our previous research [7]. In the current study, two of the three panels selected the same threshold separating within normal limits and mild (T = 40) and all three panels agreed on the threshold separating mild and moderate (T = 30; see Fig. 4). The prior lack of consensus for the threshold separating moderate and severe (T = 20 or T = 25) was resolved. All three panels agreed on a threshold of T = 23.
For physical function, our previous research supported a threshold of T = 50 separating within normal limits and mild. In this study, two of three panels selected T = 46 (see Fig. 4). Two panels and half of the participants in the third group selected T = 38 as the threshold between mild and moderate which was consistent with previous recommendations. The prior lack of consensus for the threshold separating moderate and severe (T = 25 or T = 30) was resolved. All panels selected T = 26 as the threshold.
For pain interference, past participants disagreed on using a threshold of T = 50 or T = 55 to separate within normal limits and mild. In this study, two of three panels and half of the participants in the third group labeled the T = 53.7 vignette as mild supporting use of T = 50 as the threshold between within normal limits and mild (see Fig. 4). All three panels selected T = 60 to separate mild from moderate consistent with our previous study. Two of the three panels selected T = 68 as the threshold between moderate and severe. In our previous study, participants were divided between T = 65 and T = 70.
Fig. 4
Group-level consensus on vignette severity labels by T-score
Individual change
To evaluate the impact of using moderated group discussion that did not force unanimity, we reviewed participants’ selected thresholds before and after discussion. As shown in Fig. 5, in most cases, some participants in each group modified their selected thresholds. This was true for both patients and clinicians. Post-discussion ratings had less variability than pre-discussion ratings. For patients, 5 of the 9 ratings had smaller ranges post-discussion. For clinicians, 6 of the 9 ratings had smaller ranges post-discussion.
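The pre- and post-discussion comparison above amounts to computing the range of selected thresholds for each rating and counting how many ranges became smaller. A minimal sketch follows; the function names, the dictionary layout, and all threshold values are hypothetical and are not the study's data.

```python
def threshold_range(thresholds):
    """Spread of the thresholds selected for one rating (max minus min)."""
    return max(thresholds) - min(thresholds)


def count_narrowed(pre, post):
    """Number of ratings whose range is smaller after discussion than before."""
    return sum(threshold_range(post[name]) < threshold_range(pre[name]) for name in pre)


if __name__ == "__main__":
    # Hypothetical thresholds from four participants, for illustration only.
    pre = {
        "mild/moderate": [30, 30, 35, 40],
        "moderate/severe": [20, 23, 23, 26],
    }
    post = {
        "mild/moderate": [30, 30, 30, 35],
        "moderate/severe": [20, 23, 23, 26],
    }
    print(count_narrowed(pre, post))
```

In this made-up example only the mild/moderate rating narrows (range 10 before discussion, 5 after), so one of the two ratings is counted.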
Fig. 5
Participants’ thresholds between severity categories pre- and post-discussion
Discussion
Patients with orthopaedic fractures and orthopaedic clinicians were able to reach agreement on thresholds to separate levels of severity using bookmarking methods. Recommended thresholds for PROMIS Upper Extremity scores are T = 40 (within normal limits/mild), T = 30 (mild/moderate), and T = 23 (moderate/severe). Recommended thresholds for PROMIS Physical Function scores are T = 46 (within normal limits/mild), T = 38 (mild/moderate), and T = 26 (moderate/severe). Recommended thresholds for PROMIS Pain Interference scores are T = 50 (within normal limits/mild), T = 60 (mild/moderate), and T = 68 (moderate/severe). These descriptive labels can be utilized when PROMIS Upper Extremity, Physical Function, and Pain Interference measures are used in clinical practice and research. For example, when measures are collected within routine clinical practice, the descriptive label can accompany the T-score which can facilitate accurate interpretation by both the patient and clinician. These thresholds may also be more appropriate than general PROMIS score interpretation recommendations when applied to patients with orthopaedic fractures.
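To show how the recommended thresholds could accompany a T-score in practice, the sketch below maps a score to a severity label. The cut points come from this article; the function names and dictionary keys are hypothetical, and the handling of non-integer scores (each threshold is treated as the inclusive edge of the less severe category) is an assumption, since the reported ranges are given in whole T-score points. Note that higher scores mean better function on the two function measures but more interference on Pain Interference.

```python
LABELS = ("within normal limits", "mild", "moderate", "severe")

# Thresholds recommended in this article (higher T-score = better function).
FUNCTION_CUTS = {
    "upper_extremity": (40, 30, 23),
    "physical_function": (46, 38, 26),
}

# Pain Interference thresholds (higher T-score = more interference).
PAIN_INTERFERENCE_CUTS = (50, 60, 68)


def function_label(measure, t_score):
    """Severity label for a function score; a score at a cut gets the milder label."""
    for cut, label in zip(FUNCTION_CUTS[measure], LABELS):
        if t_score >= cut:
            return label
    return LABELS[-1]


def pain_interference_label(t_score):
    """Severity label for a Pain Interference score; a score at a cut gets the milder label."""
    for cut, label in zip(PAIN_INTERFERENCE_CUTS, LABELS):
        if t_score <= cut:
            return label
    return LABELS[-1]


if __name__ == "__main__":
    print(function_label("upper_extremity", 32))    # falls in the 30-39 band
    print(function_label("physical_function", 25))  # below 26
    print(pain_interference_label(61))              # falls in the 61-68 band
```

Keeping the cut points in a single table makes it straightforward to swap in general PROMIS interpretation guidance or thresholds from another population for comparison.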
Anchoring vignettes to new scores provided (1) additional evidence for previously recommended thresholds and (2) clearer support for thresholds in areas that previously lacked consensus. For example, in our previous study, participants were evenly divided between T = 20 and T = 25 as the threshold separating moderate and severe problems with upper extremity function. Because vignettes were anchored to T-scores at 5-point intervals, participants could only choose one of these options. By creating vignettes for this study using thresholds of T = 26, T = 23, and T = 18 and observing 81% of participants selected T = 23, we can more defensibly recommend T = 23 as the threshold. To our knowledge, this is the first study that has replicated bookmarking within the same patient population with novel vignettes. This approach supports conducting bookmarking with multiple panels and evaluating the variability in selected thresholds. When there is not consensus, replication with alternate vignettes can generate evidence for a threshold recommendation.
Our results also suggest that in a moderated discussion, participants are open to changing their opinion, particularly when their opinion is an outlier. Neither patients nor clinicians were unanimous, but the majority did reach agreement. By not forcing unanimity, it can be easier to identify those score ranges that require more evidence before making threshold recommendations. The degree of variability in opinions post-discussion could be used to inform the strength of a given threshold recommendation.
This study has several limitations. First, patient and clinician participants were selected from one academic medical center and its affiliates and the sample sizes are small (a criticism inherent to all bookmarking studies). Participants were volunteers and the organizational culture may contribute to clinicians’ perspectives. Although patients with different fracture types were approached about the study, most patient participants had ankle fractures. Patients may have had less personal experience with vignette items that described tasks that use one’s hands, shoulders, and trunk. Consequently, these participants may not be reflective of the full diversity of the population of individuals who experience or treat orthopaedic fractures, particularly those with upper extremity fractures. Future research including more patients with upper extremity fractures and participants from other healthcare systems would inform the generalizability of these findings. Second, participation required use of a video conference platform. Although this modality helps eliminate the mobility challenges in attending an in-person session, it may reduce the participation of individuals without high-speed internet access or with lower levels of technology literacy.
In conclusion, specific thresholds for PROMIS Upper Extremity Function, Physical Function, and Pain Interference can be used with patients with fractures based upon high consistency in recommendations across panels in two studies. Replication of bookmarking studies with new vignettes anchored to new scores resolved areas of disagreement and strengthened recommendations for specific thresholds.
Declarations
Ethics approval and consent to participate
This study was approved by the University of Miami institutional review board. Informed consent was obtained from all individual participants included in the study.
Conflict of interest
Drs. Heng, Kaat, and Rothrock receive grant support from the AO Foundation. Authors Drandarov, Mosher, and Prado have no relevant financial or non-financial interests to disclose.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.