Data
Patient-level data were obtained from four randomised controlled trials (RCTs) in the United Kingdom: “Can Shoulder Arthroscopy Work (CSAW)”, “Proximal Fracture of the Humerus: Evaluation by Randomisation (Profher)”, “United Kingdom Rotator Cuff Trial (UKUFF)” and “United Kingdom Frozen Shoulder Trial (UKFROST)” [13–16]. These RCTs evaluated surgical and non-surgical interventions for different types of shoulder problems in different patient cohorts. All trials collected both OSS and EQ-5D responses as primary and secondary outcome measures from patients at specific time-points throughout their follow-up, ranging from baseline to 24 months following intervention (trial details can be found in the Supplementary material).
Source and target measures
The OSS is a 12-item, unidimensional PROM designed and developed for assessing outcomes following shoulder surgery [12]. A score between zero and four is assigned to each of its 12 items. Items encompass different elements of shoulder pain and the effect of shoulder function on daily activities, for example “Q1: How would you describe the worst pain you had from your shoulder?” and “Q3: Have you had any trouble getting in and out of a car or using public transport because of your shoulder?”. These individual scores are summed to give a total OSS score ranging from zero (worst outcome) to 48 (best outcome). Like the Oxford Hip and Knee scores, the OSS can be treated as a continuous variable under the assumption that it reflects levels of clinical severity [17].
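As an illustration only, the scoring rule described above can be sketched in a few lines of Python (the study's own analyses were run in R). The function name `total_oss` is hypothetical, and the sketch assumes all 12 items have been answered; handling of missing items is not covered here.

```python
def total_oss(item_scores):
    """Sum 12 OSS item scores (each 0-4) into a total from 0 (worst) to 48 (best)."""
    if len(item_scores) != 12:
        raise ValueError("the OSS has exactly 12 items")
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item score must be an integer from 0 to 4")
    return sum(item_scores)
```

A respondent scoring four on every item would receive the best possible total of 48, and one scoring zero on every item the worst possible total of 0.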
The EQ-5D is a widely used generic measure of health where a number of different health states based on five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression) can be translated into a summary health index using a preference-based valuation set. Whereas a higher value of the EQ-5D health index represents a better health state, a higher value of each individual domain score represents a poorer health state.
The EQ-5D-3L, the original version of the EQ-5D, records three levels of responses to each domain (1, 2, 3) and has an established value set for the UK population [18]. The EQ-5D-5L was developed more recently, allowing five levels of responses (1, 2, 3, 4, 5), and a value set for the English population was published in 2018 [19]. However, NICE issued a position statement in 2019 recommending that the health index for 5L data be calculated only after applying a crosswalk mapping algorithm to the 3L [20, 21]. NICE subsequently recommended further valuation studies for the EQ-5D-5L [22].
Three studies (CSAW, Profher, UKUFF) collected the EQ-5D-3L whilst UKFROST collected the EQ-5D-5L. As the primary objective of this study was focused on the 3L version, the detailed results presented here pertain to pooled data from the CSAW, Profher and UKUFF trials. The same study methodology was applied separately to data from the UKFROST trial; detailed results for those analyses are reported in Supplementary material 2 and 3 where the health index was calculated using the 5L value set directly, and the crosswalk mapping function to 3L, respectively. All analyses were implemented using the eq5d package in R.
Like existing mapping studies, we were interested in cross-sectional mapping to estimate health states without necessitating repeated per-patient follow-up observations. Therefore, we pooled all patients’ paired OSS and EQ-5D responses from the studies using the EQ-5D-3L, giving a total of 4061 paired outcome observations (CSAW: 939; Profher: 750; UKUFF: 2372). Most patients provided questionnaire responses at more than one time-point; we accounted for this clustering of observations within patients using the R packages miceadds and estimatr to produce robust standard errors for the reported model coefficients.
Models
Two categories of mapping approaches were evaluated: transfer to utility (TTU) regression and response mapping. A summary of these models is shown in Table 1 (details in Sect. 3, Supplementary material 1).
Table 1
Summary of models
Model | Description | Predictor variable(s) | Model output |
Transfer to utility (TTU) regression |
Univariate linear | Ordinary least squares linear regression | Total OSS score | Predicted EQ-5D health index |
Linear splines | Ordinary least squares linear regression with piecewise function introducing knots | Total OSS score | Predicted EQ-5D health index |
Polynomial | Squared and cubic polynomial models | Total OSS score | Predicted EQ-5D health index |
Cubic splines | Squared and cubic polynomial models with piecewise function introducing knots | Total OSS score | Predicted EQ-5D health index |
Multivariable linear | Ordinary least squares linear regression | Question-level OSS score for each of 12 questions | Predicted EQ-5D health index |
Two-part | Step 1: logistic regression to identify patients with a probability greater than or equal to 0.5 of having a health index of 1. Step 2: ordinary least squares linear regression for patients with a probability less than 0.5 of having a health index of 1 from step 1 | Total OSS score | Predicted EQ-5D health index |
Tobit | Censored regression model designed for left or right censoring in the dependent variable. Bounds for EQ-5D index used are -0.594 to 1 | Total OSS score | Predicted EQ-5D health index |
Adjusted limited dependent variable mixture model (ALDVMM) | Tailored model for mapping that replaces the underlying normal distributions with beta distributions | Question-level OSS score for each of 12 questions | Predicted EQ-5D health index |
Response mapping |
Ordered logistic regression | Ordinal regression model to predict the probability of responses 1, 2 or 3 for each EQ-5D domain | Question-level OSS score for each of 12 questions | Predicted response category (1, 2 or 3) for each EQ-5D domain |
Seemingly unrelated regression (SUR) | Simultaneous estimation of OLS linear equations to predict each EQ-5D domain response | Question-level OSS score for each of 12 questions | Predicted response category (1, 2 or 3) for each EQ-5D domain |
Regularised models (LASSO, ridge and elastic net regression) |
Multivariable linear | LASSO, ridge regression and elastic net regression regularisation techniques applied to above multivariable model | Question-level OSS score for each of 12 questions | Predicted EQ-5D health index |
Ordered logistic regression | LASSO, ridge regression and elastic net regression regularisation techniques applied to above ordered logistic regression model | Question-level OSS score for each of 12 questions | Predicted response category (1, 2 or 3) for each EQ-5D domain |
TTU regression approaches use OSS responses to predict the EQ-5D health index directly. We evaluated several TTU regression models: univariate linear, polynomial, multivariable linear, two-part logistic-linear, tobit and adjusted limited dependent variable mixture models (ALDVMM). We also investigated the effect of introducing a piecewise (spline) function, which allows the coefficients to differ over different ranges of the OSS. In our data, a considerable number of patients (17.9%) reported an EQ-5D health state of “11111”, indicating full health (i.e. a health utility index equal to 1). The univariate linear regression model would not predict a health index equal to 1, so we developed a two-part model consisting of a logistic and a linear regression component. The logistic regression component predicts the probability that a patient has an EQ-5D health index of 1, and the linear component predicts the health index of the remaining patients. Tobit models allow for a linear relationship between the OSS and EQ-5D with censoring of values at the lower and upper bounds of possible EQ-5D values. ALDVMM is a tailored model developed specifically for mapping that replaces the underlying normal distributions with beta distributions that can be used for bounded outcomes [23]. Sequential likelihood-ratio tests were used to compare nested multivariable models when reducing the number of covariates to improve model parsimony.
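The prediction step of the two-part model can be illustrated with a minimal Python sketch (the study itself used R). The function name and all coefficient values below are hypothetical placeholders chosen for illustration; they are not the fitted parameters of this study. The 0.5 probability threshold follows the description in Table 1.

```python
import math


def two_part_predict(oss_total, logit_coef, linear_coef, threshold=0.5):
    """Predict an EQ-5D index from a total OSS score with a two-part model.

    logit_coef  : (intercept, slope) of the logistic component (hypothetical)
    linear_coef : (intercept, slope) of the linear component (hypothetical)
    """
    a0, a1 = logit_coef
    b0, b1 = linear_coef
    # Part 1: probability that the patient is in full health (index of 1).
    p_full_health = 1.0 / (1.0 + math.exp(-(a0 + a1 * oss_total)))
    if p_full_health >= threshold:
        return 1.0
    # Part 2: linear prediction for patients not classified as full health.
    return b0 + b1 * oss_total


# Illustrative coefficients only, not estimates from the study.
HYPOTHETICAL_LOGIT = (-8.0, 0.2)
HYPOTHETICAL_LINEAR = (0.1, 0.015)
high_scorer = two_part_predict(48, HYPOTHETICAL_LOGIT, HYPOTHETICAL_LINEAR)
low_scorer = two_part_predict(20, HYPOTHETICAL_LOGIT, HYPOTHETICAL_LINEAR)
```

With these placeholder coefficients, a patient with the maximum OSS is classified as being in full health, whereas a patient with an OSS of 20 receives the linear prediction.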
The aim of response mapping is to predict responses to the EQ-5D questions rather than to directly predict the health index [24]. The EQ-5D health index can then be calculated using country-specific tariffs. Interest in response mapping has grown due to certain limitations with TTU regression approaches [25]. First, the distribution of health utilities is often not linear and there is a significant mass of observations at the upper boundary of one. This means that regression techniques may not be able to capture the true association between a predictor and the health index directly. Second, TTU regression models for EQ-5D are country-specific and thus less generalisable, because they require specific tariffs to convert health states to health utility. However, response mapping approaches require more granular data and are often more computationally intensive.
We used ordered logistic regression and seemingly unrelated regression (SUR) models. The ordered logistic regression model used all 12 OSS question responses to predict the response categories (1, 2 or 3) for each EQ-5D domain. The health index was subsequently calculated from the predicted EQ-5D-3L questionnaire responses using the UK tariff. SUR accounted for the potential correlations between the equations for each EQ-5D domain.
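The final step of response mapping, converting a predicted set of domain responses into a health index, can be sketched as follows. This is an illustrative Python sketch, not the eq5d R package used in the study. It assumes that the “UK tariff” is the widely used UK time trade-off value set for the EQ-5D-3L (Dolan, 1997), whose minimum of -0.594 matches the lower bound quoted in Table 1; the function and constant names are hypothetical.

```python
# Decrements assumed from the UK EQ-5D-3L time trade-off value set (Dolan, 1997).
DOMAINS = ("mobility", "self_care", "usual_activities", "pain", "anxiety")
DECREMENTS = {
    "mobility": {1: 0.0, 2: 0.069, 3: 0.314},
    "self_care": {1: 0.0, 2: 0.104, 3: 0.214},
    "usual_activities": {1: 0.0, 2: 0.036, 3: 0.094},
    "pain": {1: 0.0, 2: 0.123, 3: 0.386},
    "anxiety": {1: 0.0, 2: 0.071, 3: 0.236},
}
ANY_PROBLEM = 0.081  # constant applied when any domain is above level 1
ANY_LEVEL_3 = 0.269  # "N3" term applied when any domain is at level 3


def uk_3l_index(responses):
    """Convert five EQ-5D-3L responses (each 1, 2 or 3) into a health index."""
    if len(responses) != 5 or any(r not in (1, 2, 3) for r in responses):
        raise ValueError("expected five responses, each 1, 2 or 3")
    index = 1.0
    if any(r > 1 for r in responses):
        index -= ANY_PROBLEM
    if any(r == 3 for r in responses):
        index -= ANY_LEVEL_3
    for domain, level in zip(DOMAINS, responses):
        index -= DECREMENTS[domain][level]
    return round(index, 3)
```

Under this value set, the state “11111” maps to 1 and the worst state “33333” maps to -0.594.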
We evaluated the effect of adding age and sex as predictor variables to each model. Regularisation techniques such as LASSO (Least Absolute Shrinkage and Selection Operator), ridge and elastic net regression reduce the risk of overfitting by shrinking coefficient estimates and, in the case of LASSO and elastic net, by removing predictors from the model. We applied these techniques to the multivariable linear and ordered logistic regression models and evaluated their effect on model performance.
Validation
We did not have access to a dedicated validation dataset, but our sample size was sufficiently large to split it randomly into training and testing samples. Patients were randomly assigned to either the training or the testing sample with a 70:30 split, respectively. Models were developed using the training sample. All models were first evaluated through internal validation, where the model was fitted to the training sample. We then examined model fit on the testing sample. Because any single random split may, by chance, be unrepresentative of the full dataset, we carried out a 100-fold repeated random split into training and testing samples. Each time, models were developed on a different training sample and their performance evaluated on a different testing sample. We reported the overall model performance across repeated testing samples. All available data were subsequently used to calculate the final model parameters reported in this study.
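A repeated random split of this kind can be sketched as below. This Python sketch is illustrative only (the study used R); the function name, the seed and the choice to split on unique patient identifiers, so that all observations from one patient fall in the same sample, are assumptions based on the description above.

```python
import random


def repeated_patient_splits(patient_ids, n_repeats=100, train_fraction=0.7, seed=0):
    """Yield (train_ids, test_ids) sets for repeated random patient-level splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    unique_ids = sorted(set(patient_ids))
    n_train = round(train_fraction * len(unique_ids))
    for _ in range(n_repeats):
        shuffled = unique_ids[:]
        rng.shuffle(shuffled)
        # Every observation from a patient follows that patient's assignment.
        yield set(shuffled[:n_train]), set(shuffled[n_train:])
```

Each yielded pair would be used to fit the models on observations from `train_ids` and to compute performance on observations from `test_ids`, with the performance measures then averaged over the repeats.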
The developed models’ performance was then evaluated against subsets of the original testing sample, where each subset consisted of data from just one trial at a time, to evaluate model performance against known heterogeneity.
Our primary measures of model performance were the mean absolute error (MAE) and mean square error (MSE) between the observed and predicted EQ-5D health index scores. We were primarily interested in overall model performance on the testing sample averaged across the 100-fold repeated random split.
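For reference, the two primary measures can be computed as in the following minimal Python sketch (illustrative only; the function names are hypothetical and the study's calculations were performed in R).

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted index scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted index scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
```

Because the MSE squares each error, it penalises large individual prediction errors more heavily than the MAE, which is one reason to report both.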
Other performance metrics are also important as they reflect different aspects of prediction accuracy. We assessed the deviation of the predicted mean from the observed mean health index and estimated the linear correlation between observed and predicted health index scores. We reported model calibration by examining how model performance varied across tenths of the predicted health index and of the observed total OSS score.
We followed the ‘MAPS (MApping onto Preference-based measures reporting Standards) reporting statement’ and ‘ISPOR Mapping to Estimate Health-State Utility Values from Non–Preference-Based Outcomes Measures for Cost per QALY Economic Analysis Good Practices Task Force Report’ when reporting this mapping study [26, 27]. All statistical analyses were undertaken using R software [28].