Introduction
Health economic evaluation undertaken by the Norwegian Institute of Public Health and the Norwegian Medical Products Agency includes the quality-adjusted life years (QALY) methodology and use of the EuroQol EQ-5D for assessing health outcomes [
1,
2]. This follows similar recommendations for other countries, many of which have their own national value set for scoring the EQ-5D based on general population surveys [
3,
4]. The EQ-5D is the most widely used patient-reported outcome measure (PROM) in economic evaluation and is used in research more generally, including in Scandinavian national medical registers [
5,
6].
The most recent version, the five-level EQ-5D (EQ-5D-5L), has a descriptive system comprising five dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has five levels reflecting no problems, slight problems, moderate problems, severe problems, and extreme problems/unable to do. The five responses give a health state represented by five digits (for example, 12231), beginning with mobility. Health states are scored to give an index using a scoring algorithm from a value set derived from valuation tasks typically undertaken with general population samples [3]. These values are anchored at 1 (= full health) and 0 (= dead), with values < 0 indicating states valued as worse than dead. Values in the form of a scoring algorithm inform the economic evaluation of health technologies based on cost per QALY.
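To make the scoring step concrete, the following is a minimal sketch in Python of how a five-digit health state might be converted to an index value under a simple additive value set, and how such values could feed a QALY calculation. The decrement table `DECREMENTS` and the function names `parse_state`, `index_value`, and `qalys` are hypothetical and chosen for illustration only; the numbers are placeholders, not the Norwegian coefficients or any published value set.

```python
# Illustrative only: the decrements below are made-up placeholders,
# not coefficients from any published EQ-5D-5L value set.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")  # digit order within a health state

# Hypothetical disutility for levels 2..5 of each dimension (level 1 costs 0).
DECREMENTS = {
    "MO": {2: 0.05, 3: 0.08, 4: 0.20, 5: 0.30},
    "SC": {2: 0.04, 3: 0.07, 4: 0.18, 5: 0.25},
    "UA": {2: 0.03, 3: 0.05, 4: 0.15, 5: 0.20},
    "PD": {2: 0.06, 3: 0.10, 4: 0.30, 5: 0.40},
    "AD": {2: 0.06, 3: 0.12, 4: 0.30, 5: 0.40},
}


def parse_state(state: str) -> dict:
    """Split a state such as '12231' into a level per dimension."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not a valid EQ-5D-5L health state: {state!r}")
    return {dim: int(c) for dim, c in zip(DIMENSIONS, state)}


def index_value(state: str) -> float:
    """Return 1 minus the summed decrements; 1 = full health, 0 = dead."""
    levels = parse_state(state)
    disutility = sum(DECREMENTS[d].get(lvl, 0.0) for d, lvl in levels.items())
    return 1.0 - disutility


def qalys(periods) -> float:
    """Sum value x duration (in years) over (state, years) pairs."""
    return sum(index_value(state) * years for state, years in periods)
```

With these placeholder decrements, state 11111 scores 1 and state 55555 scores below 0, which corresponds to a state valued as worse than dead.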
Further national value sets were developed following the introduction of the five-level version, the EQ-5D-5L, which now cover at least 38 countries [
3,
4,
7‐
17]. These have largely followed the EuroQol Valuation Technology (EQ-VT) protocol, which provides methodological consistency, quality control, and promotes best practices and comparability across countries [
3,
18]. The EQ-VT is available for computer/PC-assisted face-to-face and video conferencing interviews [
8], which include the composite time trade-off (cTTO) and discrete choice experiment (DCE) valuation methods [
18,
19].
EQ-5D-5L use is increasing in Norway, and it is the most widely used PROM in the national system of medical registers. Norway lacks a value set for the EQ-5D-5L, and national recommendations are to use an interim crosswalk, which maps the EQ-5D-5L to the original UK EQ-5D-3L value set [
2,
20,
21]. However, the crosswalk is based on the UK value set developed in the 1990s for the earlier EQ-5D descriptive system, and before important methodological advances including EQ-VT. The availability of a Norwegian value set will increase the legitimacy of EQ-5D-5L use across applications and particularly economic evaluation. This study followed the EQ-VT protocol [
18,
19] to derive a Norwegian value set for the EQ-5D-5L based on interviews with a representative sample of the adult Norwegian general population.
Methods
The study followed the latest protocol from the EuroQol Group (EQ-VT 2.1) [
18], and reporting follows the CREATE checklist [
22]. Data collection started in November 2019 using face-to-face interviews [19] but was postponed in March 2020 due to the COVID-19 pandemic. Data collection resumed with face-to-face and video conferencing interviews in the final two months of 2022.
Ethics
The Regional Committee for Medical and Research Ethics stated that the study did not require their approval. The Data Protection Impact Assessment was approved by the Norwegian Institute of Public Health on 30 September 2019. The continuation of data collection in 2022 was assessed by the Data Protection Officer at Akershus University Hospital, who concluded that the impact assessment of 2019 was still valid.
Sample selection and data collection
Respondents were aged 18 years and over, resided in Norway and were sufficiently literate in Norwegian to complete the interview. Sample size was set to a minimum of 1,000 individuals with each valuing 10 health states, with the aim of 10,000 responses [
3,
23]. Multistage random sampling and quota sampling were used to ensure representativeness according to age, sex, educational level, and geography, as described in the protocol [
19]. Five hospital areas were randomly selected such that a minimum of one would be in each of the Norwegian health regions (North, East/South, Middle, West), with sampling likelihood weighted by the average number of individuals in the catchment area. Within these samples, quota sampling was used to achieve national representativeness [
19]. Hard-to-reach groups, including ethnic minorities, those with lower socio-economic status, and parents of young children, were oversampled. Potential respondents were informed about a cash card incentive equivalent to €30 at interview completion. Data collection took place at different locations [
19].
For the video conferencing interviews, and to ensure accordance with quotas, interviewers completed an interview scheduling tool with available slots, which were filled by recruiters. Recruiters were paid a fixed hourly rate and, based on previous experience of recruitment challenges, more time was allotted for recruiting participants who were male, aged over 65 years, or had a lower educational level. Participants received a reminder prior to the scheduled time and an interview link. The interviewer shared the screen with the respondent and managed data entry [
8].
Valuation interviews
Interviewers were graduate students at the University of Oslo, except 2 who were retired healthcare professionals. All underwent 2.5 days of intensive training and undertook 10 test interviews with approved quality according to the EQ-VT protocol [
24] before main data collection.
The interviewer guided the respondent throughout and answered any questions. Their instructions included not commenting on possible illogical responses, encouraging thinking aloud, and asking respondents to carefully consider each health state. Interview content was identical for the two data collections. For the face-to-face interviews, the portable version of the software (EQ-PVT) was used, which has the same functionality and interface as the standard web-based version 2.1. The interviews had five sections. First, the respondent was welcomed and given an introduction. Second, respondents completed the EQ-5D-5L and questions about age, gender (male, female, other), and education level. Third, the cTTO was introduced, including an explanation of the task and completion of three practice tasks for mild, moderate, and severe health states. Respondents were randomized to 1 of 10 standardized cTTO blocks of EQ-5D-5L health states, each of which included one very mild state (11112, 11121, 11211, 12111, or 21111), the worst state (55555), and 8 states of differing severity; together, the blocks covered a total of 86 health states [
23,
24]. Following completion of the cTTO, respondents were presented with a feedback module comprising the 10 health states in order of the values they assigned, and asked to flag responses that were clearly misordered. Fourth, the DCE was introduced along with instructions. Respondents were randomized to 1 of 28 standardized blocks of 7 DCE pairs [
23,
24]. Fifth, respondents were thanked for their participation.
cTTO valuation starts with the standard time trade-off (TTO), applicable to health states valued higher than dead, and changes to the lead-time TTO for states valued lower than dead by the respondent. Values range from −1 (trading all lead time) to 1 (trading no time) in increments of 0.05. The DCE asks respondents to choose which of two EQ-5D-5L health states they prefer.
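The mapping from a cTTO response to a value can be sketched as follows. This is an illustration that assumes the usual EQ-VT framing, which is not spelled out in the text above: 10 years in the health state for the standard TTO, 10 years of lead time in full health followed by 10 years in the health state for the lead-time TTO, and responses given in half-year steps. The function name `ctto_value` and its parameters are hypothetical.

```python
def ctto_value(years_full_health: float, worse_than_dead: bool) -> float:
    """Convert a point of indifference into a cTTO value.

    Assumed framing (not taken from the study text):
    - standard TTO: x years in full health vs. 10 years in the state,
      value = x / 10;
    - lead-time TTO: x years in full health vs. 10 years in full health
      followed by 10 years in the state, value = (x - 10) / 10.
    """
    if not 0.0 <= years_full_health <= 10.0:
        raise ValueError("years_full_health must lie between 0 and 10")
    if (years_full_health * 2) != int(years_full_health * 2):
        raise ValueError("responses are assumed to be in half-year steps")
    if worse_than_dead:
        return (years_full_health - 10.0) / 10.0
    return years_full_health / 10.0


# All values reachable under these assumptions, rounded to two decimals.
POSSIBLE_VALUES = sorted(
    {round(ctto_value(k / 2, wtd), 2) for k in range(21) for wtd in (False, True)}
)
```

Under these assumptions, half-year steps over a 10-year horizon give the 0.05 increments described above, with the two task types meeting at a value of 0.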
Quality control (QC) was performed at weekly intervals to begin with, and then at 2- to 4-week intervals, to monitor interviewer protocol compliance and the face validity of cTTO data. Flags included time spent on the task, introduction to lead-time TTO during the initial example of the health state, and inconsistent valuations of the worst possible health state [
20].
Modelling and data analysis
Descriptive statistics were used to compare characteristics of respondents with those for the Norwegian general population (Statistics Norway, October 1, 2022). Prior to statistical modelling, cTTO data for health states flagged in the feedback module were removed along with cTTO and DCE data not complying with the study protocol.
Respondent values for the 86 EQ-5D-5L health states were used to estimate the 3,125 possible health state values. Modelling was undertaken for cTTO data alone and combined with DCE data for the hybrid model [
3,
4], using the widely used 20-parameter (Eq. 1) and 8-parameter (Eq. 2) models [
7,
25]. The models have the same dependent and independent variables but differ in the number of estimated coefficients. Following convention, the dependent variable was rescaled to disutilities (1 − cTTO value). The models were tested with and without intercepts, giving a total of 8 tested combinations: cTTO alone vs. cTTO + DCE hybrid; 8- or 20-parameter form; with or without an intercept.
Let l be level ∈ {2,3,4,5}, d be dimension ∈ {MO, SC, UA, PD, AD}, and x be a vector of dummies.
Equation 1, 20-parameter model with intercept α:
$$
\begin{aligned}
-(u-1) = \alpha &+ \beta_{MO2}x_{MO2} + \beta_{SC2}x_{SC2} + \beta_{UA2}x_{UA2} + \beta_{PD2}x_{PD2} + \beta_{AD2}x_{AD2} \\
&+ \beta_{MO3}x_{MO3} + \beta_{SC3}x_{SC3} + \beta_{UA3}x_{UA3} + \beta_{PD3}x_{PD3} + \beta_{AD3}x_{AD3} \\
&+ \beta_{MO4}x_{MO4} + \beta_{SC4}x_{SC4} + \beta_{UA4}x_{UA4} + \beta_{PD4}x_{PD4} + \beta_{AD4}x_{AD4} \\
&+ \beta_{MO5}x_{MO5} + \beta_{SC5}x_{SC5} + \beta_{UA5}x_{UA5} + \beta_{PD5}x_{PD5} + \beta_{AD5}x_{AD5} + \epsilon
\end{aligned}
$$
Equation 2, 8-parameter model with intercept α:
$$
\begin{aligned}
-(u-1) = \alpha &+ \left(\beta_{MO}x_{MO2} + \beta_{SC}x_{SC2} + \beta_{UA}x_{UA2} + \beta_{PD}x_{PD2} + \beta_{AD}x_{AD2}\right)L_{2} \\
&+ \left(\beta_{MO}x_{MO3} + \beta_{SC}x_{SC3} + \beta_{UA}x_{UA3} + \beta_{PD}x_{PD3} + \beta_{AD}x_{AD3}\right)L_{3} \\
&+ \left(\beta_{MO}x_{MO4} + \beta_{SC}x_{SC4} + \beta_{UA}x_{UA4} + \beta_{PD}x_{PD4} + \beta_{AD}x_{AD4}\right)L_{4} \\
&+ \beta_{MO}x_{MO5} + \beta_{SC}x_{SC5} + \beta_{UA}x_{UA5} + \beta_{PD}x_{PD5} + \beta_{AD}x_{AD5} + \epsilon
\end{aligned}
$$
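The relationship between Equations 1 and 2 can be illustrated with a short Python sketch. It reads Equation 2 as having one coefficient per dimension (the level-5 disutility) and three level multipliers L2, L3, and L4, with level 5 carrying an implicit multiplier of 1. All names here (`disutility_20`, `disutility_8`, `expand_8_to_20`, `ALL_STATES`) are hypothetical, and any numbers supplied to these functions would be placeholders rather than estimates from this study.

```python
from itertools import product

DIMS = ("MO", "SC", "UA", "PD", "AD")

# Every EQ-5D-5L health state, from '11111' to '55555' (5**5 = 3,125 states).
ALL_STATES = ["".join(levels) for levels in product("12345", repeat=5)]


def disutility_20(state, beta, alpha=0.0):
    """Equation 1: one coefficient per (dimension, level) pair, levels 2..5."""
    return alpha + sum(
        beta[(dim, int(c))] for dim, c in zip(DIMS, state) if c != "1"
    )


def disutility_8(state, beta_dim, level_mult, alpha=0.0):
    """Equation 2: five dimension coefficients scaled by level multipliers.

    level_mult maps levels 2, 3 and 4 to L2, L3 and L4; level 5 uses an
    implicit multiplier of 1, as read from the form of Equation 2.
    """
    total = alpha
    for dim, c in zip(DIMS, state):
        level = int(c)
        if level == 1:
            continue  # level 1 is the reference and adds no disutility
        total += beta_dim[dim] * (1.0 if level == 5 else level_mult[level])
    return total


def expand_8_to_20(beta_dim, level_mult):
    """Express an 8-parameter model as an equivalent 20-parameter table."""
    return {
        (dim, level): beta_dim[dim] * (1.0 if level == 5 else level_mult[level])
        for dim in DIMS
        for level in (2, 3, 4, 5)
    }
```

Seen this way, the 8-parameter form is a constrained case of the 20-parameter form: it forces the spacing between levels to be proportional across dimensions, which is why the two models share variables but differ in the number of estimated coefficients.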
The hybrid model [
26] combines cTTO and DCE data to give common coefficients found to be logically consistent and is widely used alongside EQ-VT versions 2.0 and 2.1 [
3,
4]. This model uses joint maximum likelihood estimation, in which the likelihood for cTTO data assumes a normal distribution and the likelihood for DCE data uses a conditional logit. The same set of coefficients is applied to both, with an additional parameter θ to account for the difference in scale between cTTO and DCE. In common with existing value sets [
3,
4], potential models had censoring at −1 (right-censoring at disutility 2), random intercepts for cTTO at the individual respondent level, and heteroscedasticity linear in estimated state disutility. That is, the homoscedasticity assumption was relaxed: the standard deviation of the normally distributed model error, typically modelled as a single parameter σ, was instead modelled as \(\sigma_{s} = \alpha_{\sigma} + \beta_{\sigma} \times v_{s}\), where \(v_{s}\) is the estimated disutility of EQ-5D-5L health state s.
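A simplified sketch of how such a hybrid log-likelihood might be assembled is given below in Python. It is not the fitting code used in this study (the models were fitted in R), and it makes several labelled assumptions: random intercepts are omitted for brevity, θ is taken to multiply cTTO-scale disutilities to reach the DCE scale, and the function names are hypothetical.

```python
import math


def normal_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_sf(z: float) -> float:
    """Upper tail probability P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ctto_loglik(y, v, a_sigma, b_sigma, censor_at=2.0):
    """Censored-normal log-likelihood of one observed disutility y.

    v is the predicted disutility; the error SD is a_sigma + b_sigma * v.
    Observations at or above censor_at (value -1) are treated as right-censored.
    Random intercepts are omitted in this sketch.
    """
    sigma = a_sigma + b_sigma * v
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    if y >= censor_at:
        return math.log(normal_sf((censor_at - v) / sigma))
    return normal_logpdf((y - v) / sigma) - math.log(sigma)


def dce_loglik(v_chosen, v_other, theta):
    """Conditional-logit log-probability that the chosen state is preferred.

    Assumes utility = -theta * disutility, so that
    P(chosen) = 1 / (1 + exp(theta * (v_chosen - v_other))).
    """
    x = theta * (v_chosen - v_other)
    return -(max(x, 0.0) + math.log1p(math.exp(-abs(x))))  # stable softplus


def hybrid_loglik(ctto_obs, dce_obs, a_sigma, b_sigma, theta):
    """Sum both parts; ctto_obs holds (y, v) and dce_obs (v_chosen, v_other)."""
    total = sum(ctto_loglik(y, v, a_sigma, b_sigma) for y, v in ctto_obs)
    total += sum(dce_loglik(vc, vo, theta) for vc, vo in dce_obs)
    return total
```

In a full implementation, the predicted disutilities would themselves be functions of the shared model coefficients, and the summed log-likelihood would be maximized jointly over those coefficients, the two standard deviation parameters, and θ.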
Final model selection was based on logical consistency in terms of ordering of EQ-5D-5L states and corresponding values, and on out-of-sample predictive accuracy for TTO blocks as assessed by root mean square error (RMSE) between predicted health state values and likelihood-based (censored) mean values for the corresponding health states. Censored mean values were used to account for censoring at −1 and were estimated using Tobit models to predict the mean value for each health state with a single coefficient per health state. Confidence intervals and standard errors for coefficients and all 3,125 predicted EQ-5D-5L health state values were derived using bootstrapping: 10,000 samples of the same size as the original were drawn with replacement at the level of individual study participants, and the models were fitted to each bootstrap sample. Standard errors were estimated as the standard deviation of the bootstrapped coefficients and predicted health state values, and 95% confidence intervals were taken as the 2.5th and 97.5th percentiles of the bootstrapped values. To ensure representativeness of the sample and reduce potential biases in the analysis, the sample was re-weighted to match the general Norwegian population in terms of age, sex, and geographic region using propensity score weighting [
27].
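The respondent-level bootstrap and the RMSE criterion can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the study's analysis code: the estimator is a simple pooled mean rather than a fitted model, percentiles use linear interpolation, and the names `rmse`, `percentile`, `pooled_mean`, and `bootstrap` are hypothetical.

```python
import math
import random
import statistics


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def percentile(sorted_values, p):
    """Percentile of an already sorted list, using linear interpolation."""
    k = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def pooled_mean(respondents):
    """Stand-in estimator: mean of all responses from the given respondents."""
    values = [v for responses in respondents for v in responses]
    return sum(values) / len(values)


def bootstrap(respondents, estimator, n_boot=1000, seed=0):
    """Resample whole respondents with replacement and re-estimate each time.

    Returns the bootstrap standard error and the 95% percentile interval.
    """
    rng = random.Random(seed)
    n = len(respondents)
    estimates = sorted(
        estimator([respondents[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    se = statistics.stdev(estimates)
    return se, (percentile(estimates, 2.5), percentile(estimates, 97.5))
```

Resampling whole respondents, rather than individual responses, keeps each person's 10 valuations together and so respects the clustering of responses within respondents.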
The distribution of values in the new Norwegian value set was compared with values calculated using the crosswalk to UK EQ-5D-3L values and with EQ-5D-5L value sets from Denmark [28], Sweden [11], and the US [29]. This was done by graphical display of key characteristics. Comparative value set sensitivity to change was also assessed graphically [30].
Models were fitted and tested in R 3.6.1 (R Development Core Team, Vienna, Austria).
Discussion
This study reports the Norwegian EQ-5D-5L value set based on surveys of the adult Norwegian general population using cTTO and DCE as part of the most recent EQ-VT 2.1 protocol to secure data quality. The value set gives the index score based on a scoring algorithm applied to responses and is suitable for estimating QALYs in economic evaluation. Application of a national value set improves the legitimacy of economic evaluation in Norway and is a major improvement over current recommendations for the interim crosswalk value set across research and other applications [
2,
6,
31].
The crosswalk is based on the UK value set, derived three decades ago for the earlier EQ-5D descriptive system with three levels for each dimension. Methodological flaws are widely documented [
3,
18,
21,
23,
24], which led to the development of the EQ-VT to address these deficiencies and provide greater standardization across nations to aid value set comparisons. There is increasing application of the 5L version, and it is now the most widely used PROM in Norwegian national quality registers [
6]. The availability of the value set and accompanying norm data [
31] for the Norwegian general population is timely in this regard and enhances use of the five-level descriptive system.
The relative importance of dimensions across 31 EQ-5D-5L value sets was compared in a recent systematic review [
4]. For standard hypothetical health state valuation studies, the results for Norway, with anxiety/depression and pain/discomfort as the two most important dimensions, follow those of 9 of 11 Western European countries including value sets more recently published [
3,
4,
7,
8,
11]. Anxiety/depression was the most important for three of these countries. The dimension of mobility was the next most important after pain/discomfort in value sets for France, Portugal, and Italy [
3,
4,
8]. More recent value sets for Eastern European countries found mobility [
10,
14] and self-care [
14] to be among the two most important dimensions, which together with anxiety/depression being one of the two least important dimensions, are similar for other Eastern European countries [
3,
4].
When the size of regression coefficients is compared across dimension levels, those for Norway are within the range of those for Western European countries [
3,
4,
7,
8,
11]. The exception is usual activities, the least important dimension in the Norwegian value set, which has slightly lower coefficients across three levels compared to Western European countries. When compared to those for other Western European countries, coefficients for the three Scandinavian countries [
3,
4,
11,
28] are also closer in size. This is particularly true for usual activities, with the caveat that there is less variation for this dimension more generally. The lowest possible score of -0.453 is similar to the Netherlands [
32] and Spain [
33] and ranks fifth among 12 Western European countries. The proportion of states worse than dead, at 11%, is closest to that for Portugal [
34] and Spain [
33] with 9 and 8%, respectively [
3].
The comparison of EQ-5D-5L value sets is facilitated by the application of EQ-VT [
3,
4]. The standardized protocol lends uniformity to valuation, promotes best practices for data collection, and includes QC procedures. Furthermore, the great majority of EQ-5D-5L value sets are based on either the cTTO or cTTO combined with DCE, with similar statistical modelling [
3,
4], a possible synergy effect arising from EQ-VT use and greater awareness of good scientific practice. Hence, accruing evidence for differences across value sets is likely due to differences in values for health states associated with culture, income and wealth, and health systems rather than methods used to obtain EQ-5D-5L values [
3,
4]. Research culture, including attitudes and expectations about the way research is communicated and conducted, is also potentially important and might affect valuation studies differently across countries, including the interaction between interviewers and respondents.
The differences, including the relative importance of the dimensions and the number of states worse than dead, lend support to the validity of national value sets but limit cross-national relevance and application in other countries. The uniqueness of the Norwegian value set, including dimension rankings, lends further support to the use of country-specific value sets in economic evaluation.
The Norwegian value set followed the most recent version of the EQ-VT, version 2.1, to give high levels of data quality. This version added the feedback module as a further means of improving data quality. The Norwegian interviewers had substantial training, including EQ-VT presentations by the research team, a training workshop, detailed discussion of the interview guide, demonstration of EQ-PVT in an interview setting, pre-pilot interviews via role-playing, and feedback opportunities. Ten pilot interviews were subsequently conducted by all interviewers with feedback opportunities and QC in collaboration with the EuroQol Group. Members of the project team, including the Principal Investigator, worked as interviewers.
After comparing widely used models against standard criteria, the hybrid multiplicative 8-parameter model, without an intercept for cTTO data and with random effects and correction for heteroscedasticity, was selected for the Norwegian value set data. This followed other national value sets where the combination of valuation data was informed by both sufficient agreement between cTTO and DCE and improvement in fit of observed and predicted values [
3]. The hybrid model is the most popular and selected for the final value set in 19 of 27 studies using EQ-VT protocol 2.0 or 2.1 [
3,
4,
7‐
12,
14‐
17].
The Norwegian EQ-5D-5L value set differs considerably from that currently recommended for Norway based on the crosswalk. The Norwegian values are higher, which follows findings for several other countries where comparisons with EQ-5D-3L values and the crosswalk were undertaken [
29,
35‐
38]. Moreover, the ranking of dimensions has changed with anxiety/depression being most important compared to third most important in the crosswalk. The second largest change relates to mobility, which is the fourth most important compared to second most important in the crosswalk. Changes in dimension rankings were also found for France in comparisons of EQ-5D-3L and 5L value sets [
35].
The EQ-VT protocol makes no requirements regarding sampling or representativeness, and several approaches were used across different countries. Norway has a low population density with remote areas, and the sampling strategy reflected this [
19]. Multistage random and quota sampling were used to give representativeness in terms of age, sex, and geography. Moreover, locations were selected in a manner that would fulfill necessary quotas and reach hard-to-reach respondents, including those in low socioeconomic groups and those with time constraints. Nevertheless, there was a lack of representativeness, with underrepresentation of respondents from the Southern region and of those with a lower education level. It was decided to include weighting for the characteristics of the general population in the final value set. Only slight differences were found between the weighted and unweighted value sets in terms of performance across the different models tested.
Most EQ-5D-5L value set studies that considered representativeness through comparisons of respondent characteristics with those for the general population, found differences of > 5% for the background characteristics reported. This includes overrepresentation of more highly educated respondents and those from urban areas [
3]. The two other Scandinavian countries of Denmark and Sweden found underrepresentation for age groups 18–24 and 30–49 years respectively [
11,
28]. Denmark compared education levels and like Norway, found overrepresentation of the more highly educated [
28]. The inclusion of weights was tested in models for five countries [
3,
37‐
39] and included in the final value sets for England, France, and Peru [
35,
38,
40].
The main study limitation arises from the COVID-19 pandemic preventing completion of the original data collection. The remaining half of the PC-assisted face-to-face interviews were due to take place from March to June 2020. Similar to Sweden [
11], video conferencing was used following a delay of 2½ years. The time lag and the COVID-19 pandemic might have affected valuations but during this time, EQ-VT video conferencing became available with evidence for feasibility of data collection [
8,
11]. Both data collections used the same methods of recruitment including use of contact persons at locations, and information materials. The revised protocol followed the original as far as possible and did not include consideration of mode of administration or possible interviewer effects arising from the recruitment of new interviewers. Both data collections adhered to EQ-VT 2.1 protocol and stringent criteria relating to training of interviewers, interviewer testing, and QC. Given the different interview methods, there are possibly differences in values for the two data collections. Both data collections had high levels of protocol compliance, and the decision was made not to test for this in the revised protocol.