Journal of Psychopathology and Behavioral Assessment

Open Access 01-03-2025

Differential Item Functioning for Gender and Age of the DSM-IV Borderline Personality Disorder Criteria in a Large Clinical Sample

Authors: Benjamin Hummelen, Tuva Langjord, Muirne C.S. Paap, Espen Jan Folmo, Geir Pedersen, Johan Braeken

Published in: Journal of Psychopathology and Behavioral Assessment | Issue 1/2025

Abstract

This study examined the DSM-IV/DSM-5 Borderline Personality Disorder (BPD) criteria in a clinical sample of 4102 patients (835 diagnosed with BPD) using Item Response Theory analysis, with special emphasis on Differential Item Functioning (DIF) across gender and age. Among the three criteria that displayed DIF for gender, Fear of abandonment and Self-injurious behavior were more frequently assigned to female patients than to male patients situated at the same position on the latent BPD scale, whereas Uncontrolled anger was more commonly attributed to male patients at equivalent levels of latent BPD severity. For age, DIF was present for five criteria. Self-injurious behavior and Affective instability were more prevalent in the younger age group (18–25), given the same severity levels as the older age group. Conversely, Unstable relationships, Impulsivity, and Dissociation were more frequently identified in older patients. Identity problems showed no DIF and had good discriminative ability. The results were interpreted in light of the view that BPD is a proxy for general personality pathology severity. As such, the behavior-oriented criteria, notably Self-injurious behavior and Uncontrolled anger, posed the most challenges in terms of DIF, and caution is advised in using these criteria to assess general severity.

Introduction

Borderline personality disorder (BPD) is a debilitating condition characterized by a pattern of unstable relationships, disturbed identity, impulsivity, self-harm, and emotional dysregulation (American Psychiatric Association (APA), 2013). Despite its relatively low prevalence in the general population, estimated at around 1%, BPD stands out as one of the most commonly diagnosed personality disorders in clinical settings, affecting up to 22% of individuals in inpatient samples and 12% in outpatient samples (Eaton & Greene, 2018; Ellison et al., 2018). The impact of BPD extends beyond its symptomatology, encompassing a significant socioeconomic burden and reduced life expectancy (Gunderson et al., 2018; Leichsenring et al., 2024).

The Fourth Edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV, APA, 1994) and its subsequent revision (DSM-5, APA, 2013) operationalize BPD through nine criteria, requiring at least five for a diagnosis (see Table 1). These criteria span a wide array of personality problems, including self-pathology (e.g., Identity disturbances), interpersonal dysfunction (e.g., Unstable relationships), emotional dysregulation (e.g., Uncontrolled anger and Affective instability), and behavioral dysregulation (e.g., Impulsivity and Self-injurious behavior). In spite of its heterogeneity, research has largely supported the conception of BPD as a unidimensional construct (Aggen et al., 2009; Clifton & Pilkonis, 2007; Jane et al., 2007; Johansen et al., 2004; McMahon et al., 2019; Sharp et al., 2012).

Table 1

Graded response model parameters of the Nine BPD Criteria as assessed by SCID-II

i	Criterion	a_i	b_i1	b_i2	d_i1	d_i2	h_i²
1	Fears of abandonment	1.58	0.32	1.01	-0.51	-1.59	0.46
2	Unstable relationships	2.56	0.27	0.75	-0.68	-1.93	0.69
3	Identity problems	2.03	0.51	1.08	-1.04	-2.18	0.59
4	Impulsivity	1.70	0.25	1.04	-0.43	-1.77	0.50
5	Self-injurious behavior	1.34	0.24	0.94	-0.32	-1.26	0.38
6	Affective instability	2.53	-0.13	0.37	0.33	-0.93	0.69
7	Chronic Emptiness	0.88	-0.63	0.52	0.55	-0.46	0.21
8	Uncontrolled anger	1.82	0.50	1.17	-0.92	-2.13	0.53
9	Dissociation or paranoid ideation	1.08	0.53	1.51	-0.57	-1.63	0.29

Note. Model fit: M₂(df = 9) = 89, p < .001; RMSEA₂ = 0.05; CFI = 0.96; SRMR = 0.03. a_i, b_i1, b_i2, d_i1, d_i2, and h_i² are, respectively, the estimated discrimination parameter, first and second location parameters, first and second threshold parameters (or intercepts; b_ic = -d_ic/a_i), and communality coefficient ($h_i^2 = \left[\frac{a_i}{\sqrt{\frac{\pi^2}{3} + a_i^2}}\right]^2$) of each BPD criterion in the graded response model.

Due to its comprehensiveness, unidimensional structure, and extensive overlap with other PDs, BPD is often viewed as a global index of PD severity rather than a separate diagnostic category (Gunderson et al., 2018; Paap et al., 2022; Sharp et al., 2015). Global severity typically encompasses elements related to self-functioning and interpersonal functioning (APA, 2013) but may also include behavior-oriented indicators. For example, the DSM-5 Alternative Model for Personality Disorders defines general severity in terms of impairment of self-functioning and interpersonal functioning, whereas the ICD-11 model for Personality Disorders extends this to include self-injurious behavior and impulsivity (World Health Organization (WHO), 2019). Thus, there still seems to be some controversy regarding the definition of general severity. Under the assumption that BPD is a good proxy for general severity, in-depth analyses of the psychometric properties of the BPD criteria can illuminate which criteria are the most reliable indicators of general PD severity. Given that overall severity is crucial in PD diagnosis according to contemporary diagnostic manuals (APA, 2013; WHO, 2019), understanding which criteria are influenced by common demographic variables such as gender and age is important, as such biases may impact PD prevalence rates across demographic groups.

For several decades, it was widely believed that BPD was more prevalent in women than in men, partly supported by epidemiological research (APA, 2013; Trull et al., 2010). This assumption has spurred research into potential gender bias within the criteria for BPD as a potential cause of differences in prevalence rates (Bozzatello et al., 2024; Sharp et al., 2014). Gender bias refers to the potential for the criteria to be influenced by societal stereotypes or expectations about gender. For example, traits such as emotional dysregulation, self-harm, and intense interpersonal relationships, which are central to BPD diagnosis, may be more socially acceptable or expected in females, potentially leading to overrepresentation of women diagnosed with BPD. Conversely, certain behaviors that are more commonly associated with males, such as impulsivity and aggression, may be underemphasized or overlooked in female patients. This can result in disparities in access to treatment and support for individuals with BPD based on gender.

Though recent epidemiological studies have indicated that the prevalence of BPD may be similar in both genders (Bozzatello et al., 2024), gender bias remains a concern as certain criteria may be more prevalent in men while others may be more prevalent in women, which may go unnoticed when counting the total number of criteria. A straightforward way to evaluate gender bias would be to compare prevalence rates of individual criteria between female and male participants. However, this is not an appropriate approach because severity is not taken into account. As it is plausible that higher prevalence of one specific criterion is associated with higher prevalence of other criteria, it is necessary to control for the severity of the disorder. In Item Response Theory (IRT), severity is taken into account by evaluating the psychometric properties of the BPD criteria on a latent scale; the “latent BPD severity scale” in this situation. One commonly used approach for assessing the psychometric properties of a criteria set is through the application of the two-parameter IRT model (or, in the case of polytomous data, the Graded Response Model), which estimates the discrimination parameter and the location parameter for each item. For dichotomous data, the location parameter, also known as the difficulty or threshold parameter, refers to the location on the latent severity scale at which individuals have a 50% probability of being assigned a particular criterion. It essentially represents how “easy” or “difficult” an item is: a smaller location parameter indicates an easier item (requires less psychopathology), while a larger parameter indicates a more difficult one (requires higher levels of psychopathology). The discrimination parameter indicates how well an item differentiates between individuals with high and low levels of BPD severity. Both parameters can display differential item functioning (DIF). When there is DIF for gender in the discrimination parameter, the criterion is not equally effective in discriminating between men and women with the same position on the latent BPD severity scale. DIF in the location parameter implies that women have a different probability of endorsing the criterion than men, given the same severity level.
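
As a minimal sketch of these two ideas, the following Python snippet evaluates a two-parameter logistic item for two groups that share a discrimination parameter but differ in location (uniform DIF). The function name `two_pl` and all parameter values are hypothetical and chosen only for illustration; they are not estimates from this or any other study.

```python
import math

def two_pl(theta, a, b):
    """Probability of being assigned a criterion under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters, for illustration only (not estimates from any study).
a = 1.5          # discrimination, assumed equal in both groups
b_women = 0.4    # location parameter for women
b_men = 0.9      # location parameter for men (criterion is "harder" to meet)

# At a severity equal to the location parameter, the probability is 0.5.
p_at_location = two_pl(b_women, a, b_women)

# Uniform DIF: same latent severity, different endorsement probabilities.
theta = 0.5
p_women = two_pl(theta, a, b_women)
p_men = two_pl(theta, a, b_men)
print(f"P(criterion | theta = {theta}): women = {p_women:.2f}, men = {p_men:.2f}")
```

With these illustrative values, the criterion is more likely to be assigned to women than to men at the same latent severity, which is what DIF in the location parameter means.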

Table 2 summarizes studies investigating DIF for gender of the BPD criteria as assessed by clinicians or trained laypersons. In line with the tradition in the PD field, we report the threshold parameters instead of location parameters. Of note, the study of Aggen et al. (2009) investigated DIF from a factor-analytical perspective. This study is included in the table due to similarity in the interpretation of the parameter estimates, i.e., intercepts are analogous to threshold parameters and factor loadings are analogous to discrimination parameters. The first IRT study on gender bias of the DSM-IV BPD criteria was published by Jane et al. (2007), using a sample of 433 Air Force recruits and 166 students. Unexpectedly, the criteria showed no evidence of DIF in this study. However, subsequent DSM-IV based studies consistently identified DIF for several criteria. The threshold parameter for the Impulsivity criterion was larger in women in several studies (Aggen et al., 2009; Benson et al., 2017; Hoertel et al., 2014; Sharp et al., 2014), implying that female participants had a lower probability of endorsing the Impulsivity criterion than men, given the same location on the latent BPD severity scale. The threshold parameter for the eighth BPD criterion, Uncontrolled anger, was also larger for women in some studies (Benson et al., 2017; Sharp et al., 2014). Larger thresholds for men were found in three studies, involving three criteria, i.e., Affective instability, Chronic emptiness, and Self-injurious behavior (Aggen et al., 2009; Benson et al., 2017; Hoertel et al., 2014).

Table 2

DIF studies of gender and age using a checklist or a structured clinical interview to assess the BPD criteria in adults

Study	Sample	Instrument	Main findings
Gender			Larger threshold in women	Larger threshold in men
Jane et al. (2007)	433 Air Force recruits; 166 students	SIDP-IV	No DIF	No DIF
Sharp et al. (2014)	747 inpatients	SCID-II	Impulsivity; Uncontrolled anger
Hoertel et al. (2014)	34,481 community dwellers	AUDADIS-IV	Impulsivity	Self-injurious behavior; Affective instability; Chronic emptiness
Benson et al. (2017)	337 clinicians rated one of their patients	Checklist of the DSM-IV PD criteria	Impulsivity; Uncontrolled anger	Chronic emptiness
Aggen et al. (2009)	2794 twins	SIDP-IV	Impulsivity^b	Affective instability
Age			Larger threshold in older adults	Larger threshold in younger adults
Sharp et al. (2019)	1879 inpatients^a	SCID-II	No DIF across adult age groups	No DIF across adult age groups
McMahon et al. (2019)	34,481 community dwellers	AUDADIS-IV	Self-injurious behavior^c

Note: SCID-II: Structured Clinical Interview for DSM-IV Axis II Personality Disorders; SIDP-IV: Structured Interview for DSM-IV; AUDADIS-IV: Alcohol Use Disorder and Associated Disabilities Interview Schedule-IV

^a Adolescents (ages 12–17 years; n = 484), young adults (ages 18–25 years; n = 442), and adults (ages ≥ 26 years; n = 953)

^b Interaction effect: Women in the younger age group had smaller thresholds for the Impulsivity criterion as compared with men in the younger age group. This effect was more pronounced for women and men in the older age group.

^c For female participants

With respect to the discrimination parameter, only two studies found DIF for gender (results not provided in Table 2). Using a large community sample, Hoertel et al. (2014) found larger discrimination parameters in women for both Affective instability and Chronic emptiness, indicating that in female participants, these criteria were better suited than the other criteria to discern between those who were situated at the higher end of the latent BPD scale versus those who were at the lower end. Aggen et al. (2009), on the other hand, found larger factor loadings in men for Impulsivity using data from the Norwegian twin register. In sum, it appears that most studies found considerable DIF for gender, most pronounced for the threshold parameter for Uncontrolled anger and Impulsivity. Therefore, based on these studies, it appears that the aspiration for gender-neutral criteria is not fully realized. However, it is worth mentioning that only three studies utilized clinical samples, with two employing structured diagnostic interviews. Further studies are required to gain deeper insights into whether there is any variation in item functioning for the BPD criteria in clinical populations.

Ideally, diagnostic criteria should also be free from age-related DIF. However, achieving age-neutrality of the BPD criteria may be challenging. As highlighted by Sharp et al. (2019), the BPD criteria were not constructed in developmentally sensitive ways, and therefore, it is reasonable to assume that the BPD criteria may behave differently across age groups. At the bottom of Table 2, two IRT studies and one factor-analytical study are presented that have investigated DIF across different age groups. Based on the classic taxonomy of developmental periods, Sharp et al. (2019) analyzed three age cohorts of psychiatric inpatients: adolescents (12–17 years), young adults (18–25 years), and older adults (≥ 26 years). All patients were assessed by the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II), administered by trained master’s level research assistants. There was DIF for all nine criteria when the adolescent and adult groups were compared. However, no DIF emerged between the two adult groups (young adults versus older adults). A recent NESARC study showed that Self-injurious behavior manifested itself differentially across age in female participants, i.e., older women had more severe BPD pathology before this criterion was endorsed compared to younger women (McMahon et al., 2019). Moreover, Unstable relationships discriminated BPD severity better in younger adults as compared with older adults, in both genders (results not provided in the table). Of note, these effects were only found in the youngest group (20–33 years) compared to the oldest group (65 years and older).

Considering these findings collectively, a discernible trend emerges indicating that Impulsivity, Uncontrolled anger, and Self-injurious behavior were the criteria most frequently exhibiting DIF across gender and age in these studies, corroborating the findings of a recent narrative review on gender differences in BPD (Bozzatello et al., 2024). This may imply that criteria evaluating the behavioral aspects of BPD contribute more to differences in prevalence rates across these demographics than criteria that focus on self-pathology and interpersonal dysfunction. Thus, the utility of the behaviorally oriented criteria in diagnostic decision making might be constrained, as these criteria could contribute to variations in prevalence across age and gender. Translating this position to modern conceptualizations of PD, this could also suggest that behaviorally oriented criteria are less useful as indicators of overall severity. However, findings are rather heterogeneous, with a few studies also reporting DIF for Affective instability and Chronic emptiness. Better empirical evidence is needed before the results of DIF studies can be taken into account in the development of future diagnostic manuals. Notably, the results of the DIF analyses should be considered in light of the findings from other psychometric analyses. Therefore, comprehensive psychometric studies of large clinical samples using structured clinical interviews may be especially useful in this endeavor.

The primary aim of the current study was to evaluate the psychometric properties of the DSM-5 BPD criteria with special emphasis on DIF across gender and age in a large clinical sample of patients (N = 4102) assessed by experienced clinicians using the SCID-II. Given the assumption that BPD serves as a good proxy for general PD severity, a secondary aim of the study was to provide additional empirical evidence to inform contemporary models of personality disorders, as outlined in the AMPD and ICD-11.

Employing a variety of psychometric approaches, we addressed six specific research questions. First, we wanted to assess the dimensional structure of the BPD criteria, expecting to find support for a unidimensional scale. Second, we sought to identify where on the latent BPD scale reliability was highest by evaluating local reliability across the latent BPD scale. Third, we wanted to assess the overall psychometric properties of the BPD criteria from an IRT perspective by examining the discrimination parameter and location parameter for each of the nine BPD criteria. Fourth, we aimed to investigate whether diagnostic subthreshold scores according to the SCID-II provided additional information in the assessment of BPD severity. Fifth, we sought to assess whether the BPD criteria behaved differently across gender. Based on the findings of Benson et al. (2017) and Sharp et al. (2014), we assumed that male patients would be more frequently assigned Impulsivity and Uncontrolled anger, given the same standing on the latent BPD severity scale. Sixth, we aspired to explore whether there was DIF for age by comparing two age groups, i.e., patients aged 18 to 25 years and patients aged 26 years and older, the same age groups as in Sharp et al. (2019). In line with the study of McMahon et al. (2019), we expected that Unstable relationships would discriminate better in younger adults and Self-injurious behavior would be more easily assigned to younger adults given the same standing on the BPD severity scale as older adults.

Methods

Sample Characteristics and Missing Data

Data were systematically collected from twenty outpatient units affiliated with the Norwegian Network of Personality Disorders (Pedersen et al., 2023), ensuring a comprehensive representation of the patient population. All units specialize in diagnosing and treating personality disorders and related challenges. The sample comprised 4102 psychiatric outpatients, of whom 3030 (74%) were women, with ages ranging from 17 to 66 years (mean = 33.1, SD = 10). Mean age was slightly higher for men than for women (35.4 versus 32.2 years); this difference was statistically significant (t = 9.2, p < .001). Moreover, 1130 participants (28%) were under the age of 26.

Diagnoses were established based on DSM-IV criteria, using the Mini International Neuropsychiatric Interview (Sheehan et al., 1994) for symptom disorders and the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II, First, 1994) for PDs. Diagnoses were informed by a comprehensive approach, integrating referral information, patient history, clinical impressions, and structured interviews, as per the LEAD procedure (Longitudinal, Expert, All-Data, Pedersen et al., 2013; Spitzer, 1983). The SCID-II criteria were scored on a three-point scale: 1 denoting absence; 2 indicating subthreshold presentation; and 3 signifying full criterion met.

In the current sample, 88% had one or more symptom disorders and 69% had one or more PD diagnoses. Avoidant PD was the most common PD diagnosis (n = 1158, 30%), followed by borderline PD (n = 835, 22%), and PD not otherwise specified (n = 674, 17%).

Statistical Analyses

Dimensionality

To evaluate the dimensionality of the SCID-II BPD criteria, we applied two complementary methods: the Empirical Kaiser Criterion and exploratory Mokken Scale Analysis. The Empirical Kaiser Criterion method (Braeken & Van Assen, 2017) is an eigenvalue-based method that addresses the shortcomings of traditional approaches such as the Kaiser criterion (i.e., eigenvalue-greater-than-one rule). Based on the asymptotic sampling distribution of eigenvalues, the Empirical Kaiser Criterion method establishes reference eigenvalues that would be expected for a data set of specified size if no factor structure were present. The reference eigenvalues are presented in a scree plot and compared with the observed sample eigenvalues. Factors to be retained are located above the reference line.
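
The comparison of observed and reference eigenvalues can be sketched in Python as follows. The reference-value formula is written as we recall it from Braeken and Van Assen (2017) and should be checked against that paper before use; the function names are hypothetical, and the eigenvalues are made up for illustration, except that the first one is set to 42% of nine items, the share reported in the Results.

```python
import math

def ekc_reference_values(eigenvalues, n_obs):
    """Reference eigenvalues under the Empirical Kaiser Criterion.

    Assumed formula (recalled from Braeken & Van Assen, 2017):
    ref_j = max(((J - sum of previous eigenvalues) / (J - j + 1)) * (1 + sqrt(J / n))^2, 1)
    """
    n_items = len(eigenvalues)
    inflation = (1.0 + math.sqrt(n_items / n_obs)) ** 2
    refs, explained = [], 0.0
    for j, eig in enumerate(eigenvalues):  # j is zero-based here
        remaining_share = (n_items - explained) / (n_items - j)
        refs.append(max(remaining_share * inflation, 1.0))
        explained += eig
    return refs

def n_factors_ekc(eigenvalues, n_obs):
    """Count the leading eigenvalues that exceed their reference value."""
    count = 0
    for eig, ref in zip(eigenvalues, ekc_reference_values(eigenvalues, n_obs)):
        if eig <= ref:
            break
        count += 1
    return count

# Hypothetical eigenvalues of a 9 x 9 correlation matrix (they sum to 9).
eigs = [3.78, 0.95, 0.85, 0.75, 0.70, 0.60, 0.55, 0.45, 0.37]
refs = ekc_reference_values(eigs, n_obs=4102)
print(n_factors_ekc(eigs, n_obs=4102), [round(r, 3) for r in refs[:2]])
```

In this illustrative case only the first eigenvalue lies above its reference value, which would point to a single factor.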

Mokken Scale Analysis (Mokken, 1971) is a normed-item-covariance-based method and identifies scales that allow an ordering of individuals on an underlying scale using unweighted sum scores. The scalability H coefficient reflects the degree to which the scale can be used to reliably order persons on the latent trait using their sum score. A scale is considered acceptable if 0.3 ≤ H < 0.4, good if 0.4 ≤ H < 0.5, and strong if H ≥ 0.5 (Mokken, 1971; Sijtsma & Molenaar, 2002). For a more detailed evaluation, scalability coefficients were also inspected at item level: item pairs (H_ij) and/or individual items (H_i).
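
A minimal sketch of the scale-level H coefficient, assuming the covariance-ratio formulation (sum of observed item-pair covariances divided by the sum of the maximum covariances attainable given the item marginals). The function names, the label for values below 0.3, and the toy data are hypothetical; dedicated software should be preferred for real analyses.

```python
def covariance(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def max_covariance(x, y):
    """Largest covariance attainable given both marginals (scores paired in sorted order)."""
    return covariance(sorted(x), sorted(y))

def scale_h(items):
    """Scale H: observed item-pair covariances relative to their maxima."""
    observed = maximum = 0.0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            observed += covariance(items[i], items[j])
            maximum += max_covariance(items[i], items[j])
    return observed / maximum

def label_h(h):
    """Qualitative label following the cut-offs quoted in the text."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "good"
    if h >= 0.3:
        return "acceptable"
    return "below 0.3"

# Toy data: three items scored 1-3 for five persons, forming a perfect Guttman pattern.
toy_items = [[1, 1, 2, 3, 3], [1, 2, 2, 3, 3], [1, 1, 1, 2, 3]]
print(scale_h(toy_items), label_h(0.38))
```

For the toy data every observed covariance equals its maximum, so H is 1; a value such as 0.38 falls in the "acceptable" band.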

Item Response Model

Following an item response theory (IRT) approach, the Graded Response Model (Samejima, 1997) was used to scale and evaluate the BPD criteria (i.e., the “items” in our study context), and investigate the local reliability of the criteria set over the BPD latent trait severity scale.

Under the Graded Response Model, the probability of an outpatient being assigned to a specific assessment category (level c or higher) was modeled as a function of the outpatient’s underlying BPD pathology level and a set of two item parameters: an intercept d_ic and a slope a_i.

$$\Pr\left(Y_{pi} \ge c \mid \theta_p\right) = \frac{1}{1 + \exp\left(-a_i \theta_p - d_{ic}\right)}$$

The slope a_i is also known as item discrimination, with higher values indicating a steeper slope of the probability function across the latent BPD trait, and implying a clearer distinction between outpatients of different BPD levels in terms of which category (level c or higher versus lower than c) they are most likely to be assigned to. The intercept d_ic is the log-odds that an outpatient with a latent BPD level equal to zero is assigned a score greater than or equal to category level c (versus a lower category level). Intercepts are thus expressed on a logit scale, and positive/negative d-values correspond to probabilities higher/lower than 0.5.

Given three assessment categories in the SCID-II, our model contained two such cumulative response category probability functions, each with the same item slope, but with their own item intercept covering a graded comparison; d_i2: criterion is absent versus not absent, and d_i3: criterion is fully met versus not fully met. Response probabilities for a specific category can be derived from taking simple differences:

$$\begin{aligned}
\Pr\left(Y_{pi} = 1 \mid \theta_p\right) &= 1 - \Pr\left(Y_{pi} \ge 2 \mid \theta_p\right) \\
\Pr\left(Y_{pi} = 2 \mid \theta_p\right) &= \Pr\left(Y_{pi} \ge 2 \mid \theta_p\right) - \Pr\left(Y_{pi} \ge 3 \mid \theta_p\right) \\
\Pr\left(Y_{pi} = 3 \mid \theta_p\right) &= \Pr\left(Y_{pi} \ge 3 \mid \theta_p\right) - 0.
\end{aligned}$$
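
The cumulative and category probabilities above can be sketched in Python as follows. The function names are hypothetical; the slope and intercepts are the Table 1 estimates for Affective instability, evaluated at a latent BPD level of zero.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def grm_category_probs(theta, a, intercepts):
    """Return [P(Y=1), P(Y=2), P(Y=3)] for one criterion under the Graded Response Model."""
    # Cumulative probabilities P(Y >= c); P(Y >= 1) is 1, and 0 is appended beyond the top category.
    cumulative = [1.0] + [logistic(a * theta + d) for d in intercepts] + [0.0]
    # Category probabilities are differences of adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Table 1 estimates for Affective instability: slope 2.53, intercepts 0.33 and -0.93.
probs = grm_category_probs(theta=0.0, a=2.53, intercepts=(0.33, -0.93))
print([round(p, 3) for p in probs])
```

Because the three category probabilities are differences of one cumulative sequence running from 1 to 0, they sum to 1 at every value of the latent trait.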

For interpretation and comparison with previous studies, it may be easier to speak in terms of location parameters b_ic = -d_ic/a_i rather than the item intercepts d_i2 and d_i3 (labeled with subscripts 1 and 2 in Table 1). A location parameter is the position on the latent BPD scale at which an outpatient has a 50% probability of being assigned a score greater than or equal to category level c (versus a lower category level), i.e., the inflection point of the curve. Higher b_i2 and b_i3 values imply that criterion i is likely to be partially or fully endorsed, respectively, only for outpatients with more severe BPD pathology.
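
As a quick check of this conversion, the sketch below recomputes the location parameters from the slopes and intercepts in Table 1 and compares them with the tabulated locations. The function name `location` is hypothetical; because the published values are rounded to two decimals, agreement is only expected to within about 0.01.

```python
# (criterion, a, d1, d2, b1, b2) as reported in Table 1
TABLE_1 = [
    ("Fears of abandonment", 1.58, -0.51, -1.59, 0.32, 1.01),
    ("Unstable relationships", 2.56, -0.68, -1.93, 0.27, 0.75),
    ("Identity problems", 2.03, -1.04, -2.18, 0.51, 1.08),
    ("Impulsivity", 1.70, -0.43, -1.77, 0.25, 1.04),
    ("Self-injurious behavior", 1.34, -0.32, -1.26, 0.24, 0.94),
    ("Affective instability", 2.53, 0.33, -0.93, -0.13, 0.37),
    ("Chronic emptiness", 0.88, 0.55, -0.46, -0.63, 0.52),
    ("Uncontrolled anger", 1.82, -0.92, -2.13, 0.50, 1.17),
    ("Dissociation or paranoid ideation", 1.08, -0.57, -1.63, 0.53, 1.51),
]

def location(a, d):
    """Location parameter b = -d / a for one intercept of one criterion."""
    return -d / a

max_gap = 0.0
for name, a, d1, d2, b1, b2 in TABLE_1:
    gap = max(abs(location(a, d1) - b1), abs(location(a, d2) - b2))
    max_gap = max(max_gap, gap)
    print(f"{name}: b1 = {location(a, d1):.2f}, b2 = {location(a, d2):.2f}")

print(f"largest discrepancy with Table 1: {max_gap:.3f}")
```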

A key advantage of using IRT to assess the psychometric properties of a criteria set is that reliability can be investigated locally; i.e., for different levels of the latent trait being measured (here: BPD). In IRT, measurement precision in absolute units is conceptualized in terms of test information $\mathrm{I}(\theta_p)$, whose inverse square root gives the standard error of the estimated latent trait, $SE(\theta_p)$. In relative units, measurement precision can be computed as local reliability $Rel(\theta_p)$ through the following formula, where $VAR(\theta_p)$ denotes the variance of the latent trait (under the Graded Response Model equal to 1):

$$Rel\left(\theta_p\right) = 1 - \frac{SE\left(\theta_p\right)^2}{VAR\left(\theta_p\right)} = 1 - \frac{1}{\mathrm{I}\left(\theta_p\right)}$$
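
A rough sketch of how local reliability could be computed from the item parameters, assuming the standard Graded Response Model item information (the sum over categories of the squared derivative of the category probability divided by that probability) and a latent variance of 1. The function names are hypothetical, the parameters are the rounded Table 1 estimates, and the output is not expected to reproduce the curve in Fig. 2 exactly.

```python
import math

# (a, d1, d2) for the nine criteria, taken from Table 1
ITEM_PARAMETERS = [
    (1.58, -0.51, -1.59), (2.56, -0.68, -1.93), (2.03, -1.04, -2.18),
    (1.70, -0.43, -1.77), (1.34, -0.32, -1.26), (2.53, 0.33, -0.93),
    (0.88, 0.55, -0.46), (1.82, -0.92, -2.13), (1.08, -0.57, -1.63),
]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def item_information(theta, a, d1, d2):
    """Information of one three-category criterion at latent level theta."""
    cumulative = [1.0, logistic(a * theta + d1), logistic(a * theta + d2), 0.0]
    info = 0.0
    for k in range(3):
        prob = cumulative[k] - cumulative[k + 1]
        # Derivative of the category probability with respect to theta.
        slope = a * (cumulative[k] * (1 - cumulative[k])
                     - cumulative[k + 1] * (1 - cumulative[k + 1]))
        info += slope ** 2 / prob
    return info

def local_reliability(theta):
    """Rel(theta) = 1 - 1 / I(theta), with the latent variance fixed at 1."""
    test_information = sum(item_information(theta, *p) for p in ITEM_PARAMETERS)
    return 1.0 - 1.0 / test_information

for theta in (-1.5, 0.5, 2.5):
    print(f"theta = {theta:+.1f}: Rel = {local_reliability(theta):.2f}")
```

Under these assumptions, precision is clearly higher around the middle-to-upper part of the scale than at low severity levels.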

Differential Item Functioning Across Gender and Age

Subsequently, we investigated potential biases in BPD criteria based on gender and age, using differential item functioning (DIF) analysis. By means of DIF analysis (Millsap, 2012), it was assessed whether the probability of endorsing a criterion was equal between members of the two groups (i.e., male versus female outpatients, or outpatients 25 years or younger versus those older than 25 years) who have the exact same latent scale score and hence implied severity of BPD pathology. If such differential functioning occurs due to aspects of the criterion or assessment context that are irrelevant to the intended measurement of BPD pathology, this would imply bias and fairness issues in assessment. If many items in the scale display DIF, they may compromise the instrument’s ability to scale distinct groups onto a common metric (Reise & Waller, 2009), and may thus dilute the construct we wish to assess people on, in our case the BPD construct.

Following Thissen and colleagues (1993), we adopted a model comparison approach in order to assess DIF. The procedure was identical for gender and age. First, two reference models were estimated and compared: a model in which all item parameters were constrained to be equal across the two groups, and a model in which all item parameters were left unconstrained. The latter unconstrained model reflects the extreme case, where the whole scale is inequivalent across the groups, whereas the fully constrained model reflects the case of equivalence, where the items function similarly across the groups, and there is merely a potential difference in scale mean and variance between the groups. If the unconstrained model had significantly better fit than the constrained model, as shown by the likelihood ratio test (LRT) comparing both models (and further supported by relative comparison of information criteria such as AIC and BIC), then this would imply that one or several BPD criteria function differently across the two gender/age groups.

If DIF was found, a second step followed, with the aim to identify which items caused the DIF. This was realized by estimating one extra model per BPD criterion, in which the fully constrained model was relaxed by removing the equality constraints across groups only for the item parameters of the criterion under investigation. If this partially unconstrained model for the criterion fitted significantly better than the fully constrained model, the criterion was flagged for DIF. In order to account for the multiple comparisons, the significance level was set family-wise across the nine LRTs (one per criterion) at a conservative value of 0.05/9 ≈ 0.006. Criteria flagged for DIF were further characterized by means of Wald tests to assess whether the DIF is uniform across the scale (i.e., whether it only affects the category thresholds) or nonuniformly varies across the scale (i.e., also affects the discrimination parameters). In order to account for multiple testing, the significance level was set family-wise across parameters within an item (i.e., 1 discrimination a_i, and 2 category thresholds, d_i1 & d_i2), at a conservative value of 0.05 / 3 ≈ 0.017.
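
The per-criterion flagging step can be sketched as follows. The sketch assumes that freeing one criterion releases three parameters (one discrimination and two intercepts), so that the likelihood ratio statistic is referred to a chi-square distribution with 3 degrees of freedom, for which a closed-form tail probability exists. The function names, criterion labels, and log-likelihood values are hypothetical.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def flag_dif(loglik_constrained, loglik_item_free, n_criteria=9, alpha=0.05):
    """Likelihood ratio test of one partially unconstrained model against the constrained model."""
    lrt = 2 * (loglik_item_free - loglik_constrained)
    p_value = chi2_sf_df3(lrt)
    # Family-wise significance level across the nine per-criterion tests.
    return lrt, p_value, p_value < alpha / n_criteria

# Hypothetical log-likelihoods, for illustration only.
loglik_constrained = -30000.0
loglik_item_free = {"criterion A": -29999.0, "criterion B": -29990.0}

results = {name: flag_dif(loglik_constrained, ll) for name, ll in loglik_item_free.items()}
for name, (lrt, p_value, flagged) in results.items():
    print(f"{name}: LRT = {lrt:.1f}, p = {p_value:.4f}, flagged = {flagged}")
```

The same Bonferroni logic applies to the follow-up Wald tests, with the significance level divided by the three parameters of the flagged criterion.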

Missing BPD Criteria

The data of the current study are based on ordinary routine assessments, which have led to some missing data, mostly due to failing administrative routines or to the initial assessment routines being incomplete when data were collected. As a result, two patients had five missing BPD criteria, three persons had three missing BPD criteria, 31 had two, and 125 had one missing value. Thus, 3941 (96%) had complete BPD criteria sets.

Missing data were not linked to gender or age of the outpatient and are considered as missing at random (MAR). In further statistical analyses, we used all available data, and followed an intention-to-treat approach, including all outpatients regardless of missingness of a few criteria.

Statistical Software

All statistical analyses were coded and performed in R, the open-source software environment for statistical computing and graphics (R Core Team, 2022). All IRT models were estimated using a full information maximum likelihood approach in the R package mirt (Chalmers, 2012). Further analyses and diagnostics were custom coded in R.

Results

Dimensionality of the BPD Criteria

The Empirical Kaiser Criterion results strongly supported the hypothesis that the BPD criteria reflect a single underlying common latent trait: only the first eigenvalue of the pairwise correlation matrix exceeded the corresponding empirical Kaiser reference value (see Fig. 1). The first eigenvalue explained approximately 42% of the variance in the nine BPD criteria. The Mokken Scale Analysis further corroborated the unidimensional structure, yielding a scale H coefficient of 0.38. At the item level, most criteria exhibited adequate scalability. The H_i-values for the criteria Chronic emptiness and Dissociation equaled 0.29, just below the commonly used threshold of 0.30. Since both the Empirical Kaiser Criterion method and Mokken Scale Analysis results provided support for a unidimensional scale, the parametric IRT analyses were performed using the unidimensional Graded Response Model.

Measurement Precision

Figure 2 graphically depicts the measurement precision across the latent BPD severity scale. The histogram at the top of Fig. 2 demonstrates a wide range of BPD severity within this outpatient sample, with a tail on the left side of the distribution corresponding to outpatients who do not meet the BPD criteria. Measurement precision was highest ($r_{xx}(\theta)$ > 0.80) for theta values between -0.6 and 1.5 and lowest ($r_{xx}(\theta)$ < 0.70) for theta values smaller than -1.0. Above the optimal range, measurement precision also diminishes as trait levels increase, rendering the assessment of very severe BPD pathology less precise. This implies that the BPD criteria fail to precisely distinguish between patients with substantial dysfunction and those with very severe dysfunction. Measurement precision was poorest (i.e., local reliability around 0.6) for outpatients with low BPD severity.

Item Parameters and Category Thresholds

Global model fit diagnostics, based on comparisons between observed and model-implied multivariate item response frequency tables (e.g., Maydeu-Olivares & Joe, 2014), indicated good fit: M₂(df = 9) = 89, p < .001; RMSEA₂ = 0.05; CFI = 0.96; SRMR = 0.03. Graphical comparison of empirical and model-implied characteristic curves did not expose any item fit issues.

Table 1 details the Graded Response Model’s results. The discrimination parameters (a_i) varied widely, ranging from 0.88 to 2.56. In line with the results of the dimensionality analyses, Chronic emptiness and Dissociation performed less well than the other criteria in discriminating between patients with adjacent severity levels. This is reflected in the percentages of common variance, which were below 30% for these criteria (h² < 0.30). The better discriminating criteria were Unstable relationships, Identity problems, and Affective instability, with percentages of common variance of about 60% or higher.

To facilitate interpretation and comparison with previous IRT studies on BPD, we provide both the location parameters (b₁ & b₂) and intercepts (d₁ & d₂) in Table 1. The location parameters for diagnostic threshold scores (b₂) ranged from 0.37 to 1.51, thus covering latent trait scores from slightly above average to the more severe end of the scale. Affective instability and Chronic emptiness had the smallest location parameters, indicating that these criteria discriminated best at the lower end of the latent BPD severity scale. These criteria also had the largest intercepts, demonstrating that these criteria were more easily assigned than the other criteria, after having controlled for latent BPD severity. Dissociation had the largest location parameter (b₂ = 1.5). Thus, only patients over 1.5 scale units above the average would have a higher than 50% chance of fully meeting this criterion.
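The relation between the two parameterizations can be made explicit with a small sketch. It assumes the slope-intercept form of the Graded Response Model used by mirt, in which the probability of scoring in category k or higher is logistic(a·θ + d_k), so that b_k = −d_k / a. The discrimination and intercept below are hypothetical values chosen to give b₂ = 1.5; they are not the Table 1 estimates.

```python
import math

def location_from_intercept(a, d):
    """Convert a slope-intercept threshold d into a location parameter b."""
    return -d / a

def prob_at_or_above(theta, a, d):
    """Probability of scoring at or above the threshold with intercept d."""
    return 1.0 / (1.0 + math.exp(-(a * theta + d)))

a, d2 = 1.0, -1.5  # hypothetical values, for illustration only
b2 = location_from_intercept(a, d2)
p_at_b2 = prob_at_or_above(b2, a, d2)     # exactly at the location parameter
p_at_mean = prob_at_or_above(0.0, a, d2)  # at the average severity level
print(b2, round(p_at_b2, 3), round(p_at_mean, 3))
```

At θ = b₂ the probability of fully meeting the criterion is 0.5 by construction, which is why a location of 1.5 implies that only patients more than 1.5 scale units above the mean have a better than even chance of meeting it.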

Subthreshold Scores

We further investigated whether diagnostic subthreshold scores according to the SCID-II provide additional information in the assessment of BPD severity. In line with a previous study on the antisocial PD criteria (Paap et al., 2020), we anticipated that subthreshold scores would give only limited information in the diagnostic process. In general, the subthreshold category level 2 (i.e., criterion partially met) was used infrequently and did not occur as the dominant assessment for an outpatient anywhere across the BPD scale. As a prototypical example, consider the Graded Response Model’s category characteristic curves for Affective instability in Fig. 3. The curve of the subthreshold category had a low peak and was completely “swallowed” by the curves of the two adjacent categories, such that the subthreshold category level 2 never had a higher probability of being chosen than level 1 (i.e., criterion not present) or level 3 (i.e., threshold fully met).
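The "swallowed" middle category can be reproduced numerically. The sketch below computes Graded Response Model category probabilities on a grid of θ values and checks whether the subthreshold category is ever the most likely one. For illustration it borrows the Affective instability estimates for the older age group from Table 5 (a = 2.31, d₁ = 0.82, d₂ = −0.44); Fig. 3 may be based on different estimates, so this is not a reconstruction of that figure.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(theta, a, d1, d2):
    """Probabilities of SCID-II levels 1, 2, and 3 under a three-category GRM."""
    p_ge2 = logistic(a * theta + d1)  # at least partially met
    p_ge3 = logistic(a * theta + d2)  # fully met
    return (1.0 - p_ge2, p_ge2 - p_ge3, p_ge3)

a, d1, d2 = 2.31, 0.82, -0.44  # Table 5, Affective instability, age > 25
grid = [i / 100.0 for i in range(-400, 401)]
all_probs = [grm_category_probs(theta, a, d1, d2) for theta in grid]

# Is the subthreshold category ever more likely than both neighbours?
middle_is_modal = any(p[1] > max(p[0], p[2]) for p in all_probs)
peak_middle = max(p[1] for p in all_probs)
print(middle_is_modal, round(peak_middle, 2))
```

With these values the subthreshold category peaks at a probability of roughly 0.3 and is never the most likely response. In this parameterization the middle category of a three-category item can only become modal when d₁ − d₂ exceeds 2·ln 2 ≈ 1.39; here the gap is 1.26.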

Differential Item Functioning for Gender

Measurement equivalence of the BPD scale across groups was not supported for gender: LRT(df = 25) = 68, p < .001. DIF was found for three criteria (see Table 3): Fears of abandonment, Self-injurious behavior, and Uncontrolled anger. When accounting for the identified DIF criteria in a new IRT model, the new model (AIC = 62067; BIC = 62307) performed better than the fully constrained model (AIC = 62097; BIC = 62280; LRT(df = 9) = 47.63, p < .001) and equivalently to the unconstrained model (AIC = 62078; BIC = 62419; LRT(df = 16) = 20.54, p = .197), supporting that the new model may provide a good account of potential DIF for gender for Fears of abandonment, Self-injurious behavior, and Uncontrolled anger. Overall, the female outpatient group is expected to score about 0.4 standard deviations higher on the BPD scale (M = 0.40 (0.04), p < .001) and to show slightly more within-group variation in scores (VAR = 1.18 (0.10), p = .019).
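The p-values of the nested model comparisons follow from the chi-square distribution of the likelihood ratio statistic. The sketch below implements the upper-tail probability for integer degrees of freedom with the Python standard library only (the study itself relied on R and mirt) and applies it to the statistics quoted above. It also shows that the two partial comparisons add up to the overall test: 47.63 + 20.54 ≈ 68 on 9 + 16 = 25 degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    two_sided_normal_tail = math.erfc(chi / math.sqrt(2.0))  # equals 2 * Q(chi)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return two_sided_normal_tail + 2.0 * density * total

# Gender models: DIF model vs. constrained, and DIF model vs. unconstrained.
p_vs_constrained = chi2_sf(47.63, 9)
p_vs_unconstrained = chi2_sf(20.54, 16)
overall_lrt, overall_df = 47.63 + 20.54, 9 + 16
print(p_vs_constrained < 0.001, round(p_vs_unconstrained, 3), round(overall_lrt), overall_df)
```

The second comparison reproduces the reported p ≈ .197, while the first falls far below .001.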

Table 3

Differential Item Functioning (DIF) as function of gender and of age of the nine BPD criteria as assessed through SCID-II

		Gender		Age
i	Criterion	LRT(df = 3)	p	LRT(df = 3)	p
1	Fears of abandonment	15.85	0.001*	4.49	0.213
2	Unstable relationships	4.18	0.242	25.38	< 0.001*
3	Identity problems	2.67	0.446	4.19	0.242
4	Impulsivity	8.80	0.032	17.32	0.001*
5	Self-injurious behavior	14.24	0.003*	81.99	< 0.001*
6	Affective instability	0.65	0.884	45.51	< 0.001*
7	Chronic emptiness	4.07	0.254	2.33	0.507
8	Uncontrolled anger	20.20	< 0.001*	3.51	0.319
9	Dissociation	3.10	0.376	26.46	< 0.001*

Note. LRT is the likelihood ratio test of the constrained model (where all criterion parameters are equal across groups and only the mean and variance of the groups differ) compared to the same model with no equality constraints for the criterion listed (i.e., allowing for DIF only for that criterion)

DIF analyses were run separately for Gender (male and female outpatients) and Age (outpatients aged 25 years or younger and those older than 25 years)

Significance level is set familywise across the LRT’s at 0.05 / 9 ≈ 0.006. An asterisk * signals that the significance level is met
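The familywise correction can be retraced with the p-values from Table 3. In the sketch below, entries reported as "< 0.001" are coded as 0.001, which is an upper bound and does not affect the outcome; the variable names are illustrative.

```python
alpha_familywise = 0.05
criteria = [
    "Fears of abandonment", "Unstable relationships", "Identity problems",
    "Impulsivity", "Self-injurious behavior", "Affective instability",
    "Chronic emptiness", "Uncontrolled anger", "Dissociation",
]
# p-values from Table 3; "< 0.001" is coded as its upper bound 0.001.
gender_p = [0.001, 0.242, 0.446, 0.032, 0.003, 0.884, 0.254, 0.001, 0.376]
age_p = [0.213, 0.001, 0.242, 0.001, 0.001, 0.001, 0.507, 0.319, 0.001]

# Bonferroni-type correction: divide the familywise level by the number of tests.
alpha_per_test = alpha_familywise / len(criteria)

gender_dif = [name for name, p in zip(criteria, gender_p) if p < alpha_per_test]
age_dif = [name for name, p in zip(criteria, age_p) if p < alpha_per_test]
print(round(alpha_per_test, 4))
print(gender_dif)
print(age_dif)
```

Impulsivity illustrates the effect of the correction: its gender p-value of 0.032 would pass an uncorrected 0.05 level but not the corrected level of about 0.006.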

Table 4 shows that there was uniform DIF for these three criteria, i.e., intercepts differed across gender but discrimination parameters did not. For instance, for Uncontrolled anger, the d₂ intercept was 0.56 points larger in male patients, suggesting that male patients with a latent BPD severity level of zero were more likely to be assigned this criterion than female patients. For Fears of abandonment and Self-injurious behavior, the results were opposite; i.e., smaller intercepts for male outpatients. Thus, male patients were less likely to be assigned these criteria compared to female patients, given the same location on the latent BPD scale.

Table 4

Differential criterion functioning for gender

	Criterion	Parameter	Male	Female	Δ (SE)	p
		a	1.45	1.43	0.02 (0.14)	0.219
1	Fears of abandonment	d₁	−1.15	−0.86	−0.29 (0.11)	0.002*
		d₂	−2.36	−1.92	−0.44 (0.15)	0.001*
		a	1.23	1.21	0.02 (0.12)	0.220
5	Self-injurious behavior	d₁	−0.92	−0.60	−0.33 (0.10)	< 0.001*
		d₂	−1.86	−1.54	−0.32 (0.12)	0.002*
		a	1.58	1.76	−0.19 (0.16)	0.060
8	Uncontrolled anger	d₁	−1.20	−1.52	0.32 (0.13)	0.003*
		d₂	−2.24	−2.81	0.56 (0.16)	< 0.001*

Note. Δ is the difference between male and female outpatients for the specific criterion parameter

Significance level is set familywise across parameters within an item at 0.05 / 3 ≈ 0.017

An asterisk * signals that the significance level is met
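To translate the intercept difference for Uncontrolled anger into probabilities, the following sketch evaluates the chance of fully meeting the criterion at a latent severity of zero for each gender, using the d₂ estimates from Table 4. It assumes the slope-intercept form logistic(a·θ + d₂); at θ = 0 the discrimination parameter drops out, so only the intercepts matter.

```python
import math

def prob_fully_met(theta, a, d2):
    """Probability of a full-threshold score under the slope-intercept GRM."""
    return 1.0 / (1.0 + math.exp(-(a * theta + d2)))

# Table 4 estimates for Uncontrolled anger.
male = {"a": 1.58, "d2": -2.24}
female = {"a": 1.76, "d2": -2.81}

p_male = prob_fully_met(0.0, male["a"], male["d2"])
p_female = prob_fully_met(0.0, female["a"], female["d2"])
odds_ratio = math.exp(male["d2"] - female["d2"])  # odds ratio at theta = 0
print(round(p_male, 3), round(p_female, 3), round(odds_ratio, 2))
```

Under these estimates, a male patient of average severity has a probability of about 0.10 of fully meeting the criterion, against about 0.06 for a female patient at the same severity.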

Differential Item Functioning for Age

Measurement equivalence of the BPD scale was not supported for the two different age groups; i.e., outpatients of 25 years or younger versus outpatients older than 25 years (LRT(df = 25) = 187, p < .001). DIF was found for five criteria; i.e., Unstable relationships, Impulsivity, Self-injurious behavior, Affective instability, and Dissociation (see Table 3). When estimating a new model that accounted for the identified DIF criteria, the new model (AIC = 61957; BIC = 62235) performed better than the constrained model (AIC = 62103; BIC = 62287; LRT(df = 15) = 176.77, p < .001) and performed equally well as the unconstrained model (AIC = 61966; BIC = 62307; LRT(df = 10) = 10.37, p = .409), indicating that the new model provides a good account of potential DIF for age. Overall, older outpatients are expected to score about 0.4 standard deviations lower on the BPD scale (M = −0.37 (0.05), p < .001) than younger outpatients, but with a similar amount of within-group variation in scores (VAR = 1.04 (0.10), p = .171).

DIF for age mostly expressed itself in intercept differences, i.e., uniform DIF (see Table 5), with the exception of Affective instability and, to a lesser extent, Impulsivity, which had higher discrimination ability in the younger age group. Among the five criteria that displayed DIF, Self-injurious behavior and Affective instability had smaller intercepts in the older age group, for both diagnostic subthreshold and diagnostic threshold scores, indicating that older outpatients who were assigned these criteria had more severe BPD pathology than their younger counterparts. For criteria Unstable relationships, Impulsivity, and Dissociation, the results were opposite, demonstrating that patients in the younger age group had more severe BPD pathology when meeting these criteria. However, for d₂ this effect did not reach significance at the alpha = 0.017 level for criteria Unstable relationships and Impulsivity.

Table 5

Differential criterion functioning for age

	Criterion	Parameter	Age ≤ 25	Age > 25	Δ (SE)	p
		a	2.65	2.50	0.15 (0.24)	0.134
2	Unstable relationships	d₁	−0.27	0.09	−0.36 (0.12)	0.001*
		d₂	−1.42	−1.21	−0.22 (0.14)	0.032
		a	1.82	1.61	0.24 (0.15)	0.029
4	Impulsivity	d₁	−0.18	0.08	−0.25 (0.10)	0.002*
		d₂	−1.43	−1.30	−0.12 (0.12)	0.073
		a	1.22	1.30	−0.08 (0.12)	0.123
5	Self-injurious behavior	d₁	0.51	−0.16	0.68 (0.09)	< 0.001*
		d₂	−0.42	−1.13	0.70 (0.09)	< 0.001*
		a	2.89	2.31	0.58 (0.26)	0.006*
6	Affective instability	d₁	1.49	0.82	0.67 (0.14)	< 0.001*
		d₂	0.19	−0.44	0.64 (0.12)	< 0.001*
		a	1.11	1.07	0.04 (0.11)	0.186
9	Dissociation	d₁	−0.55	−0.19	−0.37 (0.09)	< 0.001*
		d₂	−1.55	−1.27	−0.28 (0.10)	0.002*

Note. Δ is the difference between outpatients 25 years or younger and those older than 25 for the specific criterion parameter

Significance level is set familywise across parameters within an item at 0.05 / 3 ≈ 0.017

An asterisk * signals that the significance level is met
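When discrimination parameters differ between groups, as for Affective instability, the size of the DIF effect depends on the severity level. The sketch below expresses this as the difference in log-odds of fully meeting the criterion between the younger and the older group, (a_young − a_old)·θ + (d₂,young − d₂,old), using the Table 5 estimates. It assumes the slope-intercept parameterization and a common latent scale for both groups; the function and variable names are illustrative.

```python
def logit_gap(theta, younger, older, key="d2"):
    """Log-odds difference (younger minus older) at a given severity level."""
    slope_gap = younger["a"] - older["a"]
    intercept_gap = younger[key] - older[key]
    return slope_gap * theta + intercept_gap

# Table 5 estimates for Affective instability.
younger = {"a": 2.89, "d1": 1.49, "d2": 0.19}
older = {"a": 2.31, "d1": 0.82, "d2": -0.44}

gaps = {theta: logit_gap(theta, younger, older) for theta in (-2, -1, 0, 1, 2)}
# Severity level at which both groups are equally likely to meet the criterion.
crossing = -(younger["d2"] - older["d2"]) / (younger["a"] - older["a"])
for theta, gap in gaps.items():
    print(theta, round(gap, 2))
print(round(crossing, 2))
```

With these estimates the gap in favor of the younger group widens as severity increases and only reverses at about one standard deviation below the mean, which is the signature of non-uniform DIF.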

Discussion

Consistent with our hypotheses, the dimensionality analyses confirmed the unidimensionality of the BPD criteria within this extensive outpatient cohort evaluated through the SCID-II. The primary findings of this study indicated significant differential item functioning (DIF) across gender and age dimensions: Three BPD criteria showed DIF for gender and five BPD criteria showed DIF for age. In line with the conclusions of the review of Bozzatello et al. (2024), we found that women were more likely to endorse the criterion Self-injurious behavior, whereas men were more likely to endorse the criteria Uncontrolled anger and Impulsivity (though not at the alpha < .017 level), given the same level of BPD severity. In contrast to the conclusion of Bozzatello et al. (2024), we did not find any gender differences with respect to Affective instability and Chronic emptiness. However, the latter criterion had poor discriminative ability whereas Affective instability had substantial DIF for age, only surpassed by Self-injurious behavior.

Among the nine BPD criteria, only Identity problems and Chronic emptiness were free from DIF for gender or age. However, Identity problems displayed significantly better discriminative ability. The fact that the Identity criterion was free from DIF aligns with previous research (see Table 2), suggesting that subjective experiences like these can be assessed reliably—at least when structured clinical interviews are conducted by experienced clinicians. Considering that BPD may reflect overall PD severity, these results provide empirical justification for the inclusion of Identity in the Level of Personality Functioning Scale of the DSM-5 Alternative Model for Personality Disorders (APA, 2013) and the general PD severity description in the ICD-11 (WHO, 2019).

Self-injurious behavior stood out as the most problematic criterion from a DIF point of view. Both our study and the study of Hoertel et al. (2014) found that higher levels of BPD pathology were needed before Self-injurious behavior was allocated to male participants. In accordance with McMahon et al. (2019), this criterion also displayed DIF for age in our study, with a large effect size. We might then wonder whether it is advisable to include self-injurious behavior as a severity marker in future diagnostic manuals (WHO, 2019). Self-harm is seen in relation to many mental health disorders and might as such be a more general indicator of mental ill health (Reichl & Kaess, 2021). Thus, in considering general severity of PD, it might be preferable to focus more on aspects reflecting self and interpersonal functioning, such as identity problems, which have demonstrated both good discrimination and a high degree of pathology in our sample.

Unstable relationships, Impulsivity, and Dissociation showed no evidence of DIF for gender and exhibited moderate effect sizes in terms of DIF for age. For Impulsivity, this finding was surprising since several previous studies reported significant gender-related DIF for this criterion (Benson et al., 2017; Hoertel et al., 2014; Sharp et al., 2014). Selection bias could partly explain these differences. In our sample, female patients were relatively young compared with male patients, as well as when compared with participants in other studies. Aggen et al. (2009) noted that DIF of the Impulsivity criterion in women was less noticeable in the younger age group, which could account for the lesser prominence of DIF for Impulsivity in our sample. It is important to note that we employed a conservative p-value threshold (p < .017) to mitigate the risk of false positives. This stringent criterion increases the likelihood of Type II errors, meaning that subtle gender-related differences in Impulsivity may not have been detected.

The finding that Uncontrolled anger required a lower level of BPD severity in men to be rated as present is in line with the findings from several previous studies (Benson et al., 2017; Sharp et al., 2014). This criterion is often viewed as representing the emotional dysregulation aspect of BPD (Weinberg et al., 2011). However, in DSM-IV and DSM-5, this criterion emphasizes behavioral expressions of anger, e.g., “recurrent physical fights,” which is reproduced in the SCID-II. The inclusion of this behavior may lead to undue focus on physical aggression, at the expense of focus on interpersonal aggression. Thus, DIF might be avoided by focusing more on gender-neutral behavior or by inquiring about anger experiences rather than on the behavior itself.

The criterion Fears of abandonment demonstrated a significantly lower intercept for male patients, indicating that this criterion was less readily assigned to male patients as compared with female patients, given the same level of BPD severity. The description of this criterion in the DSM-5 emphasizes frantic efforts to avoid real or imagined abandonment. Male patients might not readily identify with such “frantic efforts” but still may experience a strong sense of desperation, which is acted out in other ways than by begging, pleading, or threatening. Consequently, clinical assessments may benefit from a greater focus on the internal experiences of loneliness and abandonment in male patients, rather than on overt behaviors.

Hoertel and colleagues (2014) posited that age-related variations in self-injurious behavior might result from maturation in emotional regulation throughout the lifespan. Our results suggest that individuals with less improvement in emotional regulation are prone to experience more severe BPD pathology. This observation may be partially attributable to the interrelationship between affective instability and self-injurious behavior, which both manifested more prominently among older individuals presenting with advanced BPD pathology. Longitudinal studies have shown Affective instability to be one of the most stable BPD criteria (Skodol et al., 2005). Affective instability is described as one of the core aspects of BPD pathology (Miller & Pilkonis, 2006), and a probable driving force of additional problematic behaviors in BPD (Trull et al., 2008). Affective instability has also been reported as one of the maintaining factors of non-suicidal self-injury in BPD (Reichl & Kaess, 2021). It should also be noted that Affective instability was indicative of underlying BPD pathology at the lower severity levels within our sample. Even in the older cohort, Affective instability was not indicative of severe BPD pathology as compared with the other criteria.

Consistent with the findings of Sharp et al. (2014), our study revealed that the criteria Chronic emptiness and Dissociation had limited discriminative power. These criteria may also have contributed to the relatively poor reliability at both ends of the BPD severity scale, since the Emptiness criterion discriminated best at the lower end of the BPD severity scale and Dissociation discriminated best at the higher end. Chronic emptiness is a complex feeling state that is experienced in various ways by different individuals, and overlaps somewhat with other expressions of psychopathology, such as depression, narcissistic PD, and schizophrenia spectrum disorders (D’Agostino et al., 2020). A systematic review of Chronic emptiness calls the criterion “under-researched” and the target of several (albeit not unifying) theories, nevertheless finding emptiness to be experienced as a sense of disconnection from self and others (Miller et al., 2020). Troubles with understanding and feeling connected to self and others are also central to personality disorders in general, and to BPD in particular (Bender & Skodol, 2007). As proposed by Price et al. (2019), this suggests that the Emptiness criterion may not be unique to BPD, but rather a broader factor within PDs or psychopathology at large.

In agreement with Al-Shamali et al. (2022), Dissociation was identified as one of the most severe criteria, and also had low discriminating power in our sample. Thus, on the one hand, these results suggest that the Dissociation criterion is relatively rare, particularly in less severe cases; on the other hand, they indicate that this criterion is relatively common among patients without BPD. Dissociation has been described as “ubiquitous” and is part of many different psychiatric disorders (Lyssenko et al., 2018); hence, it is difficult to delineate dissociation as a separate psychiatric disorder. As its name implies, it is the core element of dissociative disorders, and also plays a role in acute stress disorder, PTSD, schizophrenia, eating disorders, panic disorders, affective disorders, and OCD (Lyssenko et al., 2018; Spitzer et al., 2006). According to the review of Al-Shamali et al. (2022), treatment effectiveness may be diminished for patients with BPD in the presence of dissociative symptoms. Through selection bias, this may partly explain our finding that dissociative symptoms are associated with more severe BPD in younger patients. It is conceivable that therapists encountering young patients with severe BPD and pronounced dissociative symptoms may find these cases particularly challenging and, as a result, are more likely to refer these patients to specialized treatment.

Echoing findings from our prior research on antisocial PD (Paap et al., 2020), which shared a similar methodology, we observed that the diagnostic subthreshold scores of the SCID-II offered limited diagnostic information, raising questions about their utility in the diagnostic process. One possible explanation is that clinicians use these scores infrequently. Perhaps, they paid relatively little attention to these subthreshold scores, since these scores do not count when designating a BPD diagnosis. The latter may amount to an argument convincing enough to discard subthreshold scores altogether in future diagnostic interviews. On the other hand, it could also be argued that subthreshold scores should be given a larger role in the diagnostic process, provided that clinicians are better trained in how to use these scores.

It is important to note that the clinicians involved in this study were highly trained in assessing PDs and worked regularly with these patients. Thus, generalizability of our findings to general clinical settings might be somewhat limited. In less specialized environments, DIF for gender and age could be even more pronounced. However, we cannot entirely rule out assessment bias in our study. The clinicians in this network may have had a stronger interest in BPD than the average clinician, and potentially held a more favorable attitude toward these patients. Given that impulsivity does not inherently carry a negative connotation, one interpretation is that it might be considered a ‘preferred criterion’ for female patients, reflecting a leniency bias (Podsakoff et al., 2012). Another potential bias that might have affected our clinicians is ‘implicit theory bias’—the tendency to hold implicit beliefs about the relationships between BPD criteria, leading to illusory correlations (Podsakoff et al., 2012). This may have been amplified by the sequential assessment of the ten PD categories according to the structure of the SCID-II. As a result, clinicians might have been inclined to rate criteria more similarly within a given PD diagnosis, which could have inflated the coherence of BPD symptoms.

A significant limitation of this study is our inability to provide detailed information on the interrater reliability within our sample. However, a video-based interrater reliability study conducted by the Network for Personality Disorders involved the rescreening of 24 patients, including 11 with BPD, by an experienced clinician (Arnevik et al., 2009)^a. The study reported a kappa value of 0.66 for BPD, suggesting a level of agreement that could be deemed satisfactory. Another limitation worth mentioning is that we did not directly investigate other PD criteria that might be part of general PD severity (Paap et al., 2022; Sharp et al., 2015).

Therefore, caution is advised when extrapolating our findings to broader assessments of general PD severity. However, by focusing on the BPD criteria only, comparison with previous studies is facilitated and generalizability increased.

In sum, consistent with previous research, we observed a notable tendency for the behavior-oriented criteria (Impulsivity, Self-injurious behavior, and Uncontrolled anger) to display DIF, most pronounced for Self-injurious behavior. While some issues might be addressed by shifting the focus towards internal experiences rather than overt behavior, this approach is not feasible for Self-injurious behavior due to its inherent orientation towards observable actions. Consequently, we suggest that the utility of Self-injurious behavior as a diagnostic criterion for general severity is limited. Abandoning Dissociation and Chronic emptiness should also be considered, since these criteria were relatively poor indicators of the BPD construct.

Among the other BPD criteria, Unstable relationships, Identity problems, and Affective instability had the best psychometric properties, as indicated by relatively large discrimination parameters and no DIF for gender. The negative consequences of DIF for Fears of abandonment might be avoided by focusing more on internal experiences related to abandonment fears rather than observable behaviors, i.e., “Frantic efforts to avoid real or imagined abandonment”. Particularly noteworthy is the absence of DIF for age in Identity problems, which is promising, since this is a criterion that is central to assessing general personality pathology.¹

Based on our findings and a synthesis of prior research, caution is advised when relying on behaviorally focused criteria, especially Self-injurious behavior, to diagnose BPD. Given that the concept of general severity of personality pathology, central to contemporary diagnostic approaches, overlaps significantly with the criteria for BPD, our cautionary note extends to the operationalization and assessment of general severity as well. We suggest that criteria assessing self-pathology and interpersonal dysfunction may be better indicators of overall PD severity than those emphasizing overt behavior.

Acknowledgements

The authors wish to thank the patients and staff from the following units of the Norwegian Network for Personality Disorders for their contribution to this study: Unit for Group Therapy, Øvre Romerike District Psychiatric Center (DPC), Akershus University Hospital, Jessheim; Group Therapy Unit, Nedre Romerike DPC, Akershus University Hospital, Lillestrøm; Group Therapy Unit, Follo DPC, Akershus University Hospital, Ski; Group Therapy Unit, Groruddalen DPC, Akershus University Hospital, Oslo; Clinic for Personality Disorders, Outpatient Clinic for Specialized Treatment of Personality Disorders, Section for Personality Psychiatry and Specialized Treatments, Oslo University Hospital, Oslo; Group Therapy Unit, Lovisenberg DPC, Lovisenberg Hospital, Oslo; Group Therapy Team, Vinderen DPC, Diakonhjemmet Hospital, Oslo; Unit of Personality Psychiatry, Vestfold DPC, Sandefjord; Unit for Intensive Group Therapy, Aust-Agder DPC, Sørlandet Hospital, Arendal; Unit for Group Therapy, DPC, Strømme, Sørlandet Hospital, Kristiansand; Group Therapy Unit, Stavanger DPC, Stavanger University Hospital, Stavanger; Section for Group Treatment, Kronstad DPC, Haukeland University Hospital, Bergen; MBT Team, Department of Substance Abuse Medicine, Haukeland Universitetssjukehus, Bergen; Group Therapy Unit, Psychiatric Outpatient Clinic, Ålesund, and MBT Team, Outpatient Clinic, Rogaland A - center, Stavanger.

Declarations

Ethical Approval

All participating patients from each treatment unit have given written informed consent to use anonymous, clinical data for research purposes. Anonymized data from each treatment unit were collected and transferred to a common research database. The collection procedures were approved by the local data protection officer for each contributing unit. Data security procedures for the research database were approved by the data protection officer at the responsible center for the research. Since the data are anonymous, formal approval from the Norwegian State Data Inspectorate and Regional Committee for Medical Research and Ethics is not required.

Conflict of Interest

We have no conflicts of interests to disclose.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.



^a Note that the number of patients with BPD was not listed in the paper by Arnevik et al. (2009). This number was provided by the principal investigator Theresa Wilberg.

Aggen, S., Neale, M., Røysamb, E., Reichborn-Kjennerud, T., & Kendler, K. (2009). A psychometric evaluation of the DSM-IV borderline personality disorder criteria: Age and sex moderation of criterion functioning. Psychological Medicine, 39(12), 1967–1978.PubMedPubMedCentral

Al-Shamali, H. F., Winkler, O., Talarico, F., Greenshaw, A. J., Forner, C., Zhang, Y., Vermetten, E., & Burback, L. (2022). A systematic scoping review of dissociation in borderline personality disorder and implications for research and clinical practice: Exploring the fog. Australian & New Zealand Journal of Psychiatry, 56, 1252–1264. https://doi.org/10.1177/00048674221077029CrossRef

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders, 4th edition: DSM-IV. American Psychiatric Association.

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders, 5th edition: DSM-5. American Psychiatric Association.

Arnevik, E., Wilberg, T., Urnes, O., Johansen, M., MonsenJ.T., & Karterud, S. (2009). Psychotherapy for personality disorders: Short-term day hospital psychotherapy versus outpatient individual therapy - a randomized controlled study. European Psychiatry: The Journal of the Association of European Psychiatrists, 24(2), 71–78. https://doi.org/10.1016/j.eurpsy.2008.09.004CrossRefPubMed

Bender, D. S., & Skodol, A. E. (2007). Borderline personality as a self-other representational disturbance. Journal of Personality Disorders, 21(5), 500–517.PubMed

Benson, K. T., Donnellan, M. B., & Morey, L. C. (2017). Gender-related differential item functioning in DSM-IV/DSM-5-III (alternative model) diagnostic criteria for borderline personality disorder. Personality Disorders: Theory Research and Treatment, 8(1), 87. https://doi.org/10.1037/per0000166CrossRef

Bozzatello, P., Blua, C., Brasso, C., Rocca, P., & Bellino, S. (2024). Gender differences in borderline personality disorder: A narrative review. Frontiers in Psychiatry, 15, 1320546.PubMedPubMedCentral

Braeken, J., & Van Assen, M. A. (2017). An empirical Kaiser criterion. Psychological Methods, 22(3), 450.PubMed

Chalmers, R. P. (2012). Mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29.

Clifton, A., & Pilkonis, P. A. (2007). Evidence for a single latent class of Diagnostic and Statistical Manual of Mental disorders borderline personality pathology. Comprehensive Psychiatry, 48(1), 70–78.PubMed

D’Agostino, A., Pepi, R., Monti, M. R., & Starcevic, V. (2020). The feeling of emptiness: A review of a complex subjective experience. Harvard Review of Psychiatry, 28(5), 287–295.PubMed

Eaton, N. R., & Greene, A. L. (2018). Personality disorders: Community prevalence and socio-demographic correlates. Current Opinion in Psychology, 21, 28–32.PubMed

Ellison, W. D., Rosenstein, L. K., Morgan, T. A., & Zimmerman, M. (2018). Community and clinical epidemiology of borderline personality disorder. Psychiatric Clinics, 41(4), 561–573.PubMed

First, M. B. (1994). Structured clinical interview for DSM-IV Axis II personality disorders (SCID II). New York State Psychiatric Institute.

Gunderson, J. G., Herpertz, S. C., Skodol, A. E., Torgersen, S., & Zanarini, M. C. (2018). Borderline personality disorder. Nature Reviews Disease Primers, 4(1), 1–20.

Hoertel, N., Peyre, H., Wall, M. M., Limosin, F., & Blanco, C. (2014). Examining sex differences in DSM-IV borderline personality disorder symptom expression using item response theory (IRT). Journal of Psychiatric Research, 59, 213–219. https://doi.org/10.1016/j.jpsychires.2018.12.019CrossRefPubMed

Jane, J. S., Oltmanns, T. F., South, S. C., & Turkheimer, E. (2007). Gender bias in diagnostic criteria for personality disorders: An item response theory analysis. Journal of Abnormal Psychology, 116(1), 166. https://doi.org/10.1037/0021-843X.116.1.166CrossRefPubMedPubMedCentral

Johansen, M., Karterud, S., Pedersen, G., Gude, T., & Falkum, E. (2004). An investigation of the prototype validity of the borderline DSM-IV construct. Acta Psychiatrica Scandinavica, 109(4), 289–298.

Leichsenring, F., Fonagy, P., Heim, N., Kernberg, O. F., Leweke, F., Luyten, P., Salzer, S., Spitzer, C., & Steinert, C. (2024). Borderline personality disorder: A comprehensive review of diagnosis and clinical presentation, etiology, treatment, and current controversies. World Psychiatry, 23(1), 4–25.

Lyssenko, L., Schmahl, C., Bockhacker, L., Vonderlin, R., Bohus, M., & Kleindienst, N. (2018). Dissociation in psychiatric disorders: A meta-analysis of studies using the dissociative experiences scale. American Journal of Psychiatry, 175(1), 37–46.

Maydeu-Olivares, A., & Joe, H. (2014). Assessing approximate fit in categorical data analysis. Multivariate Behavioral Research, 49(4), 305–328.

McMahon, K., Hoertel, N., Peyre, H., Blanco, C., Fang, C., & Limosin, F. (2019). Age differences in DSM-IV borderline personality disorder symptom expression: Results from a national study using item response theory (IRT). Journal of Psychiatric Research, 110, 16–23. https://doi.org/10.1016/j.jpsychires.2018.12.019

Miller, J. D., & Pilkonis, P. A. (2006). Neuroticism and affective instability: The same or different? American Journal of Psychiatry, 163(5), 839–845.

Miller, C. E., Townsend, M. L., Day, N. J., & Grenyer, B. F. (2020). Measuring the shadows: A systematic review of chronic emptiness in borderline personality disorder. PLOS ONE, 15(7), e0233970.

Millsap, R. E. (2012). Statistical approaches to measurement invariance. Routledge.

Mokken, R. J. (1971). A theory and procedure of scale analysis: With applications in political research (Vol. 1). Walter de Gruyter.

Paap, M. C. S., Braeken, J., Urnes, O., Karterud, S., Wilberg, T., Pedersen, G., & Hummelen, B. (2020). A psychometric evaluation of the DSM-IV criteria for antisocial personality disorder: Dimensionality, local reliability, and differential item functioning across gender. Assessment, 27(1). https://doi.org/10.1177/1073191117745126

Paap, M. C. S., Pedersen, G., Selvik, S. G., Frans, N., Wilberg, T., & Hummelen, B. (2022). More is more: Evidence for the incremental value of the SCID-II/SCID-5-PD specific factors over and above a general PD factor. Personality Disorders: Theory, Research, & Treatment, 13(2), 108. https://doi.org/10.1037/per0000426

Pedersen, G., Karterud, S., Hummelen, B., & Wilberg, T. (2013). The impact of extended longitudinal observation on the assessment of personality disorders. Personality and Mental Health, 7(4), 277–287. https://doi.org/10.1002/pmh.1234

Pedersen, G., Wilberg, T., Hummelen, B., & Hartveit Kvarstein, E. (2023). The Norwegian network for personality disorders–development, contributions and challenges through 30 years. Nordic Journal of Psychiatry, 77(5), 512–520.

Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63, 539–569.

Price, A. L., Mahler, H., & Hopwood, C. (2019). Subjective emptiness: A clinically significant trans-diagnostic psychopathology construct. https://doi.org/10.31235/osf.io/f2x6r

Reichl, C., & Kaess, M. (2021). Self-harm in the context of borderline personality disorder. Current Opinion in Psychology, 37, 139–144.

Reise, S. P., & Waller, N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5, 27–48.

Samejima, F. (1997). Graded response model. In Handbook of modern item response theory (pp. 85–100). Springer.

Sharp, C., Ha, C., Michonski, J., Venta, A., & Carbone, C. (2012). Borderline personality disorder in adolescents: Evidence in support of the Childhood Interview for DSM-IV Borderline Personality Disorder in a sample of adolescent inpatients. Comprehensive Psychiatry, 53(6), 765–774. https://doi.org/10.1016/j.comppsych.2011.12.003

Sharp, C., Michonski, J., Steinberg, L., Fowler, J. C., Frueh, B. C., & Oldham, J. M. (2014). An investigation of differential item functioning across gender of BPD criteria. Journal of Abnormal Psychology, 123(1), 231.

Sharp, C., Wright, A. G., Fowler, J. C., Frueh, B. C., Allen, J. G., Oldham, J., & Clark, L. A. (2015). The structure of personality pathology: Both general (‘g’) and specific (‘s’) factors? Journal of Abnormal Psychology, 124(2), 387.

Sharp, C., Steinberg, L., Michonski, J., Kalpakci, A., Fowler, C., Frueh, B. C., & Fonagy, P. (2019). DSM borderline criterion function across age-groups: A cross-sectional mixed-method study. Assessment, 26(6), 1014–1029.

Sheehan, D. V., Lecrubier, Y., Janavs, J., Knapp, E., Weiller, E., & Bonora, L. I. (1994). Mini International Neuropsychiatric Interview (MINI). Tampa, Florida and Paris, France: University of South Florida Institute for Research in Psychiatry and INSERM-Hôpital de la Salpêtrière.

Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory (Vol. 5). Sage.

Skodol, A. E., Gunderson, J. G., Shea, M. T., McGlashan, T. H., Morey, L. C., Sanislow, C. A., Bender, D. S., Grilo, C. M., Zanarini, M. C., Yen, S., Pagano, M. E., & Stout, R. L. (2005). The Collaborative Longitudinal Personality Disorders Study (CLPS): Overview and implications. Journal of Personality Disorders, 19(5), 487–504.

Spitzer, R. L. (1983). Psychiatric diagnosis: Are clinicians still necessary? Comprehensive Psychiatry, 24(5), 399–411.

Spitzer, C., Barnow, S., Freyberger, H. J., & Grabe, H. J. (2006). Recent developments in the theory of dissociation. World Psychiatry, 5(2), 82.

R Core Team. (2022). R: A language and environment for statistical computing [Computer software manual]. Retrieved from https://www.R-project.org/

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning. Lawrence Erlbaum.

Trull, T. J., Solhan, M. B., Tragesser, S. L., Jahng, S., Wood, P. K., Piasecki, T. M., & Watson, D. (2008). Affective instability: Measuring a core feature of borderline personality disorder with ecological momentary assessment. Journal of Abnormal Psychology, 117(3), 647–661. https://doi.org/10.1037/a0012532

Trull, T. J., Jahng, S., Tomko, R. L., Wood, P. K., & Sher, K. J. (2010). Revised NESARC personality disorder diagnoses: Gender, prevalence, and comorbidity with substance dependence disorders. Journal of Personality Disorders, 24(4), 412–426.

Weinberg, I., Ronningstam, E., Goldblatt, M. J., Schechter, M., & Maltsberger, J. T. (2011). Common factors in empirically supported treatments of borderline personality disorder. Current Psychiatry Reports, 13, 60–68.

World Health Organization. (2019). International statistical classification of diseases and related health problems (11th ed.). https://icd.who.int/

Title: Differential Item Functioning for Gender and Age of the DSM-IV Borderline Personality Disorder Criteria in a Large Clinical Sample
Authors: Benjamin Hummelen, Tuva Langjord, Muirne C.S. Paap, Espen Jan Folmo, Geir Pedersen, Johan Braeken
Publication date: 01-03-2025
Publisher: Springer US
Published in: Journal of Psychopathology and Behavioral Assessment, Issue 1/2025
Print ISSN: 0882-2689
Electronic ISSN: 1573-3505
DOI: https://doi.org/10.1007/s10862-024-10183-8
