Plain English summary
Psychological birth trauma (PBT) is common among women worldwide. Through a mother-centered ripple effect, it has wide and far-reaching negative consequences for the health of the mother, her infant, and her partner, for family relationships and future reproductive decision-making, and it increases the use of health care resources. While numerous scales have been developed to measure PBT, there is currently no consensus on which instrument is most suitable for puerperae. We conducted a systematic review following the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology to summarize and review the psychometric properties of available patient-reported outcome measures (PROMs) of PBT, with the aim of guiding healthcare providers in selecting an appropriate PROM for assessing PBT in postpartum women. This systematic review recommends the City Birth Trauma Scale as the preferred assessment tool for measuring PBT in clinical practice and research involving puerperae. However, further comprehensive studies are needed to better understand and improve the psychometric properties of all scales.
Introduction
Maternal health is crucial to public health, with growing emphasis on psychological well-being alongside physical health. Although childbirth is a natural process for most women, it can be physically and psychologically traumatic for some [1]. Approximately 10% to 20% of women report traumatic childbirth, leading to significant emotional distress and long-term negative mental health outcomes [2, 3]. While Psychological Birth Trauma (PBT) lacks a universally accepted definition, Walker and Avant’s concept analysis identified four key elements: subjective emotions, distressing experiences, trauma during the birth process, and the persistence of these effects into the postnatal period [4]. Reported incidences of PBT range from 10% to 44% [5, 6]. Beck et al. [7] highlighted the "ripple effect" of childbirth trauma, which affects not only the mother but also the mother-infant bond [8], breastfeeding [9], marital relationships [10], future reproductive choices [6], partners' mental health [11], and healthcare service utilization [12]. Studies estimate that around 19% of women develop PTSD after traumatic childbirth [13], a rate significantly higher than the 4% in the general postpartum population [14]. Thus, recognising and understanding PBT is essential for ensuring appropriate support and interventions.
Patient-reported outcome measures (PROMs) are tools used to evaluate patients' health, health-related quality of life, and other relevant constructs. PROMs have proven effective in capturing patients' subjective health perceptions, especially when direct observations are challenging or time is limited in clinical settings [15]. By improving communication between patients and physicians and informing clinical decision-making, PROMs play a crucial role in improving care quality and health outcomes [16].
Currently, several PROMs are available for assessing maternal PBT, including the City Birth Trauma Scale (City BiTS) [17], Perinatal Posttraumatic Stress Disorder Questionnaire-Revised (PPQ-R) [18], Childbirth Trauma Index (CTI) [19], Childbirth Trauma Index-Revised (CTI-R) [20], the Post-Traumatic Stress Disorder Checklist for DSM-5 (PCL-5) [21], Posttraumatic Diagnostic Scale (PDS) [22], Birth Trauma Perception Scale for Women During Vaginal Delivery (BTPS-WVD) [23], Psychological Childbirth Trauma Assessment Scale (PCTAS) [24], Impact of Event Scale-Revised (IES-R) [25], Maternal Childbirth Trauma Scale (MCTS) [26], and Psychological Birth Trauma Assessment Scale (PBTAS) [27]. Despite the availability of these tools, prior studies have not rigorously evaluated their measurement properties or offered recommendations for their selection [28, 29]. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology offers guidelines for systematically reviewing the methodological quality and psychometric properties of PROMs [30]. These guidelines assist researchers and clinicians in choosing the most appropriate PROMs for both research and clinical practice. This systematic review used the COSMIN methodology to synthesise studies on self-report instruments assessing PBT from the perspectives of postpartum women. It assessed the psychometric properties and methodological quality of these instruments to provide evidence-based, transparent recommendations for the use of PROMs.
Methods
Design
We conducted a systematic review in accordance with the COSMIN guidelines and the manual for systematic reviews of PROMs [30], following the PRISMA-COSMIN reporting guidelines for Outcome Measurement Instruments [31]. The protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42024425406).
Search strategy
A systematic search was performed across eight databases: PubMed, Web of Science, Embase, CINAHL, PsycINFO, China National Knowledge Infrastructure (CNKI), Wanfang, and the VIP Database for Chinese Technical Periodicals. The search covered studies published from the inception of each database up to May 21, 2024, and the reference lists of the included studies were also screened. The search strategy combined MeSH terms, entry terms, and a filter developed by Terwee et al. [32] to identify studies related to measurement properties. Detailed search strategies for all databases are provided in Supplementary Information S1.
Eligibility criteria
The inclusion criteria were: (I) studies on the development or validation of self-administered instruments for assessing PBT; (II) studies involving puerperal or postpartum women; and (III) reporting at least one measurement property of the instrument. The exclusion criteria were: (I) studies using the PROM solely as an outcome measure (e.g., in randomized controlled trials or in the validation of another instrument); (II) studies not published in English or Chinese; (III) secondary literature, including books, conference papers, reviews, systematic reviews, or meta-analyses; and (IV) duplicate publications.
Study selection and data extraction
Two researchers, trained in evidence-based methodologies and familiar with COSMIN guidelines, independently performed the literature screening and data extraction. EndNote X9 was initially used to remove duplicates. Titles and abstracts were screened to identify relevant studies, followed by a full-text review for further assessment. Reasons for exclusion were documented. For eligible studies, data were independently extracted using a standardized form, and accuracy and completeness were verified. Discrepancies were resolved through discussion with a third researcher. Extracted data included the first author, publication year, PROM name, country and language of the research, target population age, sample size, number of items, scoring method, and total score.
Data analysis
According to the COSMIN guidelines (http://www.cosmin.nl), the evaluation included a methodological quality rating, assessment of measurement properties, evidence synthesis, and evidence grading.
Assessment of the methodological quality
The COSMIN Risk of Bias checklist [33] consists of ten boxes covering standards for PROM development and nine measurement properties: content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. The checklist comprises 116 items, with reliability and measurement error assessed using the 2021 updated version [34]. Each item was rated as "very good (V)", "adequate (A)", "doubtful (D)", "inadequate (I)", or "not applicable (NA)". The overall methodological quality rating for each measurement property in a study was determined by the lowest item rating within the corresponding box. For instance, if the lowest rating in the structural validity box was "inadequate," the overall structural validity rating for that study would be "inadequate."
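The "lowest rating counts" rule described above can be sketched in a few lines of Python. This is an illustrative sketch only and not part of any COSMIN tooling: the function name `box_rating` and the example item ratings are hypothetical, and the sketch assumes that "not applicable" items are simply skipped.

```python
# Ratings ordered from best to worst. "not applicable" items are skipped
# (an assumption made for this sketch).
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def box_rating(item_ratings):
    """Return the overall rating for one checklist box: the lowest item rating."""
    applicable = [r for r in item_ratings if r != "not applicable"]
    if not applicable:
        return "not applicable"
    # The item furthest down RATING_ORDER is the worst one and sets the box rating.
    return max(applicable, key=RATING_ORDER.index)


# Hypothetical item ratings for the structural validity box of one study.
structural_validity_items = ["very good", "adequate", "inadequate", "not applicable"]
print(box_rating(structural_validity_items))  # prints "inadequate"
```

A single "inadequate" item is enough to pull the whole box down, which is why incomplete reporting of one design detail can dominate a study's rating.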
Evaluation of psychometric properties
PROM measurement properties were evaluated using the COSMIN criteria [35], which cover content validity, structural validity, internal consistency, reliability, measurement error, hypotheses testing for construct validity, cross-cultural validity or measurement invariance, criterion validity, and responsiveness. Content validity was assessed following the COSMIN methodology, focusing on three aspects: relevance, comprehensiveness, and comprehensibility [36, 37]. Each result was rated as "sufficient (+)", "insufficient (−)", or "indeterminate (?)". The detailed criteria for good measurement properties are provided in Supplementary Information S2. For construct validity, the review team defined a priori hypotheses for sufficient measurement properties, which are presented in Supplementary Information S3.
Summarizing the evidence and grading the quality of the evidence
When the evidence for each measurement property of a PROM was summarized, the overall rating was "+" if all studies yielded "+" and "−" if all studies yielded "−". When results were inconsistent, the review team selected the most appropriate of the following strategies: explaining and summarizing results by subgroup; not summarizing, and rating the results as "±" without grading the evidence; or rating based on 75% consistency while downgrading for inconsistency. The quality of evidence was assessed using a modified version of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach, which was originally developed for systematic reviews of clinical trials [35]. The default assumption is that the overall result is of high quality; however, it can be downgraded by one or two levels based on risk of bias (study quality), inconsistency (variation in results), imprecision (due to small sample sizes), or indirectness (evidence from different populations, interventions, or outcomes). Details of the modified GRADE approach are provided in Supplementary Information S4, with quality levels defined as "high," "moderate," "low," and "very low" in Supplementary Information S5. PROM recommendations are categorized as follows: a scale whose content validity is "sufficient (+)" and whose internal consistency is "sufficient (+)" with at least "low" quality evidence receives a Class A recommendation. A scale with "high quality" evidence of "insufficient (−)" content validity or other psychometric properties is not recommended and receives a Class C recommendation. PROMs not classified as A or C are designated as Class B. The COSMIN Recommended Criteria for Selecting the Most Suitable PROMs are detailed in Supplementary Information S6.
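The pooling, grading, and classification rules in this subsection can be illustrated with a short Python sketch. It is a simplified reading of the text, not the COSMIN procedure itself. All function names and inputs are hypothetical. The sketch assumes that indeterminate ("?") study results are ignored when pooling, that the 75% rule applies in either direction, and that the Class C condition is checked before Class A. ASCII "-" stands in for the minus sign.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def pooled_rating(study_ratings, threshold=0.75):
    """Pool per-study ratings ("+", "-", "?") for one measurement property."""
    rated = [r for r in study_ratings if r in ("+", "-")]  # assumption: "?" is ignored
    if not rated:
        return "?"
    share_plus = rated.count("+") / len(rated)
    if share_plus >= threshold:
        return "+"
    if 1 - share_plus >= threshold:
        return "-"
    return "±"  # inconsistent: not summarized, evidence not graded


def grade_evidence(downgrades):
    """Start at "high" and drop one level per downgrade step, floored at "very low"."""
    steps = sum(downgrades.values())
    return GRADE_LEVELS[min(steps, len(GRADE_LEVELS) - 1)]


def recommendation(content_validity, internal_consistency, ic_quality,
                   high_quality_insufficient):
    """Return "A", "B", or "C" following the categories described in the text."""
    if high_quality_insufficient:  # assumption: Class C takes precedence
        return "C"
    if (content_validity == "+" and internal_consistency == "+"
            and ic_quality in ("high", "moderate", "low")):
        return "A"
    return "B"


# Hypothetical inputs, not data from the review.
print(pooled_rating(["+", "+", "+", "-"]))
print(grade_evidence({"risk_of_bias": 1, "inconsistency": 0,
                      "imprecision": 1, "indirectness": 0}))
print(recommendation("+", "?", "low", high_quality_insufficient=False))
```

With these hypothetical inputs, three of four studies agree, so the pooled rating is "+"; two downgrade steps move the evidence from "high" to "low"; and an indeterminate internal consistency rating leaves the instrument in Class B.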
Discussion
Principal findings
This systematic review analyzed thirty-three studies on the psychometric properties of twelve tools for assessing maternal PBT. None fully adhered to COSMIN guidelines, with gaps in the assessment of measurement error and responsiveness. Although no PROMs received Class A recommendations, the City BiTS emerged as the most evaluated and comprehensive tool, with moderate-quality evidence supporting its content validity, and it is therefore provisionally recommended. Eight other Class B tools showed potential but require further research to comprehensively evaluate their measurement properties. The PCL and PDS, initially developed for non-postpartum populations, present limitations in specificity. The PPQ and PPQ-R also require further research due to uncertain content validity. The CTI-R, BTPS-WVD, PCTAS, and MCTS were each tested in only one study, with the latter three validated solely in Chinese populations, highlighting the need for further research on global applicability. The CTI, IES-R, and PBTAS received Class C recommendations based on high-quality evidence of insufficient measurement properties. Overall, the City BiTS stands out as the most robust and reliable tool for clinicians to assess maternal PBT.
Psychometric properties
Although several scales exist for the explicit measurement of PBT, most PROMs included in this review lack comprehensive evidence regarding a range of psychometric properties. Content validity is widely considered the most important property of a scale, as it assesses how well a PROM reflects the construct it is designed to measure [65]. In this systematic review, most studies relied on expert consultation to evaluate content validity, often neglecting patient perspectives. As Selby and Velikova [66] highlight, public involvement is essential to both the design and implementation of PROMs. Patient perspectives are crucial to understanding the impact of childbirth on postpartum mental health. Future studies should integrate rigorous qualitative interviews with quantitative surveys to more effectively incorporate both patient and expert input, thereby enhancing the content validity of PROMs.
The most commonly reported psychometric property was internal consistency. According to COSMIN guidelines, internal consistency can be rated as sufficient only when there is at least low-quality evidence of "sufficient" structural validity, which can be derived from various studies [30]. One-third of the included studies primarily utilized exploratory factor analysis (EFA) rather than confirmatory factor analysis (CFA) to evaluate structural validity. The limited use of CFA resulted in inadequate structural validity, which consequently affected the assessment of internal consistency. Reliability reflects the stability of the PROM over time and was assessed in nearly half of the studies included in this review. However, none of these studies provided a rationale for the chosen test–retest interval, introducing potential bias into the reliability estimates. The COSMIN guidelines suggest a retest interval of 2 weeks, as shorter intervals may lead to an overestimation and longer intervals to an underestimation of reliability [67]. Furthermore, when reporting quantitative reliability outcomes, it is recommended to present intraclass correlation coefficients (ICCs) rather than Pearson or Spearman correlation coefficients.
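To illustrate why an ICC is preferred for test–retest data, the following Python sketch computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement). The choice of this form is an assumption made for illustration; the review does not state which ICC model the included studies should use. The function name `icc_2_1` and the scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of rows, one row per respondent, one column per administration."""
    n = len(scores)       # number of respondents
    k = len(scores[0])    # number of administrations (e.g., test and retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: every respondent scores exactly 5 points higher at retest.
shifted = [[10, 15], [20, 25], [30, 35]]
print(round(icc_2_1(shifted), 3))
```

For these made-up scores a Pearson correlation would equal 1, because the ranking of respondents is perfectly preserved, whereas ICC(2,1) is about 0.889 because it penalizes the systematic 5-point shift between administrations.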
A PROM developed in a specific context may not be suitable for application in a different setting. Therefore, the same PROM should be used when directly comparing populations from diverse regions and linguistic backgrounds. In this review, cross-cultural validity was evaluated using differential item functioning (DIF) based on item response theory (IRT) [68] in only one study [47], while other studies relied solely on simple translations of the PROMs, neglecting the critical process of cross-cultural adaptation. As a result, the measurement equivalence of these scales across varying cultures, regions, and socioeconomic groups remains uncertain.
Criterion validity, which refers to the extent to which a scale score reflects the gold standard, was assessed using various structured clinical interviews [69]. Two studies [22, 52] used the Structured Clinical Interview for DSM-5 Disorders-Clinical Version (SCID-5-CV) [64] as the gold standard, while two others [42, 58] employed the Clinician-Administered PTSD Scale (CAPS) [70]. In some studies, widely used scales were incorrectly treated as the gold standard, leading to confusion with hypotheses testing for construct validity. According to COSMIN guidelines [30], comparisons between newly developed scales and existing ones should follow the "hypotheses testing for construct validity" procedure. Therefore, to properly assess the criterion validity of a PROM, clinical interviews should be used instead of other commonly used scales.
None of the twelve assessment tools included in the review reported measurement error or responsiveness. Responsiveness, which reflects a scale's sensitivity to change, is essential for evaluating the effectiveness of clinical interventions aimed at improving health outcomes [71]. Therefore, future research should adopt longitudinal or experimental study designs to comprehensively assess PROM responsiveness.
Implications for clinical practice and future directions
PBT has significant negative effects on women’s mental health, maternal role adjustment, subsequent reproductive experiences, and healthcare utilization. Early identification and intervention are therefore essential. PROMs can improve outcome monitoring and provide valuable supplementary information alongside clinical data [72]. Additionally, PROMs have the potential to enhance care and health outcomes at individual, institutional, and population levels [73]. In light of this, this review adheres to COSMIN guidelines and provisionally recommends the City BiTS for use. Comprising 29 items across two dimensions, the City BiTS is noted for its short completion time, clinical practicality, and ease of use, which help reduce survey fatigue. Furthermore, the City BiTS demonstrates strong psychometric properties across 15 language versions, making it well-suited for global epidemiological studies on postpartum PBT.
The growing use of electronic questionnaires in medicine, driven by technological advancements, has streamlined data collection, reducing both labor and time costs. Electronic Patient-Reported Outcome Measures (ePROMs) minimize errors and facilitate complex survey management with user-friendly statistical displays. Future research should prioritize evaluating the equivalence between electronic and traditional paper questionnaires [74]. In clinical settings, integrating electronic questionnaires with electronic health records or developing dedicated applications can improve the routine assessment of PBT, providing timely and accessible information for healthcare providers and enabling prompt diagnosis and treatment.
Strengths and limitations
This systematic review is the first to apply the PRISMA-COSMIN reporting guidelines for Outcome Measurement Instruments [31] and the updated COSMIN guidelines to evaluate the methodological quality and psychometric properties of PBT instruments in postpartum women. We employed a comprehensive search strategy, exploring eight Chinese and English databases and reviewing references to ensure a thorough identification of relevant studies. Quality assessment was conducted independently by two researchers trained in evidence-based methodology, with a third reviewer working in evidence-based practice in China resolving any disagreements. This approach provides reliable and transparent evidence-based guidance for selecting measures and identifies areas for further research. However, several limitations exist. Language restrictions excluded studies on instruments developed or validated in languages other than English and Chinese. Additionally, the full texts of two studies [75, 76] were inaccessible, and some PROMs lacked comprehensive reporting on measurement properties, which may affect the results of this review. Finally, to address the subjective nature of methodological quality assessment and evidence grading, we involved at least two researchers in each review step to minimize variance and maintain objectivity and reliability.
Conclusion
This systematic review identified twelve patient-reported measurement instruments for assessing PBT. None of these tools received a Class A recommendation according to the COSMIN guidelines. Among the nine tools that received a Class B recommendation, the City BiTS is provisionally recommended for use and is considered credible in assessing maternal PBT. However, the methodological quality and reporting of these instruments varied across studies. Consequently, we encourage future researchers to conduct more comprehensive validations of the psychometric properties of existing PROMs or to develop new, higher-quality tools for more scientifically reliable assessments of PBT in postpartum women. Overall, these findings may provide valuable guidance to healthcare providers and researchers in selecting high-quality PBT PROMs for their work.