Introduction
Autistic adolescents (N.B. we use the term ‘autistic’ based on research by Kenny et al., 2015) may struggle with social interaction and communication, for example starting and maintaining conversations with their peers (Ke et al., 2017). These difficulties may lead to lower-quality friendships, greater loneliness, and more social anxiety than in typically developing peers (Bauminger, 2002). As a result, social conversational skills have become a major intervention target for autistic adolescents. Although research on the effectiveness of social conversational skills interventions has increased, evidence for their efficacy remains limited, owing to a lack of psychometrically validated outcome measures that capture clinically meaningful changes in social conversational skills.
Behavioral observations may be among the most objective outcome measures in the assessment of general social skills, and they have several advantages. Employing trained and masked raters to code the frequency of target behaviors (e.g., initiation of conversation, sudden topic changes, and silences) may increase sensitivity to specific changes during the observation (Cunningham, 2012). Another advantage of observational assessments is that they do not rely on introspection or self-report, which allows more flexible use (e.g., with individuals who have limited verbal ability or who struggle to self-report) (Jibril, 2018).
One published rating system designed to assess social skills in autistic individuals is the Contextual Assessment of Social Skills (CASS; Ratto et al., 2011). The CASS is an observational measure used to assess social interaction skills within the context of a conversation in a short dyadic interaction with an unfamiliar and similarly aged peer, who is a confederate asked to follow specific instructions for social engagement with the participant. The conversations are videotaped. Subsequently, nine items relating to conversational skills are used to code participant behavior in the videotaped conversations (i.e., Asking Questions, Topic Changes, Vocal Expressiveness, Gestures, Positive Affect, Kinesic Arousal, Social Anxiety, Overall Involvement/Interest in the Conversation, and Overall Quality of Rapport). Ratings are performed by trained raters who, depending on study needs, can be masked to (unaware of) certain factors, such as diagnostic status, intervention arm, or assessment timepoint (i.e., pre, post).
The reliability and validity of the CASS have been examined by the original US developers of the instrument (Ratto et al., 2011). Internal consistency across all nine items was high (standardized alpha = .83), and inter-rater reliability (intraclass correlation coefficient, ICC) was acceptable, with a mean value of .68. The 4-item total score (i.e., Asking Questions, Topic Changes, Overall Involvement, and Overall Quality of Rapport) was also analyzed separately and showed acceptable internal consistency (standardized alpha = .75). The CASS total score showed good convergent and divergent validity: it was significantly associated with verbal IQ (r = .32, p < .04) and theory of mind (r = .47, p < .002), but not with performance IQ (r = .006, ns) (Ratto et al., 2011). The correlation with autism severity, measured with the SRS (a parent-rated social behavior questionnaire), was examined within the autism group only, as these data were not available for the control group (Ratto, 2010). Contrary to expectations, this correlation was not statistically significant (r = −.22, ns). In a more recent US study, CASS 4-item total scores were inversely correlated with the Social Affect subscale of the Autism Diagnostic Observation Schedule (ADOS-SA; r = −.44, p = .02), suggesting that the CASS is a valid measure of social ability (Simmons et al., 2020). However, Simmons et al. (2020) also examined the convergent validity of the CASS domains with the Social Responsiveness Scale-Social Communication Index (SRS-SCI), and, again, the SRS-SCI was not significantly associated with any of the CASS domains. This lack of correlation might be because the SRS-SCI asks parents to rate behavior over the previous 6 months, which may not be directly related to the specific behavior shown during the direct observation, or it may reflect limited statistical power. Previous studies on the psychometric properties of the CASS were limited by moderate sample sizes, and all but one of them (Rabin et al., 2018) were conducted with US samples.
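Inter-rater reliability is summarized above as an intraclass correlation coefficient. Which ICC form Ratto et al. (2011) used is not stated here, so the Python sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), computed from the mean squares of a videos × raters matrix. The function name and the ratings matrix are hypothetical and serve only as an illustration.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n_subjects x k_raters array without missing values.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand_mean = y.mean()

    # Sums of squares for subjects (rows), raters (columns), and residual error
    ss_rows = k * np.sum((y.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((y.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA decomposition
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical example: six videos, each scored by four raters
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

Here the raters rank the videos fairly similarly but differ strongly in their overall severity, which the absolute-agreement form penalizes; a consistency-type ICC would be higher for the same data.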
Despite the growing use of the CASS as an outcome measure, limited research is available on its psychometric properties. Moreover, there is no clear consensus on how to compute a total score. For example, Dolan et al. (2016) computed a total score from the seven Likert-scale items only (i.e., Vocal Expressiveness, Gestures, Positive Affect, Kinesic Arousal, Social Anxiety, Overall Involvement, and Overall Quality of Rapport), excluding the count items Asking Questions and Topic Changes, whereas the original developers and the Hebrew/Israeli version of the CASS used four items (i.e., Asking Questions, Topic Changes, Overall Involvement, and Overall Quality of Rapport) to compute a total score (Rabin et al., 2018; Ratto et al., 2011; Simmons et al., 2020). Greater international consensus on the composition of the total score would improve comparability across studies.
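To make the difference between the two scoring conventions concrete, the Python sketch below computes both totals from per-participant item scores. The item keys and data are hypothetical, and the sketch assumes that all items are keyed so that higher scores indicate more skilled behavior. Because the 4-item set mixes frequency counts (Asking Questions, Topic Changes) with rating-scale items, and the way the original developers scaled these before combining them is not described here, the sketch standardizes each item across the sample before averaging. That choice is an assumption for illustration, not the published procedure.

```python
from statistics import mean, stdev

# Hypothetical item keys; the 7-item set contains only rating-scale items
ITEMS_7 = [
    "vocal_expressiveness", "gestures", "positive_affect", "kinesic_arousal",
    "social_anxiety", "overall_involvement", "overall_rapport",
]
# The 4-item set mixes two count items with two rating-scale items
ITEMS_4 = ["asking_questions", "topic_changes", "overall_involvement", "overall_rapport"]


def sum_score(participant, items):
    """Plain sum of raw item scores (sensible when items share one scale)."""
    return sum(participant[item] for item in items)


def z_composite(sample, items):
    """Mean of per-item z-scores across the sample (assumed scaling for mixed items)."""
    item_stats = {}
    for item in items:
        values = [p[item] for p in sample]
        item_stats[item] = (mean(values), stdev(values))
    return [
        mean([(p[item] - item_stats[item][0]) / item_stats[item][1] for item in items])
        for p in sample
    ]


def participant(ratings, questions, topics):
    """Build a hypothetical record: seven ratings plus the two count items."""
    record = dict(zip(ITEMS_7, ratings))
    record.update(asking_questions=questions, topic_changes=topics)
    return record


sample = [
    participant([5, 4, 5, 6, 5, 6, 5], questions=8, topics=2),
    participant([3, 3, 4, 4, 3, 3, 3], questions=3, topics=1),
    participant([6, 5, 6, 5, 6, 7, 6], questions=12, topics=4),
]
print([sum_score(p, ITEMS_7) for p in sample])  # [36, 23, 41]
print([round(z, 2) for z in z_composite(sample, ITEMS_4)])
```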
The CASS has been used in several studies to evaluate the effects of interventions, including the Program for the Education and Enrichment of Relational Skills (PEERS®) (Dolan et al., 2016; Rabin et al., 2018; White et al., 2015). Although these studies report interesting results on the CASS, none of them focused on its psychometric properties. In addition, although the CASS has been used in other cultures, these studies did not report in detail on the cultural adaptation process. The current study addresses this gap by describing not only how the CASS was linguistically translated but also, in detail, how it was culturally adapted, both to ensure that the CASS is relevant in the Dutch culture and to provide an example for other cultures. Moreover, the CASS was aligned with the PEERS® intervention to make it a more suitable outcome measure in our own randomized controlled trial (Idris et al., 2022); these examples of creating aligned items may also inspire future studies. Because the CASS was designed to assess change in response to social skills interventions in general, and was not specifically tailored to the outcomes of PEERS®, there is no one-to-one correspondence between the skills taught in PEERS® and the items rated on the CASS. In fact, some items may change in an unanticipated direction; for instance, an adolescent may be taught to gradually withdraw from a conversation when the conversational partner does not seem interested, rather than to ask more questions or change the topic. The content validity of the CASS could therefore be better tailored to the intervention goals. Adding new items that relate directly to the skills taught in PEERS® may improve the performance of the CASS as an outcome measure for this specific intervention. To our knowledge, no literature on improving the content validity of the CASS is yet available.
Taken together, research on the psychometric properties of the CASS is scarce. Hence, we aimed to contribute to the psychometric evaluation of the CASS, by investigating the reliability and validity of the Dutch CASS. Additionally, we aimed to tailor the content of the CASS more directly to the skills taught in the PEERS intervention, to improve its sensitivity to change following this specific intervention.
Results
The Translation, Adaptation, and Pre-testing
Originally, the CASS consists of nine rating items: (1) Asking Questions, (2) Topic Changes, (3) Vocal Expressiveness, (4) Gestures, (5) Positive Affect, (6) Kinesic Arousal, (7) Social Anxiety, (8) Overall Involvement/Interest in the Conversation, and (9) Overall Quality of Rapport. In addition to these nine original items, the Dutch developers made four additions to the D-CASS. First, the original Item 1 was split into (1a) Initiating Questions and (1b) Follow-up Questions, as these behaviors are specifically taught within PEERS®. The other three additions are binary (yes/no) items: (0) Starting the Conversation, (10) Initiating the End of the Conversation, and (11) Giving a Reason to End the Conversation. The decision to add these items was based on suggestions from earlier research and international experts, to align the instrument more closely with the PEERS® learning objectives. For example, initiating a conversation may reflect a degree of self-confidence: Floyd and Burgoon (1999) found that participants who initiated the conversation showed the most nonverbal liking behavior toward the confederate. However, behavior at the beginning of a conversation may not tell the whole story. We therefore also included items on ending the conversation, especially since these skills are also taught within PEERS®.
The CASS rating manual was first translated into Dutch by an independent native Dutch language expert and then back-translated into English by another independent translator. In addition to the translation of the rating manual, the Dutch assessment procedure was adjusted in several ways. First, whereas the procedure was originally introduced as a role-play, input from trained confederates from Phase 2 indicated that introducing it as a natural ‘getting-to-know-each-other’ conversation is less stressful and feels more natural, so it was introduced in this way. Second, the two original conditions of the CASS, the interested and the bored condition, were replaced by a single interested condition in the Dutch version, consistent with prior trials using the CASS as an outcome for PEERS®. Third, the behavioral coding forms of the Dutch CASS provide space to note whether a participant asked questions or made statements that were too personal or offensive (verbal content alert). The forms also provide space to note nonverbal inappropriateness (nonverbal behavioral alert), such as standing too close, overly familiar touching, or nonverbal behavior that is inappropriate for the situation, such as staring. Fourth, the confederate initiated the conversation after 5 s instead of the 10 s used in the original CASS. Fifth, after a knock on the door, the adolescents were given the opportunity to end the conversation themselves. Finally, the behaviors of the confederates were also rated in the Dutch version. All these elements were added to the Dutch rating manual by FtH.
The study also extended and adapted the CRS to create the D-CRS. Two items on self-confidence and three items on perspective taking (i.e., on how the conversational partner may have experienced the conversation) were added to align the scale more closely with the learning goals of the PEERS® program. The two self-confidence items were added to assess (a) participants' trust in their own ability before the actual conversation, which probably reflects their current self-esteem and may increase or decrease during the intervention, and (b) their own, potentially more objective, judgement of their actual performance after the conversation.
Field Testing and Psychometric Analysis
The percentage of missing values was minimal (< 5%), so missing data were handled with pairwise exclusion. The normality of distributions was inspected using histograms and normal Q–Q plots; because the distributions appeared normal, variable transformations were considered unnecessary. Multivariate outliers were screened with the Mahalanobis distance (criterion p < .001); no outliers were identified.
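The outlier screen described above can be sketched as follows: each participant's squared Mahalanobis distance from the sample centroid is compared with the chi-square critical value at p < .001, with degrees of freedom equal to the number of variables. The variables entered into the screen are not listed here, so the score matrix below is hypothetical. Because the distance requires complete rows, the sketch drops incomplete cases for this step only; that is an assumption, since the other analyses used pairwise exclusion.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(data, p_crit=0.001):
    """Flag multivariate outliers by squared Mahalanobis distance.

    Rows with any missing value are dropped for this screen only.
    Returns the kept row indices, squared distances, the critical value,
    and a boolean outlier flag for each kept row.
    """
    x = np.asarray(data, dtype=float)
    keep = ~np.isnan(x).any(axis=1)
    x = x[keep]

    centered = x - x.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(x, rowvar=False))
    # Row-wise quadratic form: d2_i = (x_i - mean)^T S^-1 (x_i - mean)
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

    # Chi-square critical value with df = number of variables
    critical = chi2.ppf(1 - p_crit, df=x.shape[1])
    return np.flatnonzero(keep), d2, critical, d2 > critical


rng = np.random.default_rng(0)
scores = rng.normal(size=(99, 7))  # hypothetical 99 x 7 score matrix
scores[0] = 10.0                   # plant one extreme case
scores[5, 2] = np.nan              # one missing value
rows, d2, critical, flagged = mahalanobis_outliers(scores)
print(rows[flagged])
```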
Of the 106 autistic adolescents who participated in the RCT and completed the initial D-CRS items before the start of the actual conversation, seven were excluded from the analyses of the D-CASS observational items because no usable D-CASS video recording was available. Two of these seven adolescents completed the D-CRS self-confidence item but then refused to take part in the actual conversation because they realized they were too anxious or lacked confidence. For the other five adolescents, no suitable D-CASS video recording was available because of technical problems (n = 2) or because the confederate did not arrive in time (n = 3). Therefore, usable data on the D-CASS observational items were available for n = 99. Table 1 shows the descriptive data for this sample (n = 99), whose conversations were observed and coded.
Table 1
Means and standard deviations for demographic information (n = 99)
Variable | n | % / range | M (SD) |
Age | | | 14.66 (1.53) |
Gender | | | |
Male | 69 | 69.7 | |
Female | 30 | 30.3 | |
Birth country | | | |
Netherlands | 93 | 93.9 | |
Belgium | 2 | 2.0 | |
China | 1 | 1.0 | |
Others | 3 | 3.0 | |
Relationship with parents | | | |
Biological | 92 | 92.9 | |
Foster | 3 | 3.0 | |
Adoption | 3 | 3.0 | |
Grandparent | 1 | 1.0 | |
Special education | | | |
Yes (No) | 65 (27) | 65.7 (27.3) | |
ADOS-2 CSS | 74 | | 5.53 (2.46) |
Total IQ | 80 | | 103.45 (17.27) |
Performance IQ | 81 | | 100.43 (16.14) |
Verbal IQ | 81 | | 105.15 (13.09) |
SRS-SCI | 98 | | 71.80 (20.17) |
SRS-RRB | 98 | | 14.73 (6.00) |
SSIS-A | 86 | | 84.65 (18.23) |
CRS total, adolescents | 98* | 14–35 | 26.59 (4.62) |
CRS total, confederate | 99 | 12–35 | 25.00 (5.90) |
Reliability
Inter-Item Correlations
The inter-item correlations of the D-CASS items were inspected to identify which items correlated significantly with each other. Table 2 shows that the new items (i.e., Items 0, 1a, 1b, 10, and 11) did not correlate strongly with the original items. Items 1 and 2 also did not correlate strongly with the other items. Because removing these items resulted in a higher Cronbach's alpha, we decided to use a 7-item total score, in line with Dolan and colleagues (2016).
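The decision to drop items rests on the ‘alpha if item deleted’ statistic. A minimal Python sketch is shown below; it uses the standard raw Cronbach's alpha formula, and the simulated data (five items sharing a common factor plus one unrelated item) are hypothetical, chosen only to show that dropping a weakly related item raises alpha.

```python
import numpy as np


def cronbach_alpha(items):
    """Raw Cronbach's alpha for an n_participants x k_items array (complete cases)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


def alpha_if_deleted(items, names):
    """Alpha recomputed with each item left out in turn."""
    x = np.asarray(items, dtype=float)
    return {name: cronbach_alpha(np.delete(x, j, axis=1)) for j, name in enumerate(names)}


rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
coherent = factor + 0.6 * rng.normal(size=(200, 5))  # five related items
unrelated = rng.normal(size=(200, 1))                # one item unrelated to the rest
data = np.hstack([coherent, unrelated])
names = ["a", "b", "c", "d", "e", "unrelated"]

print(round(cronbach_alpha(data), 2))
print({name: round(a, 2) for name, a in alpha_if_deleted(data, names).items()})
```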
Table 2
Inter item correlations of the CASS rating domains
Item | 0 | 1 | 1a | 1b | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
0 Starting conversation |
1 Asking questions | .19 |
1a Initiating questions | .37** | .63** |
1b Follow-up questions | .00 | .87** | .17 |
2 Topic changes | .30** | .48** | .78** | .11 |
3 Vocal expressiveness | .31** | .34** | .28** | .26** | .14 |
4 Gestures | .25* | −.03 | .03 | −.06 | −.02 | .44** |
5 Positive affect | .34** | .23* | .14 | .21* | .06 | .71** | .51** |
6 Kinesic arousal | .25* | .33** | .22* | .28** | .29** | .38** | .21* | .33** |
7 Social anxiety | .27** | .29** | .24* | .21* | .24* | .48** | .29** | .43** | .58** |
8 Overall involvement | .36** | .39** | .26** | .33** | .16 | .66** | .41** | .72** | .32** | .59** |
9 Overall quality of rapport | .22* | .27** | .20* | .22* | .07 | .56** | .38** | .65** | .30** | .60** | .75** |
10 Initiating end of conversation | −.00 | .17 | .10 | .16 | .21* | .04 | −.02 | .09 | .17 | .07 | .20 | .20 |
11 Giving reason to end conversation | .02 | .09 | .07 | .07 | .08 | .08 | .01 | .14 | .18 | .14 | .17 | .19 | .35** |
Internal Consistency
In our sample, the D-CASS total score computed as the sum of the seven original rating items, as used by Dolan et al. (2016), showed high internal consistency (Cronbach's alpha = .86). The four-item total score used by the original developers (Ratto et al., 2011) and by Rabin et al. (2018) showed moderate internal consistency (Cronbach's alpha = .69). The internal consistency of the D-CRS was high (Cronbach's alpha = .83).
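As a rough arithmetic cross-check, standardized alpha can be computed from the mean inter-item correlation, α_std = k·r̄ / (1 + (k − 1)·r̄). The Python snippet below applies this formula to the rounded correlations in Table 2. Standardized alpha assumes equal item variances, whereas the values reported above are raw alphas, so agreement is expected to be approximate only. The snippet illustrates the formula and is not part of the original analysis. It gives .87 for the 7-item set and .69 for the 4-item set, close to the raw alphas of .86 and .69.

```python
from itertools import combinations

# Correlations copied from Table 2 (rounded to two decimals), keyed by item pair
R = {
    (3, 4): .44, (3, 5): .71, (3, 6): .38, (3, 7): .48, (3, 8): .66, (3, 9): .56,
    (4, 5): .51, (4, 6): .21, (4, 7): .29, (4, 8): .41, (4, 9): .38,
    (5, 6): .33, (5, 7): .43, (5, 8): .72, (5, 9): .65,
    (6, 7): .58, (6, 8): .32, (6, 9): .30,
    (7, 8): .59, (7, 9): .60,
    (8, 9): .75,
    (1, 2): .48, (1, 8): .39, (1, 9): .27, (2, 8): .16, (2, 9): .07,
}


def standardized_alpha(items):
    """k * mean_r / (1 + (k - 1) * mean_r), averaging r over all item pairs."""
    pairs = list(combinations(sorted(items), 2))
    mean_r = sum(R[pair] for pair in pairs) / len(pairs)
    k = len(items)
    return k * mean_r / (1 + (k - 1) * mean_r)


print(round(standardized_alpha([3, 4, 5, 6, 7, 8, 9]), 2))  # 7-item set: 0.87
print(round(standardized_alpha([1, 2, 8, 9]), 2))           # 4-item set: 0.69
```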
Confirmatory Factor Analysis
To evaluate the factorial validity of the D-CASS total scores proposed in previous studies (Dolan et al., 2016; Rabin et al., 2018; Ratto et al., 2011), we compared the 4-item and 7-item models on goodness of fit, using data from the present study (see Table 3).
Table 3
Results of the comparison of different factorial models for the Dutch CASS total score
Model | χ² (df) | CFI | TLI | RMSEA | Factor loadings (range) |
4-item | 23.84 (2) | .81 | .05 | .32 | .16–1.0 |
7-item | 49.05 (14) | .90 | .79 | .15 | .44–.88 |
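After the models were estimated, goodness-of-fit statistics were obtained. The 7-item model showed a better fit than the 4-item model on every index in Table 3 (e.g., CFI = .90 vs. .81; RMSEA = .15 vs. .32), although not all of its indices reached conventional cut-offs for good fit. These findings provide further support for using the 7-item rather than the 4-item total score.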
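For readers less familiar with these indices, RMSEA can be computed directly from a model's chi-square, degrees of freedom, and sample size. The Python sketch below uses the common point-estimate formula with N − 1 in the denominator. Software packages differ in details (for example, N versus N − 1, or scaled chi-square statistics), and the sample size entering the CFA is not restated here, so values from this sketch need not match Table 3 exactly. CFI and TLI additionally require the chi-square of the baseline (independence) model, which is not reported, so they are not reproduced. The inputs below are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# A model whose chi-square does not exceed its df gets RMSEA = 0
print(rmsea(10.0, 12, 99))             # 0.0
# More misfit per degree of freedom gives a larger RMSEA
print(round(rmsea(30.0, 10, 101), 3))  # 0.141
```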
Validity
Convergent and Divergent Validity
Because the previous analyses (i.e., inter-item correlations, internal consistency, and CFA) indicated a better fit for the 7-item model, this model was used to evaluate the validity of the D-CASS. The correlations of the D-CASS total score with the self-report SSIS-Adolescent (total score and subscales), the parent-reported SRS-SCI and SRS Autistic Mannerisms subscale, and Verbal IQ at baseline are shown in Table 4.
Table 4
Correlations between CASS total score, SSIS-adolescent total score, SSIS subscales, SRS-SCI, and SRS-2 RRB
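Measure | Pearson r | Sig. (2-tailed) |
SSIS-A total | .12 | .24 |
SSIS-A Communication | .14 | .18 |
SSIS-A Cooperation | −.02 | .82 |
SSIS-A Assertion | .26 | .01* |
SSIS-A Responsibility | .12 | .24 |
SSIS-A Empathy | .10 | .34 |
SSIS-A Engagement | .16 | .12 |
SSIS-A Self-Control | .05 | .65 |
SRS-SCI | −.21 | .04* |
SRS-2 RRB | −.08 | .43 |
Verbal IQ | .23 | .04* |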
Regarding convergent validity, the 7-item D-CASS total score was significantly correlated with the SSIS-A Assertion subscale (r = .26, p = .01) and with the SRS-SCI (r = −.21, p = .04); because higher SRS-SCI scores indicate more social communication difficulties, a negative correlation is in the expected direction. Regarding divergent validity, the D-CASS total score was not significantly correlated with the RRB subscale (r = −.08, p = .43), suggesting that the D-CASS measures social skills rather than other autistic symptoms. The D-CASS total score was also significantly correlated with Verbal IQ (r = .23, p = .04).
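Because missing data were handled by pairwise exclusion, each correlation in Table 4 may rest on a slightly different number of participants. The Python sketch below shows that step using SciPy's `pearsonr` on the complete pairs of each variable pair; the helper names and data are hypothetical. The second function recovers the two-tailed p value from r and n via the t distribution, t = r√(n − 2)/√(1 − r²). For example, r = −.21 with an assumed n = 98 (the SRS-SCI n in Table 1) gives p ≈ .04, consistent with Table 4.

```python
import numpy as np
from scipy import stats


def pairwise_pearson(x, y):
    """Pearson r on the cases where both x and y are observed (pairwise exclusion)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    complete = ~(np.isnan(x) | np.isnan(y))
    r, p = stats.pearsonr(x[complete], y[complete])
    return r, p, int(complete.sum())


def two_tailed_p_from_r(r, n):
    """Two-tailed p for H0: rho = 0, using t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df=n - 2)


cass_total = [30, 35, np.nan, 28, 40, 33, 25, 38]  # hypothetical scores
srs_sci = [75, 60, 70, np.nan, 55, 68, 82, 58]
print(pairwise_pearson(cass_total, srs_sci))       # r, p, and n used
print(round(two_tailed_p_from_r(-0.21, 98), 2))    # 0.04
```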
Discussion
The purpose of this study was to adapt the CASS for the Dutch population and to evaluate the psychometric properties of the D-CASS, including a confirmatory factor analysis (CFA) of its total score. The translators and the expert panel agreed on the translation during the translation process. Several behavioral coding modifications were made to create the D-CASS. New items were added to better align the instrument with the social skills learning objectives of PEERS®: the Asking Questions item was split into Initiating and Follow-up Questions, and three binary items were added (Starting the Conversation, Initiating the End of the Conversation, and Giving a Reason to End the Conversation).
The first two count items of the original CASS (i.e., Item 1: Asking Questions and Item 2: Topic Changes) correlated poorly with the other items. In addition, frequency counts may be incompatible with the specific skills and social customs taught in PEERS® (e.g., the rule “don’t be an interviewer”), which conflict with counting the number of questions asked as an index of social skill, since asking too many questions might be considered intrusive or dominant (Dolan et al., 2016; Laugeson & Frankel, 2010). Therefore, these two frequency items were not included in the CASS total score in this study. The new items were not included in the D-CASS total score either: Items 0, 10, and 11 are binary, and Items 1a and 1b are frequency counts derived from Item 1, whereas the seven original items in the total score are rated on a 7-point Likert scale. Combining these response formats may lead to unequal variability across items as well as floor and ceiling effects (Grassi et al., 2007). We reported on these items in a paper on the outcomes of the Dutch version of PEERS® (Idris et al., 2022). In the near future, in consultation and collaboration with the international researchers who use the CASS, we will discuss the potential value of the new items introduced in the Dutch CASS for broader purposes, with the aim of co-authoring a follow-up paper (i.e., on the joint development of an international CASS-2).
Two distinct item sets have been used to calculate the CASS total score: the original 4-item set (Ratto et al., 2011) and a 7-item set (Dolan et al., 2016). The two models were compared in terms of psychometric properties. The 7-item model produced better fit indices (e.g., CFI) than the 4-item model, which showed relatively low CFI and TLI values and a high RMSEA (Table 3).
The Dutch CASS total score was modestly correlated with the SSIS Assertion subscale, which assesses a person's capacity to effectively express feelings, wants, and desires, a capacity that is crucial during a conversation. This suggests that the D-CASS has sufficient convergent validity and could be a useful treatment outcome measure for adolescents who need to enhance their social skills. Divergent validity was also evaluated. In line with Ratto et al. (2011), the CASS total score was not significantly correlated with the SRS Autistic Mannerisms subscale, indicating sufficient divergent validity.
Apart from demonstrating that the Dutch CASS is a reliable and valid observational measure, the current study has other important strengths. First, it is the first study in the Dutch population to carry out a translation and validation procedure based on established guidelines (Hall et al., 2017; Tsang et al., 2017), with a large sample (n = 99). Second, the sample can be regarded as a good representation of the Dutch autistic population in terms of generalizability (94% were born in the Netherlands, and participants came from different regions of the country). The sample was heterogeneous and included both boys and girls; boys and girls with ASD are represented in Dutch society in a 4:1 ratio (Nederlands Jeugdinstituut, 2015). The sample was drawn from various mental health institutions across the Netherlands.
Aside from these strengths, the current study had some limitations. First, rater bias in the coding of the CASS videos cannot be fully excluded. Although the timepoint of the conversation and the intervention condition of the autistic adolescents were carefully concealed from the raters, some raters were members of the RCT research team and may therefore have used knowledge of the project organization (e.g., specific locations of assessments and trainings, the starting date of the project, or the season) to infer group membership (i.e., condition) and/or timepoint (pre, post, or follow-up) during video coding. Although only pre-assessment videos were used in this study, some raters may have been unconsciously inclined to give these videos lower scores. Second, the logistics and implementation of the CASS are complex and time-consuming. For example, finding suitable confederates and training coders to score the CASS reliably takes considerable time and effort. Consequently, on some rare occasions, the age difference between conversational partners was larger than would have been ideal. Furthermore, participating in the CASS can be anxiety-provoking for some participants, as illustrated by the two participants who withdrew before the conversation. Participants were asked to start a conversation with a stranger while being recorded. Some participants became overwhelmed (had a ‘black-out’), and some may have become more cautious (acting very shy or overly nice), as reported in the CRS. These issues should therefore be taken into consideration when implementing the D-CASS.
Our findings also provide points for consideration in future treatment research that uses the CASS or similar observational measures. Confederates were trained during the trial to ensure uniform procedures and consistent social communicative behavior during the CASS. This training is essential, as it helps ensure reliability across participants and timepoints. Here we offer two suggestions to improve consistency during the CASS procedure and to ensure that the conversations are truly social and reciprocal in nature. First, the instructions were given at the beginning of the assessment outside the room rather than inside it (e.g., “You will have 3 min to talk and get to know each other. After 3 min, I will knock on the door and both of you need to finalize the conversation”). In contrast to previous versions, no test leader is then present in the room. Second, we changed the procedure for ending the conversation. Rather than entering the room and interrupting the ongoing conversation, the test leader knocked on the door after 3 min, and the autistic adolescents were given the opportunity to end the conversation themselves. The confederates were asked to leave the ending of the conversation to the participants, so that participants could demonstrate their skills in ending a conversation (i.e., in line with the social etiquette taught in PEERS®). If the conversation paused, confederates were instructed to wait 5 s before reinitiating it.
This study may help address the critical need for an observational measure to assess the efficacy of social skills interventions. The current study is a preliminary step in describing the Dutch CASS and provides a foundation for future, larger international studies. In the future, social skills interventions may be evaluated using observations rather than questionnaires. The CASS videos could also be incorporated into social skills interventions as a video feedback tool for autistic individuals, helping them reflect on their social skills and identify concrete goals to work on during the intervention. Such a clinical implementation would be time-consuming and thus costly, but it could improve the efficacy of the intervention and may therefore be worth considering for future innovations.
Conclusion
This study used pre-assessment RCT data to investigate the reliability and validity of the Dutch CASS. The results suggest that the 7-item total score had the highest Cronbach's alpha and an acceptable CFI. However, other researchers should conduct their own reliability and factor analyses to determine which total score is most appropriate for their dataset. Consensus on how to use the CASS and present its results consistently will be important if this instrument is to be used to compare results across studies.
The D-CASS has the potential to be a suitable treatment outcome measure for evaluating social skills interventions for two reasons: (1) it reflects the social challenges most commonly experienced by autistic adolescents; and (2) it directly assesses an individual's social interaction with a similarly aged peer, which is a difficult task for individuals with ASD (White et al., 2015). Findings from this study and from other studies using the CASS as an outcome (Corbett et al., 2020; Dolan et al., 2016; Rabin et al., 2018; Simmons et al., 2020) indicate that the CASS is a feasible alternative in research settings. As a measure of social interaction skills, the CASS allows autistic adolescents to practice in an engaging, semi-structured, and supportive environment. This planned reciprocal social interaction may also help set the stage for interactions with peers in other social settings, such as at home, on the playground, and in the community, as typically reported by parents (e.g., using the SRS-SCI).
Finally, peer-mediated approaches in interventions for autistic children and adolescents have shown positive outcomes (Barry et al., 2003; Kamps et al., 1992; Kasari et al., 2012; Lang et al., 2011; Odom & Strain, 1984). Similarly, the confederates in the CASS were trained and supervised peers who delivered learning opportunities based on the intervention protocol with a high degree of reliability and competence. Using the CASS as a clinical tool, in addition to its role as a research outcome measure, may therefore be of high value.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.