Session Evaluation Questionnaire: Structure and Use
William B. Stiles
Department of Psychology, Miami University
Oxford, Ohio 45056, USA
Email: stileswb@miamioh.edu
This handout is an edited version of material written for a chapter entitled "Session evaluation and the Session Evaluation Questionnaire" (Stiles, Gordon, & Lani, 2002).
Psychotherapy and counseling sessions are judged as good or bad in at least two distinct ways simultaneously: (a) as powerful and valuable versus weak and worthless and (b) as relaxed and comfortable versus tense and distressing. On the Session Evaluation Questionnaire (SEQ), these two session evaluation dimensions are called Depth and Smoothness, respectively. The SEQ was developed through several iterations to measure these two dimensions, using factor analysis to obtain independent, robust, internally consistent sets of items (Stiles, 1980; Stiles, Reynolds, et al., 1994; Stiles & Snow, 1984b; Stiles, Tupler, & Carpenter, 1982). These analyses found the same groupings of items in ratings by therapists and by clients, suggesting that the same scales may be used for both participants.
In addition to session evaluation, the SEQ measures two dimensions of participants' post-session mood, Positivity and Arousal (Stiles & Snow, 1984b; Stiles et al., 1994). These are widely considered basic theoretical dimensions of mood and emotion (Larsen & Diener, 1992; Reisenzein, 1994), and they account for most of the rating variance on many different measures of mood and emotion across a wide variety of circumstances (Russell, 1978, 1979).
The SEQ, Form 5, includes 21 items in a 7-point bipolar adjective format. Respondents are instructed: "Please circle the appropriate number to show how you feel about this session." The items are divided into two sections, session evaluation and post-session mood. The stem "This session was:" precedes the first 11 items (session evaluation): bad-good, difficult-easy, valuable-worthless, shallow-deep, relaxed-tense, unpleasant-pleasant, full-empty, weak-powerful, special-ordinary, rough-smooth, and comfortable-uncomfortable. The stem "Right now I feel:" precedes the second 10 items (post-session mood): happy-sad, angry-pleased, moving-still, uncertain-definite, calm-excited, confident-afraid, friendly-unfriendly, slow-fast, energetic-peaceful, and quiet-aroused. The adjective scales within each section were selected to represent clearly the two evaluative and the two mood dimensions, respectively, and to avoid skewed rating distributions.
Each item is scored from 1 to 7, reversed as appropriate, with higher scores indicating greater Depth, Smoothness, Positivity, or Arousal. An index for each dimension is calculated as the mean rating on the items that have had the highest and most consistent loadings in factor analyses. Unit-weighted item ratings are used, rather than factor scores, for simplicity and for generality across perspectives and levels of analysis (exact values of factor loadings have varied across analyses, but the items loading highest on each dimension have been consistent).
A score for each of the dimensions is calculated as the mean of the constituent item ratings, rather than the sum of the item ratings. Consequently, the dimension scores lie on the same 7-point scale as the individual items, making interpretation easier. The midpoint of each SEQ scale is 4.00, and the possible range (e.g., from maximum Shallowness to maximum Depth) is 1.00 to 7.00.
The SEQ Form 5 is identical to Form 4 (Dill-Standiford, Stiles, & Rorer, 1988; Stiles et al., 1994) except that three unused items were deleted to save space (safe-dangerous from the session evaluation section, and wakeful-sleepy and involved-detached from the post-session mood section). The items used in scoring Depth, Smoothness, and Positivity are identical on Forms 3, 4, and 5. Form 4 differed from Form 3 (Stiles & Snow, 1984a, 1984b) only in that some adjective pairs were added or changed to construct the Arousal mood factor. Form 1 (Stiles et al., 1982) and Form 2 (Stiles, 1980) used somewhat different sets of adjectives, but results involving corresponding scales should be comparable. The SEQ has been translated into several languages other than English.
The SEQ has been applied to many types of individual therapy sessions, to group therapy and encounter group sessions, to family and marital sessions, and to supervision sessions (Stiles, Gordon, & Lani, 2002). It is typically completed by participants immediately after the session being evaluated, but it can be (and in some studies has been) completed by participants at a later time or by external raters working from tape recordings. The SEQ's content and format make it appropriate for assessing anything that could be called a session. The psychological phenomena it measures, evaluation and mood, appear to be components of essentially all human activity.
Reliability: Internal Consistency, Agreement, and Stability of SEQ Indexes
In discussing reliability of SEQ indexes, one must distinguish among (a) internal consistency, the degree to which each index's items measure the same thing, (b) interperspective agreement, the degree to which individuals from different perspectives (e.g., therapist, client, observer) give similar ratings, and (c) stability, the degree to which respondents give the same ratings on different occasions.
Internal consistency, measured by coefficient alpha, has been high for all SEQ indexes across a wide variety of conditions and settings (e.g., .90 for Depth and .93 for Smoothness; Reynolds et al., 1996). This is to be expected of scales that, like the SEQ's, were constructed from robust factors.
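Coefficient alpha compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - (sum of item variances) / (variance of total scores)), where k is the number of items. The minimal sketch below computes it for one SEQ index; it assumes a hypothetical data layout (one row per respondent, one column per item, already reverse-scored as described in the scoring section), and the function name is illustrative rather than part of any SEQ tool.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of scores.

    Each row holds one respondent's item scores for a single index,
    assumed to be reverse-scored already (e.g., 8 - rating where needed).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Items that rise and fall together push alpha toward 1, while uncorrelated items push it toward 0; using sample variances throughout keeps the ratio unaffected by the n versus n - 1 choice.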
Interperspective agreement about the evaluation of a session is more complex. It is operationally parallel to inter-rater reliability (i.e., it compares different raters' ratings of a target). There are important and valid reasons, however, why therapists, clients, and external observers might evaluate sessions differently, so low interperspective agreement is not necessarily a drawback. On the contrary, it may have important substantive interpretations and consequences.
According to a framework proposed by Dill-Standiford et al. (1988), consensus, the similarity of clients' and therapists' own session evaluations, should be distinguished from other types of interperspective agreement that are based on participants' estimates of each other's session ratings. Therapist awareness involves comparing therapists' estimates of their clients' ratings with the clients' actual ratings. Client awareness involves comparing clients' estimates of their therapists' ratings with the therapists' actual ratings. In principle, participants might disagree about how to evaluate a session (low consensus) yet be aware of how they disagreed (high awareness).
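The handout does not say which statistic operationalizes consensus and awareness. The sketch below assumes Pearson correlations computed across client-therapist dyads (one value per dyad, such as a Depth index), which is one plausible choice among several; the function and argument names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant ratings")
    return sxy / sqrt(sxx * syy)


def agreement(client_own, therapist_own, therapist_guess, client_guess):
    """Consensus and awareness indexes across dyads (one value per dyad).

    therapist_guess: therapists' estimates of their clients' ratings.
    client_guess: clients' estimates of their therapists' ratings.
    """
    return {
        # Similarity of the two participants' own evaluations.
        "consensus": pearson(client_own, therapist_own),
        # Therapists' estimates versus clients' actual ratings.
        "therapist_awareness": pearson(therapist_guess, client_own),
        # Clients' estimates versus therapists' actual ratings.
        "client_awareness": pearson(client_guess, therapist_own),
    }
```

With client ratings [5, 4, 6, 3], therapist ratings [3, 4, 2, 5], and participants who estimate each other's ratings exactly, consensus is -1 while both awareness indexes are 1: the participants disagree but know how they disagree.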
Though operationally similar to test-retest reliability, stability across sessions should not be considered reliability at all, because the target of each rating is different. SEQ ratings vary from session to session (because sessions are not all alike), so a single session rating is a poor measure of a respondent's typical experience. The stability of a particular client's ratings depends on the proportion of rating variance that lies at the client level, as explained in the next section. Usually, however, the mean rating across 4 to 6 sessions gives a reasonably stable index of Depth or Smoothness (see Stiles & Snow, 1984a, especially footnote 1).
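One common way to quantify why averaging over sessions helps (an assumption here; the handout does not prescribe it) is to treat the client-level share of session-rating variance as an intraclass correlation (ICC) and apply the Spearman-Brown formula to a mean over k sessions: k * ICC / (1 + (k - 1) * ICC). The sketch below uses a hypothetical function name, and any ICC value passed to it is illustrative, not an SEQ estimate.

```python
def mean_rating_reliability(icc, n_sessions):
    """Spearman-Brown reliability of a client's mean over n_sessions.

    icc: assumed share of session-rating variance lying at the client
    level (an intraclass correlation between 0 and 1).
    """
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie between 0 and 1")
    if n_sessions < 1:
        raise ValueError("need at least one session")
    return n_sessions * icc / (1 + (n_sessions - 1) * icc)


for k in (1, 4, 6):
    print(k, round(mean_rating_reliability(0.3, k), 2))
```

With a hypothetical ICC of 0.3, a single session gives 0.3, while means over 4 and 6 sessions give roughly 0.63 and 0.72. The formula treats sessions as interchangeable, which real sessions are not, so it is at best an approximation.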
Hierarchical Organization of Session Evaluation Data
Session evaluation data often have a hierarchical structure, because session ratings are collected on several of each therapist's clients and several of each client's sessions. In such a data set, treating raw session ratings as independent observations in statistical tests would fail to distinguish among several distinct sources of variation: (a) differences among therapists, (b) differences among clients of each therapist, and (c) differences among sessions of each client (Dill-Standiford et al., 1988; Stiles et al., 1994; Stiles & Snow, 1984a).
To ameliorate this problem, some investigators make statistical adjustments and conduct separate analyses at different levels. Session-level analyses use ratings of multiple sessions from each case that have been either statistically adjusted for differences among cases (e.g., by using deviation scores, which are the differences between the observed session scores and that client's mean) or averaged across cases. Client-level analyses use mean ratings across each client's sessions that have been either adjusted for differences among therapists (i.e., one adjusted mean per client) or averaged across therapists. Therapist-level analyses use mean ratings across each therapist's clients (i.e., one mean per therapist). There is a good deal of variability in SEQ scores at both the session and client levels (but less at the therapist level), so relations with other variables are worth studying at both levels. For example, comparisons of session evaluations with measures of therapy process are typically made at the session level, whereas comparisons with measures of therapy outcome or individual differences are typically made at the client level.
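The deviation-score adjustment can be sketched as follows. This is a minimal illustration, assuming a flat list of hypothetical (therapist_id, client_id, score) records with one record per rated session; it shows only the deviation-score route, not the averaging alternative mentioned above, and the function name is illustrative.

```python
from collections import defaultdict
from statistics import mean


def hierarchical_scores(records):
    """Split SEQ index scores into session, client, and therapist levels.

    records: iterable of (therapist_id, client_id, score) tuples, one per
    rated session. Client IDs need only be unique within a therapist.
    """
    records = list(records)
    sessions = defaultdict(list)
    for therapist, client, score in records:
        sessions[(therapist, client)].append(score)
    client_means = {key: mean(scores) for key, scores in sessions.items()}

    clients_by_therapist = defaultdict(list)
    for (therapist, _client), client_mean in client_means.items():
        clients_by_therapist[therapist].append(client_mean)
    # Therapist level: one mean per therapist, weighting each client equally.
    therapist_means = {t: mean(m) for t, m in clients_by_therapist.items()}

    # Session level: each session's deviation from that client's mean.
    session_deviations = [
        (t, c, score - client_means[(t, c)]) for t, c, score in records
    ]
    # Client level: each client's mean adjusted for the therapist's mean.
    client_deviations = {
        (t, c): m - therapist_means[t] for (t, c), m in client_means.items()
    }
    return session_deviations, client_deviations, therapist_means
```

Averaging client means, rather than pooling all of a therapist's sessions, keeps clients with many sessions from dominating the therapist-level score; that weighting is a design choice of this sketch.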
Separate factor analyses at the session and client levels have confirmed that the items on the Depth, Smoothness, Positivity, and Arousal scales form the same factors at both levels (Stiles et al., 1994; Stiles & Snow, 1984b), suggesting that the constituent adjectives have similar meanings when used to distinguish among sessions as when used to distinguish among clients. Nevertheless, variables at these different levels have different interpretations, and they can (and sometimes do) have different relations with each other and with other variables (Dill-Standiford et al., 1988; Norman, 1967).
Scoring the SEQ (Form 5)
Indexes of session Depth, session Smoothness, post-session Positivity, and post-session Arousal are calculated as the mean ratings on the appropriate items, as indicated in the formulas below. On the form, item order is mixed within each section (session evaluation and post-session mood), and item directionality is approximately balanced. Each item is scored from 1 to 7, reversed as appropriate, with higher scores indicating greater Depth, Smoothness, Positivity, or Arousal. The mean is used as an index, rather than the sum of the item scores, so that the scores lie on the same 7-point scale as the individual items, making interpretation easier. Specifically, using the adjective on the right-hand side of the form as the name of each item (a reversed item is scored as 8 minus the circled number):
Depth = [(8-worthless) + deep + (8-empty) + powerful + (8-ordinary)] / 5.
Smoothness = [easy + (8-tense) + pleasant + smooth + (8-uncomfortable)] / 5.
Positivity = [(8-sad) + pleased + definite + (8-afraid) + (8-unfriendly)] / 5.
Arousal = [(8-still) + excited + fast + (8-peaceful) + aroused] / 5.
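The four formulas above translate directly into code. The minimal sketch below assumes a hypothetical input layout: a dictionary keyed by the right-hand adjective of each item (so "worthless" holds the number circled on the valuable-worthless item). The names SCALES and score_seq are illustrative and not part of any official SEQ scoring tool.

```python
# Each scale: (items scored as circled, items scored as 8 - circled).
SCALES = {
    "depth": (("deep", "powerful"), ("worthless", "empty", "ordinary")),
    "smoothness": (("easy", "pleasant", "smooth"), ("tense", "uncomfortable")),
    "positivity": (("pleased", "definite"), ("sad", "afraid", "unfriendly")),
    "arousal": (("excited", "fast", "aroused"), ("still", "peaceful")),
}


def score_seq(ratings):
    """Return the four SEQ Form 5 indexes as item means on the 1-7 scale.

    ratings: dict mapping right-hand adjectives to circled numbers (1-7).
    The bad-good item (key "good") may be present but enters no index.
    """
    scores = {}
    for scale, (direct, reversed_items) in SCALES.items():
        raw = [ratings[name] for name in direct + reversed_items]
        if any(not 1 <= value <= 7 for value in raw):
            raise ValueError(f"{scale}: ratings must be between 1 and 7")
        item_scores = [ratings[name] for name in direct]
        item_scores += [8 - ratings[name] for name in reversed_items]
        # Mean rather than sum keeps the index on the 1-7 item scale.
        scores[scale] = sum(item_scores) / len(item_scores)
    return scores
```

For example, circling 7 (deep), 6 (powerful), 1 (worthless), 2 (empty), and 3 (ordinary) gives Depth = (7 + 6 + 7 + 6 + 5) / 5 = 6.2, and a respondent who circles 4 on every item scores 4.00, the midpoint, on all four indexes.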
Note that only 20 of the 21 SEQ items are used in these indexes. The remaining one is the first session evaluation item, bad-good, which tends to be used differently by clients than by therapists (Stiles, 1980; Stiles & Snow, 1984b). For therapists, bad-good has loaded on the Depth factor, but for clients it has often been split between Depth and Smoothness. The bad-good item has been retained on Form 5 because of its intrinsic interest as a global evaluation item (see Stiles et al., 1994).
A spreadsheet for scoring the SEQ, courtesy of James Tighe, is available for download (SEQ spreadsheet.xls; SEQ spreadsheet.xlsx).
References
Dill-Standiford, T. J., Stiles, W. B., & Rorer, L. G. (1988). Counselor-client agreement on session impact. Journal of Counseling Psychology, 35, 47-55.
Larsen, R. J., & Diener, E. (1992). Promises and problems with the circumplex model of emotion. In M. Clark (Ed.), Emotion (Review of Personality and Social Psychology, Vol. 13). Sage Publications.
Norman, W. T. (1967). On estimating psychological relationships: Social desirability and self-report. Psychological Bulletin, 67, 273-293.
Reisenzein, R. (1994). Pleasure-Arousal Theory and the intensity of emotions. Journal of Personality and Social Psychology, 67, 525-539.
Reynolds, S., Stiles, W. B., Barkham, M., Shapiro, D. A., Hardy, G. E., & Rees, A. (1996). Acceleration of changes in session impact during contrasting time-limited psychotherapies. Journal of Consulting and Clinical Psychology, 64, 577-586.
Russell, J. A. (1978). Evidence of convergent validity on dimensions of affect. Journal of Personality and Social Psychology, 36, 1152-1168.
Russell, J. A. (1979). Affective space is bipolar. Journal of Personality and Social Psychology, 37, 345-356.
Stiles, W. B. (1980). Measurement of the impact of psychotherapy sessions. Journal of Consulting and Clinical Psychology, 48, 176-185.
Stiles, W. B., Gordon, L. E., & Lani, J. A. (2002). Session evaluation and the Session Evaluation Questionnaire. In G. S. Tryon (Ed.), Counseling based on process research: Applying what we know (pp. 325-343). Boston, MA: Allyn & Bacon.
Stiles, W. B., Reynolds, S., Hardy, G. E., Rees, A., Barkham, M., & Shapiro, D. A. (1994). Evaluation and description of psychotherapy sessions by clients using the Session Evaluation Questionnaire and the Session Impacts Scale. Journal of Counseling Psychology, 41, 175-185.
Stiles, W. B., & Snow, J. S. (1984a). Counseling session impact as viewed by novice counselors and their clients. Journal of Counseling Psychology, 31, 3-12.
Stiles, W. B., & Snow, J. S. (1984b). Dimensions of psychotherapy session impact across sessions and across clients. British Journal of Clinical Psychology, 23, 59-63.
Stiles, W. B., Tupler, L. A., & Carpenter, J. C. (1982). Participants' perceptions of self-analytic group sessions. Small Group Behavior, 13, 237-254.
---------------------------------------------------------------------------------------------
Session Evaluation Questionnaire (Form 5)
ID#:
Date:

This session was:
bad            1   2   3   4   5   6   7   good
difficult      1   2   3   4   5   6   7   easy
valuable       1   2   3   4   5   6   7   worthless
shallow        1   2   3   4   5   6   7   deep
relaxed        1   2   3   4   5   6   7   tense
unpleasant     1   2   3   4   5   6   7   pleasant
full           1   2   3   4   5   6   7   empty
weak           1   2   3   4   5   6   7   powerful
special        1   2   3   4   5   6   7   ordinary
rough          1   2   3   4   5   6   7   smooth
comfortable    1   2   3   4   5   6   7   uncomfortable

Right now I feel:
happy          1   2   3   4   5   6   7   sad
angry          1   2   3   4   5   6   7   pleased
moving         1   2   3   4   5   6   7   still
uncertain      1   2   3   4   5   6   7   definite
calm           1   2   3   4   5   6   7   excited
confident      1   2   3   4   5   6   7   afraid
friendly       1   2   3   4   5   6   7   unfriendly
slow           1   2   3   4   5   6   7   fast
energetic      1   2   3   4   5   6   7   peaceful
quiet          1   2   3   4   5   6   7   aroused