Handbook of psychology volume 7 educational psychology

A Stage Model of Educational/Psychological Intervention Research

Our vision of how to close one of intervention research’s

fundamental credibility gaps, while at the same time better

informing practice, is presented in Figure 22.1’s stage model

of educational/psychological intervention research. In con-

trast to currently popular modes of intervention-research in-

quiry and reporting, the present conceptualization (a) makes

explicit different research stages, each of which is associated

with its own assumptions, purposes, methodologies, and stan-

dards of evidence; (b) concerns itself with research credibility

through high standards of internal validity; (c) concerns itself

with research creditability through high standards of external

Educational / Psychological Intervention Research

validity and educational/societal importance; and, most sig-

niﬁcantly, (d) includes a critical stage that has heretofore been

missing in the vast majority of intervention research, namely,

a randomized classroom trials link (modeled after the clinical

trials stage of medical research) between the initial develop-

ment and limited testing of the intervention and the prescrip-

tion and implementation of it. Alternatively, Stage 3 could be

referred to as an instructional trials stage or, more generically,

as an educational trials stage. To simplify matters, for the re-

mainder of the chapter we continue to refer to Stage 3 as the

randomized classroom trials stage of credible intervention re-

search studies.

Stages 1 and 2 of the Figure 22.1 model are likely very

familiar to readers of this chapter, as studies in those tradi-

tions comprise the vast majority of intervention research as

we know it. In addition, throughout the chapter we have pro-

vided details of the two Stage 2 components of the model in

our consideration of the research-ﬁrst (controlled laboratory

experiments) versus practice-ﬁrst (case studies, demonstra-

tions, and design experiments) perspectives. Both controlled

laboratory experiments and applied studies are preliminary,

though in different complementary senses. The former are

preliminary in that their careful scrutiny of interventions

lacks an applied-implementation component, whereas the lat-

ter are preliminary in that their intervention prescriptions are

often not founded on scientiﬁcally credible evidence. Stage 1

and Stage 2 studies are crucial to developing an understand-

ing of the phenomena that inform practice (Stage 4) but that

ﬁrst must be rigorously, complexly, and intelligently evalu-

ated in Stage 3. Failure to consider possibilities beyond

Stages 1 and 2 may result in a purposelessness to research, a

temptation never to go beyond understanding a phenomenon

and determining whether it is a stable phenomenon with gen-

uine practice implications. The accumulation of applied, sci-

entiﬁcally credible evidence is precisely the function of the

randomized classroom trials stage (Stage 3, highlighted in

Figure 22.1) of the model. As in medical research, this

process consists of an examination of the proposed treatment

or intervention under realistic, yet carefully controlled, con-

ditions (e.g., Angell & Kassirer, 1998).

[Figure 22.1. Stage Model of Educational/Psychological Intervention Research. Stage 1: Preliminary ideas, hypotheses, observations, and pilot work. Stage 2: Controlled laboratory experiments; classroom-based demonstrations and design experiments. Stage 3: Randomized classroom trials studies. Stage 4: Informed classroom practice. Source: From Levin & O’Donnell (1999).]

Enhancing the Credibility of Intervention Research

“Realistic conditions” refer to the speciﬁc populations and

contexts about which one wishes to offer conclusions regard-

ing treatment efﬁcacy (i.e., external validity desiderata). In

medical research the conditions of interest generally include

humans (rather than animals), whereas in psychological and

educational research the conditions of interest generally in-

clude children in community settings and school classrooms

(rather than isolated individuals). In addition, in both medical

and psychological/educational contexts, the interventions

(e.g., drugs or instructional methods, respectively) must be

administered in the appropriate fashion (dosage levels or in-

structional integrity, respectively) for a long enough duration

for them to have effect and to permit the assessment of both

the desired outcome (e.g., an improved physical or social-

academic condition, respectively) and any unwanted side

effects (adverse physical, cognitive, affective, or behavioral

consequences). In a classroom situation, an appropriately

implemented instructional intervention of at least one semes-

ter, or even one year, in duration would be expected to satisfy

the “long enough” criterion.

“Carefully controlled conditions” refer to internally valid

experiments based on the random assignment of multiple in-

dependent “units” to alternative treatment-intervention condi-

tions. Again, in medical research the randomized independent

units are typically humans, whereas in educational interven-

tion research the randomized independent units are frequently

groups, classrooms, or schools (Levin, 1992, 1994). As with

medical research, careful control additionally involves design

safeguards to help rule out contributors to the effects other

than the targeted intervention, such as including appropriate

alternative interventions, incorporating blind and double-

blind intervention implementations (to the extent possible) so

that child, teacher, therapist, and researcher biases are elimi-

nated, and being responsive to all other potential sources of

experimental internal invalidity (Campbell & Stanley, 1966;

Shadish, Cook, & Campbell, 2002).

The randomized classroom trials stage of this model is

sensitive to each of the earlier indicated CAREful research

components, in that (a) the inclusion of alternative interven-

tions (including appropriately packaged standard methods or

placebos) permits meaningful Comparison when assessing

the effects of the targeted intervention; (b) the use of multi-

ple independent units (both within a single study and,

ideally, as subsequent replication studies) permits general-

ization through the speciﬁed outcomes being produced

Again and again; and (c) with across-unit randomization of

interventions (and assuming adequate control and appropri-

ate implementation of them), whatever Relationship is found

between the targeted intervention and the speciﬁed outcomes

can be traced directly to the intervention because (d) with

such randomization, control, and implementation, one is bet-

ter able to Eliminate all other potential explanations for the

outcomes.

The randomized classroom trials stage of our proposed

model possesses a number of critical features that are worth

mentioning. These features represent the best of what CARE-

fully controlled and well-executed laboratory-based research

has to offer applied and clinical research. First and foremost

here is the inclusion of multiple units (or in single-participant

research designs, multiple phases and within-phase observa-

tions per unit; see, e.g., Kratochwill & Levin, 1992) that are

randomly assigned to receive either the targeted intervention

or an acceptable alternative. For example, when classrooms

are the units of analysis, the use of multiple independent

classrooms is imperative for combating evidence-credibility

concerns arising from both methodological and statistical

features of the research. Each of these will be brieﬂy consid-

ered here (for additional discussion, see Campbell & Boruch,

1975; Levin, 1985, 1992, 1994; Levin & Levin, 1993).

Methodological Rigor

Consider some examples from educational research to contex-

tualize our perspectives on methodological rigor. In a typical

instructional intervention study, the participants in one class-

room receive new instructional methods or materials (includ-

ing combinations of these, multicomponent versions, and

systemic curricular innovations), whereas those in another

classroom receive either alternative or standard instructional

methods/materials/curricula. One does not have to look very

hard to ﬁnd examples of this type of study in the intervention

research literature, as it is pervasive. The aforementioned

Graziano et al. (1999) training study is an example of this

methodological genre. The problem with such studies is that

any resultant claims about intervention-produced outcomes are

not credible because whatever effects are observed can be

plausibly attributed to a myriad of other factors not at all

connected with the intervention. In studies where there is only

one classroom/teacher per intervention, for example, any po-

tential intervention effects are inextricably confounded with

classroom/teacher differences—even if “equivalence” can be

demonstrated on a pretest. If students are not randomly as-

signed to classrooms and classrooms to interventions, inter-

vention effects are confounded with selection biases as well.

Indeed, as far as credible evidence is concerned, a reasonable

case can be made that a “one classroom per intervention” study

is just that—an individual case. Accordingly, one-classroom-

per-intervention cases fall into our earlier discussion of

intervention research that in actuality is a classroom-based

demonstration.

With the addition of sequential modiﬁcations of the instruc-

tional intervention, the previously discussed design experi-

ment also resembles the one-classroom-per-intervention

prototype. Minor variations of that prototype include assigning

a couple of classrooms to each intervention condition (e.g.,

Brown, 1992) or having one or a few teachers alternately

implement both interventions in a few classrooms (e.g.,

Collins, 1992). Unfortunately, methodological and statistical

concerns (related to nonrandomization; contaminating teacher,

student, classroom, and researcher effects; and inappropriate

units of analysis, among others), analogous to the ones raised

here, are associated with such variations as well. Recent

methodological and statistical developments out of the behav-

ior-analytic and clinical research traditions do, however, have

the potential to enhance the scientiﬁc credibility of the one-

or-few-classrooms-per-intervention study (e.g., Koehler &

Levin, 1998; Kratochwill & Levin, 1992; Levin & Wampold,

1999) and, therefore, should be given strong consideration in

classroom-based and other intervention studies.

Unfortunately, adding the sequential intervention-modification

strategy of design experiments serves only to add

confounding variables to the interpretive mix. Although some

may regard confounding the effect of an intervention with

other variables to be acceptable in a design experiment—“Our

interventions are deliberately designed to be multiply con-

founded” (Brown, 1992, p. 167)—confoundings of the kind

described here clearly are not acceptable in the classroom tri-

als stage of educational intervention research. In Stage 3 of the

model, the random assignment of multiple classrooms or other

intact groups to interventions serves to counteract this

methodological concern; for actual research examples, see

Byrne and Fielding-Barnsley (1991); Duffy et al. (1987); and

Stevens, Slavin, and Farnish (1991).

Consistent with the earlier presented Comparison compo-

nent of CAREful research, the need for including appropriate

comparison classrooms (or other aggregates) is of paramount

importance in the Stage 3 model. As Slavin (1999) forcefully

pointed out in response to a critic advocating the documenta-

tion of an intervention’s effectiveness not by a comparison

with a nonintervention control condition but through the pre-

sentation of what seem to be surprising outcomes in the

intervention condition,

An experimental-control comparison between well-matched

(or, ideally, randomly assigned) participants is to be able to

provide powerful evidence for or against a causal relationship

[attributable to the intervention], because the researcher estab-

lishes the experimental and control groups in advance, before

the results are known, and then reports relative posttests or

gains. In contrast, [the critic’s] search for “surprising” scores or

gains begins after the fact, when the results are already known.

This cannot establish the effect of a given program on a given

outcome; any of a thousand other factors other than the treat-

ment could explain high scores in a given school in a given

year. . . . If an evaluation has data on 100 schools implementing

a given program but only reports on the 50 that produced the

most positive scores, it is utterly meaningless. In contrast, a

comparison of 10 schools to 10 well-matched control schools

provides strong evidence for or against the existence of a pro-

gram impact. If that experimental-control comparison is then

replicated elsewhere in a series of small but unbiased studies,

the argument for a causal relationship is further strengthened.

(Slavin, 1999, pp. 36–37)

Slavin’s hypothetical example should evoke readers’

memories of the perils and potential for deception that are in-

herent in the examine aspect of the ESP model of educational

intervention research. The example also well illustrates the

adapted adage: A randomized experiment is worth more than

100 school demonstrations!

Analytic Appropriateness

Early and often in the history of educational research, much

has been written on the inappropriateness of researchers’ sta-

tistically analyzing the effects of classroom-implemented

interventions as though the interventions had been indepen-

dently administered to individual students (e.g., Barcikowski,

1981; Levin, 1992; Lindquist, 1940; Page, 1965; Peckham,

Glass, & Hopkins, 1969). That is, there is a profound mis-

match between the units of intervention administration

(groups, classrooms) and the units of analysis (children, students),

and conducting child/student-level statistical analyses

in such situations typically results in a serious misrepresenta-

tion of both the reality and the magnitude of the intervention

effect. [As an interesting aside, units of analysis is another

term with a speciﬁc statistical meaning that is now being ca-

sually used in the educational research literature to refer to

the researcher’s substantive grain-size perspective: the indi-

vidual student, the classroom collective, the school, the com-

munity, etc. (see, e.g., Cobb & Bowers, 1999, pp. 6–8).]

Consider, for example, a hypothetical treatment study in

which one classroom of 20 students receives a classroom

management instructional intervention and another class-

room of 20 students receives standard classroom protocol. It

is indisputably incorrect to assess the intervention effect in

that study on the basis of a conventional student-level t test,

analysis of variance, chi-square test, or other statistical pro-

cedures that assume that 40 independently generated student

outcomes comprise the data. Analyzing the data in that fash-

ion will produce invalid results and conclusions.
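
The two-classroom example above can be made concrete with a short sketch. It is not taken from the chapter: it assumes the standard design-effect approximation for clustered data, 1 + (m - 1) * ICC, and the intraclass correlation of 0.2 is a purely hypothetical value chosen for illustration. The function names are likewise illustrative.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC (assumed formula)."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_students: int, cluster_size: int, icc: float) -> float:
    """Roughly how many independent observations the clustered students are worth."""
    return n_students / design_effect(cluster_size, icc)


def classroom_level_df(n_classrooms_total: int, n_conditions: int = 2) -> int:
    """Error degrees of freedom when classrooms, not students, are the units."""
    return n_classrooms_total - n_conditions


# Hypothetical numbers matching the example: 2 classrooms of 20 students each.
HYPOTHETICAL_ICC = 0.2  # assumption, not a value reported in the chapter
deff = design_effect(20, HYPOTHETICAL_ICC)
n_eff = effective_sample_size(40, 20, HYPOTHETICAL_ICC)
df_units = classroom_level_df(2)

print(f"design effect: {deff:.1f}")               # about 4.8
print(f"effective sample size: {n_eff:.1f}")      # about 8.3, not 40
print(f"classroom-level error df: {df_units}")    # 0
```

Under these assumptions the 40 students carry the information of roughly 8 independent observations, and a classroom-level analysis has zero error degrees of freedom, which is one way of seeing why no valid test of the intervention effect is available with one classroom per condition.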

Even today, most “one group per intervention” (or even “a

couple groups per intervention”) researchers continue to

adopt units-inappropriate analytic practices, in spite of the

earlier noted cautions and evidence that such practices lead to

dangerously misleading inferences (e.g., Graziano et al.,

1999). In a related context, Muthen (1989, p. 184) speculated

on the reason for researchers’ persistent misapplication of

statistical procedures: “The common problem is that mea-

surement issues and statistical assumptions that are incidental

to the researchers’ conceptual ideas become stumbling blocks

that invalidate the statistical analysis.”

In the randomized classroom trials stage of the model, the

critical units-of-analysis issue can be dealt with through the

inclusion of multiple randomized units (e.g., multiple class-

rooms randomly assigned to intervention and control condi-

tions) in conjunction with the application of statistical

models that are both appropriate and sensitive to the applied

implementation nature of the experiment (e.g., Bryk &

Raudenbush, 1992; Levin, 1992). In the medical and health

ﬁelds, group-randomized intervention trials (Braun & Feng,

2001) have been referred to as cluster randomization trials

(e.g., Donner & Klar, 2000), with the corresponding pitfalls

of inappropriate statistical analyses well documented. The

number of multiple units to be included in a given study is not

a speciﬁed constant. Rather, that number will vary from study

to study as a function of substantive, resource, and unit-based

statistical power considerations (e.g., Barcikowski, 1981;

Levin, 1997a; Levin & Serlin, 1993), as well as of the scope

of curricular policy implications associated with the particu-

lar intervention. In addition, appropriate statistical methods

to accompany multiple-baseline and other “few units per in-

tervention” single-participant designs (alluded to earlier) are

now available (see, e.g., Koehler & Levin, 1998; Levin &

Wampold, 1999; Marascuilo & Busk, 1988; Wampold &

Worsham, 1986).
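
As a rough illustration of matching the unit of analysis to the unit of randomization, the sketch below reduces each classroom to its mean score and compares conditions with a pooled-variance t statistic computed on those means. This is a deliberately simple stand-in, not the multilevel models cited above (e.g., Bryk & Raudenbush, 1992), and all scores are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def classroom_means(scores_by_classroom):
    """Collapse each classroom's student scores to a single classroom mean."""
    return [mean(scores) for scores in scores_by_classroom]


def pooled_t(group_a, group_b):
    """Two-sample pooled-variance t statistic and its degrees of freedom.

    Each element of group_a / group_b is one randomized unit (a classroom mean).
    Assumes the classroom means are not all identical (nonzero pooled variance).
    """
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        # With one classroom per condition there is no within-condition
        # variability among units, so no test is possible.
        raise ValueError("need at least two units per condition")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Invented data: three classrooms per condition, three students each.
intervention = [[12, 14, 13], [15, 16, 14], [13, 15, 14]]
control = [[10, 11, 12], [12, 13, 11], [11, 12, 13]]

t_stat, df = pooled_t(classroom_means(intervention), classroom_means(control))
print(f"t = {t_stat:.2f} on {df} df")  # df counts classrooms, not students
```

The degrees of freedom here come from the six classrooms rather than the 18 students, which is the point of the units-of-analysis argument. A multilevel model would use the student-level data more efficiently, but it would still base the test of the intervention effect on classroom-level variability.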

Two additional critical features of the randomized class-

room trials stage should also be indicated.

Intervention-Effect Robustness

The use of multiple randomized units in the randomized

classroom trials stage permits legitimate intervention-effect

generalizations across classrooms, teachers, and students—

something that is not legitimate in the prototypical interven-

tion study. With the additional feature of random selection of

groups or classrooms within a school, district, or other pop-

ulation, statistical analyses that permit even grander general-

izations are possible (e.g., Bryk & Raudenbush, 1992), a

desirable and deﬁning characteristic of Slavin’s (1997) pro-

posed design competition for instructional interventions.

(A design competition should not be confused with a design

experiment, as has already occurred in the literature. The

critical attributes of the former have been discussed earlier in

this article; those of the latter are discussed in a following

section.) Finally, replication of the randomized classroom

trials stage of the model, across different sites and with dif-

ferent investigators, increases one’s degree of conﬁdence in

the reality, magnitude, and robustness of the intervention ef-

fect. In summary, each of the just-mentioned sampling aug-

mentations of the randomized classroom trials stage can be

considered in relation to enhancing the research’s external

validity.

Interaction Potential

The randomized classroom trials stage lends itself not just to

generalization, but also to speciﬁcity, in the form of determin-

ing whether a particular intervention is better suited to certain

kinds of groups, classrooms, teachers, or students than to oth-

ers. With one-unit-per-intervention and conventional analy-

ses, investigating intervention-by-characteristics interactions

is not possible, or at least not possible without the method-

ological shortcomings and statistical assumption violations

mentioned earlier. Just as different drugs or medical treat-

ments may be expected to affect different patients differently,

different classroom interventions likely have different effects

on students differing in academic ability, aptitude, motiva-

tional levels, or demographic characteristics. The same would

be expected of instructional interventions delivered by teach-

ers with different personal and teaching characteristics. That

is, one size may not ﬁt all (Salomon & Almog, 1998, p. 224),

but that assumption can readily be incorporated into, and in-

vestigated in, the randomized classroom trials stage of inter-

vention research (e.g., Bryk & Raudenbush, 1992; Levin,

1992; Levin & Peterson, 1984); for an actual research exam-

ple, see Copeland (1991). Included in this analytic armament

are adaptations for studying intervention by outcome-measure

interactions, changes in intervention effectiveness over time,

and other large- or small-scale classroom-based multivariate

issues of interest (see also Levin & Wampold, 1999).

What Is Random in Randomized Classroom

Trials Studies?

It is important to clarify exactly what needs to be random and

controlled to yield scientiﬁcally credible unit-based evi-

dence, for we have witnessed substantial confusion among

intervention researchers concerning how to meet standards of

internal, as opposed to external, validity in such studies.

Reiterating that high internal validity alone is what makes an

empirical study scientiﬁcally credible, we point out that in

randomized classroom trials research,

• Classrooms and teachers do not need to be randomly se-

lected.

• Participants do not need to be randomly assigned to class-

rooms.

• The only aspect that must be random is the assignment of

candidate units (e.g., groups, classrooms, schools) to the

different intervention conditions, either across all units or in

a matched-unit fashion. By “candidate,” we are referring to

all units for which there is a priori agreement to be included

in the study, which implies accepting the fact that there is an

equal chance of the candidates’ being assigned to any of the

study’s speciﬁed intervention conditions. A “wait-list” or

“crossover” arrangement (e.g., Levin, 1992; Shadish, Cook,

& Campbell, 2002) can also be implemented as a part of the

nontargeted-intervention units’ assignment.

• Scientiﬁcally credible studies based on whole unit random

assignment operations can be performed on targeted par-

ticipant subgroups. For example, classrooms containing

students both with and without learning disabilities could

be randomly assigned to intervention conditions, with the

focus of the study’s interventions being on just the former

student subgroup.

• When either out-of-classroom or unobtrusive within-

classroom interventions can be administered, within-

classroom blocked random assignment of participants to

intervention conditions represents a scientiﬁcally credible

strategy—for an actual research example, see McDonald,

Kratochwill, Levin, and Youngbear Tibbits (1998).

• Even if units are initially assigned to interventions ran-

domly (as just indicated), terminal conditions-composition

differences resulting from participant or group attrition can

undermine the scientiﬁc credibility of the study (see, e.g.,

the Graziano et al., 1999, training study). In such cases,

analyses representing different degrees of conservatism

should be provided, with the hope of obtaining compatible

evidence.
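
The assignment requirement in the list above, random assignment of candidate units either across all units or in a matched-unit fashion, can be sketched as follows. The classroom labels, condition names, and seed are hypothetical, and the code shows only one simple way to carry out such an assignment.

```python
import random


def assign_units(units, conditions, seed=None):
    """Randomly assign all candidate units to conditions in near-equal numbers."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    return {unit: conditions[i % len(conditions)] for i, unit in enumerate(shuffled)}


def assign_matched_pairs(pairs, conditions=("intervention", "control"), seed=None):
    """Within each matched pair, randomly decide which unit gets which condition."""
    rng = random.Random(seed)
    assignment = {}
    for unit_a, unit_b in pairs:
        first, second = rng.sample(list(conditions), 2)  # random order of the two
        assignment[unit_a] = first
        assignment[unit_b] = second
    return assignment


# Hypothetical candidate classrooms that agreed in advance to take part.
classrooms = ["A", "B", "C", "D", "E", "F"]
print(assign_units(classrooms, ["intervention", "control"], seed=1))

# Hypothetical pairs matched beforehand, e.g., on prior achievement.
matched = [("A", "B"), ("C", "D"), ("E", "F")]
print(assign_matched_pairs(matched, seed=1))
```

Fixing the seed makes the assignment reproducible and auditable; in an actual study the assignment would be generated once, recorded before any outcome data are collected, and not revisited.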

An important addendum is that statistical adjustments and

controls (e.g., analysis of covariance, path models) do not

represent acceptable substitutes for situations in which ran-

dom assignment of classrooms to intervention conditions

cannot be effected. Although this point has been underscored

by statisticians and methodologists for many years (e.g.,

Elashoff, 1969; Huitema, 1980), educational researchers

continue to believe that sophisticated statistical tools can

resurrect data from studies that are inadequately designed

and executed. Muthen (1989) aptly reminded us of that in

quoting Cliff (1983):

[Various multivariate] methods have greatly increased the rigor

with which one can analyze his correlational data, and they solve

many statistical problems that have plagued this kind of data.

However, they solve a much smaller proportion of the interpre-

tational . . . problems that go with such data. These interpreta-

tional problems are particularly severe in those increasingly

common cases where the investigator wishes to make causal

interpretations of his analyses. (Muthen, 1989, p. 185)

When random assignment of units to interventions has been

used, however, the concurrent application of analysis of co-

variance or other multivariate techniques is entirely appropri-

ate and may prove to be analytically advantageous (e.g., Levin

& Serlin, 1993); for actual research examples, see Torgesen,

Morgan, and Davis (1992) and Whitehurst et al. (1994).

Summary

Conducting randomized classroom trials studies is not an

easy task. We nonetheless claim that: (a) randomized experi-

ments are not impossible (or even impractical) to conduct, so

that (b) educational researchers must begin adding these to

their investigative repertoires to enhance the scientiﬁc credi-

bility of their research and research-based conclusions. Class-

room-based research (and its resultant scientiﬁc credibility)

can also be adversely affected by a variety of real-world

plagues, including within-classroom treatment integrity, be-

tween-classroom treatment overlap, and construct validity, as

well as other measurement issues (e.g., Cook & Campbell,

1979; Nye, Hedges, & Konstantopoulos, 1999). In addition, a

variety of external validity caveats—superbly articulated in a

persuasive treatise by Dressman (1999)—must be heeded

when attempting to extrapolate educational research ﬁndings

to educational policy recommendations. There can be no

denying that in contrast to the independent and dependent

variables of the prototypical laboratory experiment, the fac-

tors related to school or classroom outcomes are complex and

multidimensional. Yet, others have argued compellingly that

to understand the variables (and variable systems) that have

implications for social policy, randomized experiments

should, and can, be conducted in realistic ﬁeld settings (e.g.,

Boruch, 1975; Campbell & Boruch, 1975). Here we present a

similar argument for more carefully controlled classroom-

based research on instructional interventions and on other

educational prescriptions.
