Handbook of Psychology, Volume 7: Educational Psychology
Educational/Psychological Intervention Research
A Stage Model of Educational/Psychological Intervention Research
Our vision of how to close one of intervention research’s fundamental credibility gaps, while at the same time better informing practice, is presented in Figure 22.1’s stage model of educational/psychological intervention research. In contrast to currently popular modes of intervention-research inquiry and reporting, the present conceptualization (a) makes explicit different research stages, each of which is associated with its own assumptions, purposes, methodologies, and standards of evidence; (b) concerns itself with research credibility through high standards of internal validity; (c) concerns itself with research creditability through high standards of external validity and educational/societal importance; and, most significantly, (d) includes a critical stage that has heretofore been missing in the vast majority of intervention research, namely, a randomized classroom trials link (modeled after the clinical trials stage of medical research) between the initial development and limited testing of the intervention and the prescription and implementation of it. Alternatively, Stage 3 could be referred to as an instructional trials stage or, more generically, as an educational trials stage. To simplify matters, for the remainder of the chapter we continue to refer to Stage 3 as the randomized classroom trials stage of credible intervention research studies.
Stages 1 and 2 of the Figure 22.1 model are likely very familiar to readers of this chapter, as studies in those traditions constitute the vast majority of intervention research as we know it. In addition, throughout the chapter we have provided details of the two Stage 2 components of the model in our consideration of the research-first (controlled laboratory experiments) versus practice-first (case studies, demonstrations, and design experiments) perspectives.
Both controlled laboratory experiments and applied studies are preliminary, though in different, complementary senses. The former are preliminary in that their careful scrutiny of interventions lacks an applied-implementation component, whereas the latter are preliminary in that their intervention prescriptions are often not founded on scientifically credible evidence. Stage 1 and Stage 2 studies are crucial to developing an understanding of the phenomena that inform practice (Stage 4) but that first must be rigorously, complexly, and intelligently evaluated in Stage 3. Failure to consider possibilities beyond Stages 1 and 2 may result in a purposelessness to research, a temptation never to go beyond understanding a phenomenon and determining whether it is a stable phenomenon with genuine practice implications. The accumulation of applied, scientifically credible evidence is precisely the function of the randomized classroom trials stage (Stage 3, highlighted in Figure 22.1) of the model. As in medical research, this process consists of an examination of the proposed treatment or intervention under realistic, yet carefully controlled, conditions (e.g., Angell & Kassirer, 1998).
Figure 22.1 Stage Model of Educational/Psychological Intervention Research (Stage 1: Preliminary ideas, hypotheses, observations, and pilot work; Stage 2: Controlled laboratory experiments, and Classroom-based demonstrations and design experiments; Stage 3: Randomized classroom trials studies; Stage 4: Informed classroom practice). Source: From Levin & O’Donnell (1999).
Enhancing the Credibility of Intervention Research
“Realistic conditions” refer to the specific populations and contexts about which one wishes to offer conclusions regarding treatment efficacy (i.e., external validity desiderata). In medical research the conditions of interest generally include humans (rather than animals), whereas in psychological and educational research the conditions of interest generally include children in community settings and school classrooms (rather than isolated individuals). In addition, in both medical and psychological/educational contexts, the interventions (e.g., drugs or instructional methods, respectively) must be administered in the appropriate fashion (dosage levels or instructional integrity, respectively) for a long enough duration for them to have effect and to permit the assessment of both the desired outcome (e.g., an improved physical or social-academic condition, respectively) and any unwanted side effects (adverse physical, cognitive, affective, or behavioral consequences). In a classroom situation, an appropriately implemented instructional intervention of at least one semester, or even one year, in duration would be expected to satisfy the “long enough” criterion.
“Carefully controlled conditions” refer to internally valid experiments based on the random assignment of multiple independent “units” to alternative treatment-intervention conditions. Again, in medical research the randomized independent units are typically humans, whereas in educational intervention research the randomized independent units are frequently groups, classrooms, or schools (Levin, 1992, 1994).
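As a rough illustration of what random assignment of multiple independent units means operationally, the following Python sketch shuffles whole classrooms and deals them out to the conditions. The function name `assign_units`, the classroom labels, and the seed are illustrative assumptions rather than anything specified in the chapter; the point is only that the classroom, not the student, is what gets randomized.

```python
import random


def assign_units(units, conditions, seed=None):
    """Randomly assign whole units (e.g., classrooms) to conditions.

    Hypothetical helper: units are shuffled and then dealt to the conditions
    in turn, so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {}
    for i, unit in enumerate(shuffled):
        # Position in the shuffled order decides the condition.
        assignment[unit] = conditions[i % len(conditions)]
    return assignment


classrooms = [f"room_{k}" for k in range(1, 11)]
plan = assign_units(classrooms, ["intervention", "comparison"], seed=2024)
for room in classrooms:
    print(room, plan[room])
```

Fixing the seed reproduces the same plan on every run, which makes the assignment auditable; omitting it yields a fresh random plan. When the number of classrooms is a multiple of the number of conditions, each classroom has the same chance of landing in any condition.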
As with medical research, careful control additionally involves design safeguards to help rule out contributors to the effects other than the targeted intervention, such as including appropriate alternative interventions, incorporating blind and double-blind intervention implementations (to the extent possible) so that child, teacher, therapist, and researcher biases are eliminated, and being responsive to all other potential sources of experimental internal invalidity (Campbell & Stanley, 1966; Shadish, Cook, & Campbell, 2002).
The randomized classroom trials stage of this model is sensitive to each of the earlier indicated CAREful research components, in that (a) the inclusion of alternative interventions (including appropriately packaged standard methods or placebos) permits meaningful Comparison when assessing the effects of the targeted intervention; (b) the use of multiple independent units (both within a single study and, ideally, as subsequent replication studies) permits generalization through the specified outcomes being produced Again and again; and (c) with across-unit randomization of interventions (and assuming adequate control and appropriate implementation of them), whatever Relationship is found between the targeted intervention and the specified outcomes can be traced directly to the intervention because (d) with such randomization, control, and implementation, one is better able to Eliminate all other potential explanations for the outcomes.
The randomized classroom trials stage of our proposed model possesses a number of critical features that are worth mentioning. These features represent the best of what CAREfully controlled and well-executed laboratory-based research has to offer applied and clinical research.
First and foremost here is the inclusion of multiple units (or in single-participant research designs, multiple phases and within-phase observations per unit; see, e.g., Kratochwill & Levin, 1992) that are randomly assigned to receive either the targeted intervention or an acceptable alternative. For example, when classrooms are the units of analysis, the use of multiple independent classrooms is imperative for combating evidence-credibility concerns arising from both methodological and statistical features of the research. Each of these will be briefly considered here (for additional discussion, see Campbell & Boruch, 1975; Levin, 1985, 1992, 1994; Levin & Levin, 1993).
Methodological Rigor
Consider some examples from educational research to contextualize our perspectives on methodological rigor. In a typical instructional intervention study, the participants in one classroom receive new instructional methods or materials (including combinations of these, multicomponent versions, and systemic curricular innovations), whereas those in another classroom receive either alternative or standard instructional methods/materials/curricula. One does not have to look very hard to find examples of this type of study in the intervention research literature, as it is pervasive. The aforementioned Graziano et al. (1999) training study is an example of this methodological genre. The problem with such studies is that any resultant claims about intervention-produced outcomes are not credible because whatever effects are observed can be plausibly attributed to a myriad of other factors not at all connected with the intervention. In studies where there is only one classroom/teacher per intervention, for example, any potential intervention effects are inextricably confounded with classroom/teacher differences—even if “equivalence” can be demonstrated on a pretest.
If students are not randomly assigned to classrooms and classrooms to interventions, intervention effects are confounded with selection biases as well. Indeed, as far as credible evidence is concerned, a reasonable case can be made that a “one classroom per intervention” study is just that—an individual case. Accordingly, one-classroom-per-intervention cases fall into our earlier discussion of intervention research that in actuality is a classroom-based demonstration.
With the addition of sequential modifications of the instructional intervention, the previously discussed design experiment also resembles the one-classroom-per-intervention prototype. Minor variations of that prototype include assigning a couple of classrooms to each intervention condition (e.g., Brown, 1992) or having one or a few teachers alternately implement both interventions in a few classrooms (e.g., Collins, 1992). Unfortunately, methodological and statistical concerns (related to nonrandomization; contaminating teacher, student, classroom, and researcher effects; and inappropriate units of analysis, among others), analogous to the ones raised here, are associated with such variations as well. Recent methodological and statistical developments out of the behavior-analytic and clinical research traditions do, however, have the potential to enhance the scientific credibility of the one-or-few-classrooms-per-intervention study (e.g., Koehler & Levin, 1998; Kratochwill & Levin, 1992; Levin & Wampold, 1999) and, therefore, should be given strong consideration in classroom-based and other intervention studies. Unfortunately, adding
the sequential intervention-modification strategy of design experiments serves only to add confounding variables to the interpretive mix. Although some may regard confounding the effect of an intervention with other variables as acceptable in a design experiment—“Our interventions are deliberately designed to be multiply confounded” (Brown, 1992, p. 167)—confoundings of the kind described here clearly are not acceptable in the classroom trials stage of educational intervention research. In Stage 3 of the model, the random assignment of multiple classrooms or other intact groups to interventions serves to counteract this methodological concern; for actual research examples, see Byrne and Fielding-Barnsley (1991); Duffy et al. (1987); and Stevens, Slavin, and Farnish (1991).
Consistent with the earlier presented Comparison component of CAREful research, the need for including appropriate comparison classrooms (or other aggregates) is of paramount importance in the Stage 3 model. As Slavin (1999) forcefully pointed out in response to a critic advocating the documentation of an intervention’s effectiveness not by a comparison with a nonintervention control condition but through the presentation of what seem to be surprising outcomes in the intervention condition,
An experimental-control comparison between well-matched (or, ideally, randomly assigned) participants is to be able to provide powerful evidence for or against a causal relationship [attributable to the intervention], because the researcher establishes the experimental and control groups in advance, before the results are known, and then reports relative posttests or gains. In contrast, [the critic’s] search for “surprising” scores or gains begins after the fact, when the results are already known. This cannot establish the effect of a given program on a given outcome; any of a thousand other factors other than the treatment could explain high scores in a given school in a given year. . . .
If an evaluation has data on 100 schools implementing a given program but only reports on the 50 that produced the most positive scores, it is utterly meaningless. In contrast, a comparison of 10 schools to 10 well-matched control schools provides strong evidence for or against the existence of a program impact. If that experimental-control comparison is then replicated elsewhere in a series of small but unbiased studies, the argument for a causal relationship is further strengthened. (Slavin, 1999, pp. 36–37)
Slavin’s hypothetical example should evoke readers’ memories of the perils and potential for deception that are inherent in the examine aspect of the ESP model of educational intervention research. The example also well illustrates the adapted adage: A randomized experiment is worth more than 100 school demonstrations!
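Slavin’s point about reporting only the 50 most positive of 100 schools can be checked with a small simulation. The sketch below assumes, purely for illustration, that the program has no effect at all and that each school’s gain is standard normal noise; the names `simulate_gains`, `all_mean`, and `top_half_mean` are hypothetical.

```python
import random
import statistics


def simulate_gains(n_schools=100, seed=None):
    """Score gains for schools where the program has NO effect.

    Assumption for illustration only: each gain is pure standard normal noise.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_schools)]


gains = simulate_gains(seed=7)
all_mean = statistics.mean(gains)
# The after-the-fact report: keep only the 50 schools with the most positive gains.
top_half_mean = statistics.mean(sorted(gains, reverse=True)[:50])

print(f"mean gain, all 100 schools:   {all_mean:+.2f}")
print(f"mean gain, 'best' 50 schools: {top_half_mean:+.2f}")
```

Under these assumptions the “best 50” show an average gain of roughly 0.8 standard deviations even though nothing was done, which is exactly the selection artifact the quotation warns against.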
Analytic Appropriateness
Early and often in the history of educational research, much has been written on the inappropriateness of researchers’ statistically analyzing the effects of classroom-implemented interventions as though the interventions had been independently administered to individual students (e.g., Barcikowski, 1981; Levin, 1992; Lindquist, 1940; Page, 1965; Peckham, Glass, & Hopkins, 1969). That is, there is a profound mismatch between the units of intervention administration (groups, classrooms) and the units of analysis (children, students), and conducting child/student-level statistical analyses in such situations typically results in a serious misrepresentation of both the reality and the magnitude of the intervention effect. [As an interesting aside, “units of analysis” is another term with a specific statistical meaning that is now being casually used in the educational research literature to refer to the researcher’s substantive grain-size perspective: the individual student, the classroom collective, the school, the community, etc. (see, e.g., Cobb & Bowers, 1999, pp. 6–8).]
Consider, for example, a hypothetical treatment study in which one classroom of 20 students receives a classroom management instructional intervention and another classroom of 20 students receives standard classroom protocol. It is indisputably incorrect to assess the intervention effect in that study on the basis of a conventional student-level t test, analysis of variance, chi-square test, or other statistical procedures that assume that 40 independently generated student outcomes comprise the data. Analyzing the data in that fashion will produce invalid results and conclusions.
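The consequence of analyzing the two-classroom study above at the student level can be sketched by simulation. The Python below assumes that the intervention has no effect, that each classroom carries a shared random effect with intraclass correlation `icc` (0.2 is an illustrative value, not one from the chapter), and that each condition is a single intact classroom of 20 students. It counts how often a conventional pooled-variance t test on the 40 students declares a “significant” effect. The critical value 2.024 is the two-sided .05 cutoff for 38 degrees of freedom, so it is valid only for two groups of 20.

```python
import random
import statistics

CLASS_SIZE = 20       # students per classroom (one classroom per condition)
T_CRIT_DF38 = 2.024   # two-sided .05 critical t for 2 * 20 - 2 = 38 df


def student_level_t(group_a, group_b):
    """Pooled-variance two-sample t, (wrongly) treating students as independent."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * statistics.variance(group_a)
              + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


def type_i_error_rate(icc, reps=2000, seed=None):
    """Share of null studies (true intervention effect = 0) declared 'significant'.

    All students in a classroom share a random classroom effect with variance
    `icc`; individual noise has variance 1 - icc, so `icc` is the intraclass
    correlation.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        rooms = []
        for _ in range(2):
            room_effect = rng.gauss(0.0, icc ** 0.5)
            rooms.append([room_effect + rng.gauss(0.0, (1 - icc) ** 0.5)
                          for _ in range(CLASS_SIZE)])
        if abs(student_level_t(rooms[0], rooms[1])) > T_CRIT_DF38:
            rejections += 1
    return rejections / reps


print("icc = 0.0:", type_i_error_rate(0.0, seed=1))
print("icc = 0.2:", type_i_error_rate(0.2, seed=1))
```

With no clustering the rejection rate stays near the nominal .05; with `icc=0.2` it climbs to several times that, because the student-level standard error ignores classroom-to-classroom variation that is inseparable from the intervention contrast.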
Even today, most “one group per intervention” (or even “a couple of groups per intervention”) researchers continue to adopt units-inappropriate analytic practices, in spite of the earlier noted cautions and evidence that such practices lead to dangerously misleading inferences (e.g., Graziano et al., 1999). In a related context, Muthen (1989, p. 184) speculated on the reason for researchers’ persistent misapplication of statistical procedures: “The common problem is that measurement issues and statistical assumptions that are incidental to the researchers’ conceptual ideas become stumbling blocks that invalidate the statistical analysis.”
In the randomized classroom trials stage of the model, the critical units-of-analysis issue can be dealt with through the inclusion of multiple randomized units (e.g., multiple classrooms randomly assigned to intervention and control conditions) in conjunction with the application of statistical models that are both appropriate and sensitive to the applied implementation nature of the experiment (e.g., Bryk & Raudenbush, 1992; Levin, 1992). In the medical and health fields, group-randomized intervention trials (Braun & Feng, 2001) have been referred to as cluster randomization trials (e.g., Donner & Klar, 2000), with the corresponding pitfalls of inappropriate statistical analyses well documented. The number of units to be included in a given study is not a specified constant. Rather, that number will vary from study to study as a function of substantive, resource, and unit-based statistical power considerations (e.g., Barcikowski, 1981; Levin, 1997a; Levin & Serlin, 1993), as well as of the scope of curricular policy implications associated with the particular intervention.
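One common way to quantify how clustering erodes unit-based statistical power, widely used in the cluster randomization literature, is the design effect for equal-sized clusters, 1 + (m - 1)ρ, where m is the number of students per classroom and ρ is the intraclass correlation. The Python sketch below assumes equal class sizes and an illustrative ρ of 0.2; the function names are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Approximate number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


ICC = 0.2  # illustrative intraclass correlation, not a value from the chapter
for k in (2, 10, 20):
    n_eff = effective_sample_size(k, 20, ICC)
    print(f"{k:>2} classrooms x 20 students = {k * 20:>3} students "
          f"~ {n_eff:.1f} independent observations")
```

Under these assumptions, 2 classrooms of 20 (40 students) carry roughly the information of 8 independent observations, and even 20 classrooms (400 students) are worth only about 83, which is why the number of classrooms, not the number of students, drives power.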
In addition, appropriate statistical methods to accompany multiple-baseline and other “few units per intervention” single-participant designs (alluded to earlier) are now available (see, e.g., Koehler & Levin, 1998; Levin & Wampold, 1999; Marascuilo & Busk, 1988; Wampold & Worsham, 1986).
Two additional critical features of the randomized classroom trials stage should also be indicated.
Intervention-Effect Robustness
The use of multiple randomized units in the randomized classroom trials stage permits legitimate intervention-effect generalizations across classrooms, teachers, and students—something that is not legitimate in the prototypical intervention study. With the additional feature of random selection of groups or classrooms within a school, district, or other population, statistical analyses that permit even grander generalizations are possible (e.g., Bryk & Raudenbush, 1992), a desirable and defining characteristic of Slavin’s (1997) proposed design competition for instructional interventions. (A design competition should not be confused with a design experiment, as has already occurred in the literature. The critical attributes of the former have been discussed earlier in this article; those of the latter are discussed in a following section.) Finally, replication of the randomized classroom trials stage of the model, across different sites and with different investigators, increases one’s degree of confidence in the reality, magnitude, and robustness of the intervention effect. In summary, each of the just-mentioned sampling augmentations of the randomized classroom trials stage can be considered in relation to enhancing the research’s external validity.
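The randomization tests cited above for single-participant and few-unit designs differ in their details from what follows. As a simplified sketch in the same spirit, the Python below runs an exact two-sided randomization test on classroom means, treating the classroom as the unit of analysis because it was the unit of random assignment. It assumes complete random assignment of a fixed number of classrooms to each condition; `classroom_randomization_test` and the example means are hypothetical.

```python
from itertools import combinations
from statistics import mean


def classroom_randomization_test(treated_means, control_means):
    """Exact two-sided randomization test on classroom means.

    Enumerates every split of the observed classroom means into groups of the
    original sizes. Valid when whole classrooms were randomly assigned so that
    all such splits were equally likely.
    """
    pooled = list(treated_means) + list(control_means)
    n_t = len(treated_means)
    observed = mean(treated_means) - mean(control_means)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_t):
        chosen = set(idx)
        t_grp = [pooled[i] for i in chosen]
        c_grp = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(t_grp) - mean(c_grp)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


treated = [72.4, 75.1, 70.8, 74.6]
control = [68.9, 71.2, 67.5, 69.8]
diff, p = classroom_randomization_test(treated, control)
print(f"difference in classroom means = {diff:.2f}, exact p = {p:.3f}")
```

With 4 classrooms per condition there are only 70 possible splits, so the smallest attainable two-sided p is 2/70 (about .029); with 3 per condition it is 2/20 = .10. This is one concrete way in which the number of randomized units, rather than the number of students, limits what a study can show.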
The randomized classroom trials stage lends itself not just to generalization, but also to specificity, in the form of determining whether a particular intervention is better suited to certain kinds of groups, classrooms, teachers, or students than to others. With one-unit-per-intervention and conventional analyses, investigating intervention-by-characteristics interactions is not possible, or at least not possible without the methodological shortcomings and statistical assumption violations mentioned earlier. Just as different drugs or medical treatments may be expected to affect different patients differently, different classroom interventions likely have different effects on students differing in academic ability, aptitude, motivational levels, or demographic characteristics. The same would be expected of instructional interventions delivered by teachers with different personal and teaching characteristics. That is, one size may not fit all (Salomon & Almog, 1998, p. 224), but that assumption can readily be incorporated into, and investigated in, the randomized classroom trials stage of intervention research (e.g., Bryk & Raudenbush, 1992; Levin, 1992; Levin & Peterson, 1984); for an actual research example, see Copeland (1991). Included in this analytic armament are adaptations for studying intervention-by-outcome-measure interactions, changes in intervention effectiveness over time, and other large- or small-scale classroom-based multivariate issues of interest (see also Levin & Wampold, 1999).
What Is Random in Randomized Classroom Trials Studies?
It is important to clarify exactly what needs to be random and controlled to yield scientifically credible unit-based evidence, for we have witnessed substantial confusion among intervention researchers concerning how to meet standards of internal, as opposed to external, validity in such studies. Reiterating that high internal validity alone is what makes an
empirical study scientifically credible, we point out that in randomized classroom trials research,
• Classrooms and teachers do not need to be randomly selected.
• Participants do not need to be randomly assigned to classrooms.
• The only aspect that must be random is the assignment of candidate units (e.g., groups, classrooms, schools) to the different intervention conditions, either across all units or in a matched-unit fashion. By “candidate,” we are referring to all units for which there is a priori agreement to be included in the study, which implies accepting the fact that there is an equal chance of the candidates’ being assigned to any of the study’s specified intervention conditions. A “wait-list” or “crossover” arrangement (e.g., Levin, 1992; Shadish, Cook, & Campbell, 2002) can also be implemented as a part of the nontargeted-intervention units’ assignment.
• Scientifically credible studies based on whole-unit random assignment operations can be performed on targeted participant subgroups. For example, classrooms containing students both with and without learning disabilities could be randomly assigned to intervention conditions, with the focus of the study’s interventions being on just the former student subgroup.
• When either out-of-classroom or unobtrusive within-classroom interventions can be administered, within-classroom blocked random assignment of participants to intervention conditions represents a scientifically credible strategy—for an actual research example, see McDonald, Kratochwill, Levin, and Youngbear Tibbits (1998).
• Even if units are initially assigned to interventions randomly (as just indicated), terminal conditions-composition differences resulting from participant or group attrition can undermine the scientific credibility of the study (see, e.g., the Graziano et al., 1999, training study).
  In such cases, analyses representing different degrees of conservatism should be provided, with the hope of obtaining compatible evidence.
An important addendum is that statistical adjustments and controls (e.g., analysis of covariance, path models) are not acceptable substitutes for random assignment when classrooms cannot be randomly assigned to intervention conditions. Although this point has been underscored by statisticians and methodologists for many years (e.g., Elashoff, 1969; Huitema, 1980), educational researchers continue to believe that sophisticated statistical tools can resurrect data from studies that are inadequately designed and executed. Muthen (1989) aptly reminded us of that in quoting Cliff (1983):
[Various multivariate] methods have greatly increased the rigor with which one can analyze his correlational data, and they solve many statistical problems that have plagued this kind of data. However, they solve a much smaller proportion of the interpretational . . . problems that go with such data. These interpretational problems are particularly severe in those increasingly common cases where the investigator wishes to make causal interpretations of his analyses. (Muthen, 1989, p. 185)
When random assignment of units to interventions has been used, however, the concurrent application of analysis of covariance or other multivariate techniques is entirely appropriate and may prove to be analytically advantageous (e.g., Levin & Serlin, 1993); for actual research examples, see Torgesen, Morgan, and Davis (1992) and Whitehurst et al. (1994).
Summary
Conducting randomized classroom trials studies is not an easy task. We nonetheless claim that (a) randomized experiments are not impossible (or even impractical) to conduct, so that (b) educational researchers must begin adding these to their investigative repertoires to enhance the scientific credibility of their research and research-based conclusions.
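For the matched-unit option listed among the randomization requirements above, one simple scheme is to rank classrooms on a pretest, pair adjacent classrooms, and flip a coin within each pair. The Python sketch below assumes matching on pretest class means and an even number of classrooms; the function name and the example data are hypothetical, and other matching variables or block sizes would be equally legitimate.

```python
import random


def matched_pair_assignment(pretest_means, seed=None):
    """Pair classrooms with adjacent pretest means, then randomize within pairs.

    `pretest_means` maps classroom id -> pretest mean (hypothetical input).
    Within each pair a coin flip decides which classroom gets the intervention.
    """
    if len(pretest_means) % 2:
        raise ValueError("need an even number of classrooms to form pairs")
    rng = random.Random(seed)
    # Rank classroom ids from lowest to highest pretest mean.
    ranked = sorted(pretest_means, key=pretest_means.get)
    assignment = {}
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # the random step: order within the pair
        assignment[pair[0]] = "intervention"
        assignment[pair[1]] = "comparison"
    return assignment


pretest = {"A": 61.2, "B": 74.0, "C": 58.9, "D": 72.5, "E": 66.1, "F": 65.4}
print(matched_pair_assignment(pretest, seed=3))
```

Each classroom still has an equal chance of receiving either condition, so the assignment remains random at the unit level, while the pairing keeps the conditions balanced on the matching variable.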
Classroom-based research (and its resultant scientific credibility) can also be adversely affected by a variety of real-world plagues, including problems of within-classroom treatment integrity, between-classroom treatment overlap, and construct validity, as well as other measurement issues (e.g., Cook & Campbell, 1979; Nye, Hedges, & Konstantopoulos, 1999). In addition, a variety of external validity caveats—superbly articulated in a persuasive treatise by Dressman (1999)—must be heeded when attempting to extrapolate educational research findings to educational policy recommendations. There can be no denying that in contrast to the independent and dependent variables of the prototypical laboratory experiment, the factors related to school or classroom outcomes are complex and multidimensional. Yet, others have argued compellingly that to understand the variables (and variable systems) that have implications for social policy, randomized experiments should, and can, be conducted in realistic field settings (e.g., Boruch, 1975; Campbell & Boruch, 1975). Here we present a similar argument for more carefully controlled classroom-based research on instructional interventions and on other educational prescriptions.