Measuring Student Knowledge and Skills
Tasks tend to be judged more difficult as the number of distractors increases, as the distractors share more features with the correct response, and as the distractors appear in closer proximity to the correct response. For instance, tasks tend to be judged more difficult when one or more distractors meet some but not all of the conditions specified in the question and appear in a paragraph or section of text other than the one containing the correct answer. Tasks are judged to be most difficult when two or more distractors share most of the features of the correct response and appear in the same paragraph or node of information as the correct response.

d) Response formats

Both multiple-choice and constructed-response questions have been used in assessing reading proficiency, but the testing literature provides little guidance as to which strategies or processes are best measured by which formats. As Bennett (1993) noted, “Despite the strong assertions by cognitive theorists, the empirical research has afforded only equivocal evidence that constructed-response tasks necessarily measure skills fundamentally different from the ones tapped by multiple-choice questions” (p. 8). In particular, Traub’s survey of research on the differences between the two response formats in reading comprehension tests concluded that there was no sign of a strong format effect (Traub, 1993).

The empirical literature on format effects, however, is quite limited. Traub’s survey found only two studies, one with college students and one with students in the third grade. Significantly, though, the one with college students (Ward, Dupree and Carlson, 1987) did measure the more complex aspects of comprehension. However, in his presidential address to the American Psychological Association, Frederiksen (1984) noted that the real test bias stems from the limitations imposed by the sole use of multiple-choice items. In addition, students in some OECD countries may not be familiar with the format of standardised multiple-choice items. Including a mix of open-ended items will therefore provide a better balance of the types of tasks with which students in classrooms around the world are familiar, and may also serve to broaden the constructs being measured.

There is a great range of constructed-response tasks. Some require little judgement on the marker’s part, such as tasks that ask the reader simply to mark parts of the text to indicate an answer or to list a few words. Others require considerable subjective judgement by markers, as when the reader is asked to summarise a text in his or her own words. Given the lack of strong evidence of a method effect, and the advice of item developers, it seems wisest to include both multiple-choice and constructed-response items in the reading literacy assessment.

e) Marking

Marking is relatively simple with dichotomously scored multiple-choice items: either the student has chosen the designated response or not. Partial credit models allow for more complex marking of multiple-choice items. Because some wrong answers are more nearly correct than others, students who choose an “almost right” answer receive partial credit. Psychometric models for such polytomous marking are well established and are in some ways preferable to dichotomous scoring, as they make use of more of the information contained in the responses.
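To make this concrete, the sketch below implements the category probabilities of one such well-established model, Masters’ partial credit model. It is a minimal illustration only: the ability value and step difficulties in the usage example are invented, and the sketch is not the scaling specification of any particular assessment.

```python
from math import exp

def pcm_probabilities(theta: float, deltas: list[float]) -> list[float]:
    """Category probabilities under Masters' partial credit model.

    theta  -- the student's ability on the latent scale
    deltas -- one "step difficulty" per score step (an item scored
              0, 1, 2 has two steps); each step is a location on the
              same scale as theta
    Returns P(score = 0), P(score = 1), ..., P(score = m).
    """
    # The numerator for score x is exp of the cumulative sum of
    # (theta - delta_k) over steps 1..x; the empty sum for score 0
    # gives exp(0) = 1.
    numerators = [1.0]
    cumulative = 0.0
    for delta in deltas:
        cumulative += theta - delta
        numerators.append(exp(cumulative))
    total = sum(numerators)
    return [n / total for n in numerators]

# Illustrative item with no credit (0), partial credit (1) and full
# credit (2); the step difficulties are chosen arbitrarily.
print(pcm_probabilities(theta=0.5, deltas=[-0.3, 0.8]))
```

The two step difficulties in the sketch are exactly the “several locations on the difficulty scale” referred to below, which is what makes the interpretation of polytomous scores more involved than that of dichotomous ones.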
Interpretation of polytomous scores is more complex, however, as each task has several locations on the difficulty scale: one for the fully correct answer and others for each of the partially correct answers.

Marking is relatively simple with dichotomous constructed-response items, but the specification of correct answers is more difficult. The more students are expected to generate ideas rather than simply to identify information in the text, the greater will be the differences among correct answers. Considerable training and monitoring of markers will be required to ensure comparability from marker to marker, even within one country. A balance needs to be found between specificity and openness: if marking guidelines are too specific, oddly phrased correct answers may be marked as incorrect; if they are too open, responses that do not fully satisfy the task may be marked as correct.

Constructed-response items lend themselves especially well to partial credit scoring, though this adds some complexity to the marking process (and to the development of marking guidelines). Partial credit marking also enables the use of a variety of tasks in which one type of response indicates a more complex understanding of the text than another, yet both are “correct” responses. It is recommended that partial credit marking be used, at least for the more complex constructed-response items.
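As a sketch of how dichotomous and partial credit marking might be encoded operationally, the fragment below maps response categories to the credit awarded. The item identifiers, response codes and credit values are hypothetical, not taken from any actual marking guide.

```python
# Hypothetical marking guide: each item maps response categories
# (multiple-choice options, or marker-assigned codes for
# constructed responses) to the credit awarded.
MARKING_GUIDE = {
    # Dichotomous multiple-choice item: only the keyed option scores.
    "item_01": {"A": 0, "B": 1, "C": 0, "D": 0},
    # Partial credit constructed-response item: "full" reflects a
    # complex understanding, "partial" a correct but simpler response.
    "item_02": {"full": 2, "partial": 1, "none": 0},
}

def mark(item_id: str, response_code: str) -> int:
    """Credit for one response; unrecognised codes earn no credit."""
    return MARKING_GUIDE[item_id].get(response_code, 0)

def raw_score(responses: dict[str, str]) -> int:
    """Total raw credit across a student's responses."""
    return sum(mark(item, code) for item, code in responses.items())

print(raw_score({"item_01": "B", "item_02": "partial"}))  # 1 + 1 = 2
```

In practice, the category codes for constructed-response items are assigned by trained markers rather than computed automatically, which is where the comparability concerns discussed above arise.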