CONCLUSION
In this article we have analyzed rating data from a large-scale standards-based writing assessment study in Germany. The tasks were developed as level-specific tasks with reference to the lower five proficiency levels of the CEFR, the primary document on which the German Educational Standards are based. Our analyses of the data with descriptive statistics, G-theory, and multifaceted Rasch modeling provide a consistent narrative about the quality of the tasks, the assessment instruments, and the rater training, as well as about the hierarchical order of task difficulties and the distribution of student proficiencies. The level-specific task and rating approach allows for a transparent assessment, from designing tasks to rating performances to reporting on CEFR levels, an advantage over other, multilevel approaches (multilevel either in their task or their rating design).
With regard to the design factors of criteria and raters, our analyses suggest that the rater training and the application of the detailed rating scale and benchmark texts effectively eliminated many differences in how the raters applied the rating criteria across the different tasks, which speaks positively for the rater training and the assessment instruments in general. It was indispensable to select and train raters appropriately and to continually revise the rating scales on the basis of incoming data from pretrials conducted between workshops. This also allowed for the careful selection of benchmark texts, which can be used for detailed discussion and commentary so that raters interpret and apply the rating levels and the detailed analytic criteria in a comparable way. This is pivotal for obtaining reliable and valid ratings, which in turn form the basis for inferences about task difficulty and proficiency estimates. On the basis of our findings, we suggest an integrative, iterative, and data-driven approach to assessment design and rater training for standards-based writing assessment. Although providing analytic ratings did not add much informative value as far as the scaling of the data is concerned, it was nonetheless the detailed analytic approach that ensured the high consistency of the overall rater performance. For this reason we would recommend such a detailed approach even if only one overall score is to be reported.
We would argue that the level-specific tasks were generally designed appropriately for a coarse differentiation of performance, in that they seemed to elicit ranges of student responses that could be used to distinguish target-level performance from below-target-level performance. However, some of the tasks were somewhat too easy or too difficult for the student samples, suggesting that these tasks need to be better matched to the student samples on the basis of the lessons learned in this study. Our analyses showed one possible way to derive reliable task difficulty estimates, the hierarchical order of which showed a high level of correspondence with the targeted difficulty levels. Based on these findings, we could suggest possible regions where cut-scores aligned with the CEFR proficiency levels could be set. This provides the basis for setting cut-scores operationally, using test-centered, consensus-based standard-setting procedures in which judges rate the tasks in terms of their targeted CEFR level on the basis of an analysis of task demands and task characteristics. This in turn serves to validate the a priori CEFR-level classifications made by the test developers.
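To make the modeling step more concrete, many-facet Rasch models for rating designs of this kind are typically written in the following form; the exact facet structure used in this study (for example, whether the analytic criteria enter as an additional facet) is not spelled out here, so the formulation below is a generic sketch rather than the study's exact specification:
\[ \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k \]
where \(\theta_n\) is the proficiency of student \(n\), \(\delta_i\) the difficulty of task \(i\), \(\alpha_j\) the severity of rater \(j\), and \(\tau_k\) the threshold of rating category \(k\) relative to category \(k-1\). Because proficiency and task difficulty estimates are placed on a common logit scale, tasks can be ordered hierarchically and regions for CEFR-aligned cut-scores can be identified on that scale.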
The empirical results presented in this study are currently being combined with results from such formal, consensus-based standard-setting procedures using the Bookmark method and a novel adaptation of it, in which the representativeness of each task for its targeted, preclassified CEFR level is judged in order to confirm the targeted level or revise the task. As for future research in this area, an examinee-centered standard-setting method is also planned to link the rating scale and the benchmark texts to the CEFR levels by means of expert ratings and the assessment grid suggested in the Manual. Furthermore, modern methods from latent class analysis that statistically determine cut-scores could be used in the future to provide additional empirical data to complement the consensual approaches.
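As a simple illustration of how a Bookmark-style judgment translates into a cut-score on the Rasch scale, assume a response-probability criterion RP (the commonly used value RP = 0.67 is an assumption for illustration, not a figure reported in this study) and a dichotomously scored bookmarked task with difficulty \(\delta_b\); the corresponding ability cut-score is then
\[ \theta_{\text{cut}} = \delta_b + \ln\!\left(\frac{RP}{1-RP}\right), \qquad RP = 0.67 \;\Rightarrow\; \theta_{\text{cut}} \approx \delta_b + 0.71 . \]
For polytomously rated writing tasks the computation involves the category thresholds rather than a single difficulty parameter, but the underlying logic of mapping a bookmarked position in the difficulty-ordered task booklet to a point on the latent scale is the same.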