Reliability and Validity of Information About Student Achievement: Comparing Large-Scale and Classroom Testing Contexts

2009 ◽  
Vol 48 (1) ◽  
pp. 63-71 ◽  
Author(s):  
Gregory J. Cizek
2013 ◽  
Vol 115 (12) ◽  
pp. 1-35
Author(s):  
Stuart S. Yeh

Background: In principle, value-added modeling (VAM) might be justified if it can be shown to be a more reliable indicator of teacher quality than existing indicators for existing low-stakes decisions that are already being made, such as the award of small merit bonuses. However, a growing number of researchers now advocate the use of VAM to identify and replace large numbers of low-performing teachers. There is a need to evaluate these proposals because the active termination of large numbers of teachers based on VAM requires a much higher standard of reliability and validity. Furthermore, these proposals must be evaluated to determine if they are cost-effective compared to alternative proposals for raising student achievement. While VAM might be justified as a replacement for existing indicators (for existing decisions regarding merit compensation), it might not meet the higher standard of reliability and validity required for large-scale teacher termination, and it may not be the most cost-effective approach for raising student achievement. If society devotes its resources to approaches that are not cost-effective, the increase in achievement per dollar of resources expended will remain low, inhibiting reduction of the achievement gap. Objective: This article reviews literature regarding the reliability and validity of VAM, then focuses on an evaluation of a proposal by Chetty, Friedman, and Rockoff to use VAM to identify and replace the lowest-performing 5% of teachers with average teachers. Chetty et al. estimate that implementation of this proposal would increase the achievement and lifetime earnings of students. The results appear likely to accelerate the adoption of VAM by school districts nationwide. The objective of the current article is to evaluate the Chetty et al. proposal and the strategy of raising student achievement by using VAM to identify and replace low-performing teachers. Method: This article analyzes the assumptions of the Chetty et al. study and the assumptions of similar VAM-based proposals to raise student achievement. This analysis establishes a basis for evaluating the Chetty et al. proposal and, in general, a basis for evaluating all VAM-based policies to raise achievement. Conclusion: VAM is not reliable or valid, and VAM-based policies are not cost-effective for the purpose of raising student achievement and increasing earnings by terminating large numbers of low-performing teachers.


2020 ◽  
Vol 102 (3) ◽  
pp. 42-45
Author(s):  
Kristin E. Harbour ◽  
Evthokia Stephanie Saclarides

To support a continuous professional development model in the teaching and learning of mathematics, many districts and schools have begun hiring elementary mathematics coaches and/or specialists (MCSs). However, limited large-scale empirical research exists that determines how the use of MCSs affects student learning and achievement. Kristin E. Harbour and Evthokia Stephanie Saclarides begin to fill in this gap by using data from the National Assessment of Educational Progress to explore the relationship between the presence and responsibilities of elementary MCSs and 4th-grade student achievement in mathematics. Based on their findings, they share practical implications for districts and administrators to consider.


2020 ◽  
Vol 7 (2) ◽  
pp. 1-19
Author(s):  
Mubashir Ali Khan ◽  
Zaibunnisa Khan

The aim of this pilot study is to test the reliability and validity of a survey instrument designed to measure residents’ support for tourism. Because the study uses an adapted questionnaire, its reliability and validity need to be assessed. The questionnaire was distributed to 70 residents of Huna Valley. Content and face validity were first confirmed by field experts, and internal construct validity was then assessed through several measures. Inter-item correlations show that all the variables are significantly correlated with one another. Construct validity results show that all the constructs used in the study are reliable and meet the level of acceptability. The results therefore indicate that the modified instrument is valid and reliable in the context of the selected social lab, i.e., residents of Huna Valley, and that a full large-scale study can be carried out using this instrument.


2019 ◽  
Vol 44 (6) ◽  
pp. 752-781
Author(s):  
Michael O. Martin ◽  
Ina V.S. Mullis

International large-scale assessments of student achievement such as International Association for the Evaluation of Educational Achievement’s Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and Organization for Economic Cooperation and Development’s Program for International Student Assessment that have come to prominence over the past 25 years owe a great deal in methodological terms to pioneering work by National Assessment of Educational Progress (NAEP). Using TIMSS as an example, this article describes how a number of core techniques, such as matrix sampling, student population sampling, item response theory scaling with population modeling, and resampling methods for variance estimation, have been adapted and implemented in an international context and are fundamental to the international assessment effort. In addition to the methodological contributions of NAEP, this article illustrates how the large-scale international assessments go beyond measuring student achievement by representing important aspects of community, home, school, and classroom contexts in ways that can be used to address issues of importance to researchers and policymakers.


2019 ◽  
Vol 36 (4) ◽  
pp. 553-572 ◽  
Author(s):  
Suzanne Kleijn ◽  
Henk Pander Maat ◽  
Ted Sanders

Although there are many methods available for assessing text comprehension, the cloze test is not widely acknowledged as one of them. Critiques on cloze testing center on its supposedly limited ability to measure comprehension beyond the sentence. However, these critiques do not hold for all types of cloze tests; the particular configuration of a cloze determines its validity. We review various cloze configurations and discuss their strengths and weaknesses. We propose a new cloze procedure specifically designed to gauge text comprehension: the Hybrid Text Comprehension cloze (HyTeC-cloze). It employs a hybrid mechanical-rational deletion strategy and semantic scoring of answers. The procedure was tested in a large-scale study, involving 2926 Dutch secondary school students with 120 unique cloze tests. Our results show that, in terms of reliability and validity, the HyTeC-cloze matches and sometimes outperforms standardized tests of reading ability.


Assessment ◽  
2017 ◽  
Vol 26 (8) ◽  
pp. 1427-1443 ◽  
Author(s):  
Tanya L. Hopwood ◽  
Nicola S. Schutte ◽  
Natasha M. Loi

Two studies, with a total of 707 participants, developed and examined the reliability and validity of a measure for anticipatory traumatic reaction (ATR), a novel construct describing a form of distress that may occur in response to threat-related media reports and discussions. Exploratory and confirmatory factor analysis resulted in a scale comprising three subscales: feelings related to future threat; preparatory thoughts and actions; and disruption to daily activities. Internal consistency was .93 for the overall ATR scale. The ATR scale demonstrated convergent validity through associations with negative affect, depression, anxiety, stress, neuroticism, and repetitive negative thinking. The scale showed discriminant validity in relationships to Big Five characteristics. The ATR scale had some overlap with a measure of posttraumatic stress disorder, but also showed substantial separate variance. This research provides preliminary evidence for the novel construct of ATR as well as a measure of the construct. The ATR scale will allow researchers to further investigate anticipatory traumatic reaction in the fields of trauma, clinical practice, and social psychology.


2017 ◽  
Vol 11 (3) ◽  
pp. 10 ◽  
Author(s):  
Kirsti Klette ◽  
Marte Blikstad-Balas ◽  
Astrid Roe

Abstract: Educational research into instructional quality would benefit from macro- and meso-level instructional data – such as achievement data or large-scale student surveys – in relation to data from the micro level – such as detailed analyses of classroom practices. Several scholars have specifically asked for studies that correlate achievement data with records of learning processes and teaching strategies, and ongoing projects attempting to do so have shown promising results. Linking different data sources on instructional quality is quite demanding because it requires a concerted effort by researchers from different fields of expertise and different traditions. A main ambition of our ongoing research project is precisely to advance such integration. As the title of the project reveals, we are dedicated to Linking Instruction and Student Achievement (LISA). In this article, we start by providing a theoretical background and status of knowledge related to instructional quality. We go on to argue that video data has shown particular promise in studies aiming to obtain systematic data from a range of classrooms in order to compare classroom practices. We then present the three components of the LISA project’s design – student perception surveys, systematic classroom observation, and achievement gains in national tests – and the value of combining these three data sources. Finally, we will outline some of our findings thus far and point to future research possibilities.
Key words: instructional quality; classroom practices; video studies; mathematics; language arts
Norwegian title: Å koble undervisning med elevprestasjoner - Forskningsdesign for en ny generasjon klasseromsstudier (Linking instruction with student achievement: a research design for a new generation of classroom studies)
Summary (translated from Norwegian): To study instructional quality, it is advantageous to combine data from the macro and meso levels with detailed studies of what happens in the classroom. Several scholars have called for studies that examine relationships between measurable academic progress and the teacher’s instruction. Carrying out such studies is demanding, as it requires close collaboration between researchers from different fields with different expertise within quite different research traditions. A main ambition of our ongoing research project is precisely to achieve such integration. As the title reveals, we are dedicated to “Linking Instruction and Student Achievement (LISA)”. In this article, we present the theoretical and empirical foundation related to instructional quality. We further argue for the value of video data in studies that systematically compare instructional practices from different classrooms. We then present the three data sources in the LISA project’s research design – student questionnaires on their perceptions of the teacher’s instruction, systematic classroom observations, and measured progress on national tests in reading and numeracy. The value of combining precisely these three data sources is also discussed. Finally, we share some of our early research findings.
Keywords (translated from Norwegian): instructional quality; classroom practice; video studies; mathematics; Norwegian language arts


Author(s):  
Pāvels Pestovs ◽  
Dace Namsone

Latvia is undergoing a nationwide curriculum reform in general education, with the aim of helping students develop 21st-century skills. To implement the reform successfully, not only teacher performance in the classroom is important; the transformation of school culture is also a high priority. One of the key dimensions that characterises a school as a learning organization is whether it has a data-driven culture and uses data on a continuous basis to improve student achievement. Large-scale national-level assessment data are used for many different purposes; however, these data are only rarely recognised as a useful source for planning actions to improve student achievement at the school level. The authors argue that the average performance of students in different grades cannot be compared in a meaningful way to develop an action plan and evaluate the impact of initiatives at the school level. This argument is based on issues arising from the varying difficulty of the tests and the different skills being assessed. The study design is based on an in-depth analysis of items from the large-scale national-level assessment in mathematics, defining a minimum level of competency in mathematics and calculating the percentage of students in a school cohort who reach that minimum level. This analysis is conducted for students in the 3rd, 6th, and 9th grades using the Rasch model, making it possible to monitor student performance effectively throughout general education and to use the data to make informed decisions.

