The first national test in samhällskunskap: a study of interrater reliability

Acta Didactica Norge ◽

10.5617/adno.6283 ◽

2018 ◽

Vol 12 (4) ◽

pp. 13

Author(s):

Arne Löfstedt

Keyword(s):

Interrater Reliability ◽

Intraclass Correlation ◽

Ninth Graders ◽

National Agency ◽

Final Grade ◽

National Tests

The Swedish school subject samhällskunskap (societal knowledge) exists as a subject of its own essentially only in the Nordic countries. In many other countries its content is shared among several subjects, such as geography and civics. The content is extensive and changes constantly as society changes. The first national tests in samhällskunskap for all Swedish ninth graders took place in 2013, and a large part of the test consists of constructed-response items. Given the character of the subject, we consider it especially important to investigate whether these tests are "fair". This study investigates one aspect of fairness, interrater reliability: whether the same student response receives the same assessment regardless of who assesses it. In 2009, the Swedish National Agency for Education (Skolverket) conducted a large study of the subjects that then had national tests (Swedish, English and mathematics); our study both replicates and extends that design, applied to the first national tests in samhällskunskap in 2013. The results were analyzed using different reliability measures within the categories of consensus estimates and consistency estimates, and the intraclass correlation measure is discussed in particular. Because the 2013 tests were the first of their kind, a further aim was to create a framework for recurring studies of interrater reliability. The rater design, in which a relatively large number of teachers from schools across Sweden each assessed a total of three complete student tests, aimed to mimic the way the tests are assessed in schools; it also allowed us to test the stability of the assessment guidelines. The study was extensive and took two full days to carry out. The results indicate good agreement on the test grade, the summary grade the students receive. The study is intended to be repeated in coming years.

Keywords: Social science, civics, national testing, interrater reliability, intraclass correlation
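The difference between consensus and consistency estimates can be made concrete with a small sketch. It assumes the Swedish A-F grade scale is coded 0-5 and uses invented grades for eight tests, not data from the study; the helper names are hypothetical. A second rater who is always exactly one grade more generous reaches no exact agreement at all, yet the Pearson correlation is perfect.

```python
from math import sqrt

# Assumed numeric coding of the Swedish grade scale (F = fail, E-A = pass).
GRADE_POINTS = {"F": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5}


def exact_agreement(a, b):
    """Consensus estimate: share of tests given identical grades."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b):
    """Consensus estimate: share of tests whose grades differ by at most one step."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Consistency estimate: Pearson correlation between the two raters."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / sqrt(s_aa * s_bb)


# Invented grades: rater 2 is always exactly one grade higher than rater 1.
rater_1 = [GRADE_POINTS[g] for g in "FEDCBEDC"]
rater_2 = [GRADE_POINTS[g] for g in "EDCBADCB"]

print(f"exact agreement:    {exact_agreement(rater_1, rater_2):.2f}")     # 0.00
print(f"adjacent agreement: {adjacent_agreement(rater_1, rater_2):.2f}")  # 1.00
print(f"Pearson r:          {pearson_r(rater_1, rater_2):.2f}")           # 1.00
```

This is why it helps to report both kinds of estimate: a high correlation alone does not show that different raters would award the same grade.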

The Mechanism of Creativity: Why We Discover the New

Вопросы философии ◽

10.21146/0042-8744-2021-9-82-89 ◽

2021 ◽

pp. 82-89

Author(s):

Diana B. Bogoyavlenskaya ◽

Keyword(s):

Sample Size ◽

Experimental Model ◽

Creative Thinking ◽

Cognitive Activity ◽

Fundamental Difference ◽

Traditional Model ◽

Productive Thinking

The article attempts to substantiate a mechanism of creativity understood as the development of activity on the subject's own initiative, the emergence and development of which represents the highest level of human cognitive activity. The author relates this understanding of creativity to the phenomenon of giftedness and draws a fundamental distinction between creative and productive thinking: the latter does not go beyond solving the problems posed and consists in mastering the algorithm required to do so. The article describes in detail the method developed for research on the development of human cognition, and an experimental model that differs from the traditional "stimulus-response" model and which, according to the author, makes it possible to observe creativity as activity initiated by the subjects themselves. The author and her followers conducted such an experiment for half a century with the same participants: first they were students at a physics and mathematics school, then graduates of universities and institutes, and eventually scientists, teachers, or company employees. As evidence of the method's validity and predictive power, the author points not to the sample size but to the stability of the participants' creativity indicators as diagnosed over half a century.

Evaluation of App-Embedded Disease Scales for Aiding Visual Severity Estimation of Cercospora Leaf Spot of Table Beet

Plant Disease ◽

10.1094/pdis-10-18-1718-re ◽

2019 ◽

Vol 103 (6) ◽

pp. 1347-1356 ◽

Cited By ~ 5

Author(s):

Emerson M. Del Ponte ◽

Scot C. Nelson ◽

Sarah J. Pethybridge

Keyword(s):

Correlation Coefficient ◽

Leaf Spot ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Concordance Correlation Coefficient ◽

Concordance Correlation ◽

Rating Systems ◽

Cercospora Leaf Spot ◽

Severity Estimation

Two diagrammatic ordinal scales for Cercospora leaf spot (CLS) severity on table beet are available in the Estimate app (2017 version): a 10% linear scale (linear-based diagrammatic scale [LIN]) and a logarithmic scale (Horsfall-Barratt [HB]). Depending on the system used, these allow four types of severity data to be estimated. A group of 30 raters assigned percentage severity to 30 photographs of diseased table beet leaves in five rounds, first without an aid and then using each of the four rating systems in Estimate. In two of the systems, raters assigned the HB or LIN ordinal score into which the perceived severity best fit; HB2 and LIN2 added a second step, the choice of a unitary severity value within the perceived score interval. Unaided ability to estimate severity varied widely among raters: 13% were accurate (Lin's concordance correlation [LCC] > 0.9), 23% were inaccurate (LCC < 0.7), and the rest had moderate accuracy. Disparities between assigned and actual ordinal scores (mostly overestimates) were larger with the LIN than with the HB scale. LIN2 produced the most accurate estimates (Lin's concordance correlation coefficient ρc = 0.96; generalized bias parameter Cb = 0.99; Pearson's correlation coefficient r = 0.95) and the greatest interrater reliability (overall concordance correlation coefficient and intraclass correlation coefficient > 0.93). The two-step process using the 10% linear scale is recommended for estimating CLS severity in table beet.
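Lin's concordance correlation coefficient combines precision and accuracy: it can be written as ρc = r · Cb, where r is Pearson's correlation and Cb a bias correction factor that equals 1 only when the estimates have the same mean and spread as the actual values. A minimal sketch, using 1/n moments as in the usual estimator and invented severity values rather than data from the study (the function name is hypothetical):

```python
from math import sqrt


def lin_ccc(estimated, actual):
    """Return (rho_c, r, c_b) for paired severity values in percent.

    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with 1/n moments;
    r is Pearson's correlation and c_b = rho_c / r the bias correction factor.
    """
    n = len(actual)
    mean_x = sum(estimated) / n
    mean_y = sum(actual) / n
    s_xx = sum((x - mean_x) ** 2 for x in estimated) / n
    s_yy = sum((y - mean_y) ** 2 for y in actual) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, actual)) / n
    rho_c = 2 * s_xy / (s_xx + s_yy + (mean_x - mean_y) ** 2)
    r = s_xy / sqrt(s_xx * s_yy)
    c_b = rho_c / r  # accuracy: departure from the 45-degree line of perfect agreement
    return rho_c, r, c_b


# Invented example: a rater who overestimates every leaf by 10 percentage points.
actual = [5, 10, 20, 40]
estimated = [15, 20, 30, 50]
rho_c, r, c_b = lin_ccc(estimated, actual)
print(f"rho_c = {rho_c:.3f}, r = {r:.3f}, C_b = {c_b:.3f}")  # rho_c = 0.782, r = 1.000, C_b = 0.782
```

A constant overestimate leaves r at 1 but pulls ρc well below it, which is the kind of systematic bias that Pearson's r alone would miss.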

Interobserver Reliability Using the Phonetic Level Evaluation With Severely and Profoundly Hearing-Impaired Children

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3405.989 ◽

1991 ◽

Vol 34 (5) ◽

pp. 989-999 ◽

Cited By ~ 6

Author(s):

Stephanie Shaw ◽

Truman E. Coggins

Keyword(s):

Interrater Reliability ◽

Interobserver Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Hearing Impaired ◽

Intraclass Correlation Coefficients ◽

Assessment Measure ◽

Impaired Children ◽

Speech Assessment ◽

Hearing Impaired Children

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.
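The abstract does not state which form of the intraclass correlation was used, and the choice matters. As an illustration only, the sketch below computes two common forms from two-way ANOVA mean squares: ICC(2,1) for a single rater and ICC(2,k) for the mean of k raters, both treating raters as random and measuring absolute agreement (Shrout and Fleiss's notation). The example matrix is the six-target, four-judge illustration from Shrout and Fleiss (1979), not data from this study; the function name is hypothetical.

```python
def icc_two_way(ratings):
    """ICC(2,1) and ICC(2,k): two-way random effects, absolute agreement.

    `ratings` is a list of rows, one per subject, each holding k raters' scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Example matrix of Shrout and Fleiss (1979): 6 targets rated by 4 judges.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_two_way(scores)
print(f"ICC(2,1) = {icc_single:.2f}, ICC(2,4) = {icc_average:.2f}")  # ICC(2,1) = 0.29, ICC(2,4) = 0.62
```

The gap between the two values shows why the form should be reported: averaging over raters raises reliability, while systematic differences between raters (a large rater mean square) lower the absolute-agreement ICC.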

CODEM Instrument

GeroPsych ◽

10.1024/1662-9647/a000100 ◽

2014 ◽

Vol 27 (1) ◽

pp. 23-31 ◽

Cited By ~ 4

Author(s):

Anne Kuemmel (This author contributed eq ◽

Julia Haberstroh (This author contributed ◽

Johannes Pantel

Keyword(s):

Convergent Validity ◽

Interrater Reliability ◽

Discriminant Validity ◽

Assessment Tool ◽

Intraclass Correlation ◽

Well Being ◽

Communication Behavior ◽

People With Dementia ◽

Pearson's R

Communication and communication behaviors in situational contexts are essential conditions for well-being and quality of life in people with dementia. Methods for measuring them, however, are limited. The CODEM instrument, a standardized observational tool for assessing communication behavior, was developed and evaluated on the basis of the current state of research in dementia care and social-communicative behavior. Interrater reliability was first examined using video ratings (N = 10 people with dementia). Subsequently, six caregivers in six German nursing homes observed 69 residents with dementia and used CODEM to rate their communication behavior. The interrater reliability of CODEM was excellent (mean κ = .79; intraclass correlation = .91). Statistical analysis indicated that CODEM had excellent internal consistency (Cronbach's α = .95). CODEM also showed excellent convergent validity (Pearson's R = .88) as well as discriminant validity (Pearson's R = .63). Confirmatory factor analysis verified the two-factor solution of verbal/content aspects and nonverbal/relationship aspects. With regard to disease severity, the content and relational aspects of communication exhibited different trends. CODEM proved to be a reliable, valid, and sensitive tool for assessing communication behavior in dementia. It also provides researchers with a feasible tool for measuring the effects of psychosocial interventions that aim to improve communication behavior and well-being in dementia.
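Cronbach's α, reported above for internal consistency, compares the sum of the item variances with the variance of the total score. A minimal sketch, assuming a respondents × items matrix of scores; the ratings and the 1-5 scale are invented for illustration and are not CODEM data, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents x items matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(item_scores[0])
    items = [[row[j] for row in item_scores] for j in range(k)]  # one list per item
    totals = [sum(row) for row in item_scores]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Invented ratings: 5 residents scored on 3 observation items (1-5 scale).
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

When the items rise and fall together, the total-score variance is much larger than the sum of item variances and α approaches 1; uncorrelated items give α near 0.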

Meet fractal curves with 1C:MathKit

Informatics and Education ◽

10.32517/0234-0453-2020-35-3-38-48 ◽

2020 ◽

pp. 38-48

Author(s):

O. M. Korchazhkina

Keyword(s):

Methodological Approach ◽

Fractal Curve ◽

Iterative Processes ◽

Geometric Figures ◽

Geometric Objects ◽

ICT Tools ◽

Subject Areas

The article presents a methodological approach to studying iterative processes in the school geometry course, using the example of constructing the Koch snowflake fractal curve and calculating some of its characteristics. The interactive creative environment 1C:MathKit is used to visualize the method. By performing repeated constructions and algebraic calculations with ICT tools, students develop solid skills in working with geometric objects of varying complexity, grasp how iterative processes can be interpreted mathematically in practice, and learn to understand the dialectical unity of finite and infinite parameters of plane geometric figures. Becoming familiar with such contradictory concepts and categories enriches students' worldview-level understanding of the subject areas they study through the concept of "big ideas", which in turn lets them take a fresh look at processes in the world around them. The article will interest schoolteachers of computer science and mathematics, as well as university lecturers who teach the course "Concepts of modern natural sciences".
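The article works in 1C:MathKit, whose interface is not reproduced here. As a stand-in, the following Python sketch computes characteristics commonly derived for the Koch snowflake; which characteristics the article itself calculates is an assumption. After n iterations, the number of sides is 3·4^n, the side length is s/3^n, and the perimeter 3s·(4/3)^n grows without bound, while the area converges to 8/5 of the starting triangle. Exact fractions keep the arithmetic free of rounding; the function name is hypothetical.

```python
from fractions import Fraction


def koch_snowflake(iterations, side=Fraction(1)):
    """Characteristics of the Koch snowflake after `iterations` steps.

    Returns (number of sides, side length, perimeter, area), where the area is
    expressed as a multiple of the starting equilateral triangle's area.
    """
    sides = 3
    length = Fraction(side)
    area = Fraction(1)
    for n in range(1, iterations + 1):
        # Each existing side gets one new outward triangle with 1/3 of its
        # side length, whose area is (1/9)^n of the original triangle's area.
        area += sides * Fraction(1, 9 ** n)
        length /= 3
        sides *= 4  # every side is replaced by four shorter ones
    return sides, length, sides * length, area


for n in range(5):
    sides, length, perimeter, area = koch_snowflake(n)
    print(f"n={n}: sides={sides}, perimeter={float(perimeter):.4f}, area={float(area):.4f}")

# The perimeter diverges, while the area approaches 8/5 of the initial triangle.
print(float(koch_snowflake(30)[3]))
```

The contrast between an unbounded perimeter and a finite area is one concrete instance of the interplay of finite and infinite parameters that the abstract refers to.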

Reliability of a Modified Medication Appropriateness Index in Community Pharmacies

Annals of Pharmacotherapy ◽

10.1177/106002800303700101 ◽

2003 ◽

Vol 37 (1) ◽

pp. 40-46

Author(s):

Rosemin Kassam ◽

Linda G Martin ◽

Karen B Farris ◽

Homero A Monsanto ◽

Jean-Marie Kaiser

Keyword(s):

Community Pharmacy ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Community Setting ◽

Medication Appropriateness Index ◽

Paired Samples ◽

Psychometric Data ◽

Ambulatory Patients ◽

Pharmacy Setting ◽

Medication Appropriateness

Background: The medication appropriateness index (MAI) has demonstrated reliability in selected outpatient clinics where medical data were easily accessible from medical charts. However, its use in the community setting, where patient data may be limited, has not been examined. Objective: To evaluate the usefulness of a modified MAI in the community pharmacy setting by testing interrater reliability using 3 different rating schemes. Methods: Two raters evaluated 160 medications for 32 elderly ambulatory patients. Patient information was obtained from medication histories collected by community pharmacists. A summated MAI score, percent agreement, κ, positive agreement, negative agreement, and the intraclass correlation coefficient were calculated for each criterion using 3 scoring schemes. A paired-samples t-test (95% CI) was used to test interrater reliability. Results: Using the Hanlon scoring scheme, κ statistics were >0.75 for indication and effectiveness and lower but still good (0.41–0.66) for the remaining criteria. The intraclass correlation coefficients (0.82, 0.86, 0.87) and overall κ values (0.65, 0.66, 0.61) were similar for the 3 schemes. Conclusions: This study suggests that the modified MAI has the potential to detect medication appropriateness and inappropriateness in the community pharmacy setting, although it is not without limitations. Because the MAI has the most clinimetric and psychometric data available, it should be studied further to increase its reliability and generalizability.
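Percent agreement, κ, and positive and negative agreement can all be derived from a 2×2 table of two raters' judgments on one criterion. A minimal sketch, assuming each criterion is reduced to a yes/no judgment (the modified MAI's actual scoring schemes, including the Hanlon scheme, are not described here) and using invented counts; the function name and the meaning of "yes" are hypothetical.

```python
def two_rater_agreement(both_yes, only_rater1_yes, only_rater2_yes, both_no):
    """Agreement statistics for one dichotomous criterion rated by two raters.

    Counts come from a 2x2 table; "yes" might mean "rated inappropriate"
    (hypothetical coding). Returns percent agreement, Cohen's kappa, and
    positive and negative agreement.
    """
    a, b, c, d = both_yes, only_rater1_yes, only_rater2_yes, both_no
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from each rater's marginal proportions.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    positive = 2 * a / (2 * a + b + c)
    negative = 2 * d / (2 * d + b + c)
    return observed, kappa, positive, negative


# Invented counts for 50 medications on one criterion.
po, kappa, p_pos, p_neg = two_rater_agreement(20, 5, 10, 15)
print(f"agreement={po:.2f} kappa={kappa:.2f} positive={p_pos:.2f} negative={p_neg:.2f}")
# agreement=0.70 kappa=0.40 positive=0.73 negative=0.67
```

Reporting positive and negative agreement alongside κ helps when one category is rare, because κ then depends heavily on the marginal proportions.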

Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items

Educational and Psychological Measurement ◽

10.1177/0013164419899731 ◽

2020 ◽

Vol 80 (4) ◽

pp. 808-820

Author(s):

Cindy M. Walker ◽

Sakine Göçer Şahin

Keyword(s):

Differential Item Functioning ◽

Interrater Reliability ◽

Rating Scales ◽

Rating Scale ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Promising Alternative ◽

Constructed Response ◽

Polytomous Item ◽

Item Functioning

The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared with traditional interrater reliability measures. Three different procedures that can be used as measures of interrater reliability were compared: (1) intraclass correlation coefficient (ICC), (2) Cohen’s kappa statistic, and (3) DIF statistic obtained from Poly-SIBTEST. The results of this investigation indicated that DIF procedures appear to be a promising alternative to assess the interrater reliability of constructed response items, or other polytomous types of items, such as rating scales. Furthermore, using DIF to assess interrater reliability does not require a fully crossed design and allows one to determine if a rater is either more severe, or more lenient, in their scoring of each individual polytomous item on a test or rating scale.

Development and Initial Validation of a Project-Based Rubric to Assess the Systems-Based Practice Competency of Residents in the Clinical Chemistry Rotation of a Pathology Residency

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2013-0046-oa ◽

2014 ◽

Vol 138 (6) ◽

pp. 809-813

Author(s):

Carolyn R. Vitek ◽

Jane C. Dale ◽

Henry A. Homburger ◽

Sandra C. Bryant ◽

Amy K. Saenger ◽

...

Keyword(s):

Critical Thinking ◽

Interrater Reliability ◽

Clinical Chemistry ◽

Core Competencies ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Thinking Skills ◽

Project Evaluation ◽

Critical Thinking Skills

Context.— Systems-based practice (SBP) is 1 of 6 core competencies required in all resident training programs accredited by the Accreditation Council for Graduate Medical Education. Reliable methods of assessing resident competency in SBP have not been described in the medical literature. Objective.— To develop and validate an analytic grading rubric to assess pathology residents' analyses of SBP problems in clinical chemistry. Design.— Residents were assigned an SBP project based upon unmet clinical needs in the clinical chemistry laboratories. Using an iterative method, we created an analytic grading rubric based on critical thinking principles. Four faculty raters used the SBP project evaluation rubric to independently grade 11 residents' projects during their clinical chemistry rotations. Interrater reliability and Cronbach α were calculated to determine the reliability and validity of the rubric. Project mean scores and range were also assessed to determine whether the rubric differentiated resident critical thinking skills related to the SBP projects. Results.— Overall project scores ranged from 6.56 to 16.50 out of a possible 20 points. Cronbach α ranged from 0.91 to 0.96, indicating that the 4 rubric categories were internally consistent without significant overlap. Intraclass correlation coefficients ranged from 0.63 to 0.81, indicating moderate to strong interrater reliability. Conclusions.— We report development and statistical analysis of a novel SBP project evaluation rubric. The results indicate the rubric can be used to reliably assess pathology residents' critical thinking skills in SBP.

Development of a Model for the Acquisition and Assessment of Advanced Laparoscopic Suturing Skills Using an Automated Device

Surgical Innovation ◽

10.1177/1553350618764221 ◽

2018 ◽

Vol 25 (3) ◽

pp. 286-290 ◽

Cited By ~ 2

Author(s):

Elif Bilgic ◽

Madoka Takao ◽

Pepa Kaneva ◽

Satoshi Endo ◽

Toshitatsu Takao ◽

...

Keyword(s):

Laparoscopic Surgery ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Instructional Video ◽

Validity Evidence ◽

Laparoscopic Suturing ◽

Intraclass Correlation Coefficients ◽

Operative Assessment ◽

Suturing Skills

Background. A needs assessment identified a gap regarding laparoscopic suturing skills targeted in simulation. This study collected validity evidence for an advanced laparoscopic suturing task using an Endo Stitch™ device. Methods. Experienced (ES) and novice surgeons (NS) performed continuous suturing after watching an instructional video. Scores were based on time and accuracy and on the Global Operative Assessment of Laparoscopic Surgery. Data are shown as medians [25th-75th percentiles] (ES vs NS). Interrater reliability was calculated using intraclass correlation coefficients (with confidence intervals). Results. Seventeen participants were enrolled. Experienced surgeons had significantly higher task scores (980 [964-999] vs 666 [391-711], P = .0035) and Global Operative Assessment of Laparoscopic Surgery scores (25 [24-25] vs 14 [12-17], P = .0029). Interrater reliability for time and accuracy was 1.0 and 0.9 (0.74-0.96), respectively. All experienced surgeons agreed that the task was relevant to practice. Conclusion. This study provides validity evidence for the task as a measure of laparoscopic suturing skill using an automated suturing device. It could help trainees acquire the skills they need to better prepare for clinical learning.

Bank competition and the stability of financial sectors

Zeitschrift für Wirtschaftspolitik ◽

10.1515/zfwp-2021-2044 ◽

2021 ◽

Vol 70 (1) ◽

pp. 1-36

Author(s):

Toni Richter

Keyword(s):

Economic Policy ◽

Precise Measurement ◽

Policy Discourse ◽

Competitive Effect ◽

Too Big To Fail ◽

Financial Systems ◽

State Of Research ◽

Elementary Basis

Since the financial crisis of 2008, and with renewed intensity during the coronavirus crisis, the interdependence between the stability of financial systems and the prevailing degree of competition (DC) has been the subject of scientific and economic-policy discourse on fragmented markets and "too-big-to-fail" banks. In theory and in empirical work, two fundamentally opposed causal concepts compete, the competition-stability hypothesis and the competition-fragility hypothesis, and both rest on the precise measurement of the DC. Drawing on the recent state of research, the article shows that alternative DC measures consistently indicate significantly different competitive conditions; consequently, the evidence for or against a stability-enhancing effect of competition appears to be predetermined by the chosen DC measure.
