Group items tagged "scoring"

Todd Farley: Lies, Damn Lies, and Statistics, or What's Really Up With Automated Essay ...

    As any astute reader but no automated essay scoring program might have gleaned by now, I actually do have my doubts about the automated essay scoring study. I have my doubts because I worked in the test-scoring business for the better part of fifteen years (1994-2008), and most of that job entailed making statistics dance: I saw the industry fix distribution statistics when they might have shown different results than a state wanted; I saw it fudge reliability numbers when those showed human readers weren't scoring in enough of a standardized way; and I saw it fake qualifying scores to ensure enough temporary employees were kept on projects to complete them on time even when those temporary employees were actually not qualified for the job. Given my experience in the duplicitous world of standardized test-scoring, I couldn't help but have my doubts about the statistics provided in support of the automated essay scoring study -- and, unfortunately, that study lost me with its title alone. "Contrasting State-of-the-Art Automated Scoring of Essays: Analysis," it is named, with p. 5 reemphasizing exactly what the study is supposed to be focused on: "Phase I examines the machine scoring capabilities for extended-response essays." A quick perusal of Table 3 on page 33, however, suggests that the "essays" scored in the study are barely essays at all: "essays" tested in five of the eight sets of student responses averaged only about a hundred and fifty words.
Growth scores a formula for failure « Opine I will

    "I received my 'growth score' today from the New York State Education Department. I know,  I really shouldn't care what my score is. I know 100% of my students tested at or above grade level in Math and English Language Arts.  I know my class' scores were near or at the very top of my district's scores. I know my district is also at or nearly at the top of the region's and states' scores. I know I work my heart out and push my students to excel. My students always, ALWAYS  succeed. Yet according to the NYSED my growth score is so so. I'm rated effective with a growth score of 14 out of 20. Keep in mind, my student's mean scale in math  is 708.4 and ELA it is 678.  I'm confident both scores are well above that state mean. So why did I get a mediocre growth score? The state's explanation of it's calculation should be a eye opener for all  of us."
NBER: Knowledge, Tests, and Fadeout in Educational Interventions

    Educational interventions are often evaluated and compared on the basis of their impacts on test scores. Decades of research have produced two empirical regularities: interventions in later grades tend to have smaller effects than the same interventions in earlier grades, and the test score impacts of early educational interventions almost universally "fade out" over time. This paper explores whether these empirical regularities are an artifact of the common practice of rescaling test scores in terms of a student's position in a widening distribution of knowledge. If a standard deviation in test scores in later grades translates into a larger difference in knowledge, an intervention's effect on normalized test scores may fall even as its effect on knowledge does not. We evaluate this hypothesis by fitting a model of education production to correlations in test scores across grades and with college-going using both administrative and survey data. Our results imply that the variance in knowledge does indeed rise as children progress through school, but not enough for test score normalization to fully explain these empirical regularities.
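
The rescaling argument reduces to one line of arithmetic: a constant gain in knowledge shrinks as a standardized effect size when the distribution of knowledge widens across grades. A toy sketch with assumed numbers:

```python
# Toy illustration of the paper's rescaling argument (numbers assumed,
# not taken from the paper): the knowledge gain never changes, but the
# measured effect size "fades" as the spread of knowledge widens.
knowledge_gain = 5.0                        # constant intervention effect
sd_by_grade = {1: 10.0, 4: 14.0, 8: 20.0}   # widening knowledge distribution

for grade, sd in sd_by_grade.items():
    print(f"grade {grade}: effect size = {knowledge_gain / sd:.2f} SD")
# grade 1: 0.50 SD, grade 4: 0.36 SD, grade 8: 0.25 SD
```
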
Manipulation in the Grading of New York's Regents Examinations

    The challenge of designing effective performance measurement and incentives is a general one in economic settings where behavior and outcomes are not easily observable. These issues are particularly prominent in education where, over the last two decades, test-based accountability systems for schools and students have proliferated. In this study, we present evidence that the design and decentralized, school-based grading of New York's high-stakes Regents Examinations have led to pervasive manipulation of student test scores that fall just below performance thresholds. Specifically, we document statistically significant discontinuities in the distributions of subject-specific Regents scores that align with the cut scores used to determine both student eligibility to graduate and school accountability. Our results suggest that roughly 3 to 5 percent of the exam scores that qualified for a high-school diploma actually had performance below the state requirements. Moreover, we find that the rates of test manipulation in NYC were roughly twice as high as those in the state as a whole. We estimate that roughly 6 to 10 percent of NYC students who scored above the passing threshold for a Regents Diploma actually had scores below the state requirement.
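
The bunching diagnostic at the heart of the paper can be sketched in a few lines: manipulation that nudges near-misses over a cut score leaves a hole just below the threshold and a spike at it. The data, cut score, and manipulation rate below are illustrative, not the study's:

```python
# Sketch of a score-bunching check near a cut score (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(70, 12, 50_000).round().clip(0, 100)

# Simulate manipulation: most scores of 63-64 get nudged up to the cut (65).
nudged = (scores >= 63) & (scores < 65) & (rng.random(scores.size) < 0.7)
scores[nudged] = 65

below = np.isin(scores, [63, 64]).sum()                   # just below the cut
above = np.isin(scores, [65, 66]).sum()                   # at/just above it
baseline = np.isin(scores, [60, 61, 62]).sum() / 3 * 2    # nearby 2-point band
print(f"below: {below}, above: {above}, baseline 2-point band: {baseline:.0f}")
# A hole below the cut paired with a spike at it is the signature the
# authors document as statistically significant discontinuities.
```
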
Shanker Blog » The Weighting Game

    A while back, I noted that states and districts should exercise caution in assigning weights (importance) to the components of their teacher evaluation systems before they know what the other components will be. For example, most states that have mandated new evaluation systems have specified that growth model estimates count for a certain proportion (usually 40-50 percent) of teachers' final scores (at least for those in tested grades/subjects), but it's critical to note that the actual importance of these components will depend in no small part on what else is included in the total evaluation, and how it's incorporated into the system. In slightly technical terms, this is the distinction between nominal weights (the percentage assigned) and effective weights (the percentage that actually ends up driving results). Consider an extreme hypothetical example - let's say a district implements an evaluation system in which half the final score is value-added and half is observations. But let's also say that every teacher gets the same observation score. In this case, even though the assigned (nominal) weight for value-added is 50 percent, its actual importance (effective weight) will be 100 percent: with no variation in observation scores, all the variation between teachers' final scores is determined by the value-added component.
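
The distinction translates directly into a variance calculation: a component's effective weight is the share of final-score variance it accounts for. A sketch of the extreme 50/50 case described above, with hypothetical scores:

```python
# Nominal vs. effective weights (hypothetical data): when one component
# does not vary, the other determines 100% of the final-score variation.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
value_added = rng.normal(50, 10, n)   # varies across teachers
observation = np.full(n, 50.0)        # identical for every teacher

final = 0.5 * value_added + 0.5 * observation   # 50/50 nominal weights

va_share = np.var(0.5 * value_added) / np.var(final)
obs_share = np.var(0.5 * observation) / np.var(final)
print(f"effective weights - value-added: {va_share:.0%}, observations: {obs_share:.0%}")
# Prints 100% and 0%: the nominal 50/50 split is illusory here.
```
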
Review of Gathering Feedback for Teaching: Combining High-Quality Observation with Stud...

    This second report from the Measures of Effective Teaching (MET) project offers ground-breaking descriptive information regarding the use of classroom observation instruments to measure teacher performance. It finds that observation scores have somewhat low reliabilities and are weakly though positively related to value-added measures. Combining multiple observations can enhance reliabilities, and combining observation scores with student evaluations and test-score information can increase their ability to predict future teacher value-added. By highlighting the variability of classroom observation measures, the report makes an important contribution to research and provides a basis for the further development of observation rubrics as evaluation tools. Although the report raises concerns regarding the validity of classroom observation measures, we question the emphasis on validating observations with test-score gains. Observation scores may pick up different aspects of teacher quality than test-based measures, and it is possible that neither type of measure used in isolation captures a teacher's contribution to all the useful skills students learn. From this standpoint, the authors' conclusion that multiple measures of teacher effectiveness are needed appears justifiable. Unfortunately, however, although the design calls for random assignment of students to teachers in the final year of data collection, the classroom observations were apparently conducted prior to randomization, missing a valuable opportunity to assess correlations across measures under relatively bias-free conditions.
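
The finding that combining multiple observations enhances reliability follows the standard Spearman-Brown relationship. A sketch, assuming an illustrative single-observation reliability of 0.4 (the MET report's own estimates vary by instrument):

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the average of k parallel measurements."""
    return k * r_single / (1 + (k - 1) * r_single)

for k in (1, 2, 4, 8):
    print(f"{k} observation(s): reliability = {spearman_brown(0.4, k):.2f}")
# 0.40 -> 0.57 -> 0.73 -> 0.84: averaging more observations of the same
# teacher steadily improves reliability, as the report finds.
```
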
Analyzing Released NYC Value-Added Data Part 4 | Gary Rubinstein's Blog

    Value-added has been getting a lot of media attention lately but, unfortunately, most stories are missing the point.  In Gotham Schools I read about a teacher who got a low score, but it was because her score was based on students who were not assigned to her.  In The New York Times I read about three teachers who were rated in the single digits, but it was because they had high-performing students and a few of their scores went down.  In The Washington Post I read about a teacher who was fired for getting low value-added on her IMPACT report, but it was because her students had inflated pretest scores, possibly because the teachers from the year before cheated. Each of these stories makes it sound like there are very fixable flaws in value-added.  Get the student data more accurate, make some kind of curve for teachers of high-performing students, get better test security so cheating can't affect the next year's teacher's score.  But the flaws in value-added go WAY beyond that, which is what I've been trying to show in my posts - not just some exceptional scenarios, but how it affects the majority of teachers.
Dear NYSED, Please Send Answers

    So a teacher can be effective in each of the sub-components and developing overall? How is that possible? You have a problem, Sir. And it goes without saying that it will be as difficult for our best teachers to be in the Highly Effective range, EVER, as it is for our smartest fourth graders to achieve a 4 on the State ELA test. Which we're working on, by the way. We want more 4's and more 3's and, well, even without the TESTS, we aim to do a better job, aligning to the common core, making data-driven decisions, doing all of the things well that you've asked us to do. Believe it or not, we do want every child to succeed, and we understand we've got to be more deliberate in making that happen through the common core curriculum and data analysis, NOT through fear and intimidation. Not through the composite scores you're instituting. Two things will happen. One, I'll have to hire three more administrators to help me with all of the teacher improvement plans indicated by your scoring bands. Two, our teachers will be demoralized, defeated, and ready to give up. We get it, Commissioner King. We are going to transform this district from the wonderful, productive place that it already is into a more focused PK-12 continuum of curriculum that positively affects student achievement in big ways. And we're also going to be sure that, while productive, we don't suck all of the joy out of learning. Your insanely punitive scoring bands are not going to help make that happen. Raise expectations, think the best of us, help us to get there. Reward us when we do. The scoring bands and the publicly reported composite scores will not help us get there.
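
The opening puzzle - effective on every sub-component yet developing overall - is an arithmetic artifact of misaligned bands: the composite cut scores sit higher than the sums of the sub-component cut scores. A sketch with hypothetical bands (the real NYSED cut scores differ):

```python
# Hypothetical illustration of misaligned scoring bands: each piece
# rates "effective", yet the total lands in the "developing" band.
subcomponents = {
    "state growth (20 pts)":   {"score": 9,  "effective_band": (9, 17)},
    "local measures (20 pts)": {"score": 9,  "effective_band": (9, 17)},
    "observations (60 pts)":   {"score": 56, "effective_band": (50, 58)},
}
composite_bands = {"highly effective": (91, 100), "effective": (75, 90),
                   "developing": (65, 74), "ineffective": (0, 64)}

total = sum(s["score"] for s in subcomponents.values())
for name, s in subcomponents.items():
    lo, hi = s["effective_band"]
    assert lo <= s["score"] <= hi, name   # every sub-rating is "effective"

rating = next(r for r, (lo, hi) in composite_bands.items() if lo <= total <= hi)
print(f"composite = {total} -> {rating}")  # composite = 74 -> developing
```
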
Shanker Blog » Revisiting The "5-10 Percent Solution"

    In a post over a year ago, I discussed the common argument that dismissing the "bottom 5-10 percent" of teachers would increase U.S. test scores to the level of high-performing nations. This argument is based on a calculation by economist Eric Hanushek, which suggests that dismissing the lowest-scoring teachers based on their math value-added scores would, over a period of around ten years (when the first cohort of students would have gone through the schooling system without the "bottom" teachers), increase U.S. math scores dramatically - perhaps to the level of high-performing nations such as Canada or Finland.* This argument is, to say the least, controversial, and it invokes the full spectrum of reactions. In my opinion, it's best seen as a policy-relevant illustration of the wide variation in test-based teacher effects, one that might suggest the potential of a course of action but can't really tell us how it will turn out in practice. To highlight this point, I want to take a look at one issue mentioned in that previous post - that is, how the instability of value-added scores over time (which Hanushek's simulation doesn't address directly) might affect the projected benefits of this type of intervention, and how this in turn might modulate one's view of the huge projected benefits. One (admittedly crude) way to do this is to use the newly-released New York City value-added data, and look at 2010 outcomes for the "bottom 10 percent" of math teachers in 2009.
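
The instability issue is easy to make concrete with a small simulation. Assuming a year-to-year correlation in math value-added of about 0.35 (a commonly cited magnitude, assumed here rather than taken from the post), most of the 2009 "bottom 10 percent" will not be in the bottom 10 percent in 2010:

```python
# How persistent is the "bottom 10 percent" if scores correlate at 0.35
# across years? (Correlation assumed for illustration.)
import numpy as np

rng = np.random.default_rng(3)
n, r = 10_000, 0.35
score_2009 = rng.normal(size=n)
score_2010 = r * score_2009 + np.sqrt(1 - r**2) * rng.normal(size=n)

bottom_2009 = score_2009 < np.quantile(score_2009, 0.10)
bottom_2010 = score_2010 < np.quantile(score_2010, 0.10)
stayed = (bottom_2009 & bottom_2010).sum() / bottom_2009.sum()
print(f"share of 2009 bottom decile still there in 2010: {stayed:.0%}")
# Roughly a quarter persist; the rest of the dismissed teachers would
# not have scored in the bottom decile the following year.
```
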
Shanker Blog » Value-Added Versus Observations, Part One: Reliability

    Although most new teacher evaluations are still in various phases of pre-implementation, it's safe to say that classroom observations and/or value-added (VA) scores will be the most heavily-weighted components toward teachers' final scores, depending on whether teachers are in tested grades and subjects. One gets the general sense that many - perhaps most - teachers strongly prefer the former (observations, especially peer observations) over the latter (VA). One of the most common arguments against VA is that the scores are error-prone and unstable over time - i.e., that they are unreliable. And it's true that the scores fluctuate between years (also see here), with much of this instability due to measurement error, rather than "real" performance changes. On a related note, different model specifications and different tests can yield very different results for the same teacher/class. These findings are very important, and often too casually dismissed by VA supporters, but the issue of reliability is, to varying degrees, endemic to all performance measurement. Actually, many of the standard reliability-based criticisms of value-added could also be leveled against observations. Since we cannot observe "true" teacher performance, it's tough to say which is "better" or "worse," despite the certainty with which both "sides" often present their respective cases. And, the fact that both entail some level of measurement error doesn't by itself speak to whether they should be part of evaluations.*
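
The post's notion of reliability can be sketched directly: if an observed score is true performance plus noise, the correlation between two independent measurements of the same teachers estimates the measure's reliability. The noise levels below are illustrative, not estimates for any real evaluation system:

```python
# Reliability as correlation between parallel measurements (toy model).
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
true_perf = rng.normal(size=n)   # unobservable "true" teacher performance

def measure(noise_sd: float) -> np.ndarray:
    """One noisy measurement of every teacher."""
    return true_perf + rng.normal(scale=noise_sd, size=n)

for label, noise_sd in [("value-added", 1.3), ("observations", 1.0)]:
    r = np.corrcoef(measure(noise_sd), measure(noise_sd))[0, 1]
    print(f"{label}: correlation across two measurements ~ {r:.2f}")
# Both fall well short of 1: measurement error is endemic to both
# instruments, which is the post's central point.
```
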
Larry Cuban: How high stakes corrupt performance on tests, other indicators - The Answe...

    Test scores are the coin of the educational realm in the United States. No Child Left Behind demands that scores be used to reward and punish districts, schools, and teachers for how well or poorly students score on state tests. In pursuit of federal dollars, the Obama administration's Race to the Top competition has shoved state after state into legislating that teacher evaluations include student test scores as part of judging teacher effectiveness. Numbers glued to high-stakes consequences, however, corrupt performance. Since the mid-1970s, social scientists have documented the untoward results of attaching high stakes to quantitative indicators, not only in education but across numerous institutions. They have pointed out that those who implement policies using specific quantitative measures will change their practices to ensure better numbers.
Education Department's obsession with test scores deepens - The Answer Sheet - The Wash...

    Apparently it's not enough for the Obama administration that standardized test scores are now used to evaluate students, schools, teachers and principals. In a new display of its obsession with test scores, the Education Department is embarking on a study to determine which parts of clinical teacher training lead to higher average test scores among the teachers' students.
Shock Doctrine: five reasons not to trust the results of the new state tests

    "Dear parents: As you may have probably heard, the new state test scores were released to the press and they are disastrous. Only 31% of students in New York State passed the new Common Core exams in reading and math. More than one third -- or 36% -- of 3rd graders throughout the state got a level I in English; which means they essentially flunked.  In NYC, only 26 percent of students passed the exams in English, and 30 percent passed in math - meaning they had a level 3 or 4.  Only 5% of students in Rochester passed.  Though children's individual scores won't be available to parents until late August, I urge you not to panic when you see them.  My advice is not to believe a word of any of this.  The new Common Core exams and test scores are politically motivated, and are based neither on reason or evidence.  They were pre-ordained to fit the ideological goals of Commissioner King and the other educrats who are intent on imposing damaging policies on our schools.  Here are five reasons not to trust the new scores"
N.Y.C. Gains on Statewide School Tests - NYTimes.com

    City students posted modest gains on elementary and middle school statewide tests this year, showing more improvement than students in the state as a whole and in the state's other large cities, state officials said Monday. But city and state scores both remain far below where they were two years ago, when sky-high scores made it seem that an education miracle might be at work in New York schools. Last year, state officials readjusted scoring after determining that the tests had become too easy to pass and were out of balance with national and college-preparation standards. As a result, scores plummeted.
What the decline in SAT scores really means - The Answer Sheet - The Washington Post

    Anybody paying attention to the course of modern school reform will not be very surprised by this news: Newly released SAT scores show that scores in reading, writing and even math are down from last year and have been declining for years. And critical reading scores are the lowest in 40 years.
SAT Score Hysteria and the Missing Chart

    When reporting test scores, it's essential to understand whether the scores reflect changes in performance or changes in the tested population (see here and here for how this plays out with NAEP results). In this case, while I don't have enough data to know exactly what is going on with SAT scores, there's no doubt that the story is more complex than meets the eye.
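
One way the tested population drives scores is a pure composition effect: every subgroup's average can rise while the overall average falls, simply because the mix of test-takers shifts. A toy example with invented numbers:

```python
# Composition effect (invented numbers): each group gains 5 points,
# yet the overall mean falls because participation broadens.
cohorts = {
    # group: (share_then, mean_then, share_now, mean_now)
    "longtime college-bound": (0.70, 520, 0.50, 525),
    "newly test-taking":      (0.30, 440, 0.50, 445),
}

then = sum(share * mean for share, mean, _, _ in cohorts.values())
now = sum(share * mean for _, _, share, mean in cohorts.values())
print(f"overall then: {then:.0f}, overall now: {now:.0f}")  # 496 -> 485
```
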
Limitations in the Use of Achievement Tests as Measures of Educators' Productivity

    Test-based accountability rests on the assumption that accountability for scores on tests will provide needed incentives for teachers to improve student performance. Evidence shows, however, that simple test-based accountability can generate perverse incentives and seriously inflated scores. This paper discusses the logic of achievement tests, issues that arise in using them as proxy indicators of educational quality, and the mechanism underlying the inflation of scores. It ends with suggestions, some speculative, for improving the incentives faced by teachers by modifying systems of student assessment and combining them with numerous other measures, many of which are more subjective than are test scores.
A Legal Argument Against The Use of VAMs in Teacher Evaluation

    "Value Added Models (VAMs) are irresistible. Purportedly they can ascertain a teacher's effectiveness by predicting the impact of a teacher on a student's test scores. Because test scores are the sin qua non of our education system, VAMs are alluring. They link a teacher directly to the most emphasized output in education today. What more can we want from an evaluative tool, especially in our pursuit of improving schools in the name of social justice? Taking this a step further, many see VAMs as the panacea for improving teacher quality. The theory seems straightforward. VAMs provide statistical predictions regarding a teacher's impact that can be compared to actual results. If a teacher cannot improve a student's test score in relatively positive ways, then they are ineffective. If they are ineffective, they can (and should) be dismissed (See, for instance, Hanushek, 2010). Consequently, state legislatures have rushed to codify VAMs into their statutes and regulations governing teacher evaluation. (See, for example, Florida General Laws, 2014). That has been a mistake. This paper argues for a complete reversal in policy course. To wit, state regulations that connect a teacher's continued employment to VAMs should be overhauled to eliminate the connection between evaluation and student test scores. The reasoning is largely legal, rather than educational. In sum, the legal costs of any use of VAMs in a performance-based termination far outweigh any value they may add.1 These risks are directly a function of the well-documented statistical flaws associated with VAMs (See, for example, Rothstein, 2010). The "value added" of VAMs in supporting a termination is limited, if it exists at all."
Using Student Test Scores to Fire Teachers: No More Reliable Than a Coin Toss - Living ...

    "Public school teachers and principals deserve fair treatment on important decisions about who should be retained and who should be fired. They should not be fired based on student test scores because the variation in student test scores is random. It is no more reliable than a coin toss. How wise would it be to fire doctors or lawyers based on a coin toss? Heads they stay. Tails they go. Imagine what this would do the moral of staff who had also most no control over whether they stayed or were fired. In this report, we will look at the scientific research (or lack of it) on using student test scores to evaluate teachers."
Cheating our children: Suspicious school test scores across the nation | ajc...

    Suspicious test scores in roughly 200 school districts resemble those that entangled Atlanta in the biggest cheating scandal in American history, an investigation by The Atlanta Journal-Constitution shows. The newspaper analyzed test results for 69,000 public schools and found high concentrations of suspect math or reading scores in school systems from coast to coast. The findings represent an unprecedented examination of the integrity of school testing. The analysis doesn't prove cheating. But it reveals that test scores in hundreds of cities followed a pattern that, in Atlanta, indicated cheating in multiple schools.
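
At heart, this kind of screen is an outlier analysis on year-over-year score swings. A sketch of the idea on synthetic data (the newspaper's actual methodology was more elaborate):

```python
# Flag schools whose year-over-year score swings are extreme relative
# to the statewide distribution of swings (synthetic data).
import numpy as np

rng = np.random.default_rng(5)
n_schools = 5_000
swing = rng.normal(0, 5, n_schools)   # typical gain/loss in scale points
swing[:10] += 25                      # a handful of implausible jumps

z = (swing - swing.mean()) / swing.std()
flagged = np.flatnonzero(np.abs(z) > 3)
print(f"schools with swings beyond 3 SD: {len(flagged)}")
# As the article stresses, a flag is suspicion, not proof: it only
# identifies where to look more closely.
```
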