
Group items tagged: reliability


Jeff Bernstein

Shanker Blog » Value-Added Versus Observations, Part One: Reliability

  •  
    Although most new teacher evaluations are still in various phases of pre-implementation, it's safe to say that classroom observations and/or value-added (VA) scores will be the most heavily weighted components of teachers' final scores, depending on whether teachers are in tested grades and subjects. One gets the general sense that many - perhaps most - teachers strongly prefer the former (observations, especially peer observations) over the latter (VA). One of the most common arguments against VA is that the scores are error-prone and unstable over time - i.e., that they are unreliable. And it's true that the scores fluctuate between years (also see here), with much of this instability due to measurement error rather than "real" performance changes. On a related note, different model specifications and different tests can yield very different results for the same teacher/class. These findings are very important, and often too casually dismissed by VA supporters, but the issue of reliability is, to varying degrees, endemic to all performance measurement. Indeed, many of the standard reliability-based criticisms of value-added could also be leveled against observations. Since we cannot observe "true" teacher performance, it's tough to say which is "better" or "worse," despite the certainty with which both "sides" often present their respective cases. And the fact that both entail some level of measurement error doesn't by itself speak to whether they should be part of evaluations.
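
    The instability described above is exactly what classical measurement error predicts, even when underlying performance never changes. A minimal simulation sketch (the variances below are invented for illustration, not figures from the post) makes the point:

        import numpy as np

        rng = np.random.default_rng(0)
        n_teachers = 1000
        true_effect = rng.normal(0.0, 1.0, n_teachers)  # stable "real" performance
        noise_sd = 1.0                                  # measurement error (assumed equal to true spread)

        year1 = true_effect + rng.normal(0.0, noise_sd, n_teachers)
        year2 = true_effect + rng.normal(0.0, noise_sd, n_teachers)

        # With equal true and error variances, reliability is 1 / (1 + 1) = 0.5,
        # and the year-to-year correlation of the estimates converges to it.
        print(np.corrcoef(year1, year2)[0, 1])          # roughly 0.5

    Shrinking the error term (more students per teacher, multi-year averages) raises the correlation, which is why pooling data is the standard prescription on both "sides."
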
Jeff Bernstein

Shanker Blog » Value-Added Versus Observations, Part Two: Validity

  •  
    In a previous post, I compared value-added (VA) and classroom observations in terms of reliability - the degree to which they are free of error and stable over repeated measurements. But even the most reliable measures aren't useful unless they are valid - that is, unless they're measuring what we want them to measure. Arguments over the validity of teacher performance measures, especially value-added, dominate our discourse on evaluations. There are, in my view, three interrelated issues to keep in mind when discussing the validity of VA and observations. The first is definitional - in a research context, validity is less about a measure itself than the inferences one draws from it. The second point might follow from the first: The validity of VA and observations should be assessed in the context of how they're being used. Third and finally, given the difficulties in determining whether either measure is valid in and of itself, as well as the fact that so many states and districts are already moving ahead with new systems, the best approach at this point may be to judge validity in terms of whether the evaluations are improving outcomes. And, unfortunately, there is little indication that this is happening in most places.
Jeff Bernstein

The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston ...

  •  
    The SAS Educational Value-Added Assessment System (SAS® EVAAS®) is the most widely used value-added system in the country. It is also self-proclaimed as "the most robust and reliable" system available, its greatest benefit being to help educators improve their teaching practices. This study critically examined the effects of SAS® EVAAS® as experienced by teachers in one of the largest, high-needs urban school districts in the nation - the Houston Independent School District (HISD). Using a multiple-methods approach, this study critically analyzed retrospective quantitative and qualitative data to better understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part because of their low SAS® EVAAS® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS® EVAAS® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS® EVAAS® in this district seems to be exacerbating unintended effects.
Jeff Bernstein

Houston, You Have a Problem! | National Education Policy Center

  •  
    Education Policy Analysis Archives recently published an article by Audrey Amrein-Beardsley and Clarin Collins that effectively exposes the Houston Independent School District's use of a value-added teacher evaluation system as a disaster. The Educational Value-Added Assessment System (EVAAS) is alleged by its creator, the North Carolina-based software giant SAS, to be "the most robust and reliable" system of teacher evaluation ever invented. Amrein-Beardsley and Collins demonstrate to the contrary that EVAAS is a psychometric bad joke and a nightmare for teachers. EVAAS produces "value-added" measures for the same teachers that jump around willy-nilly from large and negative to large and positive from year to year, even when neither the general nature of the students nor the nature of the teaching differs across time. In defense of EVAAS, one could note that this is common to all such systems of attributing students' test scores to teachers' actions, so EVAAS might still lay claim to being "most robust and reliable" - since they are all unreliable, and who knows what "robust" means?
Jeff Bernstein

Review of Gathering Feedback for Teaching: Combining High-Quality Observation with Stud...

  •  
    This second report from the Measures of Effective Teaching (MET) project offers ground-breaking descriptive information regarding the use of classroom observation instruments to measure teacher performance. It finds that observation scores have somewhat low reliabilities and are weakly though positively related to value-added measures. Combining multiple observations can enhance reliabilities, and combining observation scores with student evaluations and test-score information can increase their ability to predict future teacher value-added. By highlighting the variability of classroom observation measures, the report makes an important contribution to research and provides a basis for the further development of observation rubrics as evaluation tools. Although the report raises concerns regarding the validity of classroom observation measures, we question the emphasis on validating observations with test-score gains. Observation scores may pick up different aspects of teacher quality than test-based measures, and it is possible that neither type of measure used in isolation captures a teacher's contribution to all the useful skills students learn. From this standpoint, the authors' conclusion that multiple measures of teacher effectiveness are needed appears justifiable. Unfortunately, however, the design calls for random assignment of students to teachers in the final year of data collection, but the classroom observations were apparently conducted prior to randomization, missing a valuable opportunity to assess correlations across measures under relatively bias-free conditions.
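
    One standard way to see why combining observations enhances reliability is the Spearman-Brown prophecy formula from classical test theory. The sketch below is illustrative; the 0.4 single-lesson reliability is an assumption, not a figure from the MET report:

        def spearman_brown(rho_single: float, k: int) -> float:
            """Projected reliability of the average of k parallel observations."""
            return k * rho_single / (1 + (k - 1) * rho_single)

        # Assume a single-lesson reliability of 0.4 (illustrative only):
        for k in (1, 2, 4):
            print(k, round(spearman_brown(0.4, k), 2))  # 1: 0.4, 2: 0.57, 4: 0.73

    The same formula also shows the diminishing returns: each additional observation buys less reliability than the last, which is why combining observations with other measures can be more efficient than simply adding raters.
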
Jeff Bernstein

Using Student Test Scores to Fire Teachers: No More Reliable Than a Coin Toss - Living ...

  •  
    "Public school teachers and principals deserve fair treatment on important decisions about who should be retained and who should be fired. They should not be fired based on student test scores because the variation in student test scores is random. It is no more reliable than a coin toss. How wise would it be to fire doctors or lawyers based on a coin toss? Heads they stay. Tails they go. Imagine what this would do the moral of staff who had also most no control over whether they stayed or were fired. In this report, we will look at the scientific research (or lack of it) on using student test scores to evaluate teachers."
Jeff Bernstein

When Rater Reliability Is Not Enough

  •  
    In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to specify important criteria regarding instrument use. In this article, the authors argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments but also scoring designs capable of producing reliable and cost-efficient scores and processes for rater recruitment, training, and certification. To illustrate how such a system might be developed and improved, the authors provide an empirical example that applies generalizability theory to data from a mathematics observational instrument.
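
    To make the kind of analysis the authors describe concrete, a one-facet generalizability sketch (teachers fully crossed with raters) can estimate variance components and project score reliability for different numbers of raters. Everything below - the design, the variance components, the numbers - is invented for illustration and is not from the article:

        import numpy as np

        rng = np.random.default_rng(1)
        n_t, n_r = 50, 4                                  # teachers crossed with raters
        scores = (rng.normal(0.0, 0.8, (n_t, 1))          # true teacher effects
                  + rng.normal(0.0, 0.3, (1, n_r))        # rater severity
                  + rng.normal(0.0, 0.5, (n_t, n_r)))     # residual (interaction + error)

        grand = scores.mean()
        ss_t = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
        ss_r = n_t * ((scores.mean(axis=0) - grand) ** 2).sum()
        ss_e = ((scores - grand) ** 2).sum() - ss_t - ss_r

        ms_t = ss_t / (n_t - 1)
        ms_e = ss_e / ((n_t - 1) * (n_r - 1))
        var_t = max((ms_t - ms_e) / n_r, 0.0)             # teacher variance component
        var_e = ms_e                                      # relative-error component

        for k in (1, 2, 4):                               # G coefficient with k raters
            print(k, round(var_t / (var_t + var_e / k), 2))

    A decomposition like this is what lets a scoring design trade off raters against lessons against cost, which is the kind of system-level planning the authors argue observation developers too often skip.
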
Jeff Bernstein

Making the Grade in New York City - Room for Debate - NYTimes.com

  •  
    "The latest progress reports for New York City elementary and middle schools came out last week, and many parents are baffled to see some of the city's top-performing schools getting "C's" and "B's." Proponents say, the "A" to "F" grading system is one of the best ways to get parents to pay attention, but critics say that the city's over emphasis on test performance skews the grades, making them unreliable for judging the quality of a school. If these progress reports are not reliable, what is the purpose of them?"
Jeff Bernstein

Research Shows... | National Education Policy Center

  •  
    Chicago Mayor Rahm Emanuel has taken a page from the Koch Bros. book of tricks by using the "research says" tactic to push his longer-school-day campaign on resistant city schools. As I have shown numerous times on this blog and elsewhere, there is no reliable or valid research to support Rahm's claim that more seat time in school produces better learning outcomes. But buoyed by support from corporate reform groups like Stand for Children, the mayor's publicists at CPS, like Becky Carroll, and his hand-picked CEO, J.C. Brizard, continue to claim that there are studies to validate this obvious political and anti-union agenda.
Jeff Bernstein

Man vs. Computer: Who Wins the Essay-Scoring Challenge? - Curriculum Matters - Educatio...

  •  
    Would you rather have an actual person score your carefully crafted essay, or an automated software program designed for that purpose? I'd still take the flawed human being any day-assuming, of course, the proper expertise and that he or she is operating on a good night's sleep-but a new study suggests there is little, if any, difference in the reliability and accuracy of the computer approach.
Jeff Bernstein

If it's not valid, reliability doesn't matter so much! More on VAM-ing & SGP-...

  •  
    This post includes a few more preliminary musings regarding the use of value-added measures and student growth percentiles for teacher evaluation, specifically for making high-stakes decisions, and especially in those cases where new statutes and regulations mandate rigid use/heavy emphasis on these measures, as I discussed in the previous post.
Jeff Bernstein

Noam Chomsky: The Assault on Public Education

  •  
    "There has been a shift from the belief that we as a nation benefit from higher education, to a belief that it's the people receiving the education who primarily benefit and so they should foot the bill," concludes Ronald G. Ehrenberg, a trustee of the State University system of New York and director of the Cornell Higher Education Research Institute. A more accurate description, I think, is "Failure by Design," the title of a recent study by the Economic Policy Institute, which has long been a major source of reliable information and analysis on the state of the economy. The EPI study reviews the consequences of the transformation of the economy a generation ago from domestic production to financialization and offshoring. By design; there have always been alternatives.
Jeff Bernstein

Shanker Blog » Beware Of Anecdotes In The Value-Added Debate

  •  
    The reliability of value-added estimates, like that of all performance measures (including classroom observations), is an important issue, and is sometimes dismissed by supporters in a cavalier fashion. There are serious concerns here, and no absolute answers. But none of this can be examined or addressed with anecdotes.
Jeff Bernstein

Teacher Data Reports - 4000 Unreliable, 100% Wrong | Edwize

  •  
    The Daily News reported yesterday that fully 1/3 of the Teacher Data Reports - 4000 reports - are unreliable. And just to add a little context: they all have multiple years of data, which are supposed to be more reliable. Hundreds and hundreds have margins of error of less than 10 percentage points - five on either side - giving the public, parents, and teachers assurances that these reports are quite correct. In fact, the DOE was so confident in its findings that 46 of these reports had no margins of error at all! That's 4000 reports. And since teachers are ranked against each other, that means all the reports are unreliable. Or, let's just get right to it: "unreliable" is a euphemism for wrong.
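
    The ranking point can be made quantitative. In a hypothetical cohort with reliability 0.5 (an assumption for illustration; none of the numbers below come from the Daily News or the DOE), percentile ranks computed from noisy estimates drift far more than five points on either side:

        import numpy as np

        rng = np.random.default_rng(2)
        n = 4000
        true = rng.normal(0.0, 1.0, n)
        est = true + rng.normal(0.0, 1.0, n)            # reliability 0.5 (assumed)

        pct_true = true.argsort().argsort() / (n - 1) * 100
        pct_est = est.argsort().argsort() / (n - 1) * 100

        # Every report is a rank against all the other reports, so error anywhere
        # shifts percentiles everywhere.
        print(np.median(np.abs(pct_true - pct_est)))    # typically well above 10 points
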
Jeff Bernstein

Two Persistent Reformy Misrepresentations regarding VAM Estimates « School Fi...

  •  
    I have written much on this blog about problems with the use of value-added estimates of teacher effect (used loosely) on student test score gains. I have addressed problems with both the reliability and validity of VAM estimates, and I have pointed out how SGP-based estimates of student growth are invalid on their face for determining teacher effectiveness. But I keep hearing two common refrains from the uber-reformy crowd (those completely oblivious to the statistics and research of VAM while also lacking any depth of understanding of the complexities of the social systems [schools] into which they propose to implement VAM as a de-selection tool). Sadly, these are the people who seem to be drafting policies these days.
Jeff Bernstein

New York State Field Tests: 'Students Should Not Be Informed' Of Connection To Standard...

  •  
    A memo has recently surfaced in which the New York State Department of Education appears to encourage educators to mislead students about upcoming standardized field tests meant to "provide the data necessary to ensure the validity and reliability of the New York State Testing program." "Students should not be informed of the connection between these field tests and State assessments," the memo reads. "The field tests should be described as brief tests of achievement in the subject."
Jeff Bernstein

Shanker Blog » Five Recommendations For Reporting On (Or Just Interpreting) S...

  •  
    "From my experience, education reporters are smart, knowledgeable, and attentive to detail. That said, the bulk of the stories about testing data - in big cities and suburbs, in this year and in previous years - could be better. Listen, I know it's unreasonable to expect every reporter and editor to address every little detail when they try to write accessible copy about complicated issues, such as test data interpretation. Moreover, I fully acknowledge that some of the errors to which I object - such as calling proficiency rates "scores" - are well within tolerable limits, and that news stories need not interpret data in the same way as researchers. Nevertheless, no matter what you think about the role of test scores in our public discourse, it is in everyone's interest that the coverage of them be reliable. And there are a few mostly easy suggestions that I think would help a great deal. Below are five such recommendations. They are of course not meant to be an exhaustive list, but rather a quick compilation of points, all of which I've discussed in previous posts, and all of which might also be useful to non-journalists."
Jeff Bernstein

Analysis: The Principals' Revolt is Good for Education | NBC New York

  •  
    School principals throughout the state are in revolt against the State Department of Education for imposing a system that is supposed to guarantee reliable testing for students. The leaders of the principals' rebellion charge that the system doesn't actually accomplish that. Instead, it degrades the educators in our schools. One principal, Bernard Kaplan, of Great Neck North High School on Long Island, told me: "It's stupid. It makes no sense."
Jeff Bernstein

Teachers Matter. Now What? | The Nation

  •  
    Given the widespread, non-ideological worries about the reliability of standardized test scores when they are used in high-stakes ways, it makes good sense for reform-minded teachers' unions to embrace value-added as one measure of teacher effectiveness, while simultaneously pushing for teachers' rights to a fair-minded appeals process. What's more, just because we know that teachers with high value-added ratings are better for children, it doesn't necessarily follow that we should pay such teachers more for good evaluation scores alone. Why not use value-added to help identify the most effective teachers, but then require these professionals to mentor their peers in order to earn higher pay?
Jeff Bernstein

Race to Inflate: The Evaluation Conundrum for Teachers of Non-tested Subjects - Chartin...

  •  
    Currently, many states plan to have teachers of non-tested subjects use a makeshift version of value-added measures, in which teachers identify learning objectives, choose assessments that correspond with these objectives, monitor student progress over the course of the year, and then present that progress as evidence of student growth. While this process may very well improve the overall quality of teaching, it is an ineffective way to evaluate teachers. If it takes complex statistical algorithms to measure student growth for English and math teachers, what makes us think that teachers of non-tested subjects can validly and reliably measure student growth on their own?