
Education Links: Group items tagged "measurement error"


Jeff Bernstein

Shanker Blog » Value-Added Versus Observations, Part Two: Validity - 0 views

  •  
    In a previous post, I compared value-added (VA) and classroom observations in terms of reliability - the degree to which they are free of error and stable over repeated measurements. But even the most reliable measures aren't useful unless they are valid - that is, unless they're measuring what we want them to measure. Arguments over the validity of teacher performance measures, especially value-added, dominate our discourse on evaluations. There are, in my view, three interrelated issues to keep in mind when discussing the validity of VA and observations. The first is definitional - in a research context, validity is less about a measure itself than the inferences one draws from it. The second point might follow from the first: The validity of VA and observations should be assessed in the context of how they're being used. Third and finally, given the difficulties in determining whether either measure is valid in and of itself, as well as the fact that so many states and districts are already moving ahead with new systems, the best approach at this point may be to judge validity in terms of whether the evaluations are improving outcomes. And, unfortunately, there is little indication that this is happening in most places.
Jeff Bernstein

Shanker Blog » Value-Added Versus Observations, Part One: Reliability - 0 views

  •  
    Although most new teacher evaluations are still in various phases of pre-implementation, it's safe to say that classroom observations and/or value-added (VA) scores will be the most heavily weighted components of teachers' final scores, depending on whether teachers are in tested grades and subjects. One gets the general sense that many - perhaps most - teachers strongly prefer the former (observations, especially peer observations) over the latter (VA). One of the most common arguments against VA is that the scores are error-prone and unstable over time - i.e., that they are unreliable. And it's true that the scores fluctuate between years (also see here), with much of this instability due to measurement error rather than "real" performance changes. On a related note, different model specifications and different tests can yield very different results for the same teacher/class. These findings are very important, and often too casually dismissed by VA supporters, but the issue of reliability is, to varying degrees, endemic to all performance measurement. Actually, many of the standard reliability-based criticisms of value-added could also be leveled against observations. Since we cannot observe "true" teacher performance, it's tough to say which is "better" or "worse," despite the certainty with which both "sides" often present their respective cases. And the fact that both entail some level of measurement error doesn't by itself speak to whether they should be part of evaluations.*
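To see how much raw instability measurement error alone can generate, here is a small Python sketch (all numbers are illustrative assumptions, not estimates from any actual evaluation system). Each simulated teacher's true effectiveness is held fixed across two years; only independent measurement error is added, with the error variance set so that roughly a third of the observed variance is "real".

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 10_000
# Illustrative assumption: true effects and error chosen so that roughly a
# third of the observed variance is "real" (reliability ~ 0.35).
true_sd, error_sd = 1.0, 1.35

true_effect = rng.normal(0.0, true_sd, n_teachers)           # constant across years
year1 = true_effect + rng.normal(0.0, error_sd, n_teachers)  # observed score, year 1
year2 = true_effect + rng.normal(0.0, error_sd, n_teachers)  # observed score, year 2

# Year-to-year correlation of the observed scores.
r = np.corrcoef(year1, year2)[0, 1]

# How often does a teacher's quintile rating change between years,
# even though true effectiveness never changed?
q1 = np.digitize(year1, np.quantile(year1, [0.2, 0.4, 0.6, 0.8]))
q2 = np.digitize(year2, np.quantile(year2, [0.2, 0.4, 0.6, 0.8]))
changed = np.mean(q1 != q2)

print(f"year-to-year correlation of observed scores: {r:.2f}")
print(f"share of teachers whose quintile rating changes: {changed:.0%}")
```

In runs like this, the two years of observed scores correlate at roughly 0.35 and well over half the teachers change performance quintiles, even though, by construction, no one's true effectiveness moved at all.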
Jeff Bernstein

You've Been VAM-IFIED! Thoughts (& Graphs) on the NYC Teacher Data « School F... - 0 views

  •  
    Readers of my blog know I'm both a data geek and a skeptic of the usefulness of value-added data, specifically as a human resource management tool for schools and districts. There's been much talk this week about the release of the New York City teacher ratings to the media, and the subsequent publication of those data by various news outlets. Most of the talk about the ratings has focused on the error rates, and reporters from each outlet have spent a great deal of time hiding behind their supposed ultra-responsibility, making sure to inform the public that these ratings are not absolute, that they carry significant error ranges, etc. Matt Di Carlo over at Shanker Blog has already provided a very solid explanatory piece on the error ranges and how those ranges affect classification of teachers as either good or bad. But the imprecision of each teacher's effectiveness estimate, as represented by those error ranges, is only one small piece of the puzzle. In my view, the various other issues involved go much further in undermining the usefulness of the value-added measures, which the media have presented as necessarily accurate albeit lacking in precision.
Jeff Bernstein

Overview of Measuring Effect Sizes: The Effect of Measurement Error - 0 views

  •  
    The use of value-added models in education research has expanded rapidly. These models allow researchers to explore how a wide variety of policies and measured school inputs affect the academic performance of students. An important question is whether such effects are sufficiently large to achieve various policy goals. For example, would hiring teachers with stronger academic backgrounds raise test scores for traditionally low-performing students enough to warrant the increased cost of doing so? Judging whether a change in student achievement is important requires some meaningful point of reference. In certain cases a grade-equivalence scale or some other intuitive and policy-relevant metric of educational achievement can be used. However, this is not the case with item response theory (IRT) scale-score measures common to the tests usually employed in value-added analyses. In such cases, researchers typically describe the impacts of various interventions in terms of effect sizes, although conveying the intuition of such a measure to policymakers often is a challenge.
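One reason measurement error matters here is that it attenuates effect sizes: random error in the test leaves the mean difference between groups unchanged but inflates the observed standard deviation, so the standardized effect computed from observed scores is smaller than the effect on true achievement by a factor of the square root of the test's reliability. A minimal sketch of that relationship, with made-up numbers rather than anything from the paper:

```python
import math

def observed_effect_size(true_effect_size: float, reliability: float) -> float:
    """Standardized effect size computed on error-laden scores.

    Random measurement error leaves the mean difference unchanged but
    inflates the observed SD by 1/sqrt(reliability), so the observed
    effect size is attenuated by sqrt(reliability).
    """
    return true_effect_size * math.sqrt(reliability)

# Hypothetical example: an intervention worth 0.25 SD on true achievement,
# measured with tests of varying reliability.
for rel in (0.95, 0.85, 0.70):
    print(f"reliability {rel:.2f}: observed effect size "
          f"{observed_effect_size(0.25, rel):.3f}")
```

Read in reverse, dividing an observed effect size by the square root of the test's reliability is a common rough correction when translating results from error-laden scale scores into policy-relevant terms.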
Jeff Bernstein

Shanker Blog » Trial And Error Is Fine, So Long As You Know The Difference - 0 views

  •  
    It's fair to say that improved teacher evaluation is the cornerstone of most current education reform efforts. Although very few people have disagreed on the need to design and implement new evaluation systems, there has been a great deal of disagreement over how best to do so - specifically with regard to the incorporation of test-based measures of teacher productivity (i.e., value-added and other growth model estimates). The use of these measures has become a polarizing issue. Opponents tend to adamantly object to any degree of incorporation, while many proponents do not consider new evaluations meaningful unless they include test-based measures as a major element (say, at least 40-50 percent). Despite the air of certainty on both sides, this debate has mostly been proceeding based on speculation. The new evaluations are just getting up and running, and there is virtually no evidence as to their effects under actual high-stakes implementation.
Jeff Bernstein

Firing teachers based on bad (VAM) versus wrong (SGP) measures of effectivene... - 0 views

  •  
    In the near future my article with Preston Green and Joseph Oluwole on legal concerns regarding the use of value-added modeling for making high-stakes decisions will come out in the BYU Education and Law Journal. In that article, we expand on various arguments I first laid out in this blog post about how use of these noisy and potentially biased metrics is likely to lead to a flood of litigation challenging teacher dismissals. In short, as I have discussed on numerous occasions on this blog, value-added models attempt to estimate the effect of the individual teacher on growth in measured student outcomes. But these models tend to produce very imprecise estimates with very large error ranges, jumping around a lot from year to year. Further, individual teacher effectiveness estimates are highly susceptible to even subtle changes in model variables. And failure to address key omitted variables can lead to systematic model biases that may even produce racially disparate teacher dismissals (see here and, for follow-up, here).
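The omitted-variables concern is easy to demonstrate with a toy simulation. The sketch below (Python; every coefficient, sample size, and variable name is an assumption made up for illustration, not the model from the article) builds classrooms in which a disadvantage indicator depresses measured growth and is unevenly distributed across teachers, then estimates teacher effects with and without that control.

```python
import numpy as np

rng = np.random.default_rng(1)

n_teachers, n_students = 40, 25
true_effect = rng.normal(0.0, 0.15, n_teachers)

# Each teacher serves a different share of disadvantaged students (assumed).
poverty_share = rng.uniform(0.1, 0.9, n_teachers)

rows, outcomes = [], []
for t in range(n_teachers):
    prior = rng.normal(0.0, 1.0, n_students)
    poor = (rng.uniform(size=n_students) < poverty_share[t]).astype(float)
    y = 0.7 * prior - 0.3 * poor + true_effect[t] + rng.normal(0.0, 0.5, n_students)
    for i in range(n_students):
        dummies = np.zeros(n_teachers)
        dummies[t] = 1.0
        rows.append(np.concatenate(([prior[i], poor[i]], dummies)))
        outcomes.append(y[i])

X = np.array(rows)
y = np.array(outcomes)

# Model A controls for prior score and poverty; Model B omits poverty.
est_full = np.linalg.lstsq(X, y, rcond=None)[0][2:]
est_omit = np.linalg.lstsq(np.delete(X, 1, axis=1), y, rcond=None)[0][1:]

print("correlation of teacher estimates across the two models:",
      round(float(np.corrcoef(est_full, est_omit)[0, 1]), 2))
print("correlation of poverty share with (omitted-model minus full-model) estimate:",
      round(float(np.corrcoef(poverty_share, est_omit - est_full)[0, 1]), 2))
```

In runs like this the two sets of estimates disagree noticeably, and the gap tracks each classroom's poverty share: teachers serving more disadvantaged students look systematically worse once the control is dropped, which is the kind of bias the post argues could produce disparate dismissal patterns.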
Jeff Bernstein

Don't Use Khan Academy without Watching this First - EdTech Researcher - Education Week - 0 views

  •  
    The two teachers systematically dissect the video, noting a variety of missteps. There are a few unquestionable errors of mathematics: Khan uses incorrect terminology at a couple of points. Khan is also inconsistent in his language about positive and negative numbers (using plus when he means positive, or minus when he means negative), which is perhaps a lesser sin, but poor practice and misleading for students. He's also inconsistent in his use of symbols, sometimes writing "+4", sometimes writing "4", never explaining why he does or doesn't. He makes the kinds of mistakes that would reduce his score on the Mathematical Quality of Instruction observational instrument, used in the Gates-funded Measures of Effective Teaching Project.
Jeff Bernstein

Test scores mean nothing - NY Daily News - 0 views

  •  
    Since the reports were released last week, debate has raged over whether a formula with a margin of error as large as 53% is the best way to judge the effectiveness of teachers. Self-proclaimed reformers say yes; those who understand teaching say otherwise. There is no question that teachers are responsible for the learning and growth that take place inside their classrooms. However, standardized tests are just not a reliable measure of learning. If we are truly interested in increasing the quality of education, the conversation surrounding accountability must shift. Imagine if doctors were held accountable based on the death rate of their patients, regardless of environmental factors and whether prescribed treatment was followed. Imagine if firefighters were held accountable based on fire injuries and deaths, even though they didn't start the fires, their budgets had been cut and most of the homes in their district didn't have fire alarms. That would be unreasonable. So why do we apply this impossible standard only to teachers?
Jeff Bernstein

About those Dice… Ready, Set, Roll! On the VAM-ification of Tenure « School F... - 0 views

  •  
    The standard reformy template is that teachers should only be able to get tenure after 3 years of good ratings in a row, and that teachers should be subject to losing tenure if they get 2 bad years in a row. Further, it is possible that the evaluations might actually stipulate that you can only get a good rating if you achieve a certain rating on the quantitative portion of the evaluation - or the VAM score. Likewise for bad ratings (that is, the quantitative measure overrides all else in the system). The premise of the dice-rolling activity from my previous post was that it is necessarily much less likely to roll the same number (or subset of numbers) three times in a row than twice (exponentially so, in fact). That is, it is much harder to overcome the odds based on error rates to achieve tenure, and much easier to lose it. Again, this is largely due to the noisiness of the data, and less to the difficulty of actually being "good" year after year. The ratings simply jump around a lot. See my previous post.
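The asymmetry is easy to quantify. Suppose, purely as an assumption for illustration, that noise in the quantitative measure gives a genuinely solid teacher probability p of clearing the cutoff in any single year, with years independent. Three good years in a row then happens with probability p^3, while any given pair of adjacent years both coming up bad happens with probability (1-p)^2, and a long career contains many adjacent pairs in which that can occur. A minimal sketch:

```python
from functools import lru_cache

# Illustrative only: p is the assumed probability that a genuinely solid
# teacher clears the quantitative cutoff in a single year, years independent.

def prob_two_bad_in_a_row(p: float, years: int) -> float:
    """Probability of at least one run of two consecutive 'bad' years
    somewhere in a career of the given length."""

    @lru_cache(maxsize=None)
    def survive(remaining: int, prev_bad: bool) -> float:
        # Probability of avoiding two consecutive bad years in the
        # remaining seasons, given whether last year was bad.
        if remaining == 0:
            return 1.0
        ok = p * survive(remaining - 1, False)            # a good year resets the streak
        if not prev_bad:
            ok += (1 - p) * survive(remaining - 1, True)  # a first bad year is survivable
        return ok

    return 1.0 - survive(years, False)

for p in (0.6, 0.7, 0.8):
    print(f"p = {p:.1f}: three good years in a row = {p ** 3:.2f}; "
          f"two bad years in a row within 10 years = {prob_two_bad_in_a_row(p, 10):.2f}")
```

For moderately noisy measures (p around 0.6 or 0.7), the three-in-a-row requirement is the harder hurdle, while a two-year bad streak somewhere in a decade is roughly a coin flip or worse; that asymmetry comes from the noise itself, not from any change in true performance, which is the point of the dice analogy.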
Jeff Bernstein

Deselection of the Bottom 8%: Lessons from Eugenics for Modern School Reform | Guest Bl... - 0 views

  •  
    One common strain of modern education reform has a direct, yet familiar logic: An education crisis persists despite more spending, smaller classes, or curricular changes. We have ignored the major cause of student achievement: teacher quality. Seniority and tenure have diluted the pool of talented teachers and impeded student learning. Reformers such as Michelle Rhee have acted on this assumption, implementing test-based accountability measures, merit pay, and lesser job protections. Unfortunately, the current educational reform movement shares its logic with the early-twentieth-century American eugenics movement, which in efforts to improve our gene pool, wrote a horrific chapter in our history. In suggesting this provocative comparison, I hope to guide readers through three shared errors. Both eugenics and modern school reform view education too deterministically, share a faith in standardized tests, and exaggerate the fixedness of traits.
Jeff Bernstein

Shanker Blog » Schools Aren't The Only Reason Test Scores Change - 0 views

  •  
    "In all my many posts about the interpretation of state testing data, it seems that I may have failed to articulate one major implication, which is almost always ignored in the news coverage of the release of annual testing data. That is: raw, unadjusted changes in student test scores are not by themselves very good measures of schools' test-based effectiveness. In other words, schools can have a substantial impact on performance, but student test scores also increase, decrease or remain flat for reasons that have little or nothing to do with schools."