Todd Farley: Lies, Damn Lies, and Statistics, or What's Really Up With Automated Essay ... - 0 views
-
Jeff Bernstein on 09 Jun 12As any astute reader but no automated essay scoring program might have gleaned by now, I actually do have my doubts about the automated essay scoring study. I have my doubts because I worked in the test-scoring business for the better part of fifteen years (1994-2008), and most of that job entailed making statistics dance: I saw the industry fix distribution statistics when they might have showed different results than a state wanted; I saw it fudge reliability numbers when those showed human readers weren't scoring in enough of a standardized way; and I saw it fake qualifying scores to ensure enough temporary employees were kept on projects to complete them on time even when those temporary employees were actually not qualified for the job. Given my experience in the duplicitous world of standardized test-scoring, I couldn't help but have my doubts about the statistics provided in support of the automated essay scoring study -- and, unfortunately, that study lost me with its title alone. "Contrasting State-of-the-Art Automated Scoring of Essays: Analysis," it is named, with p. 5 reemphasizing exactly what the study is supposed to be focused on: "Phase I examines the machine scoring capabilities for extended-response essays." A quick perusal of Table 3, however, on page 33, suggests that the "essays" scored in the study are barely essays at all: "Essays" tested in five of the eight sets of student responses averaged only about a hundred and fifty words.