
CTLT and Friends: Group items tagged "validity"


Corinna Lo

IJ-SoTL - A Method for Collaboratively Developing and Validating a Rubric - 1 views

  •  
    "Assessing student learning outcomes relative to a valid and reliable standard that is academically-sound and employer-relevant presents a challenge to the scholarship of teaching and learning. In this paper, readers are guided through a method for collaboratively developing and validating a rubric that integrates baseline data collected from academics and professionals. The method addresses two additional goals: (1) to formulate and test a rubric as a teaching and learning protocol for a multi-section course taught by various instructors; and (2) to assure that students' learning outcomes are consistently assessed against the rubric regardless of teacher or section. Steps in the process include formulating the rubric, collecting data, and sequentially analyzing the techniques used to validate the rubric and to insure precision in grading papers in multiple sections of a course."
Gary Brown

Validity and Reliability in Higher Education Assessment - 2 views

  • Validity and Reliability in Higher Education Assessment
  • However, validity and reliability are not inherent features of assessments or assessment systems and must be monitored continuously throughout the design and implementation of an assessment system. Research studies of a theoretical or empirical nature addressing methodology for ensuring and testing validity and reliability in the higher education assessment process, results of validity and reliability studies, and novel approaches to the concepts of validity and reliability in higher education assessment are all welcome. To be most helpful in this academic exchange, empirical studies should be clear and explicit about their methodology so that others can replicate or advance their research.
  •  
    We should take this opportunity seriously and write up our work. Let me know if you want to join me.
Corinna Lo

Scoring rubric development: validity and reliability. Moskal, Barbara M. & Jon A. Leydens - 1 views

  •  
    "One purpose of this article is to provide clear definitions of the terms "validity" and "reliability" and illustrate these definitions through examples. A second purpose is to clarify how these issues may be addressed in the development of scoring rubrics."
Gary Brown

News: Assessing the Assessments - Inside Higher Ed - 2 views

  • The validity of a measure is based on evidence regarding the inferences and assumptions that are intended to be made and the uses to which the measure will be put. Showing that the three tests in question are comparable does not support Shulenburger's assertion regarding the value-added measure as a valid indicator of institutional effectiveness. The claim that public university groups have previously judged the value-added measure as appropriate does not tell us anything about the evidence upon which this judgment was based nor the conditions under which the judgment was reached. As someone familiar with the process, I would assert that there was no compelling evidence presented that these instruments and the value-added measure were validated for making this assertion (no such evidence was available at the time), which is the intended use in the VSA.
  • (however much the sellers of these tests tell you that those samples are "representative"), they provide an easy way out for academic administrators who want to avoid the time-and-effort consuming but incredibly valuable task of developing detailed major program learning outcome statements (even the specialized accrediting bodies don't get down to the level of discrete, operational statements that guide faculty toward appropriate assessment design)
  • If somebody really cared about "value added," they could look at each student's first essay in this course, and compare it with that same student's last essay in this course. This person could then evaluate each individual student's increased mastery of the subject-matter in the course (there's a lot) and also the increased writing skill, if any.
  • ...1 more annotation...
  • These skills cannot be separated out from student success in learning sophisticated subject-matter, because understanding anthropology, or history of science, or organic chemistry, or Japanese painting, is not a matter of absorbing individual facts, but learning facts and ways of thinking about them in a seamless, synthetic way. No assessment scheme that neglects these obvious facts about higher education is going to do anybody any good, and we'll be wasting valuable intellectual and financial resources if we try to design one.
  •  
    ongoing discussion of these tools. Note Longanecker's comment and ask me why.
Theron DesRosier

OECD Feasibility Study for the International Assessment of Higher Education Learning Ou... - 3 views

  •  
    "What is AHELO? The OECD Assessment of Higher Education Learning Outcomes (AHELO) is a ground-breaking initiative to assess learning outcomes on an international scale by creating measures that would be valid for all cultures and languages. Between ten and thirty-thousand higher education students in over ten different countries will take part in a feasibility study to determine the bounds of this ambitious project, with an eye to the possible creation of a full-scale AHELO upon its completion."
Gary Brown

A Critic Sees Deep Problems in the Doctoral Rankings - Faculty - The Chronicle of Highe... - 1 views

  • This week he posted a public critique of the NRC study on his university's Web site.
  • "Little credence should be given" to the NRC's ranges of rankings.
  • "There's not very much real information about quality in the simple measures they've got."
  • ...4 more annotations...
  • The NRC project's directors say that those small samples are not a problem, because the reputational scores were not converted directly into program assessments. Instead, the scores were used to develop a profile of the kinds of traits that faculty members value in doctoral programs in their field.
  • For one thing, Mr. Stigler says, the relationships between programs' reputations and the various program traits are probably not simple and linear.
  • if these correlations between reputation and citations were plotted on a graph, the most accurate representation would be a curved line, not a straight line. (The curve would occur at the tipping point where high citation levels make reputations go sky-high.)
  • Mr. Stigler says that it was a mistake for the NRC to so thoroughly abandon the reputational measures it used in its previous doctoral studies, in 1982 and 1995. Reputational surveys are widely criticized, he says, but they do provide a check on certain kinds of qualitative measures.
  •  
    What is not challenged is the validity and utility of the construct itself--reputation rankings.
Gary Brown

Can We Promote Experimentation and Innovation in Learning as well as Accountability? In... - 0 views

  •  
    The VALUE project comes into the middle of this tension, as it proposes to create frameworks (or metarubrics) that provide flexible criteria for making valid judgments about student work that might result from a wide range of assessments and learning opportunities, over time. In this interview, Terrel Rhodes, Director of the VALUE project and Vice President of the Association of American Colleges and Universities (AAC&U), describes the assumptions and goals behind the project. He especially addresses how electronic portfolios serve those goals as the locus of evaluation by educators, providing frameworks for judgments tailored to local contexts but calibrated to "Essential Learning Outcomes," with broad significance for student achievement. The aims and ambitions of the VALUE Project have the potential to move us further down the road toward a more systematic engagement with the expansion of learning. -Randy Bass
  •  
    This paragraph is the one with the most interesting set of assumptions. There are implications about "validity" that Bass notes earlier, and about the role of numbers as "less robust" rather than, say, an interesting and important ingredient in that conversation. Mostly, though, I see the designation that the rubrics are "too broad to be useful" as a flag that these are not really rubrics, but, well, flags...
Theron DesRosier

Gartner Identifies the Top 10 Strategic Technologies for 2009 - 0 views

  •  
    Social Software and Social Networking. Social software includes a broad range of technologies, such as social networking, social collaboration, social media and social validation. Organizations should consider adding a social dimension to a conventional Web site or application and should adopt a social platform sooner, rather than later, because the greatest risk lies in failure to engage and thereby, being left mute in a dialogue where your voice must be heard.
Gary Brown

Educators Mull How to Motivate Professors to Improve Teaching - Curriculum - The Chroni... - 4 views

  • "Without an unrelenting focus on quality—on defining and measuring and ensuring the learning outcomes of students—any effort to increase college-completion rates would be a hollow effort indeed."
  • If colleges are going to provide high-quality educations to millions of additional students, they said, the institutions will need to develop measures of student learning that can assure parents, employers, and taxpayers that no one's time and money are being wasted.
  • "Effective assessment is critical to ensure that our colleges and universities are delivering the kinds of educational experiences that we believe we actually provide for students," said Ronald A. Crutcher, president of Wheaton College, in Massachusetts, during the opening plenary. "That data is also vital to addressing the skepticism that society has about the value of a liberal education."
  • ...13 more annotations...
  • But many speakers insisted that colleges should go ahead and take drastic steps to improve the quality of their instruction, without using rigid faculty-incentive structures or the fiscal crisis as excuses for inaction.
  • Handing out "teacher of the year" awards may not do much for a college
  • W.E. Deming argued, quality has to be designed into the entire system and supported by top management (that is, every decision made by CEOs and Presidents, and support systems as well as operations) rather than being made the responsibility solely of those delivering 'at the coal face'.
  • I see a certain cluelessness among those who think one can create substantial change based on volunteerism
  • Current approaches to broaden the instructional repertoires of faculty members include faculty workshops, summer leave, and individual consultations, but these approaches work only for those relatively few faculty members who seek out opportunities to broaden their instructional methods.
  • The approach that makes sense to me is to engage faculty members at the departmental level in a discussion of the future and the implications of the future for their field, their college, their students, and themselves. You are invited to join an ongoing discussion of this issue at http://innovate-ideagora.ning.com/forum/topics/addressing-the-problem-of
  • Putting pressure on professors to improve teaching will not result in better education. The primary reason is that they do not know how to make real improvements. The problem is that in many fields of education there is either not enough research, or they do not have good ways of evaluating the results of their teaching.
  • Then there needs to be a research based assessment that can be used by individual professors, NOT by the administration.
  • Humanities educators either have to learn enough statistics and cognitive science so they can make valid scientific comparisons of different strategies, or they have to work with cognitive scientists and statisticians
  • good teaching takes time
  • On the measurement side, about half of the assessments constructed by faculty fail to meet reasonable minimum standards for validity. (Interestingly, these failures leave the door open to a class action lawsuit. Physicians are successfully sued for failing to apply scientific findings correctly; commerce is replete with lawsuits based on measurement errors.)
  • The elephant in the corner of the room --still-- is that we refuse to measure learning outcomes and impact, especially proficiencies generalized to one's life outside the classroom.
  • Until universities stop playing games to make themselves look better because they want to maintain their comfortable positions, and actually look at what they can do to improve, nothing is going to change.
  •  
    our work, our friends (Ken and Jim), and more context that shapes our strategy.
  •  
    How about using examples of highly motivational lecture and teaching techniques, like the Richard Dawkins video I presented on this forum recently? Even if teachers do not consciously try to adopt good working techniques, there is at least a strong subconscious human tendency to mimic behaviors. I think that if teachers see more effective techniques, they will automatically begin to adopt them.
Corinna Lo

A comparison of consensus, consistency, and measurement approaches to estimating interr... - 2 views

  •  
    "The three general categories for computing interrater reliability introduced and described in this paper are: 1) consensus estimates, 2) consistency estimates, and 3) measurement estimates. The assumptions, interpretation, advantages, and disadvantages of estimates from each of these three categories are discussed, along with several popular methods of computing interrater reliability coefficients that fall under the umbrella of consensus, consistency, and measurement estimates. Researchers and practitioners should be aware that different approaches to estimating interrater reliability carry with them different implications for how ratings across multiple judges should be summarized, which may impact the validity of subsequent study results."
Gary Brown

Want Students to Take an Optional Test? Wave 25 Bucks at Them - Students - The Chronicl... - 0 views

  • Cash appears to be the single best approach for colleges trying to recruit students to volunteer for institutional assessments and other low-stakes tests with no bearing on their grades.
  • American Educational Research Association
  • A college's choice of which incentive to offer does not appear to have a significant effect on how students end up performing, but it can have a big impact on colleges' ability to round up enough students for the assessments, the study found.
  • ...6 more annotations...
  • "I cannot provide you with the magic bullet that will help you recruit your students and make sure they are performing to the maximum of their ability," Mr. Steedle acknowledged to his audience at the Denver Convention Center. But, he said, his study results make clear that some recruitment strategies are more effective than others, and also offer some notes of caution for those examining students' scores.
  • The study focused on the council's Collegiate Learning Assessment, or CLA, an open-ended test of critical thinking and writing skills which is annually administered by several hundred colleges. Most of the colleges that use the test try to recruit 100 freshmen and 100 seniors to take it, but doing so can be daunting, especially for colleges that administer it in the spring, right when the seniors are focused on wrapping up their work and graduating.
  • The incentives that spurred students the least were the opportunity to help their college as an institution assess student learning, the opportunity to compare themselves to other students, a promise they would be recognized in some college publication, and the opportunity to put participation in the test on their resume.
  • The incentives which students preferred appeared to have no significant bearing on their performance. Those who appeared most inspired by a chance to earn 25 dollars did not perform better on the CLA than those whose responses suggested they would leap at the chance to help out a professor.
  • What accounted for differences in test scores? Students' academic ability going into the test, as measured by characteristics such as their SAT scores, accounted for 34 percent of the variation in CLA scores among individual students. But motivation, independent of ability, accounted for 5 percent of the variation in test scores—a finding that, the paper says, suggests it is "sensible" for colleges to be concerned that students with low motivation are not posting scores that can allow valid comparisons with other students or valid assessments of their individual strengths and weaknesses.
  • A major limitation of the study was that Mr. Steedle had no way of knowing how the students who took the test were recruited. "If many of them were recruited using cash and prizes, it would not be surprising if these students reported cash and prizes as the most preferable incentives," his paper concedes.
  •  
    Since it is not clear if the incentive to participate in this study influenced the decision to participate, it remains similarly unclear if incentives to participate correlate with performance.
Gary Brown

IJ-SoTL: Current Issue: Volume 3, Number 2 - July 2009 - 0 views

  • A Method for Collaboratively Developing and Validating a Rubric Sandra Allen (Columbia College Chicago) & John Knight (University of Tennessee at Martin)
  •  
    at last a decent article on rubric development--a good place to jump off.
Theron DesRosier

Pontydysgu - Bridge to Learning » Blog Archive » Learning in practice - a soc... - 0 views

  •  
    Complex inter-relationship between: space, time, locality, practice, boundary crossings between different practices. For example trainee doctor in the hospital in one practice, translation of this experience into 'evidence for assessment purposes' needs to then be 'validated' by auditors in another community of practice.
Nils Peterson

Views: The Limitations of Portfolios - Inside Higher Ed - 1 views

  • Gathering valid data about student performance levels and performance improvement requires making comparisons relative to fixed benchmarks and that can only be done when the assessments are standardized. Consequently, we urge the higher education community to embrace authentic, standardized performance-assessment approaches so as to gather valid data that can be used to improve teaching and learning as well as meet its obligations to external audiences to account for its actions and outcomes regarding student learning.
    • Nils Peterson
       
      Diigoed because this is the counter-argument to our work.
Gary Brown

Grazing: Criteria for great assessment tools - 1 views

  •  
    Perhaps these sum to utility, but number 5 (generativity) would benefit from some unpacking.
Gary Brown

Disciplines Follow Their Own Paths to Quality - Faculty - The Chronicle of Higher Educa... - 2 views

  • But when it comes to the fundamentals of measuring and improving student learning, engineering professors naturally have more to talk about with their counterparts at, say, Georgia Tech than with the humanities professors at Villanova
    • Gary Brown
       
      Perhaps this is too bad....
  • But there is no nationally normed way to measure the particular kind of critical thinking that students of classics acquire
  • Her colleagues have created discipline-specific critical-reasoning tests for classics and political science
  • ...5 more annotations...
  • Political science cultivates skills that are substantially different from those in classics, and in each case those skills can't be measured with a general-education test.
  • He wants to use tests of reasoning that are appropriate for each discipline
  • I believe Richard Paul has spent a lifetime articulating the characteristics of discipline based critical thinking. But anyway, I think it is interesting that an attempt is being made to develop (perhaps) a "national standard" for critical thinking in classics. In order to assess anything effectively we need a standard. Without a standard there are no criteria and therefore no basis from which to assess. But standards do not necessarily have to be established at the national level. This raises the issue of scale. What is the appropriate scale from which to measure the quality and effectiveness of an educational experience? Any valid approach to quality assurance has to be multi-scaled and requires multiple measures over time. But to be honest the issues of standards and scale are really just the tip of the outcomes iceberg.
    • Gary Brown
       
      Missing the notion that the variance is in the activity more than the criteria.  We hear little of embedding nationally normed and weighted assignments and then assessing the implementation and facilitation variables.... mirror, not lens.
  • the UW Study of Undergraduate Learning (UW SOUL). Results from the UW SOUL show that learning in college is disciplinary; therefore, real assessment of learning must occur (with central support and resources) in the academic departments. Generic approaches to assessing thinking, writing, research, quantitative reasoning, and other areas of learning may be measuring something, but they cannot measure learning in college.
  • It turns out there is a six-week, or 210+ hour, serious reading exposure to two or more domains outside one's own that "turns on" cross-domain mapping as a robust capability. Some people just happen to have accumulated, usually through unseen and unsensed happenstance involvements (rooming with an engineer, being the son of a dad changing domains/careers, etc.), this minimum level of basics that allows robust metaphor-based mapping.
Theron DesRosier

Ethics in Assessment. ERIC Digest. - 2 views

  •  
    "Those who are involved with assessment are unfortunately not immune to unethical practices. Abuses in preparing students to take tests as well as in the use and interpretation of test results have been widely publicized. Misuses of test data in high-stakes decisions, such as scholarship awards, retention/promotion decisions, and accountability decisions, have been reported all too frequently. Even claims made in advertisements about the success rates of test coaching courses have raised questions about truth in advertising. Given these and other occurrences of unethical behavior associated with assessment, the purpose of this digest is to examine the available standards of ethical practice in assessment and the issues associated with implementation of these standards. "
Gary Brown

Wise Men Gone: Stephen Toulmin and John E. Smith - The Chronicle Review - The Chronicle... - 0 views

  • Toulmin, born in London in 1922, earned his undergraduate degree in 1942 from King's College, Cambridge, in mathematics and physics. After participating in radar research and intelligence work during World War II in England and at Allied headquarters in Germany, he returned to Cambridge, where he studied with Ludwig Wittgenstein, the greatest influence on his thought, earning his Ph.D. in moral philosophy in 1948.
  • Toulmin moved to the United States, where he taught at Brandeis, Michigan State, and Northwestern Universities and the University of Chicago before landing in 1993 at the University of Southern California.
  • Toulmin's first, most enduring contribution to keeping philosophy sensible came in his 1958 book, The Uses of Argument (Cambridge University Press). Deceptively formalistic on its surface because it posited a general model of argument, Toulmin's view, in fact, was better described as taxonomic, yet flexible. He believed that formal systems of logic misrepresent the complex way that humans reason in most fields requiring what philosophers call "practical reason," and he offered, accordingly, a theory of knowledge as warranted belief.
  • ...4 more annotations...
  • Toulmin rejected the abstract syllogistic logic, meant to produce absolute standards for proving propositions true, that had become fashionable in analytic philosophy. Instead he argued (in the spirit of Wittgenstein) that philosophers must monitor how people actually argue if the philosophers' observations about persuasion are to make any sense. Toulmin took jurisprudential reasoning as his chief example in The Uses of Argument, but he believed that some aspects of a good argument depend on the field in which they're presented, while others are "field invariant."
  • Toulmin's "central thesis is that every sort of argumentation can in principle claim rationality and that the criteria to be applied when determining the soundness of the argumentation depend on the nature of the problems to which the argumentation relates."
  • But Toulmin, trained in the hard sciences and mathematics himself, saw through the science worship of less-credentialed sorts. He didn't relent, announcing "our need to reappropriate the wisdom of the 16th-century humanists, and develop a point of view that combines the abstract rigor and exactitude of the 17th-century 'new philosophy' with a practical concern for human life in its concrete detail."
  • Toulmin declared its upshot: "From now on, permanent validity must be set aside as illusory, and our idea of rationality related to specific functions of ... human reason. ... For me personally, the outcome of 40 years of philosophical critique was thus a new vision of—so to speak—the rhetoric of philosophy."
  •  
    FYI, Toulmin was the primary influence on the first WSU Critical Thinking Rubric. (Carella was the other philosopher.)
Nils Peterson

AAC&U News | April 2010 | Feature - 1 views

  • Comparing Rubric Assessments to Standardized Tests
  • First, the university, a public institution of about 40,000 students in Ohio, needed to comply with the Voluntary System of Accountability (VSA), which requires that state institutions provide data about graduation rates, tuition, student characteristics, and student learning outcomes, among other measures, in the consistent format developed by its two sponsoring organizations, the Association of Public and Land-grant Universities (APLU) and the American Association of State Colleges and Universities (AASCU).
  • And finally, UC was accepted in 2008 as a member of the fifth cohort of the Inter/National Coalition for Electronic Portfolio Research, a collaborative body with the goal of advancing knowledge about the effect of electronic portfolio use on student learning outcomes.  
  • ...13 more annotations...
  • outcomes required of all UC students—including critical thinking, knowledge integration, social responsibility, and effective communication
  • “The wonderful thing about this approach is that full-time faculty across the university  are gathering data about how their  students are doing, and since they’ll be teaching their courses in the future, they’re really invested in rubric assessment—they really care,” Escoe says. In one case, the capstone survey data revealed that students weren’t doing as well as expected in writing, and faculty from that program adjusted their pedagogy to include more writing assignments and writing assessments throughout the program, not just at the capstone level. As the university prepares to switch from a quarter system to semester system in two years, faculty members are using the capstone survey data to assist their course redesigns, Escoe says.
  • the university planned a “dual pilot” study examining the applicability of electronic portfolio assessment of writing and critical thinking alongside the Collegiate Learning Assessment,
  • The rubrics the UC team used were slightly modified versions of those developed by AAC&U’s Valid Assessment of Learning in Undergraduate Education (VALUE) project. 
  • In the critical thinking rubric assessment, for example, faculty evaluated student proposals for experiential honors projects that they could potentially complete in upcoming years.  The faculty assessors were trained and their rubric assessments “normed” to ensure that interrater reliability was suitably high.
  • “We found no statistically significant correlation between the CLA scores and the portfolio scores,”
  • There were many factors that may have contributed to the lack of correlation, she says, including the fact that the CLA is timed, while the rubric assignments are not; and that the rubric scores were diagnostic and included specific feedback, while the CLA awarded points “in a black box”:
  • faculty members may have had exceptionally high expectations of their honors students and assessed the e-portfolios with those high expectations in mind—leading to results that would not correlate to a computer-scored test. 
  • “The CLA provides scores at the institutional level. It doesn’t give me a picture of how I can affect those specific students’ learning. So that’s where rubric assessment comes in—you can use it to look at data that’s compiled over time.”
  • Their portfolios are now more like real learning portfolios, not just a few artifacts, and we want to look at them as they go into their third and fourth years to see what they can tell us about students’ whole program of study.”  Hall and Robles are also looking into the possibility of forming relationships with other schools from NCEPR to exchange student e-portfolios and do a larger study on the value of rubric assessment of student learning.
  • “We’re really trying to stress that assessment is pedagogy,”
  • “It’s not some nitpicky, onerous administrative add-on. It’s what we do as we teach our courses, and it really helps close that assessment loop.”
  • In the end, Escoe says, the two assessments are both useful, but for different things. The CLA can provide broad institutional data that satisfies VSA requirements, while rubric-based assessment provides better information to facilitate continuous program improvement.
    • Nils Peterson
       
      CLA did not provide information for continuous program improvement -- we've heard this argument before
  •  
    The lack of correlation might be rephrased: there appears to be no correlation between what is useful for faculty who teach and what is useful for the VSA. A corollary question: of what use is the VSA?
Gary Brown

Faith in Prior Learning Was Well Placed - Letters to the Editor - The Chronicle of High... - 1 views

  • The recognition that a college that offered credit for experiential learning could stand with traditional institutions, while commonplace today, was a leap of faith then. Empire State had to demonstrate its validity through results—educational outcomes—and on that score, it stood tall. In fact, focusing on outcomes, as we did, led many of us to question how well traditional institutions would measure up!
  •  
    an important question.