
TOK Friends: Group items tagged Statistics


Javier E

The decline effect and the scientific method : The New Yorker - 3 views

  • The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
  • But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable.
  • This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology.
  • If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe?
  • Schooler demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon “verbal overshadowing.”
  • The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time.
  • yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time!
  • this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics
  • In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance.
  • Jennions admits that his findings are troubling, but expresses a reluctance to talk about them
  • publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
  • While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts.
  • Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for
  • Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments.
  • One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
  • Palmer suspects that an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel. (A toy simulation of this selection pattern, and of the decline effect it produces, appears after this list.)
  • after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results. Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.”
  • Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
  • Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process.
  • “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
  • For Simmons, the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
  • John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.”
  • In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals.
  • the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
  • the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
  • According to Ioannidis, the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher.
  • One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials.
  • The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong.
  • “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies
  • That’s why Schooler argues that scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,”
  • The current “obsession” with replicability distracts from the real problem, which is faulty design.
  • “Every researcher should have to spell out, in advance, how many subjects they’re going to use, and what exactly they’re testing, and what constitutes a sufficient level of proof. We have the tools to be much more transparent about our experiments.”
  • Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,”
  • scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging.
  • The disturbing implication of the Crabbe study (in which the same strains of mice, tested under deliberately standardized conditions in three different labs, behaved very differently on the same tests, including a cocaine-injection trial) is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand.
  • The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected
  • This suggests that the decline effect is actually a decline of illusion. While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that.
  • Many scientific theories continue to be considered true even after failing numerous experimental tests.
  • Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.)
  • Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.)
  • The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe. ♦
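The decline pattern described in the annotations above does not require fraud; publication bias plus regression to the mean is enough to produce it. The sketch below is a minimal illustration under invented assumptions (a small true effect, two thousand labs of varying size, and an initial literature that publishes only positive results crossing the conventional p < 0.05 line), not a model of any particular study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.15      # small true difference between groups (assumed for illustration)
n_labs = 2000

# Each "lab" runs a two-group study with a sample size drawn at random.
sample_sizes = rng.integers(10, 200, size=n_labs)
observed = np.empty(n_labs)
pvalues = np.empty(n_labs)
for i, n in enumerate(sample_sizes):
    treatment = rng.normal(true_effect, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    t, p = stats.ttest_ind(treatment, control)
    observed[i] = treatment.mean() - control.mean()
    pvalues[i] = p

# "Initial literature": only positive results crossing p < 0.05 get published.
published = (pvalues < 0.05) & (observed > 0)

# "Replications": rerun each published study at the same sample size,
# keeping whatever estimate comes out, with no significance filter.
replications = np.array([
    rng.normal(true_effect, 1.0, n).mean() - rng.normal(0.0, 1.0, n).mean()
    for n in sample_sizes[published]
])

print(f"true effect:                      {true_effect:.2f}")
print(f"mean estimate, published studies: {observed[published].mean():.2f}")
print(f"mean estimate, replications:      {replications.mean():.2f}")

# Funnel asymmetry: among published results, the smallest studies overshoot most.
small = published & (sample_sizes < 50)
large = published & (sample_sizes >= 100)
print(f"published estimates, n < 50:      {observed[small].mean():.2f}")
print(f"published estimates, n >= 100:    {observed[large].mean():.2f}")
```

Under these assumptions the first wave of published estimates sits well above the true effect, the unfiltered replications fall back toward it, and the smallest studies are the ones skewed hardest toward positive results, which is the same asymmetry Palmer's funnel graphs revealed.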
Javier E

Nate Cohn Reviews Charles Wheelan's "Naked Statistics" | The New Republic - 0 views

  • With data-driven analysts gaining newfound fame, there’s a growing mythology about the power of statistics. Some appear to be convinced that statistics possess nearly unlimited explanatory power, at least if the data is good enough.
  • To the extent that growing segments of the media and public have become convinced that statistics can yield dispositive answers to difficult problems, it may be increasingly necessary to offer as much attention to the types of questions that statistics can’t answer as the ones that statistics can answer well.
  • The growing mythology around data-driven analysis isn’t just a testament to recent high-profile successes; it’s also the result of a lack of knowledge.
Emily Horwitz

'Naked Statistics' by Charles Wheelan - Review - NYTimes.com - 2 views

  • Whether you are healthy, moribund or traversing the stages of decrepitude in between, every morsel of medical advice you receive is pure conjecture — educated guesswork perhaps, but guesswork nonetheless. Your health care provider and your favorite columnist are both mere croupiers, enablers for your health gambling habit.
  • Staying well is all about probability and risk. So is the interpretation of medical tests, and so are all treatments for all illnesses, dire and trivial alike. Health has nothing in common with the laws of physics and everything in common with lottery cards, mutual funds and tomorrow’s weather forecast.
  • Are you impressed with studies showing that people who take Vitamin X or perform Exercise Y live longer? Remember, correlation does not imply causation. Do you obsess over studies claiming to show that various dietary patterns cause cancer? In fact, Mr. Wheelan points out, this kind of research examines not so much how diet affects the likelihood of cancer as how getting cancer affects people’s memory of what they used to eat.
  • the rest comes from his multiple real world examples illustrating exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life, whether that individual is watching football on the couch, picking a school for the children or jiggling anxiously in a hospital admitting office.
  • And while we’re talking about bias, let’s not forget publication bias: studies that show a drug works get published, but those showing a drug does nothing tend to disappear.
  • The same trade-off applies to the interpretation of medical tests. Unproven disease screens are likely to do little but feed lots of costly, anxiety-producing garbage into your medical record.
  • An interesting article/review of a book that compares statistics and human health. Interestingly enough, it shows that statistics and studies about health are often taken to be true and misinterpreted because we want them to be true, and we want to believe that some minor change in our lifestyles may somehow prevent us from getting cancer, for example. More info about the book from the publisher: http://books.wwnorton.com/books/detail.aspx?ID=24713
Javier E

His Job Was to Make Instagram Safe for Teens. His 14-Year-Old Showed Him What the App W... - 0 views

  • The experience of young users on Meta’s Instagram—where Bejar had spent the previous two years working as a consultant—was especially acute. In a subsequent email to Instagram head Adam Mosseri, one statistic stood out: One in eight users under the age of 16 said they had experienced unwanted sexual advances on the platform over the previous seven days.
  • For Bejar, that finding was hardly a surprise. His daughter and her friends had been receiving unsolicited penis pictures and other forms of harassment on the platform since the age of 14, he wrote, and Meta’s systems generally ignored their reports—or responded by saying that the harassment didn’t violate platform rules.
  • “I asked her why boys keep doing that,” Bejar wrote to Zuckerberg and his top lieutenants. “She said if the only thing that happens is they get blocked, why wouldn’t they?”
  • For the well-being of its users, Bejar argued, Meta needed to change course, focusing less on a flawed system of rules-based policing and more on addressing such bad experiences
  • The company would need to collect data on what upset users and then work to combat the source of it, nudging those who made others uncomfortable to improve their behavior and isolating communities of users who deliberately sought to harm others.
  • “I am appealing to you because I believe that working this way will require a culture shift,” Bejar wrote to Zuckerberg—the company would have to acknowledge that its existing approach to governing Facebook and Instagram wasn’t working.
  • During and after Bejar’s time as a consultant, Meta spokesman Andy Stone said, the company has rolled out several product features meant to address some of the Well-Being Team’s findings. Those features include warnings to users before they post comments that Meta’s automated systems flag as potentially offensive, and reminders to be kind when sending direct messages to users like content creators who receive a large volume of messages. 
  • Meta’s classifiers were reliable enough to remove only a low single-digit percentage of hate speech with any degree of precision.
  • Bejar was floored—all the more so when he learned that virtually all of his daughter’s friends had been subjected to similar harassment. “DTF?” a user they’d never met would ask, using shorthand for a vulgar proposition. Instagram acted so rarely on reports of such behavior that the girls no longer bothered reporting them. 
  • Meta’s own statistics suggested that big problems didn’t exist. 
  • Meta had come to approach governing user behavior as an overwhelmingly automated process. Engineers would compile data sets of unacceptable content—things like terrorism, pornography, bullying or “excessive gore”—and then train machine-learning models to screen future content for similar material.
  • While users could still flag things that upset them, Meta shifted resources away from reviewing them. To discourage users from filing reports, internal documents from 2019 show, Meta added steps to the reporting process. Meta said the changes were meant to discourage frivolous reports and educate users about platform rules. 
  • The outperformance of Meta’s automated enforcement relied on what Bejar considered two sleights of hand. The systems didn’t catch anywhere near the majority of banned content—only the majority of what the company ultimately removed
  • “Please don’t talk about my underage tits,” Bejar’s daughter shot back before reporting his comment to Instagram. A few days later, the platform got back to her: The insult didn’t violate its community guidelines.
  • Also buttressing Meta’s statistics were rules written narrowly enough to ban only unambiguously vile material. Meta’s rules didn’t clearly prohibit adults from flooding the comments section on a teenager’s posts with kiss emojis or posting pictures of kids in their underwear, inviting their followers to “see more” in a private Facebook Messenger group. 
  • “Mark personally values freedom of expression first and foremost and would say this is a feature and not a bug,” Rosen responded
  • Narrow rules and unreliable automated enforcement systems left a lot of room for bad behavior—but they made the company’s child-safety statistics look pretty good according to Meta’s metric of choice: prevalence.
  • Defined as the percentage of content viewed worldwide that explicitly violates a Meta rule, prevalence was the company’s preferred measuring stick for the problems users experienced.
  • According to prevalence, child exploitation was so rare on the platform that it couldn’t be reliably estimated, less than 0.05%, the threshold for functional measurement. Content deemed to encourage self-harm, such as eating disorders, was just as minimal, and rule violations for bullying and harassment occurred in just eight of 10,000 views. 
  • “There’s a grading-your-own-homework problem,”
  • Meta defines what constitutes harmful content, so it shapes the discussion of how successful it is at dealing with it.”
  • It could reconsider its AI-generated “beauty filters,” which internal research suggested made both the people who used them and those who viewed the images more self-critical
  • the team built a new questionnaire called BEEF, short for “Bad Emotional Experience Feedback.”
  • A recurring survey of issues 238,000 users had experienced over the past seven days, the effort identified problems with prevalence from the start: Users were 100 times more likely to tell Instagram they’d witnessed bullying in the last week than Meta’s bullying-prevalence statistics indicated they should. (A back-of-the-envelope comparison of per-view and per-user rates appears after this list.)
  • “People feel like they’re having a bad experience or they don’t,” one presentation on BEEF noted. “Their perception isn’t constrained by policy.”
  • they seemed particularly common among teens on Instagram.
  • Among users under the age of 16, 26% recalled having a bad experience in the last week due to witnessing hostility against someone based on their race, religion or identity
  • More than a fifth felt worse about themselves after viewing others’ posts, and 13% had experienced unwanted sexual advances in the past seven days. 
  • The vast gap between the low prevalence of content deemed problematic in the company’s own statistics and what users told the company they experienced suggested that Meta’s definitions were off, Bejar argued
  • To minimize content that teenagers told researchers made them feel bad about themselves, Instagram could cap how much beauty- and fashion-influencer content users saw.
  • Proving to Meta’s leadership that the company’s prevalence metrics were missing the point was going to require data the company didn’t have. So Bejar and a group of staffers from the Well-Being Team started collecting it
  • And it could build ways for users to report unwanted contacts, the first step to figuring out how to discourage them.
  • One experiment run in response to BEEF data showed that when users were notified that their comment or post had upset people who saw it, they often deleted it of their own accord. “Even if you don’t mandate behaviors,” said Krieger, “you can at least send signals about what behaviors aren’t welcome.”
  • But among the ranks of Meta’s senior middle management, Bejar and Krieger said, BEEF hit a wall. Managers who had made their careers on incrementally improving prevalence statistics weren’t receptive to the suggestion that the approach wasn’t working. 
  • After three decades in Silicon Valley, he understood that members of the company’s C-Suite might not appreciate a damning appraisal of the safety risks young users faced from its product—especially one citing the company’s own data. 
  • “This was the email that my entire career in tech trained me not to send,” he says. “But a part of me was still hoping they just didn’t know.”
  • “Policy enforcement is analogous to the police,” he wrote in the email Oct. 5, 2021—arguing that it’s essential to respond to crime, but that it’s not what makes a community safe. Meta had an opportunity to do right by its users and take on a problem that Bejar believed was almost certainly industrywide.
  • After Haugen’s airing of internal research, Meta had cracked down on the distribution of anything that would, if leaked, cause further reputational damage. With executives privately asserting that the company’s research division harbored a fifth column of detractors, Meta was formalizing a raft of new rules for employees’ internal communication.
  • Among the mandates for achieving “Narrative Excellence,” as the company called it, was to keep research data tight and never assert a moral or legal duty to fix a problem.
  • “I had to write about it as a hypothetical,” Bejar said. Rather than acknowledging that Instagram’s survey data showed that teens regularly faced unwanted sexual advances, the memo merely suggested how Instagram might help teens if they faced such a problem.
  • The hope that the team’s work would continue didn’t last. The company stopped conducting the specific survey behind BEEF, then laid off most everyone who’d worked on it as part of what Zuckerberg called Meta’s “year of efficiency.”
  • If Meta was to change, Bejar told the Journal, the effort would have to come from the outside. He began consulting with a coalition of state attorneys general who filed suit against the company late last month, alleging that the company had built its products to maximize engagement at the expense of young users’ physical and mental health. Bejar also got in touch with members of Congress about where he believes the company’s user-safety efforts fell short. 
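The gap between a prevalence figure and a survey like BEEF is partly arithmetic. The numbers below are invented for illustration and are not Meta's data; they only show how a per-view rate under a narrow rule set and a per-user weekly experience rate can sit orders of magnitude apart.

```python
# All figures below are illustrative assumptions, not Meta's internal data.
p_bad_view = 8 / 10_000    # chance any single viewed item is a bad experience for the viewer
rule_coverage = 0.10       # assume narrow rules classify only 10% of those items as violations
views_per_week = 400       # assumed weekly item views for a teen user

# Per-view "prevalence" of rule-violating content, the company's metric of choice:
prevalence = p_bad_view * rule_coverage
print(f"prevalence (violating share of views):  {prevalence:.3%}")

# Per-user weekly rate, the quantity a survey such as BEEF measures:
p_user_week = 1 - (1 - p_bad_view) ** views_per_week
print(f"users with a bad experience in a week:  {p_user_week:.0%}")

print(f"survey rate vs. prevalence:             {p_user_week / prevalence:,.0f}x")
```

Nothing in this arithmetic requires bad faith; the two measures simply answer different questions, which is the substance of the "grading-your-own-homework" complaint quoted above.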
Javier E

The Data Vigilante - Christopher Shea - The Atlantic - 0 views

  • He is, on the contrary, seized by the conviction that science is beset by sloppy statistical maneuvering and, in some cases, outright fraud. He has therefore been moonlighting as a fraud-buster, developing techniques to help detect doctored data in other people’s research. Already, in the space of less than a year, he has blown up two colleagues’ careers.
  • In a paper called “False-Positive Psychology,” published in the prestigious journal Psychological Science, he and two colleagues—Leif Nelson, a professor at the University of California at Berkeley, and Wharton’s Joseph Simmons—showed that psychologists could all but guarantee an interesting research finding if they were creative enough with their statistics and procedures.
  • By going on what amounted to a fishing expedition (that is, by recording many, many variables but reporting only the results that came out to their liking); by failing to establish in advance the number of human subjects in an experiment; and by analyzing the data as they went, so they could end the experiment when the results suited them, they produced a howler of a result, a truly absurd finding. They then ran a series of computer simulations using other experimental data to show that these methods could increase the odds of a false-positive result—a statistical fluke, basically—to nearly two-thirds. (A miniature version of this kind of simulation appears after this list.)
  • “I couldn’t tolerate knowing something was fake and not doing something about it,” he told me. “Everything loses meaning. What’s the point of writing a paper, fighting very hard to get it published, going to conferences?”
  • Simonsohn stressed that there’s a world of difference between data techniques that generate false positives, and fraud, but he said some academic psychologists have, until recently, been dangerously indifferent to both. Outright fraud is probably rare. Data manipulation is undoubtedly more common—and surely extends to other subjects dependent on statistical study, including biomedicine. Worse, sloppy statistics are “like steroids in baseball”: Throughout the affected fields, researchers who are too intellectually honest to use these tricks will publish less, and may perish. Meanwhile, the less fastidious flourish.
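The "False-Positive Psychology" argument quoted above is easy to reproduce in miniature. The sketch below simulates two of the practices the paper describes, testing several outcome measures and re-testing after adding subjects (optional stopping), on pure noise where no true effect exists; the exact numbers depend on the assumptions, but the false-positive rate lands far above the nominal 5 percent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_hacked_study(n_start=20, n_add=10, n_max=60, n_outcomes=3):
    """One study on pure noise, analyzed with two researcher degrees of freedom:
    several outcome variables, and optional stopping (peek, then add subjects)."""
    a = rng.normal(0, 1, (n_start, n_outcomes))   # group A, no real effect anywhere
    b = rng.normal(0, 1, (n_start, n_outcomes))   # group B
    while True:
        pvals = [stats.ttest_ind(a[:, j], b[:, j]).pvalue for j in range(n_outcomes)]
        if min(pvals) < 0.05:     # stop and report whichever outcome "worked"
            return True
        if len(a) >= n_max:       # give up once the sample feels "big enough"
            return False
        # No result yet: collect a few more subjects per group and test again.
        a = np.vstack([a, rng.normal(0, 1, (n_add, n_outcomes))])
        b = np.vstack([b, rng.normal(0, 1, (n_add, n_outcomes))])

n_sims = 2000
false_positives = sum(one_hacked_study() for _ in range(n_sims))
print("nominal false-positive rate:   5%")
print(f"rate with these two practices: {false_positives / n_sims:.0%}")
```

Combining more such degrees of freedom, as the paper's own simulations did, is what pushes the rate toward the "nearly two-thirds" figure described above; none of it requires fabricating a single data point.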
Javier E

Noam Chomsky on Where Artificial Intelligence Went Wrong - Yarden Katz - The Atlantic - 1 views

  • Skinner's approach stressed the historical associations between a stimulus and the animal's response -- an approach easily framed as a kind of empirical statistical analysis, predicting the future as a function of the past.
  • Chomsky's conception of language, on the other hand, stressed the complexity of internal representations, encoded in the genome, and their maturation in light of the right data into a sophisticated computational system, one that cannot be usefully broken down into a set of associations.
  • Chomsky acknowledged that the statistical approach might have practical value, just as in the example of a useful search engine, and is enabled by the advent of fast computers capable of processing massive data. But as far as a science goes, Chomsky would argue it is inadequate, or more harshly, kind of shallow
  • David Marr, a neuroscientist colleague of Chomsky's at MIT, defined a general framework for studying complex biological systems (like the brain) in his influential book Vision,
  • a complex biological system can be understood at three distinct levels. The first level ("computational level") describes the input and output to the system, which define the task the system is performing. In the case of the visual system, the input might be the image projected on our retina and the output might be our brain's identification of the objects present in the image we had observed. The second level ("algorithmic level") describes the procedure by which an input is converted to an output, i.e. how the image on our retina can be processed to achieve the task described by the computational level. Finally, the third level ("implementation level") describes how our own biological hardware of cells implements the procedure described by the algorithmic level.
  • The emphasis here is on the internal structure of the system that enables it to perform a task, rather than on external association between past behavior of the system and the environment. The goal is to dig into the "black box" that drives the system and describe its inner workings, much like how a computer scientist would explain how a cleverly designed piece of software works and how it can be executed on a desktop computer.
  • As written today, the history of cognitive science is a story of the unequivocal triumph of an essentially Chomskyian approach over Skinner's behaviorist paradigm -- an achievement commonly referred to as the "cognitive revolution,"
  • While this may be a relatively accurate depiction in cognitive science and psychology, behaviorist thinking is far from dead in related disciplines. Behaviorist experimental paradigms and associationist explanations for animal behavior are used routinely by neuroscientists
  • Chomsky critiqued the field of AI for adopting an approach reminiscent of behaviorism, except in more modern, computationally sophisticated form. Chomsky argued that the field's heavy use of statistical techniques to pick regularities in masses of data is unlikely to yield the explanatory insight that science ought to offer. For Chomsky, the "new AI" -- focused on using statistical learning techniques to better mine and predict data -- is unlikely to yield general principles about the nature of intelligent beings or about cognition.
  • Behaviorist principles of associations could not explain the richness of linguistic knowledge, our endlessly creative use of it, or how quickly children acquire it with only minimal and imperfect exposure to language presented by their environment.
  • it has been argued -- in my view rather plausibly, though neuroscientists don't like it -- that neuroscience for the last couple hundred years has been on the wrong track.
  • Implicit in this endeavor is the assumption that with enough sophisticated statistical tools and a large enough collection of data, signals of interest can be weeded out from the noise in large and poorly understood biological systems.
  • Brenner, a contemporary of Chomsky who also participated in the same symposium on AI, was equally skeptical about new systems approaches to understanding the brain. When describing an up-and-coming systems approach to mapping brain circuits called Connectomics, which seeks to map the wiring of all neurons in the brain (i.e. diagramming which nerve cells are connected to others), Brenner called it a "form of insanity."
  • These debates raise an old and general question in the philosophy of science: What makes a satisfying scientific theory or explanation, and how ought success be defined for science?
  • Ever since Isaiah Berlin's famous essay, it has become a favorite pastime of academics to place various thinkers and scientists on the "Hedgehog-Fox" continuum: the Hedgehog, a meticulous and specialized worker, driven by incremental progress in a clearly defined field versus the Fox, a flashier, ideas-driven thinker who jumps from question to question, ignoring field boundaries and applying his or her skills where they seem applicable.
  • Chomsky's work has had tremendous influence on a variety of fields outside his own, including computer science and philosophy, and he has not shied away from discussing and critiquing the influence of these ideas, making him a particularly interesting person to interview.
  • If you take a look at the progress of science, the sciences are kind of a continuum, but they're broken up into fields. The greatest progress is in the sciences that study the simplest systems. So take, say physics -- greatest progress there. But one of the reasons is that the physicists have an advantage that no other branch of sciences has. If something gets too complicated, they hand it to someone else.
  • If a molecule is too big, you give it to the chemists. The chemists, for them, if the molecule is too big or the system gets too big, you give it to the biologists. And if it gets too big for them, they give it to the psychologists, and finally it ends up in the hands of the literary critic, and so on.
  • An unlikely pair, systems biology and artificial intelligence both face the same fundamental task of reverse-engineering a highly complex system whose inner workings are largely a mystery
  • neuroscience developed kind of enthralled to associationism and related views of the way humans and animals work. And as a result they've been looking for things that have the properties of associationist psychology.
Javier E

Predicting the Future Is Easier Than It Looks - By Michael D. Ward and Nils Metternich ... - 0 views

  • The same statistical revolution that changed baseball has now entered American politics, and no one has been more successful in popularizing a statistical approach to political analysis than New York Times blogger Nate Silver, who of course cut his teeth as a young sabermetrician. And on Nov. 6, after having faced a torrent of criticism from old-school political pundits -- Washington's rough equivalent of statistically illiterate tobacco chewing baseball scouts -- the results of the presidential election vindicated Silver's approach, which correctly predicted the electoral outcome in all 50 states.
  • Today, there are several dozen ongoing, public projects that aim to in one way or another forecast the kinds of things foreign policymakers desperately want to be able to predict: various forms of state failure, famines, mass atrocities, coups d'état, interstate and civil war, and ethnic and religious conflict. So while U.S. elections might occupy the front page of the New York Times, the ability to predict instances of extreme violence and upheaval represent the holy grail of statistical forecasting -- and researchers are now getting close to doing just that.
  • In 2010 scholars from the Political Instability Task Force published a report that demonstrated the ability to correctly predict onsets of instability two years in advance in 18 of 21 instances (about 85%)
  • Let's consider a case in which Ulfelder argues there is insufficient data to render a prediction -- North Korea. There is no official data on North Korean GDP, so what can we do? It turns out that the same data science approaches that were used to aggregate polls have other uses as well. One is the imputation of missing data. Yes, even when it is all missing. The basic idea is to use the general correlations among data that you do have to provide an aggregate way of estimating information that we don't have. (A simple regression-based sketch of this idea appears after this list.)
  • In 2012 there were two types of models: one type based on fundamentals such as economic growth and unemployment and another based on public opinion surveys
  • As it turned out, in this month's election public opinion polls were considerably more precise than the fundamentals. The fundamentals were not always providing bad predictions, but better is better.
  • There is a tradition in world politics to go either back until the Congress of Vienna (when there were fewer than two dozen independent countries) or to the early 1950s after the end of the Second World War. But in reality, there is no need to do this for most studies.
  • Ulfelder tells us that "when it comes to predicting major political crises like wars, coups, and popular uprisings, there are many plausible predictors for which we don't have any data at all, and much of what we do have is too sparse or too noisy to incorporate into carefully designed forecasting models." But this is true only for the old style of models based on annual data for countries. If we are willing to face data that are collected in rhythm with the phenomena we are studying, this is not the case
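The imputation idea mentioned above, using correlations among indicators you do observe to estimate a value you don't, can be illustrated in a few lines. This is a deliberately simple regression-based version run on made-up data, not the machinery the forecasting projects actually use.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake panel of countries: three observed indicators that co-move with an
# unobserved "GDP-like" variable (all numbers invented for illustration).
n = 200
latent = rng.normal(0, 1, n)
indicators = np.column_stack([
    0.8 * latent + 0.3 * rng.normal(0, 1, n),   # e.g. nighttime lights
    0.7 * latent + 0.4 * rng.normal(0, 1, n),   # e.g. trade volumes
    0.5 * latent + 0.6 * rng.normal(0, 1, n),   # e.g. electricity use
])
gdp = 2.0 + 1.5 * latent + 0.2 * rng.normal(0, 1, n)

# Pretend GDP is missing for some countries (the "North Korea" rows).
missing = rng.random(n) < 0.15
X_obs, y_obs = indicators[~missing], gdp[~missing]

# Fit a least-squares regression on the complete cases...
X_design = np.column_stack([np.ones(len(X_obs)), X_obs])
coef, *_ = np.linalg.lstsq(X_design, y_obs, rcond=None)

# ...and fill in the missing values from their observed indicators.
X_miss = np.column_stack([np.ones(missing.sum()), indicators[missing]])
imputed = X_miss @ coef

err = np.abs(imputed - gdp[missing]).mean()
print(f"imputed {missing.sum()} missing values, mean absolute error {err:.2f}")
```

Multiple-imputation tools go further by drawing many plausible values rather than a single point estimate, so that the extra uncertainty created by the missingness carries through to the forecasts.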
Javier E

Noam Chomsky on Where Artificial Intelligence Went Wrong - Yarden Katz - The Atlantic - 0 views

  • If you take a look at the progress of science, the sciences are kind of a continuum, but they're broken up into fields. The greatest progress is in the sciences that study the simplest systems. So take, say physics -- greatest progress there. But one of the reasons is that the physicists have an advantage that no other branch of sciences has. If something gets too complicated, they hand it to someone else.
  • If a molecule is too big, you give it to the chemists. The chemists, for them, if the molecule is too big or the system gets too big, you give it to the biologists. And if it gets too big for them, they give it to the psychologists, and finally it ends up in the hands of the literary critic, and so on.
  • neuroscience for the last couple hundred years has been on the wrong track. There's a fairly recent book by a very good cognitive neuroscientist, Randy Gallistel and King, arguing -- in my view, plausibly -- that neuroscience developed kind of enthralled to associationism and related views of the way humans and animals work. And as a result they've been looking for things that have the properties of associationist psychology.
  • in general what he argues is that if you take a look at animal cognition, human too, it's computational systems. Therefore, you want to look at the units of computation. Think about a Turing machine, say, which is the simplest form of computation, you have to find units that have properties like "read", "write" and "address." That's the minimal computational unit, so you got to look in the brain for those. You're never going to find them if you look for strengthening of synaptic connections or field properties, and so on. You've got to start by looking for what's there and what's working and you see that from Marr's highest level.
  • it's basically in the spirit of Marr's analysis. So when you're studying vision, he argues, you first ask what kind of computational tasks is the visual system carrying out. And then you look for an algorithm that might carry out those computations and finally you search for mechanisms of the kind that would make the algorithm work. Otherwise, you may never find anything.
  • "Good Old Fashioned AI," as it's labeled now, made strong use of formalisms in the tradition of Gottlob Frege and Bertrand Russell, mathematical logic for example, or derivatives of it, like nonmonotonic reasoning and so on. It's interesting from a history of science perspective that even very recently, these approaches have been almost wiped out from the mainstream and have been largely replaced -- in the field that calls itself AI now -- by probabilistic and statistical models. My question is, what do you think explains that shift and is it a step in the right direction?
  • AI and robotics got to the point where you could actually do things that were useful, so it turned to the practical applications and somewhat, maybe not abandoned, but put to the side, the more fundamental scientific questions, just caught up in the success of the technology and achieving specific goals.
  • The approximating unanalyzed data kind is sort of a new approach, not totally, there's things like it in the past. It's basically a new approach that has been accelerated by the existence of massive memories, very rapid processing, which enables you to do things like this that you couldn't have done by hand. But I think, myself, that it is leading subjects like computational cognitive science into a direction of maybe some practical applicability... ..in engineering? Chomsky: ...But away from understanding.
  • I was very skeptical about the original work. I thought it was first of all way too optimistic, it was assuming you could achieve things that required real understanding of systems that were barely understood, and you just can't get to that understanding by throwing a complicated machine at it.
  • if success is defined as getting a fair approximation to a mass of chaotic unanalyzed data, then it's way better to do it this way than to do it the way the physicists do, you know, no thought experiments about frictionless planes and so on and so forth. But you won't get the kind of understanding that the sciences have always been aimed at -- what you'll get at is an approximation to what's happening.
  • Suppose you want to predict tomorrow's weather. One way to do it is okay I'll get my statistical priors, if you like, there's a high probability that tomorrow's weather here will be the same as it was yesterday in Cleveland, so I'll stick that in, and where the sun is will have some effect, so I'll stick that in, and you get a bunch of assumptions like that, you run the experiment, you look at it over and over again, you correct it by Bayesian methods, you get better priors. You get a pretty good approximation of what tomorrow's weather is going to be. That's not what meteorologists do -- they want to understand how it's working. And these are just two different concepts of what success means, of what achievement is.
  • if you get more and more data, and better and better statistics, you can get a better and better approximation to some immense corpus of text, like everything in The Wall Street Journal archives -- but you learn nothing about the language.
  • the right approach, is to try to see if you can understand what the fundamental principles are that deal with the core properties, and recognize that in the actual usage, there's going to be a thousand other variables intervening -- kind of like what's happening outside the window, and you'll sort of tack those on later on if you want better approximations, that's a different approach.
  • take a concrete example of a new field in neuroscience, called Connectomics, where the goal is to find the wiring diagram of very complex organisms, find the connectivity of all the neurons in say human cerebral cortex, or mouse cortex. This approach was criticized by Sidney Brenner, who in many ways is [historically] one of the originators of the approach. Advocates of this field don't stop to ask if the wiring diagram is the right level of abstraction -- maybe it's not.
  • if you went to MIT in the 1960s, or now, it's completely different. No matter what engineering field you're in, you learn the same basic science and mathematics. And then maybe you learn a little bit about how to apply it. But that's a very different approach. And it resulted maybe from the fact that really for the first time in history, the basic sciences, like physics, had something really to tell engineers. And besides, technologies began to change very fast, so not very much point in learning the technologies of today if it's going to be different 10 years from now. So you have to learn the fundamental science that's going to be applicable to whatever comes along next. And the same thing pretty much happened in medicine.
  • that's the kind of transition from something like an art, that you learn how to practice -- an analog would be trying to match some data that you don't understand, in some fashion, maybe building something that will work -- to science, what happened in the modern period, roughly Galilean science.
  • it turns out that there actually are neural circuits which are reacting to particular kinds of rhythm, which happen to show up in language, like syllable length and so on. And there's some evidence that that's one of the first things that the infant brain is seeking -- rhythmic structures. And going back to Gallistel and Marr, its got some computational system inside which is saying "okay, here's what I do with these things" and say, by nine months, the typical infant has rejected -- eliminated from its repertoire -- the phonetic distinctions that aren't used in its own language.
  • people like Shimon Ullman discovered some pretty remarkable things like the rigidity principle. You're not going to find that by statistical analysis of data. But he did find it by carefully designed experiments. Then you look for the neurophysiology, and see if you can find something there that carries out these computations. I think it's the same in language, the same in studying our arithmetical capacity, planning, almost anything you look at. Just trying to deal with the unanalyzed chaotic data is unlikely to get you anywhere, just like as it wouldn't have gotten Galileo anywhere.
  • with regard to cognitive science, we're kind of pre-Galilean, just beginning to open up the subject
  • You can invent a world -- I don't think it's our world -- but you can invent a world in which nothing happens except random changes in objects and selection on the basis of external forces. I don't think that's the way our world works, I don't think it's the way any biologist thinks it is. There are all kind of ways in which natural law imposes channels within which selection can take place, and some things can happen and other things don't happen. Plenty of things that go on in the biology in organisms aren't like this. So take the first step, meiosis. Why do cells split into spheres and not cubes? It's not random mutation and natural selection; it's a law of physics. There's no reason to think that laws of physics stop there, they work all the way through. Well, they constrain the biology, sure. Chomsky: Okay, well then it's not just random mutation and selection. It's random mutation, selection, and everything that matters, like laws of physics.
  • What I think is valuable is the history of science. I think we learn a lot of things from the history of science that can be very valuable to the emerging sciences. Particularly when we realize that in say, the emerging cognitive sciences, we really are in a kind of pre-Galilean stage. We don't know what we're looking for any more than Galileo did, and there's a lot to learn from that.
kushnerha

New Critique Sees Flaws in Landmark Analysis of Psychology Studies - The New York Times - 0 views

  • A landmark 2015 report that cast doubt on the results of dozens of published psychology studies has exposed deep divisions in the field, serving as a reality check for many working researchers but as an affront to others who continue to insist the original research was sound.
  • On Thursday, a group of four researchers publicly challenged the report, arguing that it was statistically flawed and, as a result, wrong. The 2015 report, called the Reproducibility Project, found that fewer than 40 studies in a sample of 100 psychology papers in leading journals held up when retested by an independent team. The new critique by the four researchers countered that when that team’s statistical methodology was adjusted, the rate was closer to 100 percent. Neither the original analysis nor the critique found evidence of fraud or manipulation of data.
  • “That study got so much press, and the wrong conclusions were drawn from it,” said Timothy D. Wilson, a professor of psychology at the University of Virginia and an author of the new critique. “It’s a mistake to make generalizations from something that was done poorly, and this we think was done poorly.”
  • countered that the critique was highly biased: “They are making assumptions based on selectively interpreting data and ignoring data that’s antagonistic to their point of view.”
  • The challenge comes as the field of psychology is facing a generational change, with young researchers beginning to share their data and study designs before publication, to improve transparency. Still, the new critique is likely to feed an already lively debate about how best to conduct and evaluate so-called replication projects of studies. Such projects are underway in several fields, scientists on both sides of the debate said.
  • “On some level, I suppose it is appealing to think everything is fine and there is no reason to change the status quo,” said Sanjay Srivastava, a psychologist at the University of Oregon, who was not a member of either team. “But we know too much, from many other sources, to put too much credence in an analysis that supports that remarkable conclusion.”
  • One issue the critique raised was how faithfully the replication team had adhered to the original design of the 100 studies it retested. Small alterations in design can make the difference between whether a study replicates or not, scientists say.
  • Another issue that the critique raised had to do with statistical methods. When Dr. Nosek began his study, there was no agreed-upon protocol for crunching the numbers. He and his team settled on five measures
  • He said that the original replication paper and the critique use statistical approaches that are “predictably imperfect” for this kind of analysis. One way to think about the dispute, Dr. Simonsohn said, is that the original paper found that the glass was about 40 percent full, and the critique argues that it could be 100 percent full. In fact, he said in an email, “State-of-the-art techniques designed to evaluate replications say it is 40 percent full, 30 percent empty, and the remaining 30 percent could be full or empty, we can’t tell till we get more data.”
oliviaodon

How scientists fool themselves - and how they can stop : Nature News & Comment - 1 views

  • In 2013, five years after he co-authored a paper showing that Democratic candidates in the United States could get more votes by moving slightly to the right on economic policy1, Andrew Gelman, a statistician at Columbia University in New York City, was chagrined to learn of an error in the data analysis. In trying to replicate the work, an undergraduate student named Yang Yang Hu had discovered that Gelman had got the sign wrong on one of the variables.
  • Gelman immediately published a three-sentence correction, declaring that everything in the paper's crucial section should be considered wrong until proved otherwise.
  • Reflecting today on how it happened, Gelman traces his error back to the natural fallibility of the human brain: “The results seemed perfectly reasonable,” he says. “Lots of times with these kinds of coding errors you get results that are just ridiculous. So you know something's got to be wrong and you go back and search until you find the problem. If nothing seems wrong, it's easier to miss it.”
  • This is the big problem in science that no one is talking about: even an honest person is a master of self-deception. Our brains evolved long ago on the African savannah, where jumping to plausible conclusions about the location of ripe fruit or the presence of a predator was a matter of survival. But a smart strategy for evading lions does not necessarily translate well to a modern laboratory, where tenure may be riding on the analysis of terabytes of multidimensional data. In today's environment, our talent for jumping to conclusions makes it all too easy to find false patterns in randomness, to ignore alternative explanations for a result or to accept 'reasonable' outcomes without question — that is, to ceaselessly lead ourselves astray without realizing it.
  • Failure to understand our own biases has helped to create a crisis of confidence about the reproducibility of published results
  • Although it is impossible to document how often researchers fool themselves in data analysis, says Ioannidis, findings of irreproducibility beg for an explanation. The study of 100 psychology papers is a case in point: if one assumes that the vast majority of the original researchers were honest and diligent, then a large proportion of the problems can be explained only by unconscious biases. “This is a great time for research on research,” he says. “The massive growth of science allows for a massive number of results, and a massive number of errors and biases to study. So there's good reason to hope we can find better ways to deal with these problems.”
  • Although the human brain and its cognitive biases have been the same for as long as we have been doing science, some important things have changed, says psychologist Brian Nosek, executive director of the non-profit Center for Open Science in Charlottesville, Virginia, which works to increase the transparency and reproducibility of scientific research. Today's academic environment is more competitive than ever. There is an emphasis on piling up publications with statistically significant results — that is, with data relationships in which a commonly used measure of statistical certainty, the p-value, is 0.05 or less. “As a researcher, I'm not trying to produce misleading results,” says Nosek. “But I do have a stake in the outcome.” And that gives the mind excellent motivation to find what it is primed to find.
  • Another reason for concern about cognitive bias is the advent of staggeringly large multivariate data sets, often harbouring only a faint signal in a sea of random noise. Statistical methods have barely caught up with such data, and our brain's methods are even worse, says Keith Baggerly, a statistician at the University of Texas MD Anderson Cancer Center in Houston. As he told a conference on challenges in bioinformatics last September in Research Triangle Park, North Carolina, “Our intuition when we start looking at 50, or hundreds of, variables sucks.”
  • One trap that awaits during the early stages of research is what might be called hypothesis myopia: investigators fixate on collecting evidence to support just one hypothesis; neglect to look for evidence against it; and fail to consider other explanations.
Javier E

Economic Statistics Miss the Benefits of Technology - NYTimes.com - 2 views

  • Value added by the information technology and communications industries — mostly hardware and software — has remained stuck at around 4 percent of the nation’s economic output for the last quarter century.
  • But these statistics do not tell the whole story. Because they miss much of what technology does for people’s well-being. News organizations that take advantage of computers to let go of journalists, secretaries and research assistants will show up in the economic statistics as more productive, making more with less. But statisticians have no way to value more thorough, useful, fact-dense articles. What’s more, gross domestic product only values the goods and services people pay for. It does not capture the value to consumers of economic improvements that are given away free. And until recently this is what news media organizations like The New York Times were doing online.
  • “G.D.P. is not a measure of how much value is produced for consumers,” said Erik Brynjolfsson of the Massachusetts Institute of Technology. “Everybody should recognize that G.D.P. is not a welfare metric.”
  • people who had access to a search engine took 15 minutes less to answer a question than those without online access.
  • how to measure the Internet’s contribution to our lives? A few years ago, Austan Goolsbee of the University of Chicago and Peter J. Klenow of Stanford gave it a shot. They estimated that the value consumers gained from the Internet amounted to about 2 percent of their income — an order of magnitude larger than what they spent to go online. Their trick was to measure not only how much money users spent on access but also how much of their leisure time they spent online.
  • Gross domestic product has always failed to capture many things — from the costs of pollution and traffic jams to the gains of unpaid household work.
  • Measured in money — what it contributes to G.D.P. — the recording industry is shrinking. Yet never before have Americans had access to so much music.
  • the consumer surplus from free online services — the value derived by consumers from the experience above what they paid for it — has been growing by $34 billion a year, on average, since 2002. If it were tacked on as “economic output,” it would add about 0.26 of a percentage point to annual G.D.P. growth.
  • The Internet is hardly the first technology to offer consumers valuable free goods. The consumer surplus from television is about five times as large as that delivered by free stuff online, according to Mr. Brynjolfsson’s calculations.
  • Varian estimated that a search engine might be worth about $500 annually to the average worker. Across the working population, this would add up to $65 billion a year (the rough arithmetic is sketched after this list).
  • The missed consumer surplus from the Internet may be no bigger than the unmeasured gains in the production, for example, of electric light.
  • The amount of time Americans devote to the Internet has doubled in the last five years.
  • “We know less about the sources of value in the economy than we did 25 years ago,”
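A back-of-envelope sketch of the figures quoted above. The workforce and GDP values are assumptions added for illustration; the article reports only the resulting totals (about $65 billion a year, and about 0.26 of a percentage point of annual growth).

```python
# Minimal sketch of the back-of-envelope arithmetic quoted above.
# The 130-million workforce and the GDP figure are assumptions for illustration;
# the article does not state the inputs behind the published totals.
value_per_worker = 500           # dollars per year (Varian's estimate)
workforce = 130_000_000          # assumed U.S. working population

total_search_value = value_per_worker * workforce
print(f"Search-engine value: ${total_search_value / 1e9:.0f} billion per year")   # ~ $65 billion

surplus_growth = 34e9            # ~$34 billion of added consumer surplus per year
gdp = 13e12                      # assumed nominal GDP of roughly $13 trillion
print(f"Adds about {surplus_growth / gdp * 100:.2f} percentage points to annual growth")  # ~0.26
```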
manhefnawi

How Statistics Fool Juries | Mental Floss - 0 views

  • While my experience wasn’t related to statistics, it was a question of science where it seemed like we should have been able to prove the right answer one way or another -- but we failed.
Javier E

Cognitive Biases and the Human Brain - The Atlantic - 1 views

  • Present bias shows up not just in experiments, of course, but in the real world. Especially in the United States, people egregiously undersave for retirement—even when they make enough money to not spend their whole paycheck on expenses, and even when they work for a company that will kick in additional funds to retirement plans when they contribute.
  • When people hear the word bias, many if not most will think of either racial prejudice or news organizations that slant their coverage to favor one political position over another. Present bias, by contrast, is an example of cognitive bias—the collection of faulty ways of thinking that is apparently hardwired into the human brain. The collection is large. Wikipedia’s “List of cognitive biases” contains 185 entries, from actor-observer bias (“the tendency for explanations of other individuals’ behaviors to overemphasize the influence of their personality and underemphasize the influence of their situation … and for explanations of one’s own behaviors to do the opposite”) to the Zeigarnik effect (“uncompleted or interrupted tasks are remembered better than completed ones”)
  • If I had to single out a particular bias as the most pervasive and damaging, it would probably be confirmation bias. That’s the effect that leads us to look for evidence confirming what we already think or suspect, to view facts and ideas we encounter as further confirmation, and to discount or ignore any piece of evidence that seems to support an alternate view
  • ...48 more annotations...
  • Confirmation bias shows up most blatantly in our current political divide, where each side seems unable to allow that the other side is right about anything.
  • The whole idea of cognitive biases and faulty heuristics—the shortcuts and rules of thumb by which we make judgments and predictions—was more or less invented in the 1970s by Amos Tversky and Daniel Kahneman
  • Tversky died in 1996. Kahneman won the 2002 Nobel Prize in Economics for the work the two men did together, which he summarized in his 2011 best seller, Thinking, Fast and Slow. Another best seller, last year’s The Undoing Project, by Michael Lewis, tells the story of the sometimes contentious collaboration between Tversky and Kahneman
  • Another key figure in the field is the University of Chicago economist Richard Thaler. One of the biases he’s most linked with is the endowment effect, which leads us to place an irrationally high value on our possessions.
  • In an experiment conducted by Thaler, Kahneman, and Jack L. Knetsch, half the participants were given a mug and then asked how much they would sell it for. The average answer was $5.78. The rest of the group said they would spend, on average, $2.21 for the same mug. This flew in the face of classic economic theory, which says that at a given time and among a certain population, an item has a market value that does not depend on whether one owns it or not. Thaler won the 2017 Nobel Prize in Economics.
  • “The question that is most often asked about cognitive illusions is whether they can be overcome. The message … is not encouraging.”
  • that’s not so easy in the real world, when we’re dealing with people and situations rather than lines. “Unfortunately, this sensible procedure is least likely to be applied when it is needed most,” Kahneman writes. “We would all like to have a warning bell that rings loudly whenever we are about to make a serious error, but no such bell is available.”
  • At least with the optical illusion, our slow-thinking, analytic mind—what Kahneman calls System 2—will recognize a Müller-Lyer situation and convince itself not to trust the fast-twitch System 1’s perception
  • Kahneman and others draw an analogy based on an understanding of the Müller-Lyer illusion, two parallel lines with arrows at each end. One line’s arrows point in; the other line’s arrows point out. Because of the direction of the arrows, the latter line appears shorter than the former, but in fact the two lines are the same length.
  • Because biases appear to be so hardwired and inalterable, most of the attention paid to countering them hasn’t dealt with the problematic thoughts, judgments, or predictions themselves
  • Is it really impossible, however, to shed or significantly mitigate one’s biases? Some studies have tentatively answered that question in the affirmative.
  • what if the person undergoing the de-biasing strategies was highly motivated and self-selected? In other words, what if it was me?
  • Over an apple pastry and tea with milk, he told me, “Temperament has a lot to do with my position. You won’t find anyone more pessimistic than I am.”
  • I met with Kahneman
  • “I see the picture as unequal lines,” he said. “The goal is not to trust what I think I see. To understand that I shouldn’t believe my lying eyes.” That’s doable with the optical illusion, he said, but extremely difficult with real-world cognitive biases.
  • In this context, his pessimism relates, first, to the impossibility of effecting any changes to System 1—the quick-thinking part of our brain and the one that makes mistaken judgments tantamount to the Müller-Lyer line illusion
  • The most effective check against them, as Kahneman says, is from the outside: Others can perceive our errors more readily than we can.
  • “slow-thinking organizations,” as he puts it, can institute policies that include the monitoring of individual decisions and predictions. They can also require procedures such as checklists and “premortems,”
  • A premortem attempts to counter optimism bias by requiring team members to imagine that a project has gone very, very badly and write a sentence or two describing how that happened. Conducting this exercise, it turns out, helps people think ahead.
  • “My position is that none of these things have any effect on System 1,” Kahneman said. “You can’t improve intuition.
  • Perhaps, with very long-term training, lots of talk, and exposure to behavioral economics, what you can do is cue reasoning, so you can engage System 2 to follow rules. Unfortunately, the world doesn’t provide cues. And for most people, in the heat of argument the rules go out the window.
  • Kahneman describes an even earlier Nisbett article that showed subjects’ disinclination to believe statistical and other general evidence, basing their judgments instead on individual examples and vivid anecdotes. (This bias is known as base-rate neglect.)
  • over the years, Nisbett had come to emphasize in his research and thinking the possibility of training people to overcome or avoid a number of pitfalls, including base-rate neglect, fundamental attribution error, and the sunk-cost fallacy.
  • Nisbett’s second-favorite example is that economists, who have absorbed the lessons of the sunk-cost fallacy, routinely walk out of bad movies and leave bad restaurant meals uneaten.
  • When Nisbett asks the same question of students who have completed the statistics course, about 70 percent give the right answer. He believes this result shows, pace Kahneman, that the law of large numbers can be absorbed into System 2—and maybe into System 1 as well, even when there are minimal cues.
  • about half give the right answer: the law of large numbers, which holds that outlier results are much more frequent when the sample size (at bats, in this case) is small. Over the course of the season, as the number of at bats increases, regression to the mean is inevitable (see the simulation sketch after this list).
  • When Nisbett has to give an example of his approach, he usually brings up the baseball-phenom survey. This involved telephoning University of Michigan students on the pretense of conducting a poll about sports, and asking them why there are always several Major League batters with .450 batting averages early in a season, yet no player has ever finished a season with an average that high.
  • we’ve tested Michigan students over four years, and they show a huge increase in ability to solve problems. Graduate students in psychology also show a huge gain.”
  • “I know from my own research on teaching people how to reason statistically that just a few examples in two or three domains are sufficient to improve people’s reasoning for an indefinitely large number of events.”
  • Nisbett suggested another factor: “You and Amos specialized in hard problems for which you were drawn to the wrong answer. I began to study easy problems, which you guys would never get wrong but untutored people routinely do … Then you can look at the effects of instruction on such easy problems, which turn out to be huge.”
  • Nisbett suggested that I take “Mindware: Critical Thinking for the Information Age,” an online Coursera course in which he goes over what he considers the most effective de-biasing skills and concepts. Then, to see how much I had learned, I would take a survey he gives to Michigan undergraduates. So I did.
  • The course consists of eight lessons by Nisbett—who comes across on-screen as the authoritative but approachable psych professor we all would like to have had—interspersed with some graphics and quizzes. I recommend it. He explains the availability heuristic this way: “People are surprised that suicides outnumber homicides, and drownings outnumber deaths by fire. People always think crime is increasing” even if it’s not.
  • When I finished the course, Nisbett sent me the survey he and colleagues administer to Michigan undergrads
  • It contains a few dozen problems meant to measure the subjects’ resistance to cognitive biases
  • I got it right. Indeed, when I emailed my completed test, Nisbett replied, “My guess is that very few if any UM seniors did as well as you. I’m sure at least some psych students, at least after 2 years in school, did as well. But note that you came fairly close to a perfect score.”
  • Nevertheless, I did not feel that reading Mindware and taking the Coursera course had necessarily rid me of my biases
  • For his part, Nisbett insisted that the results were meaningful. “If you’re doing better in a testing context,” he told me, “you’ll jolly well be doing better in the real world.”
  • The New York–based NeuroLeadership Institute offers organizations and individuals a variety of training sessions, webinars, and conferences that promise, among other things, to use brain science to teach participants to counter bias. This year’s two-day summit will be held in New York next month; for $2,845, you could learn, for example, “why are our brains so bad at thinking about the future, and how do we do it better?”
  • Philip E. Tetlock, a professor at the University of Pennsylvania’s Wharton School, and his wife and research partner, Barbara Mellers, have for years been studying what they call “superforecasters”: people who manage to sidestep cognitive biases and predict future events with far more accuracy than the pundits
  • One of the most important ingredients is what Tetlock calls “the outside view.” The inside view is a product of fundamental attribution error, base-rate neglect, and other biases that are constantly cajoling us into resting our judgments and predictions on good or vivid stories instead of on data and statistics
  • In 2006, seeking to prevent another mistake of that magnitude, the U.S. government created the Intelligence Advanced Research Projects Activity (IARPA), an agency designed to use cutting-edge research and technology to improve intelligence-gathering and analysis. In 2011, IARPA initiated a program, Sirius, to fund the development of “serious” video games that could combat or mitigate what were deemed to be the six most damaging biases: confirmation bias, fundamental attribution error, the bias blind spot (the feeling that one is less biased than the average person), the anchoring effect, the representativeness heuristic, and projection bias (the assumption that everybody else’s thinking is the same as one’s own).
  • most promising are a handful of video games. Their genesis was in the Iraq War
  • Together with collaborators who included staff from Creative Technologies, a company specializing in games and other simulations, and Leidos, a defense, intelligence, and health research company that does a lot of government work, Morewedge devised Missing. Some subjects played the game, which takes about three hours to complete, while others watched a video about cognitive bias. All were tested on bias-mitigation skills before the training, immediately afterward, and then finally after eight to 12 weeks had passed.
  • “The literature on training suggests books and classes are fine entertainment but largely ineffectual. But the game has very large effects. It surprised everyone.”
  • he said he saw the results as supporting the research and insights of Richard Nisbett. “Nisbett’s work was largely written off by the field, the assumption being that training can’t reduce bias,
  • even the positive results reminded me of something Daniel Kahneman had told me. “Pencil-and-paper doesn’t convince me,” he said. “A test can be given even a couple of years later. But the test cues the test-taker. It reminds him what it’s all about.”
  • Morewedge told me that some tentative real-world scenarios along the lines of Missing have shown “promising results,” but that it’s too soon to talk about them.
  • In the future, I will monitor my thoughts and reactions as best I can
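As referenced in the batting-average item above, here is a minimal simulation of Nisbett’s baseball-phenom example. The true batting average, number of players, and at-bat counts are assumptions chosen only to show how small samples produce early-season “.450 hitters” who disappear once the sample grows.

```python
# Minimal simulation of the law-of-large-numbers point (parameters are assumptions):
# with 20 at bats, many .300-talent hitters look like .450 hitters; over a full
# season of 500 at bats, essentially none do.
import numpy as np

rng = np.random.default_rng(1)
true_average = 0.300
n_players = 500

early = rng.binomial(20, true_average, n_players) / 20      # 20 at bats each
season = rng.binomial(500, true_average, n_players) / 500   # 500 at bats each

print("Players batting .450+ after 20 at bats: ", int((early >= 0.450).sum()))
print("Players batting .450+ after 500 at bats:", int((season >= 0.450).sum()))
# Typical output: dozens of early ".450 hitters", none by season's end.
```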
Javier E

McSweeney's Internet Tendency: Nate Silver Offers Up a Statistical Analysis of Your Fai... - 1 views

  • Nate Silver Offers Up a Statistical Analysis of Your Failing Relationship.
  • Ultimately, please don’t give me too much credit for this accumulated data. Although 0.0 percent of your mutual friends were willing to say anything, 93.9 percent of them saw this coming from the start.
Javier E

Putting Economic Data Into Context - The New York Times - 0 views

  • economic historians have been wrestling with this problem for years and have produced an excellent calculator for converting historical data into contemporary figures. The site is called Measuring Worth,
  • Today we use price indexes to convert monetary values from the past into “real” values today. The best-known such index is the Consumer Price Index published monthly by the Bureau of Labor Statistics. For those interested only in a simple inflation adjustment, the bureau maintains a useful calculator.
  • The area where this is the biggest problem is probably large budget numbers. The raw data is almost universally useless. Saying that the budget deficit was $680.3 billion in fiscal year 2013 tells the average person absolutely nothing of value. It’s just a large number that sounds scary. It would help to at least know that it is down from $1.087 trillion in 2012 and a peak of $1.413 trillion in 2009, but that’s not entirely adequate.
  • ...6 more annotations...
  • it makes no sense to compare the federal budget to a family budget, which is what the Consumer Price Index is based on. One needs to use a broader index, like the gross domestic product deflator, which measures price changes throughout the entire economy.
  • For large numbers, the percentage of the gross domestic product is both the easiest to find and best to use (a sketch of this calculation, along with a simple inflation adjustment, follows this list).
  • Since the “burden” of the debt basically falls on the entire economy, the debt-to-G.D.P. ratio is generally considered the best measure of that burden. It also facilitates international comparisons without having to worry about exchange-rate adjustments.
  • international price comparisons can be especially tricky because current market exchange rates may not accurately reflect relative values or standards of living. Economists generally prefer to use something called “purchasing power parity,” but such data is not always easy to come by
  • There is much more to say on this topic. I recommend an essay on the Measuring Worth website that discusses different measures of value over time and how they materially affect our perceptions. There are also new statistical measures coming online that may provide even better data, like the Billion Prices Project from M.I.T., which gathers price data in real time directly from store price scanners.
  • This is an area where trial and error is the best strategy. The important thing is to make an effort to provide proper context where it appears necessary and not to simply ignore the problem.
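A minimal sketch of the two adjustments referenced above: converting a past dollar figure with a price index, and expressing a large budget number as a share of GDP. The CPI values and the FY2013 GDP figure are approximate assumptions added for illustration; only the $680.3 billion deficit comes from the text.

```python
# Minimal sketch of the two adjustments described above. The CPI values and the
# FY2013 GDP figure are illustrative assumptions, not numbers from the article.
def adjust_for_inflation(amount, cpi_then, cpi_now):
    """Convert a past dollar amount into today's dollars using a price index."""
    return amount * cpi_now / cpi_then

# e.g. roughly converting $100 from 1990 into 2013 dollars (assumed CPI values)
print(f"$100 in 1990 ~ ${adjust_for_inflation(100, 130.7, 233.0):.0f} in 2013 dollars")

# Putting a large budget number in context as a share of the economy:
deficit_fy2013 = 680.3e9         # from the article
gdp_fy2013 = 16.7e12             # assumed nominal GDP, for illustration
print(f"FY2013 deficit ~ {deficit_fy2013 / gdp_fy2013 * 100:.1f}% of GDP")
```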
Javier E

Nate Silver, Artist of Uncertainty - 0 views

  • In 2008, Nate Silver correctly predicted the results of all 35 Senate races and the presidential results in 49 out of 50 states. Since then, his website, fivethirtyeight.com (now central to The New York Times’s political coverage), has become an essential source of rigorous, objective analysis of voter surveys to predict the Electoral College outcome of presidential campaigns. 
  • Political junkies, activists, strategists, and journalists will gain a deeper and more sobering sense of Silver’s methods in The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t (Penguin Press). A brilliant analysis of forecasting in finance, geology, politics, sports, weather, and other domains, Silver’s book is also an original fusion of cognitive psychology and modern statistical theory.
  • Its most important message is that the first step toward improving our predictions is learning how to live with uncertainty.
  • ...7 more annotations...
  • he blends the best of modern statistical analysis with research on cognition biases pioneered by Princeton psychologist and Nobel laureate in economics  Daniel Kahneman and the late Stanford psychologist Amos Tversky. 
  • Silver’s background in sports and poker turns out to be invaluable. Successful analysts in gambling and sports are different from fans and partisans—far more aware that “sure things” are likely to be illusions,
  • The second step is starting to understand why it is that big data, super computers, and mathematical sophistication haven’t made us better at separating signals (information with true predictive value) from noise (misleading information). 
  • One of the biggest problems we have in separating signal from noise is that when we look too hard for certainty that isn’t there, we often end up attracted to noise, either because it is more prominent or because it confirms what we would like to believe.
  • In discipline after discipline, Silver shows in his book that the average of all independent forecasts is 15 to 20 percent more accurate than even the best single forecast (a simple simulation illustrating the idea follows this list).
  • Silver has taken the next major step: constantly incorporating both state polls and national polls into Bayesian models that also incorporate economic data.
  • Silver explains why we will be misled if we only consider significance tests—i.e., statements that the margin of error for the results is, for example, plus or minus four points, meaning there is one chance in 20 that the percentages reported are off by more than four. Calculations like these assume the only source of error is sampling error—the irreducible error—while ignoring errors attributable to house effects, like the proportion of cell-phone users, one of the complex set of assumptions every pollster must make about who will actually vote. In other words, such an approach ignores context in order to avoid having to justify and defend judgments. 
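As noted in the item on forecast averaging above, here is a toy simulation of why averaging helps: with unbiased forecasters whose errors are independent, the typical error of the mean forecast shrinks roughly with the square root of the number of forecasters. The error sizes are assumptions, and the comparison is against a single forecaster rather than the best one, so it illustrates the mechanism rather than Silver’s 15-to-20-percent figure.

```python
# Minimal sketch (assumed error sizes): averaging forecasts with independent
# errors yields a smaller typical error than relying on one forecaster.
import numpy as np

rng = np.random.default_rng(2)
truth = 50.0                        # the quantity being forecast
n_forecasters, n_trials = 5, 10_000

# Each forecaster is unbiased but noisy (standard deviation 4, an assumption).
forecasts = truth + rng.normal(0, 4, size=(n_trials, n_forecasters))

single_error = np.abs(forecasts[:, 0] - truth).mean()
average_error = np.abs(forecasts.mean(axis=1) - truth).mean()

print(f"Typical error of one forecaster: {single_error:.2f}")
print(f"Typical error of the average:    {average_error:.2f}")   # roughly 1/sqrt(5) as large
```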
Javier E

Who Needs Math? - The Monkey Cage - 1 views

  • by Larry Bartels on April 9, 2013
  • “When something new is encountered, the follow-up steps usually require mathematical and statistical methods to move the analysis forward.” At that point, he suggests finding a collaborator
  • But technical expertise in itself is of little avail: “The annals of theoretical biology are clogged with mathematical models that either can be safely ignored or, when tested, fail. Possibly no more than 10% have any lasting value. Only those linked solidly to knowledge of real living systems have much chance of being used.”
  • ...5 more annotations...
  • If you’re going to talk about economics at all, you need some sense of how magnitudes play off against each other, which is the only way to have a chance of seeing how the pieces fit together.
  • [M]aybe the thing to say is that higher math isn’t usually essential; arithmetic is.
  • My own work has become rather less mathematical over the course of my career. When people ask why, I usually say that as I have come to learn more about politics, the “sophisticated” wrinkles have seemed to distract more than they added.
  • “Seeing how the pieces fit together” requires “some sense of how magnitudes play off against each other.” But, paradoxically, ”higher math” can get in the way of “mathematical intuition” about magnitudes. Formal theory is often couched in purely qualitative terms: under such and such conditions, more X should produce more Y. And quantitative analysis—which ought to focus squarely on magnitudes—is less likely to do so the more it is justified and valued on technical rather than substantive grounds.
  • I recently spent some time doing an informal meta-analysis of studies of the impact of campaign advertising. At the heart of that literature is a pretty simple question: how much does one more ad contribute to the sponsoring candidate’s vote share? Alas, most of the studies I reviewed provided no intelligible answer to that question; and the correlation between methodological “sophistication” (logarithmic transformations, multinomial logits, fixed effects, distributed lag models) and intelligibility was decidedly negative. The authors of these studies rarely seemed to know or care what their results implied about the magnitude of the effect, as long as those results could be billed as “statistically significant.
grayton downing

Lost in Translation | The Scientist Magazine® - 0 views

  • excessive reporting of positive results in papers that describe animal testing of potential therapies, just like the publishing bias seen in clinical research, according to a paper published today
  • “We know publication bias happens a lot in clinical trials,” said Torgerson, “so I was surprised at myself for being surprised at the results, because of course, if it happens in human research, why wouldn’t it happen in animal research?”
  • his suspicions about publication bias by performing a statistical meta-analysis of thousands of reported animal tests for various neurological interventions—a total of 4,445 reported tests of 160 different drugs and other treatments for conditions that included Alzheimer’s disease, Parkinson’s disease, brain ischemia, and more.
  • ...3 more annotations...
  • One, as mentioned, is the suppression of negative results. The second is selective reporting of only the statistical analyses of data that provide a significant score. “Practically any data set, if it is tortured enough, will confess, and you will get a statistically significant result” (a toy simulation of this selection effect appears after the list).
  • a publication bias “almost certainly” applies in other areas of preclinical research,
  • he suggested that a forum should be established where investigators can deposit neutral or negative results in the form of articles, to ensure that such findings are in the public arena and not hidden. This should help prevent other researchers from pursuing fruitless avenues of research, he said, “which is a waste of animals and a waste of research money.” Not to mention a risk to people enrolled in potentially pointless trials.
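The toy simulation promised above: many small studies of a treatment with no real effect, of which only the positive, statistically significant ones are “published.” The study counts and sample sizes are assumptions; the point is that the published subset shows a large spurious effect even though the true effect is zero.

```python
# Minimal sketch (assumed study sizes): if only "significant" positive results of
# a null treatment get published, the published literature shows a spurious effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_studies, n_per_arm = 200, 10       # many small studies, no true effect

all_effects, published_effects = [], []
for _ in range(n_studies):
    control = rng.normal(0, 1, n_per_arm)
    treated = rng.normal(0, 1, n_per_arm)        # same distribution: null effect
    effect = treated.mean() - control.mean()
    all_effects.append(effect)
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05 and effect > 0:                  # only positive, significant results reported
        published_effects.append(effect)

print(f"'Published' studies: {len(published_effects)} of {n_studies}")
print(f"True mean effect across all studies:   {np.mean(all_effects):+.2f}")
print(f"Mean effect in the 'published' subset: {np.mean(published_effects):+.2f}")
```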
Javier E

Watson Still Can't Think - NYTimes.com - 0 views

  • Fish argued that Watson “does not come within a million miles of replicating the achievements of everyday human action and thought.” In defending this claim, Fish invoked arguments that one of us (Dreyfus) articulated almost 40 years ago in “What Computers Can’t Do,” a criticism of 1960s and 1970s style artificial intelligence.
  • At the dawn of the AI era the dominant approach to creating intelligent systems was based on finding the right rules for the computer to follow.
  • GOFAI, for Good Old Fashioned Artificial Intelligence.
  • ...12 more annotations...
  • For constrained domains the GOFAI approach is a winning strategy.
  • there is nothing intelligent or even interesting about the brute force approach.
  • the dominant paradigm in AI research has largely “moved on from GOFAI to embodied, distributed intelligence.” And Faustus from Cincinnati insists that as a result “machines with bodies that experience the world and act on it” will be “able to achieve intelligence.”
  • The new, embodied paradigm in AI, deriving primarily from the work of roboticist Rodney Brooks, insists that the body is required for intelligence. Indeed, Brooks’s classic 1990 paper, “Elephants Don’t Play Chess,” rejected the very symbolic computation paradigm against which Dreyfus had railed, favoring instead a range of biologically inspired robots that could solve apparently simple, but actually quite complicated, problems like locomotion, grasping, navigation through physical environments and so on. To solve these problems, Brooks discovered that it was actually a disadvantage for the system to represent the status of the environment and respond to it on the basis of pre-programmed rules about what to do, as the traditional GOFAI systems had. Instead, Brooks insisted, “It is better to use the world as its own model.”
  • although they respond to the physical world rather well, they tend to be oblivious to the global, social moods in which we find ourselves embedded essentially from birth, and in virtue of which things matter to us in the first place.
  • the embodied AI paradigm is irrelevant to Watson. After all, Watson has no useful bodily interaction with the world at all.
  • The statistical machine learning strategies that it uses are indeed a big advance over traditional GOFAI techniques. But they still fall far short of what human beings do.
  • “The illusion is that this computer is doing the same thing that a very good ‘Jeopardy!’ player would do. It’s not. It’s doing something sort of different that looks the same on the surface. And every so often you see the cracks.”
  • Watson doesn’t understand relevance at all. It only measures statistical frequencies. Because it is relatively common to find mismatches of this sort, Watson learns to weigh them as only mild evidence against the answer. But the human just doesn’t do it that way. The human being sees immediately that the mismatch is irrelevant for the Erie Canal but essential for Toronto. Past frequency is simply no guide to relevance.
  • The fact is, things are relevant for human beings because at root we are beings for whom things matter. Relevance and mattering are two sides of the same coin. As Haugeland said, “The problem with computers is that they just don’t give a damn.” It is easy to pretend that computers can care about something if we focus on relatively narrow domains — like trivia games or chess — where by definition winning the game is the only thing that could matter, and the computer is programmed to win. But precisely because the criteria for success are so narrowly defined in these cases, they have nothing to do with what human beings are when they are at their best.
  • Far from being the paradigm of intelligence, therefore, mere matching with no sense of mattering or relevance is barely any kind of intelligence at all. As beings for whom the world already matters, our central human ability is to be able to see what matters when.
  • But, as we show in our recent book, this is an existential achievement orders of magnitude more amazing and wonderful than any statistical treatment of bare facts could ever be. The greatest danger of Watson’s victory is not that it proves machines could be better versions of us, but that it tempts us to misunderstand ourselves as poorer versions of them.
Javier E

Noted Dutch Psychologist, Stapel, Accused of Research Fraud - NYTimes.com - 0 views

  • A well-known psychologist in the Netherlands whose work has been published widely in professional journals falsified data and made up entire experiments, an investigating committee has found
  • Experts say the case exposes deep flaws in the way science is done in a field, psychology, that has only recently earned a fragile respectability.
  • In recent years, psychologists have reported a raft of findings on race biases, brain imaging and even extrasensory perception that have not stood up to scrutiny. Outright fraud may be rare, these experts say, but they contend that Dr. Stapel took advantage of a system that allows researchers to operate in near secrecy and massage data to find what they want to find, without much fear of being challenged.
  • ...8 more annotations...
  • “The big problem is that the culture is such that researchers spin their work in a way that tells a prettier story than what they really found,” said Jonathan Schooler, a psychologist at the University of California, Santa Barbara. “It’s almost like everyone is on steroids, and to compete you have to take steroids as well.”
  • Dr. Stapel published papers on the effect of power on hypocrisy, on racial stereotyping and on how advertisements affect how people view themselves. Many of his findings appeared in newspapers around the world, including The New York Times, which reported in December on his study about advertising and identity.
  • In a survey of more than 2,000 American psychologists scheduled to be published this year, Leslie John of Harvard Business School and two colleagues found that 70 percent had acknowledged, anonymously, to cutting some corners in reporting data. About a third said they had reported an unexpected finding as predicted from the start, and about 1 percent admitted to falsifying data.
  • Dr. Stapel was able to operate for so long, the committee said, in large measure because he was “lord of the data,” the only person who saw the experimental evidence that had been gathered (or fabricated). This is a widespread problem in psychology, said Jelte M. Wicherts, a psychologist at the University of Amsterdam. In a recent survey, two-thirds of Dutch research psychologists said they did not make their raw data available for other researchers to see. “This is in violation of ethical rules established in the field,” Dr. Wicherts said.
  • Also common is a self-serving statistical sloppiness. In an analysis published this year, Dr. Wicherts and Marjan Bakker, also at the University of Amsterdam, searched a random sample of 281 psychology papers for statistical errors. They found that about half of the papers in high-end journals contained some statistical error, and that about 15 percent of all papers had at least one error that changed a reported finding — almost always in opposition to the authors’ hypothesis.
  • an analysis of 49 studies appearing Wednesday in the journal PLoS One, by Dr. Wicherts, Dr. Bakker and Dylan Molenaar, found that the more reluctant that scientists were to share their data, the more likely that evidence contradicted their reported findings.
  • “We know the general tendency of humans to draw the conclusions they want to draw — there’s a different threshold,” said Joseph P. Simmons, a psychologist at the University of Pennsylvania’s Wharton School. “With findings we want to see, we ask, ‘Can I believe this?’ With those we don’t, we ask, ‘Must I believe this?’”