
New Media Ethics 2009 course: Group items tagged Sampling


Weiye Loh

Study: Airport Security Should Stop Racial Profiling | Smart Journalism. Real Solutions... - 0 views

  • Plucking out of line most of the vaguely Middle Eastern-looking men at the airport for heightened screening is no more effective at catching terrorists than randomly sampling everyone. It may even be less effective. Press stumbled across this counterintuitive concept — sometimes the best way to find something is not to weight it by probability — in the unrelated context of computational biology. The parallels to airport security struck him when a friend mentioned he was constantly being pulled out of line at the airport.
  • Racial profiling, in other words, doesn’t work because it devotes heightened resources to innocent people — and then devotes those resources to them repeatedly even after they’ve been cleared as innocent the first time. The actual terrorists, meanwhile, may sneak through while Transportation Security Administration agents are focusing their limited attention on the wrong passengers.
  • Press tested the theory in a series of probability equations (the ambitious can check his math here and here).
  • Sampling based on profiling is mathematically no more effective than uniform random sampling. The optimal equation, rather, turns out to be something called “square-root sampling,” a compromise between the other two methods.
  • “Crudely,” Press writes of his findings in the journal Significance, if certain people are “nine times as likely to be the terrorist, we pull out only three times as many of them for special checks. Surprisingly, and bizarrely, this turns out to be the most efficient way of catching the terrorist.”
  • Square-root sampling, though, still represents a kind of profiling, and, Press adds, not one that could be realistically implemented at airports today. Square-root sampling only works if the profile probabilities are accurate in the first place — if we are able to say with mathematical certainty that some types of people are “nine times as likely to be the terrorist” compared to others. TSA agents in a crowded holiday terminal making snap judgments about facial hair would be far from this standard. “The nice thing about uniform sampling is there’s nothing to be inaccurate about, you don’t need any data, it never can be worse than you expect,” Press said. “As soon as you use profile probabilities, if the profile probabilities are just wrong, then the strong profiling just does worse than the random sampling.”
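
A minimal numerical sketch of the square-root result quoted above (the traveller counts and the nine-to-one prior are invented for illustration; this is not Press's own code): if person i is the terrorist with prior probability p_i and each secondary screening pulls person i aside with probability q_i, with cleared innocents eligible to be pulled aside again, the expected number of screenings before the terrorist is selected is the sum of p_i/q_i. Uniform sampling and strong profiling both give N; square-root sampling does strictly better, and strong profiling with a wrong profile does worse than random.

```python
import numpy as np

# Hypothetical priors: 1,000 travellers, 50 of whom the profile judges
# "nine times as likely" to be the terrorist. All numbers are invented
# for illustration; they are not taken from Press's paper.
N = 1000
p = np.ones(N)
p[:50] *= 9.0          # the 50 "high-profile" travellers
p /= p.sum()           # priors sum to 1

def expected_screenings(true_p, q):
    """Expected number of secondary screenings before the (single) terrorist
    is picked, when each screening selects person i with probability q[i],
    with replacement -- i.e. cleared innocents can be pulled aside again."""
    return float(np.sum(true_p / q))

uniform = np.full(N, 1.0 / N)            # uniform random sampling
strong = p.copy()                        # "strong profiling": q proportional to p
sqrt_q = np.sqrt(p) / np.sqrt(p).sum()   # square-root sampling

print("uniform sampling     :", expected_screenings(p, uniform))  # = N
print("strong profiling     :", expected_screenings(p, strong))   # also = N
print("square-root sampling :", expected_screenings(p, sqrt_q))   # strictly smaller

# If the profile is simply wrong (the flagged 50 are in fact no likelier than
# anyone else), strong profiling becomes worse than uniform sampling, which
# is unaffected -- Press's point about inaccurate profile probabilities.
flat = np.full(N, 1.0 / N)
print("strong profiling, wrong priors :", expected_screenings(flat, strong))
print("uniform sampling, wrong priors :", expected_screenings(flat, uniform))
```

Minimising the sum of p_i/q_i subject to the q_i summing to one gives q_i proportional to the square root of p_i, which is where the square root comes from.
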
Weiye Loh

How wise are crowds? - 0 views

  • In the past, economists trying to model the propagation of information through a population would allow any given member of the population to observe the decisions of all the other members, or of a random sampling of them. That made the models easier to deal with mathematically, but it also made them less representative of the real world.
    • Weiye Loh
       
      Random sampling is not representative
  • “What this paper does is add the important component that this process is typically happening in a social network where you can’t observe what everyone has done, nor can you randomly sample the population to find out what a random sample has done, but rather you see what your particular friends in the network have done,” says Jon Kleinberg, Tisch University Professor in the Cornell University Department of Computer Science, who was not involved in the research. “That introduces a much more complex structure to the problem, but arguably one that’s representative of what typically happens in real settings.”
    • Weiye Loh
       
      So random sampling is actually more accurate?
  • Earlier models, Kleinberg explains, indicated the danger of what economists call information cascades. “If you have a few crucial ingredients — namely, that people are making decisions in order, that they can observe the past actions of other people but they can’t know what those people actually knew — then you have the potential for information cascades to occur, in which large groups of people abandon whatever private information they have and actually, for perfectly rational reasons, follow the crowd,”
  • The MIT researchers’ paper, however, suggests that the danger of information cascades may not be as dire as it previously seemed.
  • a mathematical model that describes attempts by members of a social network to make binary decisions — such as which of two brands of cell phone to buy — on the basis of decisions made by their neighbors. The model assumes that for all members of the population, there is a single right decision: one of the cell phones is intrinsically better than the other. But some members of the network have bad information about which is which.
  • The MIT researchers analyzed the propagation of information under two different conditions. In one case, there’s a cap on how much any one person can know about the state of the world: even if one cell phone is intrinsically better than the other, no one can determine that with 100 percent certainty. In the other case, there’s no such cap. There’s debate among economists and information theorists about which of these two conditions better reflects reality, and Kleinberg suggests that the answer may vary depending on the type of information propagating through the network. But previous models had suggested that, if there is a cap, information cascades are almost inevitable.
  • if there’s no cap on certainty, an expanding social network will eventually converge on an accurate representation of the state of the world; that wasn’t a big surprise. But they also showed that in many common types of networks, even if there is a cap on certainty, convergence will still occur.
  • people in the past have looked at it using more myopic models,” says Acemoglu. “They would be averaging type of models: so my opinion is an average of the opinions of my neighbors’.” In such a model, Acemoglu says, the views of people who are “oversampled” — who are connected with a large enough number of other people — will end up distorting the conclusions of the group as a whole.
  • What we’re doing is looking at it in a much more game-theoretic manner, where individuals are realizing where the information comes from. So there will be some correction factor,” Acemoglu says. “If I’m seeing you, your action, and I’m seeing Munzer’s action, and I also know that there is some probability that you might have observed Munzer, then I discount his opinion appropriately, because I know that I don’t want to overweight it. And that’s the reason why, even though you have these influential agents — it might be that Munzer is everywhere, and everybody observes him — that still doesn’t create a herd on his opinion.”
  • the new paper leaves a few salient questions unanswered, such as how quickly the network will converge on the correct answer, and what happens when the model of agents’ knowledge becomes more complex.
  • the MIT researchers begin to address both questions. One paper examines the rate of convergence, although Dahleh and Acemoglu note that its results are “somewhat weaker” than those about the conditions for convergence. Another paper examines cases in which different agents make different decisions given the same information: some people might prefer one type of cell phone, others another. In such cases, “if you know the percentage of people that are of one type, it’s enough — at least in certain networks — to guarantee learning,” Dahleh says. “I don’t need to know, for every individual, whether they’re for it or against it; I just need to know that one-third of the people are for it, and two-thirds are against it.” For instance, he says, if you notice that a Chinese restaurant in your neighborhood is always half-empty, and a nearby Indian restaurant is always crowded, then information about what percentages of people prefer Chinese or Indian food will tell you which restaurant, if either, is of above-average or below-average quality.
  •  
    By melding economics and engineering, researchers show that as social networks get larger, they usually get better at sorting fact from fiction.
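
The MIT model itself is not reproduced in the article, but the baseline cascade setup Kleinberg describes (sequential decisions, observable actions, unobservable private signals) is easy to simulate. Below is a sketch of the textbook sequential-learning model; the signal accuracy, agent count and run count are all assumed values, and this illustrates the "earlier models" danger, not the new network result: once the publicly inferred signals lean two steps one way, every later agent rationally ignores their own signal, so a wrong herd can lock in.

```python
import random

random.seed(1)

SIGNAL_ACCURACY = 0.6   # assumed: each private signal matches the true state 60% of the time
N_AGENTS = 100
N_RUNS = 5000

def run_once():
    true_state = 1      # without loss of generality, the "right" choice is 1
    diff = 0            # net count of publicly inferred signals favouring 1 over 0
    last_action = None
    for _ in range(N_AGENTS):
        signal = true_state if random.random() < SIGNAL_ACCURACY else 1 - true_state
        if diff >= 2:
            action = 1              # up-cascade: public evidence outweighs any one signal
        elif diff <= -2:
            action = 0              # down-cascade
        else:
            action = signal         # action still reveals the private signal
            diff += 1 if signal == 1 else -1
        last_action = action
    return last_action == true_state

correct = sum(run_once() for _ in range(N_RUNS))
print(f"share of runs where the crowd settles on the right choice: {correct / N_RUNS:.3f}")
# With 60%-accurate signals roughly a third of runs herd on the wrong choice,
# even though every agent reasons rationally from what it can observe.
```
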
Weiye Loh

Studying the politics of online science « through the looking glass - 0 views

  • Mendick, H. and Moreau, M. (2010). Monitoring the presence and representation of  women in SET occupations in UK based online media. Bradford: The UKRC.
  • Mendick and Moreau considered the representation of women on eight ‘SET’ (science, engineering and technology) websites: New Scientist, Bad Science, the Science Museum, the Natural History Museum, Neuroskeptic, Science: So What, Watt’s Up With That and RichardDawkins.net. They also monitored SET content across eight more general sites: the BBC, Channel 4, Sky, the Guardian, the Daily Mail, Wikipedia, YouTube and Twitter.
  • Their results suggest online science informational content is male dominated in that far more men than women are present. On some websites, they found no SET women. All of the 14 people in SET identified on the sampled pages of the RichardDawkins.net website were men, and so were all 29 of those mentioned on the sampled pages of the Channel 4 website (Mendick & Moreau, 2010: 11).
  • They found less hyperlinking of women’s than men’s names (Mendick & Moreau, 2010: 7). Personally, I’d have really liked some detail as to how they came up with this, and what constituted ‘hyperlinking of women’s names’ precisely. It’s potentially an interesting finding, but I can’t quite get a grip on what they are saying.
  • They also note that the women who did appear were often peripheral to the main story, or ‘subject to muting’ (i.e. seen but not heard). They also noted many instances where women were pictured but remained anonymous, as if they are used to illustrate a piece – for ‘ornamental’ purposes – and give the example of the Wikipedia entry on scientists, which includes a picture of a woman as an example, but stress that she is anonymous (Mendick & Moreau, 2010: 12).
  • Echoing findings of earlier research on science in the media (e.g. the Bimbo or Boffin paper), they noted that women, when represented, tended to be associated with ‘feminine’ attributes and activities, demonstrating empathy with children and animals, etc. They also noted a clustering in specific fields. For example, in the pages they’d sampled of the Guardian, they found seven mentions of women scientists compared with twenty-eight of men, and three of these women were in a single article, about Jane Goodall (Mendick & Moreau, 2010: 12-13).
  • The women presented were often discussed in terms of appearance, personality, sexuality and personal circumstances, again echoing previous research. They also noted that women scientists, when present, tended to be younger than the men, and there was a striking lack of ethnic diversity (Mendick & Moreau, 2010: 14).
  • I’m going to be quite critical of this research. It’s not actively bad, it just seems to lack depth and precision. I suspect Mendick and Moreau were doing their best with low resources and an overly-broad brief. I also think that we are still feeling our way in terms of working out how to study online science media, and so can learn something from such a critique.
  • Problem number one: it’s a small study, and yet a ginormous topic. I’d much rather they had looked at less, but made more of it. At times I felt like I was reading a cursory glance at online science.
  • Problem number two: the methodological script seemed a bit stuck in the print era. I felt the study lacked a feel for the variety of routes people take through online science. It lacked a sense of online science’s communities and cliques, its cultures and sub-cultures, its history and its people. It lacked context. Most of all, it lacked a sense of what I think sits at the center of online communication: the link.
  • It tries to look at too much, too quickly. We’re told that of the blog entries sampled from Bad Science, three out of four of the women mentioned were associated with ‘bad science’, compared to 12 out of 27 of the men. They follow this up with a note that Goldacre has appeared on television critiquing Greenfield, a clip of which is on his site (Mendick & Moreau, 2010: 17-18). OK, but ‘bad’ needs unpacking here, as does the gendered nature of the area Goldacre takes aim at. As for Susan Greenfield, she is a very complex character when it comes to the politics of science and gender (one I’d say it is dangerous to treat representations of simplistically). Moreover, this is a very small sample, without much feel for the broader media context the Bad Science blog works within, including not only other platforms for Ben Goldacre’s voice but comment threads, forums and a whole community of other ‘bad science bloggers’ (and their relationships with each other).
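
To put a number on the "very small sample" worry: a rough back-of-the-envelope sketch, not part of the UKRC report or the blog post, of 95% Wilson score intervals for the two proportions quoted above, 3 of 4 women versus 12 of 27 men associated with 'bad science'. With only four cases the women's interval is extremely wide and overlaps heavily with the men's, so the comparison carries little evidential weight on its own.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Counts quoted from the sampled Bad Science pages (Mendick & Moreau, 2010: 17-18)
for label, k, n in [("women", 3, 4), ("men", 12, 27)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k/n:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```
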
Weiye Loh

Jonathan Stray » Measuring and improving accuracy in journalism - 0 views

  • Accuracy is a hard thing to measure because it’s a hard thing to define. There are subjective and objective errors, and no standard way of determining whether a reported fact is true or false
  • The last big study of mainstream reporting accuracy found errors (defined below) in 59% of 4,800 stories across 14 metro newspapers. This level of inaccuracy — where about one in every two articles contains an error — has persisted for as long as news accuracy has been studied, over seven decades now.
  • With the explosion of available information, more than ever it’s time to get serious about accuracy, about knowing which sources can be trusted. Fortunately, there are emerging techniques that might help us to measure media accuracy cheaply, and then increase it.
  • We could continuously sample a news source’s output to produce ongoing accuracy estimates, and build social software to help the audience report and filter errors. Meticulously applied, this approach would give a measure of the accuracy of each information source, and a measure of the efficiency of their corrections process (currently only about 3% of all errors are corrected.)
  • Real world reporting isn’t always clearly “right” or “wrong,” so it will often be hard to decide whether something is an error or not. But we’re not going for ultimate Truth here,  just a general way of measuring some important aspect of the idea we call “accuracy.” In practice it’s important that the error counting method is simple, clear and repeatable, so that you can compare error rates of different times and sources.
  • Subjective errors, though by definition involving judgment, should not be dismissed as merely differences in opinion. Sources found such errors to be about as common as factual errors and often more egregious [as rated by the sources.] But subjective errors are a very complex category
  • One of the major problems with previous news accuracy metrics is the effort and time required to produce them. In short, existing accuracy measurement methods are expensive and slow. I’ve been wondering if we can do better, and a simple idea comes to mind: sampling. The core idea is this: news sources could take an ongoing random sample of their output and check it for accuracy — a fact check spot check
  • Standard statistical theory tells us what the error on that estimate will be for any given number of samples (If I’ve got this right, the relevant formula is standard error of a population proportion estimate without replacement.) At a sample rate of a few stories per day, daily estimates of error rate won’t be worth much. But weekly and monthly aggregates will start to produce useful accuracy estimates
  • the first step would be admitting how inaccurate journalism has historically been. Then we have to come up with standardized accuracy evaluation procedures, in pursuit of metrics that capture enough of what we mean by “true” to be worth optimizing. Meanwhile, we can ramp up the efficiency of our online corrections processes until we find as many useful, legitimate errors as possible with as little staff time as possible. It might also be possible to do data mining on types of errors and types of stories to figure out if there are patterns in how an organization fails to get facts right.
  • I’d love to live in a world where I could compare the accuracy of information sources, where errors got found and fixed with crowd-sourced ease, and where news organizations weren’t shy about telling me what they did and did not know. Basic factual accuracy is far from the only measure of good journalism, but perhaps it’s an improvement over the current sad state of affairs
  •  
    Professional journalism is supposed to be "factual," "accurate," or just plain true. Is it? Has news accuracy been getting better or worse in the last decade? How does it vary between news organizations, and how do other information sources rate? Is professional journalism more or less accurate than everything else on the internet? These all seem like important questions, so I've been poking around, trying to figure out what we know and don't know about the accuracy of our news sources. Meanwhile, the online news corrections process continues to evolve, which gives us hope that the news will become more accurate in the future.
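
A back-of-the-envelope sketch of the spot-check idea described above (the story counts and the 49% rate are assumed for illustration, roughly in line with the "about one in two" figure cited): estimate an error rate from a random sample of n stories out of the N published in a period, using the standard error of a proportion sampled without replacement.

```python
from math import sqrt

def error_rate_se(p_hat, n, N):
    """Standard error of an error-rate estimate from a spot check of n stories,
    sampled without replacement from the N stories published in the period."""
    fpc = (N - n) / (N - 1)          # finite population correction
    return sqrt(p_hat * (1 - p_hat) / n * fpc)

p_hat = 0.49   # illustrative error rate, in line with the roughly one-in-two figure above
for period, n, N in [("daily", 3, 60), ("weekly", 21, 420), ("monthly", 90, 1800)]:
    se = error_rate_se(p_hat, n, N)
    print(f"{period:8s} n={n:3d}: estimate {p_hat:.0%} +/- {1.96 * se:.0%} (95% interval)")
# Daily estimates are nearly useless (around +/- 56 points); weekly narrows to
# roughly +/- 21 points, and monthly to roughly +/- 10 points.
```
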
Weiye Loh

The Decline Effect and the Scientific Method : The New Yorker - 0 views

  • On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties.
  • the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
  • Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
  • But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
  • the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.
  • At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”
  • In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”
  • The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time!
  • this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
  • In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
  • the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.
  • the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
  • Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
  • While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts
  • an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.
  • The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results.
  • Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
  • Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
  • One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
  • John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.”
  • In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the “gold standard” of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
  • The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
  • the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
  • The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
  • scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,” he says. The current “obsession” with replicability distracts from the real problem, which is faulty design. He notes that nobody even tries to replicate most science papers—there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.)
  • Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
  • Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn’t erase the decline effect. This is largely because scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging
  • John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
  • The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
  • The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
  • This suggests that the decline effect is actually a decline of illusion.
  • While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn’t changed. The law of gravity remains the same.
  • Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
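
The two mechanisms the article leans on, regression to the mean and publication bias, can be made concrete with a toy simulation (every parameter below is invented): a small true effect, many small initial studies of which only the statistically significant positive ones get "published", then larger replications reported regardless of outcome. The published early literature overstates the effect, and replications appear to decline toward the true value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

TRUE_EFFECT = 0.2    # assumed true standardized effect size
N_STUDIES = 2000

def simulate_studies(n_per_group):
    """Observed mean differences and p values for N_STUDIES two-group studies."""
    effects, pvals = [], []
    for _ in range(N_STUDIES):
        treated = rng.normal(TRUE_EFFECT, 1.0, n_per_group)
        control = rng.normal(0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(treated, control)
        effects.append(treated.mean() - control.mean())
        pvals.append(p)
    return np.array(effects), np.array(pvals)

# Early literature: small studies, and only significant positive results get published.
eff_small, p_small = simulate_studies(n_per_group=20)
published = eff_small[(p_small < 0.05) & (eff_small > 0)]

# Later replications: larger samples, reported whatever the outcome.
eff_large, _ = simulate_studies(n_per_group=200)

print(f"true effect:                       {TRUE_EFFECT:.2f}")
print(f"mean published early effect:       {published.mean():.2f} "
      f"({len(published)} of {N_STUDIES} small studies reach p < .05)")
print(f"mean effect in large replications: {eff_large.mean():.2f}")
# The drop from the second number to the third is not the effect wearing off;
# it is the selection filter being removed.
```
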
Weiye Loh

Odds Are, It's Wrong - Science News - 0 views

  • science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.
  • a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.
  • science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.
  • Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.
  • “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.” Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”
  • In 2007, for instance, researchers combing the medical literature found numerous studies linking a total of 85 genetic variants in 70 different genes to acute coronary syndrome, a cluster of heart problems. When the researchers compared genetic tests of 811 patients that had the syndrome with a group of 650 (matched for sex and age) that didn’t, only one of the suspect gene variants turned up substantially more often in those with the syndrome — a number to be expected by chance. “Our null results provide no support for the hypothesis that any of the 85 genetic variants tested is a susceptibility factor” for the syndrome, the researchers reported in the Journal of the American Medical Association. How could so many studies be wrong? Because their conclusions relied on “statistical significance,” a concept at the heart of the mathematical analysis of modern scientific experiments.
  • Statistical significance is a phrase that every science graduate student learns, but few comprehend. While its origins stretch back at least to the 19th century, the modern notion was pioneered by the mathematician Ronald A. Fisher in the 1920s. His original interest was agriculture. He sought a test of whether variation in crop yields was due to some specific intervention (say, fertilizer) or merely reflected random factors beyond experimental control. Fisher first assumed that fertilizer caused no difference — the “no effect” or “null” hypothesis. He then calculated a number called the P value, the probability that an observed yield in a fertilized field would occur if fertilizer had no real effect. If P is less than .05 — meaning the chance of a fluke is less than 5 percent — the result should be declared “statistically significant,” Fisher arbitrarily declared, and the no effect hypothesis should be rejected, supposedly confirming that fertilizer works. Fisher’s P value eventually became the ultimate arbiter of credibility for science results of all sorts
  • But in fact, there’s no logical basis for using a P value from a single study to draw any conclusion. If the chance of a fluke is less than 5 percent, two possible conclusions remain: There is a real effect, or the result is an improbable fluke. Fisher’s method offers no way to know which is which. On the other hand, if a study finds no statistically significant effect, that doesn’t prove anything, either. Perhaps the effect doesn’t exist, or maybe the statistical test wasn’t powerful enough to detect a small but real effect.
  • Soon after Fisher established his system of statistical significance, it was attacked by other mathematicians, notably Egon Pearson and Jerzy Neyman. Rather than testing a null hypothesis, they argued, it made more sense to test competing hypotheses against one another. That approach also produces a P value, which is used to gauge the likelihood of a “false positive” — concluding an effect is real when it actually isn’t. What  eventually emerged was a hybrid mix of the mutually inconsistent Fisher and Neyman-Pearson approaches, which has rendered interpretations of standard statistics muddled at best and simply erroneous at worst. As a result, most scientists are confused about the meaning of a P value or how to interpret it. “It’s almost never, ever, ever stated correctly, what it means,” says Goodman.
  • experimental data yielding a P value of .05 means that there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct). But many explanations mangle the subtleties in that definition. A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: “This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.”
  • That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. A dog may bark 5 percent of the time even if it is well-fed all of the time. (See Box 2)
    • Weiye Loh
       
      Does the problem, then, lie not in statistics, but in the interpretation of statistics? Is the fallacy of appeal to probability at work in such interpretations?
  • Another common error equates statistical significance to “significance” in the ordinary use of the word. Because of the way statistical formulas work, a study with a very large sample can detect “statistical significance” for a small effect that is meaningless in practical terms. A new drug may be statistically better than an old drug, but for every thousand people you treat you might get just one or two additional cures — not clinically significant. Similarly, when studies claim that a chemical causes a “significantly increased risk of cancer,” they often mean that it is just statistically significant, possibly posing only a tiny absolute increase in risk.
  • Statisticians perpetually caution against mistaking statistical significance for practical importance, but scientific papers commit that error often. Ziliak studied journals from various fields — psychology, medicine and economics among others — and reported frequent disregard for the distinction.
  • “I found that eight or nine of every 10 articles published in the leading journals make the fatal substitution” of equating statistical significance to importance, he said in an interview. Ziliak’s data are documented in the 2008 book The Cult of Statistical Significance, coauthored with Deirdre McCloskey of the University of Illinois at Chicago.
  • Multiplicity of mistakes: Even when “significance” is properly defined and P values are carefully calculated, statistical inference is plagued by many other problems. Chief among them is the “multiplicity” issue — the testing of many hypotheses simultaneously. When several drugs are tested at once, or a single drug is tested on several groups, chances of getting a statistically significant but false result rise rapidly.
  • Recognizing these problems, some researchers now calculate a “false discovery rate” to warn of flukes disguised as real effects. And genetics researchers have begun using “genome-wide association studies” that attempt to ameliorate the multiplicity issue (SN: 6/21/08, p. 20).
  • Many researchers now also commonly report results with confidence intervals, similar to the margins of error reported in opinion polls. Such intervals, usually given as a range that should include the actual value with 95 percent confidence, do convey a better sense of how precise a finding is. But the 95 percent confidence calculation is based on the same math as the .05 P value and so still shares some of its problems.
  • Statistical problems also afflict the “gold standard” for medical research, the randomized, controlled clinical trials that test drugs for their ability to cure or their power to harm. Such trials assign patients at random to receive either the substance being tested or a placebo, typically a sugar pill; random selection supposedly guarantees that patients’ personal characteristics won’t bias the choice of who gets the actual treatment. But in practice, selection biases may still occur, Vance Berger and Sherri Weinstein noted in 2004 in Controlled Clinical Trials. “Some of the benefits ascribed to randomization, for example that it eliminates all selection bias, can better be described as fantasy than reality,” they wrote.
  • Randomization also should ensure that unknown differences among individuals are mixed in roughly the same proportions in the groups being tested. But statistics do not guarantee an equal distribution any more than they prohibit 10 heads in a row when flipping a penny. With thousands of clinical trials in progress, some will not be well randomized. And DNA differs at more than a million spots in the human genetic catalog, so even in a single trial differences may not be evenly mixed. In a sufficiently large trial, unrandomized factors may balance out, if some have positive effects and some are negative. (See Box 3) Still, trial results are reported as averages that may obscure individual differences, masking beneficial or harmful effects and possibly leading to approval of drugs that are deadly for some and denial of effective treatment to others.
  • Another concern is the common strategy of combining results from many trials into a single “meta-analysis,” a study of studies. In a single trial with relatively few participants, statistical tests may not detect small but real and possibly important effects. In principle, combining smaller studies to create a larger sample would allow the tests to detect such small effects. But statistical techniques for doing so are valid only if certain criteria are met. For one thing, all the studies conducted on the drug must be included — published and unpublished. And all the studies should have been performed in a similar way, using the same protocols, definitions, types of patients and doses. When combining studies with differences, it is necessary first to show that those differences would not affect the analysis, Goodman notes, but that seldom happens. “That’s not a formal part of most meta-analyses,” he says.
  • Meta-analyses have produced many controversial conclusions. Common claims that antidepressants work no better than placebos, for example, are based on meta-analyses that do not conform to the criteria that would confer validity. Similar problems afflicted a 2007 meta-analysis, published in the New England Journal of Medicine, that attributed increased heart attack risk to the diabetes drug Avandia. Raw data from the combined trials showed that only 55 people in 10,000 had heart attacks when using Avandia, compared with 59 people per 10,000 in comparison groups. But after a series of statistical manipulations, Avandia appeared to confer an increased risk.
  • combining small studies in a meta-analysis is not a good substitute for a single trial sufficiently large to test a given question. “Meta-analyses can reduce the role of chance in the interpretation but may introduce bias and confounding,” Hennekens and DeMets write in the Dec. 2 Journal of the American Medical Association. “Such results should be considered more as hypothesis formulating than as hypothesis testing.”
  • Some studies show dramatic effects that don’t require sophisticated statistics to interpret. If the P value is 0.0001 — a hundredth of a percent chance of a fluke — that is strong evidence, Goodman points out. Besides, most well-accepted science is based not on any single study, but on studies that have been confirmed by repetition. Any one result may be likely to be wrong, but confidence rises quickly if that result is independently replicated. “Replication is vital,” says statistician Juliet Shaffer, a lecturer emeritus at the University of California, Berkeley. And in medicine, she says, the need for replication is widely recognized. “But in the social sciences and behavioral sciences, replication is not common,” she noted in San Diego in February at the annual meeting of the American Association for the Advancement of Science. “This is a sad situation.”
  • Most critics of standard statistics advocate the Bayesian approach to statistical reasoning, a methodology that derives from a theorem credited to Bayes, an 18th century English clergyman. His approach uses similar math, but requires the added twist of a “prior probability” — in essence, an informed guess about the expected probability of something in advance of the study. Often this prior probability is more than a mere guess — it could be based, for instance, on previous studies.
  • it basically just reflects the need to include previous knowledge when drawing conclusions from new observations. To infer the odds that a barking dog is hungry, for instance, it is not enough to know how often the dog barks when well-fed. You also need to know how often it eats — in order to calculate the prior probability of being hungry. Bayesian math combines a prior probability with observed data to produce an estimate of the likelihood of the hunger hypothesis. “A scientific hypothesis cannot be properly assessed solely by reference to the observational data,” but only by viewing the data in light of prior belief in the hypothesis, wrote George Diamond and Sanjay Kaul of UCLA’s School of Medicine in 2004 in the Journal of the American College of Cardiology. “Bayes’ theorem is ... a logically consistent, mathematically valid, and intuitive way to draw inferences about the hypothesis.” (See Box 4)
  • In many real-life contexts, Bayesian methods do produce the best answers to important questions. In medical diagnoses, for instance, the likelihood that a test for a disease is correct depends on the prevalence of the disease in the population, a factor that Bayesian math would take into account.
  • But Bayesian methods introduce a confusion into the actual meaning of the mathematical concept of “probability” in the real world. Standard or “frequentist” statistics treat probabilities as objective realities; Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics. “Subjective prior beliefs are anathema to the frequentist, who relies instead on a series of ad hoc algorithms that maintain the facade of scientific objectivity,” Diamond and Kaul wrote. Conflict between frequentists and Bayesians has been ongoing for two centuries. So science’s marriage to mathematics seems to entail some irreconcilable differences. Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability. “What does probability mean in real life?” the statistician David Salsburg asked in his 2001 book The Lady Tasting Tea. “This problem is still unsolved, and ... if it remains unsolved, the whole of the statistical approach to science may come crashing down from the weight of its own inconsistencies.”
  •  
    Odds Are, It's Wrong: Science fails to face the shortcomings of statistics
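
Two of the article's points are easy to reproduce numerically; the sketch below uses invented data, not the actual study's. First, test 85 gene variants that truly do nothing, with group sizes like the cited study's 811 cases and 650 controls, and a handful will typically clear p < .05 by chance alone. Second, the "transposed conditional": the probability that a significant result reflects a real effect depends on the prior odds that the hypothesis was true and on the study's power, and with plausible values it is closer to a coin flip than to 95 percent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Multiplicity: 85 truly null variants, tested in 811 cases vs. 650 controls.
N_VARIANTS, N_CASES, N_CONTROLS = 85, 811, 650
CARRIER_FREQ = 0.3    # assumed population frequency of each variant

false_hits = 0
for _ in range(N_VARIANTS):
    cases = rng.binomial(N_CASES, CARRIER_FREQ)        # carriers among cases
    controls = rng.binomial(N_CONTROLS, CARRIER_FREQ)  # carriers among controls
    table = [[cases, N_CASES - cases], [controls, N_CONTROLS - controls]]
    p = stats.chi2_contingency(table)[1]
    false_hits += p < 0.05
print(f"variants reaching p < .05 with zero real effects: {false_hits} "
      f"(about {0.05 * N_VARIANTS:.0f} expected by chance)")

# Transposed conditional: P(effect is real | p < .05) depends on the prior.
prior = 0.1    # assumed: 1 in 10 tested hypotheses is actually true
power = 0.5    # assumed power of a typical study
alpha = 0.05
ppv = (power * prior) / (power * prior + alpha * (1 - prior))
print(f"P(effect is real | p < .05) = {ppv:.2f}, not 0.95")
```
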
Weiye Loh

Referees' quotes - 2010 - 2010 - Environmental Microbiology - Wiley Online Library - 0 views

  • This paper is desperate. Please reject it completely and then block the author's email ID so they can't use the online system in future.
  • The type of lava vs. diversity has no meaning if only one of each sample is analyzed; multiple samples are required for generality. This controls provenance (e.g. maybe some beetle took a pee on one or the other of the samples, seriously skewing relevance to lava composition).
  • Merry X-mas! First, my recommendation was reject with new submission, because it is necessary to investigate further, but reading a well written manuscript before X-mas makes me feel like Santa Claus.
  • Season's Greetings! I apologise for my slow response but a roast goose prevented me from answering emails for a few days.
  • I started to review this but could not get much past the abstract.
  • Stating that the study is confirmative is not a good start for the Discussion. Rephrasing the first sentence of the Discussion would seem to be a good idea.
  • Reject – More holes than my grandad's string vest!
  • The writing and data presentation are so bad that I had to leave work and go home early and then spend time to wonder what life is about.
  • Sorry for the overdue, it seems to me that ‘overdue’ is my constant, persistent and chronic EMI status. Good that the reviewers are not getting red cards! The editors could create, in addition to the referees quotes, a ranking for ‘on-time’ referees. I would get the bottom place. But fast is not equal to good (I am consoling myself!)
  • It hurts me a little to have so little criticism of a manuscript.
  • Based on titles seen in journals, many authors seem to be more fascinated these days by their methods than by their science. The authors should be encouraged to abstract the main scientific (i.e., novel) finding into the title.
Weiye Loh

Shakespeare? He's in my DNA | plus.maths.org - 0 views

  •  
    "not only can scientists read DNA sequences from biological samples, they can also "write" them. In the lab they can produce strands of DNA corresponding to particular strings of nucleotides, denoted by the letters A, G, T and C. So if you encode your information in terms of these four letters, you could theoretically store it in DNA. "We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it," explains Nick Goldman of the EMBL-European Bioinformatics Institute (EMBL-EBI). "It's also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.""
Chen Guo Lim

POLICE & THIEF - 5 views

According to the readings, one reason why people do not consider illegal downloads as theft is that it does not deprive others of that item. When I download an mp3 file from, the file will not disa...

Weiye Loh

Rationally Speaking: The problem of replicability in science - 0 views

  • The problem of replicability in science, by Massimo Pigliucci [image credit: xkcd]
  • In recent months much has been written about the apparent fact that a surprising, indeed disturbing, number of scientific findings cannot be replicated, or when replicated the effect size turns out to be much smaller than previously thought.
  • Arguably, the recent streak of articles on this topic began with one penned by David Freedman in The Atlantic, and provocatively entitled “Lies, Damned Lies, and Medical Science.” In it, the major character was John Ioannidis, the author of some influential meta-studies about the low degree of replicability and high number of technical flaws in a significant portion of published papers in the biomedical literature.
  • ...18 more annotations...
  • As Freedman put it in The Atlantic: “80 percent of non-randomized studies (by far the most common type) turn out to be wrong, as do 25 percent of supposedly gold-standard randomized trials, and as much as 10 percent of the platinum-standard large randomized trials.” Ioannidis himself was quoted uttering some sobering words for the medical community (and the public at large): “Science is a noble endeavor, but it’s also a low-yield endeavor. I’m not sure that more than a very small percentage of medical research is ever likely to lead to major improvements in clinical outcomes and quality of life. We should be very comfortable with that fact.”
  • Julia and I actually addressed this topic during a Rationally Speaking podcast, featuring as guest our friend Steve Novella, of Skeptics’ Guide to the Universe and Science-Based Medicine fame. But while Steve did quibble with the tone of the Atlantic article, he agreed that Ioannidis’ results are well known and accepted by the medical research community. Steve did point out that it should not be surprising that results get better and better as one moves toward more stringent protocols like large randomized trials, but it seems to me that one should be surprised (actually, appalled) by the fact that even there the percentage of flawed studies is high — not to mention the fact that most studies are in fact neither large nor properly randomized.
  • The second big recent blow to public perception of the reliability of scientific results is an article published in The New Yorker by Jonah Lehrer, entitled “The truth wears off.” Lehrer also mentions Ioannidis, but the bulk of his essay is about findings in psychiatry, psychology and evolutionary biology (and even in research on the paranormal!).
  • In these disciplines there are now several documented cases of results that were initially spectacularly positive — for instance the effects of second generation antipsychotic drugs, or the hypothesized relationship between a male’s body symmetry and the quality of his genes — that turned out to be increasingly difficult to replicate over time, with the original effect sizes being cut down dramatically, or even disappearing altogether.
  • As Lehrer concludes at the end of his article: “Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling.”
  • None of this should actually be particularly surprising to any practicing scientist. If you have spent a significant part of your life in labs and reading the technical literature, you will appreciate the difficulties posed by empirical research, not to mention a number of issues such as the fact that few scientists ever actually bother to replicate someone else’s results, for the simple reason that there is no Nobel (or even funded grant, or tenured position) waiting for the guy who arrived second.
  • In the midst of this I was directed by a tweet by my colleague Neil deGrasse Tyson (who has also appeared on the RS podcast, though in a different context) to a recent ABC News article penned by John Allen Paulos, which was meant to explain the decline effect in science.
  • Paulos’ article is indeed concise and on the mark (though several of the explanations he proposes were already brought up in both the Atlantic and New Yorker essays), but it doesn’t really make things much better.
  • Paulos suggests that one explanation for the decline effect is the well-known statistical phenomenon of regression toward the mean. This phenomenon is responsible, among other things, for a fair number of superstitions: you’ve probably heard of some athletes’ and other celebrities’ fear of being featured on the cover of a magazine after a particularly impressive series of accomplishments, because this brings “bad luck,” meaning that the following year one will not be able to repeat the performance at the same level. This is actually true, not because of magical reasons, but simply as a result of regression to the mean: extraordinary performances are the result of a large number of factors that have to line up just right for the spectacular result to be achieved. The statistical chances of such an alignment repeating itself are low, so next year’s performance will likely be below par. Paulos correctly argues that this also explains some of the decline effect of scientific results: the first discovery might have been the result of a number of factors that are unlikely to repeat themselves in exactly the same way, thus reducing the effect size when the study is replicated (see the simulation sketch at the end of these notes).
  • Another major determinant of the unreliability of scientific results mentioned by Paulos is the well-known problem of publication bias: crudely put, science journals (particularly the high-profile ones, like Nature and Science) are interested only in positive, spectacular, “sexy” results. This creates a powerful filter against negative, or marginally significant, results. What you see in science journals, in other words, isn’t a statistically representative sample of scientific results, but a highly biased one, in favor of positive outcomes. No wonder that when people try to repeat the feat they often come up empty handed.
  • A third cause for the problem, not mentioned by Paulos but addressed in the New Yorker article, is the selective reporting of results by scientists themselves. This is essentially the same phenomenon as the publication bias, except that this time it is scientists themselves, not editors and reviewers, who don’t bother to submit for publication results that are either negative or not strongly conclusive. Again, the outcome is that what we see in the literature isn’t all the science that we ought to see. And it’s no good to argue that it is the “best” science, because the quality of scientific research is measured by the appropriateness of the experimental protocols (including the use of large samples) and of the data analyses — not by whether the results happen to confirm the scientist’s favorite theory.
  • The conclusion of all this is not, of course, that we should throw the baby (science) out with the bath water (bad or unreliable results). But scientists should also be under no illusion that these are rare anomalies that do not affect scientific research at large. Too much emphasis is being put on the “publish or perish” culture of modern academia, with the result that graduate students are explicitly instructed to go for the SPUs — Smallest Publishable Units — when they have to decide how much of their work to submit to a journal. That way they maximize the number of their publications, which maximizes the chances of landing a postdoc position, then a tenure-track one, then of getting grants funded, and finally of getting tenure. The result is that, according to statistics published by Nature, about ⅓ of published studies are never cited (not to mention replicated!).
  • “Scientists these days tend to keep up the polite fiction that all science is equal. Except for the work of the misguided opponent whose arguments we happen to be refuting at the time, we speak as though every scientist’s field and methods of study are as good as every other scientist’s, and perhaps a little better. This keeps us all cordial when it comes to recommending each other for government grants. ... We speak piously of taking measurements and making small studies that will ‘add another brick to the temple of science.’ Most such bricks lie around the brickyard.”
    • Weiye Loh
       
      Written by John Platt in a "Science" article published in 1964
  • Most damning of all, however, is the potential effect that all of this may have on science’s already dubious reputation with the general public (think evolution-creation, vaccine-autism, or climate change).
  • “If we don’t tell the public about these problems, then we’re no better than non-scientists who falsely claim they can heal. If the drugs don’t work and we’re not sure how to treat something, why should we claim differently? Some fear that there may be less funding because we stop claiming we can prove we have miraculous treatments. But if we can’t really provide those miracles, how long will we be able to fool the public anyway? The scientific enterprise is probably the most fantastic achievement in human history, but that doesn’t mean we have a right to overstate what we’re accomplishing.”
  • Joseph T. Lapp said... But is any of this new for science? Perhaps science has operated this way all along, full of fits and starts, mostly duds. How do we know that this isn't the optimal way for science to operate? My issues are with the understanding of science that high school graduates have, and with the reporting of science.
    • Weiye Loh
       
      It's the media at fault again.
  • What seems to have emerged in recent decades is a change in the institutional setting that got science advancing spectacularly since the establishment of the Royal Society. Flaws in the system such as corporate funded research, pal-review instead of peer-review, publication bias, science entangled with policy advocacy, and suchlike, may be distorting the environment, making it less suitable for the production of good science, especially in some fields.
  • Remedies should exist, but they should evolve rather than being imposed on a reluctant sociological-economic science establishment driven by powerful motives such as professional advance or funding. After all, who or what would have the authority to impose those rules, other than the scientific establishment itself?
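The regression-to-the-mean and publication-bias mechanisms discussed in the notes above can be seen in a few lines of simulation (this is the sketch referenced there): if only "significant" first studies get published, the published effect sizes are inflated, and unfiltered replications then come in lower on average. The true effect, sample size and significance threshold below are invented purely for illustration.

```python
# A small simulation of the "decline effect" via publication bias plus
# regression to the mean. All parameters are invented for illustration.
import random
import statistics

random.seed(0)
TRUE_EFFECT = 0.2     # the real (modest) effect size, in SD units
N = 30                # per-study sample size
THRESHOLD = 0.37      # roughly the observed effect needed for p < .05 at this n

def run_study():
    """Return the observed mean effect in one noisy study."""
    return statistics.mean(random.gauss(TRUE_EFFECT, 1.0) for _ in range(N))

published_first, replications = [], []
for _ in range(20000):
    first = run_study()
    if first > THRESHOLD:                    # only "significant" results get published
        published_first.append(first)
        replications.append(run_study())     # the replication faces no such filter

print(f"true effect:             {TRUE_EFFECT:.2f}")
print(f"mean published effect:   {statistics.mean(published_first):.2f}")
print(f"mean replication effect: {statistics.mean(replications):.2f}")
# Typical run: the published average is roughly twice the true effect, while
# replications average close to 0.2 -- the effect "declines" even though
# nothing about the underlying phenomenon changed.
```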
Weiye Loh

Effective media reporting of sea level rise projections: 1989-2009 - 0 views

  •  
    In the mass media, sea level rise is commonly associated with the impacts of climate change due to increasing atmospheric greenhouse gases. As this issue garners ongoing international policy attention, segments of the scientific community have expressed unease about how this has been covered by mass media. Therefore, this study examines how sea level rise projections (in IPCC Assessment Reports and a sample of the scientific literature) have been represented in seven prominent United States (US) and United Kingdom (UK) newspapers over the past two decades. The research found that, with few exceptions, journalists have accurately portrayed scientific research on sea level rise projections to 2100. Moreover, while coverage has predictably increased in the past 20 years, journalists have paid particular attention to the issue in years when an IPCC report is released or when major international negotiations take place, rather than when direct research is completed and specific projections are published. We reason that the combination of these factors has contributed to a perceived problem in sea level rise reporting by the scientific community, although systematic empirical research shows none. In this contemporary high-stakes, high-profile and highly politicized arena of climate science and policy interactions, such results mark a particular bright spot in media representations of climate change. These findings can also contribute to more measured considerations of climate impacts and policy action at a critical juncture of international negotiations and everyday decision-making associated with the causes and consequences of climate change.
Weiye Loh

Hiding the Decline | Climate Etc. - 0 views

  • we need to understand the magnitude and characteristics and causes of natural climate variability over the current interglacial, particularly the last 2000 years.  I’m more interested in the handle than the blade of the hockey stick.  I also view understanding regional climate variations as much more important than trying to use some statistical model to create global average anomalies (which I personally regard as pointless, given the sampling issue).
  • I am really hoping that the AR5 will do a better job of providing a useful analysis and assessment of the paleodata for the last millennium.  However I am not too optimistic. There was another Workshop in Lisbon this past year (Sept 2010), on the Medieval Warm Period.  The abstracts for the presentations are found here.  No surprises, many of the usual people doing the usual things.
  • This raises the issue as to whether there is any value at all in the tree ring analyses for this application, and whether these paleoreconstructions can tell us anything.  Apart from the issue of the proxies not matching the observations from the current period of warming (which is also the period of best historical data), there is the further issue as to whether these hemispheric or global temperature analyses make any sense at all because of the sampling issue.  I am personally having a difficult time in seeing how this stuff has any credibility at the level of “likely” confidence levels reported in the TAR and AR4.
  • ...5 more annotations...
  • There is no question that the diagrams and accompanying text in the IPCC TAR, AR4 and WMO 1999 are misleading.  I was misled.  Upon considering the material presented in these reports, it did not occur to me that recent paleo data was not consistent with the historical record.  The one statement in AR4 (put in after McIntyre’s insistence as a reviewer) that mentions the divergence problem is weak tea.
  • It is obvious that there has been deletion of adverse data in figures shown in the IPCC AR3 and AR4, and the 1999 WMO document. Not only is this misleading, but it is dishonest (I agree with Muller on this one). The authors defend themselves by stating that there has been no attempt to hide the divergence problem in the literature, and that the relevant paper was referenced. I infer then that there is something in the IPCC process or the authors’ interpretation of the IPCC process (i.e. don’t dilute the message) that corrupted the scientists into deleting the adverse data in these diagrams.
  • McIntyre’s analysis is sufficiently well documented that it is difficult to imagine that his analysis is incorrect in any significant way.  If his analysis is incorrect, it should be refuted.  I would like to know what the heck Mann, Briffa, Jones et al. were thinking when they did this and why they did this, and how they can defend this, although the emails provide pretty strong clues.  Does the IPCC regard this as acceptable?  I sure don’t.
  • paleoproxies are outside the arena of my personal research expertise, and I find my eyes glaze over when I start reading about bristlecones, etc.  However, two things this week have changed my mind, and I have decided to take on one aspect of this issue: the infamous “hide the decline.” The first thing that contributed to my mind change was this post at Bishop Hill entitled “Will Sir John condemn hide the decline?”, related to Sir John Beddington’s statement:  It is time the scientific community became proactive in challenging misuse of scientific evidence.
  • The second thing was this youtube clip of physicist Richard Muller (Director of the Berkeley Earth Project), where he discusses “hide the decline” and vehemently refers to this as “dishonest,” and says “you are not allowed to do this,” and further states that he intends not to read further papers by these authors (note “hide the decline” appears around minute 31 into the clip).  While most of his research is in physics, Muller has also published important papers on paleoclimate, including a controversial paper that supported McIntyre and McKitrick’s analysis.
Weiye Loh

Ads Implant False Memories | Wired Science | Wired.com - 0 views

  • The experiment went like this: 100 undergraduates were introduced to a new popcorn product called “Orville Redenbacher’s Gourmet Fresh Microwave Popcorn.” (No such product exists, but that’s the point.) Then, the students were randomly assigned to various advertisement conditions. Some subjects viewed low-imagery text ads, which described the delicious taste of this new snack food. Others watched a high-imagery commercial, in which they watched all sorts of happy people enjoying this popcorn in their living room. After viewing the ads, the students were then assigned to one of two rooms. In one room, they were given an unrelated survey. In the other room, however, they were given a sample of this fictional new popcorn to taste. (A different Orville Redenbacher popcorn was actually used.) One week later, all the subjects were quizzed about their memory of the product. Here’s where things get disturbing: While students who saw the low-imagery ad were extremely unlikely to report having tried the popcorn, those who watched the slick commercial were just as likely to have said they tried the popcorn as those who actually did. Furthermore, their ratings of the product were as favorable as those who sampled the salty, buttery treat. Most troubling, perhaps, is that these subjects were extremely confident in these made-up memories. The delusion felt true. They didn’t like the popcorn because they’d seen a good ad. They liked the popcorn because it was delicious.
  • “false experience effect,”
  • “Viewing the vivid advertisement created a false memory of eating the popcorn, despite the fact that eating the non-existent product would have been impossible,” write Priyali Rajagopal and Nicole Montgomery, the lead authors on the paper. “As a result, consumers need to be vigilant while processing high-imagery advertisements.”
  • ...2 more annotations...
  • How could a stupid commercial trick me into believing that I loved a product I’d never actually tasted? Or that I drank Coke out of glass bottles? The answer returns us to a troubling recent theory known as memory reconsolidation. In essence, reconsolidation is rooted in the fact that every time we recall a memory we also remake it, subtly tweaking the neuronal details. Although we like to think of our memories as being immutable impressions, somehow separate from the act of remembering them, they aren’t. A memory is only as real as the last time you remembered it. What’s disturbing, of course, is that we can’t help but borrow many of our memories from elsewhere, so that the ad we watched on television becomes our own, part of that personal narrative we repeat and retell.
  • This idea, simple as it seems, requires us to completely re-imagine our assumptions about memory.  It reveals memory as a ceaseless process, not a repository of inert information. The recall is altered in the absence of the original stimulus, becoming less about what we actually remember and more about what we’d like to remember. It’s the difference between a “Save” and the “Save As” function. Our memories are a “Save As”: They are files that get rewritten every time we remember them, which is why the more we remember something, the less accurate the memory becomes. And so that pretty picture of popcorn becomes a taste we definitely remember, and that alluring soda commercial becomes a scene from my own life. We steal our stories from everywhere. Marketers, it turns out, are just really good at giving us stories we want to steal.
  •  
    A new study, published in The Journal of Consumer Research, helps explain both the success of this marketing strategy and my flawed nostalgia for Coke. It turns out that vivid commercials are incredibly good at tricking the hippocampus (a center of long-term memory in the brain) into believing that the scene we just watched on television actually happened. And it happened to us.
Weiye Loh

Gender and time comparisons on Twitter - 0 views

  •  
    Men and women are different. You know that. But do they tweet differently? Tweetolife is a simple application that lets you compare and contrast what men and women tweet about. Simply type in a search term or phrase and compare. For example, search for love, and 63 percent of tweets that contain that word were from women, based on the sample data collected between November 2009 and February 2010.
Weiye Loh

English: Who speaks English? | The Economist - 0 views

  • This was not a statistically controlled study: the subjects took a free test online and of their own accord.  They were by definition connected to the internet and interested in testing their English; they will also be younger and more urban than the population at large.
  • But Philip Hult, the boss of EF, says that his sample shows results similar to a more scientifically controlled but smaller study by the British Council.
  • Wealthy countries do better overall. But smaller wealthy countries do better still: the larger the number of speakers of a country’s main language, the worse that country tends to be at English. This is one reason Scandinavians do so well: what use is Swedish outside Sweden?  It may also explain why Spain was the worst performer in western Europe, and why Latin America was the worst-performing region: Spanish’s role as an international language in a big region dampens incentives to learn English.
  • ...4 more annotations...
  • Export dependency is another correlate with English. Countries that export more are better at English (though it’s not clear which factor causes which).  Malaysia, the best English-performer in Asia, is also the sixth-most export-dependent country in the world.  (Singapore was too small to make the list, or it probably would have ranked similarly.) This is perhaps surprising, given a recent trend towards anti-colonial and anti-Western sentiment in Malaysia’s politics. The study’s authors surmise that English has become seen as a mere tool, divorced in many minds from its associations with Britain and America.
  • Teaching plays a role, too. Starting young, while it seems a good idea, may not pay off: children between eight and 12 learn foreign languages faster than younger ones, so each class hour on English is better spent on a 10-year-old than on a six-year-old.
  • Between 1984 and 2000, the study's authors say, the Netherlands and Denmark began English-teaching between 10 and 12, while Spain and Italy began between eight and 11, with considerably worse results. Mr Hult reckons that poor methods, particularly the rote learning he sees in Japan, can be responsible for poor results despite strenuous efforts.
  • One surprising result is that China and India are next to each other (29th and 30th of 44) in the rankings, despite India’s reputation as more Anglophone. Mr Hult says that the Chinese have made a broad push for English (they're “practically obsessed with it”). But efforts like this take time to marinate through entire economies, and so may have avoided notice by outsiders. India, by contrast, has long had well-known Anglophone elites, but this is a narrow slice of the population in a country considerably poorer and less educated than China. English has helped India out-compete China in services, while China has excelled in manufacturing. But if China keeps up the push for English, the subcontinental neighbour's advantage may not last.
Weiye Loh

A Brief Primer on Criminal Statistics « Canada « Skeptic North - 0 views

  • Occurrences of crime are properly expressed as the number of incidences per 100,000 people. Total numbers are not informative on their own, and it is very easy to manipulate an argument by cherry-picking between a total number and a rate. Beware of claims about crime that use raw incidence numbers. When a change in whole incidence numbers is observed, this might not have any bearing on crime levels at all, because levels of crime depend on population (a small worked example follows these notes).
  • Whole Numbers versus Rates
  • Reliability: Not every criminal statistic is equally reliable. Even though we have measures of incidences of crimes across types and subtypes, not every one of these statistics samples the actual incidence of these crimes in the same way. Indeed, very few measure the total incidences very reliably at all. The crime rates that you are most likely to encounter capture only crimes known and substantiated by police. These numbers are vulnerable to variances in how crimes become known and verified by police in the first place. Crimes very often go unreported or undiscovered. Some crimes are more likely to go unreported than others (such as sexual assaults and drug possession), and some crimes are more difficult to substantiate as having occurred than others.
  • ...9 more annotations...
  • Complicating matters further is the fact that these reporting patterns vary over time and are reflected in observed trends.   So, when a change in the police reported crime rate is observed from year to year or across a span of time we may be observing a “real” change, we may be observing a change in how these crimes come to the attention of police, or we may be seeing a mixture of both.
  • Generally, the most reliable criminal statistic is the homicide rate – it’s very difficult, though not impossible, to miss a dead body. In fact, homicides in Canada are counted in the year that they become known to police and not in the year that they occurred.  Our most reliable number is very, very close, but not infallible.
  • Crimes known to the police nearly always under measure the true incidence of crime, so other measures are needed to better complete our understanding. The reported crimes measure is reported every year to Statistics Canada from data that makes up the Uniform Crime Reporting Survey. This is a very rich data set that measures police data very accurately but tells us nothing about unreported crime.
  • We do have some data on unreported crime available. Victims are interviewed (after self-identifying) via the General Social Survey. The survey is conducted every five years
  • This measure captures information in eight crime categories both reported, and not reported to police. It has its own set of interpretation problems and pathways to misuse. The survey relies on self-reporting, so the accuracy of the information will be open to errors due to faulty memories, willingness to report, recording errors etc.
  • From the last data set available, self-identified victims did not report 69% of violent victimizations (sexual assault, robbery and physical assault), 62% of household victimizations (break and enter, motor vehicle/parts theft, household property theft and vandalism), and 71% of personal property theft victimizations.
  • While people generally understand that crimes go unreported and unknown to police, they tend to be surprised, perhaps even shocked, at how much actually goes unreported. These numbers sound scary. However, the most common reasons reported by victims of violent and household crime for not reporting were: believing the incident was not important enough (68%), believing the police couldn’t do anything about the incident (59%), and stating that the incident was dealt with in another way (42%).
  • Also, note that the survey indicated that 82% of violent incidents did not result in injuries to the victims. Do claims that we should do something about all this hidden crime make sense in light of what this crime looks like in the limited way we can understand it? How could you be reasonably certain that whatever intervention proposed would in fact reduce the actual amount of crime and not just reduce the amount that goes unreported?
  • Data is collected at all levels of the crime continuum with differing levels of accuracy and applicability. This is nicely reflected in the concept of “the crime funnel”. All criminal incidents that are ever committed are at the opening of the funnel. There is “loss” all along the way to the bottom where only a small sample of incidences become known with charges laid, prosecuted successfully and responded to by the justice system.  What goes into the top levels of the funnel affects what we can know at any other point later.
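The rates-versus-raw-counts warning in the first bullet comes down to simple arithmetic (this is the worked example referenced there): a rate is incidents divided by population, scaled to 100,000. The figures below are invented, but they show how a raw count can rise while the rate falls.

```python
# Rates vs. raw counts, with invented figures.
def rate_per_100k(incidents, population):
    return incidents / population * 100_000

# Year 1: 5,000 incidents in a city of 1.0 million people.
# Year 2: 5,400 incidents, but the city has grown to 1.2 million people.
y1 = rate_per_100k(5_000, 1_000_000)    # 500.0 per 100,000
y2 = rate_per_100k(5_400, 1_200_000)    # 450.0 per 100,000

print(f"Raw count: +{5_400 - 5_000} incidents ('crime is up!')")
print(f"Rate: {y1:.0f} -> {y2:.0f} per 100,000 (crime is actually down 10%)")
```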
Weiye Loh

Alzheimer's Studies Find New Genetic Links - NYTimes.com - 0 views

  • The two largest studies of Alzheimer’s disease have led to the discovery of no fewer than five genes that provide intriguing new clues to why the disease strikes and how it progresses.
  • For years, there have been unproven but persistent hints that cholesterol and inflammation are part of the disease process. People with high cholesterol are more likely to get the disease. Strokes and head injuries, which make Alzheimer’s more likely, also cause brain inflammation. Now, some of the newly discovered genes appear to bolster this line of thought, because some are involved with cholesterol and others are linked to inflammation or the transport of molecules inside cells.
  • By themselves, the genes are not nearly as important a factor as APOE, a gene discovered in 1995 that greatly increases risk for the disease: by 400 percent if a person inherits a copy from one parent, by 1,000 percent if from both parents.
  • ...7 more annotations...
  • In contrast, each of the new genes increases risk by no more than 10 to 15 percent; for that reason, they will not be used to decide if a person is likely to develop Alzheimer’s. APOE, which is involved in metabolizing cholesterol, “is in a class of its own,” said Dr. Rudolph Tanzi, a neurology professor at Harvard Medical School and an author of one of the papers.
  • But researchers say that even a slight increase in risk helps them in understanding the disease and developing new therapies. And like APOE, some of the newly discovered genes appear to be involved with cholesterol.
  • The other paper is by researchers in Britain, France and other European countries with contributions from the United States. They confirmed the genes found by the American researchers and added one more gene.
  • The American study got started about three years ago when Gerard D. Schellenberg, a pathology professor at the University of Pennsylvania, went to the National Institutes of Health with a complaint and a proposal. Individual research groups had been doing their own genome studies but not having much success, because no one center had enough subjects. In an interview, Dr. Schellenberg said that he had told Dr. Richard J. Hodes, director of the National Institute on Aging, that the small genomic studies had to stop, and that Dr. Hodes had agreed. These days, Dr. Hodes said, “the old model in which researchers jealously guarded their data is no longer applicable.”
  • So Dr. Schellenberg set out to gather all the data he could on Alzheimer’s patients and on healthy people of the same ages. The idea was to compare one million positions on each person’s genome to determine whether some genes were more common in those who had Alzheimer’s. “I spent a lot of time being nice to people on the phone,” Dr. Schellenberg said. He got what he wanted: nearly every Alzheimer’s center and Alzheimer’s geneticist in the country cooperated. Dr. Schellenberg and his colleagues used the mass of genetic data to do an analysis and find the genes and then, using two different populations, to confirm that the same genes were conferring the risk. That helped assure the investigators that they were not looking at a chance association. It was a huge effort, Dr. Mayeux said. Many medical centers had Alzheimer’s patients’ tissue sitting in freezers. They had to extract the DNA and do genome scans.
  • “One of my jobs was to make sure the Alzheimer’s cases really were cases — that they had used some reasonable criteria” for diagnosis, Dr. Mayeux said. “And I had to be sure that people who were unaffected really were unaffected.”
  • Meanwhile, the European group, led by Dr. Julie Williams of the School of Medicine at Cardiff University, was engaged in a similar effort. Dr. Schellenberg said the two groups compared their results and were reassured that they were largely finding the same genes. “If there were mistakes, we wouldn’t see the same things,” he added. Now the European and American groups are pooling their data to do an enormous study, looking for genes in the combined samples. “We are upping the sample size,” Dr. Schellenberg said. “We are pretty sure more stuff will pop out.”
  •  
    Gene Study Yields
Weiye Loh

New Service Adds Your Drunken Facebook Photos To Employer Background Checks, For Up To ... - 0 views

  •  
    The FTC has given the thumbs-up to a company, Social Intelligence Corp., selling a new kind of employee background check to employers. This one scours the internet for your posts and pictures on social media sites and creates a file of all the dumb stuff you ever uploaded online. For instance, this sample they provided was flagged for "Demonstrating potentially violent behavior" because of a "flagrant display of weapons or bombs." The FTC said that the file, which will last for up to seven years, does not violate the Fair Credit Reporting Act. The company also says that info in your file will be updated when you remove pictures from the social media sites. Forbes reports, "new employers who run searches through Social Intelligence won't have access to the materials if they are completely removed from the Internet."
Weiye Loh

On the Media: Survey shows that not all polls are equal - latimes.com - 0 views

  • Internet surveys sometimes acknowledge how unscientific (read: meaningless) they really are. They surely must be a pale imitation of the rigorous, carefully sampled, thoroughly transparent polls favored by political savants and mainstream news organizations
  • The line between junk and credible polling remains. But it became a little blurrier — creating concern among professional survey organizations and reason for greater skepticism by all of us — because of charges this week that one widely cited pollster may have fabricated data or manipulated it so seriously as to render it meaningless.
  • The founder of the left-leaning Daily Kos website filed a lawsuit in federal court in Oakland on Wednesday, charging that Research 2000, the organization he had commissioned for 1 1/2 years to test voter opinion, had doctored its results.
  • ...4 more annotations...
  • The firm's protestations that it did nothing wrong have been loud and repeated. Evidence against the company is somewhat arcane. Suffice it to say that independent statisticians have found a bewildering lack of statistical "noise" in the company's data. Where random variation would be expected, results are too consistent (see the simulation sketch after these notes).
  • Most reputable pollsters agree on one thing — polling organizations should publicly disclose as much of their methodology as possible. Just for starters, they should reveal how many people were interviewed, how they were selected, how many rejected the survey, how "likely voters" and other sub-groups were defined and how the raw data was weighted to reflect the population, or subgroups.
  • Michael Cornfield, a George Washington University political scientist and polling expert, recommends that concerned citizens ignore the lone, sometimes sensational, poll result. "Trend data are superior to a single point in time," Cornfield said via e-mail, "and consensus results from multiple firms are superior to those conducted by a single outfit."
  • The rest of us should look at none of the polls. Or look at all of them. And look out for the operators not willing to tell us how they're doing business.
  •  
    On the Media: Survey shows not all polls equal
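The "too consistent" charge in the notes above has a simple quantitative basis (this is the sketch referenced there): a genuine random sample of a few hundred respondents should wobble by a couple of percentage points from week to week just from sampling error. The sample size and the true level of support below are made-up values.

```python
# How much week-to-week wobble should honest random sampling produce?
# Sample size and "true" support are invented for illustration.
import random
import statistics

random.seed(1)
TRUE_SUPPORT = 0.52   # fraction of the population backing a candidate
N = 600               # respondents per weekly poll

def weekly_topline():
    """Share of N simulated respondents who say yes."""
    return sum(random.random() < TRUE_SUPPORT for _ in range(N)) / N

toplines = [round(100 * weekly_topline()) for _ in range(20)]
print(toplines)                                   # typically spans several points
print("spread:", max(toplines) - min(toplines), "points")
print("std dev:", round(statistics.pstdev(toplines), 1), "points")
# The binomial standard error is sqrt(p*(1-p)/N), about 2 points here, so a
# series of weekly toplines that never moves by more than a point is itself
# a red flag.
```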
juliet huang

Go slow with Net law - 4 views

Article : Go slow with tech law Published : 23 Aug 2009 Source: Straits Times Background : When Singapore signed a free trade agreement with the USA in 2003, intellectual property rights was a ...

sim lim square

started by juliet huang on 26 Aug 09 no follow-up yet