Weiye Loh

Odds Are, It's Wrong - Science News

  • science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.
  • a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.
  • science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.
  • Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.
  • “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.” Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”
  • In 2007, for instance, researchers combing the medical literature found numerous studies linking a total of 85 genetic variants in 70 different genes to acute coronary syndrome, a cluster of heart problems. When the researchers compared genetic tests of 811 patients that had the syndrome with a group of 650 (matched for sex and age) that didn’t, only one of the suspect gene variants turned up substantially more often in those with the syndrome — a number to be expected by chance. “Our null results provide no support for the hypothesis that any of the 85 genetic variants tested is a susceptibility factor” for the syndrome, the researchers reported in the Journal of the American Medical Association. How could so many studies be wrong? Because their conclusions relied on “statistical significance,” a concept at the heart of the mathematical analysis of modern scientific experiments.
  • Statistical significance is a phrase that every science graduate student learns, but few comprehend. While its origins stretch back at least to the 19th century, the modern notion was pioneered by the mathematician Ronald A. Fisher in the 1920s. His original interest was agriculture. He sought a test of whether variation in crop yields was due to some specific intervention (say, fertilizer) or merely reflected random factors beyond experimental control. Fisher first assumed that fertilizer caused no difference — the “no effect” or “null” hypothesis. He then calculated a number called the P value, the probability that an observed yield in a fertilized field would occur if fertilizer had no real effect. If P is less than .05 — meaning the chance of a fluke is less than 5 percent — the result should be declared “statistically significant,” Fisher arbitrarily declared, and the no effect hypothesis should be rejected, supposedly confirming that fertilizer works. Fisher’s P value eventually became the ultimate arbiter of credibility for science results of all sorts.
  • But in fact, there’s no logical basis for using a P value from a single study to draw any conclusion. If the chance of a fluke is less than 5 percent, two possible conclusions remain: There is a real effect, or the result is an improbable fluke. Fisher’s method offers no way to know which is which. On the other hand, if a study finds no statistically significant effect, that doesn’t prove anything, either. Perhaps the effect doesn’t exist, or maybe the statistical test wasn’t powerful enough to detect a small but real effect.
  • Soon after Fisher established his system of statistical significance, it was attacked by other mathematicians, notably Egon Pearson and Jerzy Neyman. Rather than testing a null hypothesis, they argued, it made more sense to test competing hypotheses against one another. That approach also produces a P value, which is used to gauge the likelihood of a “false positive” — concluding an effect is real when it actually isn’t. What eventually emerged was a hybrid mix of the mutually inconsistent Fisher and Neyman-Pearson approaches, which has rendered interpretations of standard statistics muddled at best and simply erroneous at worst. As a result, most scientists are confused about the meaning of a P value or how to interpret it. “It’s almost never, ever, ever stated correctly, what it means,” says Goodman.
  • experimental data yielding a P value of .05 means that there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct). But many explanations mangle the subtleties in that definition. A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: “This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.”
  • That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. A dog may bark 5 percent of the time even if it is well-fed all of the time. (See Box 2)
    • Weiye Loh
       
      Does the problem, then, lie not in statistics but in the interpretation of statistics? Is the fallacy of appeal to probability at work in such interpretations?
  • Another common error equates statistical significance to “significance” in the ordinary use of the word. Because of the way statistical formulas work, a study with a very large sample can detect “statistical significance” for a small effect that is meaningless in practical terms. A new drug may be statistically better than an old drug, but for every thousand people you treat you might get just one or two additional cures — not clinically significant. Similarly, when studies claim that a chemical causes a “significantly increased risk of cancer,” they often mean that it is just statistically significant, possibly posing only a tiny absolute increase in risk.
  • Statisticians perpetually caution against mistaking statistical significance for practical importance, but scientific papers commit that error often. Ziliak studied journals from various fields — psychology, medicine and economics among others — and reported frequent disregard for the distinction.
  • “I found that eight or nine of every 10 articles published in the leading journals make the fatal substitution” of equating statistical significance to importance, he said in an interview. Ziliak’s data are documented in the 2008 book The Cult of Statistical Significance, coauthored with Deirdre McCloskey of the University of Illinois at Chicago.
  • Multiplicity of mistakes: Even when “significance” is properly defined and P values are carefully calculated, statistical inference is plagued by many other problems. Chief among them is the “multiplicity” issue — the testing of many hypotheses simultaneously. When several drugs are tested at once, or a single drug is tested on several groups, chances of getting a statistically significant but false result rise rapidly.
  • Recognizing these problems, some researchers now calculate a “false discovery rate” to warn of flukes disguised as real effects. And genetics researchers have begun using “genome-wide association studies” that attempt to ameliorate the multiplicity issue (SN: 6/21/08, p. 20).
  • Many researchers now also commonly report results with confidence intervals, similar to the margins of error reported in opinion polls. Such intervals, usually given as a range that should include the actual value with 95 percent confidence, do convey a better sense of how precise a finding is. But the 95 percent confidence calculation is based on the same math as the .05 P value and so still shares some of its problems.
  • Statistical problems also afflict the “gold standard” for medical research, the randomized, controlled clinical trials that test drugs for their ability to cure or their power to harm. Such trials assign patients at random to receive either the substance being tested or a placebo, typically a sugar pill; random selection supposedly guarantees that patients’ personal characteristics won’t bias the choice of who gets the actual treatment. But in practice, selection biases may still occur, Vance Berger and Sherri Weinstein noted in 2004 in Controlled Clinical Trials. “Some of the benefits ascribed to randomization, for example that it eliminates all selection bias, can better be described as fantasy than reality,” they wrote.
  • Randomization also should ensure that unknown differences among individuals are mixed in roughly the same proportions in the groups being tested. But statistics do not guarantee an equal distribution any more than they prohibit 10 heads in a row when flipping a penny. With thousands of clinical trials in progress, some will not be well randomized. And DNA differs at more than a million spots in the human genetic catalog, so even in a single trial differences may not be evenly mixed. In a sufficiently large trial, unrandomized factors may balance out, if some have positive effects and some are negative. (See Box 3) Still, trial results are reported as averages that may obscure individual differences, masking beneficial or harmful effects and possibly leading to approval of drugs that are deadly for some and denial of effective treatment to others.
  • Another concern is the common strategy of combining results from many trials into a single “meta-analysis,” a study of studies. In a single trial with relatively few participants, statistical tests may not detect small but real and possibly important effects. In principle, combining smaller studies to create a larger sample would allow the tests to detect such small effects. But statistical techniques for doing so are valid only if certain criteria are met. For one thing, all the studies conducted on the drug must be included — published and unpublished. And all the studies should have been performed in a similar way, using the same protocols, definitions, types of patients and doses. When combining studies with differences, it is necessary first to show that those differences would not affect the analysis, Goodman notes, but that seldom happens. “That’s not a formal part of most meta-analyses,” he says.
  • Meta-analyses have produced many controversial conclusions. Common claims that antidepressants work no better than placebos, for example, are based on meta-analyses that do not conform to the criteria that would confer validity. Similar problems afflicted a 2007 meta-analysis, published in the New England Journal of Medicine, that attributed increased heart attack risk to the diabetes drug Avandia. Raw data from the combined trials showed that only 55 people in 10,000 had heart attacks when using Avandia, compared with 59 people per 10,000 in comparison groups. But after a series of statistical manipulations, Avandia appeared to confer an increased risk.
  • combining small studies in a meta-analysis is not a good substitute for a single trial sufficiently large to test a given question. “Meta-analyses can reduce the role of chance in the interpretation but may introduce bias and confounding,” Hennekens and DeMets write in the Dec. 2 Journal of the American Medical Association. “Such results should be considered more as hypothesis formulating than as hypothesis testing.”
  • Some studies show dramatic effects that don’t require sophisticated statistics to interpret. If the P value is 0.0001 — a hundredth of a percent chance of a fluke — that is strong evidence, Goodman points out. Besides, most well-accepted science is based not on any single study, but on studies that have been confirmed by repetition. Any one result may be likely to be wrong, but confidence rises quickly if that result is independently replicated. “Replication is vital,” says statistician Juliet Shaffer, a lecturer emeritus at the University of California, Berkeley. And in medicine, she says, the need for replication is widely recognized. “But in the social sciences and behavioral sciences, replication is not common,” she noted in San Diego in February at the annual meeting of the American Association for the Advancement of Science. “This is a sad situation.”
  • Most critics of standard statistics advocate the Bayesian approach to statistical reasoning, a methodology that derives from a theorem credited to Bayes, an 18th century English clergyman. His approach uses similar math, but requires the added twist of a “prior probability” — in essence, an informed guess about the expected probability of something in advance of the study. Often this prior probability is more than a mere guess — it could be based, for instance, on previous studies.
  • it basically just reflects the need to include previous knowledge when drawing conclusions from new observations. To infer the odds that a barking dog is hungry, for instance, it is not enough to know how often the dog barks when well-fed. You also need to know how often it eats — in order to calculate the prior probability of being hungry. Bayesian math combines a prior probability with observed data to produce an estimate of the likelihood of the hunger hypothesis. “A scientific hypothesis cannot be properly assessed solely by reference to the observational data,” but only by viewing the data in light of prior belief in the hypothesis, wrote George Diamond and Sanjay Kaul of UCLA’s School of Medicine in 2004 in the Journal of the American College of Cardiology. “Bayes’ theorem is ... a logically consistent, mathematically valid, and intuitive way to draw inferences about the hypothesis.” (See Box 4)
  • In many real-life contexts, Bayesian methods do produce the best answers to important questions. In medical diagnoses, for instance, the likelihood that a test for a disease is correct depends on the prevalence of the disease in the population, a factor that Bayesian math would take into account.
  • But Bayesian methods introduce a confusion into the actual meaning of the mathematical concept of “probability” in the real world. Standard or “frequentist” statistics treat probabilities as objective realities; Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics. “Subjective prior beliefs are anathema to the frequentist, who relies instead on a series of ad hoc algorithms that maintain the facade of scientific objectivity,” Diamond and Kaul wrote. Conflict between frequentists and Bayesians has been ongoing for two centuries. So science’s marriage to mathematics seems to entail some irreconcilable differences. Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability. “What does probability mean in real life?” the statistician David Salsburg asked in his 2001 book The Lady Tasting Tea. “This problem is still unsolved, and ... if it remains unsolved, the whole of the statistical approach to science may come crashing down from the weight of its own inconsistencies.”
  • Odds Are, It's Wrong: Science fails to face the shortcomings of statistics
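
Two of the points annotated above lend themselves to a quick numerical check: the multiplicity issue and the transposed conditional. The Python sketch below is illustrative only and is not taken from the article. The function names are hypothetical. It assumes that the tests are independent and that every null hypothesis is true when computing the chance of at least one fluke. The dog's prior probability of being hungry (10 percent) and its bark rate when hungry (50 percent) are made-up numbers; only the .05 threshold, the count of 85 variants, and the 5 percent bark rate when well-fed come from the text.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when no real effect exists in any of them (an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of 'significant' flukes among n_tests true nulls."""
    return alpha * n_tests


def posterior(prior: float, p_data_if_true: float, p_data_if_false: float) -> float:
    """Bayes' theorem: probability of the hypothesis given the data."""
    evidence = prior * p_data_if_true + (1.0 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence


# Multiplicity: 85 genetic variants, each tested at the .05 level.
print("P(at least one fluke):", familywise_error_rate(0.05, 85))
print("Expected flukes:", expected_false_positives(0.05, 85))

# Transposed conditional: P(bark | well-fed) = 0.05 is not P(well-fed | bark).
# The prior of 0.10 and the 0.50 bark rate when hungry are hypothetical.
p_hungry_given_bark = posterior(prior=0.10, p_data_if_true=0.50, p_data_if_false=0.05)
print("P(hungry | bark):", p_hungry_given_bark)
```

Under these assumptions, 85 tests at the .05 level are expected to produce about four false positives (85 × 0.05 = 4.25), and the chance of at least one is roughly 99 percent. In the dog example, a bark raises the probability of hunger from 10 percent to about 53 percent, not to 95 percent, and a different prior would give a different answer.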
Weiye Loh

The Decline Effect and the Scientific Method: The New Yorker

  • On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties.
  • the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
  • Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
  • But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
  • the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.
  • At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”
  • In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”
  • The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time!”
  • this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
  • In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
  • the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.
  • the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
  • Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
  • While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts.
  • an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.
  • The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results.
  • Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
  • Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
  • One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
  • John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.”
  • In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the “gold standard” of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
  • The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
  • the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
  • The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
  • scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,” he says. The current “obsession” with replicability distracts from the real problem, which is faulty design. He notes that nobody even tries to replicate most science papers—there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.)
  • Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
  • Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn’t erase the decline effect. This is largely because scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging
  • John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
  • The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
  • The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
  • This suggests that the decline effect is actually a decline of illusion.
  • While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn’t changed. The law of gravity remains the same.
  • Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
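The "significance chasing" Ioannidis describes above can be made concrete with a little arithmetic. The sketch below is an illustration only, not something taken from the article: it assumes a researcher runs several independent analyses on data that contain no real effect, judges each against Fisher's five-per-cent threshold, and asks how likely it is that at least one analysis passes. The function name and the independence assumption are hypothetical choices made for the example.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of a
    true null hypothesis comes out 'significant' at level `alpha`."""
    # Each test stays non-significant with probability (1 - alpha);
    # all of them do so with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        chance = chance_of_false_positive(k)
        print(f"{k:>2} analyses -> {chance:.0%} chance of a spurious hit")
```

Under these assumptions a single analysis has a five-per-cent chance of a fluke, but twenty analyses of the same null data have roughly a 64-per-cent chance of producing at least one "significant" result, which is one way a finding can look strong at first and then fade on replication.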
Weiye Loh

Likert scale - Wikipedia, the free encyclopedia - 0 views

  • Whether individual Likert items can be considered as interval-level data, or whether they should be considered merely ordered-categorical data is the subject of disagreement. Many regard such items only as ordinal data, because, especially when using only five levels, one cannot assume that respondents perceive all pairs of adjacent levels as equidistant. On the other hand, often (as in the example above) the wording of response levels clearly implies a symmetry of response levels about a middle category; at the very least, such an item would fall between ordinal- and interval-level measurement; to treat it as merely ordinal would lose information. Further, if the item is accompanied by a visual analog scale, where equal spacing of response levels is clearly indicated, the argument for treating it as interval-level data is even stronger.
  • When treated as ordinal data, Likert responses can be collated into bar charts, central tendency summarised by the median or the mode (but some would say not the mean), dispersion summarised by the range across quartiles (but some would say not the standard deviation), or analyzed using non-parametric tests, e.g. chi-square test, Mann–Whitney test, Wilcoxon signed-rank test, or Kruskal–Wallis test.[4] Parametric analysis of ordinary averages of Likert scale data is also justifiable by the Central Limit Theorem, although some would disagree that ordinary averages should be used for Likert scale data.
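As a rough illustration of the ordinal treatment described above, the following sketch summarises Likert responses using only their ordering: counts for a bar chart, the median, the mode, and a hand-computed Mann–Whitney U statistic. The response data and the function names are hypothetical, and the sketch stops at the U statistic rather than deriving a p-value.

```python
from collections import Counter
from statistics import median_low, mode

# Hypothetical responses to one five-level item
# (1 = strongly disagree ... 5 = strongly agree).
group_a = [1, 2, 2, 3, 3]
group_b = [3, 4, 4, 5, 5]


def ordinal_summary(responses):
    """Summaries that rely only on the ordering of the response levels."""
    return {
        "counts": dict(sorted(Counter(responses).items())),  # bar-chart data
        "median": median_low(responses),  # always an actual response level
        "mode": mode(responses),
    }


def mann_whitney_u(xs, ys):
    """U statistic for xs: pairs where x outranks y, ties counted as half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )


if __name__ == "__main__":
    print(ordinal_summary(group_a + group_b))
    print("U for group A:", mann_whitney_u(group_a, group_b))
```

`median_low` is used instead of the ordinary median so that the summary never reports a value between two response levels, which would quietly assume the levels are equally spaced.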
Weiye Loh

Singapore M.D.: Arrogance of ignorance 2 - 0 views

  • The truth is not all screening tests are equal, and more importantly, even when a screening test is accurate (yes, I chose to use this term because I am lazy) and we are able to "determine the health condition of an individual", it did not necessarily mean that it was cost-effective to screen the population at large, or indeed a specific patient. Unfortunately, if someone was in a position that "the costs of these tests are a big burden" to him or her, an accurate diagnosis may just be the beginning of more financial burden...
  • But the cost-effectiveness of screening tests is really quite a technical issue that we cannot expect laymen (or even all doctors) to understand - that's not what bothered me about this letter. What bothered me was how a layman can think that a bunch of doctors and statisticians sitting in MOH can be blind to the benefits of "all cancer screenings as well as all tests to detect heart disease", which are so obvious to him.
Weiye Loh

11.01.97 - Misconceptions about the causes of cancer lead to skewed priorities and wast... - 0 views

  • "One of the big misconceptions is that artificial chemicals such as pesticides have a lot to do with human cancer, but that's just not true," says Bruce N. Ames, professor of biochemistry and molecular biology at the University of California at Berkeley and co-author of a new review of what is known about environmental pollution and cancer. "Nevertheless, it's conventional wisdom and society spends billions on this each year." "We consume more carcinogens in one cup of coffee than we get from the pesticide residues on all the fruits and vegetables we eat in a year," he adds.
  • there may be many excellent reasons for cleaning up pollution of our air, water and soil, the researchers say, prevention of cancer is not one of them.
  • "The problem is that lifestyle changes are tough," says Gold, director of the Carcinogenic Potency Project at UC Berkeley's National Institute for Environmental Health Sciences Center and a senior scientist in the cell and molecular biology division at Lawrence Berkeley National Laboratory. "But by targeting pesticide residues as a major problem, we risk making fruits and vegetables more expensive and indirectly increasing cancer risks, especially among the poor."
  • Whereas 99.9 percent of all the chemicals we ingest are natural, 78 percent of the chemicals tested are synthetic. So when more than half of all synthetic chemicals are found to cause cancer in rodents, it's not surprising that people link cancer with synthetic chemicals. But of the natural chemicals in our diet that have been tested in animals, half also cause cancer, Gold says.
  • "We need to recognize that there are far more carcinogens in the natural world than in the synthetic world, and go after the important things, such as lifestyle change."
  • Misconception: Cancer rates are soaring. In fact, the researchers say, if lung cancer due to smoking is excluded, overall cancer deaths in the U.S. have declined 16 percent since 1950.
  • Misconception: Reducing pesticide residues is an effective way to prevent diet-related cancer. Because fruits and vegetables are of major importance in reducing cancer, the unintended effect of requiring expensive efforts to reduce the amount of pesticides remaining on fruits and vegetables will be to increase their cost. This will lead to an increase in cancer among low income people who no longer will be able to afford to eat them.
  • Misconception: Human exposures to carcinogens and other potential hazards are primarily due to synthetic chemicals. Americans actually eat about 10,000 times more natural pesticides from fruits and vegetables than synthetic pesticide residues on food. Natural pesticides are chemicals that plants produce to defend themselves against fungi, insects, and other predators. And half of all natural pesticides tested in rodents turn out to be rodent carcinogens. In addition, we consume many other carcinogens in foods because of the chemicals produced in cooking. In a single cup of roasted coffee, for example, the natural chemicals known to be rodent carcinogens are about equal in weight to an entire year's worth of synthetic pesticide residues.
  • Misconception: Cancer risks to humans can be assessed by standard high-dose animal cancer tests. In cancer tests, animals are given very high, nearly toxic doses. The effect on humans at lower doses is extrapolated from these results, as if the relationship were a straight line from high dose to low dose. However, the fact that half of all chemicals tested, whether natural or synthetic, turn out to cause cancer in rodents implies that this is an artifact of using high doses. High doses of any chemical can chronically kill cells and wound tissue, a risk factor for cancer. "Our conclusion is that the scientific evidence shows that there are high-dose effects," Ames says. "But even though government regulatory agencies recognize this, they still decide which synthetic chemicals to regulate based on linear extrapolation of high dose cancer tests in animals."
  • Misconception: Synthetic chemicals pose greater carcinogenic hazards than natural chemicals. Naturally occurring carcinogens represent an enormous background compared to the low-dose exposures to residues of synthetic chemicals such as pesticides, the researchers conclude. These results call for a reevaluation of whether animal cancer tests are really useful guides for protecting the public against minor hypothetical risks.
  • Misconception: The toxicology of synthetic chemicals is different from that of natural chemicals. No evidence exists for this, but the assumption could lead to unfortunate tradeoffs between natural and synthetic pesticides. Recently, for example, when a new variety of highly insect-resistant celery was introduced on a farm, the workers handling the celery developed rashes when they were exposed to sunlight. The pest-resistant celery turned out to contain almost eight times more natural pesticide in the form of psoralens -- chemicals known to cause cancer and genetic mutations -- than common celery.
  • Misconception: Pesticides and other synthetic chemicals are disrupting human hormones. Claims that synthetic chemicals with hormonal activity contribute to cancer and reduced sperm count ignore the fact that natural chemicals have hormone-like activity millions of times greater than do traces of synthetic chemicals. Rather, lifestyle -- lack of exercise, obesity, alcohol use and reproductive history -- are known to lead to marked changes in hormone levels in the body.
  • Misconception: Regulating low, hypothetical risks advances public health. Society -- primarily the private sector -- will spend an estimated $140 billion to comply with environmental regulations this year, according to projections by the Environmental Protection Agency. Much of this is aimed at reducing low-level human exposure to chemicals solely because they are rodent carcinogens, despite the fact that this rationale is flawed. Our improved ability to detect even minuscule concentrations of chemicals makes regulation even more expensive.
  •  
    BERKELEY -- Despite a lack of convincing evidence that pollution is an important cause of human cancer, this misconception drives government policy today and results in billions of dollars spent to clean up minuscule amounts of synthetic chemicals, say two UC Berkeley researchers.
Weiye Loh

Test Prep for Kindergarten: Kids and Class Privilege » Sociological Images - 0 views

  • a New York Times article covered the stiff competition for entrance to public and private kindergartens in Manhattan for especially smart kids.  Whereas at one time teachers recommended students to these programs, today entrance to both public and private schools for gifted children is dependent entirely on test scores.
  • It’s unfair that entrance into kindergarten level programs is being gamed by people with resources, disadvantaging the most disadvantaged kids from the get go.  I think it’s egregious.  Many people will agree that this isn’t fair.
  • But the more insidious value, the one that almost no one would identify as problematic, is the idea that all parents should do everything they can to give their child advantages.  Even Ms. Stewart thinks so.  “They want to help their kids,” she said. “If I could buy it, I would, too.” Somehow, in the attachment to the idea that we should all help our kids get every advantage, the fact that advantaging your child disadvantages other people’s children gets lost.  If it advantages your child, it must be advantaging him over someone else; otherwise it’s not an advantage, you see?
  • Test prep for kindergartners seems like a pretty blatant example of class privilege. But, of course, the argument that advantaging your own kid necessarily involves disadvantaging someone else’s applies to all sorts of things, from tutoring, to a leisurely summer with which to study for the SAT, to financial support during their unpaid internships, to helping them buy a house and, thus, keeping home prices high. I think it’s worth re-evaluating. Is giving your kid every advantage the moral thing to do?
Weiye Loh

Science, Strong Inference -- Proper Scientific Method - 0 views

  • Scientists these days tend to keep up a polite fiction that all science is equal. Except for the work of the misguided opponent whose arguments we happen to be refuting at the time, we speak as though every scientist's field and methods of study are as good as every other scientist's and perhaps a little better. This keeps us all cordial when it comes to recommending each other for government grants.
  • Why should there be such rapid advances in some fields and not in others? I think the usual explanations that we tend to think of - such as the tractability of the subject, or the quality or education of the men drawn into it, or the size of research contracts - are important but inadequate. I have begun to believe that the primary factor in scientific advance is an intellectual one. These rapidly moving fields are fields where a particular method of doing scientific research is systematically used and taught, an accumulative method of inductive inference that is so effective that I think it should be given the name of "strong inference." I believe it is important to examine this method, its use and history and rationale, and to see whether other groups and individuals might learn to adopt it profitably in their own scientific and intellectual work. In its separate elements, strong inference is just the simple and old-fashioned method of inductive inference that goes back to Francis Bacon. The steps are familiar to every college student and are practiced, off and on, by every scientist. The difference comes in their systematic application. Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly: (1) Devising alternative hypotheses; (2) Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses; (3) Carrying out the experiment so as to get a clean result; (1') Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain, and so on.
  • On any new problem, of course, inductive inference is not as simple and certain as deduction, because it involves reaching out into the unknown. Steps 1 and 2 require intellectual inventions, which must be cleverly chosen so that hypothesis, experiment, outcome, and exclusion will be related in a rigorous syllogism; and the question of how to generate such inventions is one which has been extensively discussed elsewhere (2, 3). What the formal schema reminds us to do is to try to make these inventions, to take the next step, to proceed to the next fork, without dawdling or getting tied up in irrelevancies.
  • It is clear why this makes for rapid and powerful progress. For exploring the unknown, there is no faster method; this is the minimum sequence of steps. Any conclusion that is not an exclusion is insecure and must be rechecked. Any delay in recycling to the next set of hypotheses is only a delay. Strong inference, and the logical tree it generates, are to inductive reasoning what the syllogism is to deductive reasoning in that it offers a regular method for reaching firm inductive conclusions one after the other as rapidly as possible.
  • "But what is so novel about this?" someone will say. This is the method of science and always has been, why give it a special name? The reason is that many of us have almost forgotten it. Science is now an everyday business. Equipment, calculations, lectures become ends in themselves. How many of us write down our alternatives and crucial experiments every day, focusing on the exclusion of a hypothesis? We may write our scientific papers so that it looks as if we had steps 1, 2, and 3 in mind all along. But in between, we do busywork. We become "method-oriented" rather than "problem-oriented." We say we prefer to "feel our way" toward generalizations. We fail to teach our students how to sharpen up their inductive inferences. And we do not realize the added power that the regular and explicit use of alternative hypothesis and sharp exclusion could give us at every step of our research.
  • A distinguished cell biologist rose and said, "No two cells give the same properties. Biology is the science of heterogeneous systems." And he added privately, "You know there are scientists, and there are people in science who are just working with these over-simplified model systems - DNA chains and in vitro systems - who are not doing science at all. We need their auxiliary work: they build apparatus, they make minor studies, but they are not scientists." To which Cy Levinthal replied: "Well, there are two kinds of biologists, those who are looking to see if there is one thing that can be understood and those who keep saying it is very complicated and that nothing can be understood. . . . You must study the simplest system you think has the properties you are interested in."
  • At the 1958 Conference on Biophysics, at Boulder, there was a dramatic confrontation between the two points of view. Leo Szilard said: "The problems of how enzymes are induced, of how proteins are synthesized, of how antibodies are formed, are closer to solution than is generally believed. If you do stupid experiments, and finish one a year, it can take 50 years. But if you stop doing experiments for a little while and think how proteins can possibly be synthesized, there are only about 5 different ways, not 50! And it will take only a few experiments to distinguish these." One of the young men added: "It is essentially the old question: How small and elegant an experiment can you perform?" These comments upset a number of those present. An electron microscopist said, "Gentlemen, this is off the track. This is philosophy of science." Szilard retorted, "I was not quarreling with third-rate scientists: I was quarreling with first-rate scientists."
  • Any criticism or challenge to consider changing our methods strikes of course at all our ego-defenses. But in this case the analytical method offers the possibility of such great increases in effectiveness that it is unfortunate that it cannot be regarded more often as a challenge to learning rather than as challenge to combat. Many of the recent triumphs in molecular biology have in fact been achieved on just such "oversimplified model systems," very much along the analytical lines laid down in the 1958 discussion. They have not fallen to the kind of men who justify themselves by saying "No two cells are alike," regardless of how true that may ultimately be. The triumphs are in fact triumphs of a new way of thinking.
  • the emphasis on strong inference is also partly due to the nature of the fields themselves. Biology, with its vast informational detail and complexity, is a "high-information" field, where years and decades can easily be wasted on the usual type of "low-information" observations or experiments if one does not think carefully in advance about what the most important and conclusive experiments would be. And in high-energy physics, both the "information flux" of particles from the new accelerators and the million-dollar costs of operation have forced a similar analytical approach. It pays to have a top-notch group debate every experiment ahead of time; and the habit spreads throughout the field.
  • Historically, I think, there have been two main contributions to the development of a satisfactory strong-inference method. The first is that of Francis Bacon (13). He wanted a "surer method" of "finding out nature" than either the logic-chopping or all-inclusive theories of the time or the laudable but crude attempts to make inductions "by simple enumeration." He did not merely urge experiments as some suppose, he showed the fruitfulness of interconnecting theory and experiment so that the one checked the other. Of the many inductive procedures he suggested, the most important, I think, was the conditional inductive tree, which proceeded from alternative hypothesis (possible "causes," as he calls them), through crucial experiments ("Instances of the Fingerpost"), to exclusion of some alternatives and adoption of what is left ("establishing axioms"). His Instances of the Fingerpost are explicitly at the forks in the logical tree, the term being borrowed "from the fingerposts which are set up where roads part, to indicate the several directions."
  • Here was a method that could separate off the empty theories! Bacon said the inductive method could be learned by anybody, just like learning to "draw a straighter line or more perfect circle . . . with the help of a ruler or a pair of compasses." "My way of discovering sciences goes far to level men's wit and leaves but little to individual excellence, because it performs everything by the surest rules and demonstrations." Even occasional mistakes would not be fatal. "Truth will sooner come out from error than from confusion."
  • Nevertheless there is a difficulty with this method. As Bacon emphasizes, it is necessary to make "exclusions." He says, "The induction which is to be available for the discovery and demonstration of sciences and arts, must analyze nature by proper rejections and exclusions, and then, after a sufficient number of negatives come to a conclusion on the affirmative instances." "[To man] it is granted only to proceed at first by negatives, and at last to end in affirmatives after exclusion has been exhausted." Or, as the philosopher Karl Popper says today, there is no such thing as proof in science - because some later alternative explanation may be as good or better - so that science advances only by disproofs. There is no point in making hypotheses that are not falsifiable because such hypotheses do not say anything, "it must be possible for an empirical scientific system to be refuted by experience" (14).
  • The difficulty is that disproof is a hard doctrine. If you have a hypothesis and I have another hypothesis, evidently one of them must be eliminated. The scientist seems to have no choice but to be either soft-headed or disputatious. Perhaps this is why so many tend to resist the strong analytical approach and why some great scientists are so disputatious.
  • Fortunately, it seems to me, this difficulty can be removed by the use of a second great intellectual invention, the "method of multiple hypotheses," which is what was needed to round out the Baconian scheme. This is a method that was put forward by T.C. Chamberlin (15), a geologist at Chicago at the turn of the century, who is best known for his contribution to the Chamberlin-Moulton hypothesis of the origin of the solar system.
  • Chamberlin says our trouble is that when we make a single hypothesis, we become attached to it. "The moment one has offered an original explanation for a phenomenon which seems satisfactory, that moment affection for his intellectual child springs into existence, and as the explanation grows into a definite theory his parental affections cluster about his offspring and it grows more and more dear to him. . . . There springs up also unwittingly a pressing of the theory to make it fit the facts and a pressing of the facts to make them fit the theory..." "To avoid this grave danger, the method of multiple working hypotheses is urged. It differs from the simple working hypothesis in that it distributes the effort and divides the affections. . . . Each hypothesis suggests its own criteria, its own method of proof, its own method of developing the truth, and if a group of hypotheses encompass the subject on all sides, the total outcome of means and of methods is full and rich."
  • The conflict and exclusion of alternatives that is necessary to sharp inductive inference has been all too often a conflict between men, each with his single Ruling Theory. But whenever each man begins to have multiple working hypotheses, it becomes purely a conflict between ideas. It becomes much easier then for each of us to aim every day at conclusive disproofs - at strong inference - without either reluctance or combativeness. In fact, when there are multiple hypotheses, which are not anyone's "personal property," and when there are crucial experiments to test them, the daily life in the laboratory takes on an interest and excitement it never had, and the students can hardly wait to get to work to see how the detective story will come out. It seems to me that this is the reason for the development of those distinctive habits of mind and the "complex thought" that Chamberlin described, the reason for the sharpness, the excitement, the zeal, the teamwork - yes, even international teamwork - in molecular biology and high-energy physics today. What else could be so effective?
  • Unfortunately, I think, there are other areas of science today that are sick by comparison, because they have forgotten the necessity for alternative hypotheses and disproof. Each man has only one branch - or none - on the logical tree, and it twists at random without ever coming to the need for a crucial decision at any point. We can see from the external symptoms that there is something scientifically wrong. The Frozen Method, The Eternal Surveyor, The Never Finished, The Great Man With a Single Hypothesis, The Little Club of Dependents, The Vendetta, The All-Encompassing Theory Which Can Never Be Falsified.
  • a "theory" of this sort is not a theory at all, because it does not exclude anything. It predicts everything, and therefore does not predict anything. It becomes simply a verbal formula which the graduate student repeats and believes because the professor has said it so often. This is not science, but faith; not theory, but theology. Whether it is hand-waving or number-waving, or equation-waving, a theory is not a theory unless it can be disproved. That is, unless it can be falsified by some possible experimental outcome.
  • the work methods of a number of scientists have been testimony to the power of strong inference. Is success not due in many cases to systematic use of Bacon's "surest rules and demonstrations" as much as to rare and unattainable intellectual power? Faraday's famous diary (16), or Fermi's notebooks (3, 17), show how these men believed in the effectiveness of daily steps in applying formal inductive methods to one problem after another.
  • Surveys, taxonomy, design of equipment, systematic measurements and tables, theoretical computations - all have their proper and honored place, provided they are parts of a chain of precise induction of how nature works. Unfortunately, all too often they become ends in themselves, mere time-serving from the point of view of real scientific advance, a hypertrophied methodology that justifies itself as a lore of respectability.
  • We speak piously of taking measurements and making small studies that will "add another brick to the temple of science." Most such bricks just lie around the brickyard (20). Tables of constants have their place and value, but the study of one spectrum after another, if not frequently re-evaluated, may become a substitute for thinking, a sad waste of intelligence in a research laboratory, and a mistraining whose crippling effects may last a lifetime.
  • Beware of the man of one method or one instrument, either experimental or theoretical. He tends to become method-oriented rather than problem-oriented. The method-oriented man is shackled; the problem-oriented man is at least reaching freely toward what is most important. Strong inference redirects a man to problem-orientation, but it requires him to be willing repeatedly to put aside his last methods and teach himself new ones.
  • anyone who asks the question about scientific effectiveness will also conclude that much of the mathematizing in physics and chemistry today is irrelevant if not misleading. The great value of mathematical formulation is that when an experiment agrees with a calculation to five decimal places, a great many alternative hypotheses are pretty well excluded (though the Bohr theory and the Schrödinger theory both predict exactly the same Rydberg constant!). But when the fit is only to two decimal places, or one, it may be a trap for the unwary; it may be no better than any rule-of-thumb extrapolation, and some other kind of qualitative exclusion might be more rigorous for testing the assumptions and more important to scientific understanding than the quantitative fit.
  • Today we preach that science is not science unless it is quantitative. We substitute correlations for causal studies, and physical equations for organic reasoning. Measurements and equations are supposed to sharpen thinking, but, in my observation, they more often tend to make the thinking noncausal and fuzzy. They tend to become the object of scientific manipulation instead of auxiliary tests of crucial inferences.
  • Many - perhaps most - of the great issues of science are qualitative, not quantitative, even in physics and chemistry. Equations and measurements are useful when and only when they are related to proof; but proof or disproof comes first and is in fact strongest when it is absolutely convincing without any quantitative measurement.
  • you can catch phenomena in a logical box or in a mathematical box. The logical box is coarse but strong. The mathematical box is fine-grained but flimsy. The mathematical box is a beautiful way of wrapping up a problem, but it will not hold the phenomena unless they have been caught in a logical box to begin with.
  • Of course it is easy - and all too common - for one scientist to call the others unscientific. My point is not that my particular conclusions here are necessarily correct, but that we have long needed some absolute standard of possible scientific effectiveness by which to measure how well we are succeeding in various areas - a standard that many could agree on and one that would be undistorted by the scientific pressures and fashions of the times and the vested interests and busywork that they develop. It is not public evaluation I am interested in so much as a private measure by which to compare one's own scientific performance with what it might be. I believe that strong inference provides this kind of standard of what the maximum possible scientific effectiveness could be - as well as a recipe for reaching it.
  • The strong-inference point of view is so resolutely critical of methods of work and values in science that any attempt to compare specific cases is likely to sound both smug and destructive. Mainly one should try to teach it by example and by exhorting to self-analysis and self-improvement only in general terms.
  • one severe but useful private test - a touchstone of strong inference - that removes the necessity for third-person criticism, because it is a test that anyone can learn to carry with him for use as needed. It is our old friend the Baconian "exclusion," but I call it "The Question." Obviously it should be applied as much to one's own thinking as to others'. It consists of asking in your own mind, on hearing any scientific explanation or theory put forward, "But sir, what experiment could disprove your hypothesis?"; or, on hearing a scientific experiment described, "But sir, what hypothesis does your experiment disprove?"
  • It is not true that all science is equal; or that we cannot justly compare the effectiveness of scientists by any method other than a mutual-recommendation system. The man to watch, the man to put your money on, is not the man who wants to make "a survey" or a "more detailed study" but the man with the notebook, the man with the alternative hypotheses and the crucial experiments, the man who knows how to answer your Question of disproof and is already working on it.
    There is so much bad science and bad statistical information in media reports, in publications, and in everyday conversation that I think it is important to understand facts, proofs, and the associated pitfalls.
Weiye Loh

Edge: HOW DOES OUR LANGUAGE SHAPE THE WAY WE THINK? By Lera Boroditsky - 0 views

  • Do the languages we speak shape the way we see the world, the way we think, and the way we live our lives? Do people who speak different languages think differently simply because they speak different languages? Does learning new languages change the way you think? Do polyglots think differently when speaking different languages?
  • For a long time, the idea that language might shape thought was considered at best untestable and more often simply wrong. Research in my labs at Stanford University and at MIT has helped reopen this question. We have collected data around the world: from China, Greece, Chile, Indonesia, Russia, and Aboriginal Australia.
  • What we have learned is that people who speak different languages do indeed think differently and that even flukes of grammar can profoundly affect how we see the world.
  • Suppose you want to say, "Bush read Chomsky's latest book." Let's focus on just the verb, "read." To say this sentence in English, we have to mark the verb for tense; in this case, we have to pronounce it like "red" and not like "reed." In Indonesian you need not (in fact, you can't) alter the verb to mark tense. In Russian you would have to alter the verb to indicate tense and gender. So if it was Laura Bush who did the reading, you'd use a different form of the verb than if it was George. In Russian you'd also have to include in the verb information about completion. If George read only part of the book, you'd use a different form of the verb than if he'd diligently plowed through the whole thing. In Turkish you'd have to include in the verb how you acquired this information: if you had witnessed this unlikely event with your own two eyes, you'd use one verb form, but if you had simply read or heard about it, or inferred it from something Bush said, you'd use a different verb form.
  • Clearly, languages require different things of their speakers. Does this mean that the speakers think differently about the world? Do English, Indonesian, Russian, and Turkish speakers end up attending to, partitioning, and remembering their experiences differently just because they speak different languages?
  • For some scholars, the answer to these questions has been an obvious yes. Just look at the way people talk, they might say. Certainly, speakers of different languages must attend to and encode strikingly different aspects of the world just so they can use their language properly. Scholars on the other side of the debate don't find the differences in how people talk convincing. All our linguistic utterances are sparse, encoding only a small part of the information we have available. Just because English speakers don't include the same information in their verbs that Russian and Turkish speakers do doesn't mean that English speakers aren't paying attention to the same things; all it means is that they're not talking about them. It's possible that everyone thinks the same way, notices the same things, but just talks differently.
  • Believers in cross-linguistic differences counter that everyone does not pay attention to the same things: if everyone did, one might think it would be easy to learn to speak other languages. Unfortunately, learning a new language (especially one not closely related to those you know) is never easy; it seems to require paying attention to a new set of distinctions. Whether it's distinguishing modes of being in Spanish, evidentiality in Turkish, or aspect in Russian, learning to speak these languages requires something more than just learning vocabulary: it requires paying attention to the right things in the world so that you have the correct information to include in what you say.
  • Follow me to Pormpuraaw, a small Aboriginal community on the western edge of Cape York, in northern Australia. I came here because of the way the locals, the Kuuk Thaayorre, talk about space. Instead of words like "right," "left," "forward," and "back," which, as commonly used in English, define space relative to an observer, the Kuuk Thaayorre, like many other Aboriginal groups, use cardinal-direction terms — north, south, east, and west — to define space.1 This is done at all scales, which means you have to say things like "There's an ant on your southeast leg" or "Move the cup to the north northwest a little bit." One obvious consequence of speaking such a language is that you have to stay oriented at all times, or else you cannot speak properly. The normal greeting in Kuuk Thaayorre is "Where are you going?" and the answer should be something like "Southsoutheast, in the middle distance." If you don't know which way you're facing, you can't even get past "Hello."
  • The result is a profound difference in navigational ability and spatial knowledge between speakers of languages that rely primarily on absolute reference frames (like Kuuk Thaayorre) and languages that rely on relative reference frames (like English).2 Simply put, speakers of languages like Kuuk Thaayorre are much better than English speakers at staying oriented and keeping track of where they are, even in unfamiliar landscapes or inside unfamiliar buildings. What enables them — in fact, forces them — to do this is their language. Having their attention trained in this way equips them to perform navigational feats once thought beyond human capabilities. Because space is such a fundamental domain of thought, differences in how people think about space don't end there. People rely on their spatial knowledge to build other, more complex, more abstract representations. Representations of such things as time, number, musical pitch, kinship relations, morality, and emotions have been shown to depend on how we think about space. So if the Kuuk Thaayorre think differently about space, do they also think differently about other things, like time? This is what my collaborator Alice Gaby and I came to Pormpuraaw to find out.
  • To test this idea, we gave people sets of pictures that showed some kind of temporal progression (e.g., pictures of a man aging, or a crocodile growing, or a banana being eaten). Their job was to arrange the shuffled photos on the ground to show the correct temporal order. We tested each person in two separate sittings, each time facing in a different cardinal direction. If you ask English speakers to do this, they'll arrange the cards so that time proceeds from left to right. Hebrew speakers will tend to lay out the cards from right to left, showing that writing direction in a language plays a role.3 So what about folks like the Kuuk Thaayorre, who don't use words like "left" and "right"? What will they do? The Kuuk Thaayorre did not arrange the cards more often from left to right than from right to left, nor more toward or away from the body. But their arrangements were not random: there was a pattern, just a different one from that of English speakers. Instead of arranging time from left to right, they arranged it from east to west. That is, when they were seated facing south, the cards went left to right. When they faced north, the cards went from right to left. When they faced east, the cards came toward the body and so on. This was true even though we never told any of our subjects which direction they faced. The Kuuk Thaayorre not only knew that already (usually much better than I did), but they also spontaneously used this spatial orientation to construct their representations of time.
  • I have described how languages shape the way we think about space, time, colors, and objects. Other studies have found effects of language on how people construe events, reason about causality, keep track of number, understand material substance, perceive and experience emotion, reason about other people's minds, choose to take risks, and even in the way they choose professions and spouses.8 Taken together, these results show that linguistic processes are pervasive in most fundamental domains of thought, unconsciously shaping us from the nuts and bolts of cognition and perception to our loftiest abstract notions and major life decisions. Language is central to our experience of being human, and the languages we speak profoundly shape the way we think, the way we see the world, the way we live our lives.
  • The fact that even quirks of grammar, such as grammatical gender, can affect our thinking is profound. Such quirks are pervasive in language; gender, for example, applies to all nouns, which means that it is affecting how people think about anything that can be designated by a noun.
  • How does an artist decide whether death, say, or time should be painted as a man or a woman? It turns out that in 85 percent of such personifications, whether a male or female figure is chosen is predicted by the grammatical gender of the word in the artist's native language. So, for example, German painters are more likely to paint death as a man, whereas Russian painters are more likely to paint death as a woman.
  • Does treating chairs as masculine and beds as feminine in the grammar make Russian speakers think of chairs as being more like men and beds as more like women in some way? It turns out that it does. In one study, we asked German and Spanish speakers to describe objects having opposite gender assignment in those two languages. The descriptions they gave differed in a way predicted by grammatical gender. For example, when asked to describe a "key" — a word that is masculine in German and feminine in Spanish — the German speakers were more likely to use words like "hard," "heavy," "jagged," "metal," "serrated," and "useful," whereas Spanish speakers were more likely to say "golden," "intricate," "little," "lovely," "shiny," and "tiny." To describe a "bridge," which is feminine in German and masculine in Spanish, the German speakers said "beautiful," "elegant," "fragile," "peaceful," "pretty," and "slender," and the Spanish speakers said "big," "dangerous," "long," "strong," "sturdy," and "towering." This was true even though all testing was done in English, a language without grammatical gender. The same pattern of results also emerged in entirely nonlinguistic tasks (e.g., rating similarity between pictures). And we can also show that it is aspects of language per se that shape how people think: teaching English speakers new grammatical gender systems influences mental representations of objects in the same way it does with German and Spanish speakers. Apparently even small flukes of grammar, like the seemingly arbitrary assignment of gender to a noun, can have an effect on people's ideas of concrete objects in the world.
  • Even basic aspects of time perception can be affected by language. For example, English speakers prefer to talk about duration in terms of length (e.g., "That was a short talk," "The meeting didn't take long"), while Spanish and Greek speakers prefer to talk about time in terms of amount, relying more on words like "much," "big," and "little" rather than "short" and "long." Our research into such basic cognitive abilities as estimating duration shows that speakers of different languages differ in ways predicted by the patterns of metaphors in their language. (For example, when asked to estimate duration, English speakers are more likely to be confused by distance information, estimating that a line of greater length remains on the test screen for a longer period of time, whereas Greek speakers are more likely to be confused by amount, estimating that a container that is fuller remains longer on the screen.)
  • An important question at this point is: Are these differences caused by language per se or by some other aspect of culture? Of course, the lives of English, Mandarin, Greek, Spanish, and Kuuk Thaayorre speakers differ in a myriad of ways. How do we know that it is language itself that creates these differences in thought and not some other aspect of their respective cultures? One way to answer this question is to teach people new ways of talking and see if that changes the way they think. In our lab, we've taught English speakers different ways of talking about time. In one such study, English speakers were taught to use size metaphors (as in Greek) to describe duration (e.g., a movie is larger than a sneeze), or vertical metaphors (as in Mandarin) to describe event order. Once the English speakers had learned to talk about time in these new ways, their cognitive performance began to resemble that of Greek or Mandarin speakers. This suggests that patterns in a language can indeed play a causal role in constructing how we think.6 In practical terms, it means that when you're learning a new language, you're not simply learning a new way of talking, you are also inadvertently learning a new way of thinking. Beyond abstract or complex domains of thought like space and time, languages also meddle in basic aspects of visual perception — our ability to distinguish colors, for example. Different languages divide up the color continuum differently: some make many more distinctions between colors than others, and the boundaries often don't line up across languages.
  • To test whether differences in color language lead to differences in color perception, we compared Russian and English speakers' ability to discriminate shades of blue. In Russian there is no single word that covers all the colors that English speakers call "blue." Russian makes an obligatory distinction between light blue (goluboy) and dark blue (siniy). Does this distinction mean that siniy blues look more different from goluboy blues to Russian speakers? Indeed, the data say yes. Russian speakers are quicker to distinguish two shades of blue that are called by the different names in Russian (i.e., one being siniy and the other being goluboy) than if the two fall into the same category. For English speakers, all these shades are still designated by the same word, "blue," and there are no comparable differences in reaction time. Further, the Russian advantage disappears when subjects are asked to perform a verbal interference task (reciting a string of digits) while making color judgments but not when they're asked to perform an equally difficult spatial interference task (keeping a novel visual pattern in memory). The disappearance of the advantage when performing a verbal task shows that language is normally involved in even surprisingly basic perceptual judgments — and that it is language per se that creates this difference in perception between Russian and English speakers.
  • What it means for a language to have grammatical gender is that words belonging to different genders get treated differently grammatically and words belonging to the same grammatical gender get treated the same grammatically. Languages can require speakers to change pronouns, adjective and verb endings, possessives, numerals, and so on, depending on the noun's gender. For example, to say something like "my chair was old" in Russian (moy stul bil' stariy), you'd need to make every word in the sentence agree in gender with "chair" (stul), which is masculine in Russian. So you'd use the masculine form of "my," "was," and "old." These are the same forms you'd use in speaking of a biological male, as in "my grandfather was old." If, instead of speaking of a chair, you were speaking of a bed (krovat'), which is feminine in Russian, or about your grandmother, you would use the feminine form of "my," "was," and "old."
    For a long time, the idea that language might shape thought was considered at best untestable and more often simply wrong. Research in my labs at Stanford University and at MIT has helped reopen this question. We have collected data around the world: from China, Greece, Chile, Indonesia, Russia, and Aboriginal Australia. What we have learned is that people who speak different languages do indeed think differently and that even flukes of grammar can profoundly affect how we see the world. Language is a uniquely human gift, central to our experience of being human. Appreciating its role in constructing our mental lives brings us one step closer to understanding the very nature of humanity.
Weiye Loh

Rational Irrationality: Do Good Kindergarten Teachers Raise their Pupils' Wages? : The ... - 0 views

  • Columnist David Leonhardt reports the findings of a new study which suggests that children who are fortunate enough to have an unusually good kindergarten teacher can expect to make roughly an extra twenty dollars a week by the age of twenty-seven.
  • it implies that during each school year a good kindergarten teacher creates an additional $320,000 of earnings.
  • The new research (pdf), the work of six economists—four from Harvard, one from Berkeley, and one from Northwestern—upends this finding. It is based on test scores and demographic data from a famous experiment carried out in Tennessee during the late nineteen-eighties, which tracked the progress of about 11,500 students from kindergarten to third grade. Most of these students are now about thirty years old, which means they have been working for up to twelve years. The researchers also gained access to income-tax data and matched it up with the test scores. Their surprising conclusion is that the uplifting effect of a good kindergarten experience, after largely disappearing during a child’s teen years, somehow reappears in the adult workplace. (See Figure 7 in the paper.) Why does this happen? The authors don’t say, but Leonhardt offers this explanation: “Good early education can impart skills that last a lifetime—patience, discipline, manners, perseverance. The tests that 5-year-olds take may pick up these skills, even if later multiple-choice tests do not.”
  • However, as I read the story and the findings it is based upon, some questions crept into my mind. I relate them not out of any desire to discredit the study, which is enterprising and newsworthy, but simply as a warning to parents and policymakers not to go overboard.
  • from a dense academic article that hasn’t been published or peer reviewed. At this stage, there isn’t even a working paper detailing how the results were arrived at: just a set of slides. Why is this important? Because economics is a disputatious subject, and surprising empirical findings invariably get challenged by rival groups of researchers. The authors of the paper include two rising stars of the economics profession—Berkeley’s Emmanuel Saez and Harvard’s Raj Chetty—both of whom have reputations for careful and rigorous work. However, many other smart researchers have had their findings overturned. That is how science proceeds. Somebody says something surprising, and others in the field try to knock it down. Sometimes they succeed; sometimes they don’t. Until that Darwinian process is completed, which won’t be for another couple of years, at least, the new findings should be regarded as provisional.
  • A second point, which is related to the first, concerns methodology. In coming up with the $320,000 a year figure for the effects that kindergarten teachers have on adult earnings, the authors make use of complicated statistical techniques, including something called a “jack knife regression.” Such methods are perfectly legitimate and are now used widely in economics, but their application often adds an additional layer of ambiguity to the findings they generate. Is this particular statistical method appropriate for the task at hand? Do other methods generate different results? These are the sorts of question that other researchers will be pursuing.
    JULY 29, 2010 DO GOOD KINDERGARTEN TEACHERS RAISE THEIR PUPILS' WAGES? Posted by John Cassidy
Weiye Loh

The Mysterious Decline Effect | Wired Science | Wired.com - 0 views

  • Question #1: Does this mean I don’t have to believe in climate change? Me: I’m afraid not. One of the sad ironies of scientific denialism is that we tend to be skeptical of precisely the wrong kind of scientific claims. In poll after poll, Americans have dismissed two of the most robust and widely tested theories of modern science: evolution by natural selection and climate change. These are theories that have been verified in thousands of different ways by thousands of different scientists working in many different fields. (This doesn’t mean, of course, that such theories won’t change or get modified – the strength of science is that nothing is settled.) Instead of wasting public debate on creationism or the rhetoric of Senator Inhofe, I wish we’d spend more time considering the value of spinal fusion surgery, or second generation antipsychotics, or the verity of the latest gene association study. The larger point is that we need to do a better job of considering the context behind every claim. In 1952, the Harvard philosopher Willard Van Orman Quine published “The Two Dogmas of Empiricism.” In the essay, Quine compared the truths of science to a spider’s web, in which the strength of the lattice depends upon its interconnectedness. (Quine: “The unit of empirical significance is the whole of science.”) One of the implications of Quine’s paper is that, when evaluating the power of a given study, we need to also consider the other studies and untested assumptions that it depends upon. Don’t just fixate on the effect size – look at the web. Unfortunately for the denialists, climate change and natural selection have very sturdy webs.
  • biases are not fraud. We sometimes forget that science is a human pursuit, mingled with all of our flaws and failings. (Perhaps that explains why an episode like Climategate gets so much attention.) If there’s a single theme that runs through the article it’s that finding the truth is really hard. It’s hard because reality is complicated, shaped by a surreal excess of variables. But it’s also hard because scientists aren’t robots: the act of observation is simultaneously an act of interpretation.
  • (As Paul Simon sang, “A man sees what he wants to see and disregards the rest.”) Most of the time, these distortions are unconscious – we don’t even know we are misperceiving the data. However, even when the distortion is intentional it still rarely rises to the level of outright fraud. Consider the story of Mike Rossner. He’s executive director of the Rockefeller University Press, and helps oversee several scientific publications, including The Journal of Cell Biology. In 2002, while trying to format a scientific image in Photoshop that was going to appear in one of the journals, Rossner noticed that the background of the image contained distinct intensities of pixels. “That’s a hallmark of image manipulation,” Rossner told me. “It means the scientist has gone in and deliberately changed what the data looks like. What’s disturbing is just how easy this is to do.” This led Rossner and his colleagues to begin analyzing every image in every accepted paper. They soon discovered that approximately 25 percent of all papers contained at least one “inappropriately manipulated” picture. Interestingly, the vast, vast majority of these manipulations (~99 percent) didn’t affect the interpretation of the results. Instead, the scientists seemed to be photoshopping the pictures for aesthetic reasons: perhaps a line on a gel was erased, or a background blur was deleted, or the contrast was exaggerated. In other words, they wanted to publish pretty images. That’s a perfectly understandable desire, but it gets problematic when that same basic instinct – we want our data to be neat, our pictures to be clean, our charts to be clear – is transposed across the entire scientific process.
  • One of the philosophy papers that I kept on thinking about while writing the article was Nancy Cartwright’s essay “Do the Laws of Physics State the Facts?” Cartwright used numerous examples from modern physics to argue that there is often a basic trade-off between scientific “truth” and experimental validity, so that the laws that are the most true are also the most useless. “Despite their great explanatory power, these laws [such as gravity] do not describe reality,” Cartwright writes. “Instead, fundamental laws describe highly idealized objects in models.” The problem, of course, is that experiments don’t test models. They test reality.
  • Cartwright’s larger point is that many essential scientific theories – those laws that explain things – are not actually provable, at least in the conventional sense. This doesn’t mean that gravity isn’t true or real. There is, perhaps, no truer idea in all of science. (Feynman famously referred to gravity as the “greatest generalization achieved by the human mind.”) Instead, what the anomalies of physics demonstrate is that there is no single test that can define the truth. Although we often pretend that experiments and peer-review and clinical trials settle the truth for us – that we are mere passive observers, dutifully recording the results – the actuality of science is a lot messier than that. Richard Rorty said it best: “To say that we should drop the idea of truth as out there waiting to be discovered is not to say that we have discovered that, out there, there is no truth.” Of course, the very fact that the facts aren’t obvious, that the truth isn’t “waiting to be discovered,” means that science is intensely human. It requires us to look, to search, to plead with nature for an answer.
Weiye Loh

Rationally Speaking: A new eugenics? - 0 views

  • an interesting article I read recently, penned by Julian Savulescu for the Practical Ethics blog.
  • Savulescu discusses an ongoing controversy in Germany about genetic testing of human embryos. The Leopoldina, Germany’s equivalent of the National Academy of Sciences, has recommended genetic testing of pre-implant embryos, to screen for serious and incurable defects. The German Chancellor, Angela Merkel, has agreed to allow a parliamentary vote on this issue, but also said that she personally supports a ban on this type of testing. Her fear is that the testing would quickly lead to “designer babies,” i.e. to parents making choices about their unborn offspring based not on knowledge about serious disease, but simply because they happen to prefer a particular height or eye color.
  • He infers from Merkel’s comments (and many similar others) that people tend to think of selecting traits like eye color as eugenics, while acting to avoid incurable disease is not considered eugenics. He argues that this is exactly wrong: eugenics, as he points out, means “well born,” so eugenicists have historically been concerned with eliminating traits that would harm society (Wendell Holmes’ “three generations of imbeciles”), not with simple aesthetic choices. As Savulescu puts it: “[eugenics] is selecting embryos which are better, in this context, have better lives. Being healthy rather than sick is ‘better.’ Having blond hair and blue eyes is not in any plausible sense ‘better,’ even if people mistakenly think so.”
  • And there is another, related aspect of discussions about eugenics that should be at the forefront of our consideration: what was particularly objectionable about American and Nazi early 20th century eugenics is that the state, not individuals, was to make decisions about who could reproduce and who couldn’t. Savulescu continues: “to grant procreative liberty is the only way to avoid the objectionable form of eugenics that the Nazis practiced.” In other words, it makes all the difference in the world if it is an individual couple who decides to have or not have a baby, or if it is the state that imposes a particular reproductive choice on its citizenry.
  • but then Savulescu expands his argument to a point where I begin to feel somewhat uncomfortable. He says: “[procreative liberty] involves the freedom to choose a child with red hair or blond hair or no hair.”
  • Savulescu has suddenly sneaked into his argument for procreative liberty the assumption that all choices in this area are on the same level. But while it is hard to object to action aimed at avoiding devastating diseases, it is not quite so obvious to me what arguments favor the idea of designer babies. The first intervention can be justified, for instance, on consequentialist grounds because it reduces the pain and suffering of both the child and the parents. The second intervention is analogous to shopping for a new bag, or a new car, which means that it commodifies the act of conceiving a baby, thus degrading its importance. I’m not saying that that in itself is sufficient to make it illegal, but the ethics of it is different, and that difference cannot simply be swept under the broad rug of “procreative liberty.”
  • designing babies is to treat them as objects, not as human beings, and there are a couple of strong philosophical traditions in ethics that go squarely against that (I’m thinking, obviously, of Kant’s categorical imperative, as well as of virtue ethics; not sure what a consequentialist would say about this, probably she would remain neutral on the issue).
  • Commodification of human beings has historically produced all sorts of bad stuff, from slavery to exploitative prostitution, and arguably to war (after all, we are using our soldiers as means to gain access to power, resources, territory, etc.)
  • And of course, there is the issue of access. Across-the-board “procreative liberty” of the type envisioned by Savulescu will cost money because it requires considerable resources.
  • imagine that these parents decide to purchase the ability to produce babies that have the type of characteristics that will make them more successful in society: taller, more handsome, blue eyed, blonde, more symmetrical, whatever. We have just created yet another way for the privileged to augment and pass their privileges to the next generation — in this case literally through their genes, not just as real estate or bank accounts. That would quickly lead to an even further divide between the haves and the have-nots, more inequality, more injustice, possibly, in the long run, even two different species (why not design your babies so that they can’t breed with certain types of undesirables, for instance?). Is that the sort of society that Savulescu is willing to envision in the name of his total procreative liberty? That begins to sound like the libertarian version of the eugenic ideal, something potentially only slightly less nightmarish than the early 20th century original.
  • Rich people already have better choices when it comes to their babies. Taller and richer men can choose between more attractive and physically fit women and attractive women can choose between more physically fit and rich men. So it is reasonable to conclude that on average rich and attractive people already have more options when it comes to their offspring. Moreover no one is questioning their right to do so and this is based on a respect for a basic instinct which we all have and which is exactly why these people would choose to have a DB. Is it fair for someone to be tall because his daddy was rich and married a supermodel but not because his daddy was rich and had his DNA resequenced? Is the former good because it's natural and the latter bad because it's not? This isn't at all obvious to me.
  • Not to mention that rich people can provide better health care, education and nutrition to their children and again no one is questioning their right to do so. Wouldn't a couple of inches be pretty negligible compared to getting into a good school? Aren't we applying double standards by objecting to this issue alone? Do we really live in a society that values equal opportunities? People (may) be equal before the law but they are not equal to each other and each one of us is tacitly accepting that fact when we acknowledge the social hierarchy (in other words, every time we interact with someone who is our superior). I am not crazy about this fact but that's just how people are and this has to be taken into account when discussing this.
Weiye Loh

Google's Marissa Mayer Assaults Designers With Data | Designerati | Fast Company - 0 views

  • The irony was not lost on anyone in attendance at AIGA's national conference in Memphis last weekend. Marissa Mayer, "keeper" of the Google homepage since 1998, walked into a room filled with over 1,200 mostly graphic designers to talk about how well design worked at the design-dismissive Google. She even had the charts and graphs of user-tested research to prove it, she said.
  • In an almost robotic delivery, Mayer acknowledged that design was never the primary concern when developing the site. When she mentioned to founder Sergey Brin that he might want to do something to spiff up the brand-new homepage for users, his response was uncomfortably eloquent: "I don't do HTML."
  • About the now-notorious claim that she once tested 41 shades of blue? All true. Turns out Google was using two different colors of blue, one on the homepage, one on the Gmail page. To find out which was more effective so they could standardize it across the system, they tested an imperceptible range of blues between the two. The winning color, according to dozens of charts and graphs, was not too green, not too red.
  • This kind of over-analytical testing was exactly why designer Doug Bowman made a very public break from Google earlier this year. "I had a recent debate over whether a border should be 3, 4, or 5 pixels wide and was asked to prove my case," he wrote in a post after his departure. Maybe he couldn't, but someone won a recent battle to widen the search box by a few pixels, the most major change for the homepage in quite some time.
  •  
    I don't really know where this fits but I find this really amusing. The article is about how Google uses data, very specific data to determine their designs, almost to the point of being anal (to me). I wonder if this is what it means by challenging forth the nature (human mind) to reveal.
Weiye Loh

gssq: Rational and Irrational Thought: The Thinking That IQ Tests Miss - 0 views

  • When approaching a problem, we can choose from any of several cognitive mechanisms. Some mechanisms have great computational power, letting us solve many problems with great accuracy, but they are slow, require much concentration and can interfere with other cognitive tasks. Others are comparatively low in computational power, but they are fast, require little concentration and do not interfere with other ongoing cognition. Humans are cognitive misers because our basic tendency is to default to the processing mechanisms that require less computational effort, even if they are less accurate.
  • our tendency to evaluate a situation from our own perspective. We weigh evidence and make moral judgments with a my-side bias that often leads to dysrationalia that is independent of measured intelligence. The same is true for other tendencies of the cognitive miser that have been much studied, such as attribute substitution and conjunction errors; they are at best only slightly related to intelligence and are poorly captured by conventional intelligence tests.
  •  
    No doubt you know several folks with perfectly respectable IQs who just don't seem all that sharp. The behavior of such people tells us that we are missing something important by treating intelligence as if it encompassed all cognitive abilities. I coined the term dysrationalia (analogous to "dyslexia"), meaning the inability to think and behave rationally despite having adequate intelligence, to draw attention to a large domain of cognitive life that intelligence tests fail to assess.
Weiye Loh

"Cancer by the Numbers" by John Allen Paulos | Project Syndicate - 0 views

  • The USPSTF recently issued an even sharper warning about the prostate-specific antigen test for prostate cancer, after concluding that the test’s harms outweigh its benefits. Chest X-rays for lung cancer and Pap tests for cervical cancer have received similar, albeit less definitive, criticism. The next step in the reevaluation of cancer screening was taken last year, when researchers at the Dartmouth Institute for Health Policy announced that the costs of screening for breast cancer were often minimized, and that the benefits were much exaggerated. Indeed, even a mammogram (almost 40 million are given annually in the US) that detects a cancer does not necessarily save a life. The Dartmouth researchers found that, of the estimated 138,000 breast cancers detected annually in the US, the test did not help 120,000-134,000 of the afflicted women. The cancers either were growing so slowly that they did not pose a problem, or they would have been treated successfully if discovered clinically later (or they were so aggressive that little could be done).
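    The Dartmouth figures quoted above imply a rough share of detected cancers for which the mammogram made a difference. The short Python sketch below only restates that arithmetic; the constant and function names are made up for illustration, and the three numbers are the ones given in the excerpt, not independent data.

```python
# Figures quoted in the excerpt above (Dartmouth estimate, annual US numbers).
DETECTED = 138_000
NOT_HELPED_LOW = 120_000
NOT_HELPED_HIGH = 134_000


def helped_share(detected: int, not_helped: int) -> float:
    """Fraction of detected cancers for which screening made a difference."""
    return (detected - not_helped) / detected


# The fewest women are helped when the "not helped" count is at its highest.
share_min = helped_share(DETECTED, NOT_HELPED_HIGH)
share_max = helped_share(DETECTED, NOT_HELPED_LOW)

print(f"Helped: {DETECTED - NOT_HELPED_HIGH:,} to {DETECTED - NOT_HELPED_LOW:,} women")
print(f"Share helped: {share_min:.1%} to {share_max:.1%}")
```

    On these numbers, somewhere between 4,000 and 18,000 of the 138,000 detected cancers, roughly 3 to 13 percent, are cases where the test helped.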
Weiye Loh

Digital Domain - Computers at Home - Educational Hope vs. Teenage Reality - NYTimes.com - 0 views

  • MIDDLE SCHOOL students are champion time-wasters. And the personal computer may be the ultimate time-wasting appliance.
  • there is an automatic inclination to think of the machine in its most idealized form, as the Great Equalizer. In developing countries, computers are outfitted with grand educational hopes, like those that animate the One Laptop Per Child initiative, which was examined in this space in April.
  • Economists are trying to measure a home computer’s educational impact on schoolchildren in low-income households. Taking widely varying routes, they are arriving at similar conclusions: little or no educational benefit is found. Worse, computers seem to have further separated children in low-income households, whose test scores often decline after the machine arrives, from their more privileged counterparts.
  • Professor Malamud and his collaborator, Cristian Pop-Eleches, an assistant professor of economics at Columbia University, did their field work in Romania in 2009, where the government invited low-income families to apply for vouchers worth 200 euros (then about $300) that could be used for buying a home computer. The program provided a control group: the families who applied but did not receive a voucher.
  • the professors report finding “strong evidence that children in households who won a voucher received significantly lower school grades in math, English and Romanian.” The principal positive effect on the students was improved computer skills.
  • few children whose families obtained computers said they used the machines for homework. What they were used for — daily — was playing games.
  • negative effect on test scores was not universal, but was largely confined to lower-income households, in which, the authors hypothesized, parental supervision might be spottier, giving students greater opportunity to use the computer for entertainment unrelated to homework and reducing the amount of time spent studying.
  • The North Carolina study suggests the disconcerting possibility that home computers and Internet access have such a negative effect only on some groups and end up widening achievement gaps between socioeconomic groups. The expansion of broadband service was associated with a pronounced drop in test scores for black students in both reading and math, but no effect on the math scores and little on the reading scores of other students.
  •  
    Computers at Home: Educational Hope vs. Teenage Reality By RANDALL STROSS Published: July 9, 2010
Weiye Loh

The American Spectator : Can't Live With Them… - 1 views

  • Commentators have repeatedly told us in recent years that the gap between rich and poor has been widening. It is true, if you compare the income of those in the top fifth of earners with the income of those in the bottom fifth, that the spread between them increased between 1996 and 2005. But, as Sowell points out, this frequently cited figure is not counting the same people. If you look at individual taxpayers, Sowell notes, those who happened to be in the bottom fifth in 1996 saw their incomes nearly double over the decade, while those who happened to be in the top fifth in 1996 saw gains of only 10 percent on average and those in the top 5 percent actually experienced a decline in their incomes. Similar distortions are perpetrated by those bewailing "stagnation" in average household incomes -- without taking into account that households have been getting smaller, as rising wealth allows people to move out of large family homes.
  • Sometimes the distortion seems to be deliberate. Sowell gives the example of an ABC news report in the 1980s focusing on five states where "unemployment is most severe" -- without mentioning that unemployment was actually declining in all the other 45 states. Sometimes there seems to be willful incomprehension. Journalists have earnestly reported that "prisons are ineffective" because two-thirds of prisoners are rearrested within three years of their release. As Sowell comments: "By this kind of reasoning, food is ineffective as a response to hunger because it is only a matter of time after eating before you get hungry again. Like many other things, incarceration only works when it is done."
  • why do intellectuals often seem so lacking in common sense? Sowell thinks it goes with the job -- literally: He defines "intellectuals" as "an occupational category [Sowell's emphasis], people whose occupations deal primarily with ideas -- writers, academics and the like." Medical researchers or engineers or even "financial wizards" may apply specialized knowledge in ways that require great intellectual skill, but that does not make them "intellectuals," in Sowell's view: "An intellectual's work begins and ends with ideas [Sowell's emphasis]." So an engineer "is ruined" if his bridges or buildings collapse and so with a financier who "goes broke… the proof of the pudding is ultimately in the eating…. but the ultimate test of a [literary] deconstructionist's ideas is whether other deconstructionists find those ideas interesting, original, persuasive, elegant or ingenious. There is no external test." The ideas dispensed by intellectuals aren't subject to "external" checks or exposed to the test of "verifiability" (apart from what "like-minded individuals" find "plausible") and so intellectuals are not really "accountable" in the same way as people in other occupations.
  • it is not quite true, even among tenured professors in the humanities, that idea-mongers can entirely ignore "external" checks. Even academics want to be respectable, which means they can't entirely ignore the realities that others notice. There were lots of academics talking about the achievements of socialism in the 1970s (I can remember them) but very few talking that way after China and Russia repudiated these fantasies.
  • THE MOST DISTORTING ASPECT of Sowell's account is that, in focusing so much on the delusions of intellectuals, he leaves us more confused about what motivates the rest of society. In a characteristic passage, Sowell protests that "intellectuals...have sought to replace the groups into which people have sorted themselves with groupings created and imposed by the intelligentsia. Ties of family, religion, and patriotism, for example, have long been rated as suspect or detrimental by the intelligentsia, and new ties that intellectuals have created, such as class -- and more recently 'gender' -- have been projected as either more real or more important."
  • There's no disputing the claim that most "intellectuals" -- surely most professors in the humanities -- are down on "patriotism" and "religion" and probably even "family." But how did people get to be patriotic and religious in the first place? In Sowell's account, they just "sorted themselves" -- as if by the invisible hand of the market.
  • Let's put aside all the violence and intimidation that went into building so many nations and so many faiths in the past. What is it, even today, that makes people revere this country (or some other); what makes people adhere to a particular faith or church? Don't inspiring words often move people? And those who arrange these words -- aren't they doing something similar to what Sowell says intellectuals do? Is it really true, when it comes to embracing national or religious loyalties, that "the proof of the pudding is in the eating"?
  • Even when it comes to commercial products, people don't always want to be guided by mundane considerations of reliable performance. People like glamour, prestige, associations between the product and things they otherwise admire. That's why companies spend so much on advertising. And that's part of the reason people are willing to pay more for brand names -- to enjoy the associations generated by advertising. Even advertising plays on assumptions about what is admirable and enticing -- assumptions that may change from decade to decade, as background opinions change. How many products now flaunt themselves as "green" -- and how many did so 20 years ago?
  • If we closed down universities and stopped subsidizing intellectual publications, would people really judge every proposed policy by external results? Intellectuals tend to see what they expect to see, as Sowell's examples show -- but that's true of almost everyone. We have background notions about how the world works that help us make sense of what we experience. We might have distorted and confused notions, but we don't just perceive isolated facts. People can improve in their understanding, developing background understandings that are more defined or more reliable. That's part of what makes people interested in the ideas of intellectuals -- the hope of improving their own understanding.
  • On Sowell's account, we wouldn't need the contributions of a Friedrich Hayek -- or a Thomas Sowell -- if we didn't have so many intellectuals peddling so many wrong-headed ideas. But the wealthier the society, the more it liberates individuals to make different choices and the more it can afford to indulge even wasteful or foolish choices. I'd say that means not that we have less need of intellectuals, but more need of better ones. 
Weiye Loh

What Is Skepticism? Week 3: Skepticism vs. Denial « Skepticism « Critical Thi... - 0 views

  • Everyone is a skeptic nowadays, or so it seems. From climate change to evolution to vaccination, large proportions of the population claim to be skeptical about many of the claims of mainstream science. So why are we, members of the skeptical community, not rejoicing?
  • A skeptic, in popular discourse, is simply someone who denies a particular claim. But true skepticism, as espoused by philosophers and scientists for millennia, is more an intellectual attitude than a position on a specific issue. A skeptic is someone who always demands sufficient evidence or reasons before accepting a claim. This skeptical attitude – its opposite is credulity – leads skeptics to reject as unfounded any claim that cannot withstand the rigours of the scientific method, which includes controlled experimental testing. The more extraordinary the claim, the more rigorously it must be tested before a skeptic will be willing to accept it.
  • skepticism does not always lead to denial. Extraordinary claims require extraordinary evidence, but sometimes that extraordinary evidence can be provided. Einstein’s theory of relativity, which holds that matter can change the very shape of space and time, is an extraordinary claim, yet it has stood up to the most demanding of scientific testing.
  • let us turn to the climate change “skeptics”. Are they just being more demanding than us in their skepticism? After all, nothing in science is ever certain; some room for doubt always exists. For that doubt to warrant disbelief in the face of all the positive evidence, however, skeptics would require significant contrary evidence, or a plausible alternative theory which fits the data. But climate change deniers have not provided any such evidence or theory (theories involving variations in solar activity simply don’t fit the data). Nor have they shown significant inclination to provide such evidence, generally being content to gesture frantically at any minor mistake, no matter how irrelevant, in the climate change literature. In fact, in denying climate change, these “skeptics” find themselves committed to claims no less extraordinary than the ones they deny, yet with far less evidence.
  •  
    Skepticism vs. Denial
Weiye Loh

Why Kindergarten-Admission Tests Are Worthless -- New York Magazine - 0 views

  •  
    Should a child's fate be sealed by an exam he takes at the age of 4? Why kindergarten-admission tests are worthless, at best.