New Media Ethics 2009 course: Group items tagged Measurement

Weiye Loh

Measuring the Unmeasurable (Internet) and Why It Matters « Gurstein's Communi... - 0 views

  • it appears that there is a quite significant hole in the National Accounting (and thus the GDP statistics) around Internet related activities since most of this accounting is concerned with measuring the production and distribution of tangible products and the associated services. For the most part the available numbers don’t include many Internet (or “social capital” e.g. in health and education) related activities as they are linked to intangible outputs. The significance of not including social capital components in the GDP has been widely discussed elsewhere. The significance (and potential remediation) of the absence of much of the Internet related activities was the subject of the workshop.
  • there had been a series of critiques of GDP statistics from Civil Society (CS) over the last few years—each associated with a CS “movement”—the Women’s Movement and the absence of measurement of “women’s (and particularly domestic) work”; the Environmental Movement and the absence of the longer term and environmental costs of the production of the goods that the GDP so blithely counts as a measure of national economic well-being; and most recently with the Sustainability Movement, and the absence of measures reflective of the longer term negative effects/costs of resource depletion and environmental degradation. What I didn’t see anywhere apart from the background discussions to the OECD workshop itself were critiques reflecting issues related to the Internet or ICTs.
  • the implications of the limitations in the Internet accounting went beyond a simple technical glitch and had potentially quite profound implications from a national policy and particularly a CS and community based development perspective. The possible distortions in economic measurement arising from the absence of Internet associated numbers in the SNA (there may be some $750 BILLION a year in “value” being generated by Internet based search alone!) lead to the very real possibility that macro-economic analysis and related policy making may be operating on the basis of inadequate and even fallacious assumptions.
  • perhaps of greatest significance from the perspective of Civil Society and of communities is the overall absence of measurement and thus inclusion in the economic accounting of the value of the contributions provided to, through and on the Internet of various voluntary and not-for-profit initiatives and activities. Thus for example, the millions of hours of labour contributed to Wikipedia, or to the development of Free or Open Source software, or to providing support for public Internet access and training are not included as a net contribution or benefit to the economy (as measured through the GDP). Rather, this is measured as a negative effect since, as some would argue, those who are making this contribution could be using their time and talents in more “productive” (and “economically measurable”) activities. Thus for example, a region or country that chooses to go with free or open source software as the basis for its in-school computing is not only “not contributing to ‘economic well being’”, it is “statistically” a “cost” to the economy since it is not allowing for expenditures on, for example, suites of Microsoft products.
  • there appears to have been no systematic attention paid to the relationship of the activities and growth of voluntary contributions to the Internet and the volume, range and depth of Internet activity, digital literacy and economic value being derived from the use of the Internet.
kenneth yang

SD ballot measure would ease restrictions on stem cell research - 1 views

PIERRE, S.D. (AP) - A proposed ballot issue to ease restrictions on stem cell research will strike a chord with South Dakotans because nearly everyone has had a serious disease or knows someone who...

ethics rights stem cell

started by kenneth yang on 21 Oct 09; no follow-up yet
Weiye Loh

Jonathan Stray » Measuring and improving accuracy in journalism - 0 views

  • Accuracy is a hard thing to measure because it’s a hard thing to define. There are subjective and objective errors, and no standard way of determining whether a reported fact is true or false
  • The last big study of mainstream reporting accuracy found errors (defined below) in 59% of 4,800 stories across 14 metro newspapers. This level of inaccuracy — where about one in every two articles contains an error — has persisted for as long as news accuracy has been studied, over seven decades now.
  • With the explosion of available information, more than ever it’s time to get serious about accuracy, about knowing which sources can be trusted. Fortunately, there are emerging techniques that might help us to measure media accuracy cheaply, and then increase it.
  • We could continuously sample a news source’s output to produce ongoing accuracy estimates, and build social software to help the audience report and filter errors. Meticulously applied, this approach would give a measure of the accuracy of each information source, and a measure of the efficiency of their corrections process (currently only about 3% of all errors are corrected.)
  • Real-world reporting isn’t always clearly “right” or “wrong,” so it will often be hard to decide whether something is an error or not. But we’re not going for ultimate Truth here, just a general way of measuring some important aspect of the idea we call “accuracy.” In practice it’s important that the error counting method is simple, clear and repeatable, so that you can compare error rates of different times and sources.
  • Subjective errors, though by definition involving judgment, should not be dismissed as merely differences in opinion. Sources found such errors to be about as common as factual errors and often more egregious [as rated by the sources.] But subjective errors are a very complex category
  • One of the major problems with previous news accuracy metrics is the effort and time required to produce them. In short, existing accuracy measurement methods are expensive and slow. I’ve been wondering if we can do better, and a simple idea comes to mind: sampling. The core idea is this: news sources could take an ongoing random sample of their output and check it for accuracy — a fact check spot check
  • Standard statistical theory tells us what the error on that estimate will be for any given number of samples (If I’ve got this right, the relevant formula is standard error of a population proportion estimate without replacement.) At a sample rate of a few stories per day, daily estimates of error rate won’t be worth much. But weekly and monthly aggregates will start to produce useful accuracy estimates
  • the first step would be admitting how inaccurate journalism has historically been. Then we have to come up with standardized accuracy evaluation procedures, in pursuit of metrics that capture enough of what we mean by “true” to be worth optimizing. Meanwhile, we can ramp up the efficiency of our online corrections processes until we find as many useful, legitimate errors as possible with as little staff time as possible. It might also be possible to do data mining on types of errors and types of stories to figure out if there are patterns in how an organization fails to get facts right.
  • I’d love to live in a world where I could compare the accuracy of information sources, where errors got found and fixed with crowd-sourced ease, and where news organizations weren’t shy about telling me what they did and did not know. Basic factual accuracy is far from the only measure of good journalism, but perhaps it’s an improvement over the current sad state of affairs
  •  
    Professional journalism is supposed to be "factual," "accurate," or just plain true. Is it? Has news accuracy been getting better or worse in the last decade? How does it vary between news organizations, and how do other information sources rate? Is professional journalism more or less accurate than everything else on the internet? These all seem like important questions, so I've been poking around, trying to figure out what we know and don't know about the accuracy of our news sources. Meanwhile, the online news corrections process continues to evolve, which gives us hope that the news will become more accurate in the future.
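
A rough sketch of the spot-check sampling idea described in the excerpts above (not from the article itself; the newsroom size, sampling rate and 50% error rate are invented for illustration). It estimates an error rate from a random sample of checked stories and reports the standard error of a proportion, which shows why daily estimates are noisy while weekly and monthly aggregates become useful.

```python
import math
import random

def error_rate_estimate(checked, population_size=None):
    """Estimate an error rate from a spot-check sample.

    checked: list of booleans, True if a sampled story contained an error.
    population_size: total stories published in the period; if given, a
        finite-population correction is applied (sampling without replacement).
    """
    n = len(checked)
    p_hat = sum(checked) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error of a proportion
    if population_size and population_size > n:    # correction for sampling without replacement
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return p_hat, se

# Hypothetical newsroom: 40 stories a day, true error rate around 50%,
# 3 stories spot-checked per day.
random.seed(1)
stories_per_day, checks_per_day, true_rate = 40, 3, 0.5

for days in (1, 7, 30):  # daily, weekly and monthly aggregates
    sample = [random.random() < true_rate for _ in range(checks_per_day * days)]
    p, se = error_rate_estimate(sample, population_size=stories_per_day * days)
    print(f"{days:2d} days: estimated error rate {p:.2f} +/- {se:.2f}")
```
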
Weiye Loh

The Matthew Effect § SEEDMAGAZINE.COM - 0 views

  • For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away. —Matthew 25:29
  • Sociologist Robert K. Merton was the first to publish a paper on the similarity between this phrase in the Gospel of Matthew and the realities of how scientific research is rewarded
  • Even if two researchers do similar work, the most eminent of the pair will get more acclaim, Merton observed—more praise within the community, more or better job offers, better opportunities. And it goes without saying that even if a graduate student publishes stellar work in a prestigious journal, their well-known advisor is likely to get more of the credit. 
  • Merton published his theory, called the “Matthew Effect,” in 1968. At that time, the average age of a biomedical researcher in the US receiving his or her first significant funding was 35 or younger. That meant that researchers who had little in terms of fame (at 35, they would have completed a PhD and a post-doc and would be just starting out on their own) could still get funded if they wrote interesting proposals. So Merton’s observation about getting credit for one’s work, however true in terms of prestige, wasn’t adversely affecting the funding of new ideas.
  • Over the last 40 years, the importance of fame in science has increased. The effect has compounded because famous researchers have gathered the smartest and most ambitious graduate students and post-docs around them, so that each notable paper from a high-wattage group bootstraps their collective power. The famous grow more famous, and the younger researchers in their coterie are able to use that fame to their benefit. The effect of this concentration of power has finally trickled down to the level of funding: The average age on first receipt of the most common “starter” grants at the NIH is now almost 42. This means younger researchers without the strength of a fame-based community are cut out of the funding process, and their ideas, separate from an older researcher’s sphere of influence, don’t get pursued. This causes a founder effect in modern science, where the prestigious few dictate the direction of research. It’s not only unfair—it’s also actively dangerous to science’s progress.
  • How can we fund science in a way that is fair? By judging researchers independently of their fame—in other words, not by how many times their papers have been cited. By judging them instead via new measures, measures that until recently have been too ephemeral to use.
  • Right now, the gold standard worldwide for measuring a scientist’s worth is the number of times his or her papers are cited, along with the importance of the journal where the papers were published. Decisions of funding, faculty positions, and eminence in the field all derive from a scientist’s citation history. But relying on these measures entrenches the Matthew Effect: Even when the lead author is a graduate student, the majority of the credit accrues to the much older principal investigator. And an influential lab can inflate its citations by referring to its own work in papers that themselves go on to be heavy-hitters.
  • what is most profoundly unbalanced about relying on citations is that the paper-based metric distorts the reality of the scientific enterprise. Scientists make data points, narratives, research tools, inventions, pictures, sounds, videos, and more. Journal articles are a compressed and heavily edited version of what happens in the lab.
  • We have the capacity to measure the quality of a scientist across multiple dimensions, not just in terms of papers and citations. Was the scientist’s data online? Was it comprehensible? Can I replicate the results? Run the code? Access the research tools? Use them to write a new paper? What ideas were examined and discarded along the way, so that I might know the reality of the research? What is the impact of the scientist as an individual, rather than the impact of the paper he or she wrote? When we can see the scientist as a whole, we’re less prone to relying on reputation alone to assess merit.
  • Multidimensionality is one of the only counters to the Matthew Effect we have available. In forums where this kind of meritocracy prevails over seniority, like Linux or Wikipedia, the Matthew Effect is much less pronounced. And we have the capacity to measure each of these individual factors of a scientist’s work, using the basic discourse of the Web: the blog, the wiki, the comment, the trackback. We can find out who is talented in a lab, not just who was smart enough to hire that talent. As we develop the ability to measure multiple dimensions of scientific knowledge creation, dissemination, and re-use, we open up a new way to recognize excellence. What we can measure, we can value.
  •  
    WHEN IT COMES TO SCIENTIFIC PUBLISHING AND FAME, THE RICH GET RICHER AND THE POOR GET POORER. HOW CAN WE BREAK THIS FEEDBACK LOOP?
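
A toy rich-get-richer simulation, offered only as an illustration of the feedback loop the article describes (the article contains no such model; all numbers are invented). Each new citation goes to a researcher with probability proportional to the citations they already hold, plus a small baseline so newcomers are not shut out entirely; even from equal starting points, a few researchers end up holding most of the citations.

```python
import random

def simulate_citations(researchers=20, new_citations=5000, baseline=1.0, seed=0):
    """Toy rich-get-richer model: each new citation goes to a researcher with
    probability proportional to (citations already held + baseline)."""
    rng = random.Random(seed)
    counts = [0] * researchers
    for _ in range(new_citations):
        weights = [c + baseline for c in counts]
        winner = rng.choices(range(researchers), weights=weights)[0]
        counts[winner] += 1
    return sorted(counts, reverse=True)

counts = simulate_citations()
print("citations per researcher:", counts)
print(f"top 2 of 20 researchers hold {sum(counts[:2]) / sum(counts):.0%} of all citations")
```
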
Weiye Loh

Measuring Social Media: Who Has Access to the Firehose? - 0 views

  • The question that the audience member asked — and one that we tried to touch on a bit in the panel itself — was who has access to this raw data. Twitter doesn’t comment on who has full access to its firehose, but to Weil’s credit he was at least forthcoming with some of the names, including stalwarts like Microsoft, Google and Yahoo — plus a number of smaller companies.
  • In the case of Twitter, the company offers free access to its API for developers. The API can provide access and insight into information about tweets, replies and keyword searches, but as developers who work with Twitter — or any large scale social network — know, that data isn’t always 100% reliable. Unreliable data is a problem when talking about measurements and analytics, where the data is helping to influence decisions related to social media marketing strategies and allocations of resources.
  • One of the companies that has access to Twitter’s data firehose is Gnip. As we discussed in November, Twitter has entered into a partnership with Gnip that allows the social data provider to resell access to the Twitter firehose. This is great on one level, because it means that businesses and services can access the data. The problem, as noted by panelist Raj Kadam, the CEO of Viralheat, is that Gnip’s access can be prohibitively expensive.
  • The problems with reliable access to analytics and measurement information are by no means limited to Twitter. Facebook data is also tightly controlled. With Facebook, privacy controls built into the API are designed to prevent mass data scraping. This is absolutely the right decision. However, a reality of social media measurement is that Facebook Insights isn’t always reachable and the data collected from the tool is sometimes inaccurate. It’s no surprise there’s a disconnect between the data that marketers and community managers want and the data that can be reliably accessed. Twitter and Facebook were both designed as tools for consumers. It’s only been in the last two years that the platform ecosystem aimed at serving large brands and companies
  • The data that companies like Twitter, Facebook and Foursquare collect are some of their most valuable assets. It isn’t fair to expect a free ride or first-class access to the data by anyone who wants it. Having said that, more transparency about what data is available to services and brands is needed and necessary. We’re just scratching the surface of what social media monitoring, measurement and management tools can do. To get to the next level, it’s important that we all question who has access to the firehose.
  • We Need More Transparency for How to Access and Connect with Data
Weiye Loh

Science, Strong Inference -- Proper Scientific Method - 0 views

  • Scientists these days tend to keep up a polite fiction that all science is equal. Except for the work of the misguided opponent whose arguments we happen to be refuting at the time, we speak as though every scientist's field and methods of study are as good as every other scientist's and perhaps a little better. This keeps us all cordial when it comes to recommending each other for government grants.
  • Why should there be such rapid advances in some fields and not in others? I think the usual explanations that we tend to think of - such as the tractability of the subject, or the quality or education of the men drawn into it, or the size of research contracts - are important but inadequate. I have begun to believe that the primary factor in scientific advance is an intellectual one. These rapidly moving fields are fields where a particular method of doing scientific research is systematically used and taught, an accumulative method of inductive inference that is so effective that I think it should be given the name of "strong inference." I believe it is important to examine this method, its use and history and rationale, and to see whether other groups and individuals might learn to adopt it profitably in their own scientific and intellectual work. In its separate elements, strong inference is just the simple and old-fashioned method of inductive inference that goes back to Francis Bacon. The steps are familiar to every college student and are practiced, off and on, by every scientist. The difference comes in their systematic application. Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly: Devising alternative hypotheses; Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses; Carrying out the experiment so as to get a clean result; Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain, and so on.
  • On any new problem, of course, inductive inference is not as simple and certain as deduction, because it involves reaching out into the unknown. Steps 1 and 2 require intellectual inventions, which must be cleverly chosen so that hypothesis, experiment, outcome, and exclusion will be related in a rigorous syllogism; and the question of how to generate such inventions is one which has been extensively discussed elsewhere (2, 3). What the formal schema reminds us to do is to try to make these inventions, to take the next step, to proceed to the next fork, without dawdling or getting tied up in irrelevancies.
  • It is clear why this makes for rapid and powerful progress. For exploring the unknown, there is no faster method; this is the minimum sequence of steps. Any conclusion that is not an exclusion is insecure and must be rechecked. Any delay in recycling to the next set of hypotheses is only a delay. Strong inference, and the logical tree it generates, are to inductive reasoning what the syllogism is to deductive reasoning in that it offers a regular method for reaching firm inductive conclusions one after the other as rapidly as possible.
  • "But what is so novel about this?" someone will say. This is the method of science and always has been, why give it a special name? The reason is that many of us have almost forgotten it. Science is now an everyday business. Equipment, calculations, lectures become ends in themselves. How many of us write down our alternatives and crucial experiments every day, focusing on the exclusion of a hypothesis? We may write our scientific papers so that it looks as if we had steps 1, 2, and 3 in mind all along. But in between, we do busywork. We become "method-oriented" rather than "problem-oriented." We say we prefer to "feel our way" toward generalizations. We fail to teach our students how to sharpen up their inductive inferences. And we do not realize the added power that the regular and explicit use of alternative hypotheses and sharp exclusion could give us at every step of our research.
  • A distinguished cell biologist rose and said, "No two cells give the same properties. Biology is the science of heterogeneous systems." And he added privately: "You know there are scientists, and there are people in science who are just working with these over-simplified model systems - DNA chains and in vitro systems - who are not doing science at all. We need their auxiliary work: they build apparatus, they make minor studies, but they are not scientists." To which Cy Levinthal replied: "Well, there are two kinds of biologists, those who are looking to see if there is one thing that can be understood and those who keep saying it is very complicated and that nothing can be understood. . . . You must study the simplest system you think has the properties you are interested in."
  • At the 1958 Conference on Biophysics, at Boulder, there was a dramatic confrontation between the two points of view. Leo Szilard said: "The problems of how enzymes are induced, of how proteins are synthesized, of how antibodies are formed, are closer to solution than is generally believed. If you do stupid experiments, and finish one a year, it can take 50 years. But if you stop doing experiments for a little while and think how proteins can possibly be synthesized, there are only about 5 different ways, not 50! And it will take only a few experiments to distinguish these." One of the young men added: "It is essentially the old question: How small and elegant an experiment can you perform?" These comments upset a number of those present. An electron microscopist said, "Gentlemen, this is off the track. This is philosophy of science." Szilard retorted: "I was not quarreling with third-rate scientists: I was quarreling with first-rate scientists."
  • Any criticism or challenge to consider changing our methods strikes of course at all our ego-defenses. But in this case the analytical method offers the possibility of such great increases in effectiveness that it is unfortunate that it cannot be regarded more often as a challenge to learning rather than as a challenge to combat. Many of the recent triumphs in molecular biology have in fact been achieved on just such "oversimplified model systems," very much along the analytical lines laid down in the 1958 discussion. They have not fallen to the kind of men who justify themselves by saying "No two cells are alike," regardless of how true that may ultimately be. The triumphs are in fact triumphs of a new way of thinking.
  • the emphasis on strong inference
  • is also partly due to the nature of the fields themselves. Biology, with its vast informational detail and complexity, is a "high-information" field, where years and decades can easily be wasted on the usual type of "low-information" observations or experiments if one does not think carefully in advance about what the most important and conclusive experiments would be. And in high-energy physics, both the "information flux" of particles from the new accelerators and the million-dollar costs of operation have forced a similar analytical approach. It pays to have a top-notch group debate every experiment ahead of time; and the habit spreads throughout the field.
  • Historically, I think, there have been two main contributions to the development of a satisfactory strong-inference method. The first is that of Francis Bacon (13). He wanted a "surer method" of "finding out nature" than either the logic-chopping or all-inclusive theories of the time or the laudable but crude attempts to make inductions "by simple enumeration." He did not merely urge experiments as some suppose, he showed the fruitfulness of interconnecting theory and experiment so that the one checked the other. Of the many inductive procedures he suggested, the most important, I think, was the conditional inductive tree, which proceeded from alternative hypotheses (possible "causes," as he calls them), through crucial experiments ("Instances of the Fingerpost"), to exclusion of some alternatives and adoption of what is left ("establishing axioms"). His Instances of the Fingerpost are explicitly at the forks in the logical tree, the term being borrowed "from the fingerposts which are set up where roads part, to indicate the several directions."
  • Here was a method that could separate off the empty theories! Bacon said the inductive method could be learned by anybody, just like learning to "draw a straighter line or more perfect circle . . . with the help of a ruler or a pair of compasses." "My way of discovering sciences goes far to level men's wit and leaves but little to individual excellence, because it performs everything by the surest rules and demonstrations." Even occasional mistakes would not be fatal. "Truth will sooner come out from error than from confusion."
  • Nevertheless there is a difficulty with this method. As Bacon emphasizes, it is necessary to make "exclusions." He says, "The induction which is to be available for the discovery and demonstration of sciences and arts, must analyze nature by proper rejections and exclusions, and then, after a sufficient number of negatives come to a conclusion on the affirmative instances." "[To man] it is granted only to proceed at first by negatives, and at last to end in affirmatives after exclusion has been exhausted." Or, as the philosopher Karl Popper says today, there is no such thing as proof in science - because some later alternative explanation may be as good or better - so that science advances only by disproofs. There is no point in making hypotheses that are not falsifiable because such hypotheses do not say anything, "it must be possible for an empirical scientific system to be refuted by experience" (14).
  • The difficulty is that disproof is a hard doctrine. If you have a hypothesis and I have another hypothesis, evidently one of them must be eliminated. The scientist seems to have no choice but to be either soft-headed or disputatious. Perhaps this is why so many tend to resist the strong analytical approach and why some great scientists are so disputatious.
  • Fortunately, it seems to me, this difficulty can be removed by the use of a second great intellectual invention, the "method of multiple hypotheses," which is what was needed to round out the Baconian scheme. This is a method that was put forward by T.C. Chamberlin (15), a geologist at Chicago at the turn of the century, who is best known for his contribution to the Chamberlin-Moulton hypothesis of the origin of the solar system.
  • Chamberlin says our trouble is that when we make a single hypothesis, we become attached to it. "The moment one has offered an original explanation for a phenomenon which seems satisfactory, that moment affection for his intellectual child springs into existence, and as the explanation grows into a definite theory his parental affections cluster about his offspring and it grows more and more dear to him. . . . There springs up also unwittingly a pressing of the theory to make it fit the facts and a pressing of the facts to make them fit the theory..." "To avoid this grave danger, the method of multiple working hypotheses is urged. It differs from the simple working hypothesis in that it distributes the effort and divides the affections. . . . Each hypothesis suggests its own criteria, its own method of proof, its own method of developing the truth, and if a group of hypotheses encompass the subject on all sides, the total outcome of means and of methods is full and rich."
  • The conflict and exclusion of alternatives that is necessary to sharp inductive inference has been all too often a conflict between men, each with his single Ruling Theory. But whenever each man begins to have multiple working hypotheses, it becomes purely a conflict between ideas. It becomes much easier then for each of us to aim every day at conclusive disproofs - at strong inference - without either reluctance or combativeness. In fact, when there are multiple hypotheses, which are not anyone's "personal property," and when there are crucial experiments to test them, the daily life in the laboratory takes on an interest and excitement it never had, and the students can hardly wait to get to work to see how the detective story will come out. It seems to me that this is the reason for the development of those distinctive habits of mind and the "complex thought" that Chamberlin described, the reason for the sharpness, the excitement, the zeal, the teamwork - yes, even international teamwork - in molecular biology and high-energy physics today. What else could be so effective?
  • Unfortunately, I think, there are other areas of science today that are sick by comparison, because they have forgotten the necessity for alternative hypotheses and disproof. Each man has only one branch - or none - on the logical tree, and it twists at random without ever coming to the need for a crucial decision at any point. We can see from the external symptoms that there is something scientifically wrong. The Frozen Method, The Eternal Surveyor, The Never Finished, The Great Man With a Single Hypothesis, The Little Club of Dependents, The Vendetta, The All-Encompassing Theory Which Can Never Be Falsified.
  • a "theory" of this sort is not a theory at all, because it does not exclude anything. It predicts everything, and therefore does not predict anything. It becomes simply a verbal formula which the graduate student repeats and believes because the professor has said it so often. This is not science, but faith; not theory, but theology. Whether it is hand-waving or number-waving, or equation-waving, a theory is not a theory unless it can be disproved. That is, unless it can be falsified by some possible experimental outcome.
  • the work methods of a number of scientists have been testimony to the power of strong inference. Is success not due in many cases to systematic use of Bacon's "surest rules and demonstrations" as much as to rare and unattainable intellectual power? Faraday's famous diary (16), or Fermi's notebooks (3, 17), show how these men believed in the effectiveness of daily steps in applying formal inductive methods to one problem after another.
  • Surveys, taxonomy, design of equipment, systematic measurements and tables, theoretical computations - all have their proper and honored place, provided they are parts of a chain of precise induction of how nature works. Unfortunately, all too often they become ends in themselves, mere time-serving from the point of view of real scientific advance, a hypertrophied methodology that justifies itself as a lore of respectability.
  • We speak piously of taking measurements and making small studies that will "add another brick to the temple of science." Most such bricks just lie around the brickyard (20). Tables of constants have their place and value, but the study of one spectrum after another, if not frequently re-evaluated, may become a substitute for thinking, a sad waste of intelligence in a research laboratory, and a mistraining whose crippling effects may last a lifetime.
  • Beware of the man of one method or one instrument, either experimental or theoretical. He tends to become method-oriented rather than problem-oriented. The method-oriented man is shackled; the problem-oriented man is at least reaching freely toward what is most important. Strong inference redirects a man to problem-orientation, but it requires him to be willing repeatedly to put aside his last methods and teach himself new ones.
  • anyone who asks the question about scientific effectiveness will also conclude that much of the mathematizing in physics and chemistry today is irrelevant if not misleading. The great value of mathematical formulation is that when an experiment agrees with a calculation to five decimal places, a great many alternative hypotheses are pretty well excluded (though the Bohr theory and the Schrödinger theory both predict exactly the same Rydberg constant!). But when the fit is only to two decimal places, or one, it may be a trap for the unwary; it may be no better than any rule-of-thumb extrapolation, and some other kind of qualitative exclusion might be more rigorous for testing the assumptions and more important to scientific understanding than the quantitative fit.
  • Today we preach that science is not science unless it is quantitative. We substitute correlations for causal studies, and physical equations for organic reasoning. Measurements and equations are supposed to sharpen thinking, but, in my observation, they more often tend to make the thinking noncausal and fuzzy. They tend to become the object of scientific manipulation instead of auxiliary tests of crucial inferences.
  • Many - perhaps most - of the great issues of science are qualitative, not quantitative, even in physics and chemistry. Equations and measurements are useful when and only when they are related to proof; but proof or disproof comes first and is in fact strongest when it is absolutely convincing without any quantitative measurement.
  • you can catch phenomena in a logical box or in a mathematical box. The logical box is coarse but strong. The mathematical box is fine-grained but flimsy. The mathematical box is a beautiful way of wrapping up a problem, but it will not hold the phenomena unless they have been caught in a logical box to begin with.
  • Of course it is easy - and all too common - for one scientist to call the others unscientific. My point is not that my particular conclusions here are necessarily correct, but that we have long needed some absolute standard of possible scientific effectiveness by which to measure how well we are succeeding in various areas - a standard that many could agree on and one that would be undistorted by the scientific pressures and fashions of the times and the vested interests and busywork that they develop. It is not public evaluation I am interested in so much as a private measure by which to compare one's own scientific performance with what it might be. I believe that strong inference provides this kind of standard of what the maximum possible scientific effectiveness could be - as well as a recipe for reaching it.
  • The strong-inference point of view is so resolutely critical of methods of work and values in science that any attempt to compare specific cases is likely to sound both smug and destructive. Mainly one should try to teach it by example and by exhorting to self-analysis and self-improvement only in general terms
  • one severe but useful private test - a touchstone of strong inference - that removes the necessity for third-person criticism, because it is a test that anyone can learn to carry with him for use as needed. It is our old friend the Baconian "exclusion," but I call it "The Question." Obviously it should be applied as much to one's own thinking as to others'. It consists of asking in your own mind, on hearing any scientific explanation or theory put forward, "But sir, what experiment could disprove your hypothesis?"; or, on hearing a scientific experiment described, "But sir, what hypothesis does your experiment disprove?"
  • It is not true that all science is equal; or that we cannot justly compare the effectiveness of scientists by any method other than a mutual-recommendation system. The man to watch, the man to put your money on, is not the man who wants to make "a survey" or a "more detailed study" but the man with the notebook, the man with the alternative hypotheses and the crucial experiments, the man who knows how to answer your Question of disproof and is already working on it.
  •  
    There is so much bad science and bad statistical information in media reports, publications, and everyday conversation that I think it is important to understand facts and proofs and the associated pitfalls.
Weiye Loh

A Brief Primer on Criminal Statistics « Canada « Skeptic North - 0 views

  • Occurrences of crime are properly expressed as the number of incidences per 100,000 people. Total numbers are not informative on their own and it is very easy to manipulate an argument by cherry picking between a total number and a rate.  Beware of claims about crime that use raw incidence numbers. When a change in whole incidence numbers is observed, this might not have any bearing on crime levels at all, because levels of crime are dependent on population.
  • Whole Numbers versus Rates
  • Reliability: Not every criminal statistic is equally reliable. Even though we have measures of incidences of crimes across types and subtypes, not every one of these statistics samples the actual incidence of these crimes in the same way. Indeed, very few measure the total incidences very reliably at all. The crime rates that you are most likely to encounter capture only crimes known and substantiated by police. These numbers are vulnerable to variances in how crimes become known and verified by police in the first place. Crimes very often go unreported or undiscovered. Some crimes are more likely to go unreported than others (such as sexual assaults and drug possession), and some crimes are more difficult to substantiate as having occurred than others.
  • Complicating matters further is the fact that these reporting patterns vary over time and are reflected in observed trends.   So, when a change in the police reported crime rate is observed from year to year or across a span of time we may be observing a “real” change, we may be observing a change in how these crimes come to the attention of police, or we may be seeing a mixture of both.
  • Generally, the most reliable criminal statistic is the homicide rate – it’s very difficult, though not impossible, to miss a dead body. In fact, homicides in Canada are counted in the year that they become known to police and not in the year that they occurred.  Our most reliable number is very, very close, but not infallible.
  • Crimes known to the police nearly always under measure the true incidence of crime, so other measures are needed to better complete our understanding. The reported crimes measure is reported every year to Statistics Canada from data that makes up the Uniform Crime Reporting Survey. This is a very rich data set that measures police data very accurately but tells us nothing about unreported crime.
  • We do have some data on unreported crime available. Victims are interviewed (after self-identifying) via the General Social Survey. The survey is conducted every five years
  • This measure captures information in eight crime categories both reported, and not reported to police. It has its own set of interpretation problems and pathways to misuse. The survey relies on self-reporting, so the accuracy of the information will be open to errors due to faulty memories, willingness to report, recording errors etc.
  • From the last data set available, self-identified victims did not report 69% of violent victimizations (sexual assault, robbery and physical assault), 62% of household victimizations (break and enter, motor vehicle/parts theft, household property theft and vandalism), and 71% of personal property theft victimizations.
  • while people generally understand that crimes go unreported and unknown to police, they tend to be surprised and perhaps even shocked at the actual amounts that go unreported. These numbers sound scary. However, the most common reasons reported by victims of violent and household crime for not reporting were: believing the incident was not important enough (68%), believing the police couldn’t do anything about the incident (59%), and stating that the incident was dealt with in another way (42%).
  • Also, note that the survey indicated that 82% of violent incidents did not result in injuries to the victims. Do claims that we should do something about all this hidden crime make sense in light of what this crime looks like in the limited way we can understand it? How could you be reasonably certain that whatever intervention proposed would in fact reduce the actual amount of crime and not just reduce the amount that goes unreported?
  • Data is collected at all levels of the crime continuum with differing levels of accuracy and applicability. This is nicely reflected in the concept of “the crime funnel”. All criminal incidents that are ever committed are at the opening of the funnel. There is “loss” all along the way to the bottom where only a small sample of incidences become known with charges laid, prosecuted successfully and responded to by the justice system.  What goes into the top levels of the funnel affects what we can know at any other point later.
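
A small sketch of the first point above, with invented numbers: raw incident counts can rise simply because the population grows, while the rate per 100,000 (the comparable figure) stays flat or even falls.

```python
def rate_per_100k(incidents, population):
    """Crime rate expressed as incidents per 100,000 people."""
    return incidents / population * 100_000

# Hypothetical city, two census years (numbers invented for illustration).
years = {
    2001: {"incidents": 5_000, "population": 1_000_000},
    2011: {"incidents": 5_500, "population": 1_250_000},
}

for year, d in years.items():
    rate = rate_per_100k(d["incidents"], d["population"])
    print(f"{year}: {d['incidents']} incidents, {rate:.0f} per 100,000")

# Raw incidents rose by 10%, but the rate fell from 500 to 440 per 100,000.
```
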
Weiye Loh

What is the role of the state? | Martin Wolf's Exchange | FT.com - 0 views

  • This question has concerned western thinkers at least since Plato (5th-4th century BCE). It has also concerned thinkers in other cultural traditions: Confucius (6th-5th century BCE); China’s legalist tradition; and India’s Kautilya (4th-3rd century BCE). The perspective here is that of the contemporary democratic west.
  • The core purpose of the state is protection. This view would be shared by everybody, except anarchists, who believe that the protective role of the state is unnecessary or, more precisely, that people can rely on purely voluntary arrangements.
  • Contemporary Somalia shows the horrors that can befall a stateless society. Yet horrors can also befall a society with an over-mighty state. It is evident, because it is the story of post-tribal humanity, that the powers of the state can be abused for the benefit of those who control it.
  • In his final book, Power and Prosperity, the late Mancur Olson argued that the state was a “stationary bandit”. A stationary bandit is better than a “roving bandit”, because the latter has no interest in developing the economy, while the former does. But it may not be much better, because those who control the state will seek to extract the surplus over subsistence generated by those under their control.
  • In the contemporary west, there are three protections against undue exploitation by the stationary bandit: exit, voice (on the first two of these, see this on Albert Hirschman) and restraint. By “exit”, I mean the possibility of escaping from the control of a given jurisdiction, by emigration, capital flight or some form of market exchange. By “voice”, I mean a degree of control over the state, most obviously by voting. By “restraint”, I mean independent courts, division of powers, federalism and entrenched rights.
  • defining what a democratic state, viewed precisely as such a constrained protective arrangement, is entitled to do.
  • There exists a strand in classical liberal or, in contemporary US parlance, libertarian thought which believes the answer is to define the role of the state so narrowly and the rights of individuals so broadly that many political choices (the income tax or universal health care, for example) would be ruled out a priori. In other words, it seeks to abolish much of politics through constitutional restraints. I view this as a hopeless strategy, both intellectually and politically. It is hopeless intellectually, because the values people hold are many and divergent and some of these values do not merely allow, but demand, government protection of weak, vulnerable or unfortunate people. Moreover, such values are not “wrong”. The reality is that people hold many, often incompatible, core values. Libertarians argue that the only relevant wrong is coercion by the state. Others disagree and are entitled to do so. It is hopeless politically, because democracy necessitates debate among widely divergent opinions. Trying to rule out a vast range of values from the political sphere by constitutional means will fail. Under enough pressure, the constitution itself will be changed, via amendment or reinterpretation.
  • So what ought the protective role of the state to include? Again, in such a discussion, classical liberals would argue for the “night-watchman” role. The government’s responsibilities are limited to protecting individuals from coercion, fraud and theft and to defending the country from foreign aggression. Yet once one has accepted the legitimacy of using coercion (taxation) to provide the goods listed above, there is no reason in principle why one should not accept it for the provision of other goods that cannot be provided as well, or at all, by non-political means.
  • Those other measures would include addressing a range of externalities (e.g. pollution), providing information and supplying insurance against otherwise uninsurable risks, such as unemployment, spousal abandonment and so forth. The subsidisation or public provision of childcare and education is a way to promote equality of opportunity. The subsidisation or public provision of health insurance is a way to preserve life, unquestionably one of the purposes of the state. Safety standards are a way to protect people against the carelessness or malevolence of others or (more controversially) themselves. All these, then, are legitimate protective measures. The more complex the society and economy, the greater the range of the protections that will be sought.
  • What, then, are the objections to such actions? The answers might be: the proposed measures are ineffective, compared with what would happen in the absence of state intervention; the measures are unaffordable and might lead to state bankruptcy; the measures encourage irresponsible behaviour; and, at the limit, the measures restrict individual autonomy to an unacceptable degree. These are all, we should note, questions of consequences.
  • The vote is more evenly distributed than wealth and income. Thus, one would expect the tenor of democratic policymaking to be redistributive and so, indeed, it is. Those with wealth and income to protect will then make political power expensive to acquire and encourage potential supporters to focus on common enemies (inside and outside the country) and on cultural values. The more unequal are incomes and wealth and the more determined are the “haves” to avoid being compelled to support the “have-nots”, the more politics will take on such characteristics.
  • In the 1970s, the view that democracy would collapse under the weight of its excessive promises seemed to me disturbingly true. I am no longer convinced of this: as Adam Smith said, “There is a great deal of ruin in a nation”. Moreover, the capacity for learning by democracies is greater than I had realised. The conservative movements of the 1980s were part of that learning. But they went too far in their confidence in market arrangements and their indifference to the social and political consequences of inequality. I would support state pensions, state-funded health insurance and state regulation of environmental and other externalities. I am happy to debate details. The ancient Athenians called someone who had a purely private life “idiotes”. This is, of course, the origin of our word “idiot”. Individual liberty does indeed matter. But it is not the only thing that matters. The market is a remarkable social institution. But it is far from perfect. Democratic politics can be destructive. But it is much better than the alternatives. Each of us has an obligation, as a citizen, to make politics work as well as he (or she) can and to embrace the debate over a wide range of difficult choices that this entails.
  •  
    What is the role of the state?
Weiye Loh

Let's make science metrics more scientific : Article : Nature - 0 views

  • Measuring and assessing academic performance is now a fact of scientific life.
  • Yet current systems of measurement are inadequate. Widely used metrics, from the newly-fashionable Hirsch index to the 50-year-old citation index, are of limited use [1]
  • Existing metrics do not capture the full range of activities that support and transmit scientific ideas, which can be as varied as mentoring, blogging or creating industrial prototypes.
  • narrow or biased measures of scientific achievement can lead to narrow and biased science.
  • Global demand for, and interest in, metrics should galvanize stakeholders — national funding agencies, scientific research organizations and publishing houses — to combine forces. They can set an agenda and foster research that establishes sound scientific metrics: grounded in theory, built with high-quality data and developed by a community with strong incentives to use them.
  • Scientists are often reticent to see themselves or their institutions labelled, categorized or ranked. Although happy to tag specimens as one species or another, many researchers do not like to see themselves as specimens under a microscope — they feel that their work is too complex to be evaluated in such simplistic terms. Some argue that science is unpredictable, and that any metric used to prioritize research money risks missing out on an important discovery from left field.
    • Weiye Loh
       
      It is ironic that while scientists feel that their work is too complex to be evaluated in simplistic terms or metrics, they nevertheless feel OK evaluating the world in simplistic terms.
  • It is true that good metrics are difficult to develop, but this is not a reason to abandon them. Rather it should be a spur to basing their development in sound science. If we do not press harder for better metrics, we risk making poor funding decisions or sidelining good scientists.
  • Metrics are data driven, so developing a reliable, joined-up infrastructure is a necessary first step.
  • We need a concerted international effort to combine, augment and institutionalize these databases within a cohesive infrastructure.
  • On an international level, the issue of a unique researcher identification system is one that needs urgent attention. There are various efforts under way in the open-source and publishing communities to create unique researcher identifiers using the same principles as the Digital Object Identifier (DOI) protocol, which has become the international standard for identifying unique documents. The ORCID (Open Researcher and Contributor ID) project, for example, was launched in December 2009 by parties including Thomson Reuters and Nature Publishing Group. The engagement of international funding agencies would help to push this movement towards an international standard.
  • if all funding agencies used a universal template for reporting scientific achievements, it could improve data quality and reduce the burden on investigators.
    • Weiye Loh
       
      So in future, we’ll only have one robust metric to evaluate scientific contribution? hmm...
  • Importantly, data collected for use in metrics must be open to the scientific community, so that metric calculations can be reproduced. This also allows the data to be efficiently repurposed.
  • As well as building an open and consistent data infrastructure, there is the added challenge of deciding what data to collect and how to use them. This is not trivial. Knowledge creation is a complex process, so perhaps alternative measures of creativity and productivity should be included in scientific metrics, such as the filing of patents, the creation of prototypes [4] and even the production of YouTube videos.
  • Perhaps publications in these different media should be weighted differently in different fields.
  • There needs to be a greater focus on what these data mean, and how they can be best interpreted.
  • This requires the input of social scientists, rather than just those more traditionally involved in data capture, such as computer scientists.
  • An international data platform supported by funding agencies could include a virtual 'collaboratory', in which ideas and potential solutions can be posited and discussed. This would bring social scientists together with working natural scientists to develop metrics and test their validity through wikis, blogs and discussion groups, thus building a community of practice. Such a discussion should be open to all ideas and theories and not restricted to traditional bibliometric approaches.
  • Far-sighted action can ensure that metrics goes beyond identifying 'star' researchers, nations or ideas, to capturing the essence of what it means to be a good scientist.
  •  
    Let's make science metrics more scientific Julia Lane1 Top of pageAbstract To capture the essence of good science, stakeholders must combine forces to create an open, sound and consistent system for measuring all the activities that make up academic productivity, says Julia Lane.
Weiye Loh

CancerGuide: The Median Isn't the Message - 0 views

  • Statistics recognizes different measures of an "average," or central tendency. The mean is our usual concept of an overall average - add up the items and divide them by the number of sharers
  • The median, a different measure of central tendency, is the half-way point.
  • A politician in power might say with pride, "The mean income of our citizens is $15,000 per year." The leader of the opposition might retort, "But half our citizens make less than $10,000 per year." Both are right, but neither cites a statistic with impassive objectivity. The first invokes a mean, the second a median. (Means are higher than medians in such cases because one millionaire may outweigh hundreds of poor people in setting a mean; but he can balance only one mendicant in calculating a median).
  • The larger issue that creates a common distrust or contempt for statistics is more troubling. Many people make an unfortunate and invalid separation between heart and mind, or feeling and intellect. In some contemporary traditions, abetted by attitudes stereotypically centered on Southern California, feelings are exalted as more "real" and the only proper basis for action - if it feels good, do it - while intellect gets short shrift as a hang-up of outmoded elitism. Statistics, in this absurd dichotomy, often become the symbol of the enemy. As Hilaire Belloc wrote, "Statistics are the triumph of the quantitative method, and the quantitative method is the victory of sterility and death."
  • This is a personal story of statistics, properly interpreted, as profoundly nurturant and life-giving. It declares holy war on the downgrading of intellect by telling a small story about the utility of dry, academic knowledge about science. Heart and head are focal points of one body, one personality.
  • We still carry the historical baggage of a Platonic heritage that seeks sharp essences and definite boundaries. (Thus we hope to find an unambiguous "beginning of life" or "definition of death," although nature often comes to us as irreducible continua.) This Platonic heritage, with its emphasis on clear distinctions and separated immutable entities, leads us to view statistical measures of central tendency wrongly, indeed opposite to the appropriate interpretation in our actual world of variation, shadings, and continua. In short, we view means and medians as the hard "realities," and the variation that permits their calculation as a set of transient and imperfect measurements of this hidden essence. If the median is the reality and variation around the median just a device for its calculation, then "I will probably be dead in eight months" may pass as a reasonable interpretation.
  • But all evolutionary biologists know that variation itself is nature's only irreducible essence. Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions. Therefore, I looked at the mesothelioma statistics quite differently - and not only because I am an optimist who tends to see the doughnut instead of the hole, but primarily because I know that variation itself is the reality. I had to place myself amidst the variation. When I learned about the eight-month median, my first intellectual reaction was: fine, half the people will live longer; now what are my chances of being in that half. I read for a furious and nervous hour and concluded, with relief: damned good. I possessed every one of the characteristics conferring a probability of longer life: I was young; my disease had been recognized in a relatively early stage; I would receive the nation's best medical treatment; I had the world to live for; I knew how to read the data properly and not despair.
  • Another technical point then added even more solace. I immediately recognized that the distribution of variation about the eight-month median would almost surely be what statisticians call "right skewed." (In a symmetrical distribution, the profile of variation to the left of the central tendency is a mirror image of variation to the right. In skewed distributions, variation to one side of the central tendency is more stretched out - left skewed if extended to the left, right skewed if stretched out to the right.) The distribution of variation had to be right skewed, I reasoned. After all, the left of the distribution contains an irrevocable lower boundary of zero (since mesothelioma can only be identified at death or before). Thus, there isn't much room for the distribution's lower (or left) half - it must be scrunched up between zero and eight months. But the upper (or right) half can extend out for years and years, even if nobody ultimately survives. The distribution must be right skewed, and I needed to know how long the extended tail ran - for I had already concluded that my favorable profile made me a good candidate for that part of the curve. (A simulated right-skewed distribution is sketched after these annotations.)
  • The distribution was, indeed, strongly right skewed, with a long tail (however small) that extended for several years above the eight-month median. I saw no reason why I shouldn't be in that small tail, and I breathed a very long sigh of relief. My technical knowledge had helped. I had read the graph correctly. I had asked the right question and found the answers. I had obtained, in all probability, the most precious of all possible gifts in the circumstances - substantial time.
  • One final point about statistical distributions. They apply only to a prescribed set of circumstances - in this case to survival with mesothelioma under conventional modes of treatment. If circumstances change, the distribution may alter. I was placed on an experimental protocol of treatment and, if fortune holds, will be in the first cohort of a new distribution with high median and a right tail extending to death by natural causes at advanced old age.
  •  
    The Median Isn't the Message by Stephen Jay Gould
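A minimal numerical sketch of the mean-versus-median point in the annotations above. The incomes are invented for illustration and chosen only so that both politicians' claims come out true at once.

```python
import statistics

# Hypothetical incomes for ten citizens: many modest earners plus one very high one.
incomes = [6_000, 7_000, 8_000, 9_000, 9_500,
           10_500, 11_000, 12_000, 13_000, 64_000]

print(statistics.mean(incomes))    # 15000.0 -> "the mean income is $15,000"
print(statistics.median(incomes))  # 10000.0 -> "half make less than $10,000"
```

One large income drags the mean far above the median, which is why the two statistics can tell opposite-sounding stories about the same town.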
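A second sketch, for the right-skew argument above. It assumes (my assumption, purely for illustration, not Gould's actual data) that survival times follow a lognormal distribution with its median near eight months; the hard floor at zero plus the long right tail pulls the mean above the median and leaves a small but real fraction of patients surviving for years.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survival times in months: lognormal, so bounded below by zero
# and stretched out to the right. Parameters chosen only to put the median near 8.
survival = rng.lognormal(mean=np.log(8), sigma=0.9, size=100_000)

print(f"median survival: {np.median(survival):.1f} months")
print(f"mean survival:   {survival.mean():.1f} months (pulled up by the right tail)")
print(f"share surviving past three years: {(survival > 36).mean():.1%}")
```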
Weiye Loh

The Way We Live Now - Metric Mania - NYTimes.com - 0 views

  • In the realm of public policy, we live in an age of numbers.
  • do we hold an outsize belief in our ability to gauge complex phenomena, measure outcomes and come up with compelling numerical evidence? A well-known quotation usually attributed to Einstein is “Not everything that can be counted counts, and not everything that counts can be counted.” I’d amend it to a less eloquent, more prosaic statement: Unless we know how things are counted, we don’t know if it’s wise to count on the numbers.
  • The problem isn’t with statistical tests themselves but with what we do before and after we run them.
  • ...9 more annotations...
  • First, we count if we can, but counting depends a great deal on previous assumptions about categorization. Consider, for example, the number of homeless people in Philadelphia, or the number of battered women in Atlanta, or the number of suicides in Denver. Is someone homeless if he’s unemployed and living with his brother’s family temporarily? Do we require that a woman self-identify as battered to count her as such? If a person starts drinking day in and day out after a cancer diagnosis and dies from acute cirrhosis, did he kill himself? The answers to such questions significantly affect the count.
  • Second, after we’ve gathered some numbers relating to a phenomenon, we must reasonably aggregate them into some sort of recommendation or ranking. This is not easy. By appropriate choices of criteria, measurement protocols and weights, almost any desired outcome can be reached. (A toy example of this appears after these annotations.)
  • Are there good reasons the authors picked the criteria they did? Why did they weigh the criteria in the way they did? Might different criteria or weightings have produced quite different results?
  • Since the answer to the last question is usually yes, the problem of reasonable aggregation is no idle matter.
  • These two basic procedures — counting and aggregating — have important implications for public policy. Consider the plan to evaluate the progress of New York City public schools inaugurated by the city a few years ago. While several criteria were used, much of a school’s grade was determined by whether students’ performance on standardized state tests showed annual improvement. This approach risked putting too much weight on essentially random fluctuations and induced schools to focus primarily on the topics on the tests. It also meant that the better schools could receive mediocre grades because they were already performing well and had little room for improvement. Conversely, poor schools could receive high grades by improving just a bit. (A small simulation of this grading problem follows these annotations.)
  • Medical researchers face similar problems when it comes to measuring effectiveness.
  • Suppose that whenever people contract the disease, they always get it in their mid-60s and live to the age of 75. In the first region, an early screening program detects such people in their 60s. Because these people live to age 75, the five-year survival rate is 100 percent. People in the second region are not screened and thus do not receive their diagnoses until symptoms develop in their early 70s, but they, too, die at 75, so their five-year survival rate is 0 percent. The laissez-faire approach thus yields the same results as the universal screening program, yet if five-year survival were the criterion for effectiveness, universal screening would be deemed the best practice. (The arithmetic is spelled out in a sketch after these annotations.)
  • Because so many criteria can be used to assess effectiveness — median or mean survival times, side effects, quality of life and the like — there is a case to be made against mandating that doctors follow what seems at any given time to be the best practice. Perhaps, as some have suggested, we should merely nudge them with gentle incentives. A comparable tentativeness may be appropriate when devising criteria for effective schools.
  • Arrow’s Theorem, a famous result in mathematical economics, essentially states that no voting system satisfying certain minimal conditions can be guaranteed to always yield a fair or reasonable aggregation of the voters’ rankings of several candidates. A squishier analogue for the field of social measurement would say something like this: No method of measuring a societal phenomenon satisfying certain minimal conditions exists that can’t be second-guessed, deconstructed, cheated, rejected or replaced. This doesn’t mean we shouldn’t be counting — but it does mean we should do so with as much care and wisdom as we can muster.
  •  
    THE WAY WE LIVE NOW Metric Mania
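A toy example of the aggregation point in the annotations above. The cities, criteria and scores are all invented; the point is only that the same raw numbers yield opposite rankings under two defensible sets of weights.

```python
# Invented scores (0-10) on two criteria for two hypothetical cities.
scores = {
    "City A": {"jobs": 9, "air_quality": 3},
    "City B": {"jobs": 4, "air_quality": 8},
}

def ranking(weights):
    """Rank cities by a weighted sum of their criterion scores."""
    totals = {city: sum(weights[k] * v for k, v in crits.items())
              for city, crits in scores.items()}
    return sorted(totals, key=totals.get, reverse=True), totals

print(ranking({"jobs": 0.8, "air_quality": 0.2}))  # City A comes out on top
print(ranking({"jobs": 0.2, "air_quality": 0.8}))  # City B comes out on top
```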
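A minimal simulation of the school-grading problem described above, under the deliberately extreme assumption (mine, for illustration) that every school is identical and observed scores are pure noise around a fixed level. Grading on year-over-year improvement then systematically rewards last year's unlucky schools and penalizes last year's lucky ones.

```python
import numpy as np

rng = np.random.default_rng(1)

n_schools = 1000
true_quality = 70.0                                  # every school is genuinely identical
year1 = true_quality + rng.normal(0, 5, n_schools)   # observed score = quality + noise
year2 = true_quality + rng.normal(0, 5, n_schools)

improvement = year2 - year1
lucky = year1 > 75        # schools that happened to score high in year one
unlucky = year1 < 65      # schools that happened to score low

print(f"mean 'improvement' of last year's high scorers: {improvement[lucky].mean():+.1f}")
print(f"mean 'improvement' of last year's low scorers:  {improvement[unlucky].mean():+.1f}")
# The high scorers appear to slip and the low scorers appear to improve,
# even though nothing about any school has changed.
```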
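The screening thought experiment above, spelled out as arithmetic. The ages are the article's hypotheticals, rounded to single numbers for the sketch.

```python
# The article's hypothetical: everyone contracts the disease in their mid-60s
# and dies at 75, whether or not they are screened.
DEATH_AGE = 75
DIAGNOSIS_AGE_SCREENED = 65     # early screening catches it in the patient's 60s
DIAGNOSIS_AGE_UNSCREENED = 72   # diagnosis only when symptoms appear in the early 70s

def five_year_survival(diagnosis_age):
    """Did the patient live at least five years past diagnosis?"""
    return DEATH_AGE - diagnosis_age >= 5

print(five_year_survival(DIAGNOSIS_AGE_SCREENED))    # True  -> a "100 percent" survival rate
print(five_year_survival(DIAGNOSIS_AGE_UNSCREENED))  # False -> a "0 percent" survival rate
# Identical lifespans, opposite statistics: the metric measures lead time, not benefit.
```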
Weiye Loh

Study: OLPC Fails Students as a Tool for Education | News & Opinion | PCMag.com - 0 views

  •  
    The One Laptop Per Child (OLPC) program of low-cost laptops for developing countries has not led to any measurable impact on academic achievement, according to a recent report. Instead, the study concluded that Peru might be better off spending its funds on acquiring and training high-quality teachers rather than investing in technology without complementary instruction. The paper, published in February, concluded that the "intense access to computers" the program provided "does not lead to measurable effects in academic achievement, but it did generate some positive impact on general cognitive skills."
joanne ye

Measuring the effectiveness of online activism - 2 views

Reference: Krishnan, S. (2009, June 21). Measuring the effectiveness of online activism. The Hindu. Retrieved September 24, 2009, from Factiva. (Article can be found at bottom of the post) Summary...

online activism freedom control

started by joanne ye on 24 Sep 09 no follow-up yet
Jody Poh

Australia's porn-blocking plan unveiled - 10 views

Elaine said: What are the standards put in place to determine whether something is of adult content? Who set those standards? Based on 'general' beliefs and what the government/'web police' think ...

Weiye Loh

The Decline Effect and the Scientific Method : The New Yorker - 0 views

  • On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties.
  • the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
  • Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard against the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
  • ...30 more annotations...
  • But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
  • In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”
  • At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”
  • the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.
  • The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time!”
  • this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
  • In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
  • the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.
  • the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
  • Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology. (A small simulation of this filtering effect follows these annotations.)
  • While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts
  • an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.
  • The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results. (A simulated funnel graph is sketched after these annotations.)
  • Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
  • Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
  • One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
  • John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.”
  • In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the “gold standard” of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
  • The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
  • the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
  • The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
  • scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,” he says. The current “obsession” with replicability distracts from the real problem, which is faulty design. He notes that nobody even tries to replicate most science papers—there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.)
  • Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
  • Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn’t erase the decline effect. This is largely because scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging
  • John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
  • The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
  • The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
  • This suggests that the decline effect is actually a decline of illusion.
  • While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn’t changed. The law of gravity remains the same.
  • Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
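A minimal simulation of how the five-per-cent filter alone can manufacture a decline effect, as described in the publication-bias annotation above. All parameters are my own illustrative choices: a small true effect, underpowered initial studies published only when significant, then well-powered replications published regardless of outcome.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
TRUE_EFFECT = 0.2            # a small real effect, in standard-deviation units

def study(n):
    """One two-group study: return (estimated effect, p-value)."""
    treated = rng.normal(TRUE_EFFECT, 1, n)
    control = rng.normal(0, 1, n)
    _, p = stats.ttest_ind(treated, control)
    return treated.mean() - control.mean(), p

# Early literature: small studies, published only when p < .05.
published = [est for est, p in (study(20) for _ in range(2000)) if p < 0.05]
# Later replications: larger samples, published whatever the result.
replications = [study(200)[0] for _ in range(2000)]

print(f"true effect:               {TRUE_EFFECT:.2f}")
print(f"mean published estimate:   {np.mean(published):.2f}")    # inflated well above the truth
print(f"mean replication estimate: {np.mean(replications):.2f}") # close to the truth
# The "effect" appears to decline over time even though nothing real has changed.
```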
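A sketch of the funnel-graph logic described above, using simulated studies rather than real data. With honest reporting, the small studies would scatter symmetrically around the true value; if only the "positive" small studies get written up, the narrow end of the funnel skews to the right.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
TRUE_EFFECT = 0.0                                # assume there is no real effect at all

sizes = rng.integers(10, 400, 300)               # invented per-study sample sizes
se = 1 / np.sqrt(sizes)                          # rough standard error of each estimate
estimates = rng.normal(TRUE_EFFECT, se)          # each study's estimated effect

# Selective reporting: small studies only see print if they show a positive,
# "significant" effect; large studies get reported either way.
reported = (estimates / se > 1.96) | (sizes > 200)

plt.scatter(estimates[reported], sizes[reported], s=10)
plt.axvline(TRUE_EFFECT, linestyle="--")
plt.xlabel("estimated effect")
plt.ylabel("sample size")
plt.title("Simulated funnel graph under selective reporting")
plt.show()
```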
Weiye Loh

Let There Be More Efficient Light - NYTimes.com - 0 views

  • LAST week Michele Bachmann, a Republican representative from Minnesota, introduced a bill to roll back efficiency standards for light bulbs, which include a phasing out of incandescent bulbs in favor of more energy-efficient bulbs. The “government has no business telling an individual what kind of light bulb to buy,” she declared.
  • But this opposition ignores another, more important bit of American history: the critical role that government-mandated standards have played in scientific and industrial innovation.
  • inventions alone weren’t enough to guarantee progress. Indeed, at the time the lack of standards for everything from weights and measures to electricity — even the gallon, for example, had eight definitions — threatened to overwhelm industry and consumers with a confusing array of incompatible choices.
  • ...5 more annotations...
  • This wasn’t the case everywhere. Germany’s standards agency, established in 1887, was busy setting rules for everything from the content of dyes to the process for making porcelain; other European countries soon followed suit. Higher-quality products, in turn, helped the growth in Germany’s trade exceed that of the United States in the 1890s. America finally got its act together in 1894, when Congress standardized the meaning of what are today common scientific measures, including the ohm, the volt, the watt and the henry, in line with international metrics. And, in 1901, the United States became the last major economic power to establish an agency to set technological standards. The result was a boom in product innovation in all aspects of life during the 20th century. Today we can go to our hardware store and choose from hundreds of light bulbs that all conform to government-mandated quality and performance standards.
  • Technological standards not only promote innovation — they also can help protect one country’s industries from falling behind those of other countries. Today China, India and other rapidly growing nations are adopting standards that speed the deployment of new technologies. Without similar requirements to manufacture more technologically advanced products, American companies risk seeing the overseas markets for their products shrink while innovative goods from other countries flood the domestic market. To prevent that from happening, America needs not only to continue developing standards, but also to devise a strategy to apply them consistently and quickly.
  • The best approach would be to borrow from Japan, whose Top Runner program sets energy-efficiency standards by identifying technological leaders in a particular industry — say, washing machines — and mandating that the rest of the industry keep up. As technologies improve, the standards change as well, enabling a virtuous cycle of improvement. At the same time, the government should work with businesses to devise multidimensional standards, so that consumers don’t balk at products because they sacrifice, say, brightness and cost for energy efficiency.
  • This is not to say that innovation doesn’t bring disruption, and American policymakers can’t ignore the jobs that are lost when government standards sweep older technologies into the dustbin of history. An effective way forward on light bulbs, then, would be to apply standards only to those manufacturers that produce or import in large volume. Meanwhile, smaller, legacy light-bulb producers could remain, cushioning the blow to workers and meeting consumer demand.
  • Technologies and the standards that guide their deployment have revolutionized American society. They’ve been so successful, in fact, that the role of government has become invisible — so much so that even members of Congress should be excused for believing the government has no business mandating your choice of light bulbs.
Weiye Loh

Freakonomics » The U.K.'s 'Under-Aged' Socially Networked Children - 0 views

  • The study’s authors argue that removing age restrictions from sites like Facebook might actually be the best way of improving child safety online.
  • Elisabeth Staksrud, from the University of Oslo and one of the report’s authors comments that: “since children often lie about their age to join ‘forbidden’ sites it would be more practical to identify younger users and to target them with protective measures.”
  • This flies in the face of what many see as a critical security wall protecting children from cyber-crime on social networking sites. A report released in January by Internet security firm PandaLabs identified Facebook and Twitter as the sites which are most prone to security breaches. The danger is particularly acute when young children enter their real personal information on their profile. Though, as the new research indicates, children are already lying about their age to sign up for a profile. So from a safety standpoint, the most important measure for children to take is to refrain from entering real information such as their address or where they go to school.
Weiye Loh

Green Column - Rethinking the Measure of Growth - NYTimes.com - 0 views

  • Asia is not invulnerable to an environmental disaster on the scale of the BP spill. Singapore is a major refining center and transshipment point for crude oil shipments between markets in East Asia and the other oil-rich Gulf region. The quest for more plentiful and less expensive oil for fast-growing Asian economies has also brought a wave of offshore drilling from India and the Gulf of Thailand, to Vietnam and Bohai Bay, on the northeast coast of China.
  • Asian governments have become particularly enthralled with gross domestic product statistics for validation
  • Gross domestic product has come in for some particularly hard knocks since the global financial crisis, notably after a report last year whose co-author was Joseph E. Stiglitz, a Nobel laureate in economics, that said reliance on gross domestic product had blinded governments to the increasing risks in the world economy since 2004.
  •  
    Rethinking the Measure of Growth By WAYNE ARNOLD Published: July 18, 2010
Weiye Loh

The Inequality That Matters - Tyler Cowen - The American Interest Magazine - 0 views

  • most of the worries about income inequality are bogus, but some are probably better grounded and even more serious than even many of their heralds realize.
  • In terms of immediate political stability, there is less to the income inequality issue than meets the eye. Most analyses of income inequality neglect two major points. First, the inequality of personal well-being is sharply down over the past hundred years and perhaps over the past twenty years as well. Bill Gates is much, much richer than I am, yet it is not obvious that he is much happier if, indeed, he is happier at all. I have access to penicillin, air travel, good cheap food, the Internet and virtually all of the technical innovations that Gates does. Like the vast majority of Americans, I have access to some important new pharmaceuticals, such as statins to protect against heart disease. To be sure, Gates receives the very best care from the world’s top doctors, but our health outcomes are in the same ballpark. I don’t have a private jet or take luxury vacations, and—I think it is fair to say—my house is much smaller than his. I can’t meet with the world’s elite on demand. Still, by broad historical standards, what I share with Bill Gates is far more significant than what I don’t share with him.
  • when average people read about or see income inequality, they don’t feel the moral outrage that radiates from the more passionate egalitarian quarters of society. Instead, they think their lives are pretty good and that they either earned through hard work or lucked into a healthy share of the American dream.
  • ...35 more annotations...
  • This is why, for example, large numbers of Americans oppose the idea of an estate tax even though the current form of the tax, slated to return in 2011, is very unlikely to affect them or their estates. In narrowly self-interested terms, that view may be irrational, but most Americans are unwilling to frame national issues in terms of rich versus poor. There’s a great deal of hostility toward various government bailouts, but the idea of “undeserving” recipients is the key factor in those feelings. Resentment against Wall Street gamesters hasn’t spilled over much into resentment against the wealthy more generally. The bailout for General Motors’ labor unions wasn’t so popular either—again, obviously not because of any bias against the wealthy but because a basic sense of fairness was violated. As of November 2010, congressional Democrats are of a mixed mind as to whether the Bush tax cuts should expire for those whose annual income exceeds $250,000; that is in large part because their constituents bear no animus toward rich people, only toward undeservedly rich people.
  • envy is usually local. At least in the United States, most economic resentment is not directed toward billionaires or high-roller financiers—not even corrupt ones. It’s directed at the guy down the hall who got a bigger raise. It’s directed at the husband of your wife’s sister, because the brand of beer he stocks costs $3 a case more than yours, and so on. That’s another reason why a lot of people aren’t so bothered by income or wealth inequality at the macro level. Most of us don’t compare ourselves to billionaires. Gore Vidal put it honestly: “Whenever a friend succeeds, a little something in me dies.”
  • Occasionally the cynic in me wonders why so many relatively well-off intellectuals lead the egalitarian charge against the privileges of the wealthy. One group has the status currency of money and the other has the status currency of intellect, so might they be competing for overall social regard? The high status of the wealthy in America, or for that matter the high status of celebrities, seems to bother our intellectual class most. That class composes a very small group, however, so the upshot is that growing income inequality won’t necessarily have major political implications at the macro level.
  • All that said, income inequality does matter—for both politics and the economy.
  • The numbers are clear: Income inequality has been rising in the United States, especially at the very top. The data show a big difference between two quite separate issues, namely income growth at the very top of the distribution and greater inequality throughout the distribution. The first trend is much more pronounced than the second, although the two are often confused.
  • When it comes to the first trend, the share of pre-tax income earned by the richest 1 percent of earners has increased from about 8 percent in 1974 to more than 18 percent in 2007. Furthermore, the richest 0.01 percent (the 15,000 or so richest families) had a share of less than 1 percent in 1974 but more than 6 percent of national income in 2007. As noted, those figures are from pre-tax income, so don’t look to the George W. Bush tax cuts to explain the pattern. Furthermore, these gains have been sustained and have evolved over many years, rather than coming in one or two small bursts between 1974 and today.1
  • At the same time, wage growth for the median earner has slowed since 1973. But that slower wage growth has afflicted large numbers of Americans, and it is conceptually distinct from the higher relative share of top income earners. For instance, if you take the 1979–2005 period, the average incomes of the bottom fifth of households increased only 6 percent while the incomes of the middle quintile rose by 21 percent. That’s a widening of the spread of incomes, but it’s not so drastic compared to the explosive gains at the very top.
  • The broader change in income distribution, the one occurring beneath the very top earners, can be deconstructed in a manner that makes nearly all of it look harmless. For instance, there is usually greater inequality of income among both older people and the more highly educated, if only because there is more time and more room for fortunes to vary. Since America is becoming both older and more highly educated, our measured income inequality will increase pretty much by demographic fiat. Economist Thomas Lemieux at the University of British Columbia estimates that these demographic effects explain three-quarters of the observed rise in income inequality for men, and even more for women.2
  • Attacking the problem from a different angle, other economists are challenging whether there is much growth in inequality at all below the super-rich. For instance, real incomes are measured using a common price index, yet poorer people are more likely to shop at discount outlets like Wal-Mart, which have seen big price drops over the past twenty years.3 Once we take this behavior into account, it is unclear whether the real income gaps between the poor and middle class have been widening much at all. Robert J. Gordon, an economist from Northwestern University who is hardly known as a right-wing apologist, wrote in a recent paper that “there was no increase of inequality after 1993 in the bottom 99 percent of the population”, and that whatever overall change there was “can be entirely explained by the behavior of income in the top 1 percent.”4
  • And so we come again to the gains of the top earners, clearly the big story told by the data. It’s worth noting that over this same period of time, inequality of work hours increased too. The top earners worked a lot more and most other Americans worked somewhat less. That’s another reason why high earners don’t occasion more resentment: Many people understand how hard they have to work to get there. It also seems that most of the income gains of the top earners were related to performance pay—bonuses, in other words—and not wildly out-of-whack yearly salaries.5
  • It is also the case that any society with a lot of “threshold earners” is likely to experience growing income inequality. A threshold earner is someone who seeks to earn a certain amount of money and no more. If wages go up, that person will respond by seeking less work or by working less hard or less often. That person simply wants to “get by” in terms of absolute earning power in order to experience other gains in the form of leisure—whether spending time with friends and family, walking in the woods and so on. Luck aside, that person’s income will never rise much above the threshold.
  • The funny thing is this: For years, many cultural critics in and of the United States have been telling us that Americans should behave more like threshold earners. We should be less harried, more interested in nurturing friendships, and more interested in the non-commercial sphere of life. That may well be good advice. Many studies suggest that above a certain level more money brings only marginal increments of happiness. What isn’t so widely advertised is that those same critics have basically been telling us, without realizing it, that we should be acting in such a manner as to increase measured income inequality. Not only is high inequality an inevitable concomitant of human diversity, but growing income inequality may be, too, if lots of us take the kind of advice that will make us happier.
  • Why is the top 1 percent doing so well?
  • Steven N. Kaplan and Joshua Rauh have recently provided a detailed estimation of particular American incomes.6 Their data do not comprise the entire U.S. population, but from partial financial records they find a very strong role for the financial sector in driving the trend toward income concentration at the top. For instance, for 2004, nonfinancial executives of publicly traded companies accounted for less than 6 percent of the top 0.01 percent income bracket. In that same year, the top 25 hedge fund managers combined appear to have earned more than all of the CEOs from the entire S&P 500. The number of Wall Street investors earning more than $100 million a year was nine times higher than the public company executives earning that amount. The authors also relate that they shared their estimates with a former U.S. Secretary of the Treasury, one who also has a Wall Street background. He thought their estimates of earnings in the financial sector were, if anything, understated.
  • Many of the other high earners are also connected to finance. After Wall Street, Kaplan and Rauh identify the legal sector as a contributor to the growing spread in earnings at the top. Yet many high-earning lawyers are doing financial deals, so a lot of the income generated through legal activity is rooted in finance. Other lawyers are defending corporations against lawsuits, filing lawsuits or helping corporations deal with complex regulations. The returns to these activities are an artifact of the growing complexity of the law and government growth rather than a tale of markets per se. Finance aside, there isn’t much of a story of market failure here, even if we don’t find the results aesthetically appealing.
  • When it comes to professional athletes and celebrities, there isn’t much of a mystery as to what has happened. Tiger Woods earns much more, even adjusting for inflation, than Arnold Palmer ever did. J.K. Rowling, the first billionaire author, earns much more than did Charles Dickens. These high incomes come, on balance, from the greater reach of modern communications and marketing. Kids all over the world read about Harry Potter. There is more purchasing power to spend on children’s books and, indeed, on culture and celebrities more generally. For high-earning celebrities, hardly anyone finds these earnings so morally objectionable as to suggest that they be politically actionable. Cultural critics can complain that good schoolteachers earn too little, and they may be right, but that does not make celebrities into political targets. They’re too popular. It’s also pretty clear that most of them work hard to earn their money, by persuading fans to buy or otherwise support their product. Most of these individuals do not come from elite or extremely privileged backgrounds, either. They worked their way to the top, and even if Rowling is not an author for the ages, her books tapped into the spirit of their time in a special way. We may or may not wish to tax the wealthy, including wealthy celebrities, at higher rates, but there is no need to “cure” the structural causes of higher celebrity incomes.
  • to be sure, the high incomes in finance should give us all pause.
  • The first factor driving high returns is sometimes called by practitioners “going short on volatility.” Sometimes it is called “negative skewness.” In plain English, this means that some investors opt for a strategy of betting against big, unexpected moves in market prices. Most of the time investors will do well by this strategy, since big, unexpected moves are outliers by definition. Traders will earn above-average returns in good times. In bad times they won’t suffer fully when catastrophic returns come in, as sooner or later is bound to happen, because the downside of these bets is partly socialized onto the Treasury, the Federal Reserve and, of course, the taxpayers and the unemployed.
  • if you bet against unlikely events, most of the time you will look smart and have the money to validate the appearance. Periodically, however, you will look very bad. Does that kind of pattern sound familiar? It happens in finance, too. Betting against a big decline in home prices is analogous to betting against the Wizards. Every now and then such a bet will blow up in your face, though in most years that trading activity will generate above-average profits and big bonuses for the traders and CEOs. (A stylized simulation of this payoff profile appears at the end of these annotations.)
  • To this mix we can add the fact that many money managers are investing other people’s money. If you plan to stay with an investment bank for ten years or less, most of the people playing this investing strategy will make out very well most of the time. Everyone’s time horizon is a bit limited and you will bring in some nice years of extra returns and reap nice bonuses. And let’s say the whole thing does blow up in your face? What’s the worst that can happen? Your bosses fire you, but you will still have millions in the bank and that MBA from Harvard or Wharton. For the people actually investing the money, there’s barely any downside risk other than having to quit the party early. Furthermore, if everyone else made more or less the same mistake (very surprising major events, such as a busted housing market, affect virtually everybody), you’re hardly disgraced. You might even get rehired at another investment bank, or maybe a hedge fund, within months or even weeks.
  • Moreover, smart shareholders will acquiesce to or even encourage these gambles. They gain on the upside, while the downside, past the point of bankruptcy, is borne by the firm’s creditors. And will the bondholders object? Well, they might have a difficult time monitoring the internal trading operations of financial institutions. Of course, the firm’s trading book cannot be open to competitors, and that means it cannot be open to bondholders (or even most shareholders) either. So what, exactly, will they have in hand to object to?
  • Perhaps more important, government bailouts minimize the damage to creditors on the downside. Neither the Treasury nor the Fed allowed creditors to take any losses from the collapse of the major banks during the financial crisis. The U.S. government guaranteed these loans, either explicitly or implicitly. Guaranteeing the debt also encourages equity holders to take more risk. While current bailouts have not in general maintained equity values, and while share prices have often fallen to near zero following the bust of a major bank, the bailouts still give the bank a lifeline. Instead of the bank being destroyed, sometimes those equity prices do climb back out of the hole. This is true of the major surviving banks in the United States, and even AIG is paying back its bailout. For better or worse, we’re handing out free options on recovery, and that encourages banks to take more risk in the first place.
  • there is an unholy dynamic of short-term trading and investing, backed up by bailouts and risk reduction from the government and the Federal Reserve. This is not good. “Going short on volatility” is a dangerous strategy from a social point of view. For one thing, in so-called normal times, the finance sector attracts a big chunk of the smartest, most hard-working and most talented individuals. That represents a huge human capital opportunity cost to society and the economy at large. But more immediate and more important, it means that banks take far too many risks and go way out on a limb, often in correlated fashion. When their bets turn sour, as they did in 2007–09, everyone else pays the price.
  • And it’s not just the taxpayer cost of the bailout that stings. The financial disruption ends up throwing a lot of people out of work down the economic food chain, often for long periods. Furthermore, the Federal Reserve System has recapitalized major U.S. banks by paying interest on bank reserves and by keeping an unusually high interest rate spread, which allows banks to borrow short from Treasury at near-zero rates and invest in other higher-yielding assets and earn back lots of money rather quickly. In essence, we’re allowing banks to earn their way back by arbitraging interest rate spreads against the U.S. government. This is rarely called a bailout and it doesn’t count as a normal budget item, but it is a bailout nonetheless. This type of implicit bailout brings high social costs by slowing down economic recovery (the interest rate spreads require tight monetary policy) and by redistributing income from the Treasury to the major banks.
  • the “going short on volatility” strategy increases income inequality. In normal years the financial sector is flush with cash and high earnings. In implosion years a lot of the losses are borne by other sectors of society. In other words, financial crisis begets income inequality. Despite being conceptually distinct phenomena, the political economy of income inequality is, in part, the political economy of finance. Simon Johnson tabulates the numbers nicely: From 1973 to 1985, the financial sector never earned more than 16 percent of domestic corporate profits. In 1986, that figure reached 19 percent. In the 1990s, it oscillated between 21 percent and 30 percent, higher than it had ever been in the postwar period. This decade, it reached 41 percent. Pay rose just as dramatically. From 1948 to 1982, average compensation in the financial sector ranged between 99 percent and 108 percent of the average for all domestic private industries. From 1983, it shot upward, reaching 181 percent in 2007.7
  • There’s a second reason why the financial sector abets income inequality: the “moving first” issue. Let’s say that some news hits the market and that traders interpret this news at different speeds. One trader figures out what the news means in a second, while the other traders require five seconds. Still other traders require an entire day or maybe even a month to figure things out. The early traders earn the extra money. They buy the proper assets early, at the lower prices, and reap most of the gains when the other, later traders pile on. Similarly, if you buy into a successful tech company in the early stages, you are “moving first” in a very effective manner, and you will capture most of the gains if that company hits it big.
  • The moving-first phenomenon sums to a “winner-take-all” market. Only some relatively small number of traders, sometimes just one trader, can be first. Those who are first will make far more than those who are fourth or fifth. This difference will persist, even if those who are fourth come pretty close to competing with those who are first. In this context, first is first and it doesn’t matter much whether those who come in fourth pile on a month, a minute or a fraction of a second later. Those who bought (or sold, as the case may be) first have captured and locked in most of the available gains. Since gains are concentrated among the early winners, and the closeness of the runner-ups doesn’t so much matter for income distribution, asset-market trading thus encourages the ongoing concentration of wealth. Many investors make lots of mistakes and lose their money, but each year brings a new bunch of projects that can turn the early investors and traders into very wealthy individuals.
  • These two features of the problem—“going short on volatility” and “getting there first”—are related. Let’s say that Goldman Sachs regularly secures a lot of the best and quickest trades, whether because of its quality analysis, inside connections or high-frequency trading apparatus (it has all three). It builds up a treasure chest of profits and continues to hire very sharp traders and to receive valuable information. Those profits allow it to make “short on volatility” bets faster than anyone else, because if it messes up, it still has a large enough buffer to pad losses. This increases the odds that Goldman will repeatedly pull in spectacular profits.
  • Still, every now and then Goldman will go bust, or would go bust if not for government bailouts. But the odds are in any given year that it won’t because of the advantages it and other big banks have. It’s as if the major banks have tapped a hole in the social till and they are drinking from it with a straw. In any given year, this practice may seem tolerable—didn’t the bank earn the money fair and square by a series of fairly normal looking trades? Yet over time this situation will corrode productivity, because what the banks do bears almost no resemblance to a process of getting capital into the hands of those who can make most efficient use of it. And it leads to periodic financial explosions. That, in short, is the real problem of income inequality we face today. It’s what causes the inequality at the very top of the earning pyramid that has dangerous implications for the economy as a whole.
  • What about controlling bank risk-taking directly with tight government oversight? That is not practical. There are more ways for banks to take risks than even knowledgeable regulators can possibly control; it just isn’t that easy to oversee a balance sheet with hundreds of billions of dollars on it, especially when short-term positions are wound down before quarterly inspections. It’s also not clear how well regulators can identify risky assets. Some of the worst excesses of the financial crisis were grounded in mortgage-backed assets—a very traditional function of banks—not exotic derivatives trading strategies. Virtually any asset position can be used to bet long odds, one way or another. It is naive to think that underpaid, undertrained regulators can keep up with financial traders, especially when the latter stand to earn billions by circumventing the intent of regulations while remaining within the letter of the law.
  • For the time being, we need to accept the possibility that the financial sector has learned how to game the American (and UK-based) system of state capitalism. It’s no longer obvious that the system is stable at a macro level, and extreme income inequality at the top has been one result of that imbalance. Income inequality is a symptom, however, rather than a cause of the real problem. The root cause of income inequality, viewed in the most general terms, is extreme human ingenuity, albeit of a perverse kind. That is why it is so hard to control.
  • Another root cause of growing inequality is that the modern world, by so limiting our downside risk, makes extreme risk-taking all too comfortable and easy. More risk-taking will mean more inequality, sooner or later, because winners always emerge from risk-taking. Yet bankers who take bad risks (provided those risks are legal) simply do not end up with bad outcomes in any absolute sense. They still have millions in the bank, lots of human capital and plenty of social status. We’re not going to bring back torture, trial by ordeal or debtors’ prisons, nor should we. Yet the threat of impoverishment and disgrace no longer looms the way it once did, so we can no longer constrain excess financial risk-taking. It’s too soft and cushy a world.
  • Why don’t we simply eliminate the safety net for clueless or unlucky risk-takers so that losses equal gains overall? That’s a good idea in principle, but it is hard to put into practice. Once a financial crisis arrives, politicians will seek to limit the damage, and that means they will bail out major financial institutions. Had we not passed TARP and related policies, the United States probably would have faced unemployment rates of 25 percent or higher, as in the Great Depression. The political consequences would not have been pretty. Bank bailouts may sound quite interventionist, and indeed they are, but in relative terms they probably were the most libertarian policy we had on tap. It meant big one-time expenses, but, for the most part, it kept government out of the real economy (the General Motors bailout aside).
  • We probably don’t have any solution to the hazards created by our financial sector, not because plutocrats are preventing our political system from adopting appropriate remedies, but because we don’t know what those remedies are. Yet neither is another crisis immediately upon us. The underlying dynamic favors excess risk-taking, but banks at the current moment fear the scrutiny of regulators and the public and so are playing it fairly safe. They are sitting on money rather than lending it out. The biggest risk today is how few parties will take risks, and, in part, the caution of banks is driving our current protracted economic slowdown. According to this view, the long run will bring another financial crisis once moods pick up and external scrutiny weakens, but that day of reckoning is still some ways off.
  • Is the overall picture a shame? Yes. Is it distorting resource distribution and productivity in the meantime? Yes. Will it again bring our economy to its knees? Probably. Maybe that’s simply the price of modern society. Income inequality will likely continue to rise and we will search in vain for the appropriate political remedies for our underlying problems.
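The payoff asymmetry running through these excerpts (steady gains in normal years, occasional large losses partly shifted onto the public) can be made concrete with a small simulation. What follows is a minimal sketch in Python; the blowup probability, returns and bailout floor are invented for illustration and do not come from the article. It only shows why a "going short on volatility" strategy can look reliably profitable to the trader while being roughly a wash, or worse, for society once bailout costs are counted.

```python
import random

def simulate_short_vol(years=100_000, p_blowup=0.05, normal_gain=0.15,
                       blowup_loss=-3.0, bailout_floor=-0.5, seed=42):
    """Toy model of a 'short volatility' strategy (illustrative numbers only).

    Most years the position returns a steady gain; occasionally it blows up.
    With a bailout, the trader's downside is truncated at `bailout_floor`
    and the remainder of the loss falls on the public.
    """
    rng = random.Random(seed)
    private_total = 0.0   # what the trader books, given the bailout backstop
    social_total = 0.0    # the full economic outcome, bailout cost included
    for _ in range(years):
        outcome = blowup_loss if rng.random() < p_blowup else normal_gain
        social_total += outcome
        private_total += max(outcome, bailout_floor)  # downside is capped
    return private_total / years, social_total / years

private_avg, social_avg = simulate_short_vol()
print(f"average private return per year: {private_avg:+.3f}")
print(f"average social return per year:  {social_avg:+.3f}")
```

With these made-up parameters the expected private return is positive (roughly +0.12 per year) while the expected social return is slightly negative (roughly -0.01 per year), which is the point the excerpts make: with the downside capped, the strategy is privately rational even when it is socially corrosive.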
Weiye Loh

Adventures in Flay-land: James Delingpole and the "Science" of Denialism - 0 views

  • Perhaps like me, you watched the BBC Two Horizon program on Monday night, presented by Sir Paul Nurse, president of the Royal Society and a Nobel Prize-winning geneticist recognized for his discovery of genes that regulate cell division.
  • James really believes there's some kind of mainstream science "warmist" conspiracy against the brave outliers who dare to challenge the consensus. He really believes that "climategate" is a real scandal. He fails to understand that it is common practice in statistics to splice together two or more datasets where you know the quality of the data is patchy. In the case of "climategate", researchers found that indirect temperature measurements based on tree ring widths (the tree ring temperature proxy) are consistent with other proxy methods and with the instrumental temperature record up to around 1960, but after that the tree ring proxy begins to show a decline in temperature for reasons which are unclear. Actual temperature measurements, however, show the opposite. The researcher at the head of the climategate affair, Phil Jones, created a graph of the temperature record to include on the cover of a report for policy makers and journalists. For this graph he simply spliced together the tree ring proxy data up to the point where it begins to diverge with the recorded data after that, using statistical techniques to bring them into agreement (a toy version of this kind of calibration and splice is sketched in the code after these annotations). What made this seem particularly dodgy was an email obtained by a hacker in which Jones referred to this practice as "Mike's Nature trick", a reference to a paper published by his colleague Michael Mann in the journal Nature. It is, however, nothing out of the ordinary. Delingpole and others have talked about how this "trick" was used to "hide the decline" shown by the proxy dataset, as though this were some sort of deception. To Delingpole, the fact that subsequent inquiries found all parties to have behaved ethically is simply further evidence of the warmist conspiracy. He takes it further and casts aspersions on scientific consensus and the entire peer review process.
  • When Nurse asked Delingpole the very straightforward question of whether he would be willing to trust a scientific consensus if he required treatment for cancer, he could have said "Gee, that's an interesting question. Let me think about that and why it's different."
  • Instead, he became defensive and lost his focus. Eventually he would make such regrettable statements as this one: "It is not my job to sit down and read peer-reviewed papers because I simply haven’t got the time, I haven’t got the scientific expertise… I am an interpreter of interpretation."
  • In a parallel universe where James Delingpole is not the "penis" that Ben Goldacre describes him to be, he might have said the following: Gee, that's an interesting question. Let me think about why it's different. (Thinks) Well, it seems to me that when evaluating a scientifically agreed treatment for a disease such as cancer, we have not only all the theory to peruse and the randomized and blinded trials, but also thousands if not millions of case studies where people have undergone the intervention. We have enough data to estimate a person's chances of recovery and know that on average they will do better. When discussing climate change, we really only have the one case study. Just the one earth. And it's a patient that has not undergone any intervention. The scientific consensus is therefore entirely theoretical and intangible. This makes it more difficult for the lay person such as myself to trust it.
  • Sir Paul ended the program saying "Scientists have got to get out there… if we do not do that it will be filled by others who don’t understand the science, and who may be driven by politics and ideology."
  • If the proxy tracks the instrumental record from 1850 to 1960 but then diverges for unknown reasons, how do we know that the proxy is valid for reconstructing temperatures in periods prior to 1850?
  • This is a good question and one I'm not sure I can answer to anyone's satisfaction. We seem to have good agreement among several forms of temperature proxy going back centuries and with direct measurements back to 1880. There is divergence in more recent years and there are several theories as to why that might be. Some possible explanations here: http://www.skepticalscience.com/Tree-ring-proxies-divergence-problem.htm
  • In the physical world we can never be absolutely certain of anything. René Descartes showed it was impossible to prove that everything he sensed wasn't being manipulated by some deceiving demon.
  • It is necessary to first make certain assumptions about the universe that we observe. After that, we can only go with the best theories available that allow us to make scientific progress.
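To make the calibrate-and-splice discussion in these annotations concrete, here is a minimal, entirely synthetic sketch in Python. It invents a fake proxy series and a fake instrumental record (nothing here is real CRU or tree ring data), regresses the instrumental record on the proxy over their overlap to calibrate it, and then builds a presentation series that switches to the instrumental record after an assumed divergence year of 1960.

```python
import numpy as np

# Entirely synthetic stand-ins for a tree ring proxy and an instrumental
# temperature record; all values, dates and noise levels are invented.
years = np.arange(1400, 2001)
rng = np.random.default_rng(0)
true_temp = 0.0005 * (years - 1400) + 0.3 * np.sin(years / 60.0)
proxy = 2.0 * true_temp + 0.5 + rng.normal(0.0, 0.1, years.size)   # proxy units

instr_mask = years >= 1880                       # instrumental record era
instr_years = years[instr_mask]
instrumental = true_temp[instr_mask] + rng.normal(0.0, 0.05, instr_years.size)

# 1. Calibration: regress instrumental temperature on the proxy over the
#    overlap period, then apply that linear mapping to the whole proxy series.
slope, intercept = np.polyfit(proxy[instr_mask], instrumental, 1)
reconstruction = slope * proxy + intercept

# 2. Splice for a presentation graph: keep the calibrated proxy before an
#    assumed divergence year and use the instrumental record after it.
divergence_year = 1960
spliced = np.where(years < divergence_year,
                   reconstruction,
                   np.interp(years, instr_years, instrumental))

print(f"calibration: temperature ~ {slope:.2f} * proxy + {intercept:.2f}")
print(f"spliced series: {spliced.size} annual values, {years[0]} to {years[-1]}")
```

The question raised above, whether a proxy that fails to track temperatures after 1960 can be trusted before 1850, is exactly a question about whether the calibration step holds outside the period where it can be checked; in real reconstructions that step is usually validated on a held-back portion of the overlap, which is what the divergence complicates.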