
New Media Ethics 2009 course / Group items tagged "Algorithm"


Weiye Loh

Google's Fight Against 'Low-Quality' Sites Continues - Slashdot - 0 views

    "A couple weeks ago, JC Penney made the news for plummeting in Google rankings for everything from 'area rugs' to 'grommet top curtains.' Turns out the retail site had a number of suspicious links pointing at it that could be traced back to a link network intended to manipulate Google's ranking algorithms. Now, Overstock.com has lost rankings for another type of link that Google finds to be manipulation of their algorithms. This situation has led Google to implement a significant change to their search algorithms, affecting almost 12% of queries in an effort to cull content farms and other webspam. And in the midst of all of this, a company with substantial publicity lately for running a paid link network announces they are getting out of the link business entirely."
Weiye Loh

Kevin Slavin: How algorithms shape our world | Video on TED.com - 0 views

    Kevin Slavin argues that we're living in a world designed for -- and increasingly controlled by -- algorithms. In this riveting talk from TEDGlobal, he shows how these complex computer programs determine: espionage tactics, stock prices, movie scripts, and architecture. And he warns that we are writing code we can't understand, with implications we can't control.
Weiye Loh

How the net traps us all in our own little bubbles | Technology | The Observer - 0 views

  • Google would use 57 signals – everything from where you were logging in from to what browser you were using to what you had searched for before – to make guesses about who you were and what kinds of sites you'd like. Even if you were logged out, it would customise its results, showing you the pages it predicted you were most likely to click on. (A toy sketch of this kind of per-user re-ranking appears after these notes.)
  • Most of us assume that when we google a term, we all see the same results – the ones that the company's famous PageRank algorithm suggests are the most authoritative based on other pages' links. But since December 2009, this is no longer true. Now you get the result that Google's algorithm suggests is best for you in particular – and someone else may see something entirely different. In other words, there is no standard Google any more.
  • In the spring of 2010, while the remains of the Deepwater Horizon oil rig were spewing oil into the Gulf of Mexico, I asked two friends to search for the term "BP". They're pretty similar – educated white left-leaning women who live in the north-east. But the results they saw were quite different. One saw investment information about BP. The other saw news.
  • the query "stem cells" might produce diametrically opposed results for scientists who support stem-cell research and activists who oppose it.
  • "Proof of climate change" might turn up different results for an environmental activist and an oil-company executive.
  • majority of us assume search engines are unbiased. But that may be just because they're increasingly biased to share our own views. More and more, your computer monitor is a kind of one-way mirror, reflecting your own interests while algorithmic observers watch what you click. Google's announcement marked the turning point of an important but nearly invisible revolution in how we consume information. You could say that on 4 December 2009 the era of personalisation began.
  • We are predisposed to respond to a pretty narrow set of stimuli – if a piece of news is about sex, power, gossip, violence, celebrity or humour, we are likely to read it first. This is the content that most easily makes it into the filter bubble. It's easy to push "Like" and increase the visibility of a friend's post about finishing a marathon or an instructional article about how to make onion soup. It's harder to push the "Like" button on an article titled "Darfur sees bloodiest month in two years". In a personalised world, important but complex or unpleasant issues – the rising prison population, for example, or homelessness – are less likely to come to our attention at all.
  • As a consumer, it's hard to argue with blotting out the irrelevant and unlikable. But what is good for consumers is not necessarily good for citizens. What I seem to like may not be what I actually want, let alone what I need to know to be an informed member of my community or country. "It's a civic virtue to be exposed to things that appear to be outside your interest," technology journalist Clive Thompson told me. Cultural critic Lee Siegel puts it a different way: "Customers are always right, but people aren't."
  • Personalisation is based on a bargain. In exchange for the service of filtering, you hand large companies an enormous amount of data about your daily life – much of which you might not trust friends with.
  • To be the author of your life, professor Yochai Benkler argues, you have to be aware of a diverse array of options and lifestyles. When you enter a filter bubble, you're letting the companies that construct it choose which options you're aware of. You may think you're the captain of your own destiny, but personalisation can lead you down a road to a kind of informational determinism in which what you've clicked on in the past determines what you see next – a web history you're doomed to repeat. You can get stuck in a static, ever-narrowing version of yourself – an endless you-loop.
    An invisible revolution has taken place in the way we use the net, but the increasing personalisation of information by search engines such as Google threatens to limit our access to information and enclose us in a self-reinforcing world view, writes Eli Pariser in an extract from The Filter Bubble
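
A toy Python sketch of the per-user re-ranking idea described in the notes above: the same query returns differently ordered results depending on signals collected about the user. The function, topic tags and click counts are illustrative assumptions, not Google's actual signals or algorithm.

    # Toy personalised re-ranking: not Google's algorithm, just the general idea
    # that one query can produce differently ordered results for different users.
    def personalise(results, user_history):
        """Re-rank results by overlap with topics the user has clicked before.

        results      -- list of (url, set_of_topic_tags) in the 'standard' order
        user_history -- dict mapping topic tag -> number of past clicks
        """
        def predicted_interest(item):
            _, topics = item
            return sum(user_history.get(t, 0) for t in topics)
        return sorted(results, key=predicted_interest, reverse=True)

    # The "BP" example from the extract: one user sees investment pages first,
    # another sees news first, from the same underlying result set.
    bp_results = [("bp.example/investors", {"finance", "stocks"}),
                  ("news.example/bp-spill", {"news", "environment"})]
    investor = {"finance": 40, "stocks": 25}
    activist = {"environment": 60, "news": 10}
    print([url for url, _ in personalise(bp_results, investor)])
    print([url for url, _ in personalise(bp_results, activist)])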
Weiye Loh

Search Optimization and Its Dirty Little Secrets - NYTimes.com - 0 views

  • Here’s another hypothesis, this one for the conspiracy-minded. Last year, Advertising Age obtained a Google document that listed some of its largest advertisers, including AT&T, eBay and yes, J. C. Penney. The company, this document said, spent $2.46 million a month on paid Google search ads — the kind you see next to organic results.
  • Is it possible that Google was willing to countenance an extensive black-hat campaign because it helped one of its larger advertisers? It’s the sort of question that European Union officials are now studying in an investigation of possible antitrust abuses by Google.
  • Investigators have been asking advertisers in Europe questions like this: “Please explain whether and, if yes, to what extent your advertising spending with Google has ever had an influence on your ranking in Google’s natural search.” And: “Has Google ever mentioned to you that increasing your advertising spending could improve your ranking in Google’s natural search?”
  • Asked if Penney received any breaks because of the money it has spent on ads, Mr. Cutts said, “I’ll give a categorical denial.” He then made an impassioned case for Google’s commitment to separating the money side of the business from the search side. The former has zero influence on the latter, he said.
  • “There is a very long history at Google of saying ‘We are not going to worry about short-term revenue.’ ” He added: “We rely on the trust of our users. We realize the responsibility that we have to our users.”
  • He noted, too, that before The Times presented evidence of the paid links to JCPenney.com, Google had just begun to roll out an algorithm change that had a negative effect on Penney’s search results.
  • True, JCPenney.com’s showing in Google searches had declined slightly by Feb. 8, as the algorithm change began to take effect. In “comforter sets,” Penney went from No. 1 to No. 7. In “sweater dresses,” from No. 1 to No. 10. But the real damage to Penney’s results began when Google started that “manual action.” The decline can be charted: On Feb. 1, the average Penney position for 59 search terms was 1.3.
  • MR. CUTTS said he did not plan to write about Penney’s situation, as he did with BMW in 2006. Rarely, he explained, does he single out a company publicly, because Google’s goal is to preserve the integrity of results, not to embarrass people. “But just because we don’t talk about it,” he said, “doesn’t mean we won’t take strong action.”
Weiye Loh

EdgeRank: The Secret Sauce That Makes Facebook's News Feed Tick - 0 views

  • but News Feed only displays a subset of the stories generated by your friends — if it displayed everything, there’s a good chance you’d be overwhelmed. Developers are always trying to make sure their sites and apps are publishing stories that make the cut, which has led to the concept of “News Feed Optimization”, and their success is dictated by EdgeRank.
  • At a high level, the EdgeRank formula is fairly straightforward. But first, some definitions: every item that shows up in your News Feed is considered an Object. If you have an Object in the News Feed (say, a status update), whenever another user interacts with that Object they’re creating what Facebook calls an Edge, which includes actions like tags and comments. Each Edge has three components important to Facebook’s algorithm: First, there’s an affinity score between the viewing user and the item’s creator — if you send your friend a lot of Facebook messages and check their profile often, then you’ll have a higher affinity score for that user than you would, say, an old acquaintance you haven’t spoken to in years. Second, there’s a weight given to each type of Edge. A comment probably has more importance than a Like, for example. And finally there’s the most obvious factor — time. The older an Edge is, the less important it becomes.
  • Multiply these factors for each Edge, then add the Edge scores up, and you have an Object’s EdgeRank. And the higher that is, the more likely your Object is to appear in the user’s feed. It’s worth pointing out that the act of creating an Object is also considered an Edge, which is what allows Objects to show up in your friends’ feeds before anyone has interacted with them. (A minimal sketch of this scoring appears after these notes.)
  • an Object is more likely to show up in your News Feed if people you know have been interacting with it recently. That really isn’t particularly surprising. Neither is the resulting message to developers: if you want your posts to show up in News Feed, make sure people will actually want to interact with them.
  • Steinberg hinted that a simpler version of News Feed may be on the way, as the current two-tabbed system is a bit complicated. That said, many people still use both tabs, with over 50% of users clicking over to the ‘most recent’ tab on a regular basis.
  • If you want to watch the video for yourself, click here, navigate to the Techniques sessions, and click on ‘Focus on Feed’. The talk about Facebook’s algorithms begins around 22 minutes in.
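
A minimal Python sketch of the scoring described above: each Edge contributes affinity × edge-type weight × time decay, and the contributions are summed per Object. The specific weights, the exponential decay and the data layout are assumptions for illustration; Facebook has not published its actual values.

    import time

    # Illustrative edge-type weights -- assumed values, not Facebook's real ones.
    EDGE_WEIGHTS = {"create": 1.0, "like": 1.0, "comment": 4.0, "tag": 3.0}

    def time_decay(age_seconds, half_life=86400.0):
        """Older Edges count less; assumed exponential decay with a one-day half-life."""
        return 0.5 ** (age_seconds / half_life)

    def edgerank(edges, affinity, now=None):
        """Sum affinity * type weight * time decay over every Edge on an Object.

        edges    -- list of (actor_id, edge_type, timestamp) tuples
        affinity -- dict mapping actor_id -> affinity score between viewer and actor
        """
        now = now or time.time()
        score = 0.0
        for actor_id, edge_type, ts in edges:
            u = affinity.get(actor_id, 0.1)       # viewer's affinity for the Edge creator
            w = EDGE_WEIGHTS.get(edge_type, 1.0)  # weight of this Edge type
            d = time_decay(now - ts)              # recency factor
            score += u * w * d
        return score

    # Creating the Object is itself an Edge, so a fresh post starts with a nonzero score.
    post = [("alice", "create", time.time() - 3600),
            ("bob", "comment", time.time() - 600)]
    print(edgerank(post, affinity={"alice": 0.9, "bob": 0.3}))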
Weiye Loh

Google's War on Nonsense - NYTimes.com - 0 views

  • As a verbal artifact, farmed content exhibits neither style nor substance.
  • The insultingly vacuous and frankly bizarre prose of the content farms — it seems ripped from Wikipedia and translated from the Romanian — cheapens all online information.
  • These prose-widgets are not hammered out by robots, surprisingly. But they are written by writers who work like robots. As recent accounts of life in these words-are-money mills make clear, some content-farm writers have deadlines as frequently as every 25 minutes. Others are expected to turn around reported pieces, containing interviews with several experts, in an hour. Some compose, edit, format and publish 10 articles in a single shift. Many with decades of experience in journalism work 70-hour weeks for salaries of $40,000 with no vacation time. The content farms have taken journalism hackwork to a whole new level.
  • So who produces all this bulk jive? Business Insider, the business-news site, has provided a forum to a half dozen low-paid content farmers, especially several who work at AOL’s enormous Seed and Patch ventures. They describe exhausting and sometimes exploitative writing conditions. Oliver Miller, a journalist with an MFA in fiction from Sarah Lawrence who once believed he’d write the Great American Novel, told me AOL paid him about $28,000 for writing 300,000 words about television, all based on fragments of shows he’d never seen, filed in half-hour intervals, on a graveyard shift that ran from 11 p.m. to 7 or 8 in the morning.
  • Mr. Miller’s job, as he made clear in an article last week in The Faster Times, an online newspaper, was to cram together words that someone’s research had suggested might be in demand on Google, position these strings as titles and headlines, embellish them with other inoffensive words and make the whole confection vaguely resemble an article. AOL would put “Rick Fox mustache” in a headline, betting that some number of people would put “Rick Fox mustache” into Google, and retrieve Mr. Miller’s article. Readers coming to AOL, expecting information, might discover a subliterate wasteland. But before bouncing out, they might watch a video clip with ads on it. Their visits would also register as page views, which AOL could then sell to advertisers.
  • commodify writing: you pay little or nothing to writers, and make readers pay a lot — in the form of their “eyeballs.” But readers get zero back, no useful content.
  • You can’t mess with Google forever. In February, the corporation concocted what it concocts best: an algorithm. The algorithm, called Panda, affects some 12 percent of searches, and it has — slowly and imperfectly — been improving things. Just a short time ago, the Web seemed ungovernable; bad content was driving out good. But Google asserted itself, and credit is due: Panda represents good cyber-governance. It has allowed Google to send untrustworthy, repetitive and unsatisfying content to the back of the class. No more A’s for cheaters.
  • the goal, according to Amit Singhal and Matt Cutts, who worked on Panda, is to “provide better rankings for high-quality sites — sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.”
  • Google officially rolled out Panda 2.2. Put “Whitey Bulger” into Google, and where you might once have found dozens of content farms, today you get links to useful articles from sites ranging from The Boston Globe, The Los Angeles Times, the F.B.I. and even Mashable, doing original analysis of how federal agents used social media to find Bulger. Last month, Demand Media, once the most notorious of the content farms, announced plans to improve quality by publishing more feature articles by hired writers, and fewer by “users” — code for unpaid freelancers. Amazing. Demand Media is stepping up its game.
    Content farms, which have flourished on the Web in the past 18 months, are massive news sites that use headlines, keywords and other tricks to lure Web-users into looking at ads. These sites confound and embarrass Google by gaming its ranking system. As a business proposition, they once seemed exciting. Last year, The Economist admiringly described Associated Content and Demand Media as cleverly cynical operations that "aim to produce content at a price so low that even meager advertising revenue can support it."
YongTeck Lee

State 2.0: a new front end? - 3 views

http://www.opendemocracy.net/article/state-2-0-a-new-front-end A paragraph in the article sums up what the article is about quite well: "Indeed, new problematics are emerging around online democr...

digital democracy

started by YongTeck Lee on 15 Sep 09 no follow-up yet
Weiye Loh

Odds Are, It's Wrong - Science News - 0 views

  • science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.
  • a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.
  • science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.
  • Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.
  • “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.” Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”
  • In 2007, for instance, researchers combing the medical literature found numerous studies linking a total of 85 genetic variants in 70 different genes to acute coronary syndrome, a cluster of heart problems. When the researchers compared genetic tests of 811 patients that had the syndrome with a group of 650 (matched for sex and age) that didn’t, only one of the suspect gene variants turned up substantially more often in those with the syndrome — a number to be expected by chance. “Our null results provide no support for the hypothesis that any of the 85 genetic variants tested is a susceptibility factor” for the syndrome, the researchers reported in the Journal of the American Medical Association. How could so many studies be wrong? Because their conclusions relied on “statistical significance,” a concept at the heart of the mathematical analysis of modern scientific experiments.
  • Statistical significance is a phrase that every science graduate student learns, but few comprehend. While its origins stretch back at least to the 19th century, the modern notion was pioneered by the mathematician Ronald A. Fisher in the 1920s. His original interest was agriculture. He sought a test of whether variation in crop yields was due to some specific intervention (say, fertilizer) or merely reflected random factors beyond experimental control. Fisher first assumed that fertilizer caused no difference — the “no effect” or “null” hypothesis. He then calculated a number called the P value, the probability that an observed yield in a fertilized field would occur if fertilizer had no real effect. If P is less than .05 — meaning the chance of a fluke is less than 5 percent — the result should be declared “statistically significant,” Fisher arbitrarily declared, and the no effect hypothesis should be rejected, supposedly confirming that fertilizer works. Fisher’s P value eventually became the ultimate arbiter of credibility for science results of all sorts.
  • But in fact, there’s no logical basis for using a P value from a single study to draw any conclusion. If the chance of a fluke is less than 5 percent, two possible conclusions remain: There is a real effect, or the result is an improbable fluke. Fisher’s method offers no way to know which is which. On the other hand, if a study finds no statistically significant effect, that doesn’t prove anything, either. Perhaps the effect doesn’t exist, or maybe the statistical test wasn’t powerful enough to detect a small but real effect.
  • Soon after Fisher established his system of statistical significance, it was attacked by other mathematicians, notably Egon Pearson and Jerzy Neyman. Rather than testing a null hypothesis, they argued, it made more sense to test competing hypotheses against one another. That approach also produces a P value, which is used to gauge the likelihood of a “false positive” — concluding an effect is real when it actually isn’t. What eventually emerged was a hybrid mix of the mutually inconsistent Fisher and Neyman-Pearson approaches, which has rendered interpretations of standard statistics muddled at best and simply erroneous at worst. As a result, most scientists are confused about the meaning of a P value or how to interpret it. “It’s almost never, ever, ever stated correctly, what it means,” says Goodman.
  • experimental data yielding a P value of .05 means that there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct). But many explanations mangle the subtleties in that definition. A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: “This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.”
  • That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. A dog may bark 5 percent of the time even if it is well-fed all of the time. (See Box 2. A small numerical sketch of this point follows these notes.)
    • Weiye Loh
       
      Does the problem, then, lie not in statistics, but in the interpretation of statistics? Is the fallacy of appeal to probability at work in such interpretation?
  • Another common error equates statistical significance to “significance” in the ordinary use of the word. Because of the way statistical formulas work, a study with a very large sample can detect “statistical significance” for a small effect that is meaningless in practical terms. A new drug may be statistically better than an old drug, but for every thousand people you treat you might get just one or two additional cures — not clinically significant. Similarly, when studies claim that a chemical causes a “significantly increased risk of cancer,” they often mean that it is just statistically significant, possibly posing only a tiny absolute increase in risk.
  • Statisticians perpetually caution against mistaking statistical significance for practical importance, but scientific papers commit that error often. Ziliak studied journals from various fields — psychology, medicine and economics among others — and reported frequent disregard for the distinction.
  • “I found that eight or nine of every 10 articles published in the leading journals make the fatal substitution” of equating statistical significance to importance, he said in an interview. Ziliak’s data are documented in the 2008 book The Cult of Statistical Significance, coauthored with Deirdre McCloskey of the University of Illinois at Chicago.
  • Multiplicity of mistakes: Even when “significance” is properly defined and P values are carefully calculated, statistical inference is plagued by many other problems. Chief among them is the “multiplicity” issue — the testing of many hypotheses simultaneously. When several drugs are tested at once, or a single drug is tested on several groups, chances of getting a statistically significant but false result rise rapidly.
  • Recognizing these problems, some researchers now calculate a “false discovery rate” to warn of flukes disguised as real effects. And genetics researchers have begun using “genome-wide association studies” that attempt to ameliorate the multiplicity issue (SN: 6/21/08, p. 20).
  • Many researchers now also commonly report results with confidence intervals, similar to the margins of error reported in opinion polls. Such intervals, usually given as a range that should include the actual value with 95 percent confidence, do convey a better sense of how precise a finding is. But the 95 percent confidence calculation is based on the same math as the .05 P value and so still shares some of its problems.
  • Statistical problems also afflict the “gold standard” for medical research, the randomized, controlled clinical trials that test drugs for their ability to cure or their power to harm. Such trials assign patients at random to receive either the substance being tested or a placebo, typically a sugar pill; random selection supposedly guarantees that patients’ personal characteristics won’t bias the choice of who gets the actual treatment. But in practice, selection biases may still occur, Vance Berger and Sherri Weinstein noted in 2004 in Controlled Clinical Trials. “Some of the benefits ascribed to randomization, for example that it eliminates all selection bias, can better be described as fantasy than reality,” they wrote.
  • Randomization also should ensure that unknown differences among individuals are mixed in roughly the same proportions in the groups being tested. But statistics do not guarantee an equal distribution any more than they prohibit 10 heads in a row when flipping a penny. With thousands of clinical trials in progress, some will not be well randomized. And DNA differs at more than a million spots in the human genetic catalog, so even in a single trial differences may not be evenly mixed. In a sufficiently large trial, unrandomized factors may balance out, if some have positive effects and some are negative. (See Box 3) Still, trial results are reported as averages that may obscure individual differences, masking beneficial or harmful effects and possibly leading to approval of drugs that are deadly for some and denial of effective treatment to others.
  • Another concern is the common strategy of combining results from many trials into a single “meta-analysis,” a study of studies. In a single trial with relatively few participants, statistical tests may not detect small but real and possibly important effects. In principle, combining smaller studies to create a larger sample would allow the tests to detect such small effects. But statistical techniques for doing so are valid only if certain criteria are met. For one thing, all the studies conducted on the drug must be included — published and unpublished. And all the studies should have been performed in a similar way, using the same protocols, definitions, types of patients and doses. When combining studies with differences, it is necessary first to show that those differences would not affect the analysis, Goodman notes, but that seldom happens. “That’s not a formal part of most meta-analyses,” he says.
  • Meta-analyses have produced many controversial conclusions. Common claims that antidepressants work no better than placebos, for example, are based on meta-analyses that do not conform to the criteria that would confer validity. Similar problems afflicted a 2007 meta-analysis, published in the New England Journal of Medicine, that attributed increased heart attack risk to the diabetes drug Avandia. Raw data from the combined trials showed that only 55 people in 10,000 had heart attacks when using Avandia, compared with 59 people per 10,000 in comparison groups. But after a series of statistical manipulations, Avandia appeared to confer an increased risk.
  • combining small studies in a meta-analysis is not a good substitute for a single trial sufficiently large to test a given question. “Meta-analyses can reduce the role of chance in the interpretation but may introduce bias and confounding,” Hennekens and DeMets write in the Dec. 2 Journal of the American Medical Association. “Such results should be considered more as hypothesis formulating than as hypothesis testing.”
  • Some studies show dramatic effects that don’t require sophisticated statistics to interpret. If the P value is 0.0001 — a hundredth of a percent chance of a fluke — that is strong evidence, Goodman points out. Besides, most well-accepted science is based not on any single study, but on studies that have been confirmed by repetition. Any one result may be likely to be wrong, but confidence rises quickly if that result is independently replicated. “Replication is vital,” says statistician Juliet Shaffer, a lecturer emeritus at the University of California, Berkeley. And in medicine, she says, the need for replication is widely recognized. “But in the social sciences and behavioral sciences, replication is not common,” she noted in San Diego in February at the annual meeting of the American Association for the Advancement of Science. “This is a sad situation.”
  • Most critics of standard statistics advocate the Bayesian approach to statistical reasoning, a methodology that derives from a theorem credited to Bayes, an 18th century English clergyman. His approach uses similar math, but requires the added twist of a “prior probability” — in essence, an informed guess about the expected probability of something in advance of the study. Often this prior probability is more than a mere guess — it could be based, for instance, on previous studies.
  • it basically just reflects the need to include previous knowledge when drawing conclusions from new observations. To infer the odds that a barking dog is hungry, for instance, it is not enough to know how often the dog barks when well-fed. You also need to know how often it eats — in order to calculate the prior probability of being hungry. Bayesian math combines a prior probability with observed data to produce an estimate of the likelihood of the hunger hypothesis. “A scientific hypothesis cannot be properly assessed solely by reference to the observational data,” but only by viewing the data in light of prior belief in the hypothesis, wrote George Diamond and Sanjay Kaul of UCLA’s School of Medicine in 2004 in the Journal of the American College of Cardiology. “Bayes’ theorem is ... a logically consistent, mathematically valid, and intuitive way to draw inferences about the hypothesis.” (See Box 4)
  • In many real-life contexts, Bayesian methods do produce the best answers to important questions. In medical diagnoses, for instance, the likelihood that a test for a disease is correct depends on the prevalence of the disease in the population, a factor that Bayesian math would take into account.
  • But Bayesian methods introduce a confusion into the actual meaning of the mathematical concept of “probability” in the real world. Standard or “frequentist” statistics treat probabilities as objective realities; Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics. “Subjective prior beliefs are anathema to the frequentist, who relies instead on a series of ad hoc algorithms that maintain the facade of scientific objectivity,” Diamond and Kaul wrote. Conflict between frequentists and Bayesians has been ongoing for two centuries. So science’s marriage to mathematics seems to entail some irreconcilable differences. Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability. “What does probability mean in real life?” the statistician David Salsburg asked in his 2001 book The Lady Tasting Tea. “This problem is still unsolved, and ... if it remains unsolved, the whole of the statistical approach to science may come crashing down from the weight of its own inconsistencies.”
    Odds Are, It's Wrong: Science fails to face the shortcomings of statistics
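
A small numerical sketch (Python) of the "transposed conditional" point in the notes above, using the barking-dog example: the chance of the data given the null hypothesis (the p-value analogue) is not the chance that the hypothesis is true given the data, and the latter depends on a prior, as the Bayesian passage explains. The numbers are made up purely for illustration.

    # P(bark | well-fed) plays the role of a p-value; P(hungry | bark) is what
    # people wrongly read it as. Bayes' theorem links them via a prior.
    p_bark_given_fed    = 0.05   # the dog barks 5% of the time even when well-fed
    p_bark_given_hungry = 0.80   # assumed: it barks 80% of the time when hungry
    p_hungry            = 0.01   # assumed prior: the dog is almost always well-fed

    p_bark = (p_bark_given_hungry * p_hungry
              + p_bark_given_fed * (1 - p_hungry))
    p_hungry_given_bark = p_bark_given_hungry * p_hungry / p_bark

    print(f"P(bark | well-fed) = {p_bark_given_fed:.2f}")    # 0.05
    print(f"P(hungry | bark)   = {p_hungry_given_bark:.2f}") # about 0.14, not 0.95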
Weiye Loh

Search Optimization and Its Dirty Little Secrets - NYTimes.com - 0 views

  • in the last several months, one name turned up, with uncanny regularity, in the No. 1 spot for each and every term: J. C. Penney. The company bested millions of sites — and not just in searches for dresses, bedding and area rugs. For months, it was consistently at or near the top in searches for “skinny jeans,” “home decor,” “comforter sets,” “furniture” and dozens of other words and phrases, from the blandly generic (“tablecloths”) to the strangely specific (“grommet top curtains”).
  • J. C. Penney even beat out the sites of manufacturers in searches for the products of those manufacturers. Type in “Samsonite carry on luggage,” for instance, and Penney for months was first on the list, ahead of Samsonite.com.
  • the digital age’s most mundane act, the Google search, often represents layer upon layer of intrigue. And the intrigue starts in the sprawling, subterranean world of “black hat” optimization, the dark art of raising the profile of a Web site with methods that Google considers tantamount to cheating.
  • Despite the cowboy outlaw connotations, black-hat services are not illegal, but trafficking in them risks the wrath of Google. The company draws a pretty thick line between techniques it considers deceptive and “white hat” approaches, which are offered by hundreds of consulting firms and are legitimate ways to increase a site’s visibility. Penney’s results were derived from methods on the wrong side of that line, says Mr. Pierce. He described the optimization as the most ambitious attempt to game Google’s search results that he has ever seen.
  • TO understand the strategy that kept J. C. Penney in the pole position for so many searches, you need to know how Web sites rise to the top of Google’s results. We’re talking, to be clear, about the “organic” results — in other words, the ones that are not paid advertisements. In deriving organic results, Google’s algorithm takes into account dozens of criteria, many of which the company will not discuss.
  • But it has described one crucial factor in detail: links from one site to another. If you own a Web site, for instance, about Chinese cooking, your site’s Google ranking will improve as other sites link to it. The more links to your site, especially those from other Chinese cooking-related sites, the higher your ranking. In a way, what Google is measuring is your site’s popularity by polling the best-informed online fans of Chinese cooking and counting their links to your site as votes of approval. (A minimal sketch of this link-counting idea appears after these notes.)
  • But even links that have nothing to do with Chinese cooking can bolster your profile if your site is barnacled with enough of them. And here’s where the strategy that aided Penney comes in. Someone paid to have thousands of links placed on hundreds of sites scattered around the Web, all of which lead directly to JCPenney.com.
  • Who is that someone? A spokeswoman for J. C. Penney, Darcie Brossart, says it was not Penney.
  • “J. C. Penney did not authorize, and we were not involved with or aware of, the posting of the links that you sent to us, as it is against our natural search policies,” Ms. Brossart wrote in an e-mail. She added, “We are working to have the links taken down.”
  • Using an online tool called Open Site Explorer, Mr. Pierce found 2,015 pages with phrases like “casual dresses,” “evening dresses,” “little black dress” or “cocktail dress.” Click on any of these phrases on any of these 2,015 pages, and you are bounced directly to the main page for dresses on JCPenney.com.
  • Some of the 2,015 pages are on sites related, at least nominally, to clothing. But most are not. The phrase “black dresses” and a Penney link were tacked to the bottom of a site called nuclear.engineeringaddict.com. “Evening dresses” appeared on a site called casino-focus.com. “Cocktail dresses” showed up on bulgariapropertyportal.com. “Casual dresses” was on a site called elistofbanks.com. “Semi-formal dresses” was pasted, rather incongruously, on usclettermen.org.
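
A minimal Python sketch of the "links as votes" idea in the notes above, using a simple PageRank-style power iteration. Google's real ranking weighs dozens of other criteria (as noted above), so this only illustrates why thousands of inbound links, relevant or not, can lift a site's score; the damping factor, iteration count and example domains are assumptions.

    # Simple PageRank-style link counting: each page's score is spread across
    # the pages it links to, repeated until the scores settle.
    def pagerank(links, damping=0.85, iters=50):
        """links: dict mapping page -> list of pages it links to."""
        pages = set(links) | {p for targets in links.values() for p in targets}
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1 - damping) / n for p in pages}
            for src, targets in links.items():
                if not targets:
                    continue
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            rank = new
        return rank

    # Many unrelated sites all pointing at one retailer push its score up.
    web = {"cooking-blog.example": ["retailer.example"],
           "casino.example": ["retailer.example"],
           "banks-list.example": ["retailer.example"],
           "manufacturer.example": [],
           "retailer.example": []}
    print(sorted(pagerank(web).items(), key=lambda kv: -kv[1]))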
Weiye Loh

Can a group of scientists in California end the war on climate change? | Science | The ... - 0 views

  • Muller calls his latest obsession the Berkeley Earth project. The aim is so simple that the complexity and magnitude of the undertaking is easy to miss. Starting from scratch, with new computer tools and more data than has ever been used, they will arrive at an independent assessment of global warming. The team will also make every piece of data it uses – 1.6bn data points – freely available on a website. It will post its workings alongside, including full information on how more than 100 years of data from thousands of instruments around the world are stitched together to give a historic record of the planet's temperature.
  • Muller is fed up with the politicised row that all too often engulfs climate science. By laying all its data and workings out in the open, where they can be checked and challenged by anyone, the Berkeley team hopes to achieve something remarkable: a broader consensus on global warming. In no other field would Muller's dream seem so ambitious, or perhaps, so naive.
  • "We are bringing the spirit of science back to a subject that has become too argumentative and too contentious," Muller says, over a cup of tea. "We are an independent, non-political, non-partisan group. We will gather the data, do the analysis, present the results and make all of it available. There will be no spin, whatever we find." Why does Muller feel compelled to shake up the world of climate change? "We are doing this because it is the most important project in the world today. Nothing else comes close," he says.
  • There are already three heavyweight groups that could be considered the official keepers of the world's climate data. Each publishes its own figures that feed into the UN's Intergovernmental Panel on Climate Change. Nasa's Goddard Institute for Space Studies in New York City produces a rolling estimate of the world's warming. A separate assessment comes from another US agency, the National Oceanic and Atmospheric Administration (Noaa). The third group is based in the UK and led by the Met Office. They all take readings from instruments around the world to come up with a rolling record of the Earth's mean surface temperature. The numbers differ because each group uses its own dataset and does its own analysis, but they show a similar trend. Since pre-industrial times, all point to a warming of around 0.75C.
  • You might think three groups was enough, but Muller rolls out a list of shortcomings, some real, some perceived, that he suspects might undermine public confidence in global warming records. For a start, he says, warming trends are not based on all the available temperature records. The data that is used is filtered and might not be as representative as it could be. He also cites a poor history of transparency in climate science, though others argue many climate records and the tools to analyse them have been public for years.
  • Then there is the fiasco of 2009 that saw roughly 1,000 emails from a server at the University of East Anglia's Climatic Research Unit (CRU) find their way on to the internet. The fuss over the messages, inevitably dubbed Climategate, gave Muller's nascent project added impetus. Climate sceptics had already attacked James Hansen, head of the Nasa group, for making political statements on climate change while maintaining his role as an objective scientist. The Climategate emails fuelled their protests. "With CRU's credibility undergoing a severe test, it was all the more important to have a new team jump in, do the analysis fresh and address all of the legitimate issues raised by sceptics," says Muller.
  • This latest point is where Muller faces his most delicate challenge. To concede that climate sceptics raise fair criticisms means acknowledging that scientists and government agencies have got things wrong, or at least could do better. But the debate around global warming is so highly charged that open discussion, which science requires, can be difficult to hold in public. At worst, criticising poor climate science can be taken as an attack on science itself, a knee-jerk reaction that has unhealthy consequences. "Scientists will jump to the defence of alarmists because they don't recognise that the alarmists are exaggerating," Muller says.
  • The Berkeley Earth project came together more than a year ago, when Muller rang David Brillinger, a statistics professor at Berkeley and the man Nasa called when it wanted someone to check its risk estimates of space debris smashing into the International Space Station. He wanted Brillinger to oversee every stage of the project. Brillinger accepted straight away. Since the first meeting he has advised the scientists on how best to analyse their data and what pitfalls to avoid. "You can think of statisticians as the keepers of the scientific method," Brillinger told me. "Can scientists and doctors reasonably draw the conclusions they are setting down? That's what we're here for."
  • For the rest of the team, Muller says he picked scientists known for original thinking. One is Saul Perlmutter, the Berkeley physicist who found evidence that the universe is expanding at an ever faster rate, courtesy of mysterious "dark energy" that pushes against gravity. Another is Art Rosenfeld, the last student of the legendary Manhattan Project physicist Enrico Fermi, and something of a legend himself in energy research. Then there is Robert Jacobsen, a Berkeley physicist who is an expert on giant datasets; and Judith Curry, a climatologist at Georgia Institute of Technology, who has raised concerns over tribalism and hubris in climate science.
  • Robert Rohde, a young physicist who left Berkeley with a PhD last year, does most of the hard work. He has written software that trawls public databases, themselves the product of years of painstaking work, for global temperature records. These are compiled, de-duplicated and merged into one huge historical temperature record. The data, by all accounts, are a mess. There are 16 separate datasets in 14 different formats and they overlap, but not completely. Muller likens Rohde's achievement to Hercules's enormous task of cleaning the Augean stables.
  • The wealth of data Rohde has collected so far – and some dates back to the 1700s – makes for what Muller believes is the most complete historical record of land temperatures ever compiled. It will, of itself, Muller claims, be a priceless resource for anyone who wishes to study climate change. So far, Rohde has gathered records from 39,340 individual stations worldwide.
  • Publishing an extensive set of temperature records is the first goal of Muller's project. The second is to turn this vast haul of data into an assessment on global warming.
  • The big three groups – Nasa, Noaa and the Met Office – work out global warming trends by placing an imaginary grid over the planet and averaging temperature records in each square. So for a given month, all the records in England and Wales might be averaged out to give one number. Muller's team will take temperature records from individual stations and weight them according to how reliable they are. (A toy sketch of this gridded averaging appears after these notes.)
  • This is where the Berkeley group faces its toughest task by far and it will be judged on how well it deals with it. There are errors running through global warming data that arise from the simple fact that the global network of temperature stations was never designed or maintained to monitor climate change. The network grew in a piecemeal fashion, starting with temperature stations installed here and there, usually to record local weather.
  • Among the trickiest errors to deal with are so-called systematic biases, which skew temperature measurements in fiendishly complex ways. Stations get moved around, replaced with newer models, or swapped for instruments that record in celsius instead of fahrenheit. The times measurements are taken vary, from, say, 6am to 9pm. The accuracy of individual stations drifts over time, and even changes in the surroundings, such as growing trees, can shield a station more from wind and sun one year to the next. Each of these interferes with a station's temperature measurements, perhaps making it read too cold, or too hot. And these errors combine and build up.
  • This is the real mess that will take a Herculean effort to clean up. The Berkeley Earth team is using algorithms that automatically correct for some of the errors, a strategy Muller favours because it doesn't rely on human interference. When the team publishes its results, this is where the scrutiny will be most intense.
  • Despite the scale of the task, and the fact that world-class scientific organisations have been wrestling with it for decades, Muller is convinced his approach will lead to a better assessment of how much the world is warming. "I've told the team I don't know if global warming is more or less than we hear, but I do believe we can get a more precise number, and we can do it in a way that will cool the arguments over climate change, if nothing else," says Muller. "Science has its weaknesses and it doesn't have a stranglehold on the truth, but it has a way of approaching technical issues that is a closer approximation of truth than any other method we have."
  • It might not be a good sign that one prominent climate sceptic contacted by the Guardian, Canadian economist Ross McKitrick, had never heard of the project. Another, Stephen McIntyre, whom Muller has defended on some issues, hasn't followed the project either, but said "anything that [Muller] does will be well done". Phil Jones at the University of East Anglia was unclear on the details of the Berkeley project and didn't comment.
  • Elsewhere, Muller has qualified support from some of the biggest names in the business. At Nasa, Hansen welcomed the project, but warned against over-emphasising what he expects to be the minor differences between Berkeley's global warming assessment and those from the other groups. "We have enough trouble communicating with the public already," Hansen says. At the Met Office, Peter Stott, head of climate monitoring and attribution, was in favour of the project if it was open and peer-reviewed.
  • Peter Thorne, who left the Met Office's Hadley Centre last year to join the Co-operative Institute for Climate and Satellites in North Carolina, is enthusiastic about the Berkeley project but raises an eyebrow at some of Muller's claims. The Berkeley group will not be the first to put its data and tools online, he says. Teams at Nasa and Noaa have been doing this for many years. And while Muller may have more data, they add little real value, Thorne says. Most are records from stations installed from the 1950s onwards, and then only in a few regions, such as North America. "Do you really need 20 stations in one region to get a monthly temperature figure? The answer is no. Supersaturating your coverage doesn't give you much more bang for your buck," he says. They will, however, help researchers spot short-term regional variations in climate change, something that is likely to be valuable as climate change takes hold.
  • Despite his reservations, Thorne says climate science stands to benefit from Muller's project. "We need groups like Berkeley stepping up to the plate and taking this challenge on, because it's the only way we're going to move forwards. I wish there were 10 other groups doing this," he says.
  • Muller's project is organised under the auspices of Novim, a Santa Barbara-based non-profit organisation that uses science to find answers to the most pressing issues facing society and to publish them "without advocacy or agenda". Funding has come from a variety of places, including the Fund for Innovative Climate and Energy Research (funded by Bill Gates), and the Department of Energy's Lawrence Berkeley Lab. One donor has had some climate bloggers up in arms: the man behind the Charles G Koch Charitable Foundation owns, with his brother David, Koch Industries, a company Greenpeace called a "kingpin of climate science denial". On this point, Muller says the project has taken money from right and left alike.
  • No one who spoke to the Guardian about the Berkeley Earth project believed it would shake the faith of the minority who have set their minds against global warming. "As new kids on the block, I think they will be given a favourable view by people, but I don't think it will fundamentally change people's minds," says Thorne. Brillinger has reservations too. "There are people you are never going to change. They have their beliefs and they're not going to back away from them."
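
A toy Python sketch of the gridded averaging described above: station readings are binned into latitude/longitude cells, each cell is averaged, and the cell means are averaged again so that regions dense with stations don't dominate the global figure. The cell size and the sample readings are illustrative assumptions, not any group's actual method.

    from collections import defaultdict

    def grid_mean(records, cell_size=5.0):
        """Average one month of readings on a lat/lon grid.

        records -- list of (latitude, longitude, temperature) tuples
        """
        cells = defaultdict(list)
        for lat, lon, temp in records:
            key = (int(lat // cell_size), int(lon // cell_size))
            cells[key].append(temp)
        cell_means = [sum(v) / len(v) for v in cells.values()]
        return sum(cell_means) / len(cell_means)

    january = [(51.5, -0.1, 4.2), (52.2, -1.3, 3.9),  # two stations in one UK cell
               (40.7, -74.0, 1.5)]                    # one station in a US cell
    print(grid_mean(january))  # the crowded UK cell counts once, not twice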
Weiye Loh

Why a hyper-personalized Web is bad for you - Internet - Insight - ZDNet Asia - 0 views

  • Invisibly but quickly, the Internet is changing. Sites like Google and Facebook show you what they think you want to see, based on data they've collected about you.
  • The filter bubble is the invisible, personal universe of information that results--a bubble you live in, and you don't even know it. And it means that the world you see online and the world I see may be very different.
  • As consumers, we can vary our information pathways more and use things like incognito browsing to stop some of the tracking that leads to personalization.
  • it's in these companies' hands to do this ethically--to build algorithms that show us what we need to know and what we don't know, not just what we like.
  • why would the Googles and Facebooks of the world change what they're doing (absent government regulation)? My hope is that, like newspapers, they'll move from a pure profit-making posture to one that recognizes that they're keepers of the public trust.
  • most people don't know how Google and Facebook are controlling their information flows. And once they do, most people I've met want to have more control and transparency than these companies currently offer. So it's a way in to that conversation. First people have to know how the Internet is being edited for them.
  • what's good and bad about the personalization. Tell me some ways that this is not a good thing? Here's a few. 1) It's a distorted view of the world. Hearing your own views and ideas reflected back is comfortable, but it can lead to really bad decisions--you need to see the whole picture to make good decisions; 2) It can limit creativity and innovation, which often come about when two relatively unrelated concepts or ideas are juxtaposed; and 3) It's not great for democracy, because democracy requires a common sense of the big problems that face us and an ability to put ourselves in other peoples' shoes.
  • Stanford researchers Dean Eckles and Maurits Kapstein, who figured out that not only do people have personal tastes, they have personal "persuasion profiles". So I might respond more to appeals to authority (Barack Obama says buy this book), and you might respond more to scarcity ("only 2 left!"). In theory, if a site like Amazon could identify your persuasion profile, it could sell it to other sites--so that everywhere you go, people are using your psychological weak spots to get you to do stuff. I also really enjoyed talking to the guys behind OKCupid, who take the logic of Google and apply it to dating.
  • Nobody noticed when Google went all-in on personalization, because the filtering is very hard to see.