Home/ New Media Ethics 2009 course/ Group items tagged Data

Weiye Loh

Open data, democracy and public sector reform

  • Governments are increasingly making their data available online in standard formats and under licenses that permit the free re-use of data. The justifications advanced for this include claims regarding the economic potential of open government data (OGD), the potential for OGD to promote transparency and accountability of government and the role of OGD in supporting the reform and reshaping of public services. This paper takes a pragmatic mixed-methods approach to exploring uses of data from the UK national open government data portal, data.gov.uk, and identifies how the emerging practices of OGD use are developing. It sets out five 'processes' of data use, and describes a series of embedded cases of education OGD use, and use of public-spending OGD. Drawing upon quantitative and qualitative data it presents an outline account of the motivations driving different individuals to engage with open government data, and it identifies a range of connections between open government data use and processes of civic change. It argues that a "data for developers" narrative that assumes OGD use will primarily be mediated by technology developers is misplaced, and that whilst innovation-based routes to OGD-driven public sector reform are evident, the relationship between OGD and democracy is less clear. As strategic research it highlights a number of emerging policy issues for developing OGD provision and use, and makes a contribution towards theoretical understandings of OGD use in practice.
Weiye Loh

A Data Divide? Data "Haves" and "Have Nots" and Open (Government) Data « Gurs...

  • Researchers have extensively explored the range of social, economic, geographical and other barriers which underlie and to a considerable degree “explain” (cause) the Digital Divide.  My own contribution has been to argue that “access is not enough”, it is whether opportunities and pre-conditions are in place for the “effective use” of the technology particularly for those at the grassroots.
  • The idea of a possible parallel “Data Divide” between those who have access and the opportunity to make effective use of data and particularly “open data” and those who do not, began to occur to me.  I was attending several planning/recruitment events for the Open Data “movement” here in Vancouver and the socio-demographics and some of the underlying political assumptions seemed to be somewhat at odds with the expressed advocacy position of “data for all”.
  • Thus the “open data” which was being argued for would not likely be accessible and usable to the groups and individuals with which Community Informatics has largely been concerned – the grassroots, the poor and marginalized, indigenous people, rural people and slum dwellers in Less Developed countries. It was/is hard to see, given the explanations provided to date, how these folks could use this data in any effective way to help them in responding to the opportunities for advance and social betterment which open data advocates have been indicating as the outcome of their efforts.
  • many involved in “open data” saw their interests and activities being confined to making data “legally” and “technically” accessible; what happened to it after that was somebody else’s responsibility.
  • while the Digital Divide deals with, for the most part “infrastructure” issues, the Data Divide is concerned with “content” issues.
  • where a Digital Divide might exist for example, as a result of geographical or policy considerations and thus have uniform effects on all those on the wrong side of the “divide” whatever their socio-demographic situation; a Data Divide and particularly one of the most significant current components of the Open Data movement i.e. OGD, would have particularly damaging negative effects and result in particularly significant lost opportunities for the most vulnerable groups and individuals in society and globally. (I’ve discussed some examples here at length in a previous blogpost.)
  • Data Divide thus would be the gap between those who have access to and are able to use Open (Government) Data and those who are not so enabled.
  • 1. infrastructure: being on the wrong side of the “Digital Divide” and thus not having access to the basic infrastructure supporting the availability of OGD.
    2. devices: OGD that is not universally accessible and device independent (that only runs on I-Phones for example).
    3. software: “accessible” OGD that requires specialized technical software/training to become “usable”.
    4. content: OGD not designed for use by those with handicaps, non-English speakers, those with low levels of functional literacy for example.
    5. interpretation/sense-making: OGD that is only accessible for use through a technical intermediary and/or is useful only if “interpreted” by a professional intermediary.
    6. advocacy: whether the OGD is in a form and context that is supportive for use in advocacy (or other purposes) on behalf of marginalized and other groups and individuals.
    7. governance: whether the OGD process includes representation from the broad public in its overall policy development and governance (not just lawyers, techies and public servants).
Weiye Loh

Data.gov.sg

  • What is data.gov.sg? data.gov.sg is the first-stop portal to search and access publicly-available data published by the Singapore Government. Launched in June 2011, data.gov.sg brings together over 5000 datasets from 50 government ministries and agencies. The aims of the portal are to: provide convenient access to publicly-available data published by the government; create value by catalysing application development; and facilitate analysis and research. Besides government data and metadata, data.gov.sg also offers a listing of applications developed using government data, as well as a resource page for developers. data.gov.sg was initiated by the Ministry of Finance along with the Infocomm Development Authority of Singapore. Key partners for this initiative are the Singapore Land Authority and the Singapore Department of Statistics.
Weiye Loh

Executive Insight | Think Quarterly

  • it’s all about making the data work. “I triangulate an objective assessment of the new technologies coming in, a subjective assessment of the public’s reaction to new propositions, and then I take a punt.” This ‘triangulation’ is the combination of hardheaded data analysis, coupled with business nous. Data is something that informs his hunches – but never rules them.
  • As situations unfold in real time in Egypt or Bahrain, we can see how that affects the network, too.” Even a bill being sent by email triggers a whole chain of data events: customer gets bill, most open it; some have a query and call the centre. Forty thousand bills go out an hour but if the centre gets hit with too many queries, billings are dialled down to reduce calls in. It’s about fighting the data overload.
  • we are truly overloaded by data. Governments around the world are unleashing a deluge of numbers on their citizens. That has huge implications for big businesses with lucrative government contracts. In the UK, the government recently published every item of public spending over £25,000. Search the database for ‘Vodafone’ and you get 2,448 individual transactions covering millions of pounds. Information that companies once believed was commercially confidential is now routinely published – or leaked to websites like Wikileaks.
  • “Companies will become more transparent as a necessity – customers now see that as an essential part of the trust equation.” The bigger impact may come from the technology that is making access to this data a mobile phenomenon. “This industry is de-linking access to data from physical location,” he says. In a world where shoppers can check out the competition’s prices while they’re in your store, keeping control of data is no longer an option.
  • for now, managing the information out there is the priority. Access to information was once the big problem
  • Then it quickly flipped, through technology, to data overload. “We were brought up to believe more data was good, and that’s no longer true,” he argues.
  • Laurence refuses to read reports from his product managers with more than five of the vital key performance indicators on them. “The amount of data is obscene. The managers that are going to be successful are going to be the ones who are prepared to take a knife to the amount of data… Otherwise, it’s like a virus.”
  • Data plus hunch equals a powerful combination. Or, as Laurence concludes: “Data on its own is impotent.”
  • "We were brought up to believe more data was good, and that's no longer true"
Weiye Loh

The Ashtray: The Ultimatum (Part 1) - NYTimes.com

  • “Under no circumstances are you to go to those lectures. Do you hear me?” Kuhn, the head of the Program in the History and Philosophy of Science at Princeton where I was a graduate student, had issued an ultimatum. It concerned the philosopher Saul Kripke’s lectures — later to be called “Naming and Necessity” — which he had originally given at Princeton in 1970 and planned to give again in the Fall, 1972.
  • Whiggishness — in history of science, the tendency to evaluate and interpret past scientific theories not on their own terms, but in the context of current knowledge. The term comes from Herbert Butterfield’s “The Whig Interpretation of History,” written when Butterfield, a future Regius professor of history at Cambridge, was only 31 years old. Butterfield had complained about Whiggishness, describing it as “…the study of the past with direct and perpetual reference to the present” – the tendency to see all history as progressive, and in an extreme form, as an inexorable march to greater liberty and enlightenment. [3] For Butterfield, on the other hand, “…real historical understanding” can be achieved only by “attempting to see life with the eyes of another century than our own.” [4][5].
  • Kuhn had attacked my Whiggish use of the term “displacement current.” [6] I had failed, in his view, to put myself in the mindset of Maxwell’s first attempts at creating a theory of electricity and magnetism. I felt that Kuhn had misinterpreted my paper, and that he — not me — had provided a Whiggish interpretation of Maxwell. I said, “You refuse to look through my telescope.” And he said, “It’s not a telescope, Errol. It’s a kaleidoscope.” (In this respect, he was probably right.) [7].
  • I asked him, “If paradigms are really incommensurable, how is history of science possible? Wouldn’t we be merely interpreting the past in the light of the present? Wouldn’t the past be inaccessible to us? Wouldn’t it be ‘incommensurable?’ ” [8] ¶He started moaning. He put his head in his hands and was muttering, “He’s trying to kill me. He’s trying to kill me.” ¶And then I added, “…except for someone who imagines himself to be God.” ¶It was at this point that Kuhn threw the ashtray at me.
  • I call Kuhn’s reply “The Ashtray Argument.” If someone says something you don’t like, you throw something at him. Preferably something large, heavy, and with sharp edges. Perhaps we were engaged in a debate on the nature of language, meaning and truth. But maybe we just wanted to kill each other.
  • That's the problem with relativism: Who's to say who's right and who's wrong? Somehow I'm not surprised to hear Kuhn was an ashtray-hurler. In the end, what other argument could he make?
  • For us to have a conversation and come to an agreement about the meaning of some word without having to refer to some outside authority like a dictionary, we would of necessity have to be satisfied that our agreement was genuine and not just a polite acknowledgement of each other's right to their opinion, can you agree with that? If so, then let's see if we can agree on the meaning of the word 'know' because that may be the crux of the matter. When I use the word 'know' I mean more than the capacity to apprehend some aspect of the world through language or some other representational symbolism. Included in the word 'know' is the direct sensorial perception of some aspect of the world. For example, I sense the floor that my feet are now resting upon. I 'know' the floor is really there, I can sense it. Perhaps I don't 'know' what the floor is made of, who put it there, and other incidental facts one could know through the usual symbolism such as language as in a story someone tells me. Nevertheless, the reality I need to 'know' is that the floor, or whatever you may wish to call the solid - relative to my body - flat and level surface supported by more structure than the earth, is really there and reliably capable of supporting me. This is true and useful knowledge that goes directly from the floor itself to my knowing about it - via sensation - that has nothing to do with my interpretive system.
  • Now I am interested in 'knowing' my feet in the same way that my feet and the whole body they are connected to 'know' the floor. I sense my feet sensing the floor. My feet are as real as the floor and I know they are there, sensing the floor because I can sense them. Furthermore, now I 'know' that it is 'I' sensing my feet, sensing the floor. Do you see where I am going with this line of thought? I am including in the word 'know' more meaning than it is commonly given by everyday language. Perhaps it sounds as if I want to expand on the Cartesian formula of cogito ergo sum, and in truth I prefer to say I sense therefore I am. It is my sensations of the world first and foremost that my awareness, such as it is, is actively engaged with reality. Now, any healthy normal animal senses the world but we can't 'know' if they experience reality as we do since we can't have a conversation with them to arrive at agreement. But we humans can have this conversation and possibly agree that we can 'know' the world through sensation. We can even know what is 'I' through sensation. In fact, there is no other way to know 'I' except through sensation. Thought is symbolic representation, not direct sensing, so even though the thoughtful modality of regarding the world may be a far more reliable modality than sensation in predicting what might happen next, its very capacity for such accurate prediction is its biggest weakness, which is its capacity for error
  • Sensation cannot be 'wrong' unless it is used to predict outcomes. Thought can be wrong for both predicting outcomes and for 'knowing' reality. Sensation alone can 'know' reality even though it is relatively unreliable, useless even, for making predictions.
  • If we prioritize our interests by placing predictability over pure knowing through sensation, then of course we will not value the 'knowledge' to be gained through sensation. But if we can switch the priorities - out of sheer curiosity perhaps - then we can enter a realm of knowledge through sensation that is unbelievably spectacular. Our bodies are 'made of' reality, and by methodically exercising our nascent capacity for self sensing, we can connect our knowing 'I' to reality directly. We will not be able to 'know' what it is that we are experiencing in the way we might wish, which is to be able to predict what will happen next or to represent to ourselves symbolically what we might experience when we turn our attention to that sensation. But we can arrive at a depth and breadth of 'knowing' that is utterly unprecedented in our lives by operating that modality.
  • One of the impressions that comes from a sustained practice of self sensing is a clearer feeling for what "I" is and why we have a word for that self referential phenomenon, seemingly located somewhere behind our eyes and between our ears. The thing we call "I" or "me" depending on the context, turns out to be a moving point, a convergence vector for a variety of images, feelings and sensations. It is a reference point into which certain impressions flow and out of which certain impulses to act diverge and which may or may not animate certain muscle groups into action. Following this tricky exercise in attention and sensation, we can quickly see for ourselves that attention is more like a focused beam and awareness is more like a diffuse cloud, but both are composed of energy, and like all energy they vibrate, they oscillate with a certain frequency. That's it for now.
  • I loved the writer's efforts to find a fixed definition of “Incommensurability;” there was of course never a concrete meaning behind the word. Smoke and mirrors.
Weiye Loh

Measuring Social Media: Who Has Access to the Firehose?

  • The question that the audience member asked — and one that we tried to touch on a bit in the panel itself — was who has access to this raw data. Twitter doesn’t comment on who has full access to its firehose, but to Weil’s credit he was at least forthcoming with some of the names, including stalwarts like Microsoft, Google and Yahoo — plus a number of smaller companies.
  • In the case of Twitter, the company offers free access to its API for developers. The API can provide access and insight into information about tweets, replies and keyword searches, but as developers who work with Twitter — or any large scale social network — know, that data isn’t always 100% reliable. Unreliable data is a problem when talking about measurements and analytics, where the data is helping to influence decisions related to social media marketing strategies and allocations of resources.
  • One of the companies that has access to Twitter’s data firehose is Gnip. As we discussed in November, Twitter has entered into a partnership with Gnip that allows the social data provider to resell access to the Twitter firehose. This is great on one level, because it means that businesses and services can access the data. The problem, as noted by panelist Raj Kadam, the CEO of Viralheat, is that Gnip’s access can be prohibitively expensive.
  • The problem of reliable access to analytics and measurement information is by no means limited to Twitter. Facebook data is also tightly controlled. With Facebook, privacy controls built into the API are designed to prevent mass data scraping. This is absolutely the right decision. However, a reality of social media measurement is that Facebook Insights isn’t always reachable and the data collected from the tool is sometimes inaccurate. It’s no surprise there’s a disconnect between the data that marketers and community managers want and the data that can be reliably accessed. Twitter and Facebook were both designed as tools for consumers. It’s only been in the last two years that the platform ecosystem aimed at serving large brands and companies
  • The data that companies like Twitter, Facebook and Foursquare collect are some of their most valuable assets. It isn’t fair to expect a free ride or first-class access to the data by anyone who wants it. Having said that, more transparency about what data is available to services and brands is needed. We’re just scratching the surface of what social media monitoring, measurement and management tools can do. To get to the next level, it’s important that we all question who has access to the firehose.
  • We Need More Transparency for How to Access and Connect with Data
Weiye Loh

Oxford academic wins right to read UEA climate data | Environment | guardian.co.uk

  • Jonathan Jones, physics professor at Oxford University and self-confessed "climate change agnostic", used freedom of information law to demand the data that is the life's work of the head of the University of East Anglia's Climatic Research Unit, Phil Jones. UEA resisted the requests to disclose the data, but this week it was compelled to do so.
  • Graham gave the UEA one month to deliver the data, which includes more than 4m individual thermometer readings taken from 4,000 weather stations over the past 160 years. The commissioner's office said this was his first ruling on demands for climate data made in the wake of the climategate affair.
  • an archive of world temperature records collected jointly with the Met Office.
  • Critics of the UEA's scientists say an independent analysis of the temperature data may reveal that Phil Jones and his colleagues have misinterpreted the evidence of global warming. They may have failed to allow for local temperature influences, such as the growth of cities close to many of the thermometers.
  • when Jonathan Jones and others asked for the data in the summer of 2009, the UEA said legal exemptions applied. It said variously that the temperature data were the property of foreign meteorological offices; were intellectual property that might be valuable if sold to other researchers; and were in any case often publicly available.
  • Jonathan Jones said this week that he took up the cause of data freedom after Steve McIntyre, a Canadian mathematician, had requests for the data turned down. He thought this was an unreasonable response when Phil Jones had already shared the data with academic collaborators, including Prof Peter Webster of the Georgia Institute of Technology in the US. He asked to be given the data already sent to Webster, and was also turned down.
  • An Oxford academic has won the right to read previously secret data on climate change held by the University of East Anglia (UEA). The decision, by the government's information commissioner, Christopher Graham, is being hailed as a landmark ruling that will mean that thousands of British researchers are required to share their data with the public.
Weiye Loh

The Decline Effect and the Scientific Method : The New Yorker

  • On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties.
  • the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
  • Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
  • But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
  • In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”
  • At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”
  • the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.
  • The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time!
  • this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
  • In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
  • the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.
  • the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
  • Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
  • While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts
  • an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.
  • The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results.
  • Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
  • Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
  • One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
  • John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.”
  • In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the “gold standard” of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
  • The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
  • the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
  • The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
  • scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,” he says. The current “obsession” with replicability distracts from the real problem, which is faulty design. He notes that nobody even tries to replicate most science papers—there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.)
  • Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
  • Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn’t erase the decline effect. This is largely because scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging.
  • John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
  • The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
  • The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
  • This suggests that the decline effect is actually a decline of illusion.
  • While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn’t changed. The law of gravity remains the same.
  • Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
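The funnel-graph argument in the excerpts above can be sketched with a small simulation. This is only an illustration under stated assumptions, not Palmer's analysis: the true effect is set to zero, the noise has unit variance, and the function names (`simulate_study`, `run_studies`, `selectively_report`) are hypothetical.

```python
import random
import statistics

def simulate_study(n, true_effect=0.0, rng=random):
    """Return the mean of n noisy measurements (assumed unit standard deviation)."""
    return statistics.fmean([rng.gauss(true_effect, 1.0) for _ in range(n)])

def run_studies(sample_sizes, per_size=200, seed=0):
    """Simulate per_size studies at each sample size; returns {n: [estimates]}."""
    rng = random.Random(seed)
    return {n: [simulate_study(n, rng=rng) for _ in range(per_size)]
            for n in sample_sizes}

def selectively_report(estimates, n, z=1.96):
    """Keep only estimates that look 'significantly' positive.

    With unit-variance noise the standard error of a mean is 1/sqrt(n),
    so an estimate passes when it exceeds z standard errors.
    """
    threshold = z / n ** 0.5
    return [e for e in estimates if e > threshold]

results = run_studies([10, 100, 1000])
for n, estimates in results.items():
    reported = selectively_report(estimates, n)
    # The spread narrows as n grows: that narrowing is the funnel shape.
    print(n, round(statistics.stdev(estimates), 3), len(reported))
```

Even with no real effect, every estimate that survives the filter is positive, and the surviving small-sample estimates are the largest ones, which is the kind of skew the excerpt describes among small studies.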
Weiye Loh

Can a group of scientists in California end the war on climate change? | Science | The ... - 0 views

  • Muller calls his latest obsession the Berkeley Earth project. The aim is so simple that the complexity and magnitude of the undertaking is easy to miss. Starting from scratch, with new computer tools and more data than has ever been used, they will arrive at an independent assessment of global warming. The team will also make every piece of data it uses – 1.6bn data points – freely available on a website. It will post its workings alongside, including full information on how more than 100 years of data from thousands of instruments around the world are stitched together to give a historic record of the planet's temperature.
  • Muller is fed up with the politicised row that all too often engulfs climate science. By laying all its data and workings out in the open, where they can be checked and challenged by anyone, the Berkeley team hopes to achieve something remarkable: a broader consensus on global warming. In no other field would Muller's dream seem so ambitious, or perhaps, so naive.
  • "We are bringing the spirit of science back to a subject that has become too argumentative and too contentious," Muller says, over a cup of tea. "We are an independent, non-political, non-partisan group. We will gather the data, do the analysis, present the results and make all of it available. There will be no spin, whatever we find." Why does Muller feel compelled to shake up the world of climate change? "We are doing this because it is the most important project in the world today. Nothing else comes close," he says.
  • There are already three heavyweight groups that could be considered the official keepers of the world's climate data. Each publishes its own figures that feed into the UN's Intergovernmental Panel on Climate Change. Nasa's Goddard Institute for Space Studies in New York City produces a rolling estimate of the world's warming. A separate assessment comes from another US agency, the National Oceanic and Atmospheric Administration (Noaa). The third group is based in the UK and led by the Met Office. They all take readings from instruments around the world to come up with a rolling record of the Earth's mean surface temperature. The numbers differ because each group uses its own dataset and does its own analysis, but they show a similar trend. Since pre-industrial times, all point to a warming of around 0.75C.
  • You might think three groups was enough, but Muller rolls out a list of shortcomings, some real, some perceived, that he suspects might undermine public confidence in global warming records. For a start, he says, warming trends are not based on all the available temperature records. The data that is used is filtered and might not be as representative as it could be. He also cites a poor history of transparency in climate science, though others argue many climate records and the tools to analyse them have been public for years.
  • Then there is the fiasco of 2009 that saw roughly 1,000 emails from a server at the University of East Anglia's Climatic Research Unit (CRU) find their way on to the internet. The fuss over the messages, inevitably dubbed Climategate, gave Muller's nascent project added impetus. Climate sceptics had already attacked James Hansen, head of the Nasa group, for making political statements on climate change while maintaining his role as an objective scientist. The Climategate emails fuelled their protests. "With CRU's credibility undergoing a severe test, it was all the more important to have a new team jump in, do the analysis fresh and address all of the legitimate issues raised by sceptics," says Muller.
  • This latter point is where Muller faces his most delicate challenge. To concede that climate sceptics raise fair criticisms means acknowledging that scientists and government agencies have got things wrong, or at least could do better. But the debate around global warming is so highly charged that open discussion, which science requires, can be difficult to hold in public. At worst, criticising poor climate science can be taken as an attack on science itself, a knee-jerk reaction that has unhealthy consequences. "Scientists will jump to the defence of alarmists because they don't recognise that the alarmists are exaggerating," Muller says.
  • The Berkeley Earth project came together more than a year ago, when Muller rang David Brillinger, a statistics professor at Berkeley and the man Nasa called when it wanted someone to check its risk estimates of space debris smashing into the International Space Station. He wanted Brillinger to oversee every stage of the project. Brillinger accepted straight away. Since the first meeting he has advised the scientists on how best to analyse their data and what pitfalls to avoid. "You can think of statisticians as the keepers of the scientific method," Brillinger told me. "Can scientists and doctors reasonably draw the conclusions they are setting down? That's what we're here for."
  • For the rest of the team, Muller says he picked scientists known for original thinking. One is Saul Perlmutter, the Berkeley physicist who found evidence that the universe is expanding at an ever faster rate, courtesy of mysterious "dark energy" that pushes against gravity. Another is Art Rosenfeld, the last student of the legendary Manhattan Project physicist Enrico Fermi, and something of a legend himself in energy research. Then there is Robert Jacobsen, a Berkeley physicist who is an expert on giant datasets; and Judith Curry, a climatologist at Georgia Institute of Technology, who has raised concerns over tribalism and hubris in climate science.
  • Robert Rohde, a young physicist who left Berkeley with a PhD last year, does most of the hard work. He has written software that trawls public databases, themselves the product of years of painstaking work, for global temperature records. These are compiled, de-duplicated and merged into one huge historical temperature record. The data, by all accounts, are a mess. There are 16 separate datasets in 14 different formats and they overlap, but not completely. Muller likens Rohde's achievement to Hercules's enormous task of cleaning the Augean stables.
  • The wealth of data Rohde has collected so far – and some dates back to the 1700s – makes for what Muller believes is the most complete historical record of land temperatures ever compiled. It will, of itself, Muller claims, be a priceless resource for anyone who wishes to study climate change. So far, Rohde has gathered records from 39,340 individual stations worldwide.
  • Publishing an extensive set of temperature records is the first goal of Muller's project. The second is to turn this vast haul of data into an assessment on global warming.
  • The big three groups – Nasa, Noaa and the Met Office – work out global warming trends by placing an imaginary grid over the planet and averaging temperature records in each square. So for a given month, all the records in England and Wales might be averaged out to give one number. Muller's team will take temperature records from individual stations and weight them according to how reliable they are.
  • This is where the Berkeley group faces its toughest task by far and it will be judged on how well it deals with it. There are errors running through global warming data that arise from the simple fact that the global network of temperature stations was never designed or maintained to monitor climate change. The network grew in a piecemeal fashion, starting with temperature stations installed here and there, usually to record local weather.
  • Among the trickiest errors to deal with are so-called systematic biases, which skew temperature measurements in fiendishly complex ways. Stations get moved around, replaced with newer models, or swapped for instruments that record in celsius instead of fahrenheit. The times at which measurements are taken vary, from say 6am to 9pm. The accuracy of individual stations drifts over time and even changes in the surroundings, such as growing trees, can shield a station more from wind and sun one year to the next. Each of these interferes with a station's temperature measurements, perhaps making it read too cold, or too hot. And these errors combine and build up.
  • This is the real mess that will take a Herculean effort to clean up. The Berkeley Earth team is using algorithms that automatically correct for some of the errors, a strategy Muller favours because it doesn't rely on human interference. When the team publishes its results, this is where the scrutiny will be most intense.
  • Despite the scale of the task, and the fact that world-class scientific organisations have been wrestling with it for decades, Muller is convinced his approach will lead to a better assessment of how much the world is warming. "I've told the team I don't know if global warming is more or less than we hear, but I do believe we can get a more precise number, and we can do it in a way that will cool the arguments over climate change, if nothing else," says Muller. "Science has its weaknesses and it doesn't have a stranglehold on the truth, but it has a way of approaching technical issues that is a closer approximation of truth than any other method we have."
  • It might not be a good sign that one prominent climate sceptic contacted by the Guardian, Canadian economist Ross McKitrick, had never heard of the project. Another, Stephen McIntyre, whom Muller has defended on some issues, hasn't followed the project either, but said "anything that [Muller] does will be well done". Phil Jones at the University of East Anglia was unclear on the details of the Berkeley project and didn't comment.
  • Elsewhere, Muller has qualified support from some of the biggest names in the business. At Nasa, Hansen welcomed the project, but warned against over-emphasising what he expects to be the minor differences between Berkeley's global warming assessment and those from the other groups. "We have enough trouble communicating with the public already," Hansen says. At the Met Office, Peter Stott, head of climate monitoring and attribution, was in favour of the project if it was open and peer-reviewed.
  • Peter Thorne, who left the Met Office's Hadley Centre last year to join the Co-operative Institute for Climate and Satellites in North Carolina, is enthusiastic about the Berkeley project but raises an eyebrow at some of Muller's claims. The Berkeley group will not be the first to put its data and tools online, he says. Teams at Nasa and Noaa have been doing this for many years. And while Muller may have more data, they add little real value, Thorne says. Most are records from stations installed from the 1950s onwards, and then only in a few regions, such as North America. "Do you really need 20 stations in one region to get a monthly temperature figure? The answer is no. Supersaturating your coverage doesn't give you much more bang for your buck," he says. They will, however, help researchers spot short-term regional variations in climate change, something that is likely to be valuable as climate change takes hold.
  • Despite his reservations, Thorne says climate science stands to benefit from Muller's project. "We need groups like Berkeley stepping up to the plate and taking this challenge on, because it's the only way we're going to move forwards. I wish there were 10 other groups doing this," he says.
  • Muller's project is organised under the auspices of Novim, a Santa Barbara-based non-profit organisation that uses science to find answers to the most pressing issues facing society and to publish them "without advocacy or agenda". Funding has come from a variety of places, including the Fund for Innovative Climate and Energy Research (funded by Bill Gates), and the Department of Energy's Lawrence Berkeley Lab. One donor has had some climate bloggers up in arms: the man behind the Charles G Koch Charitable Foundation owns, with his brother David, Koch Industries, a company Greenpeace called a "kingpin of climate science denial". On this point, Muller says the project has taken money from right and left alike.
  • No one who spoke to the Guardian about the Berkeley Earth project believed it would shake the faith of the minority who have set their minds against global warming. "As new kids on the block, I think they will be given a favourable view by people, but I don't think it will fundamentally change people's minds," says Thorne. Brillinger has reservations too. "There are people you are never going to change. They have their beliefs and they're not going to back away from them."
Weiye Loh

Are the Open Data Warriors Fighting for Robin Hood or the Sheriff?: Some Refl... - 0 views

  • The ideal that these nerdy revolutionaries are pursuing is not, as with previous generations—justice, freedom, democracy—rather it is “openness” as in Open Data, Open Information, Open Government. Precisely what is meant by “openness” is never (at least certainly not in the context of this conference) really defined in a form that an outsider could grapple with (and perhaps critique). 
  • the “open data/open government” movement begins from a profoundly political perspective that government is largely ineffective and inefficient (and possibly corrupt) and that it hides that ineffectiveness and inefficiency (and possible corruption) from public scrutiny through lack of transparency in its operations and particularly in denying to the public access to information (data) about its operations.
  • further that this access once available would give citizens the means to hold bureaucrats (and their political masters) accountable for their actions. In doing so it would give these self-same citizens a platform on which to undertake (or at least collaborate with) these bureaucrats in certain key and significant activities—planning, analyzing, budgeting that sort of thing. Moreover through the implementation of processes of crowdsourcing this would also provide the bureaucrats with the overwhelming benefits of having access to and input from the knowledge and wisdom of the broader interested public.
  • It’s the taxpayer’s money and they have the right to participate in overseeing how it is spent. Having “open” access to government’s data/information gives citizens the tools to exercise that right. And (it is argued), solutions are available for putting into the hands of these citizens the means/technical tools for sifting and sorting and making critical analyses of government activities if only the key could be turned and government data was “accessible” (“open”).
  • A lot of the conference took place in specialized workshops that covered the technical details: how to link various sets of this newly available data with other sets; how to structure this data so that it could serve various purposes; and, perhaps most importantly, how to design the architecture and ontology (ultimately the management policies and procedures) of the data itself within government so that it is “born open” rather than only liberated after the fact, which would make the data much more useful in the larger world of open and universally accessible data.
  • it matters very much who the (anticipated) user is since what is being put in place are the frameworks for the data environment  of the future and these will include for the most part some assumptions about who the ultimate user is or will be and whether or not a new “data divide” will emerge written more deeply into the fabric of the Information Society than even the earlier “digital (access) divide”.
Weiye Loh

'Scrapers' Dig Deep for Data on the Web - WSJ.com - 0 views

  • website PatientsLikeMe.com noticed suspicious activity on its "Mood" discussion board. There, people exchange highly personal stories about their emotional disorders, ranging from bipolar disease to a desire to cut themselves. It was a break-in. A new member of the site, using sophisticated software, was "scraping," or copying, every single message off PatientsLikeMe's private online forums.
  • PatientsLikeMe managed to block and identify the intruder: Nielsen Co., the privately held New York media-research firm. Nielsen monitors online "buzz" for clients, including major drug makers, which buy data gleaned from the Web to get insight from consumers about their products, Nielsen says.
  • The market for personal data about Internet users is booming, and in the vanguard is the practice of "scraping." Firms offer to harvest online conversations and collect personal details from social-networking sites, résumé sites and online forums where people might discuss their lives. The emerging business of web scraping provides some of the raw material for a rapidly expanding data economy. Marketers spent $7.8 billion on online and offline data in 2009, according to the New York management consulting firm Winterberry Group LLC. Spending on data from online sources is set to more than double, to $840 million in 2012 from $410 million in 2009.
  • The Wall Street Journal's examination of scraping—a trade that involves personal information as well as many other types of data—is part of the newspaper's investigation into the business of tracking people's activities online and selling details about their behavior and personal interests.
  • Some companies collect personal information for detailed background reports on individuals, such as email addresses, cell numbers, photographs and posts on social-network sites. Others offer what are known as listening services, which monitor in real time hundreds or thousands of news sources, blogs and websites to see what people are saying about specific products or topics.
  • One such service is offered by Dow Jones & Co., publisher of the Journal. Dow Jones collects data from the Web—which may include personal information contained in news articles and blog postings—that help corporate clients monitor how they are portrayed. It says it doesn't gather information from password-protected parts of sites.
  • The competition for data is fierce. PatientsLikeMe also sells data about its users. PatientsLikeMe says the data it sells is anonymized, no names attached.
  • Nielsen spokesman Matt Anchin says the company's reports to its clients include publicly available information gleaned from the Internet, "so if someone decides to share personally identifiable information, it could be included."
  • Internet users often have little recourse if personally identifiable data is scraped: There is no national law requiring data companies to let people remove or change information about themselves, though some firms let users remove their profiles under certain circumstances.
Weiye Loh

A Data State of Mind | Think Quarterly - 0 views

  • Rosling has maintained a fact-based worldview – an understanding of how global health trends act as a signifier for economic development based on hard data. Today, he argues, countries and corporations alike need to adopt that same data-driven understanding of the world if they are to make sense of the changes we are experiencing in this new century, and the opportunities and challenges that lie ahead.
  • the world has changed so much, what people need isn’t more data but a new mindset. They need a new storage system that can handle this new information. But what I have found over the years is that the CEOs of the biggest companies are actually those that already have the most fact-based worldview, more so than in media, academia or politics. Those CEOs that haven’t grasped the reality of the world have already failed in business. If they don’t understand what is happening in terms of potential new markets in the Middle East, Africa and so on, they are out. So the bigger and more international the organisation, the more fact-based the CEO’s worldview is likely to be. The problem is that they are slow in getting their organisation to follow.
  • Companies as a whole are stuck in the rut of an old mindset. They think in outworn categories and follow habits and assumptions that are not, or only rarely, based on fact.
  • For instance, in terms of education levels, we no longer live in a world that is divided into the West and the rest; our world today stretches from Canada to Yemen with all the other countries somewhere in between. There’s a broad spectrum of levels.
  • even when people act within a fact-based worldview, they are used to talking with sterile figures. They are used to standing on a podium, clicking through slide shows in PowerPoint rather than interacting with their presentation. The problem is that companies have a strict separation between their IT department, where datasets are produced, and the design department, so hardly any presenters are proficient in both. Yet this is what we need. Getting people used to talking with animated data is, to my mind, a literacy project.
  • What’s important today is not just financial data but child mortality rates, the number of children per woman, education levels, etc. In the world today, it’s not money that drags people into modern times, it’s people that drag money into modern times.
  • I can demonstrate human resources successes in Asia through health being improved, family size decreasing and then education levels increasing. That makes sense: when more children survive, parents accept that there is less need for multiple births, and they can afford to put their children through school. So Pfizer have moved their research and development of drugs to Asia, where there are brilliant young people who are amazing at developing drugs. It’s realising this kind of change that’s important.
  • The problem isn’t that specialised companies lack the data they need, it’s that they don’t go and look for it, they don’t understand how to handle it.
  • What is so strong with animation is that it provides that mindset shift in market segmentation. We can see where there are highly developed countries with a good economy and a healthy and well-educated staff.
  • At the moment, I’m quarrelling with Sweden’s Minister of Foreign Affairs. He says that the West has to make sure its lead over the rest of the world doesn’t erode. This is a completely wrong attitude. Western Europe and other high-income countries have to integrate themselves into the world in the same way big companies are doing. They have to look at the advantages, resources and markets that exist in different places around the world.
  • And some organisations aren’t willing to share their data, even though it would be a win-win situation for everybody and we would do much better in tackling the problems we need to tackle. Last April, the World Bank caved in and finally embraced an open data policy, but the OECD uses tax money to compile data and then sells it in a monopolistic way. The Chinese Statistical Bureau provides data more easily than the OECD. The richest countries in the world don’t have the vision to change.
  • ‘database hugging disorder’
  • we have to instil a clear division of labour between those who provide the datasets – like the World Bank, the World Health Organisation or companies themselves – those who provide new technologies to access or process them, like Google or Microsoft, and those who ‘play’ with them and give data meaning. It’s like a great concert: you need a Mozart or a Chopin to write wonderful music, then you need the instruments and finally the musicians.
Weiye Loh

Data Without Borders - 0 views

  •  
    Data is everywhere, but use of data is not. So many of our efforts are centered around making money or getting people to buy more things, and this is understandable; however, there are neglected areas that could actually have a huge impact on the way we live. Jake Porway, a data scientist at The New York Times, has a proposition for you, tentatively called Data Without Borders. [T]here are lots of NGOs and non-profits out there doing wonderful things for the world, from rehabilitating criminals, to battling hunger, to providing clean drinking water. However, they're increasingly finding themselves with more and more data about their practices, their clients, and their missions that they don't have the resources or budgets to analyze. At the same time, the data/dev communities love hacking together weekend projects where we play with new datasets or build helpful scripts, but they usually just culminate in a blog post or some Twitter buzz. Wouldn't it be rad if we could get these two sides together?
Weiye Loh

Skepticblog » Global Warming Skeptic Changes His Tune - by Doing the Science ... - 0 views

  • To the global warming deniers, Muller had been an important scientific figure with good credentials who had expressed doubt about the temperature data used to track the last few decades of global warming. Muller was influenced by Anthony Watts, a former TV weatherman (not a trained climate scientist) and blogger who has argued that the data set is mostly from large cities, where the “urban heat island” effect might bias the overall pool of worldwide temperature data. Climate scientists have pointed out that they have accounted for this possible effect already, but Watts and Muller were unconvinced. With $150,000 (25% of their funding) from the Koch brothers (the nation’s largest supporters of climate denial research), as well as the Getty Foundation (their wealth largely based on oil money) and other funding sources, Muller set out to reanalyze all the temperature data by setting up the Berkeley Earth Surface Temperature Project.
  • Although only 2% of the data were analyzed by last month, the Republican climate deniers in Congress called him to testify in their March 31 hearing to attack global warming science, expecting him to give them scientific data supporting their biases. To their dismay, Muller behaved like a real scientist and not an ideologue—he followed his data and told them the truth, not what they wanted to hear. Muller pointed out that his analysis of the data set almost exactly tracked what the National Oceanic and Atmospheric Administration (NOAA), the Goddard Institute for Space Studies (GISS), and the Hadley Climate Research Unit at the University of East Anglia in the UK had already published (see figure).
  • Muller testified before the House Committee that: The Berkeley Earth Surface Temperature project was created to make the best possible estimate of global temperature change using as complete a record of measurements as possible and by applying novel methods for the estimation and elimination of systematic biases. We see a global warming trend that is very similar to that previously reported by the other groups. The world temperature data has sufficient integrity to be used to determine global temperature trends. Despite potential biases in the data, methods of analysis can be used to reduce bias effects well enough to enable us to measure long-term Earth temperature changes. Data integrity is adequate. Based on our initial work at Berkeley Earth, I believe that some of the most worrisome biases are less of a problem than I had previously thought.
  • The right-wing ideologues were sorely disappointed, and reacted viciously in the political sphere by attacking their own scientist, but Muller’s scientific integrity overcame any biases he might have harbored at the beginning. He “called ‘em as he saw ‘em” and told truth to power.
  • it speaks well of the scientific process when a prominent skeptic like Muller does his job properly and admits that his original biases were wrong. As reported in the Los Angeles Times: Ken Caldeira, an atmospheric scientist at the Carnegie Institution for Science, which contributed some funding to the Berkeley effort, said Muller’s statement to Congress was “honorable” in recognizing that “previous temperature reconstructions basically got it right…. Willingness to revise views in the face of empirical data is the hallmark of the good scientific process.”
  • This is the essence of the scientific method at its best. There may be biases in our perceptions, and we may want to find data that fits our preconceptions about the world, but if science is done properly, we get a real answer, often one we did not expect or didn’t want to hear. That’s the true test of when science is giving us a reality check: when it tells us “an inconvenient truth”, something we do not like, but is inescapable if one follows the scientific method and analyzes the data honestly.
  • Sit down before fact as a little child, be prepared to give up every preconceived notion, follow humbly wherever and to whatever abysses nature leads, or you shall learn nothing.
Weiye Loh

McKinsey & Company - Clouds, big data, and smart assets: Ten tech-enabled business tren... - 0 views

  • 1. Distributed cocreation moves into the mainstream In the past few years, the ability to organise communities of Web participants to develop, market, and support products and services has moved from the margins of business practice to the mainstream. Wikipedia and a handful of open-source software developers were the pioneers. But in signs of the steady march forward, 70 per cent of the executives we recently surveyed said that their companies regularly created value through Web communities. Similarly, more than 68m bloggers post reviews and recommendations about products and services.
  • for every success in tapping communities to create value, there are still many failures. Some companies neglect the up-front research needed to identify potential participants who have the right skill sets and will be motivated to participate over the longer term. Since cocreation is a two-way process, companies must also provide feedback to stimulate continuing participation and commitment. Getting incentives right is important as well: cocreators often value reputation more than money. Finally, an organisation must gain a high level of trust within a Web community to earn the engagement of top participants.
  • 2. Making the network the organisation In earlier research, we noted that the Web was starting to force open the boundaries of organisations, allowing nonemployees to offer their expertise in novel ways. We called this phenomenon "tapping into a world of talent." Now many companies are pushing substantially beyond that starting point, building and managing flexible networks that extend across internal and often even external borders. The recession underscored the value of such flexibility in managing volatility. We believe that the more porous, networked organisations of the future will need to organise work around critical tasks rather than molding it to constraints imposed by corporate structures.
  • 3. Collaboration at scale Across many economies, the number of people who undertake knowledge work has grown much more quickly than the number of production or transactions workers. Knowledge workers typically are paid more than others, so increasing their productivity is critical. As a result, there is broad interest in collaboration technologies that promise to improve these workers' efficiency and effectiveness. While the body of knowledge around the best use of such technologies is still developing, a number of companies have conducted experiments, as we see in the rapid growth rates of video and Web conferencing, expected to top 20 per cent annually during the next few years.
  • 4. The growing ‘Internet of Things' The adoption of RFID (radio-frequency identification) and related technologies was the basis of a trend we first recognised as "expanding the frontiers of automation." But these methods are rudimentary compared with what emerges when assets themselves become elements of an information system, with the ability to capture, compute, communicate, and collaborate around information—something that has come to be known as the "Internet of Things." Embedded with sensors, actuators, and communications capabilities, such objects will soon be able to absorb and transmit information on a massive scale and, in some cases, to adapt and react to changes in the environment automatically. These "smart" assets can make processes more efficient, give products new capabilities, and spark novel business models. Auto insurers in Europe and the United States are testing these waters with offers to install sensors in customers' vehicles. The result is new pricing models that base charges for risk on driving behavior rather than on a driver's demographic characteristics. Luxury-auto manufacturers are equipping vehicles with networked sensors that can automatically take evasive action when accidents are about to happen. In medicine, sensors embedded in or worn by patients continuously report changes in health conditions to physicians, who can adjust treatments when necessary. Sensors in manufacturing lines for products as diverse as computer chips and pulp and paper take detailed readings on process conditions and automatically make adjustments to reduce waste, downtime, and costly human interventions.
  • 5. Experimentation and big data Could the enterprise become a full-time laboratory? What if you could analyse every transaction, capture insights from every customer interaction, and didn't have to wait for months to get data from the field? What if…? Data are flooding in at rates never seen before—doubling every 18 months—as a result of greater access to customer data from public, proprietary, and purchased sources, as well as new information gathered from Web communities and newly deployed smart assets. These trends are broadly known as "big data." Technology for capturing and analysing information is widely available at ever-lower price points. But many companies are taking data use to new levels, using IT to support rigorous, constant business experimentation that guides decisions and to test new products, business models, and innovations in customer experience. In some cases, the new approaches help companies make decisions in real time. This trend has the potential to drive a radical transformation in research, innovation, and marketing.
  • Using experimentation and big data as essential components of management decision making requires new capabilities, as well as organisational and cultural change. Most companies are far from accessing all the available data. Some haven't even mastered the technologies needed to capture and analyse the valuable information they can access. More commonly, they don't have the right talent and processes to design experiments and extract business value from big data, which require changes in the way many executives now make decisions: trusting instincts and experience over experimentation and rigorous analysis. To get managers at all echelons to accept the value of experimentation, senior leaders must buy into a "test and learn" mind-set and then serve as role models for their teams.
  • 6. Wiring for a sustainable world Even as regulatory frameworks continue to evolve, environmental stewardship and sustainability clearly are C-level agenda topics. What's more, sustainability is fast becoming an important corporate-performance metric—one that stakeholders, outside influencers, and even financial markets have begun to track. Information technology plays a dual role in this debate: it is both a significant source of environmental emissions and a key enabler of many strategies to mitigate environmental damage. At present, information technology's share of the world's environmental footprint is growing because of the ever-increasing demand for IT capacity and services. Electricity produced to power the world's data centers generates greenhouse gases on the scale of countries such as Argentina or the Netherlands, and these emissions could increase fourfold by 2020. McKinsey research has shown, however, that the use of IT in areas such as smart power grids, efficient buildings, and better logistics planning could eliminate five times the carbon emissions that the IT industry produces.
  • 7. Imagining anything as a service Technology now enables companies to monitor, measure, customise, and bill for asset use at a much more fine-grained level than ever before. Asset owners can therefore create services around what have traditionally been sold as products. Business-to-business (B2B) customers like these service offerings because they allow companies to purchase units of a service and to account for them as a variable cost rather than undertake large capital investments. Consumers also like this "paying only for what you use" model, which helps them avoid large expenditures, as well as the hassles of buying and maintaining a product.
  • In the IT industry, the growth of "cloud computing" (accessing computer resources provided through networks rather than running software or storing data on a local computer) exemplifies this shift. Consumer acceptance of Web-based cloud services for everything from e-mail to video is of course becoming universal, and companies are following suit. Software as a service (SaaS), which enables organisations to access services such as customer relationship management, is growing at a 17 per cent annual rate. The biotechnology company Genentech, for example, uses Google Apps for e-mail and to create documents and spreadsheets, bypassing capital investments in servers and software licenses. This development has created a wave of computing capabilities delivered as a service, including infrastructure, platform, applications, and content. And vendors are competing, with innovation and new business models, to match the needs of different customers.
  • 8. The age of the multisided business model Multisided business models create value through interactions among multiple players rather than traditional one-on-one transactions or information exchanges. In the media industry, advertising is a classic example of how these models work. Newspapers, magazines, and television stations offer content to their audiences while generating a significant portion of their revenues from third parties: advertisers. Other revenue, often through subscriptions, comes directly from consumers. More recently, this advertising-supported model has proliferated on the Internet, underwriting Web content sites, as well as services such as search and e-mail (see trend number seven, "Imagining anything as a service," earlier in this article). It is now spreading to new markets, such as enterprise software: Spiceworks offers IT-management applications to 950,000 users at no cost, while it collects advertising from B2B companies that want access to IT professionals.
  • 9. Innovating from the bottom of the pyramid The adoption of technology is a global phenomenon, and the intensity of its usage is particularly impressive in emerging markets. Our research has shown that disruptive business models arise when technology combines with extreme market conditions, such as customer demand for very low price points, poor infrastructure, hard-to-access suppliers, and low cost curves for talent. With an economic recovery beginning to take hold in some parts of the world, high rates of growth have resumed in many developing nations, and we're seeing companies built around the new models emerging as global players. Many multinationals, meanwhile, are only starting to think about developing markets as wellsprings of technology-enabled innovation rather than as traditional manufacturing hubs.
  • 10. Producing public good on the grid The role of governments in shaping global economic policy will expand in coming years. Technology will be an important factor in this evolution by facilitating the creation of new types of public goods while helping to manage them more effectively. This last trend is broad in scope and draws upon many of the other trends described above.
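    Trend 5 above says data volumes are doubling every 18 months. As a rough illustration only, and assuming that rate stays constant (an assumption; the article gives no formula), volume after t months grows by a factor of 2^(t/18). A minimal Python sketch, with a hypothetical function name:

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplicative growth after `months`, assuming steady doubling.

    The 18-month default comes from the article; treating the rate as
    constant over time is an assumption made for illustration.
    """
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Print the implied growth over a few horizons.
    for years in (1.5, 3, 5, 10):
        print(f"{years} years: x{growth_factor(years * 12):.1f}")
```

    Under this assumption, three years means two doublings, so four times the starting volume.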
Weiye Loh

Unhappy meal: Data retention bill could lure sex predators into McDonalds, libraries - 0 views

  • mandatory data retention legislation. The bill that they have proposed requires that Internet Service Providers, such as Comcast and Time Warner, save records of the IP addresses they assign to their customers for a period of 18 months.
  • Data retention is a controversial topic and loudly opposed by the privacy community. To counter such criticism, the bill's authors have cunningly (and shamelessly) named it the Protecting Children from Internet Pornographers Act of 2011. This of course means that anyone who opposes data retention must go on record as opposing measures to catch sexual predators.
  • The bill includes a curious exception to the retention requirements: it doesn't apply to wireless data providers, such as AT&T and Verizon, or operators of public WiFi networks, such as Starbucks and McDonalds. When questioned about this, a Republican committee staffer told CNET in May that the wireless loophole was added because wireless networks are designed in such a way that IP addresses are assigned to multiple users or accounts and they are "not technologically capable of retaining the type of data that law enforcement needs because that's not how their system works."
  • This explanation is completely bogus. Wireless providers, like wireline broadband providers, are quite capable of retaining logs of the IP addresses they temporarily issue to their customers. Many wireless providers, such as Sprint and Verizon, already retain IP logs for at least a year. The true explanation for the loophole is, I believe, that the wireless carriers have powerful and remarkably effective lobbyists.
  •  
    In this opinion piece, a cybersecurity researcher argues that loopholes in a new data retention bill push those wanting to use the 'Net anonymously into cafes, libraries, and fast food restaurants. The following op-ed does not necessarily represent the opinions of Ars Technica.
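    The op-ed's technical claim is that a provider handing out IP addresses temporarily can still record who held which address, and when. A minimal sketch of that idea follows. The record fields, account names and function names are hypothetical, nothing here reflects any real carrier's system or the bill's text, and the 18-month window is approximated as 548 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Roughly 18 months; an approximation chosen for this sketch.
RETENTION = timedelta(days=548)


@dataclass
class Lease:
    """One temporary assignment of an IP address to an account (hypothetical)."""
    ip: str
    account: str
    start: datetime
    end: datetime


def prune(leases: List[Lease], now: datetime) -> List[Lease]:
    """Keep only leases that ended within the retention window."""
    return [lease for lease in leases if now - lease.end <= RETENTION]


def who_had(leases: List[Lease], ip: str, when: datetime) -> Optional[str]:
    """Return the account that held `ip` at time `when`, if it is logged."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.account
    return None
```

    The sketch covers one account per address at a time. Addresses shared by several simultaneous users would need extra fields, such as port numbers, which are left out here.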
Weiye Loh

When information is power, these are the questions we should be asking | Online Journal... - 0 views

  • “There is absolutely no empiric evidence that shows that anyone actually uses the accounts produced by public bodies to make any decision. There is no group of principals analogous to investors. There are many lists of potential users of the accounts. The Treasury, CIPFA (the UK public sector accounting body) and others have said that users might include the public, taxpayers, regulators and oversight bodies. I would be prepared to put up a reward for anyone who could prove to me that any of these people have ever made a decision based on the financial reports of a public body. If there are no users of the information then there is no point in making the reports better. If there are no users more technically correct reports do nothing to improve the understanding of public finances. In effect all that better reports do is legitimise the role of professional accountants in the accountability process.”
  • raw data – and the ability to interrogate that – should instead be made available because (quoting Anthony Hopwood): “Those with the power to determine what enters into organisational accounts have the means to articulate and diffuse their values and concerns, and subsequently to monitor, observe and regulate the actions of those that are now accounted for.”
  • Data is not just some opaque term; something for geeks: it’s information: the raw material we deal in as journalists. Knowledge. Power. The site of a struggle for control. And considering it’s a site that journalists have always fought over, it’s surprisingly placid as we enter one of the most important ages in the history of information control.
  • 3 questions to ask of any transparency initiative: (1) If information is to be published in a database behind a form, then it’s hidden in plain sight. It cannot be easily found by a journalist, and only simple questions will be answered. (2) If information is to be published in PDFs or JPEGs, or some format that you need proprietary software to see, then it cannot easily be questioned by a journalist. (3) If you will have to pass a test to use the information, then obstacles will be placed between the journalist and that information. The next time an organisation claims that they are opening up their information, tick those questions off. (If you want more, see Gurstein’s list of 7 elements that are needed to make effective use of open data).
  •  
    control of information still represents the exercise of power, and how shifts in that control as a result of the transparency/open data/linked data agenda are open to abuse, gaming, or spin.
Weiye Loh

Turning Privacy "Threats" Into Opportunities - Esther Dyson - Project Syndicate - 0 views

  • Most disclosure statements are not designed to be read; they are designed to be clicked on. But some companies actually want their customers to read and understand the statements. They don’t want customers who might sue, and, just in case, they want to be able to prove that the customers did understand the risks. So the leaders in disclosure statements right now tend to be financial and health-care companies – and also space-travel and extreme-sports vendors. They sincerely want to let their customers know what they are getting into, because a regretful customer is a vengeful one. That means making disclosure statements readable. I would suggest turning them into a quiz. The user would not simply click a single button, but would have to select the right button for each question. For example: What are my chances of dying in space? A) 5% B) 30% C) 1-4% (the correct answer, based on experience so far; current spacecraft are believed to be safer.) Now imagine: Who can see my data? A) I can. B) XYZ Corporation. C) XYZ Corporation’s marketing partners. (Click here to see the list.) D) XYZ Corporation’s affiliates and anyone it chooses. As the customer picks answers, she gets a good idea of what is going on. In fact, if you're a marketer, why not dispense with a single right answer and let the consumer specify what she wants to have happen with her data (and corresponding privileges/access rights if necessary)? That’s much more useful than vague policy statements. Suddenly, the disclosure statement becomes a consumer application that adds value to the vendor-consumer relationship.
  • And show the data themselves rather than a description.
  • this is all very easy if you are the site with which the user communicates directly; it is more difficult if you are in the background, a third party collecting information surreptitiously. But that practice should be stopped, anyway.
  • just as they have with Facebook, users will become more familiar with the idea of setting their own privacy preferences and managing their own data. Smart vendors will learn from Facebook; the rest will lose out to competitors. Visualizing the user's information and providing an intelligible interface is an opportunity for competitive advantage.
  • I see this happening already with a number of companies, including some with which I am involved. For example, in its research surveys, 23andMe asks people questions such as how often they have headaches or whether they have ever been exposed to pesticides, and lets them see (in percentages) how other 23andMe users answer the question. This kind of information is fascinating to most people. TripIt lets you compare and match your own travel plans with those of friends. Earndit lets you compete with others to exercise more and win points and prizes.
  • Consumers increasingly expect to be able to see themselves both as individuals and in context. They will feel more comfortable about sharing data if they feel confident that they know what is shared and what is not. The online world will feel like a well-lighted place with shops, newsstands, and the like, where you can see other people and they can see you. Right now, it more often feels like lurking in a spooky alley with a surveillance camera overlooking the scene.
  • Of course, there will be “useful” data that an individual might not want to share – say, how much alcohol they buy, which diseases they have, or certain of their online searches. They will know how to keep such information discreet, just as they might close the curtains to get undressed in their hotel room after enjoying the view from the balcony. Yes, living online takes a little more thought than living offline. But it is not quite as complex once Internet-based services provide the right tools – and once awareness and control of one’s own data become a habit.
  •  
    companies see consumer data as something that they can use to target ads or offers, or perhaps that they can sell to third parties, but not as something that consumers themselves might want. Of course, this is not an entirely new idea, but most pundits on both sides - privacy advocates and marketers - don't realize that rather than protecting consumers or hiding from them, companies should be bringing them into the game. I believe that successful companies will turn personal data into an asset by giving it back to their customers in an enhanced form. I am not sure exactly how this will happen, but current players will either join this revolution or lose out.
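    Dyson's quiz-style disclosure statement can be sketched in a few lines. This is only an illustration under assumptions: the class and function names are hypothetical, the question reuses her own space-travel example, and the user is taken to have understood the statement only when every question is answered correctly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Question:
    """One disclosure question with labelled options (hypothetical structure)."""
    prompt: str
    options: Dict[str, str]  # option label -> option text
    correct: str             # label of the correct option


def understood(questions: List[Question], answers: Dict[str, str]) -> bool:
    """True only if every question was answered with its correct label.

    `answers` maps each question prompt to the label the user picked;
    an unanswered question counts as wrong.
    """
    return all(answers.get(q.prompt) == q.correct for q in questions)


# The example question from the article; "C" is the answer it marks correct.
QUIZ = [
    Question(
        prompt="What are my chances of dying in space?",
        options={"A": "5%", "B": "30%", "C": "1-4%"},
        correct="C",
    ),
]
```

    Her second variant, in which the consumer chooses what should happen to her data, has no single right answer; it would store the chosen labels as preferences instead of checking them.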
Weiye Loh

Rod Beckstrom proposes ways to reclaim control over our online selves. - Project Syndicate - 0 views

  • As the virtual world expands, so, too, do breaches of trust and misuse of personal data. Surveillance has increased public unease – and even paranoia – about state agencies. Private companies that trade in personal data have incited the launch of a “reclaim privacy” movement. As one delegate at a recent World Economic Forum debate noted: “The more connected we have become, the more privacy we have given up.”
  • Now that our personal data have become such a valuable asset, companies are coming under increasing pressure to develop online business models that protect rather than exploit users’ private information. In particular, Internet users want to stop companies befuddling their customers with convoluted and legalistic service agreements in order to extract and sell their data.
  • Hyper-connectivity not only creates new commercial opportunities; it also changes the way ordinary people think about their lives. The so-called FoMo (fear of missing out) syndrome reflects the anxieties of a younger generation whose members feel compelled to capture instantly everything they do and see. Ironically, this hyper-connectivity has increased our insularity, as we increasingly live through our electronic devices. Neuroscientists believe that this may even have altered how we now relate to one another in the real world.
  • At the heart of this debate is the need to ensure that in a world where many, if not all, of the important details of our lives – including our relationships – exist in cyber-perpetuity, people retain, or reclaim, some level of control over their online selves. While the world of forgetting may have vanished, we can reshape the new one in a way that benefits rather than overwhelms us. Our overriding task is to construct a digital way of life that reinforces our existing sense of ethics and values, with security, trust, and fairness at its heart.
  •  
    "We must answer profound questions about the way we live. Should everyone be permanently connected to everything? Who owns which data, and how should information be made public? Can and should data use be regulated, and, if so, how? And what role should government, business, and ordinary Internet users play in addressing these issues?"
Weiye Loh

IPhone and Android Apps Breach Privacy - WSJ.com - 0 views

  • Few devices know more personal details about people than the smartphones in their pockets: phone numbers, current location, often the owner's real name—even a unique ID number that can never be changed or turned off.
  • An examination of 101 popular smartphone "apps"—games and other software applications for iPhone and Android phones—showed that 56 transmitted the phone's unique device ID to other companies without users' awareness or consent. Forty-seven apps transmitted the phone's location in some way. Five sent age, gender and other personal details to outsiders.
  • The findings reveal the intrusive effort by online-tracking companies to gather personal data about people in order to flesh out detailed dossiers on them.
  • iPhone apps transmitted more data than the apps on phones using Google Inc.'s Android operating system. Because of the test's size, it's not known if the pattern holds among the hundreds of thousands of apps available.
  • TextPlus 4, a popular iPhone app for text messaging, sent the phone's unique ID number to eight ad companies and the phone's zip code, along with the user's age and gender, to two of them.
  • Pandora, a popular music app, sent age, gender, location and phone identifiers to various ad networks. iPhone and Android versions of a game called Paper Toss—players try to throw paper wads into a trash can—each sent the phone's ID number to at least five ad companies. Grindr, an iPhone app for meeting gay men, sent gender, location and phone ID to three ad companies.
  • iPhone maker Apple Inc. says it reviews each app before offering it to users. Both Apple and Google say they protect users by requiring apps to obtain permission before revealing certain kinds of information, such as location.
  • The Journal found that these rules can be skirted. One iPhone app, Pumpkin Maker (a pumpkin-carving game), transmits location to an ad network without asking permission. Apple declines to comment on whether the app violated its rules.
  • With few exceptions, app users can't "opt out" of phone tracking, as is possible, in limited form, on regular computers. On computers it is also possible to block or delete "cookies," which are tiny tracking files. These techniques generally don't work on cellphone apps.
  • Makers of TextPlus 4, Pandora and Grindr say the data they pass on to outside firms isn't linked to an individual's name. Personal details such as age and gender are volunteered by users, they say. The maker of Pumpkin Maker says he didn't know Apple required apps to seek user approval before transmitting location. The maker of Paper Toss didn't respond to requests for comment.
  • Many apps don't offer even a basic form of consumer protection: written privacy policies. Forty-five of the 101 apps didn't provide privacy policies on their websites or inside the apps at the time of testing. Neither Apple nor Google requires app privacy policies.
  • The most widely shared detail was the unique ID number assigned to every phone.
  • On iPhones, this number is the "UDID," or Unique Device Identifier. Android IDs go by other names. These IDs are set by phone makers, carriers or makers of the operating system, and typically can't be blocked or deleted. "The great thing about mobile is you can't clear a UDID like you can a cookie," says Meghan O'Holleran of Traffic Marketplace, an Internet ad network that is expanding into mobile apps. "That's how we track everything."
  • O'Holleran says Traffic Marketplace, a unit of Epic Media Group, monitors smartphone users whenever it can. "We watch what apps you download, how frequently you use them, how much time you spend on them, how deep into the app you go," she says. She says the data is aggregated and not linked to an individual.
  • Apple and Google ad networks let advertisers target groups of users. Both companies say they don't track individuals based on the way they use apps.
  • Apple limits what can be installed on an iPhone by requiring iPhone apps to be offered exclusively through its App Store. Apple reviews those apps for function, offensiveness and other criteria.
  • Apple says iPhone apps "cannot transmit data about a user without obtaining the user's prior permission and providing the user with access to information about how and where the data will be used." Many apps tested by the Journal appeared to violate that rule, by sending a user's location to ad networks, without informing users. Apple declines to discuss how it interprets or enforces the policy.
  • Google doesn't review the apps, which can be downloaded from many vendors. Google says app makers "bear the responsibility for how they handle user information." Google requires Android apps to notify users, before they download the app, of the data sources the app intends to access. Possible sources include the phone's camera, memory, contact list, and more than 100 others. If users don't like what a particular app wants to access, they can choose not to install the app, Google says.
  • Neither Apple nor Google requires apps to ask permission to access some forms of the device ID, or to send it to outsiders. When smartphone users let an app see their location, apps generally don't disclose if they will pass the location to ad companies.
  • Lack of standard practices means different companies treat the same information differently. For example, Apple says that, internally, it treats the iPhone's UDID as "personally identifiable information." That's because, Apple says, it can be combined with other personal details about people—such as names or email addresses—that Apple has via the App Store or its iTunes music services. By contrast, Google and most app makers don't consider device IDs to be identifying information.
  • A growing industry is assembling this data into profiles of cellphone users. Mobclix, the ad exchange, matches more than 25 ad networks with some 15,000 apps seeking advertisers. The Palo Alto, Calif., company collects phone IDs, encodes them (to obscure the number), and assigns them to interest categories based on what apps people download and how much time they spend using an app, among other factors. By tracking a phone's location, Mobclix also makes a "best guess" of where a person lives, says Mr. Gurbuxani, the Mobclix executive. Mobclix then matches that location with spending and demographic data from Nielsen Co.
  • Mobclix can place a user in one of 150 "segments" it offers to advertisers, from "green enthusiasts" to "soccer moms." For example, "die hard gamers" are 15-to-25-year-old males with more than 20 apps on their phones who use an app for more than 20 minutes at a time. Mobclix says its system is powerful, but that its categories are broad enough to not identify individuals. "It's about how you track people better," Mr. Gurbuxani says.
  • Four app makers posted privacy policies after being contacted by the Journal, including Rovio Mobile Ltd., the Finnish company behind the popular game Angry Birds (in which birds battle egg-snatching pigs). A spokesman says Rovio had been working on the policy, and the Journal inquiry made it a good time to unveil it.
  • Free and paid versions of Angry Birds were tested on an iPhone. The apps sent the phone's UDID and location to the Chillingo unit of Electronic Arts Inc., which markets the games. Chillingo says it doesn't use the information for advertising and doesn't share it with outsiders.
  • Some developers feel pressure to release more data about people. Max Binshtok, creator of the DailyHoroscope Android app, says ad-network executives encouraged him to transmit users' locations. Mr. Binshtok says he declined because of privacy concerns. But ads targeted by location bring in two to five times as much money as untargeted ads, Mr. Binshtok says. "We are losing a lot of revenue."
  • Apple targets ads to phone users based largely on what it knows about them through its App Store and iTunes music service. The targeting criteria can include the types of songs, videos and apps a person downloads, according to an Apple ad presentation reviewed by the Journal. The presentation named 103 targeting categories, including: karaoke, Christian/gospel music, anime, business news, health apps, games and horror movies. People familiar with iAd say Apple doesn't track what users do inside apps and offers advertisers broad categories of people, not specific individuals. Apple has signaled that it has ideas for targeting people more closely. In a patent application filed this past May, Apple outlined a system for placing and pricing ads based on a person's "web history or search history" and "the contents of a media library." For example, home-improvement advertisers might pay more to reach a person who downloaded do-it-yourself TV shows, the document says.
  • The patent application also lists another possible way to target people with ads: the contents of a friend's media library. How would Apple learn who a cellphone user's friends are, and what kinds of media they prefer? The patent says Apple could tap "known connections on one or more social-networking websites" or "publicly available information or private databases describing purchasing decisions, brand preferences," and other data. In September, Apple introduced a social-networking service within iTunes, called Ping, that lets users share music preferences with friends. Apple declined to comment.