Skip to main content

Home/ DJCamp2011/ Group items tagged numbers

Rss Feed Group items tagged

Tom Johnson

The special trick that helps identify dodgy stats | Ben Goldacre | Comment is free | Th... - 0 views

  • The special trick that helps identify dodgy stats Using Benford's law, forensic statisticians can spot suspicious patterns in the raw numbers, and estimate the chances figures have been tampered with
  • The results were fun. Greece – whose economy has tanked – showed the largest and most suspicious deviation from Benford's law of any country in the euro.
  •  
    if you go to the website testingbenfordslaw.com you'll see the proportions of each leading digit from lots of real-world datasets, graphed alongside what Benford's law predicts they should be, with data from Twitter users' follower counts to the number of books in different libraries across the US
  •  
    The special trick that helps identify dodgy stats Using Benford's law, forensic statisticians can spot suspicious patterns in the raw numbers, and estimate the chances figures have been tampered with
Tom Johnson

ELAN description | The Language Archive - 0 views

  • ELAN description ELAN is a professional tool for the creation of complex annotations on video and audio resources. With ELAN a user can add an unlimited number of annotations to audio and/or video streams. An annotation can be a sentence, word or gloss, a comment, translation or a description of any feature observed in the media. Annotations can be created on multiple layers, called tiers. Tiers can be hierarchically interconnected. An annotation can either be time-aligned to the media or it can refer to other existing annotations. The textual content of annotations is always in Unicode and the transcription is stored in an XML format. ELAN provides several different views on the annotations, each view is connected and synchronized to the media playhead. Up to 4 video files can be associated with an annotation document. Each video can be integrated in the main document window or displayed in its own resizable window. ELAN delegates media playback to an existing media framework, like Windows Media Player, QuickTime or JMF (Java Media Framework). As a result a wide variety of audio and video formats is supported and high performance media playback can be achieved. ELAN is written in the Java programming language and the sources are available for non-commercial use. It runs on Windows, Mac OS X and Linux.
  •  
    ELAN description ELAN is a professional tool for the creation of complex annotations on video and audio resources. With ELAN a user can add an unlimited number of annotations to audio and/or video streams. An annotation can be a sentence, word or gloss, a comment, translation or a description of any feature observed in the media. Annotations can be created on multiple layers, called tiers. Tiers can be hierarchically interconnected. An annotation can either be time-aligned to the media or it can refer to other existing annotations. The textual content of annotations is always in Unicode and the transcription is stored in an XML format. ELAN provides several different views on the annotations, each view is connected and synchronized to the media playhead. Up to 4 video files can be associated with an annotation document. Each video can be integrated in the main document window or displayed in its own resizable window. ELAN delegates media playback to an existing media framework, like Windows Media Player, QuickTime or JMF (Java Media Framework). As a result a wide variety of audio and video formats is supported and high performance media playback can be achieved. ELAN is written in the Java programming language and the sources are available for non-commercial use. It runs on Windows, Mac OS X and Linux.
Tom Johnson

The Overview Project » Using Overview to analyze 4500 pages of documents on s... - 0 views

  • Using Overview to analyze 4500 pages of documents on security contractors in Iraq by Jonathan Stray on 02/21/2012 0 This post describes how we used a prototype of the Overview software to explore 4,500 pages of incident reports concerning the actions of private security contractors working for the U.S. State Department during the Iraq war. This was the core of the reporting work for our previous post, where we reported the results of that analysis. The promise of a document set like this is that it will give us some idea of the broader picture, beyond the handful of really egregious incidents that have made headlines. To do this, in some way we have to take into account most or all of the documents, not just the small number that might match a particular keyword search.  But at one page per minute, eight hours per day, it would take about 10 days for one person to read all of these documents — to say nothing of taking notes or doing any sort of followup. This is exactly the sort of problem that Overview would like to solve. The reporting was a multi-stage process: Splitting the massive PDFs into individual documents and extracting the text Exploration and subject tagging with the Overview prototype Random sampling to estimate the frequency of certain types of events Followup and comparison with other sources
  •  
    Using Overview to analyze 4500 pages of documents on security contractors in Iraq by Jonathan Stray on 02/21/2012 0 This post describes how we used a prototype of the Overview software to explore 4,500 pages of incident reports concerning the actions of private security contractors working for the U.S. State Department during the Iraq war. This was the core of the reporting work for our previous post, where we reported the results of that analysis. The promise of a document set like this is that it will give us some idea of the broader picture, beyond the handful of really egregious incidents that have made headlines. To do this, in some way we have to take into account most or all of the documents, not just the small number that might match a particular keyword search. But at one page per minute, eight hours per day, it would take about 10 days for one person to read all of these documents - to say nothing of taking notes or doing any sort of followup. This is exactly the sort of problem that Overview would like to solve. The reporting was a multi-stage process: Splitting the massive PDFs into individual documents and extracting the text Exploration and subject tagging with the Overview prototype Random sampling to estimate the frequency of certain types of events Followup and comparison with other sources
Tom Johnson

International Dataset Search - 0 views

  • International Dataset Search View View Source Description:  The TWC International Open Government Dataset Catalog (IOGDC) is a linked data application based on metadata scraped from an increasing number of international dataset catalog websites publishing a rich variety of government data. Metadata extracted from these catalog websites is automatically converted to RDF linked data and re-published via the TWC LOGD SPAQRL endpoint and made available for download. The TWC IOGDC demo site features an efficient, reconfigurable faceted browser with search capabilities offering a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDC highlights the potential for useful linked data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide. Warning: This demo will crash IE7 and IE8. Contributor: Eric Rozell Contributor: Jinguang Zheng Contributor: Yongmei Shi Live Demo:  http://logd.tw.rpi.edu/demo/international_dataset_catalog_search Notes: This is an experimental demo and some queries may take longer time to response (30 ~60 seconds). Please referesh this page if the demo is not loaded. Our metadata model can be accessed here . Procedure to getting and publishing metadata is described here . The RDF dump of the datasets can be downloaded here. Welcome to S2S! International OGD Catalog Search (searching 736,578 datasets)
  •  
    International Dataset Search View View Source Description: The TWC International Open Government Dataset Catalog (IOGDC) is a linked data application based on metadata scraped from an increasing number of international dataset catalog websites publishing a rich variety of government data. Metadata extracted from these catalog websites is automatically converted to RDF linked data and re-published via the TWC LOGD SPAQRL endpoint and made available for download. The TWC IOGDC demo site features an efficient, reconfigurable faceted browser with search capabilities offering a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDC highlights the potential for useful linked data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide. Warning: This demo will crash IE7 and IE8. Contributor: Eric Rozell Jinguang Zheng Yongmei Shi Live Demo: http://logd.tw.rpi.edu/demo/international_dataset_catalog_search Notes: This is an experimental demo and some queries may take longer time to response (30 ~60 seconds). Please referesh this page if the demo is not loaded. Our metadata model can be accessed here . Procedure to getting and publishing metadata is described here . The RDF dump of the datasets can be downloaded here. International OGD Catalog Search (searching 736,578 datasets) http://logd.tw.rpi.edu/demo/international_dataset_catalog_search
  •  
    Loads surprisingly quickly. Try entering your favorite search term in top blue box. Can use quotes to define phrases.
Tom Johnson

Reconstruction 2012 - 0 views

  •  
    "ReConstitution 2012, a fun experiment by Sosolimited, processes transcripts from the presidential debates, and recreates them with animated words and charts. Part data visualization, part experimental typography, ReConstitution 2012 is a live web app linked to the US Presidential Debates. During and after the three debates, language used by the candidates generates a live graphical map of the events. Algorithms track the psychological states of Romney and Obama and compare them to past candidates. The app allows the user to get beyond the punditry and discover the hidden meaning in the words chosen by the candidates. As you let the transcript run, numbers followed by their units (like "18 months") flash on the screen, and trigger words for emotions like positivity, negativity, and rage are highlighted yellow, blue, and red, respectively. You can also see the classifications in graph form. There are a handful of less straightforward text classifications for truthy and suicidal, which are based on linguistic studies, which in turn are based on word frequencies. These estimates are more fuzzy. So, as the creators suggest, it's best not to interpret the project as an analytical tool, and more of a fun way to look back at the debate, which it is. It's pretty fun to watch. Here's a short video from Sosolimited for more on how the application works: "
Tom Johnson

The Overview Project » Document mining shows Paul Ryan relying on the the pro... - 0 views

  •  
    Document mining shows Paul Ryan relying on the the programs he criticizes by Jonathan Stray on 11/02/2012 0 One of the jobs of a journalist is to check the record. When Congressman Paul Ryan became a vice-presidential candidate, Associated Press reporter Jack Gillum decided to examine the candidate through his own words. Hundreds of Freedom of Information requests and 9,000 pages later, Gillum wrote a story showing that Ryan has asked for money from many of the same Federal programs he has criticized as wasteful, including stimulus money and funding for alternative fuels. This would have been much more difficult without special software for journalism. In this case Gillum relied on two tools: DocumentCloud to upload, OCR, and search the documents, and Overview to automatically sort the documents into topics and visualize the contents. Both projects are previous Knight News Challenge winners. But first Gillum had to get the documents. As a member of Congress, Ryan isn't subject to the Freedom of Information Act. Instead, Gillum went to every federal agency - whose files are covered under FOIA - for copies of letters or emails that might identify Ryan's favored causes, names of any constituents who sought favors, and more. Bit by bit, the documents arrived - on paper. The stack grew over weeks, eventually piling up two feet high on Gillum's desk. Then he scanned the pages and loaded them into the AP's internal installation of DocumentCloud. The software converts the scanned pages to searchable text, but there were still 9000 pages of material. That's where Overview came in. Developed in house at the Associated Press, this open-source visualization tool processes the full text of each document and clusters similar documents together, producing a visualization that graphically shows the contents of the complete document set. "I used Overview to take these 9000 pages of documents, and knowing there was probably going to be a lot of garbage or ext
Tom Johnson

How to make searchable, Web-based Google charts | Poynter. - 0 views

  •  
    How to make searchable, Web-based Google charts Michelle Minkoff by Michelle Minkoff Published June 3, 2011 12:01 am Updated June 2, 2011 10:22 pm A lot of data visualization requires the technical expertise of a programmer and skills that take time and resources to develop. A rise in free tools, however, has made it easier to make interactive graphs in charts, whether you're a designer, developer, Web producer or hobbyist. The Google Visualization API, for instance, gives you options without making the work too complicated. I've created a tutorial below to help you make simple, Web-based Google charts. (You can click on any of the screenshots to go to a larger version.) In the first example, we'll craft an interactive bar chart that compares the numbers of tornado-related deaths in the United States throughout the past four years. We'll use data from the National Oceanic and Atmospheric Administration (NOAA), which can be found here. (You can download a cleaned version of this data here, formatted as a comma-delimited file, CSV.) http://www.poynter.org/how-tos/newsgathering-storytelling/126595/how-to-make-simple-web-based-google-charts
Tom Johnson

Broadcasters don't want to put campaign ad data online, so ProPublica pitches... - 0 views

  • March 22, 2012, 10:18 a.m. .newfront-body #content_div-57696 p:first-child img {display: none;}.linkbody p:first-child img {display: none;} Broadcasters don’t want to put campaign ad data online, so ProPublica pitches work-around With volunteers around the country, the news nonprofit is continuing its efforts to figure out what works and what doesn’t when it comes to crowdsourced reporting.
  •  
    Good piece on how to apply crowd-sourcing. March 22, 2012, 10:18 a.m. Television Broadcasters don't want to put campaign ad data online, so ProPublica pitches work-around With volunteers around the country, the news nonprofit is continuing its efforts to figure out what works and what doesn't when it comes to crowdsourced reporting.
1 - 8 of 8
Showing 20 items per page