
DJCamp2011: group items tagged "documents"

Tom Johnson

Mr. People - Data cleaning

  •  
    Years ago, while trying to clean up the names of donors in campaign finance data from the Federal Election Commission, I hacked together a Perl module, loosely based on the Lingua-EN-NameParse module, to standardize names. One port to Ruby later, I've finally put together a Web front end for it. Try it out: paste in your own data or try the sample data. To use the people Ruby gem in your own scripts, run sudo gem install people, then read the documentation. Suggestions? Send them to mrpeople@ericson.net
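The post doesn't describe the module's actual rules. The following is only a minimal Python sketch of this kind of name standardization. It assumes two common orderings ("LAST, FIRST MIDDLE" and "FIRST MIDDLE LAST") and uses a tiny, hypothetical list of titles and suffixes. The function `standardize_name` is invented for illustration and is not the people gem's API.

```python
import re

# Hypothetical, deliberately small rule sets; real FEC donor data needs many more.
TITLES = {"MR", "MRS", "MS", "DR"}
SUFFIXES = {"JR", "SR", "II", "III", "IV"}
ROMAN = {"II", "III", "IV"}


def standardize_name(raw):
    """Split a messy donor name into title-cased first/middle/last/suffix parts."""
    # Drop periods and collapse runs of whitespace: "SMITH,  JOHN Q." -> "SMITH, JOHN Q"
    text = re.sub(r"\s+", " ", raw.replace(".", "")).strip()
    last = None
    if "," in text:
        # "LAST, FIRST MIDDLE SUFFIX" order, common in campaign finance filings
        last, rest = (part.strip() for part in text.split(",", 1))
        tokens = rest.split()
    else:
        # "FIRST MIDDLE LAST SUFFIX" order
        tokens = text.split()
    # Peel off a leading title and a trailing suffix, if present.
    if tokens and tokens[0].upper() in TITLES:
        tokens = tokens[1:]
    suffix = ""
    if tokens and tokens[-1].upper() in SUFFIXES:
        suffix = tokens.pop()
    if last is None:
        last = tokens.pop() if tokens else ""
    return {
        "first": tokens[0].title() if tokens else "",
        "middle": " ".join(tokens[1:]).title(),
        "last": last.title(),
        # Keep Roman numerals upper case; "JR"/"SR" become "Jr"/"Sr".
        "suffix": suffix.upper() if suffix.upper() in ROMAN else suffix.title(),
    }


for raw in ["SMITH, JOHN Q. JR.", "Dr. jane   doe", "O'BRIEN, MARY"]:
    print(standardize_name(raw))
```

Plain title-casing has known gaps. For example, "MCDONALD" becomes "Mcdonald", which is why a dedicated parser with many more rules is worth using on real data.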
Tom Johnson

Google Correlate

  •  
    Google Correlate lets you see how your data relates to search queries (posted 25 May 2011, 11:27 AM PDT). A while back, Google showed with Google Flu Trends how influenza outbreaks correlated with searches for flu-related terms. It helped researchers and policy-makers estimate flu activity much sooner than previous methods allowed. Google Correlate extends Flu Trends: you can now correlate search trends not just with flu cases but with your own data or with other search queries. The familiar example matches flu cases with searches for "treatment for flu." Similarly, the search phrase that correlates most strongly with "Toyota for sale" is "used Hyundai." You can also see how your data relates geographically. For example, annual rainfall strongly correlates with searches for "disney vacation package," although distance appears to be a strong factor in the latter, which is a reminder that correlation is not causation. Google is careful to point this out in its FAQ and explanation of the tool. Nevertheless, it's fun to poke around and see the occasional nonsensical correlation. For example, the strongest correlation with "flowingdata" is "how to scan a document," because both have grown at similar rates. There's also a search-by-drawing function: you draw a time series, and Correlate finds the terms that best match that trend. As a test, I drew a line that grew steadily but plateaued toward the present day. What weird correlations can you find? [Google Correlate]
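The core idea is to rank search terms by how closely their time series track yours. Here is a minimal Python sketch of that ranking. It assumes Pearson's correlation coefficient as the similarity measure; the post itself doesn't name the measure Correlate uses. The query names echo the post, but the weekly values are hypothetical, made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (>= 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def best_matches(target, candidates, top=3):
    """Rank candidate series (name -> values) by correlation with the target."""
    scored = [(name, pearson(target, values)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical weekly values, for illustration only.
flu_cases = [10, 14, 25, 40, 38, 22, 12]
queries = {
    "treatment for flu": [5, 7, 12, 21, 19, 11, 6],
    "used hyundai": [9, 9, 10, 9, 10, 9, 10],
    "how to scan a document": [30, 28, 25, 20, 21, 26, 29],
}
for name, r in best_matches(flu_cases, queries):
    print(f"{name}: r = {r:.2f}")
```

A high r only says two series move together. As the post stresses, it says nothing about why.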
Tom Johnson

What's in your document? « The Reporter's Lab

  •  
    Sarah Cohen, Duke University
Tom Johnson

Corporate Accountability Data in Influence Explorer - Sunlight Labs: Blog

  •  
    Again, US-centric, but this might generate some ideas of what could be accomplished in your city or nation. Late yesterday we announced a bunch of new features for Influence Explorer: http://sunlightlabs.com/blog/2011/ie-corporate-accountability/ As the blog post explains, you can now find information about a corporation's EPA violations, federal advisory committee memberships, and participation in the rulemaking process, all in one place. I wanted to highlight that last feature a bit more, though. To my knowledge, this is the first time that the full corpus of public comments submitted to regulations.gov has been available for bulk download and analysis. This isn't a coincidence: regulations.gov is built using technologies that make scraping it unusually difficult. This is unfortunate, since everyone seems to agree that federal rulemakings are gaining in importance, both because congressional gridlock leaves the regulatory process as a second-best option and because of calls to simplify the regulatory landscape as a pro-growth measure. It's an area where influence is certainly exerted (rulemakers are obliged to review every comment), but little attention is paid to who's flooding dockets with comments and in which directions rules are being pushed. It's taken us several months to develop a reliable solution and to obtain past rulemakings, but we now have the data in hand. We plan to do much more with this dataset, and we're hoping that others will want to dig in, too. You can find a link to the bulk download options in the post linked above. The full compressed archive of extracted text and metadata is about 16GB, but we've provided options for grabbing individual agencies' or dockets' data. If anyone wants the original documents (PDFs, DOCs, etc.), we can talk through how to make that happen, but as they clock in at 1.5TB, we'll want to make sure folks know what they're getting into before we spend the time and bandwidth. Finally, note that we currently o
Tom Johnson

WinMerge

  •  
    WinMerge is an open-source differencing and merging tool for Windows. It can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
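WinMerge is a graphical Windows tool. For a scriptable version of the same file-comparison idea, Python's standard-library `difflib` can produce a unified diff of two versions of a document. This sketch swaps in `difflib` and does not show how WinMerge works internally. The file names and text are hypothetical.

```python
import difflib


def unified_diff(old_text, new_text, old_name="old.txt", new_name="new.txt"):
    """Return a unified diff of two text versions as a single string."""
    return "\n".join(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # lines were split without newlines, so add none here
        )
    )


# Hypothetical versions of the same document, for illustration only.
old = "Budget: $1.2 million\nApproved by council\n"
new = "Budget: $1.5 million\nApproved by council\n"
print(unified_diff(old, new, "release_v1.txt", "release_v2.txt"))
```

Lines starting with "-" exist only in the old version and lines starting with "+" only in the new one. Identical inputs produce an empty string.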
Tom Johnson

Annotated Excerpts of the Gates Foundation 990 Form 2009 - Document - NYTimes.com

  • Published: May 21, 2011. Nonprofit organizations have to file tax forms, known as 990s, listing each grant recipient. The following are excerpts from the Gates Foundation's filing for 2009, the latest available, which runs for 263 pages and includes more than 3,000 items. Sam Dillon highlights a few of the more notable examples of how the foundation has increased the attention and dollars it devotes to education advocacy.
Tom Johnson

The Open Data Manual - Open Data Manual v2.0alpha documentation

  •  
    This report discusses the legal, social and technical aspects of open data. The manual can be used by anyone but is especially designed for those seeking to open up data. It covers the why, what and how of open data: why to go open, what open is, and how to 'open' data. To get started, you may wish to look at the Introduction.
Tom Johnson

How to use APIs from Twitter, Google & Facebook to find data, ideas | Poynter.

  • By Katharine Jarmul. Published Aug. 8, 2011, 1:27 pm; updated Aug. 9, 2011, 12:02 am. As more and more journalists are finding, APIs are a great way to get data for your Web applications and projects. An API, or application programming interface, enables software programs to communicate with one another. (Chrys Wu wrote a helpful intro here.) To give you a better understanding of how they can help you, I've outlined some of the best APIs for finding content and explained how you can use open-source programming tools to glean information from them.
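Most web data APIs follow the same pattern: build a query URL, make an HTTP request, and parse a JSON response. Here is a minimal Python sketch of that pattern using only the standard library. The endpoint `api.example.com`, the `q`/`count` parameters, and the `results`/`author`/`text` response fields are hypothetical placeholders. They are not the actual Twitter, Google, or Facebook APIs, which define their own URLs, parameters, and authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only to show the request/response pattern.
BASE_URL = "https://api.example.com/search"


def build_query_url(term, count=10):
    """Encode search parameters into a GET URL."""
    return BASE_URL + "?" + urlencode({"q": term, "count": count})


def extract_items(payload):
    """Pull (author, text) pairs out of a JSON response body in the assumed format."""
    data = json.loads(payload)
    return [(item["author"], item["text"]) for item in data.get("results", [])]


def fetch(term):
    """Make the request; needs network access and a real endpoint."""
    with urlopen(build_query_url(term), timeout=10) as response:
        return extract_items(response.read().decode("utf-8"))


# Offline example using a sample response body in the assumed format.
sample = '{"results": [{"author": "cityhall", "text": "Budget hearing at 6pm"}]}'
print(build_query_url("city budget"))
print(extract_items(sample))
```

Keeping URL building and response parsing separate from the network call makes each piece easy to test without hitting a live service.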