Home/ DJCamp2011/ Group items tagged Analysis

Tom Johnson

The Overview Project - 0 views

  •  
    How Overview turns Documents into Pictures, by Jonathan Stray on 06/04/2012. Overview produces intricate visualizations of large document sets - beautiful, but what do they mean? These visualizations are saying something about the documents, which you can interpret if you know a little about how they're plotted. There are two visualizations in the current prototype version of Overview, and both are based on document clustering.
Tom Johnson

Reactions to Osama bin Laden's death: Female and non-U.S. residents more ambivalent. Vi... - 0 views

  •  
    Reactions to Osama bin Laden's death: Female and non-U.S. residents more ambivalent. Via the NYT Reactions Matrix. By Dan Nguyen | Published: May 9, 2011. This (totally not-double-checked) analysis is a riff off of the excellent New York Times visualization (The Death of a Terrorist: A Turning Point?) of how people reacted to Osama bin Laden's death. In the days following the news, the Times asked online readers to not only write their thoughts on bin Laden's killing, but put a mark on a scatterplot graph that best described their reaction. The Times used the data to show the continuum of reactions from everyone who participated. I wanted to see how reactions differed across geographical location and gender. Includes details of his methodology (and a bit on that of the original NYT graphic).
Tom Johnson

Corporate Accountability Data in Influence Explorer - Sunlight Labs: Blog - 0 views

  •  
    Again, US-centric, but this might generate some ideas of what could be accomplished in your city/nation. Late yesterday we announced a bunch of new features for Influence Explorer: http://sunlightlabs.com/blog/2011/ie-corporate-accountability/ As the blog post explains, you can now find information about a corporation's EPA violations, federal advisory committee memberships, and participation in the rulemaking process -- all in one place. I wanted to highlight that last feature a bit more, though. To my knowledge, this is the first time that the full corpus of public comments submitted to regulations.gov has been available for bulk download and analysis. This isn't a coincidence: regulations.gov is built using technologies that make scraping it unusually difficult. This is unfortunate, since everyone seems to agree that federal rulemakings are gaining in importance -- both because of congressional gridlock that leaves the regulatory process as a second-best option, and because of calls to simplify the regulatory landscape as a pro-growth measure. It's an area where influence is certainly exerted -- rulemakers are obliged to review every comment -- but little attention is paid to who's flooding dockets with comments, and which directions rules are being pushed. It's taken us several months to develop a reliable solution and to obtain past rulemakings, but we now have the data in hand. We plan to do much more with this dataset, and we're hoping that others will want to dig in, too. You can find a link to the bulk download options in the post above -- the full compressed archive of extracted text and metadata is ~16GB, but we've provided options for grabbing individual agencies' or dockets' data. If anyone wants the original documents (PDFs, DOCs, etc) we can talk through how to make that happen, but as they clock in at 1.5TB we'll want to make sure folks know what they're getting into before we spend the time and bandwidth. Finally, note that we currently o
Tom Johnson

ELAN description | The Language Archive - 0 views

  • ELAN description ELAN is a professional tool for the creation of complex annotations on video and audio resources. With ELAN a user can add an unlimited number of annotations to audio and/or video streams. An annotation can be a sentence, word or gloss, a comment, translation or a description of any feature observed in the media. Annotations can be created on multiple layers, called tiers. Tiers can be hierarchically interconnected. An annotation can either be time-aligned to the media or it can refer to other existing annotations. The textual content of annotations is always in Unicode and the transcription is stored in an XML format. ELAN provides several different views on the annotations, each view is connected and synchronized to the media playhead. Up to 4 video files can be associated with an annotation document. Each video can be integrated in the main document window or displayed in its own resizable window. ELAN delegates media playback to an existing media framework, like Windows Media Player, QuickTime or JMF (Java Media Framework). As a result a wide variety of audio and video formats is supported and high performance media playback can be achieved. ELAN is written in the Java programming language and the sources are available for non-commercial use. It runs on Windows, Mac OS X and Linux.
Tom Johnson

Socrata: Open Data Cloud Solutions for Government Organizations - 0 views

  • Make it easy for your organization to publish and manage public data You can achieve your organization’s transparency goals, cost-effectively, by streamlining the data publishing process and automating maintenance and updates. Internal stakeholders, in any department or agency, with little or no technical assistance, become first-class data publishers. While administrators manage the organization’s data in one central location, offer constituents a consistent and privately-branded online experience and get real-time data consumption and citizen engagement metrics.
Tom Johnson

Mr. People - Data cleaning - 1 views

  •  
    Mr. People: Years ago, while trying to clean up the names of donors in campaign finance data from the Federal Election Commission, I hacked together a Perl module - loosely based on the Lingua-EN-NameParse module - to standardize names. One port to Ruby later, I've finally put together a Web front end for it. Try it out below - paste your own data in or try the sample data. To use the people Ruby gem in your own scripts, sudo gem install people, then read the documentation. Suggestions? Send them to mrpeople@ericson.net
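To give a feel for what standardizing donor names involves, here is a minimal Python sketch. It is not the Perl module or the people Ruby gem, and it does not reproduce their rules: the function name standardize, the title and suffix lists, and the output keys are all assumptions made for illustration. It handles only the simple "LAST, FIRST MIDDLE" and "FIRST MIDDLE LAST" layouts.

```python
# Illustrative only: these lists and the function names are assumptions,
# not taken from the Perl module or the people Ruby gem.
TITLES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
ROMAN = {"ii", "iii", "iv"}


def _case(word):
    """Capitalize a word, keeping Roman-numeral suffixes upper case."""
    return word.upper() if word.lower() in ROMAN else word.capitalize()


def standardize(raw):
    """Split a raw name into title, first, middle, last and suffix."""
    surname = None
    if "," in raw:
        # "LAST, FIRST MIDDLE" order: the surname is before the first comma.
        surname, raw = [part.strip() for part in raw.split(",", 1)]
    tokens = [t.strip(".,") for t in raw.split()]
    tokens = [t for t in tokens if t]
    title = tokens.pop(0) if tokens and tokens[0].lower() in TITLES else ""
    suffix = " ".join(t for t in tokens if t.lower() in SUFFIXES)
    tokens = [t for t in tokens if t.lower() not in SUFFIXES]
    if surname is None:
        if not tokens:
            return None  # nothing left that could be a name
        surname = tokens.pop()
    parts = {
        "title": title,
        "first": tokens[0] if tokens else "",
        "middle": " ".join(tokens[1:]),
        "last": surname,
        "suffix": suffix,
    }
    # Apply one consistent casing to every word of every part.
    return {key: " ".join(_case(w) for w in value.split())
            for key, value in parts.items()}
```

With this sketch, standardize("SMITH, JOHN Q JR") and standardize("John Q. Smith Jr.") yield the same parts, which is the point of the exercise. Simple capitalization mangles names such as "McDonald", and couples ("John and Mary Smith") are not handled at all, so real data needs a far more careful parser.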
Tom Johnson

Data Science Central - 0 views

  •  
    Welcome to Data Science Central! Data Science Central is the industry's one stop resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central (DSC) provides a true community experience through social interaction, peer to peer technical support, the latest in technology, tools and trends --and even job opportunities. We look forward to hearing your feedback as we grow this community of professionals in our exciting industry during times of dramatic change.
Tom Johnson

Want to help fact-check breaking news like the Malaysian airplane disaster? Here's how ... - 0 views

  •  
    "Want to help fact-check breaking news like the Malaysian airplane disaster? Here's how and where you can do it"
Tom Johnson

Google Correlate - 0 views

  •  
    Google Correlate lets you see how your data relates to search queries. Posted: 25 May 2011 11:27 AM PDT. [Figure: Influenza search - Google Correlate] A while back, Google showed how influenza outbreaks correlated with searches for flu-related terms with Google Flu Trends. It helped researchers and policy-makers estimate flu activity much sooner than with previous methods. Google Correlate is the evolution of Flu Trends in that now you can correlate search trends with not just flu cases, but with your own data or other search queries. The above, which you already know about, matches flu cases with searches for "treatment for flu." Similarly, the search phrase that correlates highest with "Toyota for sale" is "used Hyundai," as shown below. You can also see how your data is related geographically. For example, annual rainfall (left) strongly correlates with searches for "disney vacation package." It looks like distance is a strong factor in the latter, though, which should be a reminder that correlation is different from causation. Google is careful to point this out in their FAQ and explanation of the tool. Nevertheless, it's fun to poke around and sometimes see the nonsensical correlations. For example, the strongest correlation with "flowingdata" is "how to scan a document," because the growth rates of both seem similar. There's also a search-by-drawing function. You draw a time series, and Correlate finds terms that best match that trend. In the below chart, I drew a line (blue) that had steady growth, but plateaued towards present day. What weird correlations can you find? [Google Correlate]
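The idea of matching one series against many candidate query series can be sketched with the Pearson correlation coefficient. This is a minimal illustration, not a description of how Google Correlate is implemented; the function names pearson and best_match are assumptions, and the weekly counts are invented purely for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def best_match(target, candidates):
    """Return (name, r) for the candidate series most correlated with target."""
    scored = ((name, pearson(target, series))
              for name, series in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical weekly counts, invented for illustration only.
flu_cases = [10, 12, 30, 55, 40, 18]
queries = {
    "treatment for flu": [8, 11, 28, 60, 37, 15],
    "beach vacation": [50, 45, 20, 10, 15, 40],
}
name, r = best_match(flu_cases, queries)
```

Here name comes back as "treatment for flu" because its counts rise and fall with the flu series, while the vacation series moves in the opposite direction. A high coefficient only says the two series move together; as the post notes, it says nothing about causation.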
Tom Johnson

Shorenstein Center paper argues for collaboration in investigative reporting | Harvard ... - 0 views

  • Shorenstein Center paper argues for collaboration in investigative reporting. Thursday, June 2, 2011. Sandy Rowe, former editor of The Oregonian, and Knight Fellow at the Shorenstein Center fall 2010 and spring 2011. Photograph by Martha Stewart. Shorenstein Center, Harvard Kennedy School. Contact: Janell Sims, janell_sims@harvard.edu, http://www.hks.harvard.edu/presspol/index.html Media organizations may be able to perform their watchdog roles more effectively working together than apart. That is one conclusion in a new paper, “Partners of Necessity: The Case for Collaboration in Local Investigative Reporting,” authored by Sandy Rowe, former editor of Portland’s The Oregonian. The paper is based on interviews and research that Rowe conducted while serving as a Knight Fellow at the Shorenstein Center on the Press, Politics and Public Policy at Harvard Kennedy School. Rowe’s research examines the theory underpinning collaborative work and shows emerging models of collaboration that can lead to more robust investigative and accountability reporting in local and regional markets. “Growing evidence suggests that collaborations and partnerships between new and established news organizations, universities and foundations may be the overlooked key for investigative journalism to thrive at the local and state levels,” Rowe writes. “These partnerships, variously and often loosely organized, can share responsibility for content creation, generate wider distribution of stories and spread the substantial cost of accountability journalism.” Rowe was editor of The Oregonian from 1993 until January 2010. Under her leadership, the newspaper won five Pulitzer Prizes including the Gold Medal for Public Service. Rowe chairs the Board of Visitors of The Knight Fellowships at Stanford University and is a board member of the Committee to Protect Journalists. 
From 1984 until April 1993, Rowe was executive editor and vice president of The Virginian-Pilot and The Ledger-Star, Norfolk and Virginia Beach, Virginia. The Virginian-Pilot won the Pulitzer Prize for general news reporting under her leadership. Rowe’s year-long fellowship at the Shorenstein Center was funded by the John S. and James L. Knight Foundation. Read the full paper on the Shorenstein Center’s website.
Tom Johnson

Places and Spaces :: Mapping Science - 0 views

  • Places & Spaces: Mapping Science is meant to inspire cross-disciplinary discussion on how to best track and communicate human activity and scientific progress on a global scale. It has two components: the physical part supports the close inspection of high quality reproductions of maps for display at conferences and education centers; the online counterpart provides links to a selected series of maps and their makers along with detailed explanations of how these maps work. The exhibit is a 10-year effort. Each year, 10 new maps are added resulting in 100 maps total in 2014.
Tom Johnson

Download PowerPivot - Excel - Office.com - 0 views

  •  
    Tom Torok (NYT) writes: After years of looking down my nose at Excel because of its limitations, I have to say that I'm very impressed with Excel 2010 when used with a free Microsoft add-in called PowerPivot. http://office.microsoft.com/en-us/excel/download-powerpivot-HA101959985.aspx In a PowerPivot tutorial (link below), I imported eight tables from several sources and joined them - yes, you can join relational data. It uses some magical data compression that allows for lightning-fast sorts, filters and calculated fields. The largest table in the tutorial has about 2 million rows. A calculated field on that table took seconds. I did a pivot table on the table and the answers appeared as soon as I selected the fields. In one of the training videos (http://www.powerpivot.com/) an MS guy works with a 101 million-record table on his laptop. It's really amazing. http://powerpivotsdr.codeplex.com/ If you install, be sure to read the prerequisites or you'll be installing and uninstalling both PowerPivot and Excel. I'm running it on a 32-bit XP machine (it won't run on a 64-bit XP but will work on Windows 7 64-bit). The tutorial is for a Windows 7 setup, but there are items in the menu bar that match the reference to the tutorial's ribbon. I noticed that if I open an xlsx by double-clicking the file in Windows Explorer, PowerPivot is not enabled in the ribbon. If you open a file from within Excel 2010, everything works as advertised. Regards, TT
Tom Johnson

Benetech® :: Human Rights :: Overview - 0 views

  •  
    We are committed to equal access to technology. Our software is freely available, and anyone may share our technology and modify it to suit their needs - all without asking our permission. Benetech created Martus and Analyzer specifically for human rights data collection, coding and processing. These tools include cryptographic security features and flexible data structures that can be adapted to the needs of each human rights project. By releasing our software as open source, we participate in the technological community where tools can be audited and improved by others, as well as enabling widespread access to our ideas.
Tom Johnson

Jigsaw: Visual Analytics for Exploring and Understanding Document Collections - 0 views

  •  
    Be sure to view the video tutorial: http://www.cc.gatech.edu/gvu/ii/jigsaw/Jigsaw-tutorial.mov http://www.cc.gatech.edu/gvu/ii/jigsaw/views.html Jigsaw: Visual Analytics for Exploring and Understanding Document Collections. System Views: Jigsaw presents the individual reports in a document collection and the entities within those reports through a series of visualizations. We call these visualizations the system views. Below, we illustrate each view provided by the system and briefly describe their characteristics. Click on the individual images to see a larger version of the view. Also, a tutorial video illustrates the different views, and the interactive behavior of each view can be seen on the video tutorial page. -tj
  •  
    Also see "The Information Interfaces Group, an HCI research group in the School of Interactive Computing at Georgia Tech, develops computing technologies that help people take advantage of information to enrich their lives." http://www.cc.gatech.edu/gvu/ii/
Tom Johnson

Needlebase - for acquiring, integrating, cleansing, analyzing and publishing data on th... - 1 views

  •  
    ITA Software is proud to introduce Needlebase™, a revolutionary platform for acquiring, integrating, cleansing, analyzing and publishing data on the web. Using Needlebase through a web browser, without programmers or DBAs, your data team can easily: acquire data from multiple sources: A simple tagging process quickly imports structured data from complex websites, XML feeds, and spreadsheets into a unified database of your design. merge, deduplicate and cleanse: Needlebase uses intelligent semantics to help you find and merge variant forms of the same record. Your merges, edits and deletions persist even after the original data is refreshed from its source. build and publish custom data views: Use Needlebase's visual UI and powerful query language to configure exactly your desired view of the data, whether as a list, table, grid, or map. Then, with one click, publish the data for others to see, or export a feed of the clean data to your own local database. Needlebase dramatically reduces the time, cost, and expertise needed to build and maintain comprehensive databases of practically anything. Read on to learn more about Needlebase's capabilities and our early adopters' success stories, or watch our tutorial videos. Then sign up to get started! http://needlebase.com
Tom Johnson

An Applied Demography Toolbox - 1 views

  •  
    An Applied Demography Toolbox: A collection of applied demography programs, scripts, spreadsheets and databases. If you have any questions (such as how to apply the tools to your own work), recommendations or additions, you can send a message to me (Eddie Hunsinger) at edynivn@gmail.com. If you would like to use, share or reproduce any information or ideas from the linked files, be sure to cite the respective source. Here is a neat article that gives this site some inspiration. http://www.demog.berkeley.edu/~eddieh/toolbox.html#MedianCalculator
  •  
    These tend to be US-centric, but there are universal tools here for statistical analysis.
Tom Johnson

Dirty Energy Money: About This Tool - 0 views

  • The Dirty Energy Money tool provides an illustration of the network of funding relationships between Dirty Energy companies and politicians. You can use the interactive network map to explore our database of campaign contribution relationships. Politicians and companies are positioned by their relationships; those that are close together tend to have similar patterns of giving and receiving.
Tom Johnson

10 ways to screw up your spreadsheet design | TechRepublic - 0 views

  •  
    10 ways to screw up your spreadsheet design. By Susan Harkins, June 23, 2011, 8:25 AM PDT. Takeaway: How you set up a spreadsheet determines its efficiency, usability, and reliability. Avoiding these pitfalls during the design phase will save you a million headaches. Wrong references, missing values, and invalid data aren't the only things that will ruin a spreadsheet. The development process starts before you do a thing, while you're planning the design. These types of mistakes are worse than bugs because you can't troubleshoot them. All you can do is start over. Here are 10 mistakes to avoid early in the process, when you're still in the decision-making phase.
  •  
    A good list; read down into the comments for additional good tips.