Group items tagged overview - DJCamp2011

The Overview Project » Using Overview to analyze 4500 pages of documents on s... - 0 views

overview.ap.org/...y-contractors-in-iraq-analysis

opendata project documents text analysis qualitative analysis

shared by Tom Johnson on 21 Feb 12 - No Cached

Using Overview to analyze 4500 pages of documents on security contractors in Iraq
by Jonathan Stray on 02/21/2012

This post describes how we used a prototype of the Overview software to explore 4,500 pages of incident reports concerning the actions of private security contractors working for the U.S. State Department during the Iraq war. This was the core of the reporting work for our previous post, where we reported the results of that analysis.

The promise of a document set like this is that it will give us some idea of the broader picture, beyond the handful of really egregious incidents that have made headlines. To do this, in some way we have to take into account most or all of the documents, not just the small number that might match a particular keyword search. But at one page per minute, eight hours per day, it would take about 10 days for one person to read all of these documents - to say nothing of taking notes or doing any sort of followup. This is exactly the sort of problem that Overview would like to solve.

The reporting was a multi-stage process:
• Splitting the massive PDFs into individual documents and extracting the text
• Exploration and subject tagging with the Overview prototype
• Random sampling to estimate the frequency of certain types of events
• Followup and comparison with other sources
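
The reading-time figure and the random-sampling stage mentioned in the excerpt can be illustrated with a short Python sketch. This is not the code used by the Overview team; the function names, the made-up document list, and the normal-approximation margin of error are all assumptions chosen for illustration.

```python
import math
import random


def reading_days(pages, pages_per_minute=1.0, hours_per_day=8.0):
    """Working days needed to read `pages` at the given pace."""
    minutes = pages / pages_per_minute
    return minutes / 60.0 / hours_per_day


def estimate_frequency(documents, is_event, sample_size, seed=0):
    """Estimate the share of documents for which is_event(doc) is true.

    Draws a simple random sample without replacement and returns
    (estimate, margin), where margin is an approximate 95% margin of
    error from the normal approximation (an assumption of this sketch).
    """
    rng = random.Random(seed)
    sample = rng.sample(documents, sample_size)
    hits = sum(1 for doc in sample if is_event(doc))
    p = hits / sample_size
    margin = 1.96 * math.sqrt(p * (1.0 - p) / sample_size)
    return p, margin


if __name__ == "__main__":
    # 4,500 pages at one page per minute, eight hours per day.
    print(reading_days(4500))  # 9.375, i.e. "about 10 days"

    # Hypothetical stand-in for tagged incident reports.
    docs = ["warning shot fired"] * 300 + ["vehicle accident"] * 700
    p, margin = estimate_frequency(docs, lambda d: "shot" in d, sample_size=100)
    print(f"estimated share: {p:.2f} +/- {margin:.2f}")
```

Reading a random sample by hand and scaling up in this way trades a small, quantifiable error for a large saving in reading time, which is the point of the sampling stage.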

The Overview Project » Document mining shows Paul Ryan relying on the pro... - 0 views

overview.ap.org/...the-the-programs-he-criticizes

overview project qualitative analysis analysis

shared by Tom Johnson on 02 Nov 12 - No Cached

Tom Johnson on 02 Nov 12

Document mining shows Paul Ryan relying on the programs he criticizes
by Jonathan Stray on 11/02/2012

One of the jobs of a journalist is to check the record. When Congressman Paul Ryan became a vice-presidential candidate, Associated Press reporter Jack Gillum decided to examine the candidate through his own words. Hundreds of Freedom of Information requests and 9,000 pages later, Gillum wrote a story showing that Ryan has asked for money from many of the same Federal programs he has criticized as wasteful, including stimulus money and funding for alternative fuels.

This would have been much more difficult without special software for journalism. In this case Gillum relied on two tools: DocumentCloud to upload, OCR, and search the documents, and Overview to automatically sort the documents into topics and visualize the contents. Both projects are previous Knight News Challenge winners.

But first Gillum had to get the documents. As a member of Congress, Ryan isn't subject to the Freedom of Information Act. Instead, Gillum went to every federal agency - whose files are covered under FOIA - for copies of letters or emails that might identify Ryan's favored causes, names of any constituents who sought favors, and more. Bit by bit, the documents arrived - on paper. The stack grew over weeks, eventually piling up two feet high on Gillum's desk. Then he scanned the pages and loaded them into the AP's internal installation of DocumentCloud. The software converts the scanned pages to searchable text, but there were still 9,000 pages of material.

That's where Overview came in. Developed in-house at the Associated Press, this open-source visualization tool processes the full text of each document and clusters similar documents together, producing a visualization that graphically shows the contents of the complete document set.

"I used Overview to take these 9000 pages of documents, and knowing there was probably going to be a lot of garbage or ext


The Overview Project » VIDEO: document mining with Overview - 0 views

overview.ap.org/...-document-mining-with-overview

qualitative analysis analytic journalism analysis overview

shared by Tom Johnson on 02 Nov 12 - No Cached

Tom Johnson on 02 Nov 12

VIDEO: document mining with Overview
by Jonathan Stray on 10/31/2012

With the release of the new, web-only version of Overview that runs in your browser, we thought it was time to make a little video showing how to use it. If that doesn't answer your questions, see also the help page and the FAQ.


The Overview Project - 0 views

overview.ap.org

social network analysis data mining Infoviz content analysis

shared by Tom Johnson on 03 Aug 12 - No Cached

Tom Johnson on 03 Aug 12

How Overview turns Documents into Pictures
by Jonathan Stray on 06/04/2012

Overview produces intricate visualizations of large document sets - beautiful, but what do they mean? These visualizations are saying something about the documents, which you can interpret if you know a little about how they're plotted. There are two visualizations in the current prototype version of Overview, and both are based on document clustering.
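
As a rough illustration of what "document clustering" can mean, here is a minimal Python sketch that weights words by TF-IDF, compares documents by cosine similarity, and groups them greedily. The excerpt does not describe Overview's actual algorithm, so this is a generic stand-in; the function names, the smoothed IDF formula, the similarity threshold, and the sample texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one unit-length {term: weight} dict per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    doc_freq = Counter(term for c in counts for term in c)
    n = len(texts)
    vectors = []
    for c in counts:
        # Smoothed inverse document frequency: rarer terms weigh more.
        vec = {
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def cluster(texts, threshold=0.2):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = tfidf_vectors(texts)
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Made-up example texts, not taken from any real document set.
    docs = [
        "convoy fired warning shots at vehicle",
        "warning shots fired at approaching vehicle",
        "grant request for alternative fuels funding",
        "letter requesting grant funding for alternative fuels",
    ]
    print(cluster(docs))
```

With these four sample texts, the two incident-style sentences end up in one cluster and the two funding-request sentences in another, because each pair shares most of its vocabulary and the pairs share none.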


Introduction to Infographics and Data Visualization: Knight Center's first Massive Open... - 0 views

knightcenter.utexas.edu/...-first-massive-open-online-cou

data visualization visualization knight infographics

shared by Tom Johnson on 08 Oct 12 - No Cached

Introduction to Infographics and Data Visualization: Knight Center's first Massive Open Online Course

Registration is now open for the Knight Center's first MOOC (Massive Open Online Course). The course will formally run from Sunday, October 28, 2012 through Saturday, December 8, 2012. Below are course details and how to register.

The introductory area of the course is now available to enrolled students. It includes access to the course syllabus and the introductory overview video for the course.

Course Dates: Sunday, October 28, 2012 - Saturday, December 8, 2012
Course Language: English
Instructor: Alberto Cairo
Course Objectives:
• How to analyze and critique infographics and visualizations in newspapers, books, TV, etc., and how to propose alternatives that would improve them.
• How to plan for data-based storytelling through charts, maps, and diagrams.
• How to design infographics and visualizations that are not just attractive but, above all, informative, deep, and accurate.
• The rules of graphic design and of interaction design, applied to infographics and visualizations.
• Optional: How to use Adobe Illustrator to create infographics.

Statistical Reasoning I - 0 views

ocw.jhsph.edu/...index

statistics analytic journalism analysis visualization

shared by Tom Johnson on 20 Jul 12 - No Cached

Tom Johnson on 20 Jul 12

Statistical Reasoning 1 http://ocw.jhsph.edu/index.cfm/go/viewCourse/course/StatisticalReasoning1/coursePage/index/ Most people could probably use a bit of a refresher on statistical reasoning and its methods, and this free course from Johns Hopkins University is a great way to get started on the road back to statistical literacy. The course was originally taught by John McGready and provides "a broad overview of biostatistical methods and concepts used in the public health sciences." Users will find that the home page includes links to the course syllabus, schedule, lecture materials, readings, and additional assignments. The Lecture Materials area includes course notes from the seven modules here. The topics include "Describing Data," "An Introduction to Hypothesis Testing," and "When Time Is of Interest: The Case for Survival Analysis." Visitors can also take advantage of the assignments, which correspond to the readings and the lecture materials. The site is completed by the Other Resources area, which includes a special lecture on the software package Stata and a flowchart designed to help students learn how to choose the correct statistical procedure for the task at hand. [KMG]


http://theyrule.net - 1 views

theyrule.net

visualization politics corporate networks open data

shared by Tom Johnson on 22 Oct 11 - Cached

Tom Johnson on 22 Oct 11

They Rule

Overview
They Rule aims to provide a glimpse of some of the relationships of the US ruling class. It takes as its focus the boards of some of the most powerful U.S. companies, which share many of the same directors. Some individuals sit on 5, 6 or 7 of the top 1000 companies. It allows users to browse through these interlocking directories and run searches on the boards and companies. A user can save a map of connections complete with their annotations and email links to these maps to others. They Rule is a starting point for research about these powerful individuals and corporations.

Context
A few companies control much of the economy and oligopolies exert control in nearly every sector of the economy. The people who head up these companies swap on and off the boards from one company to another, and in and out of government committees and positions. These people run the most powerful institutions on the planet, and we have almost no say in who they are. This is not a conspiracy; they are proud to rule, yet these connections of power are not always visible to the public eye. Karl Marx once called this ruling class a 'band of hostile brothers.' They stand against each other in the competitive struggle for the continued accumulation of their capital, but they stand together as a family supporting their interests in perpetuating the profit system as a whole. Protecting this system can require the cover of a 'legitimate' force - and this is the role that is played by the state. An understanding of this system cannot be gleaned from looking at the inter-personal relations of this class alone, but rather how they stand in relation to other classes in society. Hopefully They Rule will raise larger questions about the structure of our society and in whose benefit it is run.

The Data
We do not claim that this data is 100% accurate at all times. Corporate directors have a habit of dying, quitting boards, joining new ones and most frustratingly passing on their name

mneuman on 23 Oct 11

I think this data must be very useful to the people in Occupy Wall Street


Benetech® :: Human Rights :: Overview - 0 views

www.benetech.org/human_rights

open data government data transparency non-profits Analysis human rights

shared by Tom Johnson on 18 Jul 11 - No Cached

Tom Johnson on 18 Jul 11

We are committed to equal access to technology. Our software is freely available, and anyone may share our technology and modify it to suit their needs - all without asking our permission. Benetech created Martus and Analyzer specifically for human rights data collection, coding and processing. These tools include cryptographic security features and flexible data structures that can be adapted to the needs of each human rights project. By releasing our software as open source, we participate in the technological community where tools can be audited and improved by others, as well as enabling widespread access to our ideas.
