
DJCamp2011: Group items tagged "analysis"


Tom Johnson

T-LAB Tools for Text Analysis

  •  
    The all-in-one software for content analysis and text mining. We are pleased to announce the release of T-LAB 8.0. This version represents a major change in the usability and effectiveness of our software for text analysis. The most significant improvements concern the integration of bottom-up (i.e., unsupervised) methods for exploratory text analysis with top-down (i.e., supervised) approaches for the automated classification of textual units like words, sentences, paragraphs and documents. Among other things, this means that, besides discovering emerging patterns of words and themes from texts, users can now easily build, apply and validate their own models (e.g., dictionaries of categories or pre-existing manual categorizations) both for classical content analysis and for sentiment analysis. For this purpose several T-LAB functionalities have been expanded, and a new ergonomic and powerful tool named 'Dictionary-Based Classification' has been added. No specific dictionaries are built in; however, with some minor re-formatting, many resources available on the Internet, as well as customized word lists, can be quickly imported. Last but not least, to meet the needs of many customers, temporary licenses of the software are now on sale; moreover, the trial mode now allows you, without any time limit, to analyse your own texts of up to 20 KB in txt format, each of which can include up to 20 short documents. To learn more, see http://www.tlab.it/en/80news.php. The demo, the user's manual and the quick introduction are available at http://www.tlab.it/en/download.php. The T-LAB Team. Web: http://www.tlab.it/; e-mail: info@tlab.it
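    The dictionary-based classification T-LAB describes is easy to prototype. Below is a minimal, hypothetical sketch of the general technique (not T-LAB's implementation): each category is defined by a word list, and a text unit is assigned to the category whose list it matches most often.

```python
# Minimal sketch of dictionary-based text classification (not T-LAB):
# assign a text unit to the category whose word list matches it best.
import re
from collections import Counter

# Hypothetical category dictionaries; real ones would be imported word lists.
DICTIONARIES = {
    "positive": {"good", "great", "excellent", "pleased", "improvement"},
    "negative": {"bad", "poor", "problem", "failure", "waste"},
}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def classify(text):
    """Return the best-matching category and the per-category match counts."""
    tokens = tokenize(text)
    scores = Counter({cat: sum(tok in words for tok in tokens)
                      for cat, words in DICTIONARIES.items()})
    best, hits = scores.most_common(1)[0]
    return (best if hits > 0 else "unclassified"), scores

label, scores = classify("We are pleased with the great improvement.")
print(label, dict(scores))  # positive {'positive': 3, 'negative': 0}
```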
Tom Johnson

The Overview Project » Using Overview to analyze 4500 pages of documents on s...

  • Using Overview to analyze 4,500 pages of documents on security contractors in Iraq, by Jonathan Stray, 02/21/2012. This post describes how we used a prototype of the Overview software to explore 4,500 pages of incident reports concerning the actions of private security contractors working for the U.S. State Department during the Iraq war. This was the core of the reporting work for our previous post, where we reported the results of that analysis. The promise of a document set like this is that it will give us some idea of the broader picture, beyond the handful of really egregious incidents that have made headlines. To do this, we have to take into account most or all of the documents, not just the small number that might match a particular keyword search. But at one page per minute, eight hours per day, it would take about 10 days for one person to read all of these documents, to say nothing of taking notes or doing any sort of follow-up. This is exactly the sort of problem that Overview would like to solve. The reporting was a multi-stage process: (1) splitting the massive PDFs into individual documents and extracting the text; (2) exploration and subject tagging with the Overview prototype; (3) random sampling to estimate the frequency of certain types of events; and (4) follow-up and comparison with other sources. A sketch of the sampling step appears below.
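    The random-sampling step is what lets a reporter quantify a pattern without reading every page: tag a random subset, then extrapolate. A minimal sketch of the arithmetic, with made-up numbers (the post does not publish these figures):

```python
# Estimate how often an event type occurs in a large document set from a
# tagged random sample, with a rough normal-approximation 95% CI.
import math

def estimate_total(sample_hits, sample_size, population_size):
    """Scale a sample proportion up to the population, with a 95% CI."""
    p = sample_hits / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    low, high = max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se)
    return p * population_size, (low * population_size, high * population_size)

# Hypothetical: 14 incidents of a given type found in a 100-document sample
# drawn from a 4,500-document set.
est, (lo, hi) = estimate_total(14, 100, 4500)
print(f"~{est:.0f} incidents in total (95% CI {lo:.0f}-{hi:.0f})")
```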
Tom Johnson

Software for Content Analysis

  •  
    "Software for Content Analysis: Links to external sites The list below provides links to web sites where one can find information (often including purchasing information) regarding content analysis software as well as other types of software that are often utilized by content analysts. The list was last updated in December 2008. Some links may change. You might also find Will Lowe's Review of Software for Content Analysis useful. "
Tom Johnson

8 cool tools for data analysis, visualization and presentation - Computerworld

  •  
    I came back from last year's National Institute for Computer-Assisted Reporting (NICAR) conference with 22 free tools for data visualization and analysis -- most of which are still popular and worth a look. At this year's conference, I learned about other free (or at least inexpensive) tools for data analysis and presentation. Want to see all the tools from last year and 2012? For quick reference, check out our chart listing all 30 free data visualization and analysis tools. Like that previous group of 22 tools, these range from easy enough for a beginner (i.e., anyone who can do rudimentary spreadsheet data entry) to expert (requiring hands-on coding). Here are eight of the best:
Tom Johnson

Data Without Borders | Connecting data science and non-profits in the service of humanity.

  • Data Without Borders seeks to match non-profits in need of data analysis with freelance and pro bono data scientists who can work to help them with data collection, analysis, visualization, or decision support.
  •  
    A good resource to extend the intellectual power and reach of your newsroom.
Tom Johnson

MDA Analytics

  •  
    An interesting example of yet another "next generation" data analysis and presentation tool; you can see the demos at http://www.lavastorm.com/. The emphasis is on visualizing the data-analytic method while doing the analysis.
Tom Johnson

DIVA-GIS | DIVA-GIS: free, simple & effective

  • DIVA-GIS is a free computer program for mapping and geographic data analysis (a geographic information system, or GIS). With DIVA-GIS you can make maps of the world, or of a very small area, using, for example, state boundaries, rivers, a satellite image, and the locations of sites where an animal species was observed. We also provide free spatial data for the whole world that you can use in DIVA-GIS or other programs. You can use the discussion forum to ask questions, report problems, or make suggestions. Or contact us, and read the blog entries for the latest news. But first download the program and read the documentation. DIVA-GIS is particularly useful for mapping and analyzing biodiversity data, such as the distribution of species or other 'point-distributions'. It reads and writes standard data formats such as ESRI shapefiles, so interoperability is not a problem. DIVA-GIS runs on Windows and (with minor effort) on Mac OS X (see instructions). You can use the program to analyze data, for example by making grid (raster) maps of the distribution of biological diversity, to find areas that have high, low, or complementary levels of diversity (a sketch of the gridding idea appears below). And you can also map and query climate data. You can predict species distributions using the BIOCLIM or DOMAIN models.
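    The grid (raster) richness maps DIVA-GIS describes reduce to binning observation points into cells and counting distinct species per cell. A minimal sketch of that idea in plain Python (not DIVA-GIS code; the observations are made up):

```python
# Sketch of a grid (raster) species-richness map: bin observation
# points into grid cells and count distinct species per cell.
from collections import defaultdict

# Hypothetical observations: (species, longitude, latitude)
observations = [
    ("A. fulva", -98.2, 29.5), ("A. fulva", -98.3, 29.6),
    ("B. minor", -98.2, 29.5), ("C. alba", -97.1, 30.2),
]

CELL = 1.0  # cell size in degrees

def cell_of(lon, lat):
    """Snap a coordinate to the lower-left corner of its grid cell."""
    return (CELL * (lon // CELL), CELL * (lat // CELL))

species_by_cell = defaultdict(set)
for species, lon, lat in observations:
    species_by_cell[cell_of(lon, lat)].add(species)

for cell, found in sorted(species_by_cell.items()):
    print(cell, "richness =", len(found))
# (-99.0, 29.0) richness = 2
# (-98.0, 30.0) richness = 1
```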
Tom Johnson

Michelle Minkoff » Learning to love…grep (let the computer search text for you)

  • Learning to love…grep (let the computer search text for you). Posted by Michelle Minkoff on Aug 9, 2012. I've gotten into the habit of posting daily learnings on Twitter, but some things require a more in-depth reminder. I also haven't done as much paying it forward as I'd like (but I'm having a TON of fun! and dealing with health problems! but mostly fun!). I'd like to start posting more helpful tips here, partially as a notebook for myself, and partially to help others with similar issues. Today's problem: I needed to search for a few lines of text, which could be contained in any one of nine files with 100,000 lines each. Opening all of the files took a very long time on my computer, not to mention executing a search. Enter the "grep" command in Terminal, which lets you quickly search files using the power of the computer.
  •  
    An easy-to-use method for content analysis (a code sketch of the same idea appears below).
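    The grep invocation itself is a one-liner, e.g. grep -n "some phrase" file1.txt file2.txt. For readers who prefer a script, below is a hypothetical Python equivalent; the point in both cases is to stream each file line by line rather than open it in an editor, which is why it stays fast on nine 100,000-line files.

```python
# Hypothetical Python equivalent of `grep -n PATTERN FILE...`:
# stream each file line by line and print matches with line numbers.
import re
import sys

def grep(pattern, paths):
    rx = re.compile(pattern)
    for path in paths:
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if rx.search(line):
                    print(f"{path}:{lineno}:{line.rstrip()}")

if __name__ == "__main__":
    grep(sys.argv[1], sys.argv[2:])  # usage: python pygrep.py PATTERN FILE...
```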
Tom Johnson

Reporters' Lab // How a conference taught me I know nothing

  •  
    Some good pointers here, especially for data retrieval and analysis.
Tom Johnson

Interactive Dynamics for Visual Analysis

  •  
    A taxonomy of tools that support the fluent and flexible use of visualizations. Jeffrey Heer (Stanford University) and Ben Shneiderman (University of Maryland, College Park). The increasing scale and availability of digital data provides an extraordinary resource for informing public policy, scientific discovery, business strategy, and even our personal lives. To get the most out of such data, however, users must be able to make sense of it: to pursue questions, uncover patterns of interest, and identify (and potentially correct) errors. In concert with data-management systems and statistical algorithms, analysis requires contextualized human judgments regarding the domain-specific significance of the clusters, trends, and outliers discovered in data.
Tom Johnson

Data Visualization Platform, Weave, Now Open Source | Government In The Lab

  •  
    Data Visualization Platform, Weave, Now Open Source. By Karl Fogel, Civic Commons. With more and more civic data becoming available and accessible, the challenge grows for policy makers and citizens to leverage that data for better decision-making. It is often difficult to understand context and perform analysis. "Weave", however, helps. A web-based data visualization tool, Weave enables users to explore, analyze, visualize and disseminate data online from any location at any time. We saw tremendous potential in the platform and have been helping open-source the software, advising on community engagement strategy and licensing. This week, we were excited to see the soft launch of the Weave 1.0 Beta, which went open source on Wednesday, June 15. Weave is the result of a broad partnership: it was developed by the Institute for Visualization and Perception Research at the University of Massachusetts Lowell, with support from the Open Indicators Consortium, which is made up of over ten municipal, regional, and state member organizations. This consortium will probably expand now that Weave is open source, leading, we hope, to greater collaboration, more development, and further innovation on this important platform. Early-adopter data geeks should give it a spin. One of Weave's key features is high-speed interactivity and responsiveness, which is somewhat unusual in web-based visualization software; try out the demo sites or watch the video for more on how the application works. Our congratulations and thanks to the Weave team! As city management becomes increasingly data-driven, data analysis and visualization tools will continue to be an important part of every city manager's toolkit. We are excited to see this evolving toolkit enter the civic commons. http://govinthelab.com/data-visualization-platform-weave-now-open-source
Tom Johnson

Resources - Data and Software - Capturing human rights data in Analyzer

  • Capturing human rights data in Analyzer. Human rights groups collect data containing details of human rights abuses from various sources, including medical records, newspaper articles, witness testimonies, letters, interviews, and official reports and documents. Analyzer can be used to capture this data for analysis. Data is coded according to the "Who did what to whom" model and entered into the capture set of Analyzer. Data about the source of the information is entered in the source tab, shown in Figure 1. (Note: the data in the figures shown here, unless otherwise indicated, are from a sample, not an actual, dataset.) A sketch of the coding model appears below.
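    The "Who did what to whom" model is essentially a structured event record. A minimal, hypothetical sketch of such a record in code (not Analyzer's actual schema):

```python
# Hypothetical record structure for "Who did what to whom" coding;
# Analyzer's real schema differs, this only illustrates the model.
from dataclasses import dataclass, field

@dataclass
class Violation:
    perpetrator: str      # who
    act: str              # did what
    victim: str           # to whom
    date: str = ""        # when, if known
    location: str = ""    # where, if known
    sources: list[str] = field(default_factory=list)  # supporting documents

v = Violation(
    perpetrator="Unit X",
    act="arbitrary detention",
    victim="J. Doe",
    date="1999-03-04",
    sources=["witness testimony #12", "newspaper article, 1999-03-06"],
)
print(v.perpetrator, "->", v.act, "->", v.victim)
```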
Tom Johnson

Statistical Reasoning I

  •  
    Statistical Reasoning 1 (http://ocw.jhsph.edu/index.cfm/go/viewCourse/course/StatisticalReasoning1/coursePage/index/). Most people could probably use a bit of a refresher on statistical reasoning and its methods, and this free course from Johns Hopkins University is a great way to get started on the road back to statistical literacy. The course was originally taught by John McGready and provides "a broad overview of biostatistical methods and concepts used in the public health sciences." Users will find that the home page includes links to the course syllabus, schedule, lecture materials, readings, and additional assignments. The Lecture Materials area includes course notes from the seven modules here. The topics include "Describing Data," "An Introduction to Hypothesis Testing," and "When Time Is of Interest: The Case for Survival Analysis." Visitors can also take advantage of the assignments, which correspond to the readings and the lecture materials. The site is completed by the Other Resources area, which includes a special lecture on the software package Stata and a flowchart designed to help students learn how to choose the correct statistical procedure for the task at hand. [KMG]
Tom Johnson

mapping texts/texas

  •  
    Assessing Language Patterns: A Look at Texas Newspapers, 1829-2008. This visualization plots the language patterns embedded in 232,567 pages of historical Texas newspapers, as they evolved over time and space. For any date range and location, you can browse the most common words (word counts), named entities (people, places, etc.), and highly correlated words (topic models). [ About Mapping Texts ]
Tom Johnson

ReConstitution 2012

  •  
    "ReConstitution 2012, a fun experiment by Sosolimited, processes transcripts from the presidential debates, and recreates them with animated words and charts. Part data visualization, part experimental typography, ReConstitution 2012 is a live web app linked to the US Presidential Debates. During and after the three debates, language used by the candidates generates a live graphical map of the events. Algorithms track the psychological states of Romney and Obama and compare them to past candidates. The app allows the user to get beyond the punditry and discover the hidden meaning in the words chosen by the candidates. As you let the transcript run, numbers followed by their units (like "18 months") flash on the screen, and trigger words for emotions like positivity, negativity, and rage are highlighted yellow, blue, and red, respectively. You can also see the classifications in graph form. There are a handful of less straightforward text classifications for truthy and suicidal, which are based on linguistic studies, which in turn are based on word frequencies. These estimates are more fuzzy. So, as the creators suggest, it's best not to interpret the project as an analytical tool, and more of a fun way to look back at the debate, which it is. It's pretty fun to watch. Here's a short video from Sosolimited for more on how the application works: "
Tom Johnson

The Overview Project » VIDEO: document mining with Overview

  •  
    VIDEO: document mining with Overview, by Jonathan Stray, 10/31/2012. With the release of the new, web-only version of Overview that runs in your browser, we thought it was time to make a little video showing how to use it. If that doesn't answer your questions, see also the help page and the FAQ.
Tom Johnson

The Overview Project » Document mining shows Paul Ryan relying on the pro...

  •  
    Document mining shows Paul Ryan relying on the programs he criticizes, by Jonathan Stray, 11/02/2012. One of the jobs of a journalist is to check the record. When Congressman Paul Ryan became a vice-presidential candidate, Associated Press reporter Jack Gillum decided to examine the candidate through his own words. Hundreds of Freedom of Information requests and 9,000 pages later, Gillum wrote a story showing that Ryan has asked for money from many of the same federal programs he has criticized as wasteful, including stimulus money and funding for alternative fuels. This would have been much more difficult without special software for journalism. In this case Gillum relied on two tools: DocumentCloud to upload, OCR, and search the documents, and Overview to automatically sort the documents into topics and visualize the contents. Both projects are previous Knight News Challenge winners. But first Gillum had to get the documents. As a member of Congress, Ryan isn't subject to the Freedom of Information Act. Instead, Gillum went to every federal agency - whose files are covered under FOIA - for copies of letters or emails that might identify Ryan's favored causes, names of any constituents who sought favors, and more. Bit by bit, the documents arrived - on paper. The stack grew over weeks, eventually piling up two feet high on Gillum's desk. Then he scanned the pages and loaded them into the AP's internal installation of DocumentCloud. The software converts the scanned pages to searchable text, but there were still 9,000 pages of material. That's where Overview came in. Developed in house at the Associated Press, this open-source visualization tool processes the full text of each document and clusters similar documents together, producing a visualization that graphically shows the contents of the complete document set (a sketch of the clustering idea follows below). "I used Overview to take these 9000 pages of documents, and knowing there was probably going to be a lot of garbage or ext…
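    Overview's actual pipeline is more sophisticated than this, but the core idea of grouping similar documents can be sketched in a few lines, assuming scikit-learn is available: represent each document as a TF-IDF vector and cluster the vectors.

```python
# A minimal sketch of the document-clustering idea (not Overview's
# actual pipeline): TF-IDF vectors plus k-means, via scikit-learn.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [  # hypothetical stand-ins for OCR'd pages
    "letter requesting stimulus funds for energy project",
    "constituent letter about alternative fuels grant",
    "press release criticizing wasteful federal spending",
    "statement on cutting stimulus spending as wasteful",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for label, doc in sorted(zip(labels, docs)):
    print(label, doc)  # documents sharing a label landed in the same topic
```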
Tom Johnson

Mining of Massive Datasets

  •  
    Mining of Massive Datasets. The book has now been published by Cambridge University Press, and a hardcopy can be obtained from the publisher. By agreement with the publisher, you can still download it free from this page. Cambridge Press does, however, retain copyright on the work, and we expect that you will acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3. --- Anand Rajaraman (@anand_raj) and Jeff Ullman. Downloads: the complete book (340 pages, approximately 2 MB), or individual chapters: Preface and Table of Contents; Chapter 1, Data Mining; Chapter 2, Large-Scale File Systems and Map-Reduce; Chapter 3, Finding Similar Items; Chapter 4, Mining Data Streams; Chapter 5, Link Analysis; Chapter 6, Frequent Itemsets; Chapter 7, Clustering; Chapter 8, Advertising on the Web; Chapter 9, Recommendation Systems; Index.
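    The authors' aside about detecting republished copies refers to the finding-similar-items techniques of Chapter 3. A minimal sketch of the starting point, Jaccard similarity of k-shingles (the chapter then develops minhashing and locality-sensitive hashing to make this scale):

```python
# Jaccard similarity of character k-shingles: the Chapter 3 starting
# point for detecting near-duplicate documents.
def shingles(text, k=5):
    """The set of all overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

original = "we have evidence that other items we have published on the web"
copy = "we have evidence that some items we have published on the web"
unrelated = "mining of massive datasets covers map-reduce and link analysis"

print(jaccard(shingles(original), shingles(copy)))       # high: most shingles shared
print(jaccard(shingles(original), shingles(unrelated)))  # low: near zero
```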
Tom Johnson

Politilines

  •  
    Visualizing the words used in the 2011-2012 Republican Primary debates. The method: We collected transcripts from the American Presidency Project at UCSB, categorized them by hand, then ranked lemmatized word-phrases (or n-grams) by their frequency of use. Word-phrases can be made of up to five words. Our ranking algorithm accounts for things such as exclusive word-phrases, meaning it won't count "United States" twice if it's used in a higher n-gram such as "President of the United States" (a sketch of this exclusivity rule appears below). While still in beta, the mini-app is responsive and easy to use. The next challenge, I think, is to really show what everyone talked about. For example, click on education and you see that Newt Gingrich, Ron Paul, and Rick Perry brought it up. Then roll over the names to see the words each candidate used related to that topic. You get some sense of content, but it's still hard to decipher what each actually said about education.
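    The "exclusive word-phrase" rule is the interesting algorithmic detail: an n-gram occurrence that sits inside an already-counted longer phrase should not also be credited to the shorter phrase. A minimal, hypothetical sketch of that idea (not Politilines' actual code), processing phrases greedily from longest to shortest:

```python
# Sketch of exclusive n-gram counting: token positions claimed by a
# counted longer phrase are not re-counted for its sub-phrases.
from collections import defaultdict

def exclusive_phrase_counts(text, max_n=5, min_count=2):
    tokens = text.lower().split()
    claimed = set()   # token positions owned by a longer counted phrase
    counts = {}
    for n in range(max_n, 0, -1):          # longest phrases claim first
        occurrences = defaultdict(list)
        for i in range(len(tokens) - n + 1):
            occurrences[" ".join(tokens[i:i + n])].append(i)
        for phrase, starts in occurrences.items():
            free = [i for i in starts
                    if not (set(range(i, i + n)) & claimed)]
            if len(free) >= min_count:     # only phrases that repeat
                counts[phrase] = len(free)
                for i in free:
                    claimed.update(range(i, i + n))
    return counts

text = ("president of the united states spoke and president of the "
        "united states left while the united states watched")
# "united states" occurs three times, but two are inside the counted
# longer phrase; its one exclusive occurrence is below min_count.
print(exclusive_phrase_counts(text))
# {'president of the united states': 2}
```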
Tom Johnson

Palantir - Our Work - What We Do

  •  
    What we do: We build software that allows organizations to make sense of massive amounts of disparate data. We solve the technical problems, so they can solve the human ones. Combating terrorism. Prosecuting crimes. Fighting fraud. Eliminating waste. From Silicon Valley to your doorstep, we deploy our data fusion platforms against the hardest problems we can find, wherever we are needed most.