Home/ DJCamp2011/ Group items tagged text

Tom Johnson

National Science Foundation Helps Fund scrible, A New Web Annotation Tool/Per... - 0 views

  • INFOdocket: Information Industry News + New Web Sites and Tools, from Gary Price and Shirl Kennedy. National Science Foundation Helps Fund scrible, A New Web Annotation Tool/Personal Web Cache + Video Demo. Posted on May 12, 2011 by Gary D. Price. scrible (pronounced “scribble”) launched about a week ago and you can learn more (free to register and use) here. The company has received a $500,000 grant from the National Science Foundation. From Venture Beat: The company lets users do three things: save articles and pages so they’re available if the original goes offline; richly annotate online content using tools reminiscent of Word (highlighter, sticky note, etc.); and share annotated pages privately with others. scrible is free and will continue to be free to all users (125MB of storage space). A premium edition is also planned, but its features (aside from a larger storage quota) have not been announced. Robert Scoble has posted a video demo of scrible with the CEO of the company, Victor Karkar, doing the “driving.” scrible sounds a lot like Diigo without the mobile access options. It also sounds similar (minus the markup features) to Pinboard. Pinboard does charge $9.97 for a lifetime membership with almost all features (there are many, and new ones debut regularly). For an extra $25/year, all of the material you’ve bookmarked is cached by Pinboard. Cached pages look great, INCLUDING PDF files. Pinboard is extremely fast and has a very low learning curve. Think Delicious and then add a ton of useful tools to it. Pinboard also provides mobile access to your saved bookmarks and cached documents. Finally, when used responsibly (aka not abused), there are no storage space quotas. Which service do you prefer, or does each service have a niche depending on the work you’re doing? What other tools do you use? Hat Tips and Thanks: @NspireD2 and @New Media Consortium. Tagged: Annotation Tools, Diigo, Pinboard, scrible. Posted in: Personal Archiving, Web Tools
Tom Johnson

Timeline JS - Beautifully crafted timelines that are easy, and intuitive to use. - 0 views

  •  
    TimelineJS can pull in media from different sources. It has built-in support for Twitter, Flickr, Google Maps, YouTube, Vimeo, Dailymotion, Wikipedia, and SoundCloud, with more media types to come in the future. Creating a timeline is as easy as filling in a Google spreadsheet or as detailed as JSON. Tips and tricks to best utilize TimelineJS: Keep it short, and write each event as a part of a larger narrative. Pick stories that have a strong chronological narrative; it does not work well for stories that need to jump around in the timeline. Include events that build up to major occurrences, not just the major events. The project is hosted on GitHub, the largest code host in the world. We encourage you to contribute to the project and we value your feedback. You can report bugs and discuss features on the issues page, or ask a question on our Google Group. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. http://www.gnu.org/licenses/ Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA. TimelineJS was created and built by VéritéCo, as a project of the Knight News Innovation Lab.
Tom Johnson

T-LAB Tools for Text Analysis - 0 views

  •  
    The all-in-one software for Content Analysis and Text Mining Hello We are pleased to announce the release of T-LAB 8.0. This version represents a major change in the usability and the effectiveness of our software for text analysis. The most significant improvements concern the integration of bottom-up (i.e. unsupervised) methods for exploratory text analysis with top-down (i.e. supervised) approaches for the automated classification of textual units like words, sentences, paragraphs and documents. Among other things, this means that - besides discovering emerging patterns of words and themes from texts - the users can now easily build, apply and validate their models (e.g. dictionaries of categories or pre-existing manual categorizations) both for classical content analysis and for sentiment analysis. For this purpose several T-LAB functionalities have been expanded and a new ergonomic and powerful tool named 'Dictionary-Based Classification' has been added. No specific dictionaries have been built in; however, with some minor re-formatting, lots of resources available over the Internet and customized word lists can be quickly imported. Last but not least, in order to meet the needs of many customers, temporary licenses of the software are now on sale; moreover, without any time limit, the trial mode of the software now allows you to analyse your own texts up to 20 kb in txt format, each of which can include up to 20 short documents. To learn more, use the following link http://www.tlab.it/en/80news.php The Demo, the User's Manual and the Quick Introduction are available at http://www.tlab.it/en/download.php Kind Regards The T-LAB Team web: http://www.tlab.it/ e-mail: info@tlab.it
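The dictionary-based (top-down) classification described above can be illustrated with a minimal Python sketch. This is not T-LAB's implementation: the category names, the word lists, and the `classify` function are all hypothetical, and the scoring is a plain count of dictionary hits per category.

```python
import re
from collections import Counter

# Hypothetical category dictionary. The announcement says no dictionaries
# are built in, so these word lists are invented purely for illustration.
CATEGORY_WORDS = {
    "positive": {"good", "great", "useful", "fast"},
    "negative": {"bad", "slow", "broken", "poor"},
}


def classify(text, dictionary=CATEGORY_WORDS):
    """Return (best_category, hits_per_category) for one textual unit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for category, words in dictionary.items():
        hits[category] = sum(1 for token in tokens if token in words)
    # On a tie, max() keeps the category listed first in the dictionary.
    best, best_count = max(hits.items(), key=lambda item: item[1])
    # A unit containing no dictionary words at all stays unclassified.
    return (best if best_count > 0 else None), dict(hits)
```

Calling `classify("The tool is fast and useful, not slow.")` counts two "positive" hits and one "negative" hit, so the sentence is labelled "positive". Note that a bare word count ignores negation ("not slow"), which is one reason such models need validating against manual categorizations.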
Tom Johnson

Michelle Minkoff » Learning to love…grep (let the computer search text for you) - 0 views

  •  
    Learning to love…grep (let the computer search text for you). Posted by Michelle Minkoff on Aug 9, 2012. I've gotten into the habit of posting daily learnings on Twitter, but some things require a more in-depth reminder. I also haven't done as much paying it forward as I'd like (but I'm having a TON of fun! and dealing with health problems! but mostly fun!) I'd like to try to start posting more helpful tips here, partially as a notebook for myself, and partially to help others with similar issues. Today's problem: I needed to search for a few lines of text, which could be contained in any one of nine files with 100,000 lines each. Opening all of the files took a very long time on my computer, not to mention executing a search. Enter the "grep" command in Terminal, which allows you to quickly search files using the power of the computer.
  •  
    An easy to use method for content analysis
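For readers without a Unix terminal, the same kind of search can be sketched in Python. This is only a rough stand-in for grep's plain substring matching, not a reimplementation of grep; the function name `search_files` and its parameters are assumptions made for this example.

```python
from pathlib import Path


def search_files(paths, needle, ignore_case=False):
    """Yield (file name, line number, line) for every line containing needle."""
    target = needle.lower() if ignore_case else needle
    for path in paths:
        # Read line by line so a 100,000-line file is never held in memory whole.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                haystack = line.lower() if ignore_case else line
                if target in haystack:
                    yield Path(path).name, number, line.rstrip("\n")
```

A call such as `list(search_files(sorted(Path("data").glob("*.txt")), "search phrase"))` (the `data` folder is hypothetical) returns matches in roughly the form that `grep -n "search phrase" *.txt` prints them: file name, line number, matching line.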
Tom Johnson

International Dataset Search - 0 views

  •  
    International Dataset Search. Description: The TWC International Open Government Dataset Catalog (IOGDC) is a linked data application based on metadata scraped from an increasing number of international dataset catalog websites publishing a rich variety of government data. Metadata extracted from these catalog websites is automatically converted to RDF linked data, re-published via the TWC LOGD SPARQL endpoint, and made available for download. The TWC IOGDC demo site features an efficient, reconfigurable faceted browser with search capabilities, offering a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDC highlight the potential for useful linked data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide. Warning: This demo will crash IE7 and IE8. Contributors: Eric Rozell, Jinguang Zheng, Yongmei Shi. Live Demo: http://logd.tw.rpi.edu/demo/international_dataset_catalog_search Notes: This is an experimental demo and some queries may take a long time to respond (30-60 seconds). Please refresh this page if the demo does not load. Our metadata model can be accessed here. The procedure for getting and publishing metadata is described here. The RDF dump of the datasets can be downloaded here. International OGD Catalog Search (searching 736,578 datasets)
  •  
    Loads surprisingly quickly. Try entering your favorite search term in the top blue box. You can use quotes to define phrases.
Tom Johnson

The Overview Project » Document mining shows Paul Ryan relying on the pro... - 0 views

  •  
    Document mining shows Paul Ryan relying on the programs he criticizes, by Jonathan Stray on 11/02/2012. One of the jobs of a journalist is to check the record. When Congressman Paul Ryan became a vice-presidential candidate, Associated Press reporter Jack Gillum decided to examine the candidate through his own words. Hundreds of Freedom of Information requests and 9,000 pages later, Gillum wrote a story showing that Ryan has asked for money from many of the same Federal programs he has criticized as wasteful, including stimulus money and funding for alternative fuels. This would have been much more difficult without special software for journalism. In this case Gillum relied on two tools: DocumentCloud to upload, OCR, and search the documents, and Overview to automatically sort the documents into topics and visualize the contents. Both projects are previous Knight News Challenge winners. But first Gillum had to get the documents. As a member of Congress, Ryan isn't subject to the Freedom of Information Act. Instead, Gillum went to every federal agency - whose files are covered under FOIA - for copies of letters or emails that might identify Ryan's favored causes, names of any constituents who sought favors, and more. Bit by bit, the documents arrived - on paper. The stack grew over weeks, eventually piling up two feet high on Gillum's desk. Then he scanned the pages and loaded them into the AP's internal installation of DocumentCloud. The software converts the scanned pages to searchable text, but there were still 9,000 pages of material. That's where Overview came in. Developed in-house at the Associated Press, this open-source visualization tool processes the full text of each document and clusters similar documents together, producing a visualization that graphically shows the contents of the complete document set. "I used Overview to take these 9,000 pages of documents, and knowing there was probably going to be a lot of garbage or ext
Tom Johnson

mapping texts/texas - 0 views

  •  
    Assessing Language Patterns: A Look at Texas Newspapers, 1829-2008 This visualization plots the language patterns embedded in 232,567 pages of historical Texas newspapers, as they evolved over time and space. For any date range and location, you can browse the most common words (word counts), named entities (people, places, etc), and highly correlated words (topic models). [ About Mapping Texts ]
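The word-count view described above (most common words for a date range and location) can be sketched in a few lines of Python. This is an illustration only, not the Mapping Texts code: the sample records, the stopword list, and the `top_words` function are all made up for the example.

```python
import re
from collections import Counter
from datetime import date

# Invented sample records standing in for OCR'd newspaper pages.
PAGES = [
    {"date": date(1840, 3, 12), "city": "Houston",
     "text": "Cotton prices rise as cotton ships arrive"},
    {"date": date(1861, 5, 2), "city": "Austin",
     "text": "War news reaches Austin"},
    {"date": date(1905, 1, 11), "city": "Houston",
     "text": "Oil found as oil fever spreads"},
]

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"as", "the", "a", "of"}


def top_words(pages, start, end, city=None, n=3):
    """Most common non-stopword tokens on pages dated within [start, end]."""
    counts = Counter()
    for page in pages:
        if not (start <= page["date"] <= end):
            continue
        if city is not None and page["city"] != city:
            continue
        tokens = re.findall(r"[a-z]+", page["text"].lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)
```

With the invented records above, `top_words(PAGES, date(1829, 1, 1), date(1850, 12, 31), n=1)` returns `[("cotton", 2)]`. Named entities and topic models need considerably more machinery than this and are not attempted here.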
Tom Johnson

Introduction to Infographics and Data Visualization: Knight Center's first Massive Open... - 0 views

  •  
    Introduction to Infographics and Data Visualization: Knight Center's first Massive Open Online Course. Registration is now open for the Knight Center's first MOOC (Massive Open Online Course). The course will formally run from Sunday, October 28, 2012 through Saturday, December 8, 2012. Below are course details and how to register. The introductory area of the course is now available to enrolled students. The introductory area includes access to the course syllabus and the introductory overview video for the course. Course Dates: Sunday, October 28, 2012 - Saturday, December 8, 2012 Course Language: English Instructor: Alberto Cairo Course Objectives: * How to analyze and critique infographics and visualizations in newspapers, books, TV, etc., and how to propose alternatives that would improve them. * How to plan for data-based storytelling through charts, maps, and diagrams. * How to design infographics and visualizations that are not just attractive but, above all, informative, deep, and accurate. * The rules of graphic design and of interaction design, applied to infographics and visualizations. * Optional: How to use Adobe Illustrator to create infographics.
Tom Johnson

ReConstitution 2012 - 0 views

  •  
    "ReConstitution 2012, a fun experiment by Sosolimited, processes transcripts from the presidential debates, and recreates them with animated words and charts. Part data visualization, part experimental typography, ReConstitution 2012 is a live web app linked to the US Presidential Debates. During and after the three debates, language used by the candidates generates a live graphical map of the events. Algorithms track the psychological states of Romney and Obama and compare them to past candidates. The app allows the user to get beyond the punditry and discover the hidden meaning in the words chosen by the candidates. As you let the transcript run, numbers followed by their units (like "18 months") flash on the screen, and trigger words for emotions like positivity, negativity, and rage are highlighted yellow, blue, and red, respectively. You can also see the classifications in graph form. There are a handful of less straightforward text classifications for truthy and suicidal, which are based on linguistic studies, which in turn are based on word frequencies. These estimates are more fuzzy. So, as the creators suggest, it's best not to interpret the project as an analytical tool, but as a fun way to look back at the debate, which it is. It's pretty fun to watch."
Tom Johnson

The Open Data Handbook - Open Data Manual - 0 views

  •  
    "The Open Data Handbook: This handbook discusses the legal, social and technical aspects of open data. It can be used by anyone but is especially designed for those seeking to open up data. It discusses the why, what and how of open data - why to go open, what open is, and how to 'open' data. To get started, you may wish to look at the Introduction. You can navigate through the report using the Table of Contents (see sidebar or below). We warmly welcome comments on the text and will incorporate feedback as we go forward. We also welcome contributions or suggestions for additional sections and areas to examine."
Tom Johnson

The Overview Project » Using Overview to analyze 4500 pages of documents on s... - 0 views

  •  
    Using Overview to analyze 4500 pages of documents on security contractors in Iraq, by Jonathan Stray on 02/21/2012. This post describes how we used a prototype of the Overview software to explore 4,500 pages of incident reports concerning the actions of private security contractors working for the U.S. State Department during the Iraq war. This was the core of the reporting work for our previous post, where we reported the results of that analysis. The promise of a document set like this is that it will give us some idea of the broader picture, beyond the handful of really egregious incidents that have made headlines. To do this, in some way we have to take into account most or all of the documents, not just the small number that might match a particular keyword search. But at one page per minute, eight hours per day, it would take about 10 days for one person to read all of these documents - to say nothing of taking notes or doing any sort of followup. This is exactly the sort of problem that Overview would like to solve. The reporting was a multi-stage process: (1) splitting the massive PDFs into individual documents and extracting the text; (2) exploration and subject tagging with the Overview prototype; (3) random sampling to estimate the frequency of certain types of events; (4) followup and comparison with other sources.
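Two of the quantitative steps in the excerpt above can be checked with a short Python sketch: the reading-time arithmetic, and estimating how often a type of event occurs from a random sample. The excerpt does not say how the sampling was actually done, so `estimate_share`, its parameters, and the normal-approximation margin of error are assumptions chosen for illustration.

```python
import math
import random


def reading_days(pages, pages_per_minute=1.0, hours_per_day=8.0):
    """Working days needed to read `pages` at the given pace."""
    minutes = pages / pages_per_minute
    return minutes / 60.0 / hours_per_day


def estimate_share(documents, is_event, sample_size, seed=0):
    """Estimate the share of documents for which is_event(doc) is true.

    Returns (estimate, margin), where margin is an approximate 95% margin
    of error from the normal approximation. No finite-population correction
    is applied, so the margin is conservative for large samples.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = rng.sample(documents, sample_size)
    hits = sum(1 for doc in sample if is_event(doc))
    share = hits / sample_size
    margin = 1.96 * math.sqrt(share * (1 - share) / sample_size)
    return share, margin
```

`reading_days(4500)` gives 4500 / 60 / 8 = 9.375 working days, consistent with the "about 10 days" quoted above. For the sampling step, a reporter would hand-code only the sampled documents and pass a function such as `lambda doc: doc["coded_as"] == "weapon discharged"` (a hypothetical field and label) as `is_event`.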
Tom Johnson

Data journalism at the Guardian: what is it and how do we do it? | News | guardian.co.uk - 0 views

  •  
    Data journalism at the Guardian: what is it and how do we do it? Simon Rogers: Our 10 point guide to data journalism and how it's changing. Here's an interesting thing: data journalism is becoming part of the establishment. Not in an Oxbridge elite kind of way (although here's some data on that) but in the way it is becoming the industry standard. Two years ago, when we launched the Datablog, all this was new. People still asked if getting stories from data was really journalism and not everyone had seen Adrian Holovaty's riposte. But once you've had MPs' expenses and Wikileaks, the startling thing is that no-one asks those questions anymore. Instead, they want to know, "how do we do it?"
Tom Johnson

SpeakerText | Transcription, Captions, Interactive Transcripts - 0 views

  •  
    How It Works: SpeakerText combines artificial and human intelligence to offer low-cost, high-quality video transcription. (1) Sign up for an account. (2) Import your video library (we currently support Ooyala, Brightcove, YouTube, Vimeo, SoundCloud, Wistia and Blip.tv), or add your videos one-by-one. (3) Choose which videos you want to transcribe. (4) Check out and pay. (5) SpeakerText sends you an email when your transcripts are finished. (6) Download your transcripts as text or XML files from SpeakerText, or install CaptionBox and download your transcripts as HTML code to place on your website. Guarantee: We guarantee that your transcripts will get back to you in less than 72 hours and be of the highest quality. Give it a try now! http://speakertext.com
  •  
    This is the first I've heard of a tool like this doing a creditable job. I suspect there is some machine transcription going on, but then the first pass is sent to India or Jamaica to be polished. Here's an example of how the NYTimes used this tool: http://www.nytimes.com/interactive/2009/01/20/us/politics/20090120_INAUGURAL_ANALYSIS.html
Tom Johnson

RegExr: Free Online RegEx Testing Tool - 0 views

  •  
    "RegExr is an online tool for editing and testing Regular Expressions (RegExp / RegEx). It provides a simple interface to enter RegEx expressions, and visualize matches in real-time editable source text. It also provides a handy RegExp snippet sidebar with descriptions and usage examples to make it easier to learn Regular Expressions through trial and error. It isn't as powerful as a product like RegExBuddy, but it has the advantage of being online and free. I will be releasing a free desktop version for Mac OSX and Windows built with AIR in the next day or two. So far this has only taken a day of development, and the main app is only 150 lines of code. Flex 3 makes this kind of app so darn simple to put together."
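    The same try-a-pattern-and-see-the-matches workflow can be approximated offline. The sketch below uses Python's standard `re` module, not RegExr itself, so pattern syntax may differ in places; the helper name `find_matches` and the sample text are made up for illustration.

```python
import re


def find_matches(pattern, text):
    """Return (start, end, matched_text) for every match of `pattern` in `text`."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


sample = "Filed 2011-05-12, updated 2011-12-19."

# Print the position and text of each ISO-style date found in the sample.
for start, end, value in find_matches(r"\d{4}-\d{2}-\d{2}", sample):
    print(start, end, value)
```

    Reporting the start and end offsets is a rough text-only stand-in for the match highlighting an interactive tester provides.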
Tom Johnson

FusionTablesLayer Builder - 0 views

  • FusionTablesLayer Builder This wizard helps you create the HTML for a map with a FusionTablesLayer and search element (either text-based search or select menu). After creating your map, you can copy and paste the HTML code in the textarea below to display the map on your own website! Please submit bug reports here: Issue Tracker
  •  
    Click on the "Add another feature" drop-down to add an additional layer or search box.
Tom Johnson

Graphviz - Graph Visualization Software - 0 views

  •  
    Welcome to Graphviz. What is Graphviz? Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains. Features: The Graphviz layout programs take descriptions of graphs in a simple text language, and make diagrams in useful formats, such as images and SVG for web pages, PDF or Postscript for inclusion in other documents; or display in an interactive graph browser. (Graphviz also supports GXL, an XML dialect.) Graphviz has many useful features for concrete diagrams, such as options for colors, fonts, tabular node layouts, line styles, hyperlinks, and custom shapes.
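    To give a feel for the "simple text language" mentioned above, here is a small Python sketch that writes a graph description in Graphviz's DOT format. The helper `to_dot` and the node names are hypothetical; the sketch only builds text and does not call Graphviz.

```python
def to_dot(edges, name="G"):
    """Render directed edges as a Graphviz DOT description.

    Node names are wrapped in double quotes; this sketch does not escape
    quotes inside names, so keep them plain.
    """
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("reporter", "documents"), ("documents", "story")]))
```

    Saved to a file such as graph.dot, the output could then be rendered with the Graphviz `dot` program, for example `dot -Tsvg graph.dot -o graph.svg`.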
Tom Johnson

WinMerge - 0 views

  •  
    What is WinMerge? WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
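    For readers without Windows, the basic idea of file differencing can be sketched with Python's standard `difflib` module. This is not how WinMerge works internally; the function name `unified_diff_text` and the sample strings are assumptions for the example.

```python
import difflib


def unified_diff_text(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff of two strings, or "" when they are identical."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile=old_name, tofile=new_name
    )
    return "".join(diff)


before = "headline\nfirst draft\n"
after = "headline\nsecond draft\n"
print(unified_diff_text(before, after))
```

    Lines prefixed with "-" exist only in the old text and lines prefixed with "+" only in the new one, which is the textual counterpart of a side-by-side visual comparison.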
Tom Johnson

Public sector needs to improve quality of information, warns Eurim | Guardian Governmen... - 0 views

  • Public sector needs to improve quality of information, warns Eurim. Parliamentary group gives cautious welcome to the EU's plans to open up more public sector data. Sade Laja, Guardian Professional, Monday 19 December 2011 07.08 EST. Sharing data on public services could have serious consequences unless the material has been valued, maintained and protected and the original reasons for its collection have been taken into account, the Information Society Alliance (Eurim) has warned. In a report on the quality of public sector information, the group says that the drive to put central and local government data online, open to public scrutiny, has revealed the long-standing problems with quality that lie behind the reluctance of some departments and agencies to trust one another's data. It adds that it is important that decisions on spending cuts are based on good quality information.
  •  
    An important article. Please read.
Tom Johnson

SchemaSpy - 0 views

  •  
    SchemaSpy: Graphical Database Schema Metadata Browser, by John Currier. Do you hate starting on a new project and having to try to figure out someone else's idea of a database? Or are you in QA and the developers expect you to understand all the relationships in their schema? If so then this tool's for you. SchemaSpy is a Java-based tool (requires Java 5 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It's also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.
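    The idea of reading child and parent table relationships out of schema metadata can be illustrated on a toy scale. The sketch below is not SchemaSpy (which is a Java tool); it swaps in Python's built-in `sqlite3` module and SQLite's `PRAGMA foreign_key_list`, and the `reporter` and `story` tables are invented for the example.

```python
import sqlite3


def foreign_keys(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    links = []
    for table in tables:
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            links.append((table, row[3], row[2], row[4]))
    return links


# Hypothetical two-table schema held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reporter (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE story (
        id INTEGER PRIMARY KEY,
        reporter_id INTEGER REFERENCES reporter(id)
    );
    """
)
print(foreign_keys(conn))
```

    Each tuple is one edge of an entity-relationship diagram, so the result could in principle be handed to a graph drawing tool.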
Tom Johnson

Storytelling with Maps - 0 views

  •  
    "About story maps Story maps combine intelligent Web maps with Web applications and templates that incorporate text, multimedia, and interactive functions. Story maps inform, educate, entertain, and inspire people about a wide variety of topics."