DJCamp2011: group items tagged "data cleaning"

Tom Johnson

Intro to cleaning data | Knight Center - Berkeley - 0 views

  •  
    Understanding how to clean data is an important skill every reporter needs. Demographic, financial and other data is available on a city, county, state and national level in the United States. But understanding how to take a large data file and distill it into a usable form can be daunting. In this tutorial, you'll learn how spreadsheets work, basic data-cleaning workflow and how to use formulas and functions to clean data. This is a general tutorial and it doesn't delve deeply into one program. We'll use Microsoft Excel but most of the same techniques work in Google Spreadsheets and other programs.
Tom Johnson

Needlebase - for acquiring, integrating, cleansing, analyzing and publishing data on th... - 1 views

  •  
    ITA Software is proud to introduce Needlebase™, a revolutionary platform for acquiring, integrating, cleansing, analyzing and publishing data on the web. Using Needlebase through a web browser, without programmers or DBAs, your data team can easily do three things. Acquire data from multiple sources: a simple tagging process quickly imports structured data from complex websites, XML feeds, and spreadsheets into a unified database of your design. Merge, deduplicate and cleanse: Needlebase uses intelligent semantics to help you find and merge variant forms of the same record, and your merges, edits and deletions persist even after the original data is refreshed from its source. Build and publish custom data views: use Needlebase's visual UI and powerful query language to configure exactly your desired view of the data, whether as a list, table, grid, or map; then, with one click, publish the data for others to see, or export a feed of the clean data to your own local database. Needlebase dramatically reduces the time, cost, and expertise needed to build and maintain comprehensive databases of practically anything. Read on to learn more about Needlebase's capabilities and our early adopters' success stories, or watch our tutorial videos. Then sign up to get started! http://needlebase.com
Tom Johnson

When Maps Shouldn't Be Maps « Matthew Ericson - ericson.net - 0 views

  •  
    Matthew Ericson: Often, when you get data that is organized by geography - say, for example, food stamp rates in every county, high school graduation rates in every state, election results in every House district, racial and ethnic distributions in each census tract - the impulse is that since the data CAN be mapped, the best way to present the data MUST be a map. You plug the data into ArcView, join it up with a shapefile, export to Illustrator, clean up the styles and voilà! Instant graphic ready to be published. And in many cases, that's the right call.
Tom Johnson

Data Science Central - 0 views

  •  
    Welcome to Data Science Central! Data Science Central is the industry's one-stop resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central (DSC) provides a true community experience through social interaction, peer-to-peer technical support, the latest in technology, tools and trends, and even job opportunities. We look forward to hearing your feedback as we grow this community of professionals in our exciting industry during times of dramatic change.
Tom Johnson

Mr. People - Data cleaning - 1 views

  •  
    Years ago, while trying to clean up the names of donors in campaign finance data from the Federal Election Commission, I hacked together a Perl module - loosely based on the Lingua-EN-NameParse module - to standardize names. One port to Ruby later, I've finally put together a Web front end for it. Try it out on the site: paste your own data in or try the sample data. To use the people Ruby gem in your own scripts, run "sudo gem install people", then read the documentation. Suggestions? Send them to mrpeople@ericson.net
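As a rough illustration of the kind of name standardization described above, here is a minimal Python sketch. It is not the people gem or the Lingua-EN-NameParse module and does not reproduce their rules; the function name standardize_name, the suffix list, and the "LAST, FIRST" handling are all assumptions made for this example.

```python
import re

# Hypothetical suffix list for this sketch; a real parser would know many more.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def standardize_name(raw: str) -> str:
    """Return 'First Middle Last Suffix' with normalized spacing and case."""
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", raw.strip())

    # Turn 'LAST, FIRST MIDDLE' into 'FIRST MIDDLE LAST'.
    if "," in text:
        last, rest = [part.strip() for part in text.split(",", 1)]
        text = f"{rest} {last}"

    words, suffix = [], []
    for word in text.split(" "):
        bare = word.strip(".,").lower()
        if not bare:
            continue
        if bare in SUFFIXES:
            # Roman numerals stay upper case; Jr and Sr get title case.
            suffix.append(bare.upper() if bare[0] == "i" else bare.capitalize())
        else:
            words.append(bare.capitalize())

    # Suffixes always go at the end of the standardized name.
    return " ".join(words + suffix)


if __name__ == "__main__":
    for sample in ["  ERICSON,  MATTHEW E ", "smith, john jr.", "Jane   Q.  Public III"]:
        print(repr(sample), "->", standardize_name(sample))
```

This simple capitalization rule mishandles names such as O'Brien or McDonald, and it ignores couples and titles, which is the sort of case a dedicated name parser is built to handle.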
Tom Johnson

How to make searchable, Web-based Google charts | Poynter. - 0 views

  •  
    By Michelle Minkoff. Published June 3, 2011 12:01 am; updated June 2, 2011 10:22 pm. A lot of data visualization requires the technical expertise of a programmer and skills that take time and resources to develop. A rise in free tools, however, has made it easier to make interactive graphs and charts, whether you're a designer, developer, Web producer or hobbyist. The Google Visualization API, for instance, gives you options without making the work too complicated. I've created a tutorial to help you make simple, Web-based Google charts. In the first example, we'll craft an interactive bar chart that compares the numbers of tornado-related deaths in the United States throughout the past four years. We'll use data from the National Oceanic and Atmospheric Administration (NOAA); a cleaned version of this data, formatted as a comma-delimited file (CSV), can be downloaded from the article. http://www.poynter.org/how-tos/newsgathering-storytelling/126595/how-to-make-simple-web-based-google-charts
Tom Johnson

MDA Analytics - 0 views

  •  
    An interesting example of yet another "next generation" data analysis and presentation tool. Demos are available at http://www.lavastorm.com/ and the emphasis is on visualizing the data-analytic method while doing the analysis.
Tom Johnson

Searchable Map Template with Google Fusion Tables - 0 views

  •  
    Turn a spreadsheet into a searchable map. You want to put your data on a searchable, filterable map. This is a free, open source tool to help you do it. Features: clean, full-screen layout; mobile and tablet friendly, using responsive design (new); address search (with variable radius); geolocation (find me!); RESTful URLs for sharing searches (new); results count (using Google's Fusion Tables API); ability to easily add additional search filters (checkboxes, sliders, etc.); all done with HTML, CSS and JavaScript, with no server-side code required. Technologies used: Google Fusion Tables, Google Maps API V3, jQuery, jQuery Address, Twitter Bootstrap. Note: This template now supports the Fusion Tables v1 API. For more info on this, see their migration guide.
Tom Johnson

Future Journalism Project - Jonathan Stray of the Associated Press on... - 0 views

  •  
    Jonathan Stray of the Associated Press on investigating thousands (or millions) of documents by visualizing clusters. The presentation is from February 2011 at the National Institute for Computer-Assisted Reporting. The visualizations were built with the multidimensional scaling algorithm Glimmer.
Tom Johnson

RegExr: Free Online RegEx Testing Tool - 0 views

  •  
    "RegExr is an online tool for editing and testing Regular Expressions (RegExp / RegEx). It provides a simple interface to enter RegEx expressions, and visualize matches in real-time editable source text. It also provides a handy RegExp snippet sidebar with descriptions and usage examples to make it easier to learn Regular Expressions through trial and error. It isn't as powerful as a product like RegExBuddy, but it has the advantage of being online and free. I will be releasing a free desktop version for Mac OSX and Windows built with AIR in the next day or two. So far this has only taken a day of development, and the main app is only 150 lines of code. Flex 3 makes this kind of app so darn simple to put together."