DJCamp2011: Group items tagged datasets

Tom Johnson

International Dataset Search - 0 views

  • International Dataset Search. Description: The TWC International Open Government Dataset Catalog (IOGDC) is a linked data application based on metadata scraped from an increasing number of international dataset catalog websites publishing a rich variety of government data. Metadata extracted from these catalog websites is automatically converted to RDF linked data, re-published via the TWC LOGD SPARQL endpoint, and made available for download. The TWC IOGDC demo site features an efficient, reconfigurable faceted browser with search capabilities, offering a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDC highlight the potential for useful linked data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide. Warning: This demo will crash IE7 and IE8. Contributors: Eric Rozell, Jinguang Zheng, Yongmei Shi. Live Demo: http://logd.tw.rpi.edu/demo/international_dataset_catalog_search Notes: This is an experimental demo, and some queries may take a while to respond (30 to 60 seconds). Please refresh this page if the demo does not load. Our metadata model can be accessed here. The procedure for getting and publishing metadata is described here. The RDF dump of the datasets can be downloaded here. International OGD Catalog Search (searching 736,578 datasets)
  •  
    Loads surprisingly quickly. Try entering your favorite search term in the top blue box. You can use quotes to define phrases.
Tom Johnson

Mining of Massive Datasets - 0 views

  •  
    Mining of Massive Datasets. The book has now been published by Cambridge University Press. A hard copy can be obtained here. By agreement with the publisher, you can still download it free from this page. Cambridge Press does, however, retain copyright on the work, and we expect that you will acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3. --- Anand Rajaraman (@anand_raj) and Jeff Ullman. Downloads: Download the Complete Book (340 pages, approximately 2MB). Download chapters of the book: Preface and Table of Contents; Chapter 1, Data Mining; Chapter 2, Large-Scale File Systems and Map-Reduce; Chapter 3, Finding Similar Items; Chapter 4, Mining Data Streams; Chapter 5, Link Analysis; Chapter 6, Frequent Itemsets; Chapter 7, Clustering; Chapter 8, Advertising on the Web; Chapter 9, Recommendation Systems; Index.
Tom Johnson

Resources - Data and Software - Capturing human rights data in Analyzer - 0 views

  • Capturing human rights data in Analyzer. Human rights groups collect data containing details of human rights abuses from various sources, including medical records, newspaper articles, witness testimonies, letters, interviews, and official reports and documents. Analyzer can be used to capture this data for analysis. Data is coded according to the "Who did what to whom" model and entered into the capture set of Analyzer. Data about the source of the information is entered in the source tab, shown in Figure 1. (Note: the data in the figures shown here, unless otherwise indicated, are from a sample, not an actual, dataset.)
Tom Johnson

Open Data Directory - 0 views

  • A free search engine for data sets published by governments, private companies and other organizations. It now indexes 255,180 datasets from many sources.
Tom Johnson

Beautiful but Terrible Pyramids: Tableau Edition - The Excel Charts Blog - 0 views

  • Beautiful but Terrible Pyramids: Tableau Edition, by Jorge Camoes on July 12, 2011. Well, here is my first chart in Tableau, finally! After publishing my experiments with population pyramids (using Excel), I thought I could try Tableau Public with the same dataset from the US Census Bureau. Here is the result. I had never really played with Tableau Public before, and it took me less than an hour to upload the data and make this chart, without reading a manual or watching a tutorial (changing line colors was the hard part). It says a lot about its usability. http://www.excelcharts.com/blog/beautiful-but-terrible-pyramids-tableau-edition/
  •  
    Select your favorite nation. Note how this could be used to illustrate population changes for a single nation over time or nation-to-nation comparisons.
Tom Johnson

Visualization contests around the corner - 0 views

  •  
    Visualization contests around the corner. May 25, 2011. The best way to learn how to visualize data is to grab a dataset and see what you can do with it. You can read as many tips and tricks as you want, but you're not going to get any better until you actually try. Contests are a fun way to do this. Participate: So here are a handful of visualization contests to get your hands dirty. Hey, you might even win a couple of thousand dollars. Not that money matters to you, because as we all know, learning is your reward. Hacking Education - A contest for developers and data crunchers. DonorsChoose.org has inspired $80 million in giving from 400,000 donors, helping 165,000 teachers at 43,000 schools, and the donation site has opened up this data. Can you do something with it? Deadline: June 30, 2011. Data In Sight - A hands-on competition in San Francisco's SoMa district with surprise data sources. Some talks, lunch, dinner, and a 24-hour hackathon. Event date: June 24, 2011 (better to register your team early). Tableau Interactive Viz Contest - This one is coming up the quickest, but is the most straightforward. Plus, you get a t-shirt just for entering. Grab some business, finance, or real estate data and go to town with Tableau Public. Deadline: June 3, 2011.
Tom Johnson

Europeana Linked Open Data - 0 views

  • Europeana Linked Open Data. The data.europeana.eu pilot is part of Europeana's ongoing effort to make its metadata available as Linked Open Data on the Web. It allows others to access metadata collected from Europeana providers via standard Web technologies, enrich this metadata, and give the enriched metadata back to the providers. Links between Europeana resources and other resources in the Linked Data Web will enable discovery of semantically related resources, as, say, when two artworks are created by artists who are related to each other. The data is represented in the Europeana Data Model (EDM), and the described resources are addressable and dereferenceable by their URIs. For instance, http://data.europeana.eu/item/09404/C3C50BD0958EE18ECE1B8F93780DC84D8273664F leads either to an HTML page on the Europeana portal for the object it identifies or to raw, machine-processable data on this object. Disclaimer: data.europeana.eu is currently in pilot stage, and can thus be changed at any moment! Your feedback is more than welcome, and may lead to updates in the prototype service. What's in here for you? data.europeana.eu currently contains metadata on 3.5 million texts, images, videos and sounds gathered by Europeana. These objects come from content providers who have reacted early and positively to Europeana's initiative of promoting more open data and new data exchange agreements. These collections come from 10 direct Europeana providers encompassing around 300 cultural institutions from 17 countries. They cover a great variety of heritage objects, such as this 18th-century view of a German landscape from the Polish National Museum in Warsaw, or Neil Robson's memories of the herring business from the Tyne and Wear Archives & Museums. For more information, see our datasets page.
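A URI that leads "either to an HTML page ... or to raw, machine-processable data" is commonly implemented with HTTP content negotiation, where the client states the format it wants in the Accept header. The sketch below only builds such a request and sends nothing. That the pilot selects the format from the Accept header, and that it serves application/rdf+xml, are assumptions not confirmed by the description above; the function name build_linked_data_request is hypothetical.

```python
from urllib.request import Request

# Example item URI quoted in the Europeana description.
ITEM_URI = (
    "http://data.europeana.eu/item/09404/"
    "C3C50BD0958EE18ECE1B8F93780DC84D8273664F"
)


def build_linked_data_request(uri, want_rdf=True):
    """Build (but do not send) a request that asks for RDF or HTML.

    Assumption: the server chooses the representation from the Accept
    header. The pilot's actual behaviour and supported media types are
    not stated in the source text.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


rdf_request = build_linked_data_request(ITEM_URI)
html_request = build_linked_data_request(ITEM_URI, want_rdf=False)
print(rdf_request.get_header("Accept"))
print(html_request.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual fetch. Since the service is described as a pilot that "can thus be changed at any moment", the response may differ from what this sketch assumes.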
Tom Johnson

World Bank World Development Indicators - BuzzData - 0 views

  •  
    The primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates. Complete dataset available from: http://data.worldbank.org/data-catalog/world-development-indicators
Tom Johnson

The special trick that helps identify dodgy stats | Ben Goldacre | Comment is free | Th... - 0 views

  • The results were fun. Greece – whose economy has tanked – showed the largest and most suspicious deviation from Benford's law of any country in the euro.
  •  
    If you go to the website testingbenfordslaw.com, you'll see the proportions of each leading digit from lots of real-world datasets, graphed alongside what Benford's law predicts they should be, with data ranging from Twitter users' follower counts to the number of books in different libraries across the US.
  •  
    The special trick that helps identify dodgy stats Using Benford's law, forensic statisticians can spot suspicious patterns in the raw numbers, and estimate the chances figures have been tampered with
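The comparison described above can be sketched in a few lines. Benford's law predicts that a leading digit d (1 to 9) appears with probability log10(1 + 1/d), so 1 leads about 30.1% of the time. The sample data here (the first 100 powers of 2) and the function names are illustrative assumptions, not taken from the article or from testingbenfordslaw.com.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(values):
    """Observed share of each leading digit among nonzero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def max_deviation(values):
    """Largest absolute gap between observed and expected shares."""
    expected = benford_expected()
    observed = leading_digit_shares(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


# Illustrative sample only (not from the article): powers of 2.
sample = [2 ** k for k in range(100)]
print(round(benford_expected()[1], 3))  # 0.301
print(max_deviation(sample))
```

A large maximum deviation is only a prompt for a closer look. Forensic use would add a proper statistical test, and it applies only to data that spans several orders of magnitude.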
Tom Johnson

Corporate Accountability Data in Influence Explorer - Sunlight Labs: Blog - 0 views

  •  
    Again, US-centric, but this might generate some ideas of what could be accomplished in your city/nation. Late yesterday we announced a bunch of new features for Influence Explorer: http://sunlightlabs.com/blog/2011/ie-corporate-accountability/ As the blog post explains, you can now find information about a corporation's EPA violations, federal advisory committee memberships, and participation in the rulemaking process -- all in one place. I wanted to highlight that last feature a bit more, though. To my knowledge, this is the first time that the full corpus of public comments submitted to regulations.gov has been available for bulk download and analysis. This isn't a coincidence: regulations.gov is built using technologies that make scraping it unusually difficult. This is unfortunate, since everyone seems to agree that federal rulemakings are gaining in importance -- both because of congressional gridlock that leaves the regulatory process as a second-best option, and because of calls to simplify the regulatory landscape as a pro-growth measure. It's an area where influence is certainly exerted -- rulemakers are obliged to review every comment -- but little attention is paid to who's flooding dockets with comments, and which directions rules are being pushed. It's taken us several months to develop a reliable solution and to obtain past rulemakings, but we now have the data in hand. We plan to do much more with this dataset, and we're hoping that others will want to dig in, too. You can find a link to the bulk download options in the post above -- the full compressed archive of extracted text and metadata is ~16GB, but we've provided options for grabbing individual agencies' or dockets' data. If anyone wants the original documents (PDFs, DOCs, etc) we can talk through how to make that happen, but as they clock in at 1.5TB we'll want to make sure folks know what they're getting into before we spend the time and bandwidth. Finally, note that we currently o
Tom Johnson

http://theyrule.net - 1 views

  •  
    They Rule. Overview: They Rule aims to provide a glimpse of some of the relationships of the US ruling class. It takes as its focus the boards of some of the most powerful U.S. companies, which share many of the same directors. Some individuals sit on 5, 6 or 7 of the top 1000 companies. It allows users to browse through these interlocking directories and run searches on the boards and companies. A user can save a map of connections complete with their annotations and email links to these maps to others. They Rule is a starting point for research about these powerful individuals and corporations. Context: A few companies control much of the economy, and oligopolies exert control in nearly every sector of the economy. The people who head up these companies swap on and off the boards from one company to another, and in and out of government committees and positions. These people run the most powerful institutions on the planet, and we have almost no say in who they are. This is not a conspiracy; they are proud to rule, yet these connections of power are not always visible to the public eye. Karl Marx once called this ruling class a 'band of hostile brothers.' They stand against each other in the competitive struggle for the continued accumulation of their capital, but they stand together as a family supporting their interests in perpetuating the profit system as a whole. Protecting this system can require the cover of a 'legitimate' force - and this is the role that is played by the state. An understanding of this system cannot be gleaned from looking at the inter-personal relations of this class alone, but rather from how they stand in relation to other classes in society. Hopefully They Rule will raise larger questions about the structure of our society and for whose benefit it is run. The Data: We do not claim that this data is 100% accurate at all times. Corporate directors have a habit of dying, quitting boards, joining new ones and most frustratingly passing on their name
  •  
    I think this data must be very useful to the people in Occupy Wall Street
Tom Johnson

Open Data Cook Book - 0 views

  •  
    Open Data Cook Book: Making Open Data Accessible for Everyone. About the Cook Book: The open data cook book is collecting recipes for ways to find and use open data, particularly open data of social value - such as open government data, or open data for campaigners and charities. Working with data can seem scary. But it doesn't have to be. There are many different ways to make data useful - and lots of different gadgets to help you. Take a look at the growing list of cook book recipes to find simple step by step ideas for making use of open data. Recipes: You can find a list of the recipes so far here. Drafts, ideas and notes: In the cook's notebook you can find draft notes on using different datasets and sketches that might develop into recipes in future. Get Involved: Find out how to get involved here or jump right in and create a recipe. Tweet with the #opendatacookbook tag, or bookmark content on del.icio.us 'opendatacookbook' to share with the project. Join the mailing list to discuss developments. Update: After a brief experiment with Drupal as a CMS for the cook book - we've switched to DokuWiki for a bit to make compiling a list of recipes a lot easier before we work out the best way to run the Cook Book.