
Open Intelligence / Web 3X (Social + Mobile): Group items tagged "dataset"


Dan R.D.

Infochimps - The Promise of Big Data [27Apr10]

  • While the two sets are among those available at no charge, Infochimps sells the more in-depth and extensive datasets it has derived from Twitter, proving there's a continued marketplace for this sort of information. For $300 you can buy the dataset containing an hour-by-hour breakdown of the occurrence of hashtags, URLs, and smileys in the 1.6 billion tweets created between March 2006 and March 2010. For $250 you can purchase a dataset extracted from those same 1.6 billion tweets with all mentions of stock tokens and related keywords. While information from Twitter has been culled to assess which websites we'll like and which movies will perform well, analysis of Twitter and of the expanding social graph is really just beginning. One example is the ability to track when stock names are mentioned. The new dataset of stock information offered by Infochimps hopes to demonstrate to the financial industry what the music and film industries already know: big data is a powerful prediction tool. See more at www.readwriteweb.com
  • Austin, TX-based Infochimps is flinging some interesting Twitter-derived datasets into the marketplace. If its work helps the financial industry, then we're bound to see a "Big Data Industry" emerge right beside it.
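The hour-by-hour breakdown described above can be sketched in a few lines. This is a minimal illustration, not Infochimps' actual pipeline: the `(timestamp, text)` input format, the function name `hourly_breakdown`, and the regular expressions used to recognize hashtags, URLs, and smileys are all hypothetical assumptions.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical patterns: the source does not describe how Infochimps
# actually recognized hashtags, URLs, or smileys.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
SMILEY = re.compile(r"[:;]-?[)(DP]")

def hourly_breakdown(tweets):
    """Count hashtags, URLs and smileys per hour bucket.

    `tweets` is an iterable of (iso_timestamp, text) pairs, an assumed
    input format chosen only for this illustration.
    """
    buckets = defaultdict(Counter)
    for ts, text in tweets:
        # Truncate each timestamp to its hour, e.g. "2010-03-01 14:00".
        hour = datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00")
        counts = buckets[hour]
        counts["hashtags"] += len(HASHTAG.findall(text))
        counts["urls"] += len(URL.findall(text))
        counts["smileys"] += len(SMILEY.findall(text))
    return dict(buckets)

sample = [
    ("2010-03-01T14:05:00", "Loving #bigdata :) http://example.com"),
    ("2010-03-01T14:40:00", "#data #twitter ;-)"),
    ("2010-03-01T15:02:00", "no tags here"),
]
print(hourly_breakdown(sample))
```

Run over 1.6 billion tweets, the same per-hour tally would be computed in parallel and merged, but the counting logic per record stays this simple.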
Marc-Alexandre Gagnon

New 5 Billion Page Web Index with Page Rank Now Available for Free from Common Crawl Fo...

  • A freely accessible index of 5 billion web pages, their page rank, their link graphs and other metadata, hosted on Amazon EC2, was announced today by the Common Crawl Foundation. "It is crucial [in] our information-based society that Web crawl data be open and accessible to anyone who desires to utilize it," writes Foundation director Lisa Green on the organization's blog.
  • The Foundation is an organization dedicated to leveraging the falling costs of crawling and storage for the benefit of "individuals, academic groups, small start-ups, big companies, governments and nonprofits." It is led by Gilad Elbaz, the forefather of Google AdSense and the CEO of data platform startup Factual. Joining Elbaz on the Foundation board are internet public domain champion Carl Malamud and semantic web serial entrepreneur Nova Spivack. Director Lisa Green came to the Foundation by way of Creative Commons.
  • The Foundation explains the scope of the project thusly: "Common Crawl is a Web Scale crawl, and as such, each version of our crawl contains billions of documents from the various sites that we are successfully able to crawl. This dataset can be tens of terabytes in size, making transfer of the crawl to interested third parties costly and impractical. In addition to this, performing data processing operations on a dataset this large requires parallel processing techniques, and a potentially large computer cluster. Luckily for us, Amazon's EC2/S3 cloud computing infrastructure provides us with both a theoretically unlimited storage capacity coupled with localized access to an elastic compute cloud."
  • The organization was formed three years ago, has only now begun talking about itself publicly, and believes that free access to all this information could lead to "a new wave of innovation, education and research."
  • Open Web Advocate James Walker agrees: "An openly accessible archive of the web - that's not owned and controlled by Google - levels the playing field pretty significantly for research and innovation."
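The Foundation's point about needing parallel processing follows the familiar map/reduce shape: process each shard of the crawl independently, then merge the partial results. The sketch below is a hypothetical illustration, not Common Crawl's tooling: the shard format (lists of `(url, html)` pairs), the crude `href` regex, and the names `map_shard` and `inlinks_by_host` are assumptions, and a local thread pool stands in for the cluster of EC2 machines a real job would use.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from urllib.parse import urlparse

# Hypothetical, crude link extraction; a real job would use an HTML parser.
HREF = re.compile(r'href="(https?://[^"]+)"')

def map_shard(shard):
    """Map step: count link targets by host within one shard of (url, html) records."""
    counts = Counter()
    for _url, html in shard:
        for link in HREF.findall(html):
            counts[urlparse(link).netloc] += 1
    return counts

def inlinks_by_host(shards, workers=4):
    """Run map_shard on every shard concurrently, then reduce by summing Counters."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_shard, shards)
        return reduce(lambda a, b: a + b, partials, Counter())

shards = [
    [("http://a.org/", '<a href="http://b.org/x">x</a> <a href="http://c.org/">c</a>')],
    [("http://c.org/", '<a href="http://b.org/y">y</a>')],
]
print(inlinks_by_host(shards))
```

Summing `Counter` objects is the reduce step; because the addition is associative, partial counts can be merged in any order, which is what allows the work to be split across many machines.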
D'coda Dcoda

Enipedia - Energy Industry Data - Data Packages - CKAN - the Data Hub

shared by D'coda Dcoda on 11 Jun 11
  • Source: http://enipedia.tudelft.nl. Enipedia is an active exploration into the applications of wikis and the semantic web to energy and industry issues. Through this we seek to create a collaborative environment for discussion, while also providing the tools that allow data from different sources to be connected, queried, and visualized from different perspectives.
  • Includes a list of all known formats and datasets for Enipedia.