
Future of the Web / Group items tagged hadoop


Paul Merrell

The New York Times Archives + Amazon Web Services = TimesMachine - Open - Code - New Yo... - 0 views

  • TimesMachine is a collection of full-page image scans of the newspaper from 1851–1922 (i.e., the public domain archives). Organized chronologically and navigated by a simple calendar interface, TimesMachine provides a unique way to traverse the historical archives of The New York Times.
  • Using Amazon Web Services, Hadoop and our own code, we ingested 405,000 very large TIFF images, 3.3 million articles in SGML and 405,000 XML files mapping articles to rectangular regions in the TIFFs. This data was converted into a more web-friendly form: 810,000 PNG images (thumbnails and full images) and 405,000 JavaScript files, all of it ready to be assembled into a TimesMachine. By leveraging the power of AWS and Hadoop, we were able to utilize hundreds of machines concurrently and process all the data in less than 36 hours.
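The core of the conversion step above is mapping each per-page XML file of article regions into a browser-friendly format. As a minimal sketch of that idea (the element names, attributes, and JSON schema here are assumptions, not the Times' actual formats, and the real pipeline emitted JavaScript rather than plain JSON):

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical schema: each <article> maps an article ID to a
# rectangular region (x, y, w, h) on one scanned page image.
SAMPLE_XML = """
<page date="1862-01-01" image="1862-01-01_p001.tif">
  <article id="a1"><rect x="0" y="0" w="1200" h="800"/></article>
  <article id="a2"><rect x="0" y="800" w="1200" h="400"/></article>
</page>
"""

def page_mapping_to_json(xml_text):
    """Convert one page's article-to-region mapping into a JSON string
    that a browser-side viewer could load alongside the page image."""
    page = ET.fromstring(xml_text)
    regions = []
    for article in page.findall("article"):
        rect = article.find("rect")
        regions.append({
            "article": article.get("id"),
            "x": int(rect.get("x")),
            "y": int(rect.get("y")),
            "w": int(rect.get("w")),
            "h": int(rect.get("h")),
        })
    return json.dumps({
        "date": page.get("date"),
        "image": page.get("image"),
        "regions": regions,
    })

print(page_mapping_to_json(SAMPLE_XML))
```

Because each page's mapping is independent of every other page's, 405,000 such conversions parallelize trivially across a Hadoop cluster, which is what makes the sub-36-hour run plausible.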
Gonzalo San Gil, PhD.

Startup Crunches 100 Terabytes of Data in a Record 23 Minutes | WIRED - 0 views

  •  
    "There's a new record holder in the world of "big data." On Friday, Databricks, a startup spun out of the University of California, Berkeley, announced that it has sorted 100 terabytes of data in a record 23 minutes using a number-crunching tool called Spark, eclipsing the previous record held by Yahoo and the popular big-data tool Hadoop."