Skip to main content

Home/ Open Web/ Group items tagged Hadoop

Rss Feed Group items tagged

Gary Edwards

Matt On Stuff: Hadoop For The Rest Of Us - 0 views

  •  
    Excellent Hadoop/Hive explanation.  Hat tip to Matt Asay for the link.  I eft a comment on Matt's blog questioning the consequences of the Oracle vs. Google Android lawsuit, and the possible enforcement of the Java API copyright claim against Hadoop/Hive.  Based on this explanation of Hadoop/Hive, i'm wondering if Oracle is making a move to claim the entire era of Big Data Cloud Computing?  To understand why, it's first necessary to read Matt the Hadoople's explanation.   kill shot excerpt: "You've built your Hadoop job, and have successfully processed the data. You've generated some structured output, and that resides on HDFS. Naturally you want to run some reports, so you load your data into a MySQL or an Oracle database. Problem is, the data is large. In fact it's so large that when you try to run a query against the table you've just created, your database begins to cry. If you listen to its sobs, you'll probably hear "I was built to process Megabytes, maybe Gigabytes of data. Not Terabytes. Not Perabytes. That's not my job. I was built in the 80's and 90's, back when floppy drives were used. Just leave me alone". "This is where Hive comes to the rescue. Hive lets you run an SQL statement against structured data stored on HDFS. When you issue an SQL query, it parses it, and translates it into a Java Map/Reduce job, which is then executed on your data. Although Hive does some optimizations, in general it just goes record by record against all your data. This means that it's relatively slow - a typical Hive query takes 5 or 10 minutes to complete, depending on how much data you have. However, that's what makes it effective. Unlike a relational database, you don't waste time on query optimization, adding indexes, etc. Instead, what keeps the processing time down is the fact that the query is run on all machines in your Hadoop cluster, and the scalability is taken care of for you." "Hive is extremely useful in data-warehousing kind of scenarios. You would
Gary Edwards

9 Open Source Big Data Technologies Set to Change the Web - 0 views

  •  
    Good run down of new big data technologies for Cloud Computing. The technologies are arranged in context beginning with Apache Hadoop and including HBase and MongoDb. Nice reference article.
Gary Edwards

Google's Chrome Browser Sprouts Programming Kit of the Future "Node.js" | Wired Enterpr... - 1 views

  •  
    Good article describing Node.js.  The Node.js Summitt is taking place in San Francisco on Jan 24th - 25th.  http://goo.gl/AhZTD I'm wondering if anyone has used Node.js to create real time Cloud ready compound documents?  Replacing MSOffice OLE-ODBC-ActiveX heavy productivity documents, forms and reports with Node.js event widgets, messages and database connections?  I'm thinking along the lines of a Lotus Notes alternative with a Node.js enhanced version of EverNote on the front end, and Node.js-Hadoop productivity platform on the server side? Might have to contact Stephen O'Grady on this.  He is a featured speaker at the conference. excerpt: At first, Chito Manansala (Visa & Sabre) built his Internet transaction processing systems using the venerable Java programming language. But he has since dropped Java and switched to what is widely regarded as The Next Big Thing among Silicon Valley developers. He switched to Node. Node is short for Node.js, a new-age programming platform based on a software engine at the heart of Google's Chrome browser. But it's not a browser technology. It's meant to help build software that sits on a distant server somewhere, feeding an application to your PC or smartphone, and it's particularly suited to systems like the one Chito Manansala is building - systems that juggle scads of information streaming to and from other sources. In other words, it's suited to the modern internet. Two years ago, Node was just another open source project. But it has since grown into the development platform of the moment. At Yahoo!, Node underpins "Manhattan," a fledgling online service for building and hosting mobile applications. Microsoft is offering Node atop Windows Azure, its online service for building and hosting a much beefier breed of business application. And Sabre is just one of a host of big names using the open source platform to erect applications on their own servers. Node is based on the Javascript engine at th
1 - 3 of 3
Showing 20 items per page