Group items tagged

Filter: All | Bookmarks | Topics Simple Middle

Matt On Stuff: Hadoop For The Rest Of Us - 0 views

mattonstuff.blogspot.com/...hadoop-for-rest-of-us.html

Hadoop_Hive Java-API Oracle Google Map_Reduce Cloud Cloud-Productivity-Platform

shared by Gary Edwards on 11 May 12 - No Cached

Gary Edwards on 11 May 12

Excellent Hadoop/Hive explanation. Hat tip to Matt Asay for the link. I eft a comment on Matt's blog questioning the consequences of the Oracle vs. Google Android lawsuit, and the possible enforcement of the Java API copyright claim against Hadoop/Hive. Based on this explanation of Hadoop/Hive, i'm wondering if Oracle is making a move to claim the entire era of Big Data Cloud Computing? To understand why, it's first necessary to read Matt the Hadoople's explanation. kill shot excerpt: "You've built your Hadoop job, and have successfully processed the data. You've generated some structured output, and that resides on HDFS. Naturally you want to run some reports, so you load your data into a MySQL or an Oracle database. Problem is, the data is large. In fact it's so large that when you try to run a query against the table you've just created, your database begins to cry. If you listen to its sobs, you'll probably hear "I was built to process Megabytes, maybe Gigabytes of data. Not Terabytes. Not Perabytes. That's not my job. I was built in the 80's and 90's, back when floppy drives were used. Just leave me alone". "This is where Hive comes to the rescue. Hive lets you run an SQL statement against structured data stored on HDFS. When you issue an SQL query, it parses it, and translates it into a Java Map/Reduce job, which is then executed on your data. Although Hive does some optimizations, in general it just goes record by record against all your data. This means that it's relatively slow - a typical Hive query takes 5 or 10 minutes to complete, depending on how much data you have. However, that's what makes it effective. Unlike a relational database, you don't waste time on query optimization, adding indexes, etc. Instead, what keeps the processing time down is the fact that the query is run on all machines in your Hadoop cluster, and the scalability is taken care of for you." "Hive is extremely useful in data-warehousing kind of scenarios. You would

<div class="cArrow"> </div><div class="cContentInner">Excellent Hadoop/Hive explanation. Hat tip to Matt Asay for the link. I eft a comment on Matt's blog questioning the consequences of the Oracle vs. Google Android lawsuit, and the possible enforcement of the Java API copyright claim against Hadoop/Hive. Based on this explanation of Hadoop/Hive, i'm wondering if Oracle is making a move to claim the entire era of Big Data Cloud Computing? To understand why, it's first necessary to read Matt the Hadoople's explanation. kill shot excerpt: "You've built your Hadoop job, and have successfully processed the data. You've generated some structured output, and that resides on HDFS. Naturally you want to run some reports, so you load your data into a MySQL or an Oracle database. Problem is, the data is large. In fact it's so large that when you try to run a query against the table you've just created, your database begins to cry. If you listen to its sobs, you'll probably hear "I was built to process Megabytes, maybe Gigabytes of data. Not Terabytes. Not Perabytes. That's not my job. I was built in the 80's and 90's, back when floppy drives were used. Just leave me alone". "This is where Hive comes to the rescue. Hive lets you run an SQL statement against structured data stored on HDFS. When you issue an SQL query, it parses it, and translates it into a Java Map/Reduce job, which is then executed on your data. Although Hive does some optimizations, in general it just goes record by record against all your data. This means that it's relatively slow - a typical Hive query takes 5 or 10 minutes to complete, depending on how much data you have. However, that's what makes it effective. Unlike a relational database, you don't waste time on query optimization, adding indexes, etc. Instead, what keeps the processing time down is the fact that the query is run on all machines in your Hadoop cluster, and the scalability is taken care of for you." "Hive is extremely useful in data-warehousing kind of scenarios. You would</div>

...

Cancel

1 - 1 of 1

Showing 20▼ items per page

Group items tagged

Matt On Stuff: Hadoop For The Rest Of Us - 0 views

Related searches