Group items tagged bigdata - Arquitectura?

Storm, distributed and fault-tolerant realtime computation - 0 views

storm-project.net

development realtime bigdata big-data distributed streaming

shared by Pablo Lalloni on 21 Jan 13 - No Cached

Data Modeling for NoSQL - 0 views

www.infoq.com/...data-modeling-mongodb

data modeling nosql mongodb bigdata big-data development

shared by Pablo Lalloni on 14 May 13 - No Cached

Pablo Lalloni on 14 May 13

"Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."

http://res.infoq.com/downloads/pdfdownloads/presentations/QConSF2012-TonyTam-Datamodeli... - 0 views

res.infoq.com/...rdocumentorienteddatabases.pdf

data modeling nosql bigdata big-data development mongodb

shared by Pablo Lalloni on 14 May 13 - No Cached

Pablo Lalloni on 14 May 13

Data Modeling with NoSQL (slides)

Comparing Pig Latin and SQL for Constructing Data Processing Pipelines | hadoopnew - Ya... - 0 views

developer.yahoo.com/...-processing-pipelines-444.html

pig sql hive hadoop bigdata

shared by Pablo Lalloni on 30 May 13 - No Cached

State of the Hadoop ecosystem in early 2013 - Adaltas - 0 views

www.adaltas.com/...hadoop-2013-ecosystem

hadoop bigdata

shared by Pablo Lalloni on 31 May 13 - No Cached

Iteratees in Big Data at Klout « Klout Engineering - 0 views

engineering.klout.com/...iteratees-in-big-data-at-klout

development programming scala iteratees play! functional-programming bigdata

shared by Pablo Lalloni on 05 Feb 13 - No Cached

shark - 0 views

github.com/wiki

development programming bigdata big-data distributed-computing cloud-computing spark hive

shared by Pablo Lalloni on 05 Aug 13 - No Cached

Pablo Lalloni on 05 Aug 13

"Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones."

Hama - a general BSP framework on top of Hadoop - 0 views

hama.apache.org

development programming bsp hadoop cloud-computing distributed-computing bigdata big-data

shared by Pablo Lalloni on 05 Aug 13 - No Cached

Pablo Lalloni on 05 Aug 13

"Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:

* Supports message passing paradigm style of application development
* Provides a flexible, simple, and easy-to-use small APIs
* Enables to perform better than MPI for communication-intensive applications
* Guarantees impossibility of deadlocks or collisions in the communication mechanisms"

GravityLabs/HPaste - 0 views

github.com/HPaste

development programming scala library hbase hadoop big-data bigdata mapreduce map-reduce

shared by Pablo Lalloni on 17 Oct 13 - No Cached

Pablo Lalloni on 17 Oct 13

"HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:

* Provide a strong, clear syntax for querying and filtration
* Perform as fast as possible while maintaining idiomatic Scala client code -- the abstractions should not show up in a profiler!
* Re-articulate HBase's data structures rather than force it into an ORM-style atmosphere.
* A rich set of base classes for writing MapReduce jobs in hadoop against HBase tables.
* Provide a maximum amount of code re-use between general Hbase client usage, and operation from within a MapReduce job.
* Use Scala's type system to its advantage--the compiler should verify the integrity of the schema.
* Be a verbose DSL--minimize boilerplate code, but be human readable!"

Do you know Big Data? - 0 views

DIIGO_FILE_HOME/9xbh/dxg6

development cloud-computing distributed-computing compute-cloud bigdata big-data

shared by Pablo Lalloni on 05 Oct 14 - No Cached

Facebook open sources its SQL-on-Hadoop engine, and the web rejoices - Tech News and An... - 0 views

gigaom.com/...op-engine-and-the-web-rejoices

development hadoop bigdata sql facebook hive presto

shared by Pablo Lalloni on 08 Nov 13 - No Cached

Big Data Benchmark - 0 views

amplab.cs.berkeley.edu/benchmark

benchmark development bigdata hive redshift shark impala sql

shared by Pablo Lalloni on 08 Nov 13 - No Cached

andypetrella/spark-notebook - 0 views

github.com/...spark-notebook

development data-science bigdata spark tools

shared by Pablo Lalloni on 05 Oct 15 - No Cached

Pablo Lalloni on 05 Oct 15

"The main intent of this tool is to create reproducible analysis using Scala, Apache Spark and more. This is achieved through an interactive web-based editor that can combine Scala code, SQL queries, Markup or even JavaScript in a collaborative manner. The usage of Spark comes out of the box, and is simply enabled by the implicit variable named sparkContext. You should also check the website, http://spark-notebook.io."

Big Data is Scaling BI and Analytics - 2 views

www.information-management.com/...-and-analytics-10021093-1.html

hadoop hdfs avro hbase chukwa business-intelligence bigdata map-reduce big-data

shared by carlosmiranda on 23 Sep 11 - No Cached

Pablo Lalloni liked it

Pablo Lalloni on 24 Sep 11

Excellent article. It should be circulated around quite a few offices.

Apache Phoenix - 0 views

phoenix.incubator.apache.org

development programming database bigdata sql phoenix hbase

shared by Pablo Lalloni on 22 May 14 - No Cached

Pablo Lalloni on 22 May 14

"Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows."

Presto | Distributed SQL Query Engine for Big Data - 0 views

prestodb.io

development programming opensource database presto bigdata sql facebook

shared by Pablo Lalloni on 22 May 14 - No Cached

Pablo Lalloni on 22 May 14

"Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook."

Questioning the Lambda Architecture - O'Reilly Radar - 0 views

radar.oreilly.com/...g-the-lambda-architecture.html

big-data architecture lambda stream-processing bigdata development

shared by Pablo Lalloni on 05 Nov 14 - No Cached

Report on Big Data in the UK government - 1 view

www.gov.uk/...nologies_Big_Data_report_1.pdf

government big-data uk trends opportunities case-study policy

shared by Pablo Lalloni on 09 Jan 15 - No Cached

Pablo Lalloni on 09 Jan 15

"1. The Government has already made a commitment to Big Data by classifying it as one of the 'Eight Great Technologies' which will propel the UK to future growth and help it stay ahead in the global race. The 'Information Economy Strategy' reports on the increase in data being generated and the importance of new types of computing power in order to reap the economic value of the data.

2. This paper sets out to cover the following areas:
a) Defining Big Data
b) High-level trends in Big Data
c) Opportunities for Big Data applications

3. In developing this paper, a 'community of interest' has been established comprising policy leads and analysts from across government with an interest in Big Data. This paper draws on their insights, insights from the private sector, academics, and the extensive open source literature on the Big Data topic."

Apache Phoenix - 0 views

phoenix.apache.org/index.html

development tools cloud-computing programming bigdata hadoop hbase distributed-computing sql jdbc

shared by Pablo Lalloni on 04 Sep 14 - No Cached

Pablo Lalloni on 04 Sep 14

"Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows."

Shark - Lightning Fast Data Warehouse System - 0 views

shark.cs.berkeley.edu

hive spark bigdata hadoop warehouse data development tools cloud-computing distributed-computing infrastructure

shared by Pablo Lalloni on 04 Jun 13 - No Cached

Pablo Lalloni on 04 Jun 13

"Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 100 times faster than Hive without modification to the existing data nor queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions."
