Group items tagged bigdata

Pablo Lalloni

Data Modeling for NoSQL - 0 views

  •  
    "Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."
Pablo Lalloni

http://res.infoq.com/downloads/pdfdownloads/presentations/QConSF2012-TonyTam-Datamodeli... - 0 views

  •  
    Data Modeling with NoSQL (slides)
Pablo Lalloni

shark - 0 views

  •  
    "Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones."
Pablo Lalloni

Hama - a general BSP framework on top of Hadoop - 0 views

  •  
    "Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:
    * Supports message passing paradigm style of application development
    * Provides a flexible, simple, and easy-to-use small APIs
    * Enables to perform better than MPI for communication-intensive applications
    * Guarantees impossibility of deadlocks or collisions in the communication mechanisms"
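
The BSP model described in the quote above can be illustrated with a small single-process simulation. This is only a sketch of the model, not Hama's API (Hama itself is a Java framework): the function name `run_bsp`, the peer ids, and the ring topology are all hypothetical. Each superstep consists of local computation, message sending, and a barrier after which the sent messages become visible. Here every peer propagates the largest value it has seen until all peers agree.

```python
def run_bsp(values, neighbors, max_supersteps=50):
    """Simulate BSP supersteps in one process (illustrative only, not Hama's API).

    values:    dict mapping a hypothetical peer id to its initial number
    neighbors: dict mapping a peer id to the peers it can send messages to
    Returns the final per-peer state and the number of supersteps executed.
    """
    state = dict(values)
    inbox = {peer: [] for peer in state}
    supersteps = 0
    for step in range(max_supersteps):
        outbox = {peer: [] for peer in state}
        anything_sent = False
        for peer in state:
            # 1. Local computation: fold the messages delivered at the last barrier.
            best = max(inbox[peer] + [state[peer]])
            changed = best > state[peer]
            state[peer] = best
            # 2. Communication: announce in the first superstep, then only on change.
            if step == 0 or changed:
                for target in neighbors[peer]:
                    outbox[target].append(best)
                anything_sent = True
        supersteps = step + 1
        # 3. Barrier synchronisation: messages become readable only in the next superstep.
        inbox = outbox
        if not anything_sent:
            break
    return state, supersteps


# Hypothetical example: four peers connected in a ring.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
final_state, used = run_bsp({0: 3, 1: 9, 2: 1, 3: 4}, ring)
```

Because a peer only reads messages after the barrier, no peer ever waits on another peer in the middle of a superstep, which is the intuition behind the deadlock-freedom claim in the quoted description.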
Pablo Lalloni

GravityLabs/HPaste - 0 views

  •  
    "HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:
    * Provide a strong, clear syntax for querying and filtration
    * Perform as fast as possible while maintaining idiomatic Scala client code -- the abstractions should not show up in a profiler!
    * Re-articulate HBase's data structures rather than force it into an ORM-style atmosphere.
    * A rich set of base classes for writing MapReduce jobs in hadoop against HBase tables.
    * Provide a maximum amount of code re-use between general Hbase client usage, and operation from within a MapReduce job.
    * Use Scala's type system to its advantage--the compiler should verify the integrity of the schema.
    * Be a verbose DSL--minimize boilerplate code, but be human readable!"
Pablo Lalloni

andypetrella/spark-notebook - 0 views

  •  
    "The main intent of this tool is to create reproducible analysis using Scala, Apache Spark and more. This is achieved through an interactive web-based editor that can combine Scala code, SQL queries, Markup or even JavaScript in a collaborative manner. The usage of Spark comes out of the box, and is simply enabled by the implicit variable named sparkContext. You should also check the website, http://spark-notebook.io."
carlosmiranda

Big Data is Scaling BI and Analytics - 2 views

  •  
    Excellent article. It should be circulated around quite a few offices.
Pablo Lalloni

Apache Phoenix - 0 views

  •  
    "Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows."
Pablo Lalloni

Presto | Distributed SQL Query Engine for Big Data - 0 views

  •  
    "Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook."
Pablo Lalloni

Report on Big Data in the UK government - 1 view

  •  
    "1. The Government has already made a commitment to Big Data by classifying it as one of the 'Eight Great Technologies' which will propel the UK to future growth and help it stay ahead in the global race. The 'Information Economy Strategy' reports on the increase in data being generated and the importance of new types of computing power in order to reap the economic value of the data.
    2. This paper sets out to cover the following areas:
       a) Defining Big Data
       b) High-level trends in Big Data
       c) Opportunities for Big Data applications
    3. In developing this paper, a 'community of interest' has been established comprising policy leads and analysts from across government with an interest in Big Data. This paper draws on their insights, insights from the private sector, academics, and the extensive open source literature on the Big Data topic."
Pablo Lalloni

Shark - Lightning Fast Data Warehouse System - 0 views

  •  
    "Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 100 times faster than Hive without modification to the existing data nor queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions."