Arquitectura: group items tagged big-data

Pablo Lalloni

Report on Big Data in the UK government - 1 view

  •  
    "1. The Government has already made a commitment to Big Data by classifying it as one of the 'Eight Great Technologies' which will propel the UK to future growth and help it stay ahead in the global race. The 'Information Economy Strategy' reports on the increase in data being generated and the importance of new types of computing power in order to reap the economic value of the data.
    2. This paper sets out to cover the following areas:
       a) Defining Big Data
       b) High-level trends in Big Data
       c) Opportunities for Big Data applications
    3. In developing this paper, a 'community of interest' has been established comprising policy leads and analysts from across government with an interest in Big Data. This paper draws on their insights, insights from the private sector, academics, and the extensive open source literature on the Big Data topic."
munyeco

BIG DATA APPLICATIONS Fast Data: Big Data Evolved - White Paper - 0 views

  •  
    There is a fundamental shift occurring in Big Data, from data at rest to data in motion. In this white paper, Dean Wampler explores the ecosystem that is emerging around Fast Data and provides handy diagrams and code samples to help you:
Pablo Lalloni

Ferry | Big Data Development Environment Using Docker - 0 views

  •  
    "Ferry helps you create big data clusters on your local machine. Define your big data stack using YAML and share your application with Dockerfiles. Ferry supports Hadoop, Cassandra, Spark, GlusterFS, and Open MPI."
Pablo Lalloni

Big Data Poster - Data Science Central - 0 views

  •  
    "A great resource (PDF document) about big data, originally posted on CTOvision.com."
Pablo Lalloni

Data Modeling for NoSQL - 0 views

  •  
    "Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."
Pablo Lalloni

pachyderm/pachyderm - 0 views

  •  
    "Pachyderm is a complete data analytics solution that lets you efficiently store and analyze your data using containers. We offer the scalability and broad functionality of Hadoop, with the ease of use of Docker."
carlosmiranda

Big Data is Scaling BI and Analytics - 2 views

  •  
    Excellent article. It should be circulated around quite a few offices.
Pablo Lalloni

nathanmarz/cascalog · GitHub - 0 views

  •  
    "Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."
Pablo Lalloni

elasticsearch/elasticsearch-hadoop - 0 views

  •  
    "Read and write data to/from Elasticsearch within Hadoop/MapReduce libraries. Automatically converts data to/from JSON. Supports MapReduce, Cascading, Hive and Pig."
Pablo Lalloni

shark - 0 views

  •  
    "Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones."
Pablo Lalloni

Hama - a general BSP framework on top of Hadoop - 0 views

  •  
    "Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:
    * Supports message passing paradigm style of application development
    * Provides a flexible, simple, and easy-to-use small APIs
    * Enables to perform better than MPI for communication-intensive applications
    * Guarantees impossibility of deadlocks or collisions in the communication mechanisms"
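
    To make the BSP idea in the description above concrete, here is a minimal single-process sketch in Python. It is not Hama's API (Hama is a Java framework). The function name bsp_max, the toy graph, and the choice of "propagate the maximum value" as the task are all assumptions made for illustration. It only shows the superstep pattern: local computation, message passing, then a barrier after which the sent messages become visible.

```python
from collections import defaultdict


def bsp_max(values, neighbors, max_supersteps=50):
    """Spread the largest value to every peer using BSP-style supersteps.

    values:    dict mapping peer id -> initial number (hypothetical input)
    neighbors: dict mapping peer id -> list of peer ids it can message
    Returns (final state per peer, number of supersteps executed).
    """
    state = dict(values)

    # Superstep 0: every peer sends its own value to its neighbors.
    inbox = defaultdict(list)
    for peer, value in state.items():
        for n in neighbors[peer]:
            inbox[n].append(value)
    supersteps = 1

    # Keep running supersteps while any message is still in flight.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for peer, messages in inbox.items():
            # Local computation: look only at messages from the previous step.
            best = max(messages)
            if best > state[peer]:
                state[peer] = best
                # Communication: queue messages for the next superstep.
                for n in neighbors[peer]:
                    outbox[n].append(best)
        # Barrier: messages sent in this superstep become visible only now.
        inbox = outbox
        supersteps += 1

    return state, supersteps


if __name__ == "__main__":
    # A four-peer chain a - b - c - d (illustrative data only).
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    final_state, steps = bsp_max({"a": 3, "b": 7, "c": 1, "d": 5}, chain)
    print(final_state, steps)
```

    Because peers never read a message in the same superstep it was sent, no peer waits on another in the middle of a step. That is the intuition behind the quoted claim about avoiding deadlocks, although a real framework such as Hama distributes the peers across machines and implements the barrier over the network.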
Pablo Lalloni

Presto | Distributed SQL Query Engine for Big Data - 0 views

  •  
    "Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook."
Pablo Lalloni

Luna - 0 views

  •  
    Luna is a data processing and visualization environment built on a principle that people need an immediate connection to what they are building. It provides an ever-growing library of highly tailored, domain specific components and an extensible framework for building new ones.
Pablo Lalloni

http://res.infoq.com/downloads/pdfdownloads/presentations/QConSF2012-TonyTam-Datamodeli... - 0 views

  •  
    Data Modeling with NoSQL (slides)
Pablo Lalloni

Log(Graph): A Near-Optimal High-Performance Graph Representation - 0 views

  •  
    big-data graph graph-processing architecture development programming
Pablo Lalloni

GravityLabs/HPaste - 0 views

  •  
    "HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:
    * Provide a strong, clear syntax for querying and filtration
    * Perform as fast as possible while maintaining idiomatic Scala client code -- the abstractions should not show up in a profiler!
    * Re-articulate HBase's data structures rather than force it into an ORM-style atmosphere.
    * A rich set of base classes for writing MapReduce jobs in hadoop against HBase tables.
    * Provide a maximum amount of code re-use between general Hbase client usage, and operation from within a MapReduce job.
    * Use Scala's type system to its advantage--the compiler should verify the integrity of the schema.
    * Be a verbose DSL--minimize boilerplate code, but be human readable!"