Arquitectura: Group items tagged hadoop

Pablo Lalloni

Motivations for Apache Hadoop Security | Hortonworks - 0 views

  •  
    "The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security in defending against hackers since all large Hadoop clusters are behind corporate firewalls that only allow employees access. Instead, the motivation was simply that security would allow us to use Hadoop more effectively to pool resources between disjointed groups. Larger clusters are much cheaper to operate and require fewer copies of duplicated data."
Pablo Lalloni

Data Connector for Oracle and Hadoop - Unstructured and Structured Data Transfers - 0 views

  •  
    "Data Connector for Oracle and Hadoop is the fastest, most scalable way to transfer data between Oracle and Hadoop."
Pablo Lalloni

Quest Data Connectors - Cloudera Support - 0 views

  •  
    "Quest Data Connector for Oracle and Hadoop is a freeware plug-in to Cloudera's Distribution including Apache Hadoop that allows for fast and scalable data transfer between Hadoop and Oracle. Attributes:
      • Transfer data to and from Oracle up to 5 times faster than Sqoop alone.
      • Can easily transfer data to and from Oracle that has no primary key or was not stored in primary key order.
      • Reduces overhead on the Oracle instance: Upwards of 80% reduction in CPU consumption. Up to 95% reduction in IO time.
      • Allows other Oracle workloads to simultaneously run seamlessly without disruption.
      • SLA-driven commercial support available when used as a part of Cloudera Enterprise."
Pablo Lalloni

Cloudbreak - 1 views

  • Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud. The main features of Docker are:
      • Lightweight, portable
      • Build once, run anywhere
      • VM - without the overhead of a VM. Each virtualised application includes not only the application and the necessary binaries and libraries, but also an entire guest operating system. The Docker Engine container comprises just the application and its dependencies. It runs as an isolated process in userspace on the host operating system, sharing the kernel with other containers.
      • Containers are isolated
      • It can be automated and scripted
    • Pablo Lalloni
       
      Probably the best short description of Docker I have read, in just one paragraph and a list of features. We should use it.
  •  
    "Cloudbreak is a RESTful Hadoop as a Service API. Once it is deployed in your favourite servlet container exposes a REST API allowing to span up Hadoop clusters of arbitrary sizes on your selected cloud provider. Provisioning Hadoop has never been easier. Cloudbreak is built on the foundation of cloud providers API (Amazon AWS, Microsoft Azure, Google Cloud Compute...), Apache Ambari, Docker containers, Serf and dnsmasq."
Pablo Lalloni

Hadoop Operations - 3 views

  •  
    If you've been tasked with the job of maintaining large and complex Hadoop clusters, or are about to be, this book is a must. You'll learn the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance.
Pablo Lalloni

The Growth of Hadoop - Wikibon - 0 views

  •  
    Survey and comparison of the Hadoop distributions available as of August 2012.
Pablo Lalloni

twitter/scalding · GitHub - 0 views

  •  
    "Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs."
Pablo Lalloni

The Role of Delegation Tokens in Apache Hadoop Security | Hortonworks - 0 views

  •  
    Delegation tokens play a critical part in Apache Hadoop security, and understanding their design and use is important for comprehending Hadoop's security model.
Pablo Lalloni

elasticsearch/elasticsearch-hadoop - 0 views

  •  
    "Read and write data to/from Elasticsearch within Hadoop/MapReduce libraries. Automatically converts data to/from JSON. Supports MapReduce, Cascading, Hive and Pig."
Pablo Lalloni

Hama - a general BSP framework on top of Hadoop - 0 views

  •  
    "Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:
      * Supports message passing paradigm style of application development
      * Provides a flexible, simple, and easy-to-use small APIs
      * Enables to perform better than MPI for communication-intensive applications
      * Guarantees impossibility of deadlocks or collisions in the communication mechanisms"
Pablo Lalloni

dnafrance/vagrant-hadoop-spark-cluster - 0 views

  •  
    "Vagrant project to spin up a cluster of 4 32-bit CentOS6.5 Linux virtual machines with Hadoop v2.6.0 and Spark v1.1.1"
Pablo Lalloni

NICTA/scoobi · GitHub - 0 views

  •  
    "A Scala productivity framework for Hadoop."
Pablo Lalloni

nathanmarz/cascalog · GitHub - 0 views

  •  
    "Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."
Pablo Lalloni

cloudera/cdk - 0 views

  •  
    "The Cloudera Development Kit, or CDK for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem. The goals of the CDK are:
      • Codify expert patterns and practices for building data-oriented systems and applications.
      • Let developers focus on business logic, not plumbing or infrastructure.
      • Provide smart defaults for platform choices.
      • Support piecemeal adoption via loosely-coupled modules."
Pablo Lalloni

Introducing Scoobi and Scalding: Scala DSLs for Hadoop MapReduce * myNoSQL - 0 views

  •  
    Good slides introducing Scalding and Scoobi.
Pablo Lalloni

kevinweil/elephant-bird - 0 views

  •  
    "Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code."