
Group items tagged hdfs


Pablo Lalloni

The HDF Group - Why use HDF?

  •  
    "HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of a relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets, and each element in a dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
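To make "direct access to parts of the file" and "coordinate-style lookup" concrete, here is a minimal sketch that is *not* HDF5 and uses no HDF library. It stores a 2-D array of float64 values in row-major order after a small self-describing header, then reads a single element or a row slice by computing its byte offset and seeking straight to it, without reading the rest of the file. The header layout, file name, and helper names (`write_grid`, `read_element`, `read_row_slice`) are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical layout (NOT the HDF5 format): an 8-byte header holding
# (rows, cols) as two little-endian uint32 values, followed by the
# elements as little-endian float64 in row-major order.
HEADER = struct.Struct("<II")
ELEM = struct.Struct("<d")


def write_grid(path, grid):
    """Write a list of equal-length rows of floats to `path`."""
    rows, cols = len(grid), len(grid[0])
    with open(path, "wb") as f:
        f.write(HEADER.pack(rows, cols))
        for row in grid:
            for value in row:
                f.write(ELEM.pack(value))


def read_element(path, i, j):
    """Read grid[i][j] by seeking to its byte offset; nothing else is read."""
    with open(path, "rb") as f:
        rows, cols = HEADER.unpack(f.read(HEADER.size))
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError((i, j))
        f.seek(HEADER.size + (i * cols + j) * ELEM.size)
        return ELEM.unpack(f.read(ELEM.size))[0]


def read_row_slice(path, i, start, stop):
    """Read grid[i][start:stop] with one seek and one contiguous read."""
    with open(path, "rb") as f:
        rows, cols = HEADER.unpack(f.read(HEADER.size))
        if not (0 <= i < rows and 0 <= start <= stop <= cols):
            raise IndexError((i, start, stop))
        f.seek(HEADER.size + (i * cols + start) * ELEM.size)
        data = f.read((stop - start) * ELEM.size)
        return [v for (v,) in ELEM.iter_unpack(data)]


if __name__ == "__main__":
    grid = [[float(r * 10 + c) for c in range(4)] for r in range(3)]
    path = os.path.join(tempfile.mkdtemp(), "grid.bin")
    write_grid(path, grid)
    print(read_element(path, 2, 3))       # 23.0
    print(read_row_slice(path, 1, 1, 3))  # [11.0, 12.0]
```

A coordinate lookup here costs one offset calculation and one seek, which is the contrast the quote draws with field-matching queries over relational tables. Real HDF5 layers chunked storage, compression filters, and a hierarchical object structure on top of this basic idea; for actual HDF5 files, use an HDF5 library such as h5py.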
Pablo Lalloni

lyda/hdfs-docker-registry Repository | Docker Hub Registry - Repositories of Docker Images

  •  
    "This is an HDFS based docker-registry."
Pablo Lalloni

Hama - a general BSP framework on top of Hadoop

  •  
    "Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that can run on highly scalable, massive data systems (e.g., HDFS, HBase, etc.). A message-passing paradigm beyond the MapReduce framework would make its communication more flexible. The Bulk Synchronous Parallel (BSP) model fits the bill. Some of its significant advantages over MapReduce and MPI are:
    * Supports a message-passing style of application development
    * Provides a flexible, simple, easy-to-use, small API
    * Performs better than MPI for communication-intensive applications
    * Guarantees that deadlocks and collisions cannot occur in the communication mechanisms"
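The BSP model behind Hama runs in supersteps: each peer computes locally and sends messages, then all peers wait at a barrier, and messages sent in superstep s are delivered only at the start of superstep s+1. Below is a minimal single-process simulation of that model in Python. It is not Hama's Java API; the `Peer` class, `run_bsp` function, and vote-to-halt convention are hypothetical. Because no peer ever blocks waiting for a particular message (all waiting happens at the barrier), this communication pattern cannot deadlock, which is the property the quote highlights.

```python
from collections import defaultdict


class Peer:
    """One BSP peer in a single-process simulation (hypothetical, not Hama's API)."""

    def __init__(self, peer_id, value):
        self.peer_id = peer_id
        self.value = value
        self.inbox = []  # messages delivered at the start of the current superstep

    def compute(self, superstep, send, num_peers):
        """Local work for one superstep; returns True to vote to halt."""
        if superstep == 0:
            # Broadcast the local value to every other peer.
            for other in range(num_peers):
                if other != self.peer_id:
                    send(other, self.value)
            return False
        # Later supersteps: combine the values received in the previous one.
        self.value = max([self.value] + self.inbox)
        return True


def run_bsp(peers, max_supersteps=100):
    """Run supersteps until all peers vote to halt and no messages are in flight.

    Returns the number of supersteps executed.
    """
    superstep = 0
    while superstep < max_supersteps:
        outbox = defaultdict(list)

        def send(target, msg):
            # Messages are buffered here and never delivered mid-superstep.
            outbox[target].append(msg)

        # Every peer runs its local computation for this superstep.
        halted = [p.compute(superstep, send, len(peers)) for p in peers]

        # Barrier: all peers have finished; deliver the buffered messages.
        for p in peers:
            p.inbox = outbox.get(p.peer_id, [])
        superstep += 1
        if all(halted) and not outbox:
            break
    return superstep


if __name__ == "__main__":
    peers = [Peer(i, v) for i, v in enumerate([3, 9, 4])]
    steps = run_bsp(peers)
    print([p.value for p in peers], steps)  # [9, 9, 9] 2
```

Computing a global maximum takes two supersteps here: one to broadcast, one to combine. Graph algorithms in the BSP style follow the same shape, only with messages sent to neighbors and more supersteps before every peer halts.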
Pablo Lalloni

cloudera/cdk

  •  
    "The Cloudera Development Kit, or CDK for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem. The goals of the CDK are:
    * Codify expert patterns and practices for building data-oriented systems and applications.
    * Let developers focus on business logic, not plumbing or infrastructure.
    * Provide smart defaults for platform choices.
    * Support piecemeal adoption via loosely coupled modules."
carlosmiranda

Big Data is Scaling BI and Analytics

  •  
    Excellent article. It ought to be circulated around quite a few offices.
Pablo Lalloni

Cloudera Connector for Qlikview Download - Cloudera Support

  •  
    "The Cloudera Connector for Qlikview enables your enterprise's power users to access Hadoop data through Qlikview 11.2. The driver achieves this by translating Open Database Connectivity (ODBC) calls from Qlikview into HiveQL queries. The driver supports CDH 4.1."