Group items tagged fast-data

Pablo Lalloni

The HDF Group - Why use HDF? - 0 views

  •  
    "HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of a relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets, and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
munyeco

BIG DATA APPLICATIONS Fast Data: Big Data Evolved - White Paper - 0 views

  •  
    There is a fundamental shift occurring in Big Data, from data at rest to data in motion. In this white paper, Dean Wampler explores the ecosystem that is emerging around Fast Data and provides handy diagrams and code samples to help you:
Pablo Lalloni

Quest Data Connectors - Cloudera Support - 0 views

  •  
    "Quest Data Connector for Oracle and Hadoop is a freeware plug-in to Cloudera's Distribution including Apache Hadoop that allows for fast and scalable data transfer between Hadoop and Oracle. Attributes:
      • Transfer data to and from Oracle up to 5 times faster than Sqoop alone.
      • Can easily transfer data to and from Oracle that has no primary key or was not stored in primary key order.
      • Reduces overhead on the Oracle instance: upwards of 80% reduction in CPU consumption and up to 95% reduction in IO time.
      • Allows other Oracle workloads to simultaneously run seamlessly without disruption.
    SLA-driven commercial support available when used as a part of Cloudera Enterprise."
Pablo Lalloni

Data Modeling for NoSQL - 0 views

  •  
    "Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."
Pablo Lalloni

Shark - Lightning Fast Data Warehouse System - 0 views

  •  
    "Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 100 times faster than Hive without modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions."
Pablo Lalloni

GravityLabs/HPaste - 0 views

  •  
    "HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:
      • Provide a strong, clear syntax for querying and filtration.
      • Perform as fast as possible while maintaining idiomatic Scala client code -- the abstractions should not show up in a profiler!
      • Re-articulate HBase's data structures rather than force it into an ORM-style atmosphere.
      • A rich set of base classes for writing MapReduce jobs in Hadoop against HBase tables.
      • Provide a maximum amount of code re-use between general HBase client usage, and operation from within a MapReduce job.
      • Use Scala's type system to its advantage -- the compiler should verify the integrity of the schema.
      • Be a verbose DSL -- minimize boilerplate code, but be human readable!"
Pablo Lalloni

Marc Lehmann's "LibLZF" - 0 views

  •  
    "LibLZF is a very small data compression library. It consists of only two .c and two .h files and is very easy to incorporate into your own programs. The compression algorithm is very, very fast, yet still written in portable C."
munyeco

Service Discovery & Orchestration With Mesos and Consul | My Tech Musings and Stuff I W... - 4 views

  • Joel, we chose consul for a few reasons. First, I wanted a service discovery solution that could work with our legacy architectures as well as any new projects we run on mesos. In addition, I wanted a way to bootstrap the mesos cluster setup/configuration (masters and slaves) such that when they are provisioned, they will be auto-configured using data in consul. Think zk values, quorum, etc. I’ll be working on a solution for this very soon. Lastly, I really like how consul supports health-checks, which we will leverage heavily to ensure that only “healthy” services are actually registered. Like you mentioned, consul is very fast in updating the service info and that is very important as well. Hope that helps, -Phil