Group items tagged fast-data - Arquitectura?

The HDF Group - Why use HDF? - 0 views

www.hdfgroup.org/why_hdf

development programming data-storage data-analysis data-manipulation library

shared by Pablo Lalloni on 06 Apr 13 - No Cached

"HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."

BIG DATA APPLICATIONS Fast Data: Big Data Evolved - White Paper - 0 views

info.typesafe.com/ta-Big-Data-Evolved-WP_LP.html

white paper

shared by munyeco on 17 Sep 15 - No Cached

There is a fundamental shift occurring in Big Data, from data at rest to data in motion. In this white paper, Dean Wampler explores the ecosystem that is emerging around Fast Data and provides handy diagrams and code samples to help you:

Quest Data Connectors - Cloudera Support - 0 views

ccp.cloudera.com/...Quest+Data+Connectors

apache hadoop connectors development tools operations cloud-computing oracle infrastructure

shared by Pablo Lalloni on 10 Apr 13 - No Cached

"Quest Data Connector for Oracle and Hadoop is a freeware plug-in to Cloudera's Distribution including Apache Hadoop that allows for fast and scalable data transfer between Hadoop and Oracle.

Attributes:
- Transfer data to and from Oracle up to 5 times faster than Sqoop alone.
- Can easily transfer data to and from Oracle that has no primary key or was not stored in primary key order.
- Reduces overhead on the Oracle instance:
  - Upwards of 80% reduction in CPU consumption.
  - Up to 95% reduction in IO time.
- Allows other Oracle workloads to simultaneously run seamlessly without disruption.
- SLA-driven commercial support available when used as a part of Cloudera Enterprise."

Data Modeling for NoSQL - 0 views

www.infoq.com/...data-modeling-mongodb

data modeling nosql mongodb bigdata big-data development

shared by Pablo Lalloni on 14 May 13 - No Cached

"Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."

Shark - Lightning Fast Data Warehouse System - 0 views

shark.cs.berkeley.edu

hive spark bigdata hadoop warehouse data development tools cloud-computing distributed-computing infrastructure

shared by Pablo Lalloni on 04 Jun 13 - No Cached

"Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 100 times faster than Hive without modification to the existing data nor queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions."

GravityLabs/HPaste - 0 views

github.com/HPaste

development programming scala library hbase hadoop big-data bigdata mapreduce map-reduce

shared by Pablo Lalloni on 17 Oct 13 - No Cached

"HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:
- Provide a strong, clear syntax for querying and filtration.
- Perform as fast as possible while maintaining idiomatic Scala client code -- the abstractions should not show up in a profiler!
- Re-articulate HBase's data structures rather than force it into an ORM-style atmosphere.
- A rich set of base classes for writing MapReduce jobs in hadoop against HBase tables.
- Provide a maximum amount of code re-use between general Hbase client usage, and operation from within a MapReduce job.
- Use Scala's type system to its advantage--the compiler should verify the integrity of the schema.
- Be a verbose DSL--minimize boilerplate code, but be human readable!"

Marc Lehmann's "LibLZF" - 0 views

oldhome.schmorp.de/...liblzf.html

development programming library compression

shared by Pablo Lalloni on 04 Jun 13 - Cached

"LibLZF is a very small data compression library. It consists of only two .c and two .h files and is very easy to incorporate into your own programs. The compression algorithm is very, very fast, yet still written in portable C."

Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job - 1 views

info.lightbend.com/gine-for-the-job-register.html

architecture akka spark kafka data-streaming streaming fast-data fastdata stream-processing

shared by Pablo Lalloni on 29 Mar 18 - No Cached

Service Discovery & Orchestration With Mesos and Consul | My Tech Musings and Stuff I W... - 4 views

philzim.com/...stration-with-mesos-and-consul

shared by munyeco on 02 Dec 14 - No Cached

Joel, we chose consul for a few reasons. First, I wanted a service discovery solution that could work with our legacy architectures as well as any new projects we run on mesos. In addition, I wanted a way to bootstrap the mesos cluster setup/configuration (masters and slaves) such that when they are provisioned, they will be auto-configured using data in consul. Think zk values, quorum, etc. I’ll be working on a solution for this very soon. Lastly, I really like how consul supports health-checks, which we will leverage heavily to ensure that only “healthy” services are actually registered. Like you mentioned, consul is very fast in updating the service info and that is very important as well. Hope that helps, -Phil