Group items tagged data-processing - Arquitectura?

The HDF Group - Why use HDF?

www.hdfgroup.org/why_hdf

development programming data-storage data-analysis data-manipulation library

shared by Pablo Lalloni on 06 Apr 13

Pablo Lalloni on 06 Apr 13

"HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational databases. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."

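HDF's claim of "direct access to parts of the file without first parsing the entire contents" comes down to the file recording where each dataset's bytes live, so a reader can compute an offset and seek straight to it. The sketch below is not HDF5 and uses none of its APIs; it is a minimal standard-library illustration, assuming a hypothetical raw file that holds one row-major 2-D array of little-endian 64-bit floats behind a small (rows, columns) header. HDF5 keeps richer metadata (for example, an index of chunks for chunked datasets), but coordinate-style subsetting relies on the same idea.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<qq")  # (nrows, ncols) as two little-endian int64s
ITEM = struct.calcsize("<d")   # 8 bytes per little-endian float64


def write_matrix(path, rows):
    """Write a 2-D list of floats row-major, preceded by a (nrows, ncols) header."""
    nrows, ncols = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(HEADER.pack(nrows, ncols))
        for row in rows:
            f.write(struct.pack(f"<{ncols}d", *row))


def read_row_slice(path, i, start, stop):
    """Return row i, columns [start, stop), reading only those bytes after the header."""
    with open(path, "rb") as f:
        nrows, ncols = HEADER.unpack(f.read(HEADER.size))
        if not (0 <= i < nrows and 0 <= start <= stop <= ncols):
            raise IndexError((i, start, stop))
        # The offset is computed from the coordinates, so nothing before it is parsed.
        f.seek(HEADER.size + (i * ncols + start) * ITEM)
        count = stop - start
        return list(struct.unpack(f"<{count}d", f.read(count * ITEM)))


def read_cell(path, i, j):
    """Return the single element at (i, j)."""
    return read_row_slice(path, i, j, j + 1)[0]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
    write_matrix(path, [[float(r * 10 + c) for c in range(4)] for r in range(3)])
    print(read_cell(path, 2, 1))          # 21.0
    print(read_row_slice(path, 1, 1, 3))  # [11.0, 12.0]
```

The cost of a lookup depends on how much is read, not on the size of the file, which is the property a relational table scan does not give for coordinate-style access.
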
Baratine | a distributed in-memory Java service platform

baratine.io

distributed java development programming library service distributed-computing

shared by Pablo Lalloni on 11 Sep 14

Pablo Lalloni on 11 Sep 14

"Baratine is a new distributed in-memory Java service platform for building high performance web services that combine both data and logic in the same JVM. Say again? In Baratine, the data lives within the service and the service owns its own data. This means:
* the data is not owned by the database
* the data is not modified by another process
* the data is not separate and distinct from the service
=> The data sits right in the service in the same JVM, the same thread, and the same class instance."

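A common way to get "the data sits right in the service ... the same thread, and the same class instance" is the single-owner pattern: the state is a private field of one object, and every request reaches it through a queue drained by one thread, so no locks or external database sit in between. The sketch below is a generic Python illustration of that pattern under those assumptions; it does not use Baratine's API, and `CounterService` with its `incr`/`get`/`stop` methods is hypothetical.

```python
import queue
import threading


class CounterService:
    """Hypothetical service that owns its data: a dict touched only by its own thread."""

    def __init__(self):
        self._counts = {}              # the service's private state, never shared
        self._inbox = queue.Queue()    # callers enqueue requests instead of touching state
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Single consumer: requests are applied one at a time, in arrival order.
        while True:
            op, key, reply = self._inbox.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "incr":
                self._counts[key] = self._counts.get(key, 0) + 1
            reply.put(self._counts.get(key, 0))

    def _call(self, op, key=None):
        reply = queue.Queue(maxsize=1)
        self._inbox.put((op, key, reply))
        return reply.get()             # block until the owning thread answers

    def incr(self, key):
        return self._call("incr", key)

    def get(self, key):
        return self._call("get", key)

    def stop(self):
        self._call("stop")
        self._worker.join()


if __name__ == "__main__":
    svc = CounterService()
    clients = [threading.Thread(target=lambda: [svc.incr("hits") for _ in range(1000)])
               for _ in range(4)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    print(svc.get("hits"))  # 4000, with no locks around _counts
    svc.stop()
```

Because only the worker thread ever reads or writes `_counts`, the client threads need no locking; the trade-off is that every call waits its turn in the inbox.
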
Rationale - Datomic

www.datomic.com/rationale.html

development cloud-computing database distributed-computing

shared by Pablo Lalloni on 01 Aug 13

Pablo Lalloni on 01 Aug 13

"Datomic is a distributed database designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures. It does this by:
* Bringing declarative data manipulation into the application, and the data with it
* Getting time, process and perception right
  * Process (writes) require coordination
  * Perception (reads) require none
  * The past doesn't change
* Leveraging immutability, and a sound model of state
Datomic has:
* ACID Transactions
* Joins
* A sound data model
* A logical query language - Datalog
Thus, Datomic avoids the compromises and losses of many NoSQL solutions. In addition, it offers flexibility and power over the traditional model in supporting:
* Hierarchy
* Multi-valued attributes
* Minimal schema
* Reliable operation on unreliable, ephemeral cloud instances
* Time
Datomic avoids manual caching and replication, complex configuration, sharding (automatic or manual), logging, locking, latching and disk management of traditional servers."

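The points "the past doesn't change" and "leveraging immutability" can be pictured as an append-only log of facts, each stamped with the transaction that asserted or retracted it; reading the database "as of" an earlier transaction just ignores later facts. This is a toy Python model written for illustration, assuming a five-part fact shape (entity, attribute, value, transaction, added); it is not Datomic's storage engine, query language, or client API, and `FactLog`/`value_as_of` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Datom:
    """One immutable fact: entity, attribute, value, recording tx, and assert/retract flag."""
    entity: int
    attribute: str
    value: Any
    tx: int
    added: bool  # True = assertion, False = retraction


class FactLog:
    """Toy append-only fact store; nothing is ever updated in place."""

    def __init__(self):
        self._log: List[Datom] = []
        self._tx = 0

    def transact(self, ops):
        """ops: iterable of (entity, attribute, value, added). Returns the new tx id."""
        self._tx += 1
        for e, a, v, added in ops:
            self._log.append(Datom(e, a, v, self._tx, added))
        return self._tx

    def value_as_of(self, entity, attribute, tx=None) -> Optional[Any]:
        """Value of a single-valued attribute as of tx (default: latest)."""
        current = None
        for d in self._log:              # entries are stored in tx order
            if tx is not None and d.tx > tx:
                break                    # later facts are invisible to this view
            if d.entity == entity and d.attribute == attribute:
                if d.added:
                    current = d.value
                elif d.value == current:
                    current = None
        return current


if __name__ == "__main__":
    db = FactLog()
    t1 = db.transact([(42, "user/email", "a@example.com", True)])
    db.transact([(42, "user/email", "a@example.com", False),
                 (42, "user/email", "b@example.com", True)])
    print(db.value_as_of(42, "user/email"))      # b@example.com
    print(db.value_as_of(42, "user/email", t1))  # a@example.com
```

An "update" is recorded as a retraction plus a new assertion, so earlier states remain queryable without any copying.
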
Graphite - Scalable Realtime Graphing - Graphite

graphite.wikidot.com/start

development tools operations performance monitoring instrumentation infrastructure

shared by Pablo Lalloni on 21 May 13

Pablo Lalloni on 21 May 13

What is Graphite? Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.

Who should use Graphite? Graphite is actually a bit of a niche application. Specifically, it is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. However, Graphite is a complex system, and if you only have a few hundred distinct things you want to graph (stock prices in the S&P 500), then Graphite is probably overkill. But if you need to graph a lot of different things (like dozens of performance metrics from thousands of servers) and you don't necessarily know the names of those things in advance (who wants to maintain such a huge configuration?), then Graphite is for you.
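
Feeding Graphite can be as simple as writing lines to a TCP socket: carbon's plaintext protocol takes one data point per line in the form `<metric path> <value> <timestamp>`, by default on port 2003. The sketch below assumes a carbon daemon listening on `localhost:2003` with a default configuration; the helper names `format_metric`/`send_metrics` and the metric path in the usage note are made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one carbon plaintext line: '<metric path> <value> <unix timestamp>' plus newline."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(metrics, host="localhost", port=2003):
    """Send (path, value) or (path, value, timestamp) tuples over one TCP connection."""
    payload = "".join(format_metric(*m) for m in metrics)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("utf-8"))
```

With carbon running, `send_metrics([("servers.web01.cpu.load", 0.42)])` would record one point for that series. Carbon by default creates a new series the first time it sees a path, so new metric names need no prior configuration.
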

kiama - A Scala library for language processing - Google Project Hosting

code.google.com/kiama

scala library language-processing programming development jvm

shared by Pablo Lalloni on 21 Aug 12

Pablo Lalloni on 21 Aug 12

"Kiama is a Scala library for language processing. It enables convenient analysis and transformation of structured data. The programming styles supported by the library are based on well-known formal language processing paradigms, including attribute grammars, tree rewriting, abstract state machines, and pretty printing."

nathanmarz/cascalog · GitHub

github.com/cascalog

distributed-computing hadoop library programming development cloud-computing java clojure jvm

shared by Pablo Lalloni on 04 Apr 13

Pablo Lalloni on 04 Apr 13

"Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."

Luna

www.luna-lang.org

big-data development data-processing data-pipeline architecture programming

shared by Pablo Lalloni on 29 Sep 18

Pablo Lalloni on 29 Sep 18

Luna is a data processing and visualization environment built on the principle that people need an immediate connection to what they are building. It provides an ever-growing library of highly tailored, domain-specific components and an extensible framework for building new ones.

Apache Flink: Scalable Batch and Stream Data Processing

flink.apache.org/index.html

development programming streaming data scala java distributed-computing

shared by Pablo Lalloni on 28 Nov 15

Pablo Lalloni on 28 Nov 15

"Apache Flink is an open source platform for distributed stream and batch data processing."

Hama - a general BSP framework on top of Hadoop

hama.apache.org

development programming bsp hadoop cloud-computing distributed-computing bigdata big-data

shared by Pablo Lalloni on 05 Aug 13

Pablo Lalloni on 05 Aug 13

"Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that can run on highly scalable and massive data systems (e.g., HDFS, HBase, etc.). A message-passing paradigm beyond the MapReduce framework would make its communication more flexible. The Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:
* Supports a message-passing style of application development
* Provides flexible, simple, and easy-to-use small APIs
* Can perform better than MPI for communication-intensive applications
* Guarantees that deadlocks and collisions cannot occur in the communication mechanisms"

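A BSP computation runs in supersteps: each peer computes locally and sends messages, then all peers meet at a barrier, and messages sent in one superstep are only read in the next. The sketch below simulates that cycle with Python threads and `threading.Barrier`; it is not Hama's API, and the `BSPRuntime` class and the global-sum task are hypothetical choices for illustration.

```python
import threading


class BSPRuntime:
    """Toy BSP runtime: messages sent in one superstep become readable after sync()."""

    def __init__(self, n_peers):
        self.n = n_peers
        self._lock = threading.Lock()
        self._outgoing = [[] for _ in range(n_peers)]  # sent during this superstep
        self._incoming = [[] for _ in range(n_peers)]  # readable during this superstep
        # The barrier action runs once, after all peers arrive and before any is released.
        self._barrier = threading.Barrier(n_peers, action=self._deliver)

    def _deliver(self):
        self._incoming = self._outgoing
        self._outgoing = [[] for _ in range(self.n)]

    def send(self, to_peer, msg):
        with self._lock:
            self._outgoing[to_peer].append(msg)

    def sync(self):
        """End the current superstep: wait for every peer, then deliver messages."""
        self._barrier.wait()

    def messages(self, peer_id):
        return list(self._incoming[peer_id])


def global_sum_peer(rt, peer_id, local_value, results):
    # Superstep 1: each peer sends its local value to peer 0.
    rt.send(0, local_value)
    rt.sync()
    # Superstep 2: peer 0 adds up what it received and broadcasts the total.
    if peer_id == 0:
        total = sum(rt.messages(0))
        for p in range(rt.n):
            rt.send(p, total)
    rt.sync()
    # Superstep 3: every peer reads the broadcast total.
    results[peer_id] = rt.messages(peer_id)[0]


def run_global_sum(values):
    rt = BSPRuntime(len(values))
    results = [None] * len(values)
    peers = [threading.Thread(target=global_sum_peer, args=(rt, i, v, results))
             for i, v in enumerate(values)]
    for t in peers:
        t.start()
    for t in peers:
        t.join()
    return results


if __name__ == "__main__":
    print(run_global_sum([3, 1, 4, 1, 5]))  # [14, 14, 14, 14, 14]
```

Peers never block waiting for a particular message, only at the shared barrier, which is the intuition behind the quoted claim that the communication mechanism cannot deadlock.
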
A Deeper Look at Reactive Streams with Akka Streams 1.0 and Slick 3.0 - Free E-Books | ...

www.typesafe.com/...akka-streams-1-0-and-slick-3-0

development programming scala akka stream-processing jvm akka-streams

shared by Pablo Lalloni on 06 Jul 15

Pablo Lalloni on 06 Jul 15

"Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically typed, high-performance, low-latency, asynchronous streams of data with built-in non-blocking back pressure, with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java."

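Back pressure in Reactive Streams is demand-driven: the Subscriber receives a Subscription in `onSubscribe` and calls `request(n)`, and the Publisher may deliver at most n items through `onNext` until more demand arrives. Those names come from the JVM interfaces (which Java 9 later adopted as `java.util.concurrent.Flow`); the sketch below is only a single-threaded Python imitation of that contract, without the specification's asynchrony or error signalling, and `RangePublisher`, `RangeSubscription` and `BatchingSubscriber` are hypothetical.

```python
class RangeSubscription:
    """Emits integers to one subscriber, never more than it has requested."""

    def __init__(self, start, end, subscriber):
        self._next = start
        self._end = end
        self._subscriber = subscriber
        self._demand = 0          # requested but not yet delivered
        self._emitting = False    # guards against re-entrant request() from on_next
        self._cancelled = False
        self._completed = False

    def request(self, n):
        if n <= 0:
            raise ValueError("request(n) requires n > 0")
        self._demand += n
        if self._emitting:
            return                # the loop already running will use the new demand
        self._emitting = True
        try:
            while self._demand > 0 and not self._cancelled and self._next < self._end:
                item = self._next
                self._next += 1
                self._demand -= 1
                self._subscriber.on_next(item)
            if self._next >= self._end and not self._cancelled and not self._completed:
                self._completed = True
                self._subscriber.on_complete()
        finally:
            self._emitting = False

    def cancel(self):
        self._cancelled = True


class RangePublisher:
    """Publishes start, start+1, ..., end-1, only as fast as subscribers ask."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def subscribe(self, subscriber):
        subscriber.on_subscribe(RangeSubscription(self.start, self.end, subscriber))


class BatchingSubscriber:
    """Asks for batch_size items, and for the next batch only after consuming one."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.received = []
        self.requests = []        # every request(n) issued, to make the flow control visible
        self.completed = False
        self._subscription = None
        self._left_in_batch = 0

    def _request_batch(self):
        self._left_in_batch = self.batch_size
        self.requests.append(self.batch_size)
        self._subscription.request(self.batch_size)

    def on_subscribe(self, subscription):
        self._subscription = subscription
        self._request_batch()

    def on_next(self, item):
        self.received.append(item)
        self._left_in_batch -= 1
        if self._left_in_batch == 0:
            self._request_batch()

    def on_complete(self):
        self.completed = True


if __name__ == "__main__":
    sub = BatchingSubscriber(batch_size=3)
    RangePublisher(0, 7).subscribe(sub)
    print(sub.received)   # [0, 1, 2, 3, 4, 5, 6]
    print(sub.requests)   # [3, 3, 3]
    print(sub.completed)  # True
```

The `_emitting` flag lets the subscriber ask for its next batch from inside `on_next` without recursing into the publisher; the new demand is picked up by the loop that is already running.
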
Comparing Pig Latin and SQL for Constructing Data Processing Pipelines | hadoopnew - Ya...

developer.yahoo.com/...-processing-pipelines-444.html

pig sql hive hadoop bigdata

shared by Pablo Lalloni on 30 May 13

Nux - Overview

acs.lbl.gov/nux

xml xom programming development jvm stax xpath xquery jaxb streaming

shared by Pablo Lalloni on 30 Mar 12

Pablo Lalloni on 30 Mar 12

Nux is an open-source Java toolkit making efficient and powerful XML processing easy. It is geared towards embedded use in high-throughput XML messaging middleware such as large-scale Peer-to-Peer infrastructures, message queues, publish-subscribe and matchmaking systems for Blogs/newsfeeds, text chat, data acquisition and distribution systems, application level routers, firewalls, classifiers, etc.
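
One technique for this kind of high-throughput XML processing is streaming: handle each matching element as soon as it has been parsed, then discard it, instead of building a full tree for every large message. Nux is a Java toolkit and the sketch below does not use it; it shows the same streaming idea with Python's standard `xml.etree.ElementTree.iterparse`, assuming a made-up `<feed>`/`<item>` message format and a hypothetical `stream_items` helper.

```python
import io
import xml.etree.ElementTree as ET


def stream_items(source, tag="item"):
    """Yield (id, title) per <item> element, discarding each one once handled."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)               # first event: start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            yield elem.get("id"), elem.findtext("title")
            root.clear()                 # drop processed children so memory stays bounded


if __name__ == "__main__":
    doc = (b'<feed><item id="1"><title>first</title></item>'
           b'<item id="2"><title>second</title></item></feed>')
    for item_id, title in stream_items(io.BytesIO(doc)):
        print(item_id, title)
    # 1 first
    # 2 second
```

Memory use then depends on the size of one `<item>`, not on the length of the whole stream.
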

Log(Graph): A Near-Optimal High-Performance Graph Representation

people.csail.mit.edu/...loggraph.pdf

shared by Pablo Lalloni on 29 Sep 18

Pablo Lalloni on 29 Sep 18

big-data graph graph-processing architecture development programming

Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job

info.lightbend.com/gine-for-the-job-register.html

architecture akka spark kafka data-streaming streaming fast-data fastdata stream-processing

shared by Pablo Lalloni on 29 Mar 18

Questioning the Lambda Architecture - O'Reilly Radar

radar.oreilly.com/...g-the-lambda-architecture.html

big-data architecture lambda stream-processing bigdata development

shared by Pablo Lalloni on 05 Nov 14

Cloudbreak

sequenceiq.com/cloudbreak

development cloud-computing hadoop devops docker distributed-computing

shared by Pablo Lalloni on 14 Aug 14

Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud. The main features of Docker are:
* Lightweight, portable
* Build once, run anywhere
* VM - without the overhead of a VM
  * In a VM, each virtualised application includes not only the application and the necessary binaries and libraries, but also an entire guest operating system.
  * The Docker Engine container comprises just the application and its dependencies. It runs as an isolated process in userspace on the host operating system, sharing the kernel with other containers.
* Containers are isolated
* It can be automated and scripted
- Pablo Lalloni on 14 Aug 14
  
  Probably the best short description of Docker I've read: just one paragraph and a list of features. We should use it.
  
Pablo Lalloni on 14 Aug 14

"Cloudbreak is a RESTful Hadoop as a Service API. Once it is deployed in your favourite servlet container, it exposes a REST API that lets you spin up Hadoop clusters of arbitrary sizes on your selected cloud provider. Provisioning Hadoop has never been easier. Cloudbreak is built on the foundation of cloud providers' APIs (Amazon AWS, Microsoft Azure, Google Cloud Compute...), Apache Ambari, Docker containers, Serf and dnsmasq."
