Skip to main content

Home/ Arquitectura?/ Group items tagged data-processing

Rss Feed Group items tagged

Pablo Lalloni

The HDF Group - Why use HDF? - 0 views

  •  
    "HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
Pablo Lalloni

Baratine | a distributed in-memory Java service platform - 0 views

  •  
    "Baratine is a new distributed in-memory Java service platform for building high performance web services that combine both data and logic in the same JVM. Say again? In Baratine, the data lives within the service and the service owns its own data. This means: the data is not owned by the database the data is not modified by another process the data is not separate and distinct from the service => The data sits right in the service in the same JVM, the same thread, and the same class instance."
Pablo Lalloni

Rationale - Datomic - 0 views

  •  
    "Datomic is a distributed database designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures. It does this by: Bringing declarative data manipulation into the application, and the data with it Getting time, process and perception right Process (writes) require coordination Perception (reads) require none The past doesn't change Leveraging immutability, and a sound model of state Datomic has: ACID Transactions Joins A sound data model A logical query language - Datalog Thus, Datomic avoids the compromises and losses of many NoSQL solutions. In addition, it offers flexibility and power over the traditional model in supporting: Hierarchy Multi-valued attributes Minimal schema Reliable operation on unreliable, ephemeral cloud instances Time Datomic avoids manual caching and replication, complex configuration, sharding (automatic or manual), logging, locking, latching and disk management of traditional servers."
Pablo Lalloni

Graphite - Scalable Realtime Graphing - Graphite - 0 views

  •  
    What is Graphite? Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through graphite's web interfaces. Who should use Graphite? Graphite is actually a bit of a niche application. Specifically, it is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. However Graphite is a complex system, and if you only have a few hundred distinct things you want to graph (stocks prices in the S&P 500) then Graphite is probably overkill. But if you need to graph a lot of different things (like dozens of performance metrics from thousands of servers) and you don't necessarily know the names of those things in advance (who wants to maintain such huge configuration?) then Graphite is for you.
Pablo Lalloni

kiama - A Scala library for language processing - Google Project Hosting - 0 views

  •  
    "Kiama is a Scala library for language processing. It enables convenient analysis and transformation of structured data. The programming styles supported by the library are based on well-known formal language processing paradigms, including attribute grammars, tree rewriting, abstract state machines, and pretty printing."
Pablo Lalloni

nathanmarz/cascalog · GitHub - 0 views

  •  
    "Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."
Pablo Lalloni

Luna - 0 views

  •  
    Luna is a data processing and visualization environment built on a principle that people need an immediate connection to what they are building. It provides an ever-growing library of highly tailored, domain specific components and an extensible framework for building new ones.
Pablo Lalloni

Apache Flink: Scalable Batch and Stream Data Processing - 1 views

  •  
    "Apache Flink is an open source platform for distributed stream and batch data processing."
Pablo Lalloni

Hama - a general BSP framework on top of Hadoop - 0 views

  •  
    "Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are: * Supports message passing paradigm style of application development * Provides a flexible, simple, and easy-to-use small APIs * Enables to perform better than MPI for communication-intensive applications * Guarantees impossibility of deadlocks or collisions in the communication mechanisms"
Pablo Lalloni

A Deeper Look at Reactive Streams with Akka Streams 1.0 and Slick 3.0 - Free E-Books | ... - 0 views

  •  
    "Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically-typed, high-performance, low latency, asynchronous streams of data with built-in non-blocking back pressure-with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java."
Pablo Lalloni

Nux - Overview - 0 views

  •  
    Nux is an open-source Java toolkit making efficient and powerful XML processing easy. It is geared towards embedded use in high-throughput XML messaging middleware such as large-scale Peer-to-Peer infrastructures, message queues, publish-subscribe and matchmaking systems for Blogs/newsfeeds, text chat, data acquisition and distribution systems, application level routers, firewalls, classifiers, etc.
Pablo Lalloni

Log(Graph): A Near-Optimal High-Performance Graph Representation - 0 views

  •  
    big-data graph graph-processing architecture development programming
Pablo Lalloni

Cloudbreak - 1 views

  • Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud. The main features of Docker are: Lightweight, portable Build once, run anywhere VM - without the overhead of a VM Each virtualised application includes not only the application and the necessary binaries and libraries, but also an entire guest operating system The Docker Engine container comprises just the application and its dependencies. It runs as an isolated process in userspace on the host operating system, sharing the kernel with other containers. Containers are isolated It can be automated and scripted
    • Pablo Lalloni
       
      Probablemente la mejor descripción corta de docker que he leído en solo un párrafo y una lista de features. Deberíamos usarla. 
  •  
    "Cloudbreak is a RESTful Hadoop as a Service API. Once it is deployed in your favourite servlet container exposes a REST API allowing to span up Hadoop clusters of arbitrary sizes on your selected cloud provider. Provisioning Hadoop has never been easier. Cloudbreak is built on the foundation of cloud providers API (Amazon AWS, Microsoft Azure, Google Cloud Compute...), Apache Ambari, Docker containers, Serf and dnsmasq."
1 - 16 of 16
Showing 20 items per page