What is Graphite?
Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.
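As a rough illustration of the "send it to carbon" step, the sketch below formats one data point in carbon's plaintext line protocol (metric path, value, and Unix timestamp separated by spaces, terminated by a newline) and writes it to a TCP socket. The host, the port (2003 is the usual default for the plaintext listener), and the function names are assumptions for illustration, not something taken from this note.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: 'path value timestamp\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, timestamp=None, host="localhost", port=2003):
    """Send a single data point to a carbon plaintext listener (assumed host and port)."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

With a reachable carbon instance, a call such as `send_metric("servers.web01.cpu.load", 0.42)` would record one point under that (hypothetical) metric name; no metric has to be declared in advance.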
Who should use Graphite?
Graphite is actually a bit of a niche application. Specifically, it is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. However, Graphite is a complex system, and if you only have a few hundred distinct things you want to graph (stock prices in the S&P 500) then Graphite is probably overkill. But if you need to graph a lot of different things (like dozens of performance metrics from thousands of servers) and you don't necessarily know the names of those things in advance (who wants to maintain such a huge configuration?) then Graphite is for you.
"Package mangos is an implementation in pure Go of the SP ("Scalability Protocols") messaging system. This makes heavy use of go channels, internally, but it can operate on systems that lack support for cgo."
- The socket library that acts as a concurrency framework.
- Faster than TCP, for clustered products and supercomputing.
- Carries messages across inproc, IPC, TCP, and multicast.
- Connect N-to-N via fanout, pubsub, pipeline, request-reply.
- Asynch I/O for scalable multicore message-passing apps.
- Large and active open source community.
- 30+ languages including C, C++, Java, .NET, Python.
- Most OSes including Linux, Windows, OS X.
- LGPL free software with full commercial support from iMatix.
"Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."
"LZ4 is a very fast lossless compression algorithm, providing compression speed at 300 MB/s per core, scalable with multi-cores CPU. It also features an extremely fast decoder, with speeds up and beyond 1GB/s per core, typically reaching RAM speed limits on multi-core systems."
The Eventsourced library adds scalable actor state persistence and at-least-once message delivery guarantees to Akka. With Eventsourced, stateful actors:
- Persist received messages by appending them to a log (journal)
- Project received messages to derive current state
- Usually hold current state in memory (memory image)
- Recover current (or past) state by replaying received messages (during normal application start or after crashes)
- Never persist current state directly (except optional state snapshots for recovery time optimization)
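The pattern in the list above can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not the Eventsourced or Akka API: the `Journal` and `CounterActor` names are hypothetical, the journal lives in memory rather than on durable storage, and delivery guarantees and snapshots are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Append-only log of received messages (in memory here; a real journal is durable)."""
    entries: list = field(default_factory=list)

    def append(self, message):
        self.entries.append(message)


class CounterActor:
    """Hypothetical stateful actor whose state is derived only from journaled messages."""

    def __init__(self, journal):
        self.journal = journal
        self.total = 0  # memory image of the current state
        # Recovery: rebuild the state by replaying everything already in the journal.
        for message in journal.entries:
            self._apply(message)

    def receive(self, message):
        self.journal.append(message)  # persist the message first
        self._apply(message)          # then project it onto the in-memory state

    def _apply(self, message):
        kind, amount = message
        if kind == "add":
            self.total += amount
        elif kind == "subtract":
            self.total -= amount
```

Creating a second `CounterActor` on the same journal stands in for a restart after a crash: it reaches the same `total` without the state itself ever having been written anywhere.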
"Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more. With a steady development cycle and a growing community of users worldwide, Giraph is a natural choice for unleashing the potential of structured datasets at a massive scale."
"Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals."
"Pachyderm is a complete data analytics solution that lets you efficiently store and analyze your data using containers. We offer the scalability and broad functionality of Hadoop, with the ease of use of Docker."
"Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.
Today, many practical data processing applications require a more flexible programming abstraction model that is compatible to run on highly scalable and massive data systems (e.g., HDFS, HBase, etc). A message passing paradigm beyond Map-Reduce framework would increase its flexibility in its communication capability. Bulk Synchronous Parallel (BSP) model fills the bill appropriately. Some of its significant advantages over MapReduce and MPI are:
* Supports message passing paradigm style of application development
* Provides a flexible, simple, and easy-to-use small APIs
* Enables to perform better than MPI for communication-intensive applications
* Guarantees impossibility of deadlocks or collisions in the communication mechanisms"
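To make the BSP model a little more concrete, here is a small single-process sketch in Python. It is not Hama's API; the function name and the graph representation are assumptions. Each pass of the loop is one superstep: messages sent in the previous superstep are delivered together (the barrier), then every vertex computes using only its own inbox and state. The example task spreads the largest value through a graph.

```python
def bsp_max(graph, values):
    """Propagate the maximum value to every vertex using BSP-style supersteps.

    graph: dict mapping each vertex to a list of neighbour vertices
           (every neighbour must itself be a key of the dict)
    values: dict mapping each vertex to its initial numeric value
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its neighbours.
    outbox = {v: [(n, values[v]) for n in graph[v]] for v in graph}
    supersteps = 0
    while any(outbox.values()):
        # Barrier: all messages from the previous superstep are delivered together.
        inbox = {v: [] for v in graph}
        for messages in outbox.values():
            for target, payload in messages:
                inbox[target].append(payload)
        # Local computation: each vertex sees only its own inbox and state.
        outbox = {v: [] for v in graph}
        for v in graph:
            best = max(inbox[v], default=values[v])
            if best > values[v]:
                values[v] = best
                outbox[v] = [(n, best) for n in graph[v]]
        supersteps += 1
    return values, supersteps
```

The loop ends when a superstep produces no messages. Because no vertex ever waits on another vertex in the middle of a superstep, this structure is what the deadlock-freedom claim above refers to.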
"Datomic is a distributed database designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures.
It does this by:
- Bringing declarative data manipulation into the application, and the data with it
- Getting time, process and perception right
  - Process (writes) require coordination
  - Perception (reads) require none
  - The past doesn't change
- Leveraging immutability, and a sound model of state
Datomic has:
- ACID Transactions
- Joins
- A sound data model
- A logical query language - Datalog
Thus, Datomic avoids the compromises and losses of many NoSQL solutions. In addition, it offers flexibility and power over the traditional model in supporting:
- Hierarchy
- Multi-valued attributes
- Minimal schema
- Reliable operation on unreliable, ephemeral cloud instances
- Time
Datomic avoids manual caching and replication, complex configuration, sharding (automatic or manual), logging, locking, latching and disk management of traditional servers."
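The "past doesn't change" idea can be illustrated with a toy append-only store. This is only a sketch of the general principle and has nothing to do with Datomic's actual API or data model; the `FactStore` class, its methods, and the single-valued attributes are all assumptions made for the example.

```python
class FactStore:
    """Toy append-only store of (entity, attribute, value) facts stamped with a transaction id."""

    def __init__(self):
        self._log = []      # facts are only ever appended, never updated in place
        self._next_tx = 1

    def transact(self, entity, attribute, value):
        """Record a new fact and return the id of the transaction that added it."""
        tx = self._next_tx
        self._log.append((tx, entity, attribute, value))
        self._next_tx += 1
        return tx

    def as_of(self, tx):
        """Return {(entity, attribute): value} as it stood after transaction `tx`."""
        view = {}
        for fact_tx, entity, attribute, value in self._log:
            if fact_tx > tx:
                break  # the log is ordered by transaction id
            view[(entity, attribute)] = value
        return view
```

Because earlier facts are never overwritten, a read as of an old transaction id always gives the same answer, no matter how many writes happen afterwards.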
"Everything at Google, from Search to Gmail, is packaged and run in a Linux container," explained Eric Brewer, vice president of infrastructure at the Internet search giant, in announcing the open sourcing of Kubernetes. "Each week we launch more than 2 billion container instances across our global data centers, and the power of containers has enabled both more reliable services and higher, more-efficient scalability."
"Amazon EC2 Container Service (ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure. With simple API calls, you can launch and stop container-enabled applications, query the complete state of your cluster, and access many familiar features like security groups, Elastic Load Balancing, EBS volumes, and IAM roles. You can use Amazon ECS to schedule the placement of containers across your cluster based on your resource needs and availability requirements. You can also integrate your own scheduler or third-party schedulers to meet business or application specific requirements."
We could use this to address the pending task of getting real-time statistics for the production environments.
It might even be applicable to the monitoring-agent solution.