The Eventsourced library adds scalable actor state persistence and at-least-once message delivery guarantees to Akka. With Eventsourced, stateful actors:
- Persist received messages by appending them to a log (journal)
- Project received messages to derive current state
- Usually hold current state in memory (memory image)
- Recover current (or past) state by replaying received messages (during normal application start or after crashes)
- Never persist current state directly (except optional state snapshots for recovery time optimization)
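The persist-then-project cycle listed above can be sketched in a few lines. This is a minimal illustration in Python, not the Eventsourced API (which is a Scala/Akka library); the `Journal` and `Counter` classes, the message shapes, and the in-memory list standing in for a durable log are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Append-only message log; an in-memory stand-in for a durable journal."""
    entries: list = field(default_factory=list)

    def append(self, message: dict) -> None:
        # Store a serialized copy so the log is independent of live objects.
        self.entries.append(json.dumps(message))

    def replay(self, upto=None):
        # Yield stored messages in order; `upto` limits replay to an earlier point.
        for raw in self.entries[:upto]:
            yield json.loads(raw)


class Counter:
    """Stateful 'actor' whose state is a projection of the messages it received."""

    def __init__(self, journal: Journal):
        self.journal = journal
        self.total = 0  # current state, held only in memory

    def _project(self, message: dict) -> None:
        if message["type"] == "add":
            self.total += message["amount"]
        elif message["type"] == "reset":
            self.total = 0

    def receive(self, message: dict) -> None:
        self.journal.append(message)  # persist the message first
        self._project(message)        # then derive the new state from it

    @classmethod
    def recover(cls, journal: Journal, upto=None) -> "Counter":
        actor = cls(journal)
        for message in journal.replay(upto):
            actor._project(message)   # replay without journaling again
        return actor


journal = Journal()
counter = Counter(journal)
counter.receive({"type": "add", "amount": 5})
counter.receive({"type": "add", "amount": 7})
counter.receive({"type": "reset"})
counter.receive({"type": "add", "amount": 3})

recovered = Counter.recover(journal)     # current state, e.g. after a crash
past = Counter.recover(journal, upto=2)  # state after the first two messages
print(recovered.total, past.total)
```

Only messages are ever written to the journal; `total` is rebuilt from them on recovery, which is also what makes it possible to reconstruct a past state.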
"The Cloudera Connector for Qlikview enables your enterprise's power users to access Hadoop data through Qlikview 11.2. The driver achieves this by translating Open Database Connectivity (ODBC) calls from Qlikview into HiveQL queries. The driver supports CDH 4.1."
"Quest Data Connector for Oracle and Hadoop is a freeware plug-in to Cloudera's Distribution including Apache Hadoop that allows for fast and scalable data transfer between Hadoop and Oracle.
Attributes:
- Transfer data to and from Oracle up to 5 times faster than Sqoop alone.
- Can easily transfer data to and from Oracle that has no primary key or was not stored in primary key order.
- Reduces overhead on the Oracle instance:
  - Upwards of 80% reduction in CPU consumption.
  - Up to 95% reduction in IO time.
- Allows other Oracle workloads to simultaneously run seamlessly without disruption.
- SLA-driven commercial support available when used as a part of Cloudera Enterprise."
"The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security in defending against hackers since all large Hadoop clusters are behind corporate firewalls that only allow employees access. Instead, the motivation was simply that security would allow us to use Hadoop more effectively to pool resources between disjointed groups. Larger clusters are much cheaper to operate and require fewer copies of duplicated data."
Delegation tokens play a critical part in Apache Hadoop security, and understanding their design and use is important for comprehending Hadoop's security model.
SubScript is a way to extend common programming languages, aimed at easing event handling and concurrency. Typical application areas are GUI controllers, text processing applications and discrete event simulations. SubScript is based on a mathematical concurrency theory named Algebra of Communicating Processes (ACP).
ACP is a 30-year-old branch of mathematics, as solid as numeric algebra and Boolean algebra. In fact, you can regard ACP as an extension to Boolean algebra with 'things that can happen'. These items are glued together with operations such as alternative, sequential and parallel composition. This way ACP combines the essence of grammar specification languages and notions of parallelism.
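To make the three compositions concrete, here is a rough Python sketch that models a process as the set of action sequences (traces) it can perform. It is a simplification and not SubScript syntax: real ACP distinguishes processes more finely than trace sets do, and its parallel operator also covers communication between actions, which this sketch leaves out. The function names `atom`, `alt`, `seq` and `par` are made up for the example.

```python
from itertools import chain


def atom(name):
    """A process that performs one action and then terminates."""
    return frozenset({(name,)})


def alt(p, q):
    """Alternative composition: behave as p or as q."""
    return p | q


def seq(p, q):
    """Sequential composition: every trace of p followed by every trace of q."""
    return frozenset(s + t for s in p for t in q)


def interleavings(s, t):
    """All orderings of two traces that keep each trace's own order."""
    if not s:
        return {t}
    if not t:
        return {s}
    return ({(s[0],) + rest for rest in interleavings(s[1:], t)} |
            {(t[0],) + rest for rest in interleavings(s, t[1:])})


def par(p, q):
    """Parallel composition as free interleaving (no communication)."""
    return frozenset(chain.from_iterable(
        interleavings(s, t) for s in p for t in q))


a, b, c = atom("a"), atom("b"), atom("c")

choice = alt(a, seq(b, c))   # either a, or b followed by c
merged = par(seq(a, b), c)   # a then b, with c happening at any point

print(sorted(choice))
print(sorted(merged))
```

In this model `alt` is set union, much as "or" is in Boolean algebra, while `seq` and `par` add the ordering of 'things that can happen'.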
"HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies.
Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF.
Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents.
HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of a relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
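The idea of direct, coordinate-style access to part of a binary file can be illustrated without any HDF tooling. The Python sketch below uses a toy layout invented for this example (an 8-byte header followed by row-major float64 values); it is not the HDF file format, and the helper names `write_grid` and `read_cell` are hypothetical.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<II")  # rows, cols as little-endian unsigned 32-bit ints
CELL = struct.Struct("<d")     # one little-endian float64 per element


def write_grid(path, grid):
    """Write a 2-D list of floats as a header plus row-major binary values."""
    rows, cols = len(grid), len(grid[0])
    with open(path, "wb") as f:
        f.write(HEADER.pack(rows, cols))
        for row in grid:
            for value in row:
                f.write(CELL.pack(value))


def read_cell(path, i, j):
    """Read element (i, j) by seeking to its offset, without loading the rest."""
    with open(path, "rb") as f:
        rows, cols = HEADER.unpack(f.read(HEADER.size))
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError((i, j))
        f.seek(HEADER.size + (i * cols + j) * CELL.size)
        return CELL.unpack(f.read(CELL.size))[0]


grid = [[float(r * 10 + c) for c in range(4)] for r in range(3)]
path = os.path.join(tempfile.mkdtemp(), "grid.bin")
write_grid(path, grid)

print(read_cell(path, 2, 1))
```

Because the header records the shape, the file describes itself well enough to compute any element's offset, and a lookup touches only the header and one cell. An XML encoding of the same grid would generally need to be parsed from the start to find that element.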
"Saddle is a data manipulation library for Scala that provides array-backed, indexed, one- and two-dimensional data structures that are judiciously specialized on JVM primitives to avoid the overhead of boxing and unboxing.
Saddle offers vectorized numerical calculations, automatic alignment of data along indices, robustness to missing (N/A) values, and facilities for I/O.
Saddle draws inspiration from several sources, among them the R programming language & statistical environment, the numpy and pandas Python libraries, and the Scala collections library."
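Index alignment and tolerance of missing values are easiest to see in a small example. The following Python sketch is not Saddle's API (Saddle is a Scala library with array-backed structures); it uses plain dictionaries, and the names `align_add` and `mean_skipping_missing` are invented here, purely to show the behaviour the description refers to.

```python
import math


def align_add(left: dict, right: dict) -> dict:
    """Add two index -> value mappings, aligned on the union of their keys."""
    result = {}
    for key in sorted(left.keys() | right.keys()):
        a = left.get(key, math.nan)   # a missing entry is treated as N/A
        b = right.get(key, math.nan)
        result[key] = a + b           # NaN propagates through the addition
    return result


def mean_skipping_missing(values) -> float:
    """Mean of the values that are present; NaN if none are."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


prices = {"2013-01-01": 10.0, "2013-01-02": 11.0}
fees = {"2013-01-02": 0.5, "2013-01-03": 0.25}

total = align_add(prices, fees)
print(total)
print(mean_skipping_missing(total.values()))
```

The two inputs cover different dates, yet the caller never lines them up by hand: values are matched by index key, and dates present on only one side come out as N/A instead of raising an error.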
"The akka-patterns project is a dumping ground for lessons learnt on a variety of Scala / Akka / Spray topics.
At the end of 5 months working on real-world (commercial) projects that were originally based on the akka-patterns architecture, Sam Halliday (@fommil) was asked to document the lessons learnt:
Milestone: Lessons Learnt
Pull Request: Lessons Learnt
This short document is a summary of the highlights from the pull request."
""Fluentd" is an OSS lightweight and flexible log collector. Fluentd receives logs as JSON streams, buffers them, and sends them to other systems like S3, MongoDB, Hadoop, or other Fluentd instances."
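The receive, buffer and forward flow described above might look roughly like this in miniature. The Python sketch is not Fluentd code or configuration (Fluentd itself is configured through its own plugin system); the `BufferedForwarder` class, its `flush_size` parameter, and the list used as a destination are assumptions for illustration.

```python
import json


class BufferedForwarder:
    """Collects JSON log records and hands them to a sink in batches."""

    def __init__(self, sink, flush_size=3):
        self.sink = sink              # callable that receives a list of records
        self.flush_size = flush_size  # number of buffered records that triggers a flush
        self.buffer = []

    def receive(self, line: str) -> None:
        # Each incoming line is expected to be one JSON-encoded record.
        self.buffer.append(json.loads(line))
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        # Send a copy of the buffered records, then empty the buffer.
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()


batches = []  # stands in for a destination such as S3 or MongoDB
forwarder = BufferedForwarder(batches.append, flush_size=2)

for line in [
    '{"level": "info", "msg": "start"}',
    '{"level": "warn", "msg": "slow"}',
    '{"level": "info", "msg": "done"}',
]:
    forwarder.receive(line)
forwarder.flush()  # push out whatever is still buffered

print(batches)
```

Buffering decouples the rate at which logs arrive from the rate at which the destination is written to. A real collector would also need retries and durable buffers, which this sketch omits.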
Ø The socket library that acts as a concurrency framework.
Ø Faster than TCP, for clustered products and supercomputing.
Ø Carries messages across inproc, IPC, TCP, and multicast.
Ø Connect N-to-N via fanout, pubsub, pipeline, request-reply.
Ø Asynch I/O for scalable multicore message-passing apps.
Ø Large and active open source community.
Ø 30+ languages including C, C++, Java, .NET, Python.
Ø Most OSes including Linux, Windows, OS X.
Ø LGPL free software with full commercial support from iMatix.
"Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."