"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."
"Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."
"Kiama is a Scala library for language processing. It enables convenient analysis and transformation of structured data. The programming styles supported by the library are based on well-known formal language processing paradigms, including attribute grammars, tree rewriting, abstract state machines, and pretty printing."
"Our project (titled xstream)
concentrated on evaluation of XPath over XML streams.
This research area contains multiple challenges resulting
from both the richness of the language and the
requirement of having only a single pass over the data.
We modified and extended one of the known algorithms,
TurboXPath [4], a tree-based IBM algorithm. We also
provide extensive comparative analysis between
TurboXPath and XSQ [5], currently the most advanced of
finite automata (FA)-based algorithms."
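To make the single-pass requirement concrete, here is a minimal sketch of matching a path over an XML stream. It is neither TurboXPath nor XSQ. It only handles absolute paths made of child steps (no predicates, wildcards, or descendant axis). The function name `stream_match` and the sample document are hypothetical, chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of each element whose root-to-node tag path equals `path`.

    `path` is a simple absolute path such as "/lib/book/title".
    The input is read once, front to back, as a stream of parse events.
    """
    target = [step for step in path.split("/") if step]
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # At "end" the element's text is complete, so it is safe to test and emit.
            if stack == target:
                yield (elem.text or "").strip()
            stack.pop()
            elem.clear()  # discard content that has already been processed


doc = io.BytesIO(
    b"<lib><book><title>A</title></book>"
    b"<book><title>B</title><note><title>skip</title></note></book></lib>"
)
print(list(stream_match(doc, "/lib/book/title")))  # ['A', 'B']
```

The stack of open tags is the only state kept between events, which is what makes a single pass sufficient for this restricted path form. Richer XPath features such as predicates generally require buffering or more elaborate state, which is where the challenges mentioned in the quote arise.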
"In Forrester's 40-criteria evaluation of application programming interface (API) management solutions,
we identified the 11 most significant software providers in the category - 3scale, Apigee, Axway, CA
Technologies, IBM, Informatica, Intel Services, MuleSoft, SOA Software, Tibco Software, and WSO2 - and
researched, analyzed, and evaluated them. This report details our findings about how well each vendor's
products fulfill our criteria and where they stand in relation to each other. This analysis, combined with
three buying guides that highlight key make-or-break decision factors, will help AD&D professionals select
the right partner for their API management needs."
"The main intent of this tool is to create reproducible analysis using Scala, Apache Spark and more.
This is achieved through an interactive web-based editor that can combine Scala code, SQL queries, Markup or even JavaScript in a collaborative manner.
The usage of Spark comes out of the box, and is simply enabled by the implicit variable named sparkContext.
You should also check the website, http://spark-notebook.io."
"Since earlier this year, the Performance Engineering Group at Red Hat has run huge amounts of microbenchmarks, benchmarks and application workloads in Docker containers."
"HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies.
Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF.
Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents.
HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of a relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets, and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
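The idea of coordinate-style lookup with direct access to part of a file can be illustrated with a small sketch. This is not the HDF format or any HDF API. It is a hypothetical flat binary layout (row-major float64 cells) written with the Python standard library. Unlike a self-describing HDF file, the reader here must be told the number of columns. The names `write_grid` and `read_cell` are invented for this example.

```python
import os
import struct
import tempfile

CELL = struct.Struct("<d")  # one little-endian float64 per grid cell


def write_grid(path, grid):
    """Write a rectangular 2-D list of floats to `path` in row-major order."""
    with open(path, "wb") as f:
        for row in grid:
            for value in row:
                f.write(CELL.pack(value))


def read_cell(path, n_cols, row, col):
    """Read one cell by seeking to its byte offset; no other cell is read."""
    with open(path, "rb") as f:
        f.seek((row * n_cols + col) * CELL.size)
        return CELL.unpack(f.read(CELL.size))[0]


# A 3 x 4 grid whose value at (r, c) is r * 10 + c.
grid = [[float(r * 10 + c) for c in range(4)] for r in range(3)]
path = os.path.join(tempfile.mkdtemp(), "grid.bin")
write_grid(path, grid)
print(read_cell(path, n_cols=4, row=2, col=1))  # 21.0
```

Because every cell has a fixed size, its position in the file follows directly from its coordinates, so a lookup costs one seek and one small read regardless of file size. A text format such as XML would typically need to be parsed up to the element of interest.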
"An open-source distributed time series database with no external dependencies.
InfluxDB is the new home for all of your metrics, events, and analytics."
"Splunk Enterprise is the industry-leading platform for operational intelligence. Collect and index any machine data from virtually any source in real time. Search, monitor, analyze and visualize your data to gain new insights and intelligence. Index everything for deep visibility, forensics and troubleshooting. Work smarter as you and your team share searches and add knowledge specific to your organization. Create ad hoc reports to identify trends or prove compliance controls. Create interactive dashboards to monitor for security incidents, service levels and other key performance metrics. Analyze user transactions, customer behavior, machine behavior, security threats and fraudulent activity, all in real time."