Arquitectura: group items tagged data-analysis

Pablo Lalloni

The HDF Group - Why use HDF?

  •  
    "HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
Pablo Lalloni

pandas

  •  
    "pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."
Pablo Lalloni

Splunk Enterprise Product Tour - Machine Data Collection | Splunk

  •  
    "Splunk Enterprise is the industry-leading platform for operational intelligence. Collect and index any machine data from virtually any source in real time. Search, monitor, analyze and visualize your data to gain new insights and intelligence. Index everything for deep visibility, forensics and troubleshooting. Work smarter as you and your team share searches and add knowledge specific to your organization. Create ad hoc reports to identify trends or prove compliance controls. Create interactive dashboards to monitor for security incidents, service levels and other key performance metrics. Analyze user transactions, customer behavior, machine behavior, security threats and fraudulent activity, all in real time."
Pablo Lalloni

nathanmarz/cascalog · GitHub

  •  
    "Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."
Pablo Lalloni

kiama - A Scala library for language processing - Google Project Hosting

  •  
    "Kiama is a Scala library for language processing. It enables convenient analysis and transformation of structured data. The programming styles supported by the library are based on well-known formal language processing paradigms, including attribute grammars, tree rewriting, abstract state machines, and pretty printing."
Pablo Lalloni

CS276B Project Report: Streaming XPath Engine

  •  
    "Our project (titled xstream) concentrated on evaluation of XPath over XML streams. This research area contains multiple challenges resulting from both the richness of the language and the requirement of having only a single pass over the data. We modified and extended one of the known algorithms, TurboXPath [4], a tree-based IBM algorithm. We also provide extensive comparative analysis between TurboXPath and XSQ [5], currently the most advanced of finite automata (FA)-based algorithms."
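
    As a rough illustration of the single-pass constraint mentioned in the report, the sketch below matches a simple absolute path against an XML stream using Python's standard-library `iterparse`. It is not TurboXPath or XSQ, and it supports no predicates, wildcards, or descendant axes; the function name `stream_match` and the sample document are made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of elements whose tag path from the root equals `path`.

    `path` is a simple absolute path such as "/library/book/title".
    This is an illustrative helper, not part of any XPath library.
    """
    target = [step for step in path.split("/") if step]
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                yield elem.text
            stack.pop()
            # Drop the text and children of the element just closed,
            # so already-processed content is not kept around.
            elem.clear()


# Hypothetical sample input; the nested <note><title> must not match.
doc = (
    b"<library>"
    b"<book><title>A</title></book>"
    b"<book><title>B</title><note><title>skip</title></note></book>"
    b"</library>"
)
titles = list(stream_match(io.BytesIO(doc), "/library/book/title"))
print(titles)
```

    The stack of open tags is the only state carried between events, which is what lets the input be read once from start to end. The richer parts of XPath (predicates, backward axes) are where the report's algorithms need buffering or automata.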
Pablo Lalloni

andypetrella/spark-notebook

  •  
    "The main intent of this tool is to create reproducible analysis using Scala, Apache Spark and more. This is achieved through an interactive web-based editor that can combine Scala code, SQL queries, Markup or even JavaScript in a collaborative manner. The usage of Spark comes out of the box, and is simply enabled by the implicit variable named sparkContext. You should also check the website, http://spark-notebook.io."
Pablo Lalloni

InfluxDB

  •  
    "An open-source distributed time series database with no external dependencies. InfluxDB is the new home for all of your metrics, events, and analytics."