Group items tagged data-analysis - Arquitectura?

The HDF Group - Why use HDF? - 0 views

www.hdfgroup.org/why_hdf

development programming data-storage data-analysis data-manipulation library

shared by Pablo Lalloni on 06 Apr 13 - No Cached

Pablo Lalloni on 06 Apr 13

"HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
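The point about direct access and coordinate-style lookup can be illustrated with a small sketch. It does not use HDF or any HDF library. It assumes a hypothetical flat binary layout (a 4 x 5 grid of little-endian float64 values in row-major order) and uses only the Python standard library, to show how a binary format with a known layout lets a reader jump to one element without parsing the rest of the file. The names `write_grid`, `read_cell` and `demo` are made up for this illustration.

```python
import os
import struct
import tempfile

# Assumed layout for this sketch only (not the HDF file format):
# ROWS x COLS float64 values, row-major, little-endian.
ROWS, COLS = 4, 5
ITEM = struct.Struct("<d")


def write_grid(path):
    """Write the grid so that cell (r, c) holds the value r * 10 + c."""
    with open(path, "wb") as f:
        for r in range(ROWS):
            for c in range(COLS):
                f.write(ITEM.pack(r * 10 + c))


def read_cell(path, r, c):
    """Seek to one element by its coordinates and read only its bytes."""
    offset = (r * COLS + c) * ITEM.size
    with open(path, "rb") as f:
        f.seek(offset)
        return ITEM.unpack(f.read(ITEM.size))[0]


def demo():
    """Write a temporary grid file, read cell (2, 3), then clean up."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_grid(path)
        return read_cell(path, 2, 3)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

An XML encoding of the same grid would normally have to be parsed from the start to reach the same cell, which is the contrast the quoted text draws.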

pandas - 0 views

pandas.pydata.org

python library statistics data analysis programming development

shared by Pablo Lalloni on 30 Sep 13 - No Cached

Pablo Lalloni on 30 Sep 13

"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."

Splunk Enterprise Product Tour - Machine Data Collection | Splunk - 1 views

www.splunk.com/...SP-CAAAAGV

operations infrastructure data-visualization data-analysis data-collection monitoring auditing

shared by Pablo Lalloni on 20 Nov 14 - No Cached

Pablo Lalloni on 20 Nov 14

"Splunk Enterprise is the industry-leading platform for operational intelligence. Collect and index any machine data from virtually any source in real time. Search, monitor, analyze and visualize your data to gain new insights and intelligence. Index everything for deep visibility, forensics and troubleshooting. Work smarter as you and your team share searches and add knowledge specific to your organization. Create ad hoc reports to identify trends or prove compliance controls. Create interactive dashboards to monitor for security incidents, service levels and other key performance metrics. Analyze user transactions, customer behavior, machine behavior, security threats and fraudulent activity, all in real time."

nathanmarz/cascalog · GitHub - 0 views

github.com/cascalog

distributed-computing hadoop library programming development cloud-computing java clojure jvm

shared by Pablo Lalloni on 04 Apr 13 - No Cached

Pablo Lalloni on 04 Apr 13

"Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools."

Big Data Exploration, Visualization, Analytics - 0 views

www.zoomdata.com

data-visualization big-data real-time data-analysis tools spark hadoop elasticsearch mongodb oracle

shared by Pablo Lalloni on 11 Apr 15 - No Cached

kiama - A Scala library for language processing - Google Project Hosting - 0 views

code.google.com/kiama

scala library language-processing programming development jvm

shared by Pablo Lalloni on 21 Aug 12 - No Cached

Pablo Lalloni on 21 Aug 12

"Kiama is a Scala library for language processing. It enables convenient analysis and transformation of structured data. The programming styles supported by the library are based on well-known formal language processing paradigms, including attribute grammars, tree rewriting, abstract state machines, and pretty printing."

CS276B Project Report: Streaming XPath Engine - 0 views

www-cs-students.stanford.edu/...xpath.pdf

development programming xml xpath sax papers turboxpath xsq

shared by Pablo Lalloni on 06 Sep 12 - Cached

Pablo Lalloni on 06 Sep 12

"Our project (titled xstream) concentrated on evaluation of XPath over XML streams. This research area contains multiple challenges resulting from both the richness of the language and the requirement of having only a single pass over the data. We modified and extended one of the known algorithms, TurboXPath [4], a tree-based IBM algorithm. We also provide extensive comparative analysis between TurboXPath and XSQ [5], currently the most advanced of finite automata (FA)-based algorithms."
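The single-pass constraint mentioned in the abstract can be sketched as follows. This is neither TurboXPath nor XSQ, and it is not the report's xstream code. It is a minimal stack-based matcher, assuming only child-axis paths such as `/lib/book/title` (no predicates, wildcards, or descendant steps), built on the standard library's `xml.etree.ElementTree.iterparse`. The function name `stream_match` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of each element whose root-to-node tag path equals `path`.

    Only plain child steps are supported, e.g. "/lib/book/title".
    The input is read once, front to back.
    """
    steps = [s for s in path.split("/") if s]
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # At the end tag the element's text is complete.
            if stack == steps:
                yield elem.text
            stack.pop()
            # Drop this element's text and children to limit memory use.
            elem.clear()


if __name__ == "__main__":
    doc = b"<lib><book><title>A</title></book><book><title>B</title></book></lib>"
    for title in stream_match(io.BytesIO(doc), "/lib/book/title"):
        print(title)
```

Keeping only the stack of open tags is what makes one pass sufficient for this restricted path language. Predicates and descendant steps need buffering or automaton state, which is the part the algorithms compared in the report address.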

andypetrella/spark-notebook - 0 views

github.com/...spark-notebook

development data-science bigdata spark tools

shared by Pablo Lalloni on 05 Oct 15 - No Cached

Pablo Lalloni on 05 Oct 15

"The main intent of this tool is to create reproducible analysis using Scala, Apache Spark and more. This is achieved through an interactive web-based editor that can combine Scala code, SQL queries, Markup or even JavaScript in a collaborative manner. The usage of Spark comes out of the box, and is simply enabled by the implicit variable named sparkContext. You should also check the website, http://spark-notebook.io."

InfluxDB - 0 views

influxdb.com/index.html

development infrastructure metrics tools analytics data-analysis time-series events

shared by Pablo Lalloni on 19 Jul 15 - No Cached

Pablo Lalloni on 19 Jul 15

"An open-source distributed time series database with no external dependencies. InfluxDB is the new home for all of your metrics, events, and analytics."
