Group items tagged data

Pablo Lalloni

The HDF Group - Why use HDF? - 0 views

  •  
    "HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
Pablo Lalloni

Report on Big Data in the UK government - 1 views

  •  
    "1. The Government has already made a commitment to Big Data by classifying it as one of the 'Eight Great Technologies' which will propel the UK to future growth and help it stay ahead in the global race. The 'Information Economy Strategy' reports on the increase in data being generated and the importance of new types of computing power in order to reap the economic value of the data. 2. This paper sets out to cover the following areas: a) Defining Big Data b) High-level trends in Big Data c) Opportunities for Big Data applications 3. In developing this paper, a 'community of interest' has been established comprising policy leads and analysts from across government with an interest in Big Data. This paper draws on their insights, insights from the private sector, academics, and the extensive open source literature on the Big Data topic."
munyeco

BIG DATA APPLICATIONS Fast Data: Big Data Evolved - White Paper - 0 views

  •  
    There is a fundamental shift occurring in Big Data, from data at rest to data in motion. In this white paper, Dean Wampler explores the ecosystem that is emerging around Fast Data and provides handy diagrams and code samples along the way.
Pablo Lalloni

Baratine | a distributed in-memory Java service platform - 0 views

  •  
    "Baratine is a new distributed in-memory Java service platform for building high performance web services that combine both data and logic in the same JVM. Say again? In Baratine, the data lives within the service and the service owns its own data. This means: the data is not owned by the database the data is not modified by another process the data is not separate and distinct from the service => The data sits right in the service in the same JVM, the same thread, and the same class instance."
Pablo Lalloni

Data.js - 1 views

  •  
    Data.js is a data representation framework for JavaScript. It is being developed in the context of Substance, a web-based document authoring and publishing engine. It took some inspiration from various existing libraries such as the Google Visualization API or Underscore.js. You can report bugs and discuss features on the GitHub issues page, on Freenode IRC in the #_substance channel, post questions to the Google Group, or send tweets to @_substance. With Data.js you can:
    - Model your domain data using a simple graph-based object model that can be serialized to JSON.
    - Traverse your graph, including relationships, using a simple API.
    - Manipulate and query data on the client (browser) or on the server (Node.js) using exactly the same API.
Pablo Lalloni

Quest Data Connectors - Cloudera Support - 0 views

  •  
    "Quest Data Connector for Oracle and Hadoop is a freeware plug-in to Cloudera's Distribution including Apache Hadoop that allows for fast and scalable data transfer between Hadoop and Oracle. Attributes: Transfer data to and from Oracle up to 5 times faster than Sqoop alone. Can easily transfer data to and from Oracle that has no primary key or was not stored in primary key order. Reduces overhead on the Oracle instance: Upwards of 80% reduction in CPU consumption. Up to 95% reduction in IO time. Allows other Oracle workloads to simultaneously run seamlessly without disruption. SLA-driven commercial support available when used as a part of Cloudera Enterprise."
Pablo Lalloni

bandicoot - having fun with structured data - 0 views

  •  
    "Bandicoot is an open source programming system with a new set-based programming language, persistency capabilities, and run-time environment. The language is similar to general purpose programming languages where you write functions/methods and access data through variables. Though, in Bandicoot, you always manipulate data in sets using a small set-based algebra (the relational algebra)." "Here are the main features:   - functions are automatically exposed via HTTP using CSV for data, e.g. /List, /Append  - supports persistency via global variables (with transactions and ACID)  - can run on multiple computers to scale up the read throughput  - built in operators from the relational algebra with a simple syntax, e.g. "+" (union), "-" (minus)  - small binary ~100KB"
Pablo Lalloni

Data Connector for Oracle and Hadoop - Unstructured and Structured Data Transfers - 0 views

  •  
    "Data Connector for Oracle and Hadoop is the fastest, most scalable way to transfer data between Oracle and Hadoop. "
Pablo Lalloni

Graphite - Scalable Realtime Graphing - Graphite - 0 views

  •  
    What is Graphite? Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.

    Who should use Graphite? Graphite is actually a bit of a niche application. Specifically, it is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. However, Graphite is a complex system, and if you only have a few hundred distinct things you want to graph (stock prices in the S&P 500), then Graphite is probably overkill. But if you need to graph a lot of different things (like dozens of performance metrics from thousands of servers) and you don't necessarily know the names of those things in advance (who wants to maintain such a huge configuration?), then Graphite is for you.
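    Sending data to carbon is deliberately simple: its plaintext protocol takes one "metric value timestamp" line per datapoint over TCP, port 2003 by default. A minimal sketch in Python, assuming a carbon-cache reachable on localhost (the metric name is invented):

        import socket
        import time

        # Assumes a carbon-cache listening on the default plaintext port 2003.
        CARBON_HOST, CARBON_PORT = "localhost", 2003

        def send_metric(path, value, timestamp=None):
            """Send one datapoint using carbon's 'path value timestamp' line format."""
            timestamp = int(timestamp or time.time())
            line = f"{path} {value} {timestamp}\n"
            with socket.create_connection((CARBON_HOST, CARBON_PORT)) as sock:
                sock.sendall(line.encode("ascii"))

        send_metric("servers.web01.cpu.load", 0.42)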
Pablo Lalloni

Splunk Enterprise Product Tour - Machine Data Collection | Splunk - 1 views

  •  
    "Splunk Enterprise is the industry-leading platform for operational intelligence. Collect and index any machine data from virtually any source in real time. Search, monitor, analyze and visualize your data to gain new insights and intelligence. Index everything for deep visibility, forensics and troubleshooting. Work smarter as you and your team share searches and add knowledge specific to your organization. Create ad hoc reports to identify trends or prove compliance controls. Create interactive dashboards to monitor for security incidents, service levels and other key performance metrics. Analyze user transactions, customer behavior, machine behavior, security threats and fraudulent activity, all in real time."
Pablo Lalloni

saddle/saddle · GitHub - 0 views

  •  
    "Saddle is a data manipulation library for Scala that provides array-backed, indexed, one- and two-dimensional data structures that are judiciously specialized on JVM primitives to avoid the overhead of boxing and unboxing. Saddle offers vectorized numerical calculations, automatic alignment of data along indices, robustness to missing (N/A) values, and facilities for I/O. Saddle draws inspiration from several sources, among them the R programming language & statistical environment, the numpy and pandas Python libraries, and the Scala collections library."
Pablo Lalloni

Big Data Poster - Data Science Central - 0 views

  •  
    "A great resource (PDF document) about big data, originally posted on CTOvision.com."
Pablo Lalloni

Data Modeling for NoSQL - 0 views

  •  
    "Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."
Pablo Lalloni

pachyderm/pachyderm - 0 views

  •  
    "Pachyderm is a complete data analytics solution that lets you efficiently store and analyze your data using containers. We offer the scalability and broad functionality of Hadoop, with the ease of use of Docker."
Pablo Lalloni

XSQ: A Streaming XPath Engine - 0 views

  •  
    "XSQ evaluates XPath queries over streaming XML data. That is, it makes only pass over the data, in an order determined by the data source"
Pablo Lalloni

Rationale - Datomic - 0 views

  •  
    "Datomic is a distributed database designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures. It does this by: Bringing declarative data manipulation into the application, and the data with it Getting time, process and perception right Process (writes) require coordination Perception (reads) require none The past doesn't change Leveraging immutability, and a sound model of state Datomic has: ACID Transactions Joins A sound data model A logical query language - Datalog Thus, Datomic avoids the compromises and losses of many NoSQL solutions. In addition, it offers flexibility and power over the traditional model in supporting: Hierarchy Multi-valued attributes Minimal schema Reliable operation on unreliable, ephemeral cloud instances Time Datomic avoids manual caching and replication, complex configuration, sharding (automatic or manual), logging, locking, latching and disk management of traditional servers."
Pablo Lalloni

Shark - Lightning Fast Data Warehouse System - 0 views

  •  
    "Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 100 times faster than Hive without modification to the existing data nor queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions."
Pablo Lalloni

Ferry | Big Data Development Environment Using Docker - 0 views

  •  
    "Ferry helps you create big data clusters on your local machine. Define your big data stack using YAML and share your application with Dockerfiles. Ferry supports Hadoop, Cassandra, Spark, GlusterFS, and Open MPI."
Sebastián Zaffarano

Apache Spark: 100 terabytes (TB) of data sorted in 23 minutes | Opensource.com - 1 views

  •  
    "In October 2014, Databricks participated in the Sort Benchmark and set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records. The team used Apache Spark on 207 EC2 virtual machines and sorted 100 TB of data in 23 minutes."