Group items tagged data

Pablo Lalloni

The HDF Group - Why use HDF? - 0 views

  •  
    "HDF (Hierarchical Data Format) technologies are relevant when the data challenges being faced push the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats. Leveraging the powerful HDF products and the expertise of The HDF Group, organizations realize substantial cost savings while solving challenges that seemed intractable using other data management technologies. Many HDF adopters have very large datasets, very fast access requirements, or very complex datasets. Others turn to HDF because it allows them to easily share data across a wide variety of computational platforms using applications written in different programming languages. Some use HDF to take advantage of the many open-source and commercial tools that understand HDF. Similar to XML documents, HDF files are self-describing and allow users to specify complex data relationships and dependencies. In contrast to XML documents, HDF files can contain binary data (in many representations) and allow direct access to parts of the file without first parsing the entire contents. HDF, not surprisingly, allows hierarchical data objects to be expressed in a very natural manner, in contrast to the tables of relational database. Whereas relational databases support tables, HDF supports n-dimensional datasets and each element in the dataset may itself be a complex object. Relational databases offer excellent support for queries based on field matching, but are not well-suited for sequentially processing all records in the database or for subsetting the data based on coordinate-style lookup."
Pablo Lalloni

Report on Big Data in the UK government - 1 views

  •  
    "1. The Government has already made a commitment to Big Data by classifying it as one of the 'Eight Great Technologies' which will propel the UK to future growth and help it stay ahead in the global race. The 'Information Economy Strategy' reports on the increase in data being generated and the importance of new types of computing power in order to reap the economic value of the data. 2. This paper sets out to cover the following areas: a) Defining Big Data b) High-level trends in Big Data c) Opportunities for Big Data applications 3. In developing this paper, a 'community of interest' has been established comprising policy leads and analysts from across government with an interest in Big Data. This paper draws on their insights, insights from the private sector, academics, and the extensive open source literature on the Big Data topic."
munyeco

BIG DATA APPLICATIONS Fast Data: Big Data Evolved - White Paper - 0 views

  •  
    There is a fundamental shift occurring in Big Data, from data at rest to data in motion. In this white paper, Dean Wampler explores the ecosystem that is emerging around Fast Data and provides handy diagrams and code samples along the way.
Pablo Lalloni

Baratine | a distributed in-memory Java service platform - 0 views

  •  
    "Baratine is a new distributed in-memory Java service platform for building high performance web services that combine both data and logic in the same JVM. Say again? In Baratine, the data lives within the service and the service owns its own data. This means: the data is not owned by the database the data is not modified by another process the data is not separate and distinct from the service => The data sits right in the service in the same JVM, the same thread, and the same class instance."
Pablo Lalloni

Data.js - 1 views

  •  
    Data.js is a data representation framework for JavaScript. It is being developed in the context of Substance, a web-based document authoring and publishing engine. It took some inspiration from various existing libraries such as the Google Visualization API or Underscore.js. You can report bugs and discuss features on the GitHub issues page, on Freenode IRC in the #_substance channel, post questions to the Google Group, or send tweets to @_substance. With Data.js you can:
    - Model your domain data using a simple graph-based object model that can be serialized to JSON.
    - Traverse your graph, including relationships, using a simple API.
    - Manipulate and query data on the client (browser) or on the server (Node.js) using exactly the same API.
Pablo Lalloni

Quest Data Connectors - Cloudera Support - 0 views

  •  
    "Quest Data Connector for Oracle and Hadoop is a freeware plug-in to Cloudera's Distribution including Apache Hadoop that allows for fast and scalable data transfer between Hadoop and Oracle. Attributes: Transfer data to and from Oracle up to 5 times faster than Sqoop alone. Can easily transfer data to and from Oracle that has no primary key or was not stored in primary key order. Reduces overhead on the Oracle instance: Upwards of 80% reduction in CPU consumption. Up to 95% reduction in IO time. Allows other Oracle workloads to simultaneously run seamlessly without disruption. SLA-driven commercial support available when used as a part of Cloudera Enterprise."
Pablo Lalloni

bandicoot - having fun with structured data - 0 views

  •  
    "Bandicoot is an open source programming system with a new set-based programming language, persistency capabilities, and run-time environment. The language is similar to general purpose programming languages where you write functions/methods and access data through variables. Though, in Bandicoot, you always manipulate data in sets using a small set-based algebra (the relational algebra)." "Here are the main features:   - functions are automatically exposed via HTTP using CSV for data, e.g. /List, /Append  - supports persistency via global variables (with transactions and ACID)  - can run on multiple computers to scale up the read throughput  - built in operators from the relational algebra with a simple syntax, e.g. "+" (union), "-" (minus)  - small binary ~100KB"
Pablo Lalloni

Data Connector for Oracle and Hadoop - Unstructured and Structured Data Transfers - 0 views

  •  
    "Data Connector for Oracle and Hadoop is the fastest, most scalable way to transfer data between Oracle and Hadoop. "
Pablo Lalloni

Graphite - Scalable Realtime Graphing - Graphite - 0 views

  •  
    What is Graphite? Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.

    Who should use Graphite? Graphite is actually a bit of a niche application. Specifically, it is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. However, Graphite is a complex system, and if you only have a few hundred distinct things you want to graph (stock prices in the S&P 500), then Graphite is probably overkill. But if you need to graph a lot of different things (like dozens of performance metrics from thousands of servers) and you don't necessarily know the names of those things in advance (who wants to maintain such a huge configuration?), then Graphite is for you.
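    Sending data to carbon is deliberately simple: its plaintext protocol takes one "metric value timestamp" line per datapoint over TCP, port 2003 by default. A minimal sketch in Python, assuming a carbon-cache reachable on localhost (the metric name is invented):

        import socket
        import time

        # Assumes a carbon-cache listening on the default plaintext port 2003.
        CARBON_HOST, CARBON_PORT = "localhost", 2003

        def send_metric(path, value, timestamp=None):
            """Send one datapoint using carbon's 'path value timestamp' line format."""
            timestamp = int(timestamp or time.time())
            line = f"{path} {value} {timestamp}\n"
            with socket.create_connection((CARBON_HOST, CARBON_PORT)) as sock:
                sock.sendall(line.encode("ascii"))

        send_metric("servers.web01.cpu.load", 0.42)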
Pablo Lalloni

Splunk Enterprise Product Tour - Machine Data Collection | Splunk - 1 views

  •  
    "Splunk Enterprise is the industry-leading platform for operational intelligence. Collect and index any machine data from virtually any source in real time. Search, monitor, analyze and visualize your data to gain new insights and intelligence. Index everything for deep visibility, forensics and troubleshooting. Work smarter as you and your team share searches and add knowledge specific to your organization. Create ad hoc reports to identify trends or prove compliance controls. Create interactive dashboards to monitor for security incidents, service levels and other key performance metrics. Analyze user transactions, customer behavior, machine behavior, security threats and fraudulent activity, all in real time."
Pablo Lalloni

saddle/saddle · GitHub - 0 views

  •  
    "Saddle is a data manipulation library for Scala that provides array-backed, indexed, one- and two-dimensional data structures that are judiciously specialized on JVM primitives to avoid the overhead of boxing and unboxing. Saddle offers vectorized numerical calculations, automatic alignment of data along indices, robustness to missing (N/A) values, and facilities for I/O. Saddle draws inspiration from several sources, among them the R programming language & statistical environment, the numpy and pandas Python libraries, and the Scala collections library."
Pablo Lalloni

Big Data Poster - Data Science Central - 0 views

  •  
    "A great resource (PDF document) about big data, originally posted on CTOvision.com."
Pablo Lalloni

Data Modeling for NoSQL - 0 views

  •  
    "Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB."
Pablo Lalloni

pachyderm/pachyderm - 0 views

  •  
    "Pachyderm is a complete data analytics solution that lets you efficiently store and analyze your data using containers. We offer the scalability and broad functionality of Hadoop, with the ease of use of Docker."
Pablo Lalloni

XSQ: A Streaming XPath Engine - 0 views

  •  
    "XSQ evaluates XPath queries over streaming XML data. That is, it makes only pass over the data, in an order determined by the data source"
Pablo Lalloni

Rationale - Datomic - 0 views

  •  
    "Datomic is a distributed database designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures. It does this by: Bringing declarative data manipulation into the application, and the data with it Getting time, process and perception right Process (writes) require coordination Perception (reads) require none The past doesn't change Leveraging immutability, and a sound model of state Datomic has: ACID Transactions Joins A sound data model A logical query language - Datalog Thus, Datomic avoids the compromises and losses of many NoSQL solutions. In addition, it offers flexibility and power over the traditional model in supporting: Hierarchy Multi-valued attributes Minimal schema Reliable operation on unreliable, ephemeral cloud instances Time Datomic avoids manual caching and replication, complex configuration, sharding (automatic or manual), logging, locking, latching and disk management of traditional servers."
Pablo Lalloni

Shark - Lightning Fast Data Warehouse System - 0 views

  •  
    "Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 100 times faster than Hive without modification to the existing data nor queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions."
Pablo Lalloni

Ferry | Big Data Development Environment Using Docker - 0 views

  •  
    "Ferry helps you create big data clusters on your local machine. Define your big data stack using YAML and share your application with Dockerfiles. Ferry supports Hadoop, Cassandra, Spark, GlusterFS, and Open MPI."
Sebastián Zaffarano

Apache Spark: 100 terabytes (TB) of data sorted in 23 minutes | Opensource.com - 1 views

  •  
    "In October 2014, Databricks participated in the Sort Benchmark and set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records. The team used Apache Spark on 207 EC2 virtual machines and sorted 100 TB of data in 23 minutes."