"1. The Government has already made a commitment to Big Data by classifying it as one of the 'Eight Great Technologies' which will propel the UK to future growth and help it stay ahead in the global race. The 'Information Economy Strategy' reports on the increase in data being generated and the importance of new types of computing power in order to reap the economic value of the data.
2. This paper sets out to cover the following areas:
a) Defining Big Data
b) High-level trends in Big Data
c) Opportunities for Big Data applications
3. In developing this paper, a 'community of interest' has been established comprising policy leads and analysts from across government with an interest in Big Data. This paper draws on their insights, insights from the private sector, academics, and the extensive open source literature on the Big Data topic."
There is a fundamental shift occurring in Big Data, from data at rest to data in motion. In this white paper, Dean Wampler explores the ecosystem that is emerging around Fast Data and provides handy diagrams and code samples to help you:
Amazing: a 1994 paper by researchers at Sun in which they conclude that "RPC" is broken in several ways and cannot be fixed by any implementation.
Ummm... what year was CORBA from? What year was EJB from?
"Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more. With a steady development cycle and a growing community of users worldwide, Giraph is a natural choice for unleashing the potential of structured datasets at a massive scale."
"Our project (titled xstream)
concentrated on evaluation of XPath over XML streams.
This research area contains multiple challenges resulting
from both the richness of the language and the
requirement of having only a single pass over the data.
We modified and extended one of the known algorithms,
TurboXPath [4], a tree-based IBM algorithm. We also
provide extensive comparative analysis between
TurboXPath and XSQ [5], currently the most advanced of
finite automata (FA)-based algorithms."
"In this paper we propose the TurboXPath path processor, which accepts a language equivalent to a subset of the
for-let-where constructs of XQuery over a single document.
TurboXPath can be extended to provide full XQuery support
or used to augment federated database engines for efficient
handling of queries over XML data streams produced by external sources. Internally, TurboXPath uses a tree-shaped path
expression with multiple outputs to drive the execution. The
result of a query execution is a sequence of tuples of XML
fragments matching the output nodes. Based on a streamed
execution model, TurboXPath scales up to large documents
and has limited memory consumption for increased concurrency"
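To give a feel for what a streamed execution model means in practice, here is a minimal Python sketch of single-pass matching for a simple child-axis path. It is not the TurboXPath algorithm: it handles only absolute paths made of plain tag names, and the function name `stream_match` is an assumption made for illustration. It uses the standard library's `xml.etree.ElementTree.iterparse` and keeps a stack of the tags that are currently open.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of each element whose root-to-node tag path equals `path`.

    source: a file name or a binary file object containing XML.
    path:   an absolute path of plain tag names, e.g. "/library/book/title".
    """
    target = path.strip("/").split("/")
    open_tags = []  # tags of the elements that are currently open

    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_tags.append(elem.tag)
        else:
            # At the end event the element's text is fully parsed.
            if open_tags == target:
                yield elem.text
            open_tags.pop()
            # Discard the closed element's text and children to limit memory.
            elem.clear()


doc = (b"<library><book><title>A</title></book>"
       b"<book><title>B</title><note><title>C</title></note></book></library>")
print(list(stream_match(io.BytesIO(doc), "/library/book/title")))
```

Note that the emptied elements stay attached to their parents, so memory still grows slowly with the number of elements; a production processor would avoid building the tree at all.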
Assessing the effect of different programming languages and programming styles on programmer productivity is of critical interest. In his paper, Gilles Dubochet describes how he investigated two aspects of programming style using eye movement tracking. He found that it is, on average, 30% faster to comprehend algorithms that use for-comprehensions and maps, as in Scala, than those that use the iterative while-loops of Java.
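The stylistic contrast being measured can be sketched with one small task written both ways. The example below uses Python as a stand-in rather than the Scala and Java code used in the study, and both function names are assumptions made for illustration. The first version manages an index and an accumulator by hand, while the second states the filter and the transformation in a single expression.

```python
def squares_of_evens_loop(numbers):
    """Iterative style: explicit index, explicit accumulator."""
    result = []
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] ** 2)
        i += 1
    return result


def squares_of_evens_comprehension(numbers):
    """Comprehension style: the filter and the mapping are declared together."""
    return [n ** 2 for n in numbers if n % 2 == 0]


data = [1, 2, 3, 4]
print(squares_of_evens_loop(data))
print(squares_of_evens_comprehension(data))
```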
"In this article, we discuss the three-way relationship between three
such desirable features - fairness, isolation, and throughput (FIT) - and argue that only two out of the
three of them can be achieved simultaneously."
"We present a novel streaming algorithm for evaluating XPath expressions that use backward axes
(parent and ancestor) and forward axes in a single document-order traversal of an XML document.
Other streaming XPath processors, such as YFilter, XTrie, and TurboXPath handle only forward axes.
We show through experiments that our algorithm significantly outperforms (by more than a factor of
two) a traditional non-streaming XPath engine. Furthermore, since our algorithm only retains relevant
portions of the input document in memory, it scales better than traditional XPath engines. It can process
large documents; we have successfully tested documents over 1GB in size. On the other hand, the
traditional XPath engine degrades considerably in performance for documents over 100 MB in size and
fails to complete for documents of size over 200 MB."
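Why backward axes are harder in a single pass can be seen with a query such as `//title/parent::book`: by the time a `title` is seen, its parent's start tag has already gone by. One simple way around this for the parent axis is to defer the decision about each element until its end tag. The Python sketch below shows that idea only and is not the algorithm from the paper; the function name `parents_of` and the `id` attribute used to identify results are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def parents_of(source, child_tag, parent_tag):
    """Yield the `id` attribute of each `parent_tag` element that has a direct
    `child_tag` child, roughly //child_tag/parent::parent_tag."""
    has_child = []  # one flag per currently open element

    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Record on the enclosing element that it has a matching child.
            if has_child and elem.tag == child_tag:
                has_child[-1] = True
            has_child.append(False)
        else:
            matched = has_child.pop()
            if matched and elem.tag == parent_tag:
                yield elem.get("id")
            # Only the flags are kept; the element's content is discarded.
            elem.clear()


doc = (b'<lib><book id="1"><title/></book>'
       b'<book id="2"><note><title/></note></book>'
       b'<book id="3"><title/><title/></book></lib>')
print(list(parents_of(io.BytesIO(doc), "title", "book")))
```

The state kept per open element is a single flag, so memory is bounded by the nesting depth of the document rather than by its size, apart from the emptied elements that `ElementTree` leaves attached to their parents.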