With the above design, it takes six SQL join operations to
access and display the information about a single user. This makes
rendering the profile page a fairly database-intensive operation, a cost
compounded by the fact that profile pages are the most popular pages on social
networking sites.
Database denormalization is the kind of performance optimization that should be
carried out as a last resort after trying things like creating database indexes,
using SQL views
and implementing application specific in-memory
caching. However, if you hit massive scale and are dealing with millions of
queries a day across hundreds of millions to billions of records, or have decided
to go with database partitioning/sharding, then you will likely end up resorting
to denormalization.
Denormalization is OK if you aren't going to update
Denormalization means that you are now likely to deal with data
inconsistencies, because you are storing redundant copies of data and, for a
variety of reasons, may not be able to update all copies of a column value
simultaneously when it changes. Having tools in your infrastructure to support
fixing up data of this sort then becomes very important.
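As a rough illustration of the inconsistency problem described above, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `posts` tables, the redundant `author_name` column, and the two helper functions are all hypothetical, invented for this example. It is one possible shape for a fix-up tool, not a recommended design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Denormalized: author_name duplicates users.name to avoid a join.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                        author_name TEXT, body TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO posts VALUES (10, 1, 'alice', 'hello');
""")

# The user is renamed, but only the canonical copy gets updated.
conn.execute("UPDATE users SET name = 'alice2' WHERE id = 1")


def find_stale_posts(conn):
    """Return ids of posts whose redundant author_name differs from users.name."""
    rows = conn.execute("""
        SELECT p.id FROM posts p JOIN users u ON u.id = p.user_id
        WHERE p.author_name <> u.name
        ORDER BY p.id
    """).fetchall()
    return [row[0] for row in rows]


def repair_posts(conn):
    """Overwrite every redundant copy with the canonical value."""
    conn.execute("""
        UPDATE posts SET author_name =
            (SELECT name FROM users WHERE users.id = posts.user_id)
    """)


stale_before = find_stale_posts(conn)
repair_posts(conn)
stale_after = find_stale_posts(conn)
```

The detection query needs the very join that denormalization was meant to avoid, which is why such checks usually run offline rather than on the page-rendering path.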
In a lazy language you have no guarantee that the first line will be executed before the second! This means we can't do IO, can't
use native functions in any meaningful way (because they need to be called in order since they depend on side effects), and can't
interact with the outside world! If we were to introduce
primitives that allow ordered code execution, we'd lose the benefits of reasoning about our code mathematically.
Techniques for enforcing execution order in a functional setting include continuations, monads, and uniqueness typing.
Alonzo Church developed a formal system called lambda calculus. The
system was essentially a programming language for one of these imaginary machines
Functions that operate on other functions (accept them
as arguments) are called higher order functions.
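A small sketch of the idea in Python, which is chosen here only for illustration. The names `apply_twice`, `compose`, `increment` and `double` are made up for this example.

```python
def apply_twice(f, x):
    """Higher-order: takes a function f as an argument and applies it twice."""
    return f(f(x))


def compose(f, g):
    """Higher-order: takes two functions and returns a new one, x -> f(g(x))."""
    return lambda x: f(g(x))


def increment(n):
    return n + 1


def double(n):
    return n * 2


# Build a new function out of existing ones without writing its body by hand.
increment_then_double = compose(double, increment)
```

Calling `apply_twice(increment, 3)` applies `increment` to 3 and then to the result, while `increment_then_double(3)` increments first and doubles second.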
currying is used to reduce the number of arguments
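A minimal sketch of currying in Python, assuming a hypothetical two-argument `power` function. The helper `curry2` is written by hand for this example and is not a standard library function.

```python
def power(base, exponent):
    return base ** exponent


def curry2(f):
    """Turn a two-argument function into a chain of one-argument functions."""
    return lambda a: lambda b: f(a, b)


curried_power = curry2(power)

# Fixing the first argument leaves a function that needs only one more.
power_of_two = curried_power(2)
```

Here `power_of_two` acts as a one-argument adapter over `power`: `power_of_two(10)` computes `power(2, 10)`.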
only executes code when it's required
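Python is a strict language, so the following is only an approximation of lazy evaluation using generators. The `expensive` function and the `evaluated` log are hypothetical and exist only to show which computations actually ran.

```python
from itertools import count, islice

evaluated = []  # records every argument that expensive() was really called with


def expensive(n):
    evaluated.append(n)
    return n * n


# An infinite sequence of squares; defining it computes nothing.
squares = (expensive(n) for n in count(1))

# Only the three requested elements are ever computed.
first_three = list(islice(squares, 3))
```

After this runs, `evaluated` holds only the arguments that were demanded, even though `squares` describes an infinite sequence.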
A lazy compiler thinks of functional code exactly as mathematicians
think of an algebra expression - it can cancel things out and completely prevent execution, rearrange pieces of code for higher efficiency,
even arrange code in a way that reduces errors, all while guaranteeing that optimizations won't break the code.
John McCarthy (also a Princeton graduate) developed interest in Alonzo
Church's work. In 1958 he unveiled a List Processing language (Lisp)
Lisp machine - effectively a native hardware implementation of Alonzo's lambda calculus!
it was proved that lambda calculus is equivalent to a Turing machine.
It turns out that functional programs can keep state, except they don't use variables to do it. They use functions
instead. The state is kept in function parameters, on the stack. If you want to keep state for a while and every now
and then modify it, you write a recursive function
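A sketch of that pattern in Python, with a hypothetical command format invented for this example. The running total is never assigned to twice: each recursive call receives the new state as a parameter.

```python
def run(commands, total=0):
    """Process commands one by one; the state lives in the 'total' parameter."""
    if not commands:
        return total
    head, *rest = commands
    if head[0] == "add":
        return run(rest, total + head[1])  # "modify" state by passing a new value
    if head[0] == "reset":
        return run(rest, 0)
    return run(rest, total)                # unknown command: state passes through


final_total = run([("add", 5), ("add", 7), ("reset",), ("add", 3)])
```

Python does not optimize tail calls, so a very long command list would hit the recursion limit. Languages built around this style typically turn such calls into loops.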
Erlang engineers have been upgrading
live systems without stopping them for years.
Erlang systems are not scalable and reliable. Java systems are. Erlang
systems are simply rock solid
Ericsson designed a functional language called Erlang for use in its highly fault-tolerant
and scalable telecommunication switches.
Continuation Passing Style, or CPS
A "continuation" is a parameter we may choose to pass to our function that
specifies where the function should return.
continuations are a generalization of functions.
CPS version needs no stack! No function ever "returns" in the traditional sense, it just calls another
function with the result instead. We don't need to push function arguments on the stack with every call and then pop them back, we can
simply store them in some block of memory and use a jump instruction instead. We'll never need the original arguments - they'll never
be used again since no function ever returns!
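To make the style concrete, here is a hedged sketch of a factorial written in CPS in Python. The name `factorial_cps` and the parameter `k`, a conventional name for the continuation, are choices made for this example.

```python
def factorial_cps(n, k):
    """Compute n! and hand the result to the continuation k instead of returning it."""
    if n == 0:
        return k(1)
    # The work left to do after the recursive step is packed into a new continuation.
    return factorial_cps(n - 1, lambda result: k(n * result))


# The identity function as the final continuation just hands the answer back.
answer = factorial_cps(5, lambda x: x)
```

Every call here is a tail call, which is what lets a compiler with tail-call elimination replace it with a jump. CPython performs no such optimization, so this sketch still consumes stack frames and shows only the shape of the code, not the stack-free execution.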
What does the stack contain? Simply the arguments, and a
pointer to memory where the function should return. Do you see a light bulb? The stack simply contains continuation information! The
pointer to the return instruction in the stack is essentially the same thing as the function to call in CPS programs!
A continuation and a pointer to the return instruction in the stack are really the same thing, only a continuation
is passed explicitly, so that it doesn't need to be the same place where the function was called from
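A small sketch of what passing the return point explicitly makes possible, using a hypothetical `safe_divide` function that takes two continuations. The caller, not the call site, decides where control goes next.

```python
def safe_divide(a, b, on_success, on_error):
    """Never returns a value of its own; it continues into one of two places."""
    if b == 0:
        return on_error("division by zero")
    return on_success(a / b)


log = []


def record_result(value):
    log.append(("ok", value))


def record_failure(message):
    log.append(("error", message))


safe_divide(10, 2, record_result, record_failure)
safe_divide(1, 0, record_result, record_failure)
```

With an ordinary return address both outcomes would come back to the same spot and have to be told apart there. Here each outcome continues somewhere different.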
When we get a current continuation and store it somewhere, we end up
storing the current state of our program - freezing it in time. This is similar to an OS putting itself into hibernation. A continuation
object contains the information necessary to restart the program from the point where the continuation object was acquired.
Spring has a layered
architecture, meaning that you can choose to use just about any part of it in
isolation, yet its architecture is internally consistent
it's easy to introduce Spring incrementally into
existing projects
the limits you need to put on yourself when storing a billion rows in a
database, and they included: no joins, no transactions, no stored procedures,
and no triggers.
Joshua has similar suggestions from his experience building del.icio.us: no
joins, no transactions, no
autoincrement
BigTable, Google's column-based store with no transactions
What's the point in designing tables for a webapp when an RDF-backed store will manage the data for you and RDF queries will come back as tabular data anyway?
designing and maintaining yet another relational schema for yet another webapp - doing so is starting to make as much sense as designing my own filesystem or TP monitor.
RDF + SPARQL + distributed data sources from around the web?
reason that Rails and Django are so productive; they're highly optimised for domain models. Raw RDF doesn't really do domains like that; you have to expend effort distilling triples into 'things';