Chris Harrison is a Ph.D. student in the Human-Computer Interaction Institute at Carnegie Mellon University. This site is used as a repository for some of his many projects. These hail from a variety of fields, including computer science, information visualization, engineering, history and HCI.
Many Eyes is an IBM site that aims to make data visualization algorithms and data sets widely available. It is a fantastic place to spend a few hours.
Environments
SIMILE is focused on developing robust, open source tools that empower users to access, manage, visualize and reuse digital assets.
KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
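The idea of letting link statistics stand in for judgment can be illustrated with a deliberately simplified sketch. It ranks pages purely by how many distinct other pages link to them; it is not Google's actual algorithm (PageRank also weights each link by the rank of the linking page), and the link graph, page names, and the function `rank_by_incoming_links` are hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical toy link graph: each page maps to the pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "b.example"],
}


def rank_by_incoming_links(graph):
    """Order pages by the number of distinct other pages linking to them."""
    counts = Counter()
    for source, targets in graph.items():
        # Count each linking page once per target and ignore self-links.
        for target in set(targets):
            if target != source:
                counts[target] += 1
    # Include pages that have no incoming links at all.
    pages = set(graph) | set(counts)
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(pages, key=lambda page: (-counts[page], page))


ranking = rank_by_incoming_links(links)
print(ranking)
```

Nothing in the sketch inspects what a page says. The ordering comes entirely from counting who links to whom, which is the point the passage makes.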
The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.