Best content in VirgoLab | Diigo

Still confused by Data Mining? (I know I am)

blogs.msdn.com/...y-data-mining-i-know-i-am.aspx

data mining reference

shared by Roger Chen on 01 Jul 08

Datawocky: How Google Measures Search Quality

anand.typepad.com/...e-measures-search-quality.html

data mining google machine learning search

shared by Roger Chen on 11 Jun 08

The heart of the matter is this: how do you measure the quality of search results?
The first is that we have all been trained to trust Google and click on the first result no matter what. So ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
Here's the shocker: these metrics are not very sensitive to new ranking models! When Google tries new ranking models, these metrics sometimes move, sometimes not, and never by much.
Two lessons from this story: one, the results depend quite strongly on the test set, which again speaks against machine-learned models. And two, Yahoo and Google users differ quite significantly in the kinds of searches they do.

Datawocky: Are Machine-Learned Models Prone to Catastrophic Errors?

anand.typepad.com/...an-machine-learned-models.html

data mining google

shared by Roger Chen on 11 Jun 08

Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
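A rough sketch of the distinction, assuming a Gaussian sample as a stand-in for Mediocristan and a Pareto sample as a stand-in for Extremistan. The distributions, their parameters, and the `largest_share` helper are illustrative choices, not taken from the bookmarked post. It measures how much of the total a single largest observation accounts for.

```python
import random


def largest_share(samples):
    """Fraction of the total contributed by the single largest observation."""
    return max(samples) / sum(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Thin-tailed stand-in for Mediocristan: heights in cm (assumed mean 170, sd 10).
heights = [rng.gauss(170, 10) for _ in range(n)]

# Heavy-tailed stand-in for Extremistan: Pareto draws with an assumed shape of 1.1.
wealth = [rng.paretovariate(1.1) for _ in range(n)]

thin_share = largest_share(heights)
heavy_share = largest_share(wealth)

print(f"largest share, thin-tailed:  {thin_share:.6f}")
print(f"largest share, heavy-tailed: {heavy_share:.6f}")
```

In the thin-tailed sample no single draw moves the total much, so past data is a fair guide to the next draw. In the heavy-tailed sample one draw can account for a visible fraction of the whole, which is the situation where assuming that the future looks like the past can go badly wrong.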
It has long been known that Google's search algorithm actually works at two levels. The first is an offline phase that extracts "signals" from a massive web crawl and usage data. An example of such a signal is page rank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, they are necessarily query-independent. You can think of them as tags on the documents in the index. There are about 200 such signals. The second is an online phase that runs in response to a user query. A subset of documents is identified based on the presence of the user's keywords. Then, these documents are ranked by a very fast algorithm that combines the 200 signals in-memory using a proprietary formula.
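The two phases described in this highlight can be sketched roughly as follows. Everything here is hypothetical: the documents, the two signal names, their values, and the weighted sum that stands in for the proprietary formula. Requiring every keyword to be present is also an assumption.

```python
# Offline phase: query-independent signals, precomputed and stored per document.
# Documents, signal names, and values are made up for illustration.
INDEX = {
    "doc1": {"text": "data mining basics",
             "signals": {"page_rank": 0.9, "freshness": 0.2}},
    "doc2": {"text": "mining data with python",
             "signals": {"page_rank": 0.4, "freshness": 0.9}},
    "doc3": {"text": "cooking recipes",
             "signals": {"page_rank": 0.8, "freshness": 0.5}},
}

# Hypothetical stand-in for the proprietary formula: a fixed weighted sum.
WEIGHTS = {"page_rank": 0.7, "freshness": 0.3}


def score(doc_id):
    """Combine the precomputed signals of one document into a single number."""
    signals = INDEX[doc_id]["signals"]
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def search(query):
    """Online phase: select documents by keyword, then rank them by signal score."""
    keywords = query.lower().split()
    # Step 1: keep documents that contain every keyword (an assumption).
    candidates = [
        doc_id for doc_id, entry in INDEX.items()
        if all(word in entry["text"].split() for word in keywords)
    ]
    # Step 2: rank candidates using only the query-independent signals.
    return sorted(candidates, key=score, reverse=True)


results = search("data mining")
print(results)
```

The query only decides which documents are candidates. The ordering comes entirely from values computed ahead of time, which is why the online step can be fast.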
This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows).

Collaborative Filtering: Lifeblood of The Social Web - ReadWriteWeb

www.readwriteweb.com/...ative_filtering_social_web.php

internetwatch recommender

shared by Roger Chen on 01 Jul 08

This, of course, relies on the fact that people's interests, preferences, and ideologies don't change too drastically over time.
A filtering system with preference-based recommendations, in essence, is the future of the social web.
The best implementations of a Collaborative Filtering (CF) system combined with a preference-based recommendation/discovery system that I have seen are always on music streaming and discovery sites.
As you can see from above, it is certainly possible to have a good collaborative filtering system without a recommendation engine.
Collaborative Filtering (Wikipedia definition) is a mechanism used to filter large amounts of information by spreading the process of filtering among a large group of people.
The important thing, one that not many social sites realize, is that a CF system that doesn't automatically match content to your preferences is inherently flawed. The reason for this is simple: unless you can achieve perfect diversity and independence of opinion, one point of view will always dominate another on a particular platform. The dominant point of view on the social web is a left-leaning one, and without the ability to get the most appropriate pieces of content to the people that care most about them, the right-wing point of view gets buried almost every time.
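One common way to match content to a reader's preferences is user-based collaborative filtering. The sketch below is a minimal version under stated assumptions: the users, items, and ratings are invented, and cosine similarity is one choice among several. It is not the method of any site mentioned in the article.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"song_a": 5, "song_b": 4, "song_c": 1},
    "bob": {"song_a": 4, "song_b": 5, "song_d": 5},
    "cat": {"song_c": 5, "song_e": 4},
}


def cosine(u, v):
    """Cosine similarity between two rating dicts; 0.0 if they share no items."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user):
    """Rank items the user has not rated, weighting other users by similarity."""
    mine = RATINGS[user]
    scores = {}
    for other, theirs in RATINGS.items():
        if other == user:
            continue
        weight = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))
```

Because each user's list is weighted by the people most similar to them, a minority taste is not outvoted by the majority, which is the failure mode this highlight describes.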

The End Of The Scientific Method… Wha….? « Life as a Physicist

gordonwatts.wordpress.com/...d-of-the-scientific-method-wha

data mining thinking

shared by Roger Chen on 27 Jun 08

His basic thesis is that when you have so much data you can map out every connection, every correlation, then the data becomes the model. No need to derive or understand what is actually happening: you have so much data that you can already make all the predictions that a model would let you make in the first place. In short, you no longer need to develop a theory or hypothesis; just map the data!
First, in order for this to work you need to have millions and millions and millions of data points. You need, basically, every single outcome possible, with all possible other factors. Huge amounts of data. That does not apply to all branches of science.
The second problem with this approach is you will never discover anything new. The problem with new things is there is no data on them!
Correlations are a way of catching a scientist’s attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications. One only needs to look at a promising field that lacks a strong theoretical foundation (high-temperature superconductivity springs to mind) to see how badly the lack of a theory can impact progress.
Anderson is right: we are entering a new age where the ability to mine these large amounts of data is going to open up whole new levels of understanding.
This is a new tool, and it will open up all sorts of doors for us. But the end of the scientific method? No, because that implies an end of discovery. An end of new things.

Evolving Thoughts - Basic Concepts in Science: A list

scienceblogs.com/...c_concepts_in_science_a_li.php

reference science

shared by Roger Chen on 01 Jul 08

Roger Chen on 01 Jul 08

This is a list of the Basic Concepts posts being put up by Science Bloggers and others. It will be updated and put to the top when new entries are published.
