
Group: VirgoLab
Roger Chen

Datawocky: How Google Measures Search Quality

  • The heart of the matter is this: how do you measure the quality of search results?
  • The first is that we have all been trained to trust Google and click on the first result no matter what. So ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
  • Here's the shocker: these metrics are not very sensitive to new ranking models! When Google tries new ranking models, these metrics sometimes move, sometimes not, and never by much.
  • Two learnings from this story: one, the results depend quite strongly on the test set, which again speaks against machine-learned models. And two, Yahoo and Google users differ quite significantly in the kinds of searches they do.
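The insensitivity argument above can be made concrete with a toy simulation. This is a minimal sketch, not Google's method: it assumes a hypothetical "trusting" user who always clicks the first result, made-up document names, and made-up relevance labels (0 = poor, 3 = perfect). Under that assumption a click-based metric cannot tell a worse ranking from a better one, while a separate relevance signal can.

```python
# Hypothetical relevance labels (0 = poor, 3 = perfect); invented for illustration.
RELEVANCE = {"doc_a": 1, "doc_b": 3, "doc_c": 0}


def trusting_user_click(ranking):
    """Assumed user model: always click position 0, whatever the ranking is."""
    return 0


def ctr_at_1(rankings):
    """Fraction of queries whose click landed on the first result."""
    hits = sum(1 for r in rankings if trusting_user_click(r) == 0)
    return hits / len(rankings)


def mean_clicked_relevance(rankings):
    """Average labeled relevance of the document that was clicked."""
    total = sum(RELEVANCE[r[trusting_user_click(r)]] for r in rankings)
    return total / len(rankings)


# Two hypothetical ranking models, each serving 100 queries.
old_model = [["doc_a", "doc_b", "doc_c"]] * 100
new_model = [["doc_b", "doc_a", "doc_c"]] * 100

print(ctr_at_1(old_model), ctr_at_1(new_model))  # 1.0 1.0
print(mean_clicked_relevance(old_model), mean_clicked_relevance(new_model))  # 1.0 3.0
```

The click metric reports 1.0 for both models, which mirrors the "we click the first result no matter what" effect; only the (assumed) relevance labels reveal that the new model put a better document on top.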
Roger Chen

Datawocky: Are Machine-Learned Models Prone to Catastrophic Errors?

  • Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
  • The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
  • It has long been known that Google's search algorithm actually works at two levels:
    • An offline phase that extracts "signals" from a massive web crawl and usage data. An example of such a signal is PageRank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, they are necessarily query-independent. You can think of them as tags on the documents in the index. There are about 200 such signals.
    • An online phase, in response to a user query. A subset of documents is identified based on the presence of the user's keywords. Then these documents are ranked by a very fast algorithm that combines the 200 signals in memory using a proprietary formula.
  • This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
  • My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows).
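The two-level architecture described above can be sketched as follows. This is an illustration under stated assumptions, not Google's implementation: the two signals (`link_score` as a crude stand-in for PageRank, `length_score`) and the `WEIGHTS` are hypothetical, the real combining formula is proprietary, and "presence of the user's keywords" is assumed here to mean that every query term must appear.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Query-independent signals filled in offline (names are hypothetical).
    signals: dict = field(default_factory=dict)


def offline_phase(docs, link_counts):
    """Precompute query-independent signals, standing in for the ~200 real ones."""
    max_links = max(link_counts.values(), default=0) or 1
    for d in docs:
        # Crude link-popularity score, a rough proxy for PageRank.
        d.signals["link_score"] = link_counts.get(d.doc_id, 0) / max_links
        # Document length, capped at 1.0.
        d.signals["length_score"] = min(len(d.text.split()) / 100, 1.0)
    return docs


# Assumed linear weights; the real formula is proprietary.
WEIGHTS = {"link_score": 0.7, "length_score": 0.3}


def online_phase(query, index, weights=WEIGHTS, k=10):
    """Select keyword-matching candidates, then rank them by precomputed signals."""
    terms = set(query.lower().split())
    # 1. Candidate selection: keep documents containing every query keyword.
    candidates = [d for d in index if terms <= set(d.text.lower().split())]

    # 2. Fast in-memory scoring: weighted sum of the precomputed signals.
    def score(d):
        return sum(w * d.signals.get(name, 0.0) for name, w in weights.items())

    return sorted(candidates, key=score, reverse=True)[:k]


docs = [
    Document("d1", "machine learning for search ranking"),
    Document("d2", "search ranking with link analysis"),
    Document("d3", "cooking recipes"),
]
index = offline_phase(docs, {"d1": 5, "d2": 20, "d3": 50})
print([d.doc_id for d in online_phase("search ranking", index)])  # ['d2', 'd1']
```

The split keeps the expensive work (here just a division, in reality a web-scale crawl analysis) out of the query path, so the online phase is only a filter plus a cheap weighted sum. It also shows why the signals are query-independent: `offline_phase` never sees a query.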
Roger Chen

Collaborative Filtering: Lifeblood of The Social Web - ReadWriteWeb

  • This, of course, relies on the fact that people's interests, preferences, and ideologies don't change too drastically over time.
  • A filtering system with preference-based recommendations, in essence, is the future of the social web.
  • The best implementations of a Collaborative Filtering (CF) system along with a preference based recommendation/discovery system that I have seen are always on music streaming and discovery sites.
  • As you can see from above, it is certainly possible to have a good collaborative filtering system without a recommendation engine.
  • Collaborative Filtering (Wikipedia definition) is a mechanism used to filter large amounts of information by spreading the process of filtering among a large group of people.
  • The important thing, one that not many social sites realize, is that a CF system that doesn't automatically match content to your preferences is inherently flawed. The reason for this is simple: unless you can achieve perfect diversity and independence of opinion, one point of view will always dominate another on a particular platform. The dominant point of view on the social web is a left-leaning one, and without the ability to get the most appropriate pieces of content to the people who care most about them, the right-wing point of view gets buried almost every time.
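The idea of spreading filtering across many people and then matching content to an individual's preferences can be illustrated with user-based collaborative filtering. This is a minimal sketch of one common CF technique (cosine similarity between users' rating vectors, then similarity-weighted scoring of unseen items). It is not the algorithm of any site mentioned in the article, and the users, songs, and ratings are invented.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale; invented for illustration.
RATINGS = {
    "alice": {"song_a": 5, "song_b": 4, "song_c": 1},
    "bob": {"song_a": 4, "song_b": 5, "song_d": 4},
    "carol": {"song_c": 5, "song_d": 2, "song_e": 5},
}


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, k=3):
    """Rank items the user hasn't rated by similarity-weighted ratings of other users."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with no overlapping taste
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("alice", RATINGS))  # ['song_d', 'song_e']
```

Alice's ratings closely resemble Bob's, so Bob's liked `song_d` outranks `song_e`, which only the dissimilar user Carol rated. That is the "match content to your preferences" step the annotation argues a CF system needs.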
Roger Chen

The End Of The Scientific Method… Wha….? « Life as a Physicist

  • His basic thesis is that when you have so much data you can map out every connection, every correlation, then the data becomes the model. No need to derive or understand what is actually happening: you have so much data that you can already make all the predictions that a model would let you do in the first place. In short, you no longer need to develop a theory or hypothesis; just map the data!
  • First, in order for this to work you need to have millions and millions and millions of data points. You need, basically, every single outcome possible, with all possible other factors. Huge amounts of data. That does not apply to all branches of science.
  • The second problem with this approach is you will never discover anything new. The problem with new things is there is no data on them!
  • Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications. One only needs to look at a promising field that lacks a strong theoretical foundation (high-temperature superconductivity springs to mind) to see how badly the lack of a theory can impact progress.
  • Anderson is right: we are entering a new age where the ability to mine these large amounts of data is going to open up whole new levels of understanding.
  • This is a new tool, and it will open up all sorts of doors for us. But the end of the scientific method? No, because that implies an end of discovery. An end of new things.
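The "data becomes the model" position and the objection that "there is no data on new things" can be contrasted in a few lines. This is a deliberately simplified sketch with invented observations (points generated by y = 2x + 1). A pure lookup over stored data has nothing to say about an unseen input, while a fitted model (ordinary least squares for a line) does extrapolate. That extrapolation is only right because the assumed linear law happens to be true here, which is the same induction caveat raised in the Datawocky annotations.

```python
# Hypothetical observations of y = 2x + 1 at a few integer points.
OBSERVED = {0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0}


def lookup_model(x):
    """'The data is the model': answer only from stored observations."""
    return OBSERVED.get(x)  # None when there is no data for x


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b: a tiny explicit 'theory'."""
    n = len(points)
    xs, ys = list(points), list(points.values())
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


theory = fit_line(OBSERVED)
print(lookup_model(2), theory(2))  # 5.0 5.0
print(lookup_model(10), theory(10))  # None 21.0
```

Inside the observed range both approaches agree. Outside it, the lookup returns `None`, illustrating "you will never discover anything new," while the fitted line commits to a prediction that could be tested.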
Roger Chen

Evolving Thoughts - Basic Concepts in Science: A list

  • This is a list of the Basic Concepts posts being put up by Science Bloggers and others. It will be updated and put to the top when new entries are published.
Roger Chen

Useful Stock Phrases for Your Business Emails

  • Very useful collection of "phrases" :-)
Roger Chen

Data & Knowledge Engineering (0169-023X) - ACM Guide to Computing Literature

  • Data & Knowledge Engineering (0169-023X)
Roger Chen

Expert Systems with Applications: An International Journal (0957-4174)

  • ACM Portal - Expert Systems with Applications: An International Journal
Roger Chen

Social Network Evolution - Sean Percival's Blog

  • Some of us run to each new service, play around for a bit and then quickly abandon it.
    • Roger Chen
       
      This applies to many applications. LOL.