Group items tagged algorithm - VirgoLab

Datawocky: Are Machine-Learned Models Prone to Catastrophic Errors? - 0 views

anand.typepad.com/...an-machine-learned-models.html

data mining google

shared by Roger Chen on 11 Jun 08 - Cached

Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
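One rough way to see the gap between the two regimes is to ask how much of a total a single observation can account for. The sketch below is only an illustration: the Gaussian parameters, the Pareto shape of 1.1, and the helper name `largest_share` are assumptions made for this example and do not come from the article.

```python
import random

def largest_share(samples):
    """Return the fraction of the total contributed by the single largest sample."""
    return max(samples) / sum(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# "Mediocristan": thin-tailed values, e.g. heights in cm (assumed parameters).
thin = [rng.gauss(170, 10) for _ in range(n)]

# "Extremistan": heavy-tailed values, e.g. wealth (assumed Pareto shape of 1.1).
heavy = [rng.paretovariate(1.1) for _ in range(n)]

print(f"thin-tailed largest share:  {largest_share(thin):.5f}")
print(f"heavy-tailed largest share: {largest_share(heavy):.5f}")
```

In the thin-tailed sample no single value matters much, while in the heavy-tailed sample one draw can carry a visible share of the sum, which is the kind of event a model fitted to past data may never have seen.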
It has long been known that Google's search algorithm actually works at 2 levels:
- An offline phase that extracts "signals" from a massive web crawl and usage data. An example of such a signal is page rank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, these signals are necessarily query-independent. You can think of them as tags on the documents in the index. There are about 200 such signals.
- An online phase, in response to a user query. A subset of documents is identified based on the presence of the user's keywords. Then, these documents are ranked by a very fast algorithm that combines the 200 signals in-memory using a proprietary formula.
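The two-level structure described here can be sketched in a few lines. This is a toy under stated assumptions, not Google's method: the `Doc` class, the two signals (`inlinks`, `length`), and the weighted sum standing in for the proprietary formula are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    text: str
    signals: dict = field(default_factory=dict)  # query-independent "tags"

def offline_phase(docs, inlinks):
    """Precompute query-independent signals once, ahead of any query."""
    for d in docs:
        # Inlink count is a crude stand-in for a PageRank-style signal.
        d.signals["inlinks"] = float(inlinks.get(d.doc_id, 0))
        d.signals["length"] = float(len(d.text.split()))

def online_phase(docs, query, weights):
    """Select documents containing every keyword, then rank by combined signals."""
    keywords = query.lower().split()
    candidates = [d for d in docs
                  if all(k in d.text.lower().split() for k in keywords)]

    def score(d):
        # Assumed formula: a plain weighted sum of the precomputed signals.
        return sum(weights.get(name, 0.0) * value
                   for name, value in d.signals.items())

    return sorted(candidates, key=score, reverse=True)

docs = [
    Doc("a", "python slope one tutorial"),
    Doc("b", "python pagerank tutorial"),
    Doc("c", "cooking recipes"),
]
offline_phase(docs, {"a": 3, "b": 10})
results = online_phase(docs, "python tutorial", {"inlinks": 1.0})
print([d.doc_id for d in results])
```

The point of the split is that the expensive work happens once per crawl, so the per-query step only filters and combines numbers already held in memory.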
This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows).

PageRank in academic publishing « Peter Rohde's Blog - 0 views

peterrohde.wordpress.com/...agerank-in-academic-publishing

academia search

shared by Roger Chen on 02 May 08 - Cached

The standard measure scientists use to judge the importance of scientific papers is a simple citation count. That is, how many other papers cite the paper in question? While this measure has its merits, it has one fundamental flaw - not all citations are equal.
Numerous authors/bloggers have advocated using a PageRank-like index for quantifying the importance of papers or journals
To represent the web we use a directed graph, where the edges carry a direction.
The goal of the PageRank algorithm is two-fold. We wish to construct a measure of relevance that, first, is related to how many incoming links a site has, and second, what the importance of the source of those links was.
Well, scientific papers can be mapped to a graph in a similar way to web-sites. Specifically, vertices in the graph would represent papers, and edges citations. The PageRank algorithm can be applied out-of-the-box.
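As a minimal sketch of what applying PageRank to a citation graph might look like, the function below runs the usual power iteration over a dictionary of citations. The damping factor of 0.85, the fixed iteration count, and the four-paper example graph are assumptions for illustration; papers that cite nothing have their rank spread evenly over all papers, which is one common convention.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    citations maps each paper to the list of papers it cites
    (a directed edge from the citing paper to the cited paper).
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # Split this paper's rank among the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # No outgoing citations: spread the rank over every paper.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical graph: A and B cite C, D cites A, C cites nothing.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": [], "D": ["A"]})
for paper, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 3))
```

Paper C comes out on top because it is cited twice, and A outranks B because A is itself cited, which is the "not all citations are equal" effect a plain citation count misses.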
First of all, one could discount self-citations from the index
A second variation that one might try is to add a time bias when calculating the index, such that links from more recent papers carry more weight than from older papers.
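Both variations amount to putting a weight on each citation edge before the index is computed. The sketch below shows one possible weighting, with everything in it assumed for illustration: the `citation_weight` name, the dictionary layout for a paper, treating any shared author as a self-citation, zeroing such edges rather than merely reducing them, and the five-year half-life.

```python
def citation_weight(citing, cited, current_year, half_life=5.0):
    """Weight for the edge citing -> cited.

    Each paper is a dict with 'authors' (a set of names) and 'year'.
    """
    # Variation 1: drop self-citations (any shared author counts here).
    if citing["authors"] & cited["authors"]:
        return 0.0
    # Variation 2: citations from recent papers weigh more; the weight
    # halves every `half_life` years of age of the citing paper.
    age = max(0, current_year - citing["year"])
    return 0.5 ** (age / half_life)

old = {"authors": {"Smith"}, "year": 1998}
new = {"authors": {"Jones"}, "year": 2008}
target = {"authors": {"Lee"}, "year": 1990}
own = {"authors": {"Lee", "Kim"}, "year": 2007}

print(citation_weight(new, target, 2008))
print(citation_weight(old, target, 2008))
print(citation_weight(own, target, 2008))
```

A weighted PageRank would then split each paper's rank among its citations in proportion to these weights instead of evenly.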

Roger Chen on 02 May 08

Numerous authors/bloggers have advocated using a PageRank-like index for quantifying the importance of papers or journals.

Genetic Algorithm, Grid Computing, and Visualization Techniques - 0 views

atomai.blogspot.com/...orithm-grid-computing-and.html

data mining research visualization

shared by Roger Chen on 28 Jun 08 - Cached

Pyflix - Trac - 0 views

pyflix.python-hosting.com

netflix python

shared by Roger Chen on 29 Jul 08 - Cached

Roger Chen on 29 Jul 08

Pyflix is a small package written in Python that provides an easy entry point for getting up and running in the Netflix Prize competition. It combines an efficient storage scheme with an intuitive high-level API that allows contestants to focus on the real problem, the recommendation system algorithm. To get started with Pyflix, keep reading.

Slope One - Wikipedia, the free encyclopedia - 0 views

en.wikipedia.org/Slope_One

recommender

shared by Roger Chen on 26 Jun 08 - Cached

Roger Chen on 26 Jun 08

Slope One is a family of algorithms used for collaborative filtering, introduced in "Slope One Predictors for Online Rating-Based Collaborative Filtering" by Daniel Lemire and Anna Maclachlan. Arguably, it is the simplest form of non-trivial item-based collaborative filtering based on ratings.
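To give a feel for how simple the scheme is, here is a minimal sketch of the weighted Slope One predictor: for every pair of items it stores the average rating difference over users who rated both, then predicts a missing rating from the user's other ratings plus those differences. The function names and the three-user data set are made up for this example.

```python
from collections import defaultdict

def slope_one_train(ratings):
    """ratings: {user: {item: rating}}.

    Returns (diffs, freqs): diffs[i][j] is the average of (rating_i - rating_j)
    over users who rated both items, freqs[i][j] is how many such users exist.
    """
    diffs = defaultdict(lambda: defaultdict(float))
    freqs = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diffs[i][j] += ri - rj
                    freqs[i][j] += 1
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= freqs[i][j]
    return diffs, freqs

def slope_one_predict(user_ratings, item, diffs, freqs):
    """Weighted Slope One prediction of `item` for one user, or None."""
    numerator = 0.0
    denominator = 0
    for j, rj in user_ratings.items():
        count = freqs.get(item, {}).get(j, 0)
        if j != item and count:
            # Each known rating votes with (rating + average difference),
            # weighted by how many users support that difference.
            numerator += (diffs[item][j] + rj) * count
            denominator += count
    return numerator / denominator if denominator else None

ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diffs, freqs = slope_one_train(ratings)
print(slope_one_predict(ratings["lucy"], "A", diffs, freqs))
```

For Lucy, item A is on average 0.5 above B (two users) and 3 above C (one user), so the prediction is (2 × 2.5 + 1 × 8) / 3, about 4.33.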

Collaborative filtering made easy - Slope One in Python - 0 views

www.serpentine.com/...laborative-filtering-made-easy

python recommender

shared by Roger Chen on 20 Jul 08 - Cached

Roger Chen on 20 Jul 08

An implementation of the "Slope One" algorithm introduced by Daniel Lemire and Anna Maclachlan. The code is written in Python and comes with an excellent explanation.

Many Eyes - 0 views

services.alphaworks.ibm.com/...home

visualization

shared by Roger Chen on 25 Jun 08 - Cached

Roger Chen on 25 Jun 08

Many Eyes is an IBM site with a goal of making data visualization algorithms and data sets widely available. It is a fantastic place to spend a few hours.

Why the cloud cannot obscure the scientific method - 0 views

arstechnica.com/...ure-the-scientific-method.html

data mining thinking

shared by Roger Chen on 26 Jun 08 - Cached

Overall, the foundation of the argument for a replacement for science is correct: the data cloud is changing science, and leaving us in many cases with a Google-level understanding of the connections between things. Where Anderson stumbles is in his conclusions about what this means for science. The fact is that we couldn't have even reached this Google-level understanding without the models and mechanisms that he suggests are doomed to irrelevance.
Anderson appears to take the position that the new research part of the equation has become superfluous; simply having a good algorithm that recognizes the correlation is enough.
Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications.
without the testable predictions made by the theory, we'll never be able to tell how precisely it is wrong

Roger Chen on 26 Jun 08

This article is a response to Chris Anderson's article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" - http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete - 0 views

www.wired.com/...pb_theory

data mining google statistics thinking

shared by Roger Chen on 29 Jun 08 - Cached

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
- Roger Chen on 29 Jun 08
  
  That's what Chris Anderson thinks is old-school.
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
- Roger Chen on 29 Jun 08
  
  Come to a conclusion? I don't think so.
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
What can science learn from Google?
This kind of thinking is poised to go mainstream.
- Roger Chen on 29 Jun 08
  
  ???

Roger Chen on 29 Jun 08

"All models are wrong, and increasingly you can succeed without them."

SocialMedia to unveil 'friendship ranks' | Tech news blog - CNET News.com - 0 views

news.cnet.com/8301-10784_3-9974220-7.html

internetwatch social media

shared by Roger Chen on 24 Jun 08 - Cached

Goldstein is expected to announce "social banners," or display ads that turn you or your friends into the hook of a marketing message. In tandem, SocialMedia will announce that it's developed a patent-pending algorithm called FriendRank to power those social banners. It's like Google's PageRank, but instead of ranking pages for their popularity, it ranks friendships.