VirgoLab: Group items tagged "mining"

Roger Chen

Datawocky: How Google Measures Search Quality

  • The heart of the matter is this: how do you measure the quality of search results?
  • The first factor is that we have all been trained to trust Google and click on the first result no matter what, so ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
  • Here's the shocker: these metrics are not very sensitive to new ranking models. When Google tries new ranking models, these metrics sometimes move and sometimes don't, and they never move by much. (A toy sketch of this kind of usage metric follows this list.)
  • Two lessons from this story: first, the results depend quite strongly on the test set, which again speaks against machine-learned models; and second, Yahoo and Google users differ significantly in the kinds of searches they do.
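
    A minimal sketch of the kind of usage metric the post discusses, assuming a simple click log; the log format, model names, and metric choices (click-through at rank 1, mean reciprocal rank of the clicked position) are illustrative, not Google's actual methodology.

        # Illustrative only: toy usage metrics over a click log, not Google's method.
        # Each log entry is (model_name, clicked_rank), with rank 1 = top result.
        from collections import defaultdict

        def usage_metrics(click_log):
            """Return {model: (click_through_at_rank_1, mean_reciprocal_rank)}."""
            clicks = defaultdict(list)
            for model, rank in click_log:
                clicks[model].append(rank)
            metrics = {}
            for model, ranks in clicks.items():
                ctr_at_1 = sum(1 for r in ranks if r == 1) / len(ranks)
                mrr = sum(1.0 / r for r in ranks) / len(ranks)
                metrics[model] = (ctr_at_1, mrr)
            return metrics

        # Hypothetical log: users click the first result almost regardless of model,
        # which is why such metrics barely move between ranking models.
        log = [("baseline", 1), ("baseline", 1), ("baseline", 2),
               ("new_model", 1), ("new_model", 1), ("new_model", 3)]
        print(usage_metrics(log))
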
Roger Chen

Datawocky: Are Machine-Learned Models Prone to Catastrophic Errors?

  • Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
  • The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
  • It has long been known that Google's search algorithm works at two levels. First, an offline phase extracts "signals" from a massive web crawl and usage data; PageRank is one example. These computations are done offline because they analyze massive amounts of data and are time-consuming, and because the signals are extracted offline rather than in response to user queries, they are necessarily query-independent. You can think of them as tags on the documents in the index; there are about 200 such signals. Second, an online phase runs in response to a user query: a subset of documents is identified based on the presence of the user's keywords, and these documents are ranked by a very fast algorithm that combines the 200 signals in memory using a proprietary formula. (A toy sketch of this offline/online split follows this list.)
  • This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
  • My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows).
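
    A minimal sketch of the two-phase structure described in the list above; the signal names, weights, and scoring formula here are hypothetical (the real signals and their combination are proprietary), but the shape matches the description: query-independent signals are precomputed offline and attached to documents, and a fast online scorer combines them with the query-dependent match.

        # Hypothetical illustration of the offline/online split described above.
        # Signal names and weights are invented; Google's formula is proprietary.

        # Offline phase: query-independent signals precomputed per document and
        # stored alongside the index (think of them as tags on each document).
        SIGNALS = {
            "doc1": {"pagerank": 0.82, "spam_score": 0.05, "freshness": 0.40},
            "doc2": {"pagerank": 0.31, "spam_score": 0.01, "freshness": 0.95},
        }
        WEIGHTS = {"pagerank": 3.0, "spam_score": -5.0, "freshness": 1.0}

        def keyword_match(doc_id, query):
            """Query-dependent component; a real engine would score against an inverted index."""
            return 1.0  # placeholder

        def online_rank(candidate_ids, query):
            """Online phase: combine precomputed signals with the query match, in memory."""
            def score(doc_id):
                s = keyword_match(doc_id, query)
                s += sum(WEIGHTS[name] * value for name, value in SIGNALS[doc_id].items())
                return s
            return sorted(candidate_ids, key=score, reverse=True)

        print(online_rank(["doc1", "doc2"], "machine learning"))
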
Roger Chen

Data & Knowledge Engineering (0169-023X) - ACM Guide to Computing Literature

  •  
    Data & Knowledge Engineering (0169-023X)
Roger Chen

Expert Systems with Applications: An International Journal (0957-4174)

  •  
    ACM Portal - Expert Systems with Applications: An International Journal
Roger Chen

Data Randomization

  •  
    Attacks that exploit memory errors are still a serious problem. We present data randomization, a new technique that provides probabilistic protection against these attacks by xoring data with random masks. Data randomization uses static analysis to partition instruction operands into equivalence classes: it places two operands in the same class if they may refer to the same object in an execution that does not violate memory safety. Then it assigns a random mask to each class and it generates code instrumented to xor data read from or written to memory with the mask of the memory operand's class. Therefore, attacks that violate the results of the static analysis have unpredictable results. We implemented a data randomization prototype that compiles programs without modifications and can prevent many attacks with low overhead. Our prototype prevents all the attacks in our benchmarks while introducing an average runtime overhead of 11% (0% to 27%) and an average space overhead below 1%.
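
    A minimal conceptual sketch of the xor-masking idea from the abstract; the real system instruments compiled code using static analysis, whereas the class, method names, and single-byte masks below are invented purely to illustrate the mechanism.

        # Conceptual illustration of data randomization (xor with per-class masks).
        # Not the authors' implementation; this is a toy model of the idea.
        import secrets

        class MaskedMemory:
            """Byte storage where each equivalence class of accesses has its own random mask."""
            def __init__(self, size, num_classes):
                self.mem = bytearray(size)
                self.masks = [secrets.randbits(8) for _ in range(num_classes)]

            def store(self, addr, value, cls):
                # Instrumented write: xor the value with the class mask before storing.
                self.mem[addr] = value ^ self.masks[cls]

            def load(self, addr, cls):
                # Instrumented read: xor with the same mask to recover the original value.
                return self.mem[addr] ^ self.masks[cls]

        mem = MaskedMemory(size=16, num_classes=2)
        mem.store(0, 0x41, cls=0)
        print(hex(mem.load(0, cls=0)))   # 0x41: a legitimate access round-trips correctly

        # A write that bypasses the instrumentation (e.g. via a memory error) lands
        # unmasked, so a later masked read yields an unpredictable value:
        mem.mem[1] = 0x41
        print(hex(mem.load(1, cls=0)))   # 0x41 ^ mask: attacker-controlled data is scrambled
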
Roger Chen

Attribute-Relation File Format (ARFF)

  •  
    An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.
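
    For illustration, a minimal ARFF file written from Python; the relation, attribute names, and values are made up, but the @relation, @attribute, and @data directives and the comma-separated data rows are the format's own structure.

        # Write a tiny, made-up ARFF file to show the format's structure.
        arff_lines = [
            "% Toy dataset for illustration",
            "@relation weather",
            "",
            "@attribute outlook {sunny, overcast, rainy}",
            "@attribute temperature numeric",
            "@attribute play {yes, no}",
            "",
            "@data",
            "sunny,85,no",
            "overcast,83,yes",
            "rainy,70,yes",
        ]

        with open("weather.arff", "w") as f:
            f.write("\n".join(arff_lines) + "\n")

    Nominal attributes list their allowed values in braces, numeric ones are declared with the numeric keyword, and the rows after @data supply one instance per line; Weka reads files of this shape directly.
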
Roger Chen

KNIME - Konstanz Information Miner

shared by Roger Chen on 01 Aug 08
  •  
    KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
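
    A minimal sketch of the data-flow idea described above; this is not KNIME's API (KNIME pipelines are built visually), just an illustration of chaining nodes and selectively executing part of a flow. The node names and functions are invented.

        # Toy dataflow pipeline: each node transforms the output of the previous one.
        def load(_):
            return [1, 2, 3, 4, 5, None]

        def drop_missing(rows):
            return [r for r in rows if r is not None]

        def square(rows):
            return [r * r for r in rows]

        pipeline = [("load", load), ("clean", drop_missing), ("transform", square)]

        def run(pipeline, upto=None):
            """Execute nodes in order, optionally stopping after the node named `upto`."""
            data = None
            for name, node in pipeline:
                data = node(data)
                print(f"after {name}: {data}")   # inspect intermediate results
                if name == upto:
                    break
            return data

        run(pipeline, upto="clean")   # selectively execute only part of the flow
        run(pipeline)                 # execute all steps
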
Roger Chen

GroupLens Research

  •  
    GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota. We conduct research in several areas, including recommender systems, online communities, mobile and ubiquitous technologies, digital libraries, and local geographic information systems.