A paper at WWW 2008, "Spatial Variation in Search Engine Queries" (PDF), by Lars Backstrom, Jon Kleinberg, Ravi Kumar, and Jasmine Novak, offered many clever examples of using where people are when they do a web search, both to determine when interest in a topic is geographically isolated and to estimate the physical location of objects.
The heart of the matter is this: how do you measure the quality of search results?
The first is that we have all been trained to trust Google and click on the first result no matter what, so ranking models that make slight changes in ordering may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
Here's the shocker: these metrics are not very sensitive to new ranking models! When Google tries new ranking models, these metrics sometimes move, sometimes not, and never by much.
Two lessons from this story: first, the results depend quite strongly on the test set, which again speaks against machine-learned models; and second, Yahoo and Google users differ quite significantly in the kinds of searches they do.
Max Wilson at the University of Southampton recently called my attention to a pair of special issues of Information Processing & Management. The first is on Evaluation of Interactive Information Retrieval Systems; the second is on Evaluating Exploratory Search Systems. Both are available online at ScienceDirect.
Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
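One way to see the Mediocristan/Extremistan difference is to ask how much of a total the single largest observation accounts for. Here is a minimal simulation; the distribution choices and parameters are my own illustration, not Taleb's:

```python
import random

def top_share(samples):
    """Fraction of the total contributed by the single largest observation."""
    return max(samples) / sum(samples)

random.seed(42)
n = 100_000

# Mediocristan: something like human height -- roughly Gaussian,
# no single observation can dominate the sum.
heights = [random.gauss(170, 10) for _ in range(n)]

# Extremistan: something like wealth -- heavy-tailed (Pareto with a
# small alpha), where one observation can swamp everything else.
wealth = [random.paretovariate(1.1) for _ in range(n)]

print(f"Gaussian: largest sample is {top_share(heights):.6%} of the total")
print(f"Pareto:   largest sample is {top_share(wealth):.2%} of the total")
```

In the Gaussian case the top observation is a vanishing sliver of the total, so past averages predict the future well; in the Pareto case a single draw can account for a visible chunk of everything seen so far, which is exactly where extrapolating from history gets you in trouble.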
It has long been known that Google's search algorithm actually works at 2 levels:
An offline phase that extracts "signals" from a massive web crawl and usage data. An example of such a signal is PageRank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, they are necessarily query-independent. You can think of them as tags on the documents in the index. There are about 200 such signals.
An online phase, in response to a user query. A subset of documents is identified based on the presence of the user's keywords. Then, these documents are ranked by a very fast algorithm that combines the 200 signals in-memory using a proprietary formula.
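The two phases can be sketched roughly as follows. This is a toy illustration, not Google's actual formula: the signal names, weights, and linear combination are all my own stand-ins:

```python
# --- Offline phase: query-independent signals precomputed from the
# crawl and usage data, stored as "tags" on each indexed document ---
index = {
    "doc1": {"text": "python tutorial for beginners",
             "signals": {"pagerank": 0.8, "freshness": 0.2}},
    "doc2": {"text": "advanced python tricks",
             "signals": {"pagerank": 0.5, "freshness": 0.6}},
    "doc3": {"text": "cooking with cast iron",
             "signals": {"pagerank": 0.9, "freshness": 0.4}},
}

# --- Online phase: select documents containing the query keywords,
# then rank them with a fast in-memory combination of the signals ---
WEIGHTS = {"pagerank": 0.7, "freshness": 0.3}  # stand-in for the proprietary formula

def search(query):
    terms = query.lower().split()
    matches = [d for d, doc in index.items()
               if any(t in doc["text"] for t in terms)]
    return sorted(matches,
                  key=lambda d: sum(WEIGHTS[s] * v
                                    for s, v in index[d]["signals"].items()),
                  reverse=True)

print(search("python"))  # doc1 ranks above doc2; doc3 doesn't match
```

The point of the split is that the expensive analysis happens once, offline, while the per-query work is only keyword selection plus a cheap weighted sum.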
This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows).
A newly published patent application from Microsoft takes an interesting spin on presenting information, pulling together news from a mix of sources to present topics in storylines, and providing ways to have that information delivered to us over computers, smart phones, watch interfaces, and in other ways.
The standard measure scientists use to judge the importance of scientific papers is a simple citation count. That is, how many other papers cite the paper in question? While this measure has its merits, it has one fundamental flaw: not all citations are equal.
Numerous authors and bloggers have advocated using a PageRank-like index for quantifying the importance of papers or journals.
To represent the web we use a directed graph: vertices are pages, and each edge points from the linking page to the page it links to.
The goal of the PageRank algorithm is twofold. We wish to construct a measure of relevance that is related, first, to how many incoming links a site has and, second, to how important the sources of those links are.
Well, scientific papers can be mapped to a graph in much the same way as web sites: vertices represent papers, and edges represent citations. The PageRank algorithm can then be applied out of the box.
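To make that concrete, here is a minimal power-iteration PageRank applied to a toy citation graph (damping factor and iteration count are conventional choices, and I ignore rank leaked by dangling papers that cite nothing in the set):

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterative PageRank. `links` maps each node to the nodes it
    points at -- for papers, the papers it cites."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new  # dangling nodes leak rank; fine for this sketch
    return rank

# Toy citation graph: an edge paper -> paper means "cites".
citations = {
    "A": ["C"], "B": ["C"], "D": ["C", "E"], "E": ["C"],
}
ranks = pagerank(citations)
print(max(ranks, key=ranks.get))  # 'C' -- cited by every other paper
```

Note that C outranks E not just because it has more citations, but because even E's single citation flows some of E's own rank onward to C, which is exactly the "importance of the source" effect described above.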
First of all, one could discount self-citations from the index.
A second variation that one might try is to add a time bias when calculating the index, such that links from more recent papers carry more weight than from older papers.
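Both variations amount to putting weights on the citation edges before running PageRank. A sketch of one possible weighting scheme; the half-life, the decision to drop self-citations entirely, and the author-overlap test are all illustrative choices of mine, not proposals from any of the papers above:

```python
def citation_weight(citing_authors, cited_authors, citing_year,
                    current_year=2008, half_life=5.0, keep_self_cites=False):
    """Weight for one citation edge, combining the two variations above."""
    # Variation 1: discount (here: drop) self-citations, detected
    # by any author overlap between the two papers.
    if not keep_self_cites and set(citing_authors) & set(cited_authors):
        return 0.0
    # Variation 2: time bias -- a citation's weight halves
    # every `half_life` years.
    return 0.5 ** ((current_year - citing_year) / half_life)

w_recent = citation_weight(["Wu"], ["Kleinberg"], 2007)   # ~0.87
w_old    = citation_weight(["Wu"], ["Kleinberg"], 1997)   # ~0.22
w_self   = citation_weight(["Wu"], ["Wu", "Li"], 2007)    # 0.0
```

A weighted PageRank would then distribute each paper's rank across its outgoing citations in proportion to these weights rather than uniformly.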
The just-concluded WWW 2008 conference included a short paper, "Size Matters: Word Count as a Measure of Quality on Wikipedia". It reports a surprising experimental result: when assessing the quality of Wikipedia articles, using just one feature, word count, achieves 96.31% accuracy! That result beats many algorithms built on far more complex models.
A new index measuring a scientist's impact in his or her field has been developed, called the Wu index or w-index. Developed by Qiang Wu from the University of Science and Technology of China in Hefei, it was published as "The w-index: A significant improvement of the h-index" on the physics arXiv this week.
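As I understand Wu's definition, a researcher has w-index w if w of their papers have at least 10w citations each. Under that reading, it can be computed in a few lines:

```python
def w_index(citation_counts):
    """Wu's w-index: the largest w such that w of the papers have
    at least 10*w citations each (definition as I understand it)."""
    counts = sorted(citation_counts, reverse=True)
    w = 0
    while w < len(counts) and counts[w] >= 10 * (w + 1):
        w += 1
    return w

# A researcher whose papers are cited 120, 90, 45, 30, 8, and 2 times
# has w-index 3: three papers with at least 30 citations each.
print(w_index([120, 90, 45, 30, 8, 2]))  # 3
```

The factor of 10 is what distinguishes it from the h-index (h papers with at least h citations each), making the w-index much more selective about a researcher's most influential work.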
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.