Group items tagged google - VirgoLab

Datawocky: How Google Measures Search Quality - 0 views

anand.typepad.com/...e-measures-search-quality.html

data mining google machine learning search

shared by Roger Chen on 11 Jun 08 - Cached

The heart of the matter is this: how do you measure the quality of search results
The first is that we have all been trained to trust Google and click on the first result no matter what. So ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
here's the shocker -- these metrics are not very sensitive to new ranking models! When Google tries new ranking models, these metrics sometimes move, sometimes not, and never by much
Two learnings from this story: one, the results depend quite strongly on the test set, which again speaks against machine-learned models. And two, Yahoo and Google users differ quite significantly in the kinds of searches they do

Google Experiments With Next Generation Image Search - 0 views

www.techcrunch.com/...h-next-generation-image-search

papers search www

shared by Roger Chen on 28 Apr 08 - Cached

Roger Chen on 28 Apr 08

Two Google scientists presented a paper at WWW 2008 held in Beijing last week that outlines their vision for the future of image search.

Paper: MapReduce: Simplified Data Processing on Large Clusters | High Scalability - 0 views

highscalability.com/data-processing-large-clusters

google papers

shared by Roger Chen on 19 Jun 08 - Cached

Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.

Roger Chen on 19 Jun 08

Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.

Google Gadget for Linux - Google Code - 0 views

code.google.com/...google-gadgets-for-linux

linux tools

shared by Roger Chen on 26 Jan 09 - Cached

Geeking with Greg: Kai-Fu Lee keynote at SIGIR - 0 views

glinden.blogspot.com/...i-fu-lee-keynote-at-sigir.html

china internetwatch.china

shared by Roger Chen on 22 Jul 08 - Cached

Google China was optimized for finding the one site you need to go to, as it is elsewhere, but, Kai-Fu said, according to eyetracking studies and log data, Chinese users tend to be much less task-oriented, read much more of the page, and click many more links than US users.
One curious question that Kai-Fu raised was whether these preferences will remain true over time. Expert internet users tend to be more task-oriented than novice users. Google China has had much more success in gaining market share in China among expert users

Roger Chen on 22 Jul 08

Googler Kai-Fu Lee gave a keynote at SIGIR 2008 on "The Google China Experience".

Rant: Google is NOT Making us STUPID - 0 views

io9.com/...google-is-not-making-us-stupid

google internetwatch thinking

shared by Roger Chen on 24 Jun 08 - Cached

The internet is giving us a form of ADHD when it comes to reading, and we should be scared of that.

Is Google Making Us Stupid? - 0 views

www.theatlantic.com/...google

google internetwatch

shared by Roger Chen on 24 Jun 08 - Cached

friendfeed-api - Google Code - 0 views

code.google.com/...list

friendfeed tools

shared by Roger Chen on 23 Apr 08 - Cached

Roger Chen on 23 Apr 08

FriendFeed API libraries

python-twitter - Google Code - 0 views

code.google.com/python-twitter

python tools twitter

shared by Roger Chen on 12 May 08 - Cached

Roger Chen on 12 May 08

This library provides a pure python interface for the Twitter API.

Why the cloud cannot obscure the scientific method - 0 views

arstechnica.com/...ure-the-scientific-method.html

data mining thinking

shared by Roger Chen on 26 Jun 08 - Cached

Overall, the foundation of the argument for a replacement for science is correct: the data cloud is changing science, and leaving us in many cases with a Google-level understanding of the connections between things. Where Anderson stumbles is in his conclusions about what this means for science. The fact is that we couldn't have even reached this Google-level understanding without the models and mechanisms that he suggests are doomed to irrelevance.
Anderson appears to take the position that the new research part of the equation has become superfluous; simply having a good algorithm that recognizes the correlation is enough.
Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications.
without the testable predictions made by the theory, we'll never be able to tell how precisely it is wrong

Roger Chen on 26 Jun 08

This article is a response to Chris Anderson's article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" - http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

Is Google Changing Your Brain? - Harvard Business Online's HBR Editors' Blog - 0 views

discussionleader.hbsp.com/...gle_changing_your_brain_1.html

google

shared by Roger Chen on 19 Jun 08 - Cached

Academic Productivity » Google's Palimpsest project: Open-Source Science Data - 0 views

www.academicproductivity.com/...oject-open-source-science-data

academia google research

shared by Roger Chen on 15 Jun 08 - Cached

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete - 0 views

www.wired.com/...pb_theory

data mining google statistics thinking

shared by Roger Chen on 29 Jun 08 - Cached

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
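
To make "the statistics of incoming links" concrete, here is a minimal sketch of a PageRank-style power iteration. It is a simplified textbook version, not Google's production algorithm; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy link graph in the usage note are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages purely from link statistics (simplified, illustrative).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For example, with the hypothetical graph `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` receives two incoming links and ends up ranked above `a`, which in turn ranks above `b`, without the code knowing anything about what the pages contain.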
The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
- Roger Chen on 29 Jun 08
  
  That's what Chris Anderson thinks is old-school.
  
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
- Roger Chen on 29 Jun 08
  
  Is that a settled conclusion? I don't think so.
  
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
What can science learn from Google?
This kind of thinking is poised to go mainstream.
- Roger Chen on 29 Jun 08
  
  ???
  
Roger Chen on 29 Jun 08

"All models are wrong, and increasingly you can succeed without them."

Datawocky: Are Machine-Learned Models Prone to Catastrophic Errors? - 0 views

anand.typepad.com/...an-machine-learned-models.html

data mining google

shared by Roger Chen on 11 Jun 08 - Cached

Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
It has long been known that Google's search algorithm actually works at 2 levels:
- An offline phase that extracts "signals" from a massive web crawl and usage data. An example of such a signal is page rank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, these signals are necessarily query-independent. You can think of them as tags on the documents in the index. There are about 200 such signals.
- An online phase, in response to a user query. A subset of documents is identified based on the presence of the user's keywords. Then, these documents are ranked by a very fast algorithm that combines the 200 signals in-memory using a proprietary formula.
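
The two phases described above can be sketched roughly as follows. This is a toy illustration only: the real formula is proprietary and combines about 200 signals, whereas this sketch uses two made-up signals (`inlinks` and `length`) with made-up weights. The names `offline_phase`, `online_phase`, and `WEIGHTS` are hypothetical, and the weighted sum is just one plausible way to combine signals.

```python
# Hypothetical weights for combining signals; the real formula is not public.
WEIGHTS = {"inlinks": 1.0, "length": 0.01}


def offline_phase(crawl):
    """Precompute query-independent signals for every crawled document.

    crawl: dict mapping url -> (text, list of outgoing urls).
    Returns an index mapping url -> {"words": set, "signals": dict}.
    """
    # Count incoming links per document, ignoring self-links and unknown targets.
    inlinks = {url: 0 for url in crawl}
    for url, (_, outlinks) in crawl.items():
        for target in outlinks:
            if target in inlinks and target != url:
                inlinks[target] += 1

    index = {}
    for url, (text, _) in crawl.items():
        words = text.lower().split()
        index[url] = {
            "words": set(words),
            # Signals act like tags attached to the document in the index.
            "signals": {"inlinks": float(inlinks[url]), "length": float(len(words))},
        }
    return index


def online_phase(index, query):
    """Answer a query: select documents by keyword, then rank by stored signals."""
    keywords = query.lower().split()
    # Step 1: keep only documents that contain every keyword.
    candidates = [
        url for url, entry in index.items()
        if all(keyword in entry["words"] for keyword in keywords)
    ]

    # Step 2: rank candidates with a cheap in-memory combination of signals.
    def score(url):
        signals = index[url]["signals"]
        return sum(WEIGHTS[name] * value for name, value in signals.items())

    return sorted(candidates, key=score, reverse=True)
```

Note that `online_phase` never recomputes a signal: all the expensive work happens once in `offline_phase`, which is why the per-query step can stay fast.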
This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows)

ByteOfZhpy - zhpy - Google Code - 0 views

code.google.com/...ByteOfZhpy

programming python

shared by Jing Lai on 13 Jul 08 - Cached

Jing Lai on 13 Jul 08

Online e-book on Python basics

Is There Really Any "Commerce" in "Social Networks"? | "别来无恙" Team Blog - 0 views

blog.bielaiwuyang.com/?p=47

internetwatch internetwatch.china social network

shared by Roger Chen on 20 Jun 08 - Cached

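Every form of business on the internet derives from three simple, clear-cut commercial attributes: the media attribute, the market attribute, and the tool attribute.
So-called "social networks" lean toward the internet's media attribute. In essence they are still a distribution channel; the difference is that the carrier is people rather than information alone. Given this trait, the "commerce" value of "social networks" should also be concentrated mainly in online marketing. If one really wants to consider a "social network + e-commerce" model, the two need to be (and in practice have to be) kept distinct; they must not be entangled too closely, let alone conflated.
The media attribute points to online advertising, or online marketing; typical examples are portal sites, Google, and Youtube. The market attribute points to e-commerce; typical examples are Amazon, eBay, and Alibaba, and it also covers sales of virtual goods. The tool attribute points to paid services; the typical example is Flickr, and it includes all kinds of SaaS products. In other words, every business that lets you make money on the internet falls into one of these three kinds.
Traditional "communities" have long had the basic conditions for online marketing; 猫扑 (Mop) and 天涯 (Tianya) were both good examples from the previous era. So why wouldn't the trendier SNS work? Of course it can; the problem is still people, or rather resources. For most young teams, online marketing is a real headache. Running a community well depends on the ability to organize and plan online, whereas building a thriving online marketing business depends on pulling together real-world resources.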

Could Social Media See the End of Google's PageRank? - 0 views

mashable.com/...google-pagerank

google internetwatch search social media

shared by Roger Chen on 03 Jul 08 - Cached

Paul Buchheit: The power of links and the value of global knowledge - 0 views

paulbuchheit.blogspot.com/...links-and-value-of-global.html

internetwatch social network