
VirgoLab: Group items tagged "google"


Roger Chen

Datawocky: How Google Measures Search Quality

  • The heart of the matter is this: how do you measure the quality of search results
  • The first is that we have all been trained to trust Google and click on the first result no matter what. So ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
  • here's the shocker -- these metrics are not very sensitive to new ranking models! When Google tries new ranking models, these metrics sometimes move, sometimes not, and never by much
  • Two learnings from this story: one, the results depend quite strongly on the test set, which again speaks against machine-learned models. And two, Yahoo and Google users differ quite significantly in the kinds of searches they do
Roger Chen

Google Experiments With Next Generation Image Search

  • Two Google scientists presented a paper at WWW 2008 held in Beijing last week that outlines their vision for the future of image search.
Roger Chen

Paper: MapReduce: Simplified Data Processing on Large Clusters | High Scalability

  • Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.
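The paper's programming model is easiest to see in its word-count example. Below is a minimal single-process sketch in Python, assuming an in-memory dict of documents; the function names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the grouping loop merely stands in for the distributed shuffle that the real framework performs across thousands of machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated word in the document.
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word: str, counts: Iterable[int]) -> int:
    # Reduce: sum the partial counts emitted for a single word.
    return sum(counts)

def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    # Run the mapper over every input and group intermediate values by key;
    # this stands in for the shuffle the real framework does across machines.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Apply the reducer once per distinct intermediate key.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}

docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
result = run_mapreduce(docs, map_words, reduce_counts)
print(result)  # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the mapper and reducer only see one key at a time, the framework is free to run them in parallel on different machines; that separation is what lets the same small program scale to the job volumes quoted above.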
Roger Chen

Geeking with Greg: Kai-Fu Lee keynote at SIGIR

  • Google China was optimized for finding the one site you need to go to, as it is elsewhere, but, Kai-Fu said, according to eyetracking studies and log data, Chinese users tend to be much less task-oriented, read much more of the page, and click many more links than US users.
  • One curious question that Kai-Fu raised was whether these preferences will remain true over time. Expert internet users tend to be more task-oriented than novice users. Google China has had much more success in gaining market share in China among expert users
  • Googler Kai-Fu Lee gave a keynote at SIGIR 2008 on "The Google China Experience".
Roger Chen

Rant: Google is NOT Making us STUPID

  • The internet is giving us a form of ADHD when it comes to reading, and we should be scared of that.
Roger Chen

friendfeed-api - Google Code

  • FriendFeed API libraries
Roger Chen

python-twitter - Google Code

  • This library provides a pure python interface for the Twitter API.
Roger Chen

Why the cloud cannot obscure the scientific method

  • Overall, the foundation of the argument for a replacement for science is correct: the data cloud is changing science, and leaving us in many cases with a Google-level understanding of the connections between things. Where Anderson stumbles is in his conclusions about what this means for science. The fact is that we couldn't have even reached this Google-level understanding without the models and mechanisms that he suggests are doomed to irrelevance.
  • Anderson appears to take the position that the new research part of the equation has become superfluous; simply having a good algorithm that recognizes the correlation is enough.
  • Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications.
  • without the testable predictions made by the theory, we'll never be able to tell how precisely it is wrong
  • This article is a response to Chris Anderson's article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" - http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
Roger Chen

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

  • Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
  • Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
  • The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
  • Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
  • Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
    • Roger Chen
       
      That's what Chris Anderson considers old-school.
  • But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
    • Roger Chen
       
      Is that the conclusion? I don't think so.
  • There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
  • What can science learn from Google?
  • This kind of thinking is poised to go mainstream.
    • Roger Chen
       
      ???
  • "All models are wrong, and increasingly you can succeed without them."
Roger Chen

Datawocky: Are Machine-Learned Models Prone to Catastrophic Errors?

  • Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan. In these cases, you can make quite a fool of yourself by assuming that the future looks like the past.
  • The current generation of machine learning algorithms can work well in Mediocristan but not in Extremistan.
  • It has long been known that Google's search algorithm actually works at 2 levels: (1) An offline phase that extracts "signals" from a massive web crawl and usage data. An example of such a signal is page rank. These computations need to be done offline because they analyze massive amounts of data and are time-consuming. Because these signals are extracted offline, and not in response to user queries, these signals are necessarily query-independent. You can think of them as tags on the documents in the index. There are about 200 such signals. (2) An online phase, in response to a user query. A subset of documents is identified based on the presence of the user's keywords. Then, these documents are ranked by a very fast algorithm that combines the 200 signals in-memory using a proprietary formula.
  • This raises a fundamental philosophical question. If Google is unwilling to trust machine-learned models for ranking search results, can we ever trust such models for more critical things, such as flying an airplane, driving a car, or algorithmic stock market trading? All machine learning models assume that the situations they encounter in use will be similar to their training data. This, however, exposes them to the well-known problem of induction in logic.
  • My hunch is that humans have evolved to use decision-making methods that are less likely to blow up on unforeseen events (although not always, as the mortgage crisis shows)
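The two-level design described in the Datawocky post can be sketched in a few lines. This is a toy illustration under loudly stated assumptions: the documents, the two signals (`pagerank`, `freshness`), the index contents, and the linear weights are all invented, since the post says only that roughly 200 query-independent signals are combined by a proprietary formula.

```python
from typing import Dict, List, Set

# Offline phase (hypothetical data): query-independent signals per document,
# computed ahead of time from the crawl and usage data.
SIGNALS: Dict[str, Dict[str, float]] = {
    "a.html": {"pagerank": 0.9, "freshness": 0.2},
    "b.html": {"pagerank": 0.4, "freshness": 0.9},
    "c.html": {"pagerank": 0.7, "freshness": 0.5},
}

# Also built offline (hypothetical): an inverted index from keyword to documents.
INDEX: Dict[str, Set[str]] = {
    "google": {"a.html", "b.html", "c.html"},
    "mapreduce": {"a.html", "b.html"},
}

# Hypothetical weights; the real combining formula is proprietary.
WEIGHTS: Dict[str, float] = {"pagerank": 0.7, "freshness": 0.3}

def score(doc: str) -> float:
    # Combine a document's precomputed signals with a weighted sum, in memory.
    return sum(WEIGHTS[name] * value for name, value in SIGNALS[doc].items())

def rank(query: str) -> List[str]:
    # Online phase, step 1: keep documents that contain every query keyword.
    terms = query.lower().split()
    if not terms:
        return []
    candidates = set.intersection(*(INDEX.get(t, set()) for t in terms))
    # Step 2: order the candidates by their combined signal score.
    return sorted(candidates, key=score, reverse=True)

print(rank("google"))            # ['a.html', 'c.html', 'b.html']
print(rank("Google MapReduce"))  # ['a.html', 'b.html']
```

Because the expensive work is precomputed, the online step is only a set intersection plus cheap arithmetic per candidate, which is what makes per-query ranking fast. The fixed weights are also where the post's concern lives: whoever chooses them, human or learned model, is assuming future queries resemble past ones.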
Jing Lai

ByteOfZhpy - zhpy - Google Code

  • Online e-book on Python basics
Roger Chen

Does "Social Networking" Really Have Any "Commerce"? | "别来无恙" Team Blog

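  • Looking across every business model on the Internet, they all stem from three simple, clear commercial attributes: the media attribute, the market attribute, and the tool attribute.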
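  • So-called "social networking" leans toward the Internet's media attribute. In essence it is still a distribution channel; the difference is that the carrier is people rather than just information. Given this, the "commercial" value of social networking should likewise be concentrated mainly in online marketing. If you really want to pursue a "social networking + e-commerce" model, the two still need to be (and in fact have to be) kept distinct, rather than overly intertwined or lumped together.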
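  • The media attribute points to online advertising, or online marketing; typical examples are portals, Google, and Youtube. The market attribute points to e-commerce; typical examples are Amazon, eBay, and Alibaba (阿里巴巴), and it also includes sales of virtual goods. The tool attribute points to paid services; a typical example is Flickr, along with all kinds of SaaS products. In other words, every business that lets you make money on the Internet falls into one of these three kinds.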
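  • Traditional "communities" have long had the basic prerequisites for online marketing; Mop (猫扑) and Tianya (天涯) are both good examples from the previous era. So why wouldn't the trendier SNS sites work? Of course they can. The problem, again, lies with people, or rather with resources. For most young teams, online marketing is a real headache. Running a community well depends on online organizing and planning skills, whereas making an online marketing business thrive depends on integrating real-world resources.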
Roger Chen

Paul Buchheit: The power of links and the value of global knowledge

  • With Pagerank, Google took a very different approach. Instead of considering each page in isolation, they examined the link structure of the entire web and computed a global evaluation of that structure. In other words, they began looking at the entire forest instead of just the individual trees.
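The "global evaluation" of link structure that Buchheit describes can be illustrated with the textbook power-iteration form of PageRank. This is a minimal sketch, not Google's production algorithm: the 0.85 damping factor, the fixed 50 iterations, the tiny three-page graph, and the even redistribution of rank from pages with no outlinks are all common textbook assumptions rather than details from the post.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print({p: round(r, 3) for p, r in ranks.items()})
```

Page `c` ends up ranked highest because it receives links from both other pages, and `a` outranks `b` because its single inlink comes from the well-ranked `c`. Neither result can be read off any one page in isolation; it only emerges from iterating over the whole graph.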