Home/ Groups/ VirgoLab
Roger Chen

Content analysis and the cold-start problem - Duke Listens! - 0 views

  • A classic problem in traditional collaborative filtering recommendation is the 'cold start' problem. It is hard to generate recommendations for new items because there isn't enough taste data about the new items to make reliable correlations with other items. That's where content analysis comes in. The cold start problem can be alleviated by basing recommendations on similarity of content as well as the wisdom of the crowds. New items can be analyzed and enrolled into a recommender, making these items available and recommendable.
  •  
    The cold-start problem is common in traditional CF systems. Content analysis comes in to keep the recommender from suffering from it.
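A minimal sketch of the idea in the highlight above, not the method used by any particular recommender: item similarity comes from taste data when both items have enough raters, and falls back to content features for a new item. The names `item_similarity`, `min_raters`, the sample ratings, and the content features (`tempo`, `guitar`, `piano`) are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def item_similarity(item_a, item_b, ratings, content, min_raters=2):
    """Use rating-based similarity when both items have enough taste data,
    otherwise fall back to content-based similarity (the cold-start case)."""
    ratings_a = ratings.get(item_a, {})
    ratings_b = ratings.get(item_b, {})
    if len(ratings_a) >= min_raters and len(ratings_b) >= min_raters:
        return cosine(ratings_a, ratings_b)
    return cosine(content[item_a], content[item_b])


# Hypothetical data: per-item user ratings and per-item content features.
ratings = {
    "old_song": {"ann": 5, "bob": 4},
    "other_song": {"ann": 4, "bob": 5},
    # "new_song" has no ratings yet.
}
content = {
    "old_song": {"tempo": 1.0, "guitar": 1.0},
    "other_song": {"tempo": 1.0, "piano": 1.0},
    "new_song": {"tempo": 1.0, "guitar": 1.0},
}

# The new item can still be compared with existing ones through its content.
new_vs_old = item_similarity("new_song", "old_song", ratings, content)
new_vs_other = item_similarity("new_song", "other_song", ratings, content)
```

With this data the unrated `new_song` comes out closer to `old_song` than to `other_song`, so it becomes recommendable before anyone has rated it.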
Roger Chen

Using Aardvark - Duke Listens! - 0 views

  • For one thing, we are using content analysis, classification and autotagging to help identify relevant content. We use incoming links and attention to determine how much authority a particular entry has on a topic.
    • Roger Chen
       
      Attention? How do they find and measure attention?
  •  
    Project Aura - a blog recommender.
Roger Chen

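ScienceNet (科学网) - Doctoral students should be trained through one complete research process (repost) - 0 views

  • Yang Le holds that the doctoral stage has two tasks: first, to build a fairly broad foundation; second, under the supervisor's guidance, to be trained through one complete piece of research work and produce a qualified dissertation.
  • The second task is more important than the first and reflects the level of research more directly. That is, under the supervisor's lead, the student completes training in the whole course of a research project: choosing a topic, reading the literature, overcoming the key difficulties, extending the results, and writing the paper.
  • In the past, the Ministry of Education required doctoral dissertations to exceed 100 pages. Yang Le said that such a length requirement may not be appropriate, but the reasoning behind it is that a dissertation is not made up entirely of original results. A considerable part of it summarizes the field or direction of the chosen topic: the relevant literature must be read thoroughly, digested, and then summarized well in one's own views and words. A dissertation should also contain original results, so it is not quite the same as writing a research paper.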
Roger Chen

Geeking with Greg: Kai-Fu Lee keynote at SIGIR - 0 views

  • Google China was optimized for finding the one site you need to go to, as it is elsewhere, but, Kai-Fu said, according to eyetracking studies and log data, Chinese users tend to be much less task-oriented, read much more of the page, and click many more links than US users.
  • One curious question that Kai-Fu raised was whether these preferences will remain true over time. Expert internet users tend to be more task-oriented than novice users. Google China has had much more success in gaining market share in China among expert users
  •  
    Googler Kai-Fu Lee gave a keynote at SIGIR 2008 on "The Google China Experience".
Roger Chen

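ScienceNet (科学网) - The blog is the thought - 0 views

  • McLuhan is usually rendered in Chinese as 麦克卢汉, but the h here is silent, so some render it as 麦克卢恩. In that case, I think it might as well be rendered as 麦克乱 (where 乱 means "chaos"), because chaos is the hallmark of the age of mass communication.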
    • Roger Chen
       
      LOL
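  • Mass media selectively amplify and suppress ideas and information. People who see this clearly should keep some distance from mass media, not take what is popular there too seriously, and certainly not treat what mass media say as the standard for thought and knowledge.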
  • Only puny secrets need protection. Big discoveries are protected by public incredulity.
  • Money is the poor man’s credit card.
  • The car has become the carapace, the protective and aggressive shell, of urban and suburban man.
  • Why is it so easy to acquire the solutions of past problems and so difficult to solve current ones?
  • The trouble with a cheap, specialized education is that you never stop paying for it.
  • The future of the book is the blurb.
  • “I may be wrong, but I’m never in doubt.”
  • Politics offers yesterday’s answers to today’s questions.
  • When all men think alike, no one thinks very much.
Roger Chen

Semantic Library » Zotero and semantic principles - 0 views

  • Our Zotero Server, connected to the client, will enable all kinds of new collaboration opportunities and data-mining of aggregated collections. We also plan to provide hooks into high-performance computing projects like the SEASR text-mining project based at UIUC
  • Data mining is becoming a major trend in eResearch as computing power increases and more and more researchers have direct access to open data sets. In the future, we won’t just be citing articles, figures, images, movies, and books, we’ll also be citing specific data points.
Roger Chen

Michael Nielsen » Open science - 0 views

  • Scientific papers represent only a tiny fraction of the useful knowledge that scientists have to share with the world:
Roger Chen

Collaborative Filtering: Lifeblood of The Social Web - ReadWriteWeb - 0 views

  • This, of course, relies on the fact that people's interests, preferences, and ideologies don't change too drastically over time.
  • A filtering system with preference-based recommendations, in essence, is the future of the social web.
  • The best implementations of a Collaborative Filtering (CF) system along with a preference based recommendation/discovery system that I have seen are always on music streaming and discovery sites.
  • As you can see from above, it is certainly possible to have a good collaborative filtering system without a recommendation engine
  • Collaborative Filtering (Wikipedia definition) is a mechanism used to filter large amounts of information by spreading the process of filtering among a large group of people.
  • The important thing, one that not many social sites realize, is that a (CF) system that doesn't automatically match content to your preferences, is inherently flawed. The reason for this is simple: Unless you can achieve perfect diversity and independence of opinion, one point of view will always dominate another on a particular platform. The dominant point of view on the social web is a left-leaning one, and without the ability to get the most appropriate pieces of content to the people that care most about them, the right-wing point of view gets buried almost every time.
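The combination of collaborative filtering with preference matching that these highlights discuss can be sketched as below. This is a simplified user-based variant chosen for illustration, not the approach of any site named in the article: a user's unknown rating is predicted as the similarity-weighted average of other users' ratings. The function names and the sample ratings are hypothetical, and ratings are assumed to be positive numbers.

```python
from math import sqrt


def user_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = u.keys() & v.keys()
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Predict a rating as the similarity-weighted average of the ratings
    that other users gave to the item. Returns None if nobody comparable
    has rated it."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = user_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0:
        return None
    return weighted_sum / weight_total


# Hypothetical taste data: user -> {item: rating}.
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}

# ann has not rated "c"; bob's tastes match hers more closely than cat's,
# so the prediction is pulled toward bob's rating.
predicted = predict(ratings, "ann", "c")
```

Because the weights come from each user's own rating history, a minority taste is matched against like-minded raters instead of being averaged away by the dominant group, which is the failure mode the last highlight describes.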
Roger Chen

Paper: MapReduce: Simplified Data Processing on Large Clusters | High Scalability - 0 views

  • Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.
Roger Chen

Business Analytics: The Attention System of the Human Brain - 0 views

  • It is important to understand how the human brain functions in the area of attention, at the cognitive and neuronal levels, so we can attempt to replicate these functions in any decision support and business analytics system
  • Data processing and analytics are separate functions that interact
  • Pattern Recognition
  • Detection
  • Knowing the anatomical basis for decision support and analytics in brain function gives greater importance to creating decision support systems
Roger Chen

How to Maximize Citations « Apperceptual - 0 views

  • Why should we want our papers to be highly cited? I assume here that we want our work to influence other researchers, and that citation count is a reasonable estimate of influence.
Roger Chen

Good research: invent new problems or explain mysteries - 0 views

  • There are at least 3 types of good research questions: 1) explain with a theoretical model a (puzzling) experimental observation 2) improve by at least an order of magnitude an existing technique 3) make up a new problem and be the first to propose a solution
  • So I submit to you Lemire’s first rule of good research: you must either be trying to explain puzzling experimental results, or be inventing new problems.
Roger Chen

Taking action : business|bytes|genes|molecules - 0 views

  • when we build software for scientists, we should think about what they would do with the returned information. That’s where context is really important as well. I’ve seen too many examples where the user is offered options that make no sense for what you want to achieve.
Roger Chen

The Internet Is a Brain - Jeff Stibel - 0 views

  • Let’s get concrete about what I mean here. The brain is one of the most complex networks in the world, with more neurons than there are stars in the galaxy. Its hardware is a complex network of neurons; its software a complex network of memories. And so too is the Internet a network. Its hardware is a complex network of computers; its software a complex network of websites. There is a lot we can learn from the brain and it can tell us where the Internet is headed next.
Roger Chen

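Scholarly work after a disaster - Xie Yong (谢泳) - Sina BLOG (新浪BLOG) - 0 views

  • The pneumonic plague epidemic in the three northeastern provinces happened in the late Qing, while the one in northern Shanxi happened under the Republic, but the two sets of epidemic reports share one feature: both contain detailed material from the highest levels. From ministers' memorials to the emperor's rescripts, from military governors' reports to presidential telegraphed orders, everything is included. Researchers of the history of epidemics in China today can therefore use this material to analyze the attitudes and responsibilities of the central and local authorities toward the epidemics. In addition, both sets, which were quite complex to compile and print, were completed in the same year the epidemic ended and published the following year, a speed that commands respect.
  • After the pneumonic plague epidemic of December 1910 (the second year of the Xuantong reign) in the three northeastern provinces, a three-volume Report on the Epidemic in the Three Eastern Provinces (《东三省疫事报告书》; published in the eleventh month of the third year of Xuantong, not for sale) was issued, recording the disaster in great detail. All the material relating to the whole course of events is in it, and one of the three volumes consists entirely of pictures and related charts.
  • We have ten thousand reasons to do better than people did a hundred years ago, but I do not yet have that confidence, because certain things constrain the breadth of vision with which we work.
  • After the 1918 pneumonic plague epidemic in northern Shanxi, everything about that epidemic was likewise compiled into a three-volume set, the Report on the Epidemic in Shanxi Province (《山西省疫事报告书》; chief compiler Wang Chengji, printed by Zhonghua Book Company in June of the eighth year of the Republic of China).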
Roger Chen

Rant: Google is NOT Making us STUPID - 0 views

  • The internet is giving us a form of ADHD when it comes to reading, and we should be scared of that.
Roger Chen

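Is there really any "commerce" in "social networks"? | "别来无恙" team blog - 0 views

  • Every commercial form on the internet derives from three simple, clear-cut business attributes: the media attribute, the market attribute, and the tool attribute.
  • So-called "social networks" lean toward the internet's media attribute. In essence they are still a channel of communication; the difference is that what carries the communication is people, no longer only information. Given this, the "commerce" value of "social networks" should also lie mainly in online marketing. If one really wants to consider a "social network + e-commerce" model, the two need to be (and in practice have to be) kept distinct, not overly intertwined or even conflated.
  • The media attribute points to online advertising, or online marketing; typical examples are portal sites, Google, and Youtube. The market attribute points to e-commerce; typical examples are Amazon, eBay, and Alibaba (阿里巴巴), and it also covers sales of virtual goods. The tool attribute points to paid services; the typical example is Flickr, and it includes all kinds of SaaS products. In other words, these three are the only ways to make money on the internet.
  • Traditional "communities" have long met the basic conditions for online marketing; Mop (猫扑) and Tianya (天涯) are good examples from the previous era. So why wouldn't the more fashionable SNS work? Of course it can. The problem still lies with people, or rather with resources. For most young teams, online marketing is a real headache. Running a community well depends on the ability to organize and plan online, whereas making an online marketing business thrive depends on pulling together real-world resources.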
Roger Chen

Social Media Research Blog: Some things are just Semi-Social - 0 views

  • Social Media is a lot about sharing.
  • As we start to experiment with social software we realize that sharing is good and soon become open to sharing a lot more. There are some things though, that just seem semi-social. What I mean by Semi-Social is roughly "Thing I would not mind sharing with a small group of trusted friends and family members".
Roger Chen

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete - 0 views

  • Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
  • Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
  • The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
  • Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
  • Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
    • Roger Chen
       
      That's what Chris Anderson thinks is old-school.
  • But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
    • Roger Chen
       
      Come to that conclusion? I don't think so.
  • There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
  • What can science learn from Google?
  • This kind of thinking is poised to go mainstream.
    • Roger Chen
       
      ???
  •  
    "All models are wrong, and increasingly you can succeed without them."
Roger Chen

Why the cloud cannot obscure the scientific method - 0 views

  • Overall, the foundation of the argument for a replacement for science is correct: the data cloud is changing science, and leaving us in many cases with a Google-level understanding of the connections between things. Where Anderson stumbles is in his conclusions about what this means for science. The fact is that we couldn't have even reached this Google-level understanding without the models and mechanisms that he suggests are doomed to irrelevance.
  • Anderson appears to take the position that the new research part of the equation has become superfluous; simply having a good algorithm that recognizes the correlation is enough.
  • Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications.
  • without the testable predictions made by the theory, we'll never be able to tell how precisely it is wrong
  •  
    This article is a response to Chris Anderson's article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" - http://www.wired.com/science/discoveries/magazine/16-07/pb_theory