

Contents contributed and discussions participated by Seçkin Anıl Ünlü

Seçkin Anıl Ünlü

Semantic Web Patterns: A Guide to Semantic Technologies - ReadWriteWeb

  • To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence - computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.
  • The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically.
  • Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper.
  • Similarly, top-down semantic tools are focused on dealing with imperfections in existing information.
  • Within the bottom-up approach to annotation of data, there are several choices for annotation. They are not equally powerful, and in fact each approach is a tradeoff between simplicity and completeness. The most comprehensive approach is RDF - a powerful, graph-based language for declaring things, and attributes and relationships between things.
  • The major benefit of RDF is interoperability and standardization, particularly for enterprises.
  • Microformats offer a simpler approach by adding semantics to existing HTML documents using specific CSS styles.
  • The more annotations there are in web pages, the more standards are implemented, and the more discoverable and powerful the information becomes.
  • People simply do not care that a product is built on the Semantic Web, all they are looking for is utility and usefulness.
  • RDF solves a problem of data interoperability and standards.
  • Behind Calais is a powerful natural language processing technology developed by ClearForest (now owned by Reuters), which relies on algorithms and databases to extract entities out of text. According to Reuters, Calais is extensible, and it is just a matter of time before new entities are added.
  • Another example is the SemanticHacker API from TextWise, which is offering a one million dollar prize for the best commercial semantic web application developed on top of it.
  • Another semantic API is offered by Dapper - a web service which facilitates the extraction of structure from unstructured HTML pages.
  • The premise that semantic understanding of pages leads to vastly better search has yet to be validated. The two main contenders, Hakia and Powerset, have made some progress, but not enough. The problem is that Google's algorithm, which is based on statistical analysis, deals just fine with semantic entities like people, cities, and companies.
  • Likely, understanding semantics is helpful but not sufficient to build a better search engine. A combination of semantics, innovative presentation, and memory of who the user is will be necessary to power the next-generation search experience.
  • Contextual navigation does not just improve search, but rather shortcuts it.
  • The common theme among these tools is the recognition of information and the creation of specific micro contexts for the users to interact with that information.
  • Semantic databases are another breed of semantic applications, focused on annotating web information so that it is more structured.
  • Another big player in the semantic databases space is a company called Metaweb, which created Freebase. In its present form, Freebase is just a fancier and more structured version of Wikipedia - with RDF inside and less information in total.
  • With any new technology it is important to define and classify things. The Semantic Web is offering an exciting promise: improved information discoverability, automation of complex searches, and innovative web browsing.
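The RDF annotations discussed in the excerpts above boil down to a graph of subject-predicate-object triples. As a minimal sketch in plain Python (the URIs and data are illustrative; real systems use dedicated libraries such as rdflib), the triple model can be shown like this:

```python
# Minimal sketch of RDF's triple model: every statement is a
# (subject, predicate, object) tuple, and a dataset is a graph of them.
# The example.org URIs and the people described are made up; the FOAF
# predicate URIs are real vocabulary, used here only for flavor.

triples = {
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Graph query: who does Alice know?
known = match(s="http://example.org/alice",
              p="http://xmlns.com/foaf/0.1/knows")
print([t[2] for t in known])  # ['http://example.org/bob']
```

The wildcard pattern matching is the essence of what graph query languages like SPARQL do over RDF data, which is why the article calls RDF "a powerful, graph-based language for declaring things, and attributes and relationships between things."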
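Microformats, as the excerpts note, piggyback semantics on ordinary HTML class attributes. A toy extractor sketch (the class names `fn` and `org` come from the real hCard microformat, but this parser is deliberately simplified and not spec-complete):

```python
from html.parser import HTMLParser

# Toy extractor for hCard-style microformats: semantics ride on plain
# HTML class attributes ("fn" = formatted name, "org" = organization).
# Real microformat parsers handle nesting, repeated properties, and
# many more class names; this sketch handles one flat card.

HTML = """
<div class="vcard">
  <span class="fn">Tim Berners-Lee</span>
  <span class="org">W3C</span>
</div>
"""

class HCardExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current = None  # microformat property we are inside, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("fn", "org"):
            self.current = cls

    def handle_data(self, data):
        if self.current:
            self.fields[self.current] = data.strip()
            self.current = None

parser = HCardExtractor()
parser.feed(HTML)
print(parser.fields)  # {'fn': 'Tim Berners-Lee', 'org': 'W3C'}
```

Because the markup stays valid HTML, pages annotated this way render unchanged in browsers while becoming machine readable - the simplicity/completeness tradeoff the article contrasts with full RDF.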
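Calais-style entity extraction, per the excerpt, "relies on algorithms and databases." As a rough sketch of the database half alone (the entity lists and sample text are invented; a real service also uses linguistic analysis to find entities it has never seen):

```python
import re

# Naive gazetteer-based entity tagger: a stand-in for the database side
# of an entity-extraction service like Calais. The entity dictionary is
# illustrative; production systems combine such lookups with NLP.

GAZETTEER = {
    "Reuters": "Company",
    "Google": "Company",
    "London": "City",
}

def extract_entities(text):
    """Return (name, type) pairs for known entities found in the text."""
    found = []
    for name, kind in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind))
    return sorted(found)

text = "Reuters opened a new office in London."
print(extract_entities(text))  # [('London', 'City'), ('Reuters', 'Company')]
```

Extending the gazetteer is trivial, which hints at why Reuters describes Calais as extensible: adding an entity type is largely a matter of adding data, not rewriting the algorithm.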

Semantic Search: The Myth and Reality - ReadWriteWeb

  • Any technology that stands a chance to dethrone Google is of great interest to all of us, particularly one that takes advantage of long-awaited and much-hyped semantic technologies. But no matter how much progress has been made, most of us are still underwhelmed by the results. In head-to-head comparisons with Google, the results have not come out much different.
  • We all know that semantic technologies are powerful, but how and why?
  • The mistake is that semantic search engines present us with a Google-like search box and allow us to enter free-form queries. So we type the things that we are used to asking - primitive queries.
  • The situation is made more difficult by the fact that right now there is only a narrow range of problems where semantic search can clearly do better: complex queries involving inference and reasoning over a complex data set.
  • Sadly, natural language processing gives little advantage when it comes to this category of problems.
  • Before looking at the problems that are perfect for semantic search, let's look at the hardest problems. These are computationally challenging problems that really have nothing to do with understanding semantics.
  • There are fundamental limits to what we can compute, and a class of problems that have an exponential number of possible solutions is not going to be magically solved because we represent data as RDF.
  • The good news is that there is a set of problems that are great for semantic search. These are the problems we have been solving so wonderfully with relational databases.
  • At its most structured extreme we find Freebase - the semantic database of everything. Freebase is accessible via free-text search, but more importantly via MQL (Metaweb Query Language).
  • Companies like Hakia and Powerset are probably working the hardest. These companies are trying to simultaneously build Freebase-like structures on the fly and then do natural language queries on top of them. The difference is that Hakia is using (likely similar) technology to query over the entire web, while Powerset has (probably shrewdly) chosen to restrict the search to Wikipedia.
  • Here is the problem - the natural language interface has nothing to do with the underlying data representation.
  • Fundamentally, Hakia, Powerset, and Freebase are databases, and all of them rely on some kind of natural language processing that translates the question into a canonical query over the database.
  • Having a simplistic search interface hurts Powerset and Hakia, and to a lesser extent Freebase, which is not positioning itself as generic search.
  • Instead, the expectation should really be to solve the problems that cannot be solved by Google today.
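The separation the excerpts insist on - a natural language front end on one side, a structured database on the other - can be sketched as a toy two-stage pipeline: a pattern-based "NLP" step that translates a question into an MQL-flavored canonical query, and an evaluator that runs it over an in-memory store. The data, question patterns, and query shape here are all invented for illustration; MQL itself is a far richer JSON query language.

```python
import re

# Toy two-stage pipeline mirroring the article's claim: the NL interface
# and the data representation are independent layers. Stage 1 turns a
# question into a canonical query; stage 2 evaluates it over a database.
# Records and patterns are illustrative, not real Freebase data.

DATABASE = [
    {"type": "band", "name": "The Beatles",
     "member": ["John Lennon", "Paul McCartney"]},
    {"type": "band", "name": "Queen", "member": ["Freddie Mercury"]},
]

def question_to_query(question):
    """Stage 1: pattern-based NL front end -> canonical query."""
    m = re.match(r"who is in (.+)\?", question, re.IGNORECASE)
    if m:
        # A None-valued key means "fill this in", mimicking MQL's style
        # of using null as a placeholder for the requested value.
        return {"type": "band", "name": m.group(1), "member": None}
    raise ValueError("unsupported question")

def run_query(query):
    """Stage 2: evaluate the canonical query over the structured store."""
    for record in DATABASE:
        if all(record.get(k) == v for k, v in query.items() if v is not None):
            return {k: record[k] for k in query}
    return None

answer = run_query(question_to_query("Who is in Queen?"))
print(answer["member"])  # ['Freddie Mercury']
```

Swapping either stage independently - a better question parser, or a bigger database - is exactly the decoupling the article describes: Hakia and Powerset differ mainly in the corpus their stage-2 store is built from, not in the existence of the pipeline itself.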