


Erwin Karbasi

How to publish Linked Data on the Web

  • The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today. The term Linked Data was coined by Tim Berners-Lee in his Linked Data Web architecture note. The term refers to a style of publishing and interlinking structured data on the Web. The basic assumption behind Linked Data is that the value and usefulness of data increases the more it is interlinked with other data. In summary, Linked Data is simply about using the Web to create typed links between data from different sources. The basic tenets of Linked Data are to: use the RDF data model to publish structured data on the Web, and use RDF links to interlink data from different data sources.
  • The glue that holds together the traditional document Web is the hypertext links between HTML pages. The glue of the data web is RDF links. An RDF link simply states that one piece of data has some kind of relationship to another piece of data. These relationships can have different types. For instance, an RDF link that connects data about people can state that two people know each other; an RDF link that connects information about a person with information about publications in a bibliographic database might state that a person is the author of a specific paper.
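The RDF links described above can be sketched concretely. The following is a minimal Python sketch that serializes one triple in N-Triples syntax; the foaf:knows predicate is the real FOAF property mentioned in the text, but the two person URIs are made-up examples.

```python
# Minimal sketch: one RDF link, serialized as N-Triples.
# foaf:knows is a real FOAF property; the person URIs are illustrative only.
def rdf_link(subject, predicate, obj):
    """Serialize one triple (all URIs) in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

triple = rdf_link(
    "http://example.org/people/alice",
    FOAF_KNOWS,
    "http://example.org/people/bob",
)
```

A crawler following such a link would dereference the object URI to discover data about the second person.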
  • In 'Dereferencing HTTP URIs' the W3C Technical Architecture Group (TAG) distinguishes between two kinds of resources: information resources and non-information resources (also called 'other resources'). This distinction is quite important in a Linked Data context. All the resources we find on the traditional document Web, such as documents, images, and other media files, are information resources. But many of the things we want to share data about are not: people, physical products, places, proteins, scientific concepts, and so on. As a rule of thumb, all “real-world objects” that exist outside of the Web are non-information resources.
  • Dereferencing HTTP URIs: URI dereferencing is the process of looking up a URI on the Web in order to get information about the referenced resource. The W3C TAG draft finding about Dereferencing HTTP URIs introduced a distinction in how URIs identifying information resources and non-information resources are dereferenced. Information Resources: When a URI identifying an information resource is dereferenced, the server of the URI owner usually generates a new representation, a new snapshot of the information resource's current state, and sends it back to the client using the HTTP response code 200 OK. Non-Information Resources cannot be dereferenced directly. Therefore, Web architecture uses a trick to enable URIs identifying non-information resources to be dereferenced: instead of sending a representation of the resource, the server sends the client the URI of an information resource which describes the non-information resource, using the HTTP response code 303 See Other. This is called a 303 redirect. In a second step, the client dereferences this new URI and gets a representation describing the original non-information resource.
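The server-side 303 rule can be sketched as a small function. The /resource/ and /data/ URI layout mirrors the factbook example used elsewhere in the text; the function name and return convention are assumptions for illustration.

```python
# Sketch of the 303-redirect rule: information resources answer 200 with a
# representation, non-information resources answer 303 with a describing URI.
def dereference(path):
    """Return (status, value) for a dereference request on `path`."""
    if path.startswith("/resource/"):         # non-information resource
        name = path[len("/resource/"):]
        return 303, "/data/" + name           # 303 See Other -> description URI
    return 200, "representation of " + path   # information resource

# Step 1: the client dereferences the non-information resource URI ...
status, location = dereference("/resource/Russia")
# Step 2: ... then follows the 303 redirect to the describing information resource.
status2, body = dereference(location)
```

The two calls correspond to the two-step lookup described above: first a 303 See Other, then a 200 OK for the description document.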
  • Content Negotiation: HTML browsers usually display RDF representations as raw RDF code, or simply download them as RDF files without displaying them. This is not very helpful to the average user. Therefore, serving a proper HTML representation in addition to the RDF representation of a resource helps humans to figure out what a URI refers to. This can be achieved using an HTTP mechanism called content negotiation. HTTP clients send HTTP headers with each request to indicate what kinds of representation they prefer. Servers can inspect those headers and select an appropriate response. If the headers indicate that the client prefers HTML, then the server can generate an HTML representation. If the client prefers RDF, then the server can generate RDF. Content negotiation for non-information resources is usually implemented in the following way. When a URI identifying a non-information resource is dereferenced, the server sends a 303 redirect to an information resource appropriate for the client. Therefore, a data source often serves three URIs related to each non-information resource, for instance: http://www4.wiwiss.fu-berlin.de/factbook/resource/Russia (URI identifying the non-information resource Russia), http://www4.wiwiss.fu-berlin.de/factbook/data/Russia (information resource with an RDF/XML representation describing Russia), and http://www4.wiwiss.fu-berlin.de/factbook/page/Russia (information resource with an HTML representation describing Russia).
  • Dereferencing an HTTP URI identifying a non-information resource plays together with content negotiation as follows: The client performs an HTTP GET request on a URI identifying a non-information resource, in our case a vocabulary URI. If the client is a Linked Data browser and would prefer an RDF/XML representation of the resource, it sends an Accept: application/rdf+xml header along with the request. HTML browsers would send an Accept: text/html header instead. The server recognizes the URI as identifying a non-information resource. As the server cannot return a representation of this resource, it answers using the HTTP 303 See Other response code and sends the client the URI of an information resource describing the non-information resource, in the RDF case the RDF content location. The client now asks the server to GET a representation of this information resource, requesting again application/rdf+xml. The server sends the client an RDF/XML document containing a description of the original resource (the vocabulary URI).
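The content-negotiation half of this flow can also be sketched: the server inspects the Accept header and picks the 303 target. The three-URI layout (resource/data/page) comes from the factbook example above; the function name is an assumption.

```python
# Illustrative content negotiation for a non-information resource, using the
# resource/data/page URI layout from the factbook example in the text.
BASE = "http://www4.wiwiss.fu-berlin.de/factbook"

def negotiate(resource_name, accept_header):
    """Pick the 303 redirect target for /resource/<name> from the Accept header."""
    if "application/rdf+xml" in accept_header:
        return 303, f"{BASE}/data/{resource_name}"   # RDF description document
    return 303, f"{BASE}/page/{resource_name}"       # HTML description document

status, target = negotiate("Russia", "application/rdf+xml")
```

A Linked Data browser sending Accept: application/rdf+xml is redirected to the /data/ URI, while an ordinary HTML browser lands on the /page/ URI.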
  • How to set RDF Links to other Data Sources: RDF links enable Linked Data browsers and crawlers to navigate between data sources and to discover additional data. The application domain will determine which RDF properties are used as predicates. For instance, commonly used linking properties in the domain of describing people are foaf:knows, foaf:based_near and foaf:topic_interest. Examples of combining these properties with property values from DBpedia, the DBLP bibliography and the RDF Book Mashup are found in Tim Berners-Lee's and Ivan Herman's FOAF profiles. It is common practice to use the owl:sameAs property for stating that another data source also provides information about a specific non-information resource. An owl:sameAs link indicates that two URI references actually refer to the same thing. Therefore, owl:sameAs is used to map between different URI aliases (see Section 2.1). Examples of using owl:sameAs to indicate that two URIs talk about the same thing are again found in Tim's FOAF profile, which states that http://www.w3.org/People/Berners-Lee/card#i identifies the same resource as http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee and http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007. Other usage examples are found in DBpedia and the Berlin DBLP server. RDF links can be set manually, which is usually the case for FOAF profiles, or they can be generated by automated linking algorithms. This approach is usually taken to interlink large datasets.
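Emitting owl:sameAs links between URI aliases, as in the Tim Berners-Lee example above, can be sketched in a few lines; the helper function name is an assumption, while the predicate and the alias URIs are taken from the text.

```python
# Sketch: one owl:sameAs triple (in N-Triples syntax) per URI alias of a
# canonical URI. The aliases are the ones for Tim Berners-Lee quoted above.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_links(canonical, aliases):
    """Map a canonical URI to each of its aliases with owl:sameAs."""
    return [f"<{canonical}> <{OWL_SAME_AS}> <{a}> ." for a in aliases]

links = same_as_links(
    "http://www.w3.org/People/Berners-Lee/card#i",
    ["http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee",
     "http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007"],
)
```

Automated interlinking of large datasets amounts to computing the alias pairs first (by matching records) and then emitting exactly this kind of triple.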
  • Recipes for Serving Information as Linked Data: This chapter provides practical recipes for publishing different types of information as Linked Data on the Web. Information has to fulfill the following minimal requirements to be considered "published as Linked Data on the Web": Things must be identified with dereferenceable HTTP URIs. If such a URI is dereferenced asking for the MIME-type application/rdf+xml, a data source must return an RDF/XML description of the identified resource. URIs that identify non-information resources must be set up in one of these ways: Either the data source must return an HTTP response containing an HTTP 303 redirect to an information resource describing the non-information resource, as discussed earlier in this document. Or the URI for the non-information resource must be formed by taking the URI of the related information resource and appending a fragment identifier (e.g. #foo), as discussed in Recipe 7.1. Besides RDF links to resources within the same data source, RDF descriptions should also contain RDF links to resources provided by other data sources, so that clients can navigate the Web of Data as a whole by following RDF links. Which of the following recipes fits your needs depends on various factors, such as: How much data do you want to serve? If you only want to publish several hundred RDF triples, you might want to serve them as a static RDF file using Recipe 7.1. If your dataset is larger, you might want to load it into a proper RDF store and put the Pubby Linked Data interface in front of it as described in Recipe 7.3. How is your data currently stored? If your information is stored in a relational database, you can use D2R Server as described in Recipe 7.2. If the information is available through an API, you might implement a wrapper around this API as described in Recipe 7.4.
If your information is represented in some other format such as Microsoft Excel, CSV or BibTeX, you will have to convert it to RDF first as described in Recipe 7.3. How often does your data change? If your data changes frequently, you might prefer approaches which generate RDF views on your data, such as D2R Server (Recipe 7.2), or wrappers (Recipe 7.4).
  • After you have published your information as Linked Data, you should ensure that there are external RDF links pointing at URIs from your dataset, so that RDF browsers and crawlers can find your data. There are two basic ways of doing this: Add several RDF links to your FOAF profile that point at URIs identifying central resources within your dataset. Assuming that somebody else in the world knows you and references your FOAF profile, your new dataset is now reachable by following RDF links. Convince the owners of related data sources to auto-generate RDF links to URIs from your dataset. Or, to make it easier for the owner of the other dataset, create the RDF links yourself and send them to her so that she just has to merge them with her dataset. A project that is extremely open to setting RDF links to other data sources is the DBpedia community project. Just announce your data source on the DBpedia mailing list or send a set of RDF links to the list.
  • Serving Static RDF Files: The simplest way to serve Linked Data is to produce static RDF files and upload them to a web server. This approach is typically chosen in situations where the RDF files are created manually, e.g. when publishing personal FOAF files or RDF vocabularies, or when the RDF files are generated or exported by some piece of software that only outputs files.
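When serving static files, the one detail that matters for Linked Data clients is the Content-Type. A minimal sketch with the Python standard library, assuming the .rdf extension for RDF/XML files in the served directory:

```python
# Sketch: serve a directory of static RDF files with the correct MIME type.
# Without this mapping, .rdf files may be served with a generic Content-Type.
import http.server

handler = http.server.SimpleHTTPRequestHandler
handler.extensions_map[".rdf"] = "application/rdf+xml"

# To actually serve the current directory on port 8000, uncomment:
# http.server.HTTPServer(("", 8000), handler).serve_forever()
```

A production setup would configure the same MIME mapping in the web server (e.g. Apache or nginx) rather than running a Python process.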
  • Serving Relational Databases: If your data is stored in a relational database, it is usually a good idea to leave it there and just publish a Linked Data view on your existing database. A tool for serving Linked Data views on relational databases is D2R Server. D2R Server relies on a declarative mapping between the database schema and the target RDF terms. Based on this mapping, D2R Server serves a Linked Data view on your database and provides a SPARQL endpoint for the database.
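The idea behind such a declarative mapping can be sketched as follows. This is illustrative only, not D2RQ mapping syntax: the table columns and the column-to-predicate dictionary are assumptions (a real mapping would, for instance, model foaf:mbox as a mailto: URI rather than a literal).

```python
# Illustrative column-to-predicate mapping for one database table;
# rows are represented as dicts as a stand-in for database result rows.
MAPPING = {  # column name -> RDF predicate URI (assumed for this example)
    "name":  "http://xmlns.com/foaf/0.1/name",
    "email": "http://xmlns.com/foaf/0.1/mbox",
}

def row_to_triples(row, base="http://example.org/person/"):
    """Turn one row (a dict with an 'id' key) into N-Triples lines."""
    subject = base + str(row["id"])
    return [f'<{subject}> <{MAPPING[col]}> "{row[col]}" .'
            for col in MAPPING if col in row]

triples = row_to_triples({"id": 7, "name": "Alice", "email": "a@example.org"})
```

A tool like D2R Server does essentially this on demand: the mapping is written once, and each dereference or SPARQL query is translated into SQL against the live database.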
  • Alternatively, you can also use: OpenLink Virtuoso to publish your relational database as Linked Data (see Virtuoso RDF Views – Getting Started Guide on how to map your relational database to RDF, and Deploying Linked Data on how to get URI dereferencing and content negotiation into place); or Triplify, a small plugin for Web applications, which reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.
  • Serving other Types of Information: If your information is currently represented in formats such as CSV, Microsoft Excel, or BibTeX and you want to serve the information as Linked Data on the Web, it is usually a good idea to do the following: Convert your data into RDF using an RDFizing tool. There are two locations where such tools are listed: ConverterToRdf maintained in the ESW Wiki, and RDFizers maintained by the SIMILE team. After conversion, store your data in an RDF repository. A list of RDF repositories is maintained in the ESW Wiki. Ideally, the chosen RDF repository should come with a Linked Data interface which takes care of making your data Web accessible. As many RDF repositories have not implemented Linked Data interfaces yet, you can also choose a repository that provides a SPARQL endpoint and put Pubby as a Linked Data interface in front of your SPARQL endpoint. The approach described above is taken by the DBpedia project, among others. The project uses PHP scripts to extract structured data from Wikipedia pages. This data is then converted to RDF and stored in an OpenLink Virtuoso repository which provides a SPARQL endpoint. In order to get a Linked Data view, Pubby is put in front of the SPARQL endpoint. If your dataset is sufficiently small to fit completely into the web server's main memory, then you can do without the RDF repository and instead use Pubby's conf:loadRDF option to load the RDF data from an RDF file directly into Pubby. This might be simpler, but unlike a real RDF repository, Pubby will keep everything in main memory and doesn't offer a SPARQL endpoint.
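A toy version of the conversion step can be sketched with the standard library. The column-to-predicate mapping and the base URI are assumptions for the example; dc:title is the real Dublin Core title property.

```python
# Toy RDFizer: convert CSV rows to N-Triples. Real RDFizing tools (see the
# ConverterToRdf and RDFizers lists mentioned above) are far more general.
import csv
import io

PREDICATES = {"title": "http://purl.org/dc/elements/1.1/title"}

def csv_to_ntriples(text, base="http://example.org/book/"):
    """Yield one triple per mapped column per CSV row (assumes an 'id' column)."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row["id"]
        for col, pred in PREDICATES.items():
            triples.append(f'<{subject}> <{pred}> "{row[col]}" .')
    return triples

triples = csv_to_ntriples("id,title\n1,Weaving the Web\n")
```

The resulting N-Triples can then be loaded into an RDF repository, or directly into Pubby via conf:loadRDF for small datasets.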
  • Implementing Wrappers around existing Applications or Web APIs: Large numbers of Web applications have started to make their data available on the Web through Web APIs. Examples of data sources providing such APIs include eBay, Amazon, Yahoo, Google and Google Base. A more comprehensive API list is found at Programmable Web. Different APIs provide diverse query and retrieval interfaces and return results using a number of different formats such as XML, JSON or ATOM. This leads to three general limitations of Web APIs: their content cannot be crawled by search engines; they cannot be accessed using generic data browsers; and mashups are implemented against a fixed number of data sources and cannot take advantage of new data sources that appear on the Web. These limitations can be overcome by implementing Linked Data wrappers around APIs. In general, Linked Data wrappers do the following: They assign HTTP URIs to the non-information resources about which the API provides data. When one of these URIs is dereferenced asking for application/rdf+xml, the wrapper rewrites the client's request into a request against the underlying API. The results of the API request are transformed to RDF and sent back to the client.
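The three wrapper steps above can be sketched end to end. Everything here is illustrative: fetch_from_api is a stub standing in for the real HTTP call to the wrapped API, and the URIs and predicate are assumed for the example.

```python
# Sketch of a Linked Data wrapper: a dereference of a wrapper URI is
# rewritten into an API call, and the JSON-like result is turned into RDF.
def fetch_from_api(item_id):
    """Stub for the underlying Web API call, so the example is self-contained."""
    return {"id": item_id, "title": "Example Item"}

def wrap(path):
    """Handle GET on /resource/<id> requested as application/rdf+xml."""
    item_id = path.rsplit("/", 1)[-1]
    data = fetch_from_api(item_id)  # rewrite the dereference into an API request
    subject = f"http://example.org/resource/{data['id']}"
    pred = "http://purl.org/dc/elements/1.1/title"
    # transform the API result to RDF (one N-Triples line here)
    return f'<{subject}> <{pred}> "{data["title"]}" .'

triple = wrap("/resource/42")
```

Because the wrapper exposes plain dereferenceable URIs, crawlers and generic data browsers can reach the API's content without knowing anything about the API itself.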
  • Virtuoso Sponger: Virtuoso Sponger is a framework for developing Linked Data wrappers (called cartridges) around different types of data sources. Data sources can range from HTML pages containing structured data to Web APIs. See Injecting Facebook Data into the Semantic Data Web for a demo on how Sponger is used to generate a Linked Data view on Facebook.
  • Discovering Linked Data on the Web: The standard way of discovering Linked Data on the Web is by following RDF links within data the client already knows. In order to further ease discovery, information providers can decide to support additional discovery mechanisms: Ping the Semantic Web: Ping the Semantic Web is a registry service for RDF documents on the Web, which is used by several other services and client applications. Therefore, you can improve the discoverability of your data by registering your URIs with Ping the Semantic Web. HTML Link Auto-Discovery: It also makes sense in many cases to set links from existing webpages to RDF data, for instance from your personal home page to your FOAF profile. Such links can be set using the HTML <link> element in the <head> of your HTML page: <link rel="alternate" type="application/rdf+xml" href="link_to_the_RDF_version" /> HTML <link> elements are used by browser extensions, like Piggybank and Semantic Radar, to discover RDF data on the Web. Semantic Web Crawling: a Sitemap Extension: The sitemap extension allows data publishers to state where RDF is located and which alternative means are provided to access it (Linked Data, SPARQL endpoint, RDF dump). Semantic Web clients and Semantic Web crawlers can use this information to access RDF data in the most efficient way for the task they have to perform. Dataset List on the ESW Wiki: In order to make it easy not only for machines but also for humans to discover your data, you should add your dataset to the Dataset List on the ESW Wiki. Please include some example URIs of interesting resources from your dataset, so that people have starting points for browsing.
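The client side of HTML link auto-discovery can be sketched with the standard-library HTML parser: scan the page for <link rel="alternate" type="application/rdf+xml"> elements and collect their href values. The class name is an assumption for the example.

```python
# Sketch: extract RDF auto-discovery links from an HTML page, as browser
# extensions like Piggybank and Semantic Radar do.
from html.parser import HTMLParser

class RDFLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/rdf+xml"):
            self.rdf_links.append(a.get("href"))

finder = RDFLinkFinder()
finder.feed('<html><head><link rel="alternate" '
            'type="application/rdf+xml" href="foaf.rdf" /></head></html>')
```

After parsing, finder.rdf_links holds the URIs of the RDF alternates, which the client can then dereference.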
Erwin Karbasi

linked-data-api - Linked Data API Specification - Project Hosting on Google Code

  • This document defines a vocabulary and processing model for a configurable API layer intended to support the creation of simple RESTful APIs over RDF triple stores. The API layer is intended to be deployed as a proxy in front of a SPARQL endpoint to support: generation of documents (information resources) for the publishing of Linked Data; provision of sophisticated querying and data extraction features, without the need for end-users to write SPARQL queries; and delivery of multiple output formats from these APIs, including a simple serialisation of RDF in JSON syntax.
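One way such a layer can flatten triples into consumable JSON can be sketched as follows. This is illustrative only and is not the exact serialisation defined by the Linked Data API specification; the grouping scheme and function name are assumptions.

```python
# Illustrative only: group (subject, predicate, object) triples by subject
# into a simple nested JSON document, as an API layer over a triple store
# might do for clients that cannot consume RDF directly.
import json

def triples_to_simple_json(triples):
    """Build {subject: {predicate: [objects...]}} and serialize it as JSON."""
    out = {}
    for s, p, o in triples:
        out.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(out, sort_keys=True)

doc = triples_to_simple_json([
    ("ex:Russia", "ex:capital", "Moscow"),
])
```

The actual specification defines this mapping precisely (including handling of URIs, literals and language tags), so that different deployments produce interoperable JSON.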