The Modigliani Test for Linked Data: Results
april 2010 by rahuldave
In a recent post, I outlined a kind of layman's test for the Semantic Web. I wrote that the tipping point for the Semantic Web may be when anyone can query a set of data about a historical figure and get a long list of structured results in return. I called this 'The Modigliani Test,' after my favorite artist Amedeo Modigliani. To pass this test, you must deliver - using Linked Data - a comprehensive list of locations of original Modigliani art works around the world.
A developer named Atanas Kiryakov gave the test a good crack. In doing so, he illustrated the core issues facing the Semantic Web currently.
The challenge of this test is that there isn't currently enough linked data on the Web about Modigliani. Also, the key data in this test is the location of each art work, which probably isn't one of the main data fields for art data when it's uploaded to the Web (artist name and art work title would be the two key data fields).
Kiryakov wasn't the only person who attempted to pass the test, and in fact his results mirror what can already be found on the popular open database Freebase. However, Kiryakov, who is the Executive Director of Bulgarian semantic technology company Ontotext AD, did a great job of explaining his methodology and noting the issues he faced.
The Current State of Linked Data Queries
The result of Kiryakov's attempt is a relatively short list of locations of Modigliani paintings around the world. He admits that the list isn't long enough, but says that it's the closest he could get - not just because of the limited amount of data in the Linked Data Web, but because it's "hard to query and use today."
Essentially Kiryakov created code to query a few known Linked Data sets, with custom manipulations to output location data. This is what he came up with:
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc . ?loc rdf:type umbel-sc:City ; ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}
That query was executed in a tool called LDSR, a "Linked Data Semantic Repository" created by Kiryakov's company Ontotext. He calls LDSR a "search engine for part of the linked data web." Ontotext's LDSR includes data from existing Linked Data repositories such as DBpedia, Freebase, GeoNames, UMBEL and WordNet.
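For readers who don't speak SPARQL, here is a rough Python sketch of what the query above is doing. It is only an illustration under loud assumptions: the triples are invented placeholders (not data from LDSR, Freebase or DBpedia), the predicate names are shortened stand-ins for the real ones, and the helper functions are hypothetical. The required patterns (painting, owner, labels) must all match, while the three city lookups behave like the OPTIONAL clauses: a missing city leaves a gap rather than dropping the row. Unlike the real query, which returns three separate city columns, this sketch simply keeps the first city it finds.

```python
# Toy illustration only: invented triples, not data from any real Linked Data set.
TRIPLES = {
    ("painting:1", "artist", "Amedeo_Modigliani"),
    ("painting:1", "owner", "museum:A"),
    ("painting:1", "label", "Example Portrait"),
    ("museum:A", "label", "Example Museum A"),
    ("museum:A", "containedby", "city:X"),
    ("city:X", "label", "Example City X"),
    ("painting:2", "artist", "Amedeo_Modigliani"),
    ("painting:2", "owner", "museum:B"),
    ("painting:2", "label", "Example Nude"),
    ("museum:B", "label", "Example Museum B"),
    # museum:B deliberately has no location triple at all.
}

# Shortened stand-ins for the three location predicates tried by the query.
CITY_PREDICATES = ("containedby", "location", "city")


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a known triple."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def first_label(node):
    """Return one label for a node, or None when it has no label."""
    labels = objects(node, "label")
    return labels[0] if labels else None


def painting_locations(artist):
    """List (painting label, owner label, city label or None) for an artist."""
    rows = []
    paintings = sorted(s for s, p, o in TRIPLES if p == "artist" and o == artist)
    for painting in paintings:
        painting_label = first_label(painting)
        for owner in objects(painting, "owner"):
            owner_label = first_label(owner)
            # Required patterns: skip the row if either label is missing.
            if painting_label is None or owner_label is None:
                continue
            # Optional patterns: try each predicate, keep the first city found.
            city = None
            for predicate in CITY_PREDICATES:
                for place in objects(owner, predicate):
                    city = city or first_label(place)
            rows.append((painting_label, owner_label, city))
    return rows


rows = painting_locations("Amedeo_Modigliani")
for row in rows:
    print(row)
```

The second painting still shows up, just without a city, which is the kind of gap that makes the real result list look so patchy.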
Here is a screenshot of Atanas Kiryakov's attempt to pass the Modigliani Test. He spent over an hour formulating the code used to generate this result.
As you can see, the resulting list was just 8 items long and most of the locations are in major U.S. cities. This falls well short of a comprehensive list of Modigliani art work locations. For example, there's no data about Modigliani paintings in Europe - where Modigliani lived all his life.
Other Sources of Modiglidata
Kiryakov wrote that most of the data returned in the Modigliani example came from Freebase. Indeed, as RWW commenter Brian Karlak pointed out in our original post, you can get much the same result within Freebase itself. Another commenter, Michael, pointed to a non-technical results page. Kiryakov's result has a little more data, but not much more.
However, the point of Kiryakov's attempt and blog post was to point out the difficulty of passing the Modigliani Test right now. He noted that "getting useful information from LOD [Linked Open Data] quite often requires a lot of efforts to analyze and post-process them in order to get reasonable answers to structured queries." In other words, it's much more than just inputting a natural language query (note that the Freebase example was provided by a user there named masouras, so it's not something an average user could do).
I should also mention that in the comments to the previous post, Bruce Wayne pointed to his company Factoetum's effort to pass the test - which had 7 results, including some that differ from the Ontotext/Freebase ones. Like Kiryakov, Wayne noted that it's "nearly impossible" for non-technical people to use the current solutions.
Finally, to address an issue that some commenters raised in the previous post: yes, it would be possible to pass the Modigliani Test with some manual human effort to track down location data. But that's cheating - we want to see this done using Linked Data. And not just for Modigliani works, but for any other artist.
Much Work to Be Done
Atanas Kiryakov concluded that "there is still a lot of work to be done, because we cannot expect wide usage and interest in the Semantic Web if writing such a query takes more than an hour and a lot of technical knowledge."
While that's true, I thank Atanas for giving the Modigliani Test a crack. At least now I know to visit the Museum of Modern Art when I next go to New York!
Let us know your thoughts on the Modigliani Test in the comments. Or perhaps you're a developer willing to take on this challenge?
Structured_Data
from google
The Modigliani Test: The Semantic Web's Tipping Point
april 2010 by rahuldave
In our recent posts about Structured Data, we've emphasized that most of the current initiatives have been around uploading new data to the Web - whatever the format. The U.S. and U.K. governments have led the way with their 'open data' websites, but much of that data isn't 'linked' yet. In other words, it's online - but siloed. So how do we get to the next stage of the Semantic Web, linking disparate data sets together so that people can begin to use that data?
The tipping point for the long-awaited Semantic Web may be when you can query a set of data about someone not too famous, and get a long list of structured results in return. I've decided to term this 'The Modigliani Test.'
Amedeo Modigliani is one of my favorite artists. He was moderately famous during the early 20th century and has something of a cult following nowadays. But he's not Da Vinci or Picasso famous. What I'd like to do in a Semantic Web is type the following query into a search engine and get back a large list of results: tell me the locations of all the original paintings of Modigliani.
As of today, there's no place to type that query in and get a list of structured data. The closest I can find to doing that is the Artcyclopedia entry for Modigliani, which has a list of locations for Modigliani artworks. It's great that they have the location data listed on one web page. However, it's not structured data, so we can't query it. There's also not much order to the data, we have no idea if this is a comprehensive list, it's not verified data, and so on.
In summary, there's a lot of data on the Web about the location of original art works - but much of it is in traditional 'document' web pages. What we're after is a giant database of art works, which anybody can query and re-use.
Here's an early, overly geeky look at what Linked Data about painting locations would look like (hat-tip @dakoller):
The above is a far from comprehensive list of art works by Hieronymus Bosch (a search for Modigliani, by the way, brought up zero results). Plus of course we need a much more intuitive UI, so that non-geeks can use it too.
What do you think, when will The Modigliani Test be passed on the Web?
10 Ideas For Web of Data Apps
april 2010 by rahuldave
At the end of last week, we posted an open thread asking what application you'd build (or would like someone else to build) using linked data or open data. The thread was inspired by Georgi Kobilarov. In this post, we list 10 of the best ideas we received.
A number of the suggested apps were for social good, for example apps for improving sustainability and finding missing persons. Other apps were more lifestyle-oriented, for example for cooking and genealogy. A few were business focused, such as a brand marketing app and a point-of-sale system. Of course a couple were just plain ol' geeky, which we love too! You can find all 10 ideas below.
Firstly, a quick refresher course on the terminology. Linked data is data that has been uploaded to the Web and linked to other sources, but is not necessarily open for other developers to re-use. Often when people use the term "linked data," they mean data that has been uploaded in a structured format, for example RDF. Open data is data that has been uploaded to the Web and is freely available to use, but isn't necessarily linked to other data sources. The term "open data" is often used for unstructured data, for example CSV files (spreadsheets). The ideal, of course, is data that is both linked and open. We should note however that these definitions are not universally agreed on, but they're good enough for the purposes of this post.
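To make the distinction a little more concrete, here is a minimal Python sketch of taking "open" CSV data and turning it into something closer to "linked" data. Everything in it is an assumption made for illustration: the CSV content is invented, `http://example.org/` is a placeholder namespace rather than a real data set, and `csv_to_triples` is a hypothetical helper, not part of any RDF library. The point is simply that a bare string like a city name becomes an identifier that other data sets could link to.

```python
import csv
import io

# Hypothetical "open data": a small CSV file whose values are plain strings.
OPEN_CSV = """id,name,city
1,Example Gallery,Exampleville
2,Sample Museum,Exampleville
"""

BASE = "http://example.org/"  # placeholder namespace, not a real data set


def csv_to_triples(text):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "org/" + row["id"]
        city_uri = BASE + "city/" + row["city"].replace(" ", "_")
        # The name stays a literal string.
        triples.append((subject, BASE + "prop/name", row["name"]))
        # The city becomes a link to another resource instead of a bare string.
        triples.append((subject, BASE + "prop/city", city_uri))
    return triples


triples = csv_to_triples(OPEN_CSV)
for triple in triples:
    print(triple)
```

Both rows end up pointing at the same city identifier, which is what lets a query join them; in the raw CSV that connection only exists if the two strings happen to be spelled the same way.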
Missing Persons
Juan Sequeda, co-founder of Semantic Web Austin, has an idea for using linked data "to integrate data from displaced populations, specifically in Colombia." He references a BBC report from September 2009, about using semantic Web technology to enable people to search currently incompatible databases of missing persons in Colombia.
Sustainability
Bernard Vatant from Guillestre, France, wants to see "the Web of Data enable people anywhere in the world to find out smart, sustainable and low-cost solutions to their local development issues." For example, success stories in farming, water supply, energy, education and health "in environments similar to mine, anywhere in the world."
In short, Bernard wants a linked data equivalent to WiserEarth - an online community for people interested in sustainability.
A Better World
Aldo Bucchi from Chile wants an app to tackle "negligence, corruption and lack of accountability." Specifically he mentioned a recent 8.8 magnitude earthquake in Chile, which resulted in hundreds of deaths. Aldo believes that some of those deaths were avoidable, because of what he claims was "corruption and malpractice in the construction business." He thinks that a Web of data would help identify such things, as well as help "rebuild the country faster and in a more agile manner [with] the "loose-coupled coordination" that is naturally derived from a shared data substrate and a single world view."
Genealogy App
Sherry Main from Orange County, California would like an app for genealogy. Wrote Sherry:
"It would be amazing to be able to map and locate where your family is from, has been, and what notable events happened. If there was an application on a mobile device that pinged you when you are within a particular radius of say, my great-grandmother's birthplace, as I walked around a town, that would make real-world experiences more meaningful [...] As photos become geo-tagged going forward, imagine being able to get a push notification that showed an important family or historical photo to you as you stood or walked by that location."
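The "pinged you when you are within a particular radius" part of Sherry's idea is the easy bit, and can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the place names and coordinates are invented, `nearby_places` is a hypothetical helper, and the distance uses the standard haversine formula on a spherical Earth, which is an approximation. The hard part, getting family history published as linked data in the first place, is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_places(here, places, radius_km):
    """Return the names of places within radius_km of the current position."""
    lat, lon = here
    return [name for name, (plat, plon) in places.items()
            if haversine_km(lat, lon, plat, plon) <= radius_km]


# Invented coordinates, for illustration only.
FAMILY_PLACES = {
    "great-grandmother's birthplace": (40.0, -75.0),
    "family farm": (41.0, -75.0),
}

nearby = nearby_places((40.001, -75.0), FAMILY_PLACES, radius_km=0.5)
print(nearby)
```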
Cooking App
Bart Stevens wants to be able to "select a (difficult) recipe and submit this to a service." He wants the following information back:
1. Where can I find the ingredients.
2. Place an order/make a reservation (@bakery, butcher or fish shop) for certain ingredients.
3. A route (street) map, per store.
4. Maybe a payment system.
Point-of-Sale & Inventory System
Daniel O'Connor would like to see a point-of-sale system and inventory system, for example for a small office supplies store.
He beckons us to imagine this: "I receive a new product, scan the barcode of it. My system queries the web for the supplier name, product data, etc [...] recognizes the supplier and hits their URI for the product.
It assimilates all of the recommended price information (ie: good relations); depictions and populates my system." You can read the full scenario in his extended comment.
Brand Marketing
John Davidson suggests that linked data can be used to assist brand marketers, specifically to find out more about their customers. He offers this example:
"A customer becomes a fan of a popular hair care brand on Facebook. She separately opted-in on the brand site to receive email alerts for new products, promotional offers that she can redeem in stores, etc. Are these distinct, separate events or are they somehow connected? By integrating these streams from the "Web of Data" the brand marketer can understand that she is an advocate for their brand. She also has several dozen or more friends she regularly interacts with in social channels. The marketer can engage her with special offers to promote their cool new products with friends in her network. The subsequent buzz and chatter sends friends to their stores to buy the new hair care products and the cycle repeats."
Research Assistant
A comment on Georgi's blog suggests an app to review literature. "Professor Aloha" wrote that he/she would create an application that could "take any research topic and backtrace (through articles, dissertations, presentations, and their accompanying reference lists) all published research articles on that topic, sorting them by year of publication, author, country of origin, journal and major findings."
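The "backtrace" described here is essentially a graph walk over reference lists. A minimal Python sketch, assuming the citation data already exists in linked form, might look like the following. The paper ids, years and reference lists are invented, and `backtrace` is a hypothetical helper; it does a breadth-first walk from one paper through everything it cites, directly or indirectly, and then sorts the result by year of publication.

```python
from collections import deque

# Hypothetical citation data: paper id -> (year, ids of the papers it references).
PAPERS = {
    "p1": (2009, ["p2", "p3"]),
    "p2": (2005, ["p4"]),
    "p3": (2007, ["p4"]),
    "p4": (1999, []),
    "p5": (2008, []),  # never cited by the others, so never reached from p1
}


def backtrace(start, papers):
    """Return start plus every paper reachable through reference lists, oldest first."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ref in papers[current][1]:
            # Ignore references we have no record for, and ones already visited.
            if ref in papers and ref not in seen:
                seen.add(ref)
                queue.append(ref)
    # Sort by year, using the id to break ties deterministically.
    return sorted(seen, key=lambda pid: (papers[pid][0], pid))


trail = backtrace("p1", PAPERS)
print(trail)
```

Sorting by author, country or journal would just be a different sort key, provided those fields were present in the data.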
Enriched People Profiles
Atif Latif from Austria would like to build an aggregator for all of the possible resources related to a person on the Web. The end result, said Atif, "will be [a] highly semantified and enriched profile of a person." Atif is working on this as we speak, with a beta app named CAF-SIAL. Good luck Atif!
In a separate comment, Kingsley Idehen of semantic Web company OpenLink Software mentioned "Verifiable Identity," noting that "all databases (including the Web of Linked Data) need verifiable identity."
Website-less websites
Nathan suggested a number of things, our favorite being "Website-less websites". Nathan wrote that "when all the data is typed and in a single format (let's say rdf) then the need for websites and webpages can completely be disposed off, rather we can view the information in an array of clients side applications each with there own benefits (like we do currently with twitter clients), The entire web can theoretically and quite easily just be one big API."
Bruce Wayne of Factoetum wrote in a separate comment that he is developing "services that will have in impact in bringing about a Website-less web." He gives an example of a list of book titles.
Those are 10 suggestions from the ReadWriteWeb community. Perhaps some enterprising entrepreneurs or developers will pick up a few of these ideas for their next startup!
It's All Semantics: Open Data, Linked Data & The Semantic Web
april 2010 by rahuldave
Yesterday we summarized some of the main developments in the Linked Data world over the past year. Linked Data is a W3C-backed movement that is all about connecting data sets across the Web. It can be viewed as a subset of the wider Semantic Web movement, which is about adding meaning to the Web. However, there is some confusion in the Semantic Web community about the crossover. To add to the confusion, there is a term called 'Open Data' that is being bandied around too. This commonly describes data that has been uploaded to the Web and is accessible to all, but isn't necessarily "linked" to other data sets.
So what's the beef with all of these terms? In this post we seek clarity!
The Difference Between Open Data and Linked Data
In the discussion over yesterday's post, a few people tweeted that the U.K. government's public data website Data.gov.uk is mostly populated with 'Open Data' and not 'Linked Data.' But what does that mean? It means that much of the data on the site is available to the public, but it doesn't link to other data sources on the Web. It could be data that has been uploaded in CSV format (i.e. spreadsheet data), which Sir Tim Berners-Lee said in an interview with me last year is a common occurrence with government departments. Or it could be data in another non-Web format.
Screen from a Tim Berners-Lee presentation on Linked Data, circa 2008
Titti Cimmino put it nicely: Open Data is simply 'data on the web,' whereas Linked Data is a 'web of data.'
However, the idea of Open Data is to turn it into Linked Data. As John S. Erickson pointed out, the first priority of Data.gov.uk (and its U.S. counterpart) is to publish lots of Open Data. The next step is to work towards linking it all up. This is already starting to happen. Answering a question I posed on Twitter, Kingsley Idehen confirmed that Data.gov.uk is currently a combination of Open Data and Linked Data.
Linked Data and The Semantic Web
So may we then suggest that the idea of Linked Data is to turn it into a Semantic Web? Or are they the same thing already?
Lorna Campbell from the University of Strathclyde in Scotland tackled those and other questions in an excellent post earlier this month. She started by warning of the potential for another "holy war" about terminology. I won't delve into that in this post, but this excerpt from Campbell's post gives you a flavor of the terminology angst:
"Some argue that RDF is integral to Linked Data, other suggest that while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalized term Linked Data for data that is based on RDF and SPARQL, preferring lower case "linked data", or "linkable data", for data that uses other technologies."
Even Wikipedia can't define Semantic Web...
Campbell quotes from a number of other articles, in trying to come to a conclusion about how Linked Data and the Semantic Web relate. Perhaps the best definition she found was this one by Paul Walk:
"data can be open, while not being linked
data can be linked, while not being open
data which is both open and linked is increasingly viable
the Semantic Web can only function with data which is both open and linked"
Why This Matters
So there you have it: Linked Data is NOT the same as the Semantic Web. Nor is it necessarily open (in other words, accessible to developers).
Whatever the definitions, the key points about Open Data, Linked Data and the Semantic Web are:
data is being uploaded to the Web that wasn't online before (e.g. much of the data on Data.gov.uk).
structure is being added to the data using Linked Data and/or Semantic Web technologies.
The bottom line is that the more data we have on the Web that is linked and has defined meaning, the smarter our web applications will be. This is why these activities are so exciting, despite the terminology confusion!
Image credit: Semantic Web Rubik's Cube, dullhunk
Structured_Data
from google
april 2010 by rahuldave
The State of Linked Data in 2010
march 2010 by rahuldave
In May last year we wrote about the state of Linked Data, an official W3C project that aims to connect separate data sets on the Web. Linked Data is a subset of the wider Semantic Web movement, in which data on the Web is encoded with meaning using technologies such as RDF and OWL. The ultimate vision is that the Web will become much more structured, which opens up many possibilities for "smarter" Web applications.
At this stage last year, we noted that Linked Data was ramping up fast - evidenced by the increasing number of data sets on the Web as at March 2009. Fast forward a year and the Linked Data "cloud" has continued to expand. In this post we look at some of the developments in Linked Data over the past year.
Governments Get on Board
The most high-profile usage of Linked Data over the past year has come from two governments: the United States and United Kingdom.
The U.S. was first to open up some of its non-personal data for use by developers, with the May 2009 launch of Data.gov. In January 2010, the U.K. government announced Data.gov.uk - with the help of Sir Tim Berners-Lee, the inventor of the World Wide Web. At launch, Data.gov.uk had nearly 3,000 data sets available for developers to build mashups with. At the time, that was more than three times as much data as the U.S. site offered.
Following on from the launch of Data.gov.uk, U.K. Prime Minister Gordon Brown announced a new British Institute for Web Science along with $45 million in government backing. The Institute will be led by Berners-Lee and prominent researcher Nigel Shadbolt. This was great news for Linked Data, because according to Prime Minister Brown, the Institute "will help place the U.K. at the cutting edge of research on the Semantic Web and other emerging web and internet technologies."
Commercial Applications
There have been commercial success stories too, such as OpenCalais for media, MusicBrainz for music and GoodRelations for e-commerce. There are also many commercial sites tapping into the general knowledge data store at dbpedia.org.
However, it's still relatively early days for commercial applications of Linked Data. We're beginning to see smart people explore potential use cases, such as this list for news organizations, but much of the early implementation is being done by publicly funded entities such as the U.K.'s BBC.
The latest version of the Linking Open Data dataset cloud, as at July 2009, maintained by Richard Cyganiak and Anja Jentzsch.
Just Get The Data Up There
To reiterate, Linked Data is data that has been connected to other data sets using Semantic Web technologies such as RDF (Resource Description Framework) or RDFa (a way of embedding RDF in HTML attributes). Minus the acronyms, Linked Data is simply structured data that links to other data.
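As a rough illustration of what "connected to other data sets" buys you, the Python sketch below models two tiny data sets as sets of (subject, predicate, object) triples, which is the basic shape of RDF. Every URI, predicate name and value here is made up for the example, and the helper functions are not part of any RDF library. The point is only that both data sets use the same URI for a place, so a program can hop from one to the other.

```python
# Hypothetical vocabulary namespace; example.org is a placeholder domain.
VOCAB = "http://example.org/vocab/"

# Data set 1 (say, a gallery): where a painting is located.
GALLERY_DATA = {
    ("http://example.org/painting/1", VOCAB + "title", "Portrait study"),
    ("http://example.org/painting/1", VOCAB + "locatedIn",
     "http://example.org/place/paris"),
}

# Data set 2 (say, a gazetteer): facts about places.
PLACE_DATA = {
    ("http://example.org/place/paris", VOCAB + "country", "France"),
}


def objects(triples, subject, predicate):
    """Return every object that matches the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def country_of_painting(painting_uri):
    """Follow the shared place URI from the gallery data into the place data."""
    countries = set()
    for place in objects(GALLERY_DATA, painting_uri, VOCAB + "locatedIn"):
        countries |= objects(PLACE_DATA, place, VOCAB + "country")
    return countries


if __name__ == "__main__":
    print(country_of_painting("http://example.org/painting/1"))
```

Neither data set alone can answer "which country is this painting in?". The answer only exists because the gallery data points at a URI that the place data also describes, which is the kind of cross-data-set query that SPARQL endpoints are meant to support at scale.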
However, one of the reasons the Semantic Web hasn't yet been widely adopted, at least commercially, is that it's often difficult or time-consuming to mark up data semantically. RDF in particular has a reputation for being painful to code. With that in mind, the past year has been largely about prompting governments and organizations to put their data up on the Web in whatever form they can.
Indeed, when I interviewed Berners-Lee last July, he told me that he'd be happy if governments "just put data up in whatever form it's available." He mentioned that "Comma separated values (CSV) files are remarkably popular." He'd be much happier if it were semantically marked-up data, using the likes of RDF, but conversion can happen after it's been uploaded to the Web.
So overall, Linked Data is still early in its adoption curve. However, it has undeniably become a solid on-ramp to the wider Semantic Web and the world of structured data.
For a good technical overview of the current state of Linked Data and the Semantic Web, see this presentation by Davide Palmisano.
Structured_Data
from google