Jim Pick believes that the future of the Web lies in the sharing of vast amounts of data. So, the 38-year-old freelance software developer has been doing his part to bring this vision to fruition. In November, Pick created a Vancouver-specific site on Freebase, a collaborative on-line database, and set about collecting data on topics such as the city’s bloggers, streets, and parks.
Data stored on Freebase is available to anyone and is structured in Resource Description Framework (RDF), the lingua franca of what’s known as the Semantic Web. Pick predicts that, as more data becomes available and more applications make use of this data, the Web, and our world, will become a lot more like science fiction.
“What we’re used to today is not what it is going to be like five years from now or 10 years from now,” the founder of the RDF Vancouver Semantic Web User Group told the Georgia Straight in a downtown coffee shop. “So, I think the Semantic Web—where you start providing more information that the computers can parse and read and absorb and use—that’s where it’s going to go.”
Although the Semantic Web is poised to revolutionize how we use the Internet, it isn’t widely recognized or discussed in mainstream circles. It’s sometimes referred to as a component of Web 3.0, the next phase of the Web’s development.
The Semantic Web is the brainchild of Tim Berners-Lee, the English computer scientist who invented the World Wide Web 20 years ago. In a landmark 2001 article published in Scientific American, Berners-Lee and two coauthors noted that most Web content can be read by humans but can’t be understood by computers.
The writers of “The Semantic Web” explained that, when much of the information on the Web is encoded in such a way that it can be processed automatically, software agents will be able to perform complicated tasks on behalf of users, like booking a doctor’s appointment that fits a person’s schedule and health plan. They even imagined microwave ovens checking frozen-food manufacturers’ Web sites for cooking instructions.
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation,” wrote Berners-Lee and fellow computer scientists James Hendler and Ora Lassila.
Eight years later, many of the building blocks of the Semantic Web are in place. Since the first version of RDF was standardized in 1999, the World Wide Web Consortium—which was founded by Berners-Lee and others—has developed key Semantic Web technologies such as Web Ontology Language, the SPARQL Protocol and RDF Query Language, and RDFa.
On the phone from his home in Amsterdam, Ivan Herman, the consortium’s Semantic Web activity lead, explained to the Straight that the basis of RDF, and therefore of structured data on the Semantic Web, is the “triple”: a statement containing a subject, a predicate, and an object. Machine-readable triples can be expressed in Web pages by using a set of Extensible Hypertext Markup Language extensions called RDFa to annotate existing content, or they can be contained in separate RDF files. On the Semantic Web, the familiar HTTP uniform resource identifier, or Internet address, isn’t just used to point to on-line documents; it can refer to real-world objects, concepts, places, and even people.
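To make the idea of a triple concrete, here is a minimal Python sketch. It is an illustration only, not how Freebase or any W3C tool stores data: the `http://example.org/` addresses, the `Triple` type, and the `match` and `to_ntriples` helpers are all hypothetical names invented for this example. Each statement is a subject, a predicate, and an object, and each part is identified by a URI.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; example.org is reserved for illustrations.
EX = "http://example.org/"

# Three statements about real-world things, not about documents.
triples = [
    Triple(EX + "StanleyPark", EX + "locatedIn", EX + "Vancouver"),
    Triple(EX + "StanleyPark", EX + "type", EX + "Park"),
    Triple(EX + "Vancouver", EX + "type", EX + "City"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def to_ntriples(t: Triple) -> str:
    """Write a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# "Which things are parks?" expressed as a pattern with a wildcard subject.
parks = match(triples, predicate=EX + "type", obj=EX + "Park")

for t in triples:
    print(to_ntriples(t))
```

The wildcard lookup in `match` is a rough stand-in for what a query language such as SPARQL does over much larger collections of triples. A real application would more likely use an RDF library than hand-rolled tuples.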
With the expected completion of a second version of Web Ontology Language and other standards, 2009 could prove to be a “very, very big year” for the Semantic Web, according to Herman. The computer scientist said that the Semantic Web will lead to “mashups on steroids”—applications that pull in data from all over the Web—and search engines that offer highly personalized results.
“People very often ask me ‘What is the killer app?’ and I have no idea,” Herman said. “I don’t think that there will be one application. This is the kind of technology that will be behind a screen. It is always behind a screen, and people in this sense will not see it right away.”
In 2006, Berners-Lee coauthored (with Nigel Shadbolt and Wendy Hall) an article titled “The Semantic Web Revisited”, which noted that some commentators claim his vision of a Web of open, linked data has “failed to deliver”. However, in the journal IEEE Intelligent Systems, the three asserted that they “see the use of ontologies in the e-science community presaging ultimate success for the Semantic Web—just as the use of HTTP within the CERN particle physics community led to the revolutionary success of the original Web”.
At the TED2009 conference on February 4, Berners-Lee asked attendees to contribute to the Semantic Web effort by putting their data on the Web. He encouraged governments not to hold on to information until they’ve “made a beautiful Web site”, but to release “unadulterated data” on-line. If more scientists put linked data on the Web, Berners-Lee suggested, this could help solve some of humanity’s greatest challenges, such as finding cures for cancer and Alzheimer’s disease.
“The power of being able to ask those questions of a scientist—questions which actually bridge across different disciplines—is really a complete sea change,” he told an audience in Long Beach, California. “It’s very, very important. Scientists are totally stymied at the moment. The power of the data that other scientists have collected is locked up, and we need to get it unlocked so we can tackle those huge problems.”
Nando de Freitas, an associate professor of computer science and cognitive systems at the University of British Columbia, argues that if we rely on humans to structure data using ontologies—formal representations of concepts—it “will take forever” to build the Semantic Web. The artificial-intelligence researcher told the Straight that two “probably more fruitful ways” to organize the Web’s information are folksonomies (tag-based systems) and intelligent natural-language processing (which allows the automatic conversion of human language into structured data).
“These things will automatically find interesting correlations,” de Freitas said by phone from his office at UBC’s Point Grey campus. “Like, ‘Did you know that 90 percent of people with diabetes tend to eat this kind of stuff?’ Being able to have these intelligent algorithms we’ll, through natural-language processing and learning, be able to mine the Web and extract all this important knowledge. It’s just going to make the knowledge transparent to people.”
But the availability of data and the ease of finding correlations on the Semantic Web will raise serious privacy concerns, according to de Freitas. He warns that everything a person does on the Internet, including doing Web searches, sending e-mail, and using social-networking sites, involves giving data away—data that may be made public in the future. For instance, de Freitas said, the Semantic Web will make it easy for employers and insurance companies to calculate someone’s predisposition to a disease.
“The technology will exist,” he said. “There’s nothing we can do about it. But then what you hope is that governments and the social systems will look after individuals, will do the right thing.”
The inaugural meeting of RDF Vancouver took place in October, and was attended by eight people. At the informal group’s second event in November, Pick delivered a talk about Freebase, while two other speakers introduced the Semantic Web and explained the use of microformats to encode semantics within HTML attributes. Aside from the three presenters, only one person attended the meeting.
Pick conceded that it’s difficult to explain the Semantic Web to those who aren’t already conversant with its basic technologies. Still, he plans to organize a third RDF Vancouver event sometime in the coming months.
“People are having trouble wrapping their heads around it,” Pick said. “But people learn over time. They learn a little bit, a little bit, a little bit. Pretty soon, there’ll be enough people that understand all of the prerequisites to be able to understand what this stuff is about.”
You can follow Stephen Hui on Twitter at twitter.com/stephenhui.