UBC computer scientist Nando de Freitas discusses the Semantic Web


      Nando de Freitas argues that there’s more than one way to build the Semantic Web. He also says the Semantic Web will raise serious privacy concerns. De Freitas is an associate professor of computer science and cognitive systems at the University of British Columbia, as well as an associate member of the school’s department of statistics. The computer scientist also works on Worio, a semantic search engine developed by UBC students that uses tags to determine the subjects of Web pages.

      The Georgia Straight reached de Freitas by phone in January at his office on UBC’s Point Grey campus. What follows are some of his comments on various aspects of the Semantic Web.

      On the need for the Semantic Web:

      When you’re confronted with the Web and you find so much data, and the data’s very unstructured, it’s easy for a human to look at a page and kind of make sense of it. But none of us will ever be able to know everything that is out there on the Web. We’re missing a lot of information as a result, because we can’t even think of the topic in our heads to be able to enter it, say, in Google. So, we’ll eventually need intelligent techniques to organize this data in a better way that makes it accessible to all of us...

      A big goal of the Semantic Web was to create this Web of data, where everything would be nicely structured and where you would be able to enter medical data and financial data. Basically, all the data of the planet would be accessible through nicely organized structures that would make it possible for everyone to have a better experience. But the problem is that creating those structures is very hard. So, we need to use the stuff that I’ve researched—machine learning techniques and so on—to automatically come up with those structures.

      On alternatives to using ontologies to build the Semantic Web:

      There are two ways to make the Semantic Web a reality. One is to get people to agree on the ontologies...

      The Semantic Web is not becoming a reality as people expected it, because it’s hard to get people to do things one way, especially when the task is massive and there is seemingly no immediate reward for doing so. So, that’s what’s motivated a group of researchers to take a different approach, which is to try to create a Semantic Web not through ontologies but through things that already exist—the so-called folksonomies, which are tag-based systems, or by trying to extract information automatically from Web pages and being able to organize that information. That’s more or less what Google is also trying to do.
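      As an editorial illustration (not a description of Worio’s actual method), the folksonomy idea de Freitas describes can be sketched as inferring a page’s topics from the tags different users have applied to it. The tag data and function name here are hypothetical:

```python
from collections import Counter

def infer_topics(tag_lists, top_n=2):
    """Infer a page's most likely topics from user-applied tags.

    tag_lists: one list of tags per user who tagged the page.
    Returns the top_n most frequently applied tags.
    """
    counts = Counter(tag for tags in tag_lists for tag in tags)
    return [tag for tag, _ in counts.most_common(top_n)]

# Hypothetical tags applied to one page by four different users:
tags = [
    ["semantic-web", "search"],
    ["semantic-web", "ai"],
    ["search", "semantic-web"],
    ["machine-learning"],
]
print(infer_topics(tags))  # → ['semantic-web', 'search']
```

      The point of the sketch is that no one agreed on an ontology in advance: the structure emerges from what taggers already do.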

      On the role of intelligent natural-language processing:

      Natural-language processing is actually very hard to accomplish computationally. Even Google struggles, and they have the biggest computing power on the planet. So, there are still challenges. I think what’s becoming more evident is this need for intelligent algorithms to come on board and help with this. Essentially, we humans can see Web pages and understand them, but the machines cannot. The machines can’t actually understand what is being said semantically. They know that these words are there, but they don’t know what the page is really about. What is the thing that’s really being discussed? What are the topics of discussion?...

      Being able to understand content and semantics through intelligent techniques—intelligent natural-language processing and machine learning—being able to learn what’s going on, to me is the missing piece to create this Web.

      On using the Semantic Web to discover new information:

      These things will automatically find interesting correlations. Like, “Did you know that 90 percent of people with diabetes tend to eat this kind of stuff?” With these intelligent algorithms, we’ll—through natural-language processing and learning—be able to mine the Web and extract all this important knowledge. It’s just going to make the knowledge transparent to people: things we don’t know now.
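      The kind of correlation de Freitas gives as an example—“90 percent of people with diabetes tend to eat this kind of stuff”—is, at its simplest, a conditional frequency computed over structured records. A minimal sketch, with entirely hypothetical data and field names:

```python
def correlation_rate(records, condition, attribute):
    """Among records containing `condition`, the fraction that also
    contain `attribute` (a simple conditional frequency)."""
    matching = [r for r in records if condition in r]
    if not matching:
        return 0.0
    return sum(1 for r in matching if attribute in r) / len(matching)

# Hypothetical records extracted from Web pages, one set of labels each:
records = [
    {"diabetes", "high-sugar-diet"},
    {"diabetes", "high-sugar-diet"},
    {"diabetes"},
    {"high-sugar-diet"},
]
rate = correlation_rate(records, "diabetes", "high-sugar-diet")
print(round(rate, 2))  # → 0.67
```

      Real systems would, of course, mine far larger corpora and control for confounders; the sketch only shows the shape of the computation.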

      On privacy and the Web:

      I’m concerned that people are not being properly informed. I’m concerned that people are under this impression that when they type things that they’re private—and they’re not. I’m concerned that people are misled into this warm feeling that what they’re doing is private. Your searches in a search engine—all of these things are very public. Companies protect you, but the only thing that’s protecting you is the company. If the company goes bust, or the company’s bought by another company that has a different policy, or if the company has to go to court and the government rules against it, your data becomes automatically public. So, people have to be aware of this. I think people should be informed that their data is public, so at least they know that they’re giving their data away.

      On privacy and the Semantic Web:

      It poses challenges with medical data. That’s kind of important, because it affects things like insurance and all of that. People will very easily calculate your predisposition to develop some sort of disease, and based on that they can adjust your premiums. Now, that is something that governments will have to figure out in the future—how to create a system that is fair even when people already know that they have a higher chance of developing some sort of cancer. Privacy’s definitely one of the concerns...

      The technology will exist. There’s nothing we can do about it. But then what you hope is that governments and the social systems will look after individuals, will do the right thing.

      You can follow Stephen Hui on Twitter at twitter.com/stephenhui.