
Category Archives: semanticweb

In the latest Nodalities podcast, Paul Miller talks to Dame Wendy Hall, Professor of Computer Science at the University of Southampton and a founding Director of the Web Science Research Initiative.

The Web Science Research Initiative (WSRI) is a joint venture between MIT and the University of Southampton to teach the literal academic ‘science’ of the web.

She founded the WSRI in 2006 alongside Sir Tim Berners-Lee, Professor Nigel Shadbolt and Daniel J. Weitzner, and talks with Paul about some of the thinking she and Sir Tim shared that eventually resulted in the conception of the project.

They recognised that many determining factors outside of pure technology shape the evolution of the Web. As a human construct, it demands new ways of thinking: we need to understand how humans affect its evolution as much as how the Web affects our society.

The Web is one of the most (if not the most) transformative applications in the history of computing and communications. It has changed how we teach, publish, govern and do business, and it is studied in anthropology, sociology, psychology and economics – needless to say, a lengthy list. Web Science is to consider the Web in all these fields (and more), not only as a computer science.

It’s also intended to anticipate future developments, to foresee the good and bad consequences of its change.

They’ve been working with the Web for a long time – since the earliest days of hypertext and hypermedia – and with such experience have recognised the cyclical nature of Web trends: every five years or so sees great advances in the Web’s evolution. Think Web 2.0 for the latest phase – the next (apparently) being Web 3.0 (or the ‘Data Web’ or the ‘Web of Linked Data’) or the Semantic Web – whatever buzzword you want to apply to it. The WSRI, in part, stands to find out what’s likely to come, to inform us and our decisions.

Of course, it was also in part founded to evangelise the Semantic Web. The Semantic Web was, and still is, Berners-Lee’s original vision for the Web – one he had as early as WWW94 (though then ‘unnamed’). These small phases add up to the larger realisation of that original dream, and Dame Wendy discusses her thoughts on how this will continue in the future. She talks about the WSRI’s efforts to create a wide network of key Web Science labs across the globe, their work with curriculum developers and government agencies, and their training of university teachers and educators to inject Web Science into higher education as recognised academia.

Paul Miller also shares some thoughts on his ZDNet blog. At first he was sceptical, suggesting that we really don’t need yet another academic subject just to ‘permit’ us to study the Web – that we’re perfectly well served by the areas of study (those listed above) that already seek to understand both the Web and its impact upon all of us. But even he can’t deny that Web Science as a ‘label’ can benefit the Semantic cause, both in the evangelistic sense and by providing ‘institutional credibility’ to the area of research.

I collected a number of Web Science- and WSRI-related bookmarks during my thesis research, for further reading.

Not to be outdone by Google’s efforts this week, Ask.com have also expanded their search technology to return specific ‘direct’ answers to searches, where possible, by means of semantic language processing.

Far more public than Google, fortunately, Ask announced on their blog yesterday that they’ve been developing proprietary semantic analysis technology since October of last year in their efforts to advance the next generation of search tools.

DADS(SM) (Direct Answers from Databases), DAFS(SM) (Direct Answers from Search), and AnswerFarm(SM) technologies, which are breaking new ground in the areas of semantic, web text, and answer farm search technologies. Specifically, the increasing availability of structured data in the form of databases and XML feeds has fueled advances in our proprietary DADS technology. With DADS, we no longer rely on text-matching simple keywords, but rather we parse users’ queries and then we form database queries which return answers from the structured data in real time. Front and center. Our aspiration is to instantly deliver the correct answer no matter how you phrased your query.

The results aren’t returned as explicitly as Google’s, mainly due to the amount of adverts above the page fold, but they work. Try searching for ‘Football on TV this weekend’ or ‘Movies on TV now’ and you’ll see the results in custom-formatted sections accordingly.
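As a rough illustration of what ‘direct answers from databases’ could mean in practice – this is a hypothetical sketch with invented listings data and parsing rules, not Ask’s actual technology – a query like ‘Football on TV this weekend’ might be parsed into a structured filter and then answered from a database, rather than by matching keywords against page text:

```python
# Hypothetical sketch of a "direct answers from databases" pipeline:
# parse a free-text query into a structured filter, then answer from
# structured data. The listings and parsing rules are invented.

LISTINGS = [
    {"programme": "Match of the Day", "category": "football", "day": "saturday"},
    {"programme": "Film of the Week", "category": "movies", "day": "saturday"},
    {"programme": "Super Sunday", "category": "football", "day": "sunday"},
]

def parse_query(query):
    """Turn a free-text query into a structured filter (very naively)."""
    words = query.lower().split()
    filters = {}
    for category in ("football", "movies"):
        if category in words:
            filters["category"] = category
    if "weekend" in words:
        filters["day"] = ("saturday", "sunday")
    return filters

def direct_answers(query):
    """Answer from the structured data rather than by text matching."""
    filters = parse_query(query)
    results = LISTINGS
    if "category" in filters:
        results = [r for r in results if r["category"] == filters["category"]]
    if "day" in filters:
        results = [r for r in results if r["day"] in filters["day"]]
    return [r["programme"] for r in results]

print(direct_answers("Football on TV this weekend"))
# ['Match of the Day', 'Super Sunday']
```

The point of the sketch is the shape of the pipeline, not the (deliberately trivial) parsing: the query becomes a database query, and the answer comes back from structured data in real time.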

Unfortunately the results are still only returned in HTML, so again – the term ‘semantics’ here describes the form of processing they’re doing behind the scenes, rather than marking this as their first outright foray into the Semantic Web (capital S).

This, though, is proprietary technology, and presumably it’ll stay that way. So I’m unsure whether to celebrate their realisation of the importance of semantics (in search at least), or – given their more ‘closed source’ ethos – to consider this almost contrary to the idea of the Semantic Web (portability, sharing, transparency), as they hold these advances close to their chest to gain an edge over their competitors, understandably encouraging others to do the same in future.

Quite out of the blue, and without any notification of its launch as far as I’ve been able to find, Google seem to be exposing semantic data in their global search results.

Try searching for ‘What is the capital city of England?’ or ‘Who is Bill Clinton’s wife?’ and you’ll see sourced direct answers returned at the top of your search results.

It’s hard to tell if these direct results are actually semantic expressions or just presented to appear that way – in the expected semantic triple of subject-predicate-object. The listed sources definitely don’t structure their information with semantic expressions, so perhaps a fair amount of logic and natural-language processing is being done on Google’s part to process non- or semi-structured data.
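For illustration, the subject-predicate-object form mentioned above can be sketched as a tiny in-memory triple store – the facts and predicate names here are invented for the example, and this says nothing about how Google actually stores its data:

```python
# A tiny in-memory triple store: each fact is a (subject, predicate, object)
# statement, the basic shape of semantic (RDF-style) data. The facts and
# predicate names are illustrative, not any real vocabulary.

TRIPLES = [
    ("England", "hasCapital", "London"),
    ("Bill Clinton", "hasSpouse", "Hillary Clinton"),
]

def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "What is the capital city of England?" becomes a pattern query:
answers = match(subject="England", predicate="hasCapital")
print(answers[0][2])  # London
```

The hard part, of course, is the step this sketch skips entirely: turning a natural-language question into that pattern, and populating the store from unstructured sources in the first place.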

I’ve tried before to find out what Google have been up to concerning semantic technology, but found little. The coverage over at ReadWriteWeb reports that neither they nor their Semantic Web contacts had heard or seen anything about this before, but the community feedback suggests there have been variations of this for some time – including a three-year-old Google program called ‘Direct Answers’ – though none of the coverage of that program offers the kind of examples we’re seeing here.

Marshall Kirkpatrick points to a blog post by Matt Cutts, a Google search algorithm engineer, but it seems to be a dead link now. Trawling through Google’s caches, though, turns up this quote:

Many of the data points are being pulled in from the structured part of Wikipedia entries, which is interesting. Other sources are wide ranging, from a license plate website to Jason Calacanis’s Mahalo.

If Google are constructing semantic data from semi-structured or non-structured source data, then there’s undoubtedly some quite powerful semantic processing technology in place. I highly doubt this will be the final product of their development with such technologies – more likely it’s simply the first we’ve noticed, which is probably why it’s slipped under most people’s notice.

The inaccuracy is also an issue. Try searching ‘Who is Bob Dylan’s wife?’ – and you’ll see Sara Lownds (his ex-wife) returned. Seeing these direct answers reminds me of True Knowledge.

Even True Knowledge’s example questions, though, are far more complex – for example, ‘Who owns Universal Studios?’, ‘Is the Chrysler building taller than the Eiffel Tower?’, ‘What is the half life of plutonium 239?’.

More importantly, if it doesn’t know the answer, it won’t ‘guess’ – it’ll tell you it doesn’t know and ask you to deconstruct your query in order to expand its knowledge base so it can find the answer later.

As Marshall says, this is all speculation based on limited observation – and low visibility of Google’s development. Hopefully there’ll be more soon!

For some time I’ve been meaning to write about Facebook Connect and Google Friend Connect, two potentially huge social web developments that have been gathering speed and popularity over the past few weeks.

Both services are very similar. Essentially, each functions to simplify the connection between social and non-social websites by offering the connectivity (and some of the functionality) of its proprietary central platform on third-party websites.

The idea is that a user can ‘Connect’ with whichever service the site has employed and find the users they’ve already connected with on the other service – rather than creating a new account and profile, and repeating the steps of entering information and re-finding the friends they’ve already added, over and over again with every new social-enabled web app.

I first saw Facebook Connect in August with their demonstration service The Run Around. There, you could ‘Connect with Facebook’ to join the site and immediately see who else (of your Facebook friends) had joined too. This is all outside of the Facebook chrome, not on the Facebook domain. What’s more, as well as interacting with the linked data pulled from Facebook, the website could push data back in. The site itself was intended to track your running routes and times, so when you submitted a new ‘run’, it would publish to the live news feed on your Facebook profile.

The idea is simple, the effect could be game-changing. It’s been met with both cautious optimism and healthy skepticism.

If this becomes as massive as it could be, we could see a single sign-in that abolishes the need to register and re-register for every newly launched social app. We’re already experiencing social fatigue with that process as consumers; as developers, we’re having to build whole registration and authentication systems from scratch every time. Plugging into a platform like this – one that we assume to be secure and trusted – could offer a means to develop and deploy services much more easily and quickly.

But can we trust – or do we want to trust – a proprietary platform to do this for us? The idea of a single social graph isn’t new, but I don’t know if I want Facebook to offer it. I’d much prefer FOAF 🙂 – but how many people outside of the development world have heard of it?
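The appeal of something like FOAF is that the social graph becomes open data that any site can consume, rather than an asset locked inside one platform. As a loose sketch (the people and addresses are invented, and real FOAF is published as RDF, not Python dicts), the ‘find your friends already here’ step reduces to merging published ‘knows’ statements:

```python
# Sketch of a FOAF-style open social graph: each person publishes who
# they "know" (foaf:knows), and any site can merge those statements to
# find existing connections - no central proprietary platform required.
# The people and relationships here are invented for illustration.

profiles = {
    # person -> set of people they declare they know
    "alice@example.org": {"bob@example.org", "carol@example.org"},
    "bob@example.org": {"alice@example.org"},
}

def friends_already_here(visitor, members):
    """Which of the visitor's declared contacts are already members of a site?"""
    return profiles.get(visitor, set()) & set(members)

site_members = ["carol@example.org", "dave@example.org"]
print(friends_already_here("alice@example.org", site_members))
# {'carol@example.org'}
```

In the open model the `profiles` data would be fetched from each person’s own published FOAF file; in the Facebook Connect model, the equivalent lookup happens inside Facebook’s platform.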

I feel I need to write another post entirely about OpenID, OpenSocial and OAuth – services that can’t go unmentioned here – but Marshall Kirkpatrick at ReadWriteWeb wrote a direct comparison of Facebook Connect and OpenID that asks some interesting questions, as well as offering a good introduction to the open source services anyway. Although he started by discussing which of the two website owners should use to authenticate and learn about their users, the community expanded his initial mindmap to cover pretty much every angle of the comparison – and it’s very detailed, see it here.

He also asks, even if it doesn’t become the dominant identifier online, will Facebook’s challenge breathe new life into the movement for open source, standards based, federated user identity?

Then there’s Google Friend Connect, launched in public beta the same day Facebook Connect went public for third-party sites. This does use a blend of the open source services, but although integrating the open standards might suggest a weightier development process, the first thing to notice is a far less developer-oriented implementation than Facebook Connect’s.

With Facebook Connect, it’s down to the site creator to construct and integrate an interface to facilitate the connection – Google Friend Connect is widgety, with pretty much zero coding beyond cutting and pasting the directed portions. Similarly with the functionality: Google offer widgets for simple commenting on pages, media sharing, or rating content. With Facebook Connect you have to write that yourself – although, admittedly, you then have free rein over design and interaction.

There’s a demonstration video on the Google blog’s announcement of the beta launch.

It’s not like this is just a two-horse race though, or that someone won’t work out a way to use both anyway. Google and Facebook are in direct competition, but attempting to open the Web in this way extends far beyond them.

What I find interesting is the interoperability. These technologies aren’t semantic, but they do push the exposure of, and interoperation with, a user’s social graph – with ideas akin to the Semantic Web, utilising data to extend single-site online identities and to network social connections.

They’re not Semantic Web efforts but they have similar aims. Friend Connect’s goal is an open social web; the Semantic Web is – quite simply 😉 – a fully understood, completely open web, not only its social domain.

Just lately I’ve been really interested in finding out Google’s position on semantic technologies and their view on the Semantic Web.

I’d been asked before whether Google were making any efforts in developing semantic technology, but I couldn’t really say. Then I attended the Googleworld debate, at London’s ICA, but couldn’t really find the chance to pose any technical questions.

In an attempt to satisfy my curiosity – and to investigate something I believe to be of interest that, as far as I can find, hasn’t received any real attention to date – I wrote an open letter, of sorts, forwarded to Google and to Semantic Web researchers I’ve found connected to Google, simply asking:

What’s the deal?

Dear Google,

My name is Marc Hibbins and I write a blog I’m sure you wouldn’t have read; I tend to cover new web technologies, online trends and my own development issues, but I’m also extremely interested in the Semantic Web.

Over the past couple of weeks I’ve become increasingly interested in finding out Google’s position – or just their ideas, even – on the Semantic Web and semantic technologies. I recently wrote about the increase in tech blogs covering the subject over the past couple of weeks, and I’ve been asked a few times: what’s Google up to?

I also recently attended a debate called ‘Googleworld’ – it covered, generally, the past ten years of Google and what’s to come. I wrote about it, and the chair of the meeting replied that he too is unaware of Google’s position.

Could you shed any light on the topic? Having had a thorough look around online, I’ve found next to nothing. I’m extremely intrigued to find out if Google have any plans with semantic technology – or even if there’s any in place already that just might not be so visible?

If you’ve no plans, do you have any comments? Do you think it’ll even ever happen?

Kindest regards,

Marc Hibbins

I wasn’t sure what kind of response I’d get, if any at all – or whether anybody I did get in touch with would be wary of offering any insight that might be misinterpreted as an ‘official’ position.

To my surprise, my first response came from executive ‘Google Fellow’ Jeff Dean. He works in the Systems Infrastructure Group (crawling, indexing and query systems – full bio here), but he couldn’t initially offer any really strong thoughts on the issue. He did say, however, that he wasn’t sure Google even had any real position on the subject at all. If nothing else, at least this confirmed that my lack of findings wasn’t down only to an absence of research published externally by Google – or to poor investigative work on my part.

My second reply was from Stefan Decker, professor at the National University of Ireland, Galway and director of the Digital Enterprise Research Institute, an internationally recognised institute in Semantic Web and web science research. He co-presented a very interesting Google Tech Talk last year, and worked in Stanford at the same group as Sergey Brin and Larry Page.

He said, very explicitly, that:

In short: The Google management does not believe in meta-data.

Craig Silverstein is on record several times negatively of talking about the topic, as well as Sergey Brin. It is very clear that they are not proactive – a serious mistake from my point of view.

Interesting. I got in touch with his co-speakers, Eyal Oren and Sebastian Kruk. Both said they have contacts at Google still, but neither are aware of any public developments.

Eyal pointed me toward Sindice, a semantic search engine and index, as one that perhaps (though only speculatively – as likely as any search engine) might one day receive interest from Google, perhaps to incorporate its infrastructure for RDF and semantic data consumption. But as he said, there’s absolutely no evidence of that right now.

Sebastian on the other hand described the lack of address specifically as:

[Their] ‘anti-semantic’ approach.

It’s an increasing trend he’s recognised, suggesting an almost conscious movement against any such development. He also expressed his disappointment at the very low turnout at the Tech Talk – literally only one attendee showed any real interest.

My final response was initially the most exciting – from Ramanathan V. Guha, who leads development of Google Custom Search. He said he’d be happy to comment on what’s going on, although he could only offer his own personal opinion and nothing official – but I’ve not received any correspondence from him since.

All in all, at least I know I’ve not overlooked anything major. Fingers crossed I get a response back from Guha, but otherwise I guess I’m left keeping a close eye out for any other developments.

Picked up a spare ticket to the ‘Googleworld’ debate at the Institute of Contemporary Arts yesterday evening. Bill Thompson chaired a pleasantly ‘warm’ chat between New York Times columnist Randall Stross and Independent columnist Andrew Keen.

Initially I wasn’t sure what to expect. Billed as a look back over the past ten years of Google and forward to whatever might come next, it wasn’t as technically oriented as I’d hoped it would be. It focused more on social and philanthropic interest – as well as being a bit of a sell for both their new books.

I would have liked the opportunity to open up discussion of semantic technologies, perhaps to pose the question: what are Google’s intentions – if they even have any – of introducing Semantic Web technologies to their platform? It’s something I was recently asked about after writing my last post, but this wasn’t really the right crowd.

In other news, semantic start-up Twine goes public today. Founder Nova Spivack posted some interesting stats yesterday about user engagement on the site over the last eight months during its semi-public, semi-beta phase.

It seems their users queue up some lengthy sessions on the site – longer even, he now predicts, than on Delicious and MySpace.

It’s been hard not to notice the influx of tech blog posts over the past week or so covering all things Semantic Web. I’ve not had a decent chance to talk about the Semantic Web since I wrote my thesis last year, so I thought I’d take this as a good opportunity to do so and collect some of the best of those links in the process.

It’s staggering how much the Semantic idea has grown since I wrote my dissertation. In it I discussed mainly the social, technical and theoretical concepts of the Semantic Web – when I wrote it there was little else around to write about. There were no public start-ups or beta platforms – the community was small and extended little outside a designated group at the W3C and a handful of tech bloggers – their efforts too, were mostly spent on translating Tim Berners-Lee’s very technical, comparatively abstract, web dream across to the mainstream reader.

At that point the W3C were rapidly beginning to develop technologies like SPARQL and OWL, while others, such as RDF and Microformats, were under varied debate. Not having a background in the hard computer sciences, my interest was more in exploring the ubiquity and connectivity of a Semantic Web, investigating what forces we would need to drive the paradigm – how our ideas of the Web were already changing as we came to grips with terms like the (then brand-new) ‘Web 2.0’ and embraced the Social Web phenomenon.

Nova Spivack is a leading voice: CEO of Radar Networks and founder of semantic start-up Twine, an information storage and knowledge sharing service. He writes frequently at his Minding the Planet blog, full of optimism and colourful metaphor. He recently gave a talk at the Bonnier GRID ’08 conference in Stockholm – basically ‘the TED conference of Scandinavia’ – about what he terms the future of the Web, ‘the Semantic Web and the Global Brain‘. While he worries me by using many of the buzzing memes and science fiction references that I think actually harm those trying to invest in his optimism and adopt the Semantic idea – an all-knowing, understanding, artificially intelligent Web just sounds too good – it’s exciting how imminent he predicts the true impact of the Semantic Web to be.

It’s even more exciting to read popular ‘mainstream’ – or at least more general – tech blogs giving more and more coverage to the Semantic Web, as the technology and its abstract concepts become commonplace and matters of everyday interest.

ReadWriteWeb, ever popular for polls and predictions, favour Semantic technologies in various recent top-ten-style looks into future web trends (and more here), but they too see the oncoming breakthrough as more immediate. Richard MacManus puts Semantic apps at number one on his hit-list of Web Predictions for 2008.

But while predictions continue to be made, the true killer app remains elusive. Some do extremely well, though. Freebase, which went public around the end of 2007 as essentially a semantic Wikipedia, hasn’t gained the popularity I thought it would have by now. True Knowledge’s natural language search is still in beta, though the platform I’ve tested so far is as impressive as their promotional video.

But then, kinda out of the blue for me, came two Yahoo! developments. SearchMonkey, not exclusively a Semantic Web app, is a search engine that promotes semantic data standards by making use of Microformats and embedded RDF as searchable metadata – definitely read the FAQ. And at the recent Web 3.0 Conference and Expo, Yahoo! announced the consumer release of Yahoo! Open Strategy (Y!OS), ‘blowin’ the doors wide open’ to the ‘open source, hacker attitude’ – basically, ‘rewiring’ Yahoo! to make all data service-wide openly available to developers and consumers alike, granting the opportunity for complete data portability – with the intent to extend even further in the future.
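To give a flavour of the SearchMonkey idea – lifting embedded semantic markup out of ordinary HTML so it can be indexed as metadata – here’s a minimal sketch using Python’s standard html.parser. The page snippet is invented, and real hCard parsing handles far more cases than this:

```python
# Sketch of how a crawler might lift microformat (hCard-style) metadata
# out of ordinary HTML, in the spirit of SearchMonkey's use of embedded
# semantic markup. The page snippet is invented for illustration.
from html.parser import HTMLParser

PAGE = ('<div class="vcard"><span class="fn">Marc Hibbins</span> works at '
        '<span class="org">Example Ltd</span></div>')

class HCardExtractor(HTMLParser):
    """Collect the text inside elements carrying hCard-ish class names."""
    FIELDS = {"fn", "org"}  # formatted name, organisation

    def __init__(self):
        super().__init__()
        self.current = None  # the field we're currently inside, if any
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in self.FIELDS:
            if field in classes:
                self.current = field

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.card[self.current] = data

parser = HCardExtractor()
parser.feed(PAGE)
print(parser.card)  # {'fn': 'Marc Hibbins', 'org': 'Example Ltd'}
```

The markup stays invisible to readers but gives a crawler structured fields to index – which is exactly the bridge between plain HTML and searchable semantic data that SearchMonkey is encouraging publishers to build.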

I concluded my thesis by suggesting that a new drive would be necessary for consumers Web-wide to understand and willingly adopt the Semantic Web change. At that point the Web Science Research Initiative (WSRI) had just been launched – a joint venture between MIT and the University of Southampton to teach the literal academic science of the web – and it was unclear whether this would be enough. Molly Holzschlag wrote a recent article at A List Apart, believing the ‘ivory tower’ perception of the W3C to be discouraging to everyone independent of the organisation – that they’ve no real outreach. I agree, but I think efforts like the newly founded World Wide Web Foundation are a direct result of their awareness of that, and that they will fulfil their principal objective and speed the technological advancement faster than she may expect – I hope they do, at least.

For even more reading (and listening) material, subscribe to the Nodalities blog and podcasts. There’s a good interview with David Provost I recommend, discussing many of the things I’ve spoken about here, but also his recently published report on the Semantic Web industry as a whole.

It’s called, ‘On the cusp’. 🙂