Andy Atkins-Kruger

Linguist? Here’s Why the Semantic Web Confuses You – from SES Chicago

Because of what I do – specialising in international and multilingual search, that is – people have been explaining to me for years the importance of the ‘semantic web’ and telling me that I should be getting involved. Many people, including me, thought it had something to do with linguistics. Essentially, I saw the ‘semantic web’ as being about contextual meaning, and I struggled to get my head around the idea of computers and algorithms figuring out abstract meaning by ‘understanding’ semantically what was meant by the words on the page.

For instance, a page about ‘villas’ could be about villas for sale, to rent or to visit, a guidebook about villas, or even just an address list. So, as I saw it, semantic web techniques would need to figure out those differences by being really, really clever.

That’s right, Andy – you’re actually spot on – except for the figuring-out part. The bit that had been confusing me is that we’re not going to let search engines figure it out (or, more accurately, we’re not going to leave them to guess); we’re going to tell them what the meaning of the page is by adding to the coding. Simple as that.

Congratulations to Sean Golliher, Jamie Taylor, Martin Hepp, Jay Myers and Nick Cox for putting together one of the most illuminating panels I’ve seen at SES or any conference for some considerable time. Well planned, well prepared, gripping to the end despite the relatively dry and potentially dull subject. The panel was anything but dull.

As I explained to Sean Golliher after the session, much of the discussion during the panel was about using vocabularies that describe data so that machines can better use and position that data for retrieval at query time. But the semantic web industry needs to put its own house in order, because its own language and vocabulary have been confusing. What are RDFa, microformats or the semantic web, for instance? The concept is actually simple; the industry has undersold itself to potential users by using too many vocabularies and confusing terms, making it sound as if you had little chance of understanding what it was all about unless you had a PhD and lived in academia.

According to Sean, this will be corrected by the adoption of the term HTML5. I’m not sure about that, because that’s yet another way of describing the same thing. Out of all the descriptions and terms used in the session, the one that turned my light on was “rich meta data”. Yes, meta tags are coming back, and they are going to be even more interesting than they were before.

Yahoo already supports many of the different rich meta data ‘vocabularies’ for presenting events, movies, products and other categories, and Google and Bing are both following its lead. The concept is one of labelling. The idea is that each element that needs to be found on the web will carry labels chosen from a popular vocabulary, such as GoodRelations, Martin Hepp’s creation. By using rich meta data to label what the content is, we make it easier for search engines to exploit it.
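To make the labelling idea concrete, here is a minimal Python sketch, under stated assumptions, that builds an HTML fragment marking up the villa example with GoodRelations terms in RDFa attributes. The function name `offer_to_rdfa`, the sample villa and the particular properties chosen are illustrative assumptions, not something presented on the panel. The point is that the business function (`gr:Sell` or `gr:LeaseOut`) states explicitly whether the villa is for sale or to rent, so a search engine does not have to guess.

```python
from html import escape

# Namespace of the GoodRelations vocabulary (see http://purl.org/goodrelations/).
GR_NS = "http://purl.org/goodrelations/v1#"

# Two GoodRelations business functions; this sketch deliberately allows only these.
ALLOWED_FUNCTIONS = {"Sell", "LeaseOut"}


def offer_to_rdfa(name: str, description: str, price: float,
                  currency: str, business_function: str) -> str:
    """Return an illustrative HTML fragment labelling an offer with GoodRelations RDFa.

    Hypothetical helper: the markup says explicitly that the block is a
    gr:Offering, whether it is for sale or for rent, and what it costs.
    """
    if business_function not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unsupported business function: {business_function}")
    amount = f"{price:.2f}"  # plain decimal string for the machine-readable value
    return "\n".join([
        f'<div xmlns:gr="{GR_NS}" typeof="gr:Offering">',
        f'  <span property="gr:name">{escape(name)}</span>',
        f'  <span property="gr:description">{escape(description)}</span>',
        # Links the offering to the business function resource (sale vs. rental).
        f'  <span rel="gr:hasBusinessFunction" resource="{GR_NS}{business_function}"></span>',
        '  <div rel="gr:hasPriceSpecification">',
        '    <div typeof="gr:UnitPriceSpecification">',
        f'      <span property="gr:hasCurrency" content="{escape(currency)}">{escape(currency)}</span>',
        f'      <span property="gr:hasCurrencyValue" content="{amount}">{amount}</span>',
        '    </div>',
        '  </div>',
        '</div>',
    ])


if __name__ == "__main__":
    # Hypothetical example: the same villa page, unambiguously labelled as a rental.
    print(offer_to_rdfa("Villa Rosa", "Three-bedroom villa with pool",
                        1200, "EUR", "LeaseOut"))
```

Swapping `"LeaseOut"` for `"Sell"` is enough to label the same villa as for sale instead. Which properties a given search engine actually reads is something to check against that engine's own documentation.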

The vocabularies are open source and, in theory, according to Nick Cox, search engines will work with all of them. My takeaway was that this won’t in fact be the case. I expect search engines to choose the popular vocabularies, such as GoodRelations, so there will be winners and losers. However, it would still be possible for a new ‘vocabulary’ to come along and gain enough traction to become popular.

What does this mean for SEO? We all need to start adding ‘rich meta data’ to describe what we are showing on our web pages. Events need to be ‘tagged’ as events, products need to be ‘tagged’ as products, and so on. Then the search engines will know what to do. BestBuy.com is reporting a 30% improvement in organic visitors to its site as a result of introducing rich meta data.
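For events, one long-established option is the hCalendar microformat, which labels an event through HTML class names such as `vevent`, `summary`, `dtstart` and `location`. The Python sketch below is a minimal illustration under that assumption; the helper `event_to_hcalendar` and the sample open day are hypothetical, and it uses the older `abbr`/`title` pattern to carry the machine-readable start time.

```python
from datetime import datetime
from html import escape


def event_to_hcalendar(summary: str, start: datetime, location: str) -> str:
    """Return an illustrative HTML fragment tagging an event with hCalendar classes.

    Hypothetical helper: the ISO 8601 start time goes in the abbr title for
    machines, while the visible text stays readable for people.
    """
    iso_start = start.strftime("%Y-%m-%dT%H:%M")   # machine-readable start time
    readable = start.strftime("%d %B %Y, %H:%M")   # human-readable (locale-dependent)
    return "\n".join([
        '<div class="vevent">',
        f'  <span class="summary">{escape(summary)}</span>',
        f'  <abbr class="dtstart" title="{iso_start}">{readable}</abbr>',
        f'  <span class="location">{escape(location)}</span>',
        '</div>',
    ])


if __name__ == "__main__":
    # Hypothetical event, used only to show the shape of the markup.
    print(event_to_hcalendar("Villa open day", datetime(2010, 5, 1, 10, 30),
                             "Villa Rosa, Tuscany"))
```

Keeping the ISO time in `title` and the friendly date in the visible text lets the same line serve both people and machines; because `%B` depends on the locale, the visible month name may differ between systems.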

Another takeaway? There is a BIG job for SEOs in helping clients do this, which in turn helps the search engines. If SEOs pull it off, maybe the search engines will give them more credit!

Andy Atkins-Kruger
Andy is the CEO of Webcertain. He is a trained linguist with 20 years’ experience in international marketing, having helped major brand leaders with their advertising and public relations projects on five continents. Webcertain has been running multilingual search marketing campaigns for over 15 years and is one of the few agencies that deal only with international campaigns; the company doesn't take on single-market projects. Andy speaks regularly at conferences around the world, writes for the Multinational Search column of SearchEngineLand.com and is the Managing Editor of the Multilingual Search blog.

14 Responses to Linguist? Here’s Why the Semantic Web Confuses You – from SES Chicago

  1. Pingback: Green Electricity is Here…why Aren’t You Going Green ? | Echelon-Us

  2. Pingback: Is social media screwing up your search results?

  3. Martin Hepp says:

    GoodRelations RDFa rich mark-up seems to show up in Google snippets now:

    http://www.ebusiness-unibw.org/wiki/GoodRelationsInGoogle

  4. Pingback: Top Stories at SES Chicago 2009 on Day 3 | Brisbane Search Engine Optimization » SEO « Training - Advice - Tools - Resources

  5. Martin Hepp says:

    The key to understanding the “Semantic Web” is to regard it as a technical innovation that *reduces* the ambiguity, heterogeneity of representation, and contextual dependencies of data. It does not eliminate all those problems; it just reduces the computational cost of dealing with them.

    Martin

  6. alltoute says:

    Andy,

    I totally agree with you: even a small amount of markup could improve the quality of results a lot. I believe in metadata; I work with and generate metadata every day (manually and automatically), and I’m seeing great things happening with it. I just don’t think it solves everything, and this is one of the main reasons why people are confused by the semantic web ;-)

  7. alltoute says:

    There will be no magic.

    “we’re going to tell them what the meaning of the page is by adding to the coding. Simple as that.”

    I absolutely agree, except that it is not simple. Yes, machine-readable information can solve lots of problems; it is already solving many and will solve more in the future. But you can’t mark up everything, because language is complex and ambiguous, and context is a dynamic thing. At some point, too much markup will create new kinds of ambiguity problems. There is not always only one path to an answer. Humans deal with this every day because they know how to discard noise, connect the right dots, etc. But that is analysis, before and after the brain’s own markup.

    Context is not just in the page content; it is also outside, in the user’s head, in the application, etc. The idea of a Semantic Web without any analysis seems very utopian to me. The web is already semantic and will continue to be, by combining both approaches depending on the task and the available technologies. Over time, technological advances will promote one approach over the other, but I think a hybrid is the key. For example: Google Goggles versus Google Favorite Places with its unique QR code window decal. Precision versus recall.

    Finally, about “computers and algorithms figuring out abstract meaning by ‘understanding’ semantically what was meant by the words on the page”: there is no magic. You can’t expect computers to be able to understand everything on their own, but they could certainly help a lot in marking up unstructured data. The entire web is not going to be marked up manually. We need tools to help us, and some of them are text mining/analysis tools. Analysis and markup and analysis and markup and … human-computer-human-computer- …

  8. Martin – thanks for joining our discussions! alltoute – you are expressing exactly the views I had before attending the session in Chicago. What I realised was that even categorising things simply as “event” or “product” would improve the quality of the results search engines are able to present, and that the ‘semantic web’ is not about trying to explain the context of every single expression!

  9. Pingback: Twitted by StevenForth

  10. Andy Mabbett says:

    There is great potential for microformats to solve the problems you describe – tagging things as events or products is already done, for instance – but unfortunately their development is in the hands of an unelected and unaccountable cabal which stifles such advancement (there’s an example on my blog).

  11. Martin Hepp says:

    Dear Andy:
    Thanks for your nice comments! The presentation I gave on the panel is now also available as a video (15 min explaining it all :-)) and via SlideShare:

    Video: http://vimeo.com/8065914

    Slides: http://tr.im/ses09hepp

    So if you are interested in using the GoodRelations vocabulary to improve the visibility of your products or services on the Web, that should be a good starting point.

    All further info on GoodRelations is at
    http://purl.org/goodrelations/

    Best wishes

    Martin Hepp

    PS: The term “rich meta-data” that I am using to explain RDFa and GoodRelations originally comes from Kingsley Idehen, http://twitter.com/kidehen .

  12. Tom Folkes says:

    Yes Vir, err, Andy, the semantic web does little or nothing to extend search. I have developed a system which does. It allows the user to build ontologies. The system uses the ontology as a search entity. It also allows the user to determine which sites they want to search, so it becomes a multidimensional search. The system also manages all of the searches it performs so that the population at large can leverage previous searches. The site is http://www.alexlib.info

  13. Marguerite says:

    Thank you very much for rehabilitating what ‘semantic’ refers to in the syntagm ‘semantic web’: that is to say, it has nothing to do with linguistics or with the semantic technologies developed in NLP R&D labs.

    I am a linguistic engineer working in an NLP R&D department, and I continually face problems with this abuse of language when trying to explain the ‘real’ features of a semantic analyzer. So it is very pleasant to see that SEOs are willing to make things clearer.

    Leave the ‘semantics’ to the linguists, and continue explaining that rich metadata aims at building a ‘World Wise Web’ :)

  14. Pingback: Twitter Trackbacks for Linguist? Here’s Why the Semantic Web Confuses You - from SES Chicago [multilingual-search.com] on Topsy.com
