Because of what I do – specialising in international and multilingual search, that is – people have been explaining to me for years the importance of the ‘semantic web’ and telling me I should be getting involved. Many thought it had something to do with linguistics (including me). Essentially, I saw the ‘semantic web’ as something to do with contextual meaning, and I have struggled to get my head around the idea of computers and algorithms figuring out abstract meaning by ‘understanding’ semantically what was meant by the words on the page.
So, for instance, a page about ‘villas’ could be about villas for sale, villas to rent, villas to visit, a guidebook about villas or even just an address list. So for me, semantic web techniques would need to figure out those differences by being really, really clever.
That’s right Andy – you’re actually spot on – except for the figuring out part. The bit that’s been confusing me is that we’re not going to let search engines figure it out – or rather, we’re not going to leave them to guess – we’re going to tell them what the meaning of the page is by adding to the coding. Simple as that.
Congratulations to Sean Golliher, Jamie Taylor, Martin Hepp, Jay Myers and Nick Cox for putting together one of the most illuminating panels I’ve seen at SES or any conference for some considerable time. Well planned, well prepared and gripping to the end, despite the relatively dry and potentially dull subject – the panel was anything but.
As I explained to Sean Golliher after the session, a lot of the discussion during the panel was about the use of vocabularies that describe data, enabling machines to better use and position that data for retrieval at query time. But the semantic web industry needs to get its own house in order, because it has been using confusing language and vocabulary itself. What are RDFa, microformats or the semantic web, for instance? The concept is actually simple; the industry has undersold itself to potential users by using too many vocabularies and confusing terms, making it sound as if you had only a small chance of understanding what it was all about unless you had a PhD and lived in academia.
According to Sean, this will be corrected by the adoption of the term HTML5. I’m not sure about that, because that’s yet another way of describing the same thing. Out of all the descriptions and terms used in the session, the one that turned my light on was “rich meta data”. Yes, meta tags are coming back and are going to be even more interesting than they were before.
Yahoo already supports a lot of the different ‘vocabularies’ of rich meta data to present events, movies, products and many other categories. Google and Bing are both followers on this topic. The concept is one of labelling. The idea is that each element that needs to be found on the web will have labels chosen from a popular vocabulary – such as GoodRelations, Martin Hepp’s creation. By labelling what that content is with rich meta data, the search engines will be better able to exploit it.
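To make the labelling idea concrete, here is a minimal sketch of what GoodRelations mark-up in RDFa might look like for the villa example I started with. The villa, the price and the page fragment are invented purely for illustration; the gr: classes and properties, though, come from the published GoodRelations vocabulary.

```html
<!-- Hypothetical example: a villa rental offer labelled with GoodRelations RDFa -->
<div xmlns:gr="http://purl.org/goodrelations/v1#"
     about="#offer" typeof="gr:Offering">
  <span property="gr:name">Villa in Tuscany – weekly rental</span>
  <!-- gr:LeaseOut says "to rent"; gr:Sell would say "for sale" -->
  <div rel="gr:hasBusinessFunction"
       resource="http://purl.org/goodrelations/v1#LeaseOut"></div>
  <div rel="gr:hasPriceSpecification">
    <div typeof="gr:UnitPriceSpecification">
      <span property="gr:hasCurrencyValue" content="950.00">950</span>
      <span property="gr:hasCurrency">EUR</span> per week
    </div>
  </div>
</div>
```

A crawler that understands the vocabulary can read the gr:LeaseOut business function and know that this page is about villas to rent rather than villas for sale – exactly the ambiguity I was worrying about above – without having to be ‘really, really clever’ about the prose.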
The vocabularies are open source – and in theory, according to Nick Cox, search engines will work with all varieties. My takeaway was that that won’t in fact be the case. I see search engines choosing the popular vocabularies – such as GoodRelations – so there will be winners and losers. However, it would be theoretically possible for a new ‘vocabulary’ to come along and gain sufficient traction to achieve that popularity.
What does this mean for SEO? We all need to start adding ‘rich meta data’ to describe what we are showing on our web pages. Events need to be ‘tagged’ as events, products need to be ‘tagged’ as products and so on. Then the search engines will know what to do. BestBuy.com is reporting a 30% improvement in organic visitors to its site as a result of introducing rich meta data.
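Tagging an event works the same way. As a sketch, here is how an event might be labelled with the hCalendar microformat – the class names come from the hCalendar specification, but the event, date, venue and URL below are made up for illustration:

```html
<!-- Hypothetical example: an event labelled with the hCalendar microformat -->
<div class="vevent">
  <a class="url summary" href="http://example.com/semantic-web-panel">Semantic Web panel</a>
  on <abbr class="dtstart" title="2009-12-08">8 December 2009</abbr>
  at the <span class="location">Hilton, Chicago</span>
</div>
```

The visible text stays exactly as the visitor sees it; the class names are the ‘labels’ that tell a search engine it is looking at an event with a date and a location.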
Another takeaway? There is a BIG job for SEOs to do in terms of helping clients achieve this in order to help the search engines. If they achieve this, maybe the search engines will give SEOs more credit!
Andy Atkins-Kruger
GoodRelations RDFa rich mark-up seems to show up in Google snippets now:
http://www.ebusiness-unibw.org/wiki/GoodRelationsInGoogle
The key to understanding the “Semantic Web” is to regard it as a technical innovation that *reduces* the ambiguity, heterogeneity of representation, and contextual dependencies of data. It does not eliminate all those problems; it just reduces the computational cost of the respective problems.
Martin
Andy,
I totally agree with you – even small amounts of markup could/would improve the quality of results a lot. I believe in metadata; I work with and generate metadata every day (manually and automatically), and I’m seeing great things happening with that. I just don’t think that it resolves everything, and this is one of the main reasons why people are confused by the semantic web 😉
There will be no magic.
“we’re going to tell them what the meaning of the page is by adding to the coding. Simple as that.”
I absolutely agree, except that it is not simple. Yes, machine-readable information can resolve lots of problems, is already resolving a lot and will resolve more in the future. But you can’t mark up everything, because language is complex and ambiguous, and context is a dynamic thing. At some point, too much markup will end up creating new kinds of ambiguity problems. There is not always only one path to an answer. Humans deal with this every day because they know how to discard noise, connect the right dots, etc. But that is analysis, before and after markup, done in the brain.
Context is not just in the page content; context is also outside it – in the head of the user, in the application, etc. The idea of a Semantic Web without any analysis is very utopian to me. The web is already semantic and will continue to be, by combining both approaches depending on the tasks and available technologies. Technological advancements will promote one approach over the other over time, but I think that hybrid is the key. For example: Google Goggles versus Google Favorite Places with its unique QR-code window decal. Precision versus recall.
Finally, about “computers and algorithms figuring out abstract meaning by ‘understanding’ semantically what was meant by the words on the page”: there is no magic. You can’t expect computers to be able to understand everything on their own, but they sure could help a lot in marking up unstructured data. The entire web is not going to be marked up manually. We need tools to help us, and some of them are text mining/analysis tools. Analysis and markup and analysis and markup and … human-computer-human-computer- …
Martin – thanks for joining our discussions! alltoute – you are expressing exactly the views I had before joining the session in Chicago. What I realised was that even if things were only categorised as “event” or “product” that would help the quality of the results which search engines would be able to present and that the ‘semantic web’ is not about trying to explain the context of every single expression!
There is great potential for microformats to solve the problems you describe – tagging things as events or products is already done, for instance – but unfortunately their development is in the hands of an unelected and unaccountable cabal which stifles such advancement (there’s an example on my blog).
Dear Andy:
Thanks for your nice comments! The presentation I gave on the panel is now also available as a video (15 min explaining it all :-)) and via slideshare:
Video: http://vimeo.com/8065914
Slides: http://tr.im/ses09hepp
So if you are interested in using the GoodRelations vocabulary to improve the visibility of your products or services on the Web, that should be a good starting point.
All further info on GoodRelations is at
http://purl.org/goodrelations/
Best wishes
Martin Hepp
PS: The term “rich meta-data” that I am using to explain RDFa and GoodRelations originally comes from Kingsley Idehen, http://twitter.com/kidehen .
Yes Vir… err, Andy – the semantic web does little or nothing to extend search. I have developed a system which does. It allows the user to build ontologies. The system uses the ontology as a search entity. It also allows the user to determine which sites they want to search. Thus it becomes a multi-dimensional search. This system also manages all of the searches it does, so that the population at large can leverage previous searches. The site is http://www.alexlib.info
Thank you very much for the rehabilitation of what ‘semantic’ in the syntagm ‘semantic web’ refers to – that is to say, it has nothing to do with linguistics or with the semantic technologies developed in NLP R&D labs.
I am a linguistic engineer, I work in an NLP R&D department, and I am continuously facing problems with this abuse of language while trying to explain the ‘real’ features of a semantic analyzer. So it is very pleasant to see that SEOers are willing to make things clearer.
Leave the ‘semantics’ to the linguists, and continue explaining that rich metadata aims at building a ‘World Wise Web’ 🙂