You may have heard that a website should have two types of sitemaps: a sitemap page and an XML sitemap. The sitemap page is for visitors and the XML sitemap for search engines. The XML sitemap is indeed for search engines; however, the sitemap page is technically for both. For visitors, this page can be useful because it gathers links to all the important pages on your site into one simple list. For search engines, it provides keyword-rich links to valuable pages on your site.
There are common guidelines for structuring sitemap pages and they are rather straightforward. What about structuring XML sitemaps – especially for larger multilingual or multi-regional sites? While there are certain rules to follow to ensure search engines can process your XML sitemaps correctly, it seems there are no common guidelines for structuring them.
There are two fundamental domain strategy approaches when it comes to global sites. There are sites which use ccTLDs and those which use a common domain to host all local sites. The strategy defines how you can approach structuring XML sitemaps.
For the ccTLD strategy you inevitably need a separate XML sitemap for each site (domain). This is because, as a rule, a sitemap can only list URLs from the domain on which it is hosted. Technically, the Sitemap Protocol does describe a way to cross-submit URLs from different hosts (by referencing the sitemap in the robots.txt file of each host), but this feature is rarely used.
For setups on a common generic domain, there are a number of ways to structure your XML sitemaps. If the individual country or language sites are hosted in subfolders of a .com domain, you can list the URLs from all of them in one sitemap. This approach is best for small sites. For instance, if there are 12 regional sites, each in a subfolder such as .com/de/ or .com/fr/ and each consisting of 50 pages, you can list all 600 URLs (12 × 50) in one XML sitemap. It is a simple setup.
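As a minimal sketch of this single-sitemap setup, the Python below builds one `<urlset>` for all regional subfolders using the standard library's `xml.etree.ElementTree`. The domain `www.example.com`, the 12 locale codes and the `page-N` paths are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Return a <urlset> sitemap document (as a string) listing every URL given."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # ElementTree escapes &, < and > for us
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


# Hypothetical example: 12 regional subfolders on one .com domain, 50 pages each.
LOCALES = ["de", "fr", "es", "it", "nl", "pl", "se", "dk", "no", "fi", "pt", "uk"]
all_urls = [
    f"https://www.example.com/{loc}/page-{n}/"
    for loc in LOCALES
    for n in range(1, 51)
]
sitemap_xml = build_urlset(all_urls)
print(len(all_urls))  # 600
```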
For larger sites on a common domain, it is typically best to keep the URLs of each local site in a separate sitemap. Using the above example, we would end up with 12 XML sitemaps. Why? If each regional site had 12,000 URLs to list, we would not be able to fit all 144,000 URLs in one sitemap, because a single sitemap file is limited to 50,000 URLs. Sure, you could split them across 3 sitemaps, but to keep things well organised it is preferable to have a separate sitemap for each regional site.
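The per-locale split could be sketched like this. It assumes, hypothetically, that the first path segment of every URL is the locale code and that files are named `sitemap-<locale>.xml`. The 50,000 figure is the protocol's per-file URL limit. Rather than splitting an oversized locale sitemap, this sketch simply refuses to build it.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the Sitemap Protocol


def locale_of(url):
    """First path segment, e.g. 'de' for https://www.example.com/de/x/ (assumed to be the locale)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "root"


def sitemaps_per_locale(urls):
    """Group URLs by locale subfolder and build one <urlset> document per locale."""
    groups = defaultdict(list)
    for url in urls:
        groups[locale_of(url)].append(url)

    sitemaps = {}
    for locale, locale_urls in groups.items():
        if len(locale_urls) > MAX_URLS_PER_SITEMAP:
            raise ValueError(f"{locale}: {len(locale_urls)} URLs exceed the per-file limit")
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in locale_urls:
            url_el = ET.SubElement(urlset, "url")
            ET.SubElement(url_el, "loc").text = url
        sitemaps[f"sitemap-{locale}.xml"] = ET.tostring(urlset, encoding="unicode")
    return sitemaps


# Hypothetical example: 12 locales with 12,000 URLs each (144,000 in total).
LOCALES = ["de", "fr", "es", "it", "nl", "pl", "se", "dk", "no", "fi", "pt", "uk"]
urls = [f"https://www.example.com/{loc}/item-{n}/" for loc in LOCALES for n in range(12_000)]
files = sitemaps_per_locale(urls)
print(len(files))  # 12
```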
The sitemap index
To tidy things up further, you can create another file that lists the individual sitemaps. This is called a Sitemap Index, and you can reference it in your robots.txt file instead of referencing all 12 sitemap files separately. A Sitemap Index is also more convenient when submitting sitemaps to search engines.
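A Sitemap Index uses the same namespace as a regular sitemap, with `<sitemapindex>` and `<sitemap>` elements in place of `<urlset>` and `<url>`. The sketch below builds one for 12 per-locale files and prints the matching `Sitemap:` line for robots.txt. The file names and the `sitemap-index.xml` location are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document with one <sitemap><loc> entry per child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = url
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


def robots_sitemap_line(index_url):
    """The robots.txt directive that points crawlers at the sitemap index."""
    return f"Sitemap: {index_url}"


# Hypothetical file names for the 12 per-locale sitemaps.
LOCALES = ["de", "fr", "es", "it", "nl", "pl", "se", "dk", "no", "fi", "pt", "uk"]
children = [f"https://www.example.com/sitemap-{loc}.xml" for loc in LOCALES]
index_xml = build_sitemap_index(children)
print(robots_sitemap_line("https://www.example.com/sitemap-index.xml"))
# Sitemap: https://www.example.com/sitemap-index.xml
```

Search engines then only need the one index URL. They discover the 12 child sitemaps from its `<loc>` entries.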
Sitemaps for large sites
But what if each site has a quarter of a million pages? We could not fit even one site's URLs in a single sitemap. Do we just list the first 50,000 URLs in one sitemap, the next 50,000 in another and so on? You can do that, but it is better to group your pages. Is there a commonly adopted standard for grouping pages in sitemaps? Yes and no. It seems best to simply group pages by following your site's structure, so in that sense the answer is yes. But since every site is structured differently, the answer is also no, or, more precisely, it depends.
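The naive approach of cutting a quarter of a million URLs into consecutive 50,000-URL files is straightforward to sketch. Putting a group name in the file names keeps the chunks recognisable later. Here that name is the hypothetical `de-products`.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the Sitemap Protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def chunk_filenames(group, chunks):
    """Name each chunk after its group, e.g. sitemap-de-products-1.xml (hypothetical scheme)."""
    return [f"sitemap-{group}-{n}.xml" for n in range(1, len(chunks) + 1)]


# Hypothetical regional site with a quarter of a million product URLs.
urls = [f"https://www.example.com/de/products/p-{n}/" for n in range(250_000)]
chunks = chunk_urls(urls)
print([len(c) for c in chunks])  # [50000, 50000, 50000, 50000, 50000]
print(chunk_filenames("de-products", chunks)[-1])  # sitemap-de-products-5.xml
```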
Probably the only structural feature all sites have in common is a set of top-level category pages. These, along with your corporate pages, would be listed in a “main” sitemap. How you group the rest depends on the site. A stock photo site might list all of its stock image pages in a separate sitemap. If your site is structured by line of business (LOB), you can group pages the same way and have one sitemap per LOB.
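Grouping by site structure depends entirely on your URL scheme, so the rule below is purely hypothetical. It assumes paths of the form `/<locale>/<section>/...` and sends the locale home page and one-level pages (top-level categories and corporate pages) to a “main” group. Everything deeper is grouped by its section, such as a stock image section or a line of business. Each group could then be written to its own sitemap and split into 50,000-URL files where necessary.

```python
from collections import defaultdict
from urllib.parse import urlparse


def sitemap_group(url):
    """Assign a URL to a sitemap group from its path (hypothetical rule).

    /<locale>/ and /<locale>/<section>/ count as top-level or corporate pages -> "main".
    Deeper URLs go to a group named after their section, e.g. "images".
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) <= 2:
        return "main"
    return segments[1]


def group_urls(urls):
    """Collect URLs into {group name: [urls]} using sitemap_group()."""
    groups = defaultdict(list)
    for url in urls:
        groups[sitemap_group(url)].append(url)
    return dict(groups)


# Hypothetical URLs: a stock image section and an "insurance" line of business.
urls = [
    "https://www.example.com/de/",
    "https://www.example.com/de/about-us/",
    "https://www.example.com/de/images/",
    "https://www.example.com/de/images/sunset-beach-123/",
    "https://www.example.com/de/images/city-night-456/",
    "https://www.example.com/de/insurance/car/quote/",
]
groups = group_urls(urls)
print(sorted(groups))  # ['images', 'insurance', 'main']
```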
Your regional sites might also differ from each other. Perhaps your German and UK sites have a different design from the rest and are also much larger. That can make it difficult to structure the sitemaps the same way for every site, so you may choose a different structure for some of them.
Search engine spiders crawl the code of a web page from top to bottom, and content or links near the beginning of the code can be valued more than those at the very bottom. The order in which URLs are listed in an XML sitemap, however, makes absolutely no difference to their importance: the last URL in a sitemap listing 50,000 URLs is just as important as the very first one. The exception is the optional <priority> tag, which takes values from 0.0 to 1.0 (0.5 by default), typically with up to 2 decimal places. So, if a URL has a priority of 1, it is in theory more important than a URL with a priority of 0.35. Bear in mind that this prioritisation is only relative to the other URLs on your own site, and it has no impact whatsoever on organic rankings.
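In the sitemap itself, `<priority>` is an optional child of `<url>`. This sketch adds it only when a value is given and rejects values outside 0.0 to 1.0. The two-decimal formatting follows the convention mentioned above and is a choice, not a protocol requirement.

```python
import xml.etree.ElementTree as ET


def url_entry(loc, priority=None):
    """Build a <url> element; <priority> is optional and must lie in 0.0-1.0."""
    url_el = ET.Element("url")
    ET.SubElement(url_el, "loc").text = loc
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(url_el, "priority").text = f"{priority:.2f}"
    return url_el


print(ET.tostring(url_entry("https://www.example.com/", 1), encoding="unicode"))
# <url><loc>https://www.example.com/</loc><priority>1.00</priority></url>
```

Omitting the tag, as in `url_entry("https://www.example.com/de/")`, leaves the URL at the protocol's default of 0.5.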
Types of XML sitemaps
The standard type of sitemap can list URLs of HTML pages and PDF files. If you have pages with useful videos or images, you can also create Video or Image sitemaps. This is another aspect that can determine how you structure your sitemaps.
If you have a set of pages containing videos, you can use additional tags for each video to feed more information to a search engine, which can then use it in universal search results. These include a title and description for the video, a thumbnail image, and either the location of the video player or a direct link to the video file. A different set of tags is available for images. Before you use these additional tags, remember to declare an additional namespace in your sitemap so that search engines understand the extra information you are providing. While you can simply add the namespace to your main sitemap and use the tags selectively for URLs containing videos or images, it is common to move these URLs to a separate sitemap purely for video or image content. Remember that there is also an additional namespace if you are thinking about deploying your rel-alternate hreflang-x annotations in XML sitemaps.
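As a sketch of a separate video sitemap, the code below declares Google's video namespace alongside the standard one and adds the video tags described above. The page, thumbnail and video URLs are hypothetical. `content_loc` is the direct link to the video file, and Google also accepts `player_loc` for the player's URL instead.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Serialise the sitemap namespace as the default and the video one with a "video:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def video_sitemap(entries):
    """Build a separate video sitemap; each entry describes one page with one video."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = entry["page"]
        video = ET.SubElement(url_el, f"{{{VIDEO_NS}}}video")
        for tag in ("thumbnail_loc", "title", "description", "content_loc"):
            ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = entry[tag]
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


# Hypothetical page and media URLs.
video_xml = video_sitemap([{
    "page": "https://www.example.com/de/videos/setup-guide/",
    "thumbnail_loc": "https://www.example.com/thumbs/setup-guide.jpg",
    "title": "Setup guide",
    "description": "A short walkthrough of the initial setup.",
    "content_loc": "https://www.example.com/media/setup-guide.mp4",
}])
print(video_xml)
```

For hreflang annotations, the extra namespace is `http://www.w3.org/1999/xhtml`. Each alternate version is added to a `<url>` as an `<xhtml:link rel="alternate" hreflang="..." href="..."/>` element.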
XML sitemaps for SEO
By now, you are probably wondering whether all of this sitemap structuring matters for SEO. I will not lie to you and tell you it does, because it does not. Whether you have one huge sitemap with all your URLs listed randomly or several sitemaps with the URLs sorted neatly will have no impact on SEO, assuming, of course, that the sitemaps in both scenarios are free of errors and adhere to the sitemap standards. But organising them well makes them easier to manage: if you need to remove or update some URLs, you will quickly know where to find them. So, I suppose the only commonly adopted standard is to keep your sitemaps well organised, but do not overdo it and end up with dozens of sitemaps. Search engines will find their way around them regardless, but you should be able to as well.
Latest posts by Arkadiusz Kostrzycki
- Three Things You Can Do To Implement Effective Geo-Targeting - September 7, 2017
- How to structure XML sitemaps for global sites - November 27, 2014
- Is rel-alternate hreflang-x A Geo-Targeting Factor? - November 20, 2014