You may have heard that a website should have two types of sitemaps: a sitemap page and an XML sitemap. The sitemap page is said to be for visitors and the XML sitemap for search engines. The XML sitemap is indeed for search engines; the sitemap page, however, is technically for both. Visitors may find the page useful because it presents links to all the important pages on your site as a simple list. Search engines, in turn, get keyword-rich links to the valuable pages on your site.
There are common guidelines for structuring sitemap pages and they are rather straightforward. What about structuring XML sitemaps – especially for larger multilingual or multi-regional sites? While there are certain rules to follow to ensure search engines can process your XML sitemaps correctly, it seems there are no common guidelines for structuring them.
Global sites
There are two fundamental domain strategy approaches when it comes to global sites. There are sites which use ccTLDs and those which use a common domain to host all local sites. The strategy defines how you can approach structuring XML sitemaps.
For the ccTLD strategy you inevitably need a separate XML sitemap for each site (domain). This is because you generally can list URLs only from the domain on which the sitemap is hosted. Technically, there is a solution for cross-submitting URLs from different hosts but this feature is rarely used. You can read about it in the Sitemap Protocol.
For common generic domain setups there are a number of ways to structure your XML sitemaps. If individual country/language sites are hosted on a .com domain in subfolders, you can list URLs from all sites in one sitemap. This approach is best for small sites. For instance, if there are 12 regional sites, each in a subfolder such as .com/de/ or .com/fr/ and each consisting of 50 pages, you can list all 600 URLs in one XML sitemap. It will be a simple setup.
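As a minimal sketch of the single-sitemap setup, the snippet below builds one sitemap for 12 regional subfolders of 50 pages each. The domain, the region codes and the page URLs are made up for illustration; only the urlset/url/loc structure and the namespace come from the Sitemap Protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Return a sitemap <urlset> document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site: 12 regional subfolders with 50 pages each.
regions = ["de", "fr", "es", "it", "nl", "pl", "se", "dk", "pt", "at", "be", "ie"]
urls = [
    f"https://www.example.com/{region}/page-{n}/"
    for region in regions
    for n in range(1, 51)
]

sitemap_xml = build_urlset(urls)  # one document listing all 600 URLs
```

In practice you would write the string to a file such as sitemap.xml in the root of the domain, preceded by an XML declaration.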
For larger sites using a common domain setup it is typically best to keep the URLs of each local site in a separate sitemap. So, using the above example, we would end up with 12 XML sitemaps. Why? If each regional site were to have 12,000 URLs to list, we would not be able to fit all 144,000 URLs in one sitemap because of the limit of 50,000 URLs per sitemap. Sure, you could split the URLs across 3 sitemaps, but to keep things well organised it is preferable to have a separate sitemap for each site.
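The arithmetic behind this can be sketched in a few lines. The helper names are hypothetical; the only fixed value is the protocol limit of 50,000 URLs per sitemap file.

```python
import math

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the Sitemap Protocol


def sitemaps_needed(url_count, limit=MAX_URLS_PER_SITEMAP):
    """Smallest number of sitemap files that can hold url_count URLs."""
    return math.ceil(url_count / limit)


def chunk(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive lists of at most `limit` items."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


combined = sitemaps_needed(12 * 12_000)  # all 144,000 URLs lumped together
per_site = sitemaps_needed(12_000)       # one regional site on its own
```

Lumping everything together needs 3 files, while each regional site fits comfortably in a single file of its own, which is why 12 per-site sitemaps are the tidier choice here.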
The sitemap index
To tidy things up more, you would create another sitemap which lists the individual sitemaps. This is called a Sitemap Index and you can reference it in your robots.txt file. Otherwise, you would have to reference 12 sitemap files separately. Sitemap Index files are more convenient when submitting sitemaps to search engines as well.
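A Sitemap Index has the same shape as a regular sitemap, except that it uses sitemapindex/sitemap elements instead of urlset/url. The sketch below assumes three regional sitemaps in the root of a made-up domain; the file names are illustrative only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document listing the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical regional sitemaps hosted in the root of the domain.
regions = ["de", "fr", "uk"]
index_xml = build_sitemap_index(
    [f"https://www.example.com/sitemap-{region}.xml" for region in regions]
)

# A single robots.txt line then points crawlers at the index (file name assumed).
robots_line = "Sitemap: https://www.example.com/sitemap-index.xml"
```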
Sitemaps for large sites
But what if each site has a quarter of a million pages? We could not even fit one site’s URLs in one sitemap. Do we just list the first 50k URLs in one sitemap, the next 50k in another and so on? Yes, you can do that but it is better to group your pages. Is there a commonly adopted standard for grouping pages in sitemaps? Yes and no. It seems best to simply group pages by following your site’s structure. For that reason the answer is – yes. But since each site is structured differently the answer is also – no – or to be less accurate – it depends.
Probably, the only thing about structures that all sites have in common is that they have a set of top-level, category pages. These – along with your corporate pages – would be listed in a “main” sitemap. As for the rest, that depends on the site. A stock photo site might list all of its stock image pages in a separate sitemap. If your site is structured by “Line of Business” you can group pages this way and have a sitemap per LOB.
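One possible way to group pages by site structure is to bucket URLs by their first path segment. This is only a sketch under a simplifying assumption: the homepage and any page one level deep (category and corporate pages) go into the "main" sitemap, and everything deeper goes into a sitemap named after its section. The URLs and section names are invented.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Bucket URLs into 'main' (top-level pages) or their first path segment."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if len(segments) <= 1:
            # Homepage, category pages and corporate pages.
            groups["main"].append(url)
        else:
            groups[segments[0]].append(url)
    return dict(groups)


# Hypothetical stock photo site.
urls = [
    "https://www.example.com/",
    "https://www.example.com/about/",
    "https://www.example.com/photos/",
    "https://www.example.com/photos/sunset-123/",
    "https://www.example.com/photos/beach-456/",
    "https://www.example.com/vectors/icon-789/",
]
groups = group_by_section(urls)
```

Each key in groups would then become its own sitemap file. A real site may need different rules, for example grouping by Line of Business.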
Your regional sites might differ from each other as well. Perhaps you have a different design of the .de/ and .co.uk/ sites which are also much larger than others. Structuring sitemaps the same way for all sites might be difficult for this reason so you may choose a different structure.
Prioritising URLs
Search engine spiders crawl the code of web pages from top to bottom, and content or links near the beginning of the code can be valued more than those at the very bottom. XML sitemaps work differently: the order in which URLs are listed makes absolutely no difference to their importance. The last URL in a sitemap listing 50,000 URLs is just as important as the very first one, unless you use the optional <priority> element, which takes values from 0.0 to 1.0, typically written with up to 2 decimal places. So, if a URL has a priority of 1.0, it is in theory more important than a URL with a priority of 0.35. Bear in mind that this prioritising is only relative to the other URLs you list for your own site. And it has no impact whatsoever on organic rankings.
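To illustrate, here is a minimal sketch that writes a priority value for each URL and rejects values outside the allowed range. The function name and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset_with_priority(pages):
    """Build a sitemap from (url, priority) pairs; priority must be 0.0 to 1.0."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range for {url}: {priority}")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.2f}"
    return ET.tostring(urlset, encoding="unicode")


priority_xml = build_urlset_with_priority([
    ("https://www.example.com/", 1.0),
    ("https://www.example.com/archive/2009/", 0.35),
])
```

The listing order of the two pairs is irrelevant; only the priority values carry any relative weight.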
Types of XML sitemaps
The standard type of sitemap can submit URLs of HTML pages and PDF files. If you have pages with useful videos or images you can create Video or Image sitemaps. This is another aspect which can determine how you structure your sitemaps.
If you have a set of pages containing videos, you can use additional tags for video URLs to feed more information to a search engine and have it use this information in universal search results. These include a title and description for the video, a thumbnail image and the location of the video (the player URL or a direct link to the video file). A different set of tags is available for images. Before you make use of these additional tags, remember to declare the corresponding namespace in your sitemap so that a search engine will understand the extra information you are providing. While you can just add the namespace to your main sitemap and use the tags selectively for URLs containing videos or images, it is common to move these URLs to a separate sitemap purely for video or image content. Remember that there is also an additional namespace to declare if you are thinking about deploying your rel-alternate hreflang-x annotations in XML sitemaps.
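As an example of declaring an extra namespace, the sketch below adds rel-alternate hreflang annotations to a sitemap using the XHTML namespace, following the format Google documents for hreflang in sitemaps as far as I understand it. The URLs and language codes are invented. Video and image extensions work along the same lines with their own namespaces and tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialise sitemap elements without a prefix and XHTML elements as xhtml:...
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_urlset(page_groups):
    """page_groups: list of dicts mapping hreflang code -> equivalent page URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for alternates in page_groups:
        # Every version gets its own <url> entry listing all versions,
        # itself included, so that the references are bi-directional.
        for url in alternates.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            for lang, alt_url in alternates.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    rel="alternate",
                    hreflang=lang,
                    href=alt_url,
                )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pair of equivalent pricing pages.
pages = [
    {
        "en-gb": "https://www.example.com/uk/pricing/",
        "de-de": "https://www.example.com/de/preise/",
    },
]
hreflang_xml = build_hreflang_urlset(pages)
```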
XML sitemaps for SEO
By now, you are probably wondering whether all of this sitemap structuring matters for SEO. I will not lie to you and tell you it does, because it does not. Whether you have one huge sitemap with all your URLs listed randomly or several sitemaps sorted neatly will have no impact on SEO, assuming of course that the sitemaps in both scenarios are free of errors and adhere to sitemap standards. But having them well organised helps you manage them more easily. If you need to remove or update some URLs, you will quickly know where to find them in an organised structure. So, I suppose the only commonly adopted standard is to have your sitemaps well organised, but make sure you do not overdo it and organise yourself into having dozens of sitemaps. Search engines will find their way around them regardless, but you should too.
Arkadiusz Kostrzycki
How to structure XML sitemaps for global sites - November 27, 2014
thank you for sharing
Hi,
Thanks for the brilliant article.
I hope I get an answer from you for my problem.
Ours is a B2B marketplace platform.
We have created subdirectories for each country, for example http://www.mywebsite.com/in for India, and so on. But the home page is global.
All URLs of category and product pages pertaining to India are in the India folder in the sitemap. Now, to keep the product name closer to the domain name, we have created individual URLs for main and sub categories. For example: main category Printing Machines and sub category Bag Printing Machine. We have 2 separate URLs in the sitemap: mywebsite.com/printing-machines and mywebsite.com/bag-printing-machine.
How do we make Googlebot understand that Bag Printing Machine is part of the "Printing Machines" category?
Hi Arkadiusz,
Thanks for this post, it covers nearly all of my questions.
I do have one more question:
I have a subfolder set up whereby the root domain (example.com) is just a language picker and set to x-default.
There are no other pages on the root domain.
The countries are then set up as sub directories:
/en-gb/, /en-us/ etc.
Currently the root domain has a sitemap that only contains one page (the homepage itself).
The subdirectories then contain their own sitemaps.
Is this an acceptable set up? The plugin I am currently using (Yoast) does not offer the option to have one single sitemap_index.xml for the root domain, thus I have the set up described above.
Hi,
There are two options.
The first one is to create a sitemap listing URLs from all the subfolders, including pages from the root .com/. If the total number of URLs exceeds 50,000, it needs splitting into multiple sitemaps referenced in a sitemap index. The sitemap (or multiple sitemaps), including the sitemap index, has to be in the root folder of .com/
The second option is to have separate sitemaps per subfolder e.g. sitemap-uk.xml for /uk/ pages. Each sitemap could be placed in the subfolder e.g. .com/uk/sitemap-uk.xml. This is good for debugging, because any URL listed outside of /uk/ subfolder would produce a sitemap error, making it easier to spot these errors. Placing all sitemaps in the root is also fine e.g. .com/sitemap-uk.xml, but will not offer the debugging option. The sitemap index would list all sitemaps and needs uploading to the root of .com/
This second option offers better XML sitemap reporting e.g. in Google Search Console where you can then see pages submitted vs. indexed as well as any errors or warnings – individually per sitemap. It’s a more flexible solution which I would recommend for the sample sitemap setup.
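To show what that debugging check looks like, here is a rough sketch that flags URLs which a sitemap in a given location should not list. It assumes the usual protocol rule that a sitemap may only cover URLs on the same host at or below its own folder; the function name and URLs are hypothetical.

```python
from urllib.parse import urlparse


def out_of_scope(sitemap_url, urls):
    """Return the URLs that fall outside the folder the sitemap is hosted in."""
    sitemap = urlparse(sitemap_url)
    # "/uk/sitemap-uk.xml" -> "/uk/", "/sitemap-uk.xml" -> "/"
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    rejected = []
    for url in urls:
        page = urlparse(url)
        same_host = (page.scheme, page.netloc) == (sitemap.scheme, sitemap.netloc)
        if not (same_host and page.path.startswith(base_dir)):
            rejected.append(url)
    return rejected


urls = [
    "https://www.example.com/uk/products/",
    "https://www.example.com/fr/produits/",
]
in_subfolder = out_of_scope("https://www.example.com/uk/sitemap-uk.xml", urls)
in_root = out_of_scope("https://www.example.com/sitemap-uk.xml", urls)
```

With the sitemap inside /uk/, the stray /fr/ URL is flagged; with the same sitemap in the root, nothing is flagged, so the mistake would go unnoticed.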
Hope this helps.
Arkadiusz
Sample website URL: http://www.website.com
Subfolder URLs: http://www.website.com/can/
http://www.website.com/aus/
http://www.website.com/uk/
How should the XML sitemaps be set up?
http://www.website.com/can/sitemap.xml
http://www.website.com/aus/sitemap.xml
http://www.website.com/uk/sitemap.xml
And keep all of the above XML sitemaps in an index XML sitemap:
http://www.website.com/sitemap.xml
Would you create a sitemap for each country (.com/CountryCode/Language/) or would you not just use hreflangs?
Hi Stu,
That's an interesting question. So, let's say you create a sitemap for the US site and have Google discover pages of the other regional sites by crawling hreflang-x links in the HTML of those US pages. You could only do this in a limited set of situations. If your regional sites were different from each other, you couldn't really use hreflang-x because it is meant to mark up equivalent pages. If you deployed hreflang-x in XML (usually preferred), you would need to list all regional URLs anyway to have bi-directional references, whether in one sitemap if everything is on the .com or in multiple sitemaps if you have different domains. And finally, if you're not geo-targeting your sites, you wouldn't use hreflang-x at all. Aside from these scenarios, there are other fundamental reasons why I wouldn't use hreflang-x instead of regional sitemaps: 1) only Google and Yandex support hreflang-x, while XML sitemaps are widely adopted, so you would be neglecting other search engines; 2) XML sitemaps have been around for years and will continue to be, while we don't really know how long hreflang-x will be available; 3) it's faster and easier to create standard XML sitemaps per site than it is to deploy hreflang-x, so you would just create more work for yourself and complicate things.