Fun with Google’s own Sitemap

Lunar Site Map

Lunar "Site Map"

Now that Google has branched out into so many products beyond its search engine, some aspects of SEO are, ironically, important to Google itself. It may surprise some that Google has its own sitemap, located at http://www.google.com/sitemap.xml, so that search engine spiders can properly index its empire.

It’s fun to check this out from time to time and see what Google is up to. Surprisingly, there is a TON of junk in there: Firefox plug-ins that no longer exist, products that Google has discontinued, individual entries calling out various JPG images (which seems really odd), and so on. In fact, a number of the URLs are simply redirects to other URLs already listed in the sitemap, or in some cases point to pages that no longer exist.
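For anyone who wants to repeat this kind of check, here is a minimal Python sketch of how it could be automated. It is not how I went through the file (I just browsed it), and it rests on a few assumptions: that the sitemap uses the sitemaps.org 0.9 namespace, and that a changed final URL after a HEAD request is a good enough signal for a redirect. The helper names `extract_locs` and `classify` are made up for illustration.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol (assumed; older sitemaps may differ).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml):
    """Return every <loc> URL found in a sitemap document, in order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def classify(url, timeout=10):
    """Hypothetical helper: label a URL as 'ok', 'redirect', or 'dead'."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen follows redirects, so a different final URL
            # means the listed URL was redirected somewhere else.
            return "redirect" if response.geturl() != url else "ok"
    except urllib.error.URLError:
        # HTTPError (404, 410, ...) is a subclass of URLError.
        return "dead"
```

Feeding the downloaded sitemap text to `extract_locs` and then running `classify` over the result would give a rough tally of the redirects and dead pages. Some servers answer HEAD requests differently from GET, so treat the labels as a first pass rather than a verdict.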

One gets the sense that Google has not seriously reviewed its sitemap in some time – a thorough review and a proposal for reorganizing it into subfiles would be a great project for an intern for a week or two.

It used to be that Google had old links to interesting, fun, and possibly useful things in there (such as a very light mobile search interface that was kind of neat), but they must have weeded most of them out at some point. Here are a few mildly interesting links buried in there:

Some kind of Mother’s Day tribute to Googlers’ moms, which I must have missed when it originally came out: http://www.google.com/moms/

This appears to be the standard Google interface, but I believe it’s served from some alternate datacenter. It feels faster to me, and it would be worth having someone knowledgeable run some speed tests on this versus the main site:
http://www.google.com/webhp

Google in Klingonese:
http://www.google.com/webhp?hl=xx-klingon

In researching this article, I found shockingly little information from Google itself on sitemap best practices. They have some scattered advice on the basics, but any more advanced advice seems to come from others.

Here are some best-practice violations I believe Google is guilty of with its own sitemap:

  1. Google lists URLs for numerous pages that no longer exist.
  2. There are huge runs of URLs that could probably be better managed in separate sitemaps – it’s quite surprising that Google’s sitemap is not a sitemap index file that simply references other sitemaps.
  3. It contains many URLs that appear to be for different language versions of Google – also probably more easily managed in separate sitemap files.
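To make points 2 and 3 a bit more concrete, here is a rough Python sketch of the reorganization I have in mind: group the URLs into sections, write one sitemap per section, and have the top-level file be a sitemap index pointing at them. Grouping by the first path segment is just one arbitrary choice (language versions could be grouped by the `hl` parameter instead), and the function names and the `example.com` file names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_by_section(urls):
    """Group URLs by their first path segment ('' for the site root)."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        section = path.strip("/").split("/")[0]
        groups[section].append(url)
    return dict(groups)


def build_sitemap_index(child_sitemap_urls):
    """Return a sitemap index document that references the child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=NS)
    for child in child_sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = child
    return ET.tostring(root, encoding="unicode")


# Usage: one child sitemap per section, listed in a single index file.
urls = ["http://www.google.com/moms/", "http://www.google.com/webhp"]
children = [
    "http://www.example.com/sitemap-%s.xml" % section
    for section in group_by_section(urls)
]
index_xml = build_sitemap_index(children)
```

Each child file would then hold only the URLs for its section, which makes it far easier to spot a whole group of discontinued products and drop it in one go.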

Any other opinions on this?
