Tuesday, September 20, 2005

Canonical domain URLs: www and non-www

I recently saw a drop in traffic (and sales) on one of my main sites. Some of that was to be expected, since the recent changeover from an Access backend to SQL meant some downtime. Unfortunately, the downtime came at the worst possible moment: the world's largest search engine, Google, which also happens to be the site's main source of traffic, was crawling it at the time. Basically, I botched the migration because I didn't realize the whole site would be physically moved to another machine as part of the upgrade. I saw the new option the ISP had given me to create an SQL database, so I proceeded to set everything up and migrate. Of course, the whole thing fell over once the hosting company swapped machines, because the machine name (and probably several other settings) changed. I also had a few dramas getting the product data imported into the SQL database: I had made some modifications to the Access database and had to change the SQL database to match.
All told, that meant 24 hours of downtime, or partial downtime, since the static pages were still working and the site still functioned as a product catalogue.

Now, on to the point of this post. Webmasters may already know that the domain name www.site.com is different from site.com. Owning a domain secures both names, but they are technically two different hostnames. Normally both simply point to the same place, such as the IP address of a web server, so the same content is returned whether the user enters www.site.com or site.com. Note also that the URL in the address bar does not change when the page loads: this is not a redirect but an alias.
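To make the alias case concrete, here is a minimal Python sketch. It assumes a WSGI application, and the function names `site` and `fetch` and the hostnames are hypothetical; the app is called directly rather than over the network. Because the app never looks at the Host header, both names get an identical 200 response with no Location header, which is what a crawler can see as two copies of the same page.

```python
def site(environ, start_response):
    """Hypothetical site served identically under any hostname (an alias)."""
    # The Host header is never consulted, so www.mysite.com and mysite.com
    # both receive the same 200 response and the same body.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Home page</body></html>"]


def fetch(host, path="/"):
    """Call the WSGI app directly, as if a browser requested http://host/path."""
    result = {}

    def start_response(status, headers):
        result["status"] = status
        result["headers"] = dict(headers)

    body = b"".join(site({"HTTP_HOST": host, "PATH_INFO": path}, start_response))
    return result["status"], result["headers"], body


for host in ("www.mysite.com", "mysite.com"):
    status, headers, body = fetch(host)
    # No Location header: the browser's address bar keeps whatever was typed.
    print(host, status, "Location" in headers)
```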
Google, and probably other search engines, can treat this as two domains with duplicate content. The usual fix is to have the non-www domain issue a 301 (permanent) redirect to the www domain, which changes the address in the browser and points search engine bots to the correct address. Because my hosting company cannot redirect the non-www name separately, I used an ASP solution: I had them give default.asp priority over default.htm in the default-page order, and I made default.asp redirect everything to http://www.mysite.com/default.htm.
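The fix above was a classic ASP default.asp page; the same idea can be sketched in Python as WSGI middleware. This is a minimal sketch, not the code used on the site: the middleware name `canonical_host`, the stand-in app `site`, the hostname `www.mysite.com`, and the use of WSGI at all are assumptions for illustration. It checks the Host header and, for any host other than the canonical one, answers with a 301 whose Location points at the same path (and query string) on the www name, rather than always sending visitors to /default.htm.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.mysite.com"  # hypothetical canonical hostname


def canonical_host(app, canonical=CANONICAL_HOST):
    """Wrap a WSGI app so requests to any other host get a 301 to the canonical one."""

    def middleware(environ, start_response):
        # HTTP_HOST may carry a port (e.g. "mysite.com:80"); compare the bare name.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical:
            path = quote(environ.get("PATH_INFO") or "/")
            query = environ.get("QUERY_STRING", "")
            location = f"http://{canonical}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        # Already on the canonical host (or no Host header): serve the page normally.
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Hypothetical stand-in for the real site: returns the same page for every path."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Product catalogue</body></html>"]


application = canonical_host(site)
```

Unlike a default.asp page, which only runs for requests to the site root, middleware like this sits in front of every request, so deep links on the bare domain are redirected too. It still only works while the code is actually deployed and running on the server.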
This worked well and quickly got rid of the duplicate home pages on Google (this was about a year ago). During this changeover, however, it turns out the ASP redirect was down for a few hours, because the default page priority had not been set on the new server, and during that window Googlebot crawled the root of my site and indexed the duplicate pages. It could hardly have been worse timing. So I have cleaned the site up, redone all the sitemaps, and sent a note to Google asking them to remove the duplicate entry that has appeared; it shows up in the index as mysite.com/. Now I just have to hope it all sorts itself out.
