My team builds a fair number of community sites, including Channel 8 (for Students), TechNet Edge (for IT Pros), Channel 10 (for enthusiasts, power users and gamers), Mix Online (for web developers and designers) and the original site… Channel 9 (aimed mostly at developers)… and we’ve recently started putting out sites on a new code base. One of the changes in that new code base was a move to an AJAX-style interface for viewing lists of posts on the page. We like the way this works for paging through lists of entries, comments, etc… but we have known from the beginning that it was going to cause us some trouble in the world of search engines and other crawlers. Without JavaScript, very little was being output onto the page, and what was there was mostly navigational chrome. Taking a look at Google’s cache of TechNet Edge from a few days ago gives this:

[not much to see without script](http://duncanmackenzie.net/images/d105b1c6-c386-44b0-ad06-abb5dfc2d260.jpg)

Checking how your page appears in the cache of Google or Live is one way to see how you look to crawlers, but it doesn’t work well when you are making changes or running in development. One handy alternative is to check your site using Lynx, as Joshua mentions in this post on Mix Online.
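For example, dumping a page through Lynx shows roughly what a text-only crawler gets (the URL here is just a placeholder, swap in whatever page you want to check):

```
lynx -dump http://www.example.com/
```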

The content on the site was ending up in search engine indexes anyway, by virtue of RSS feeds and incoming links… but the value of your site to crawlers is going to be much lower than it should be if they don’t see any content when they visit. As I said earlier… we always knew this would be a problem, but I guess we just didn’t get around to fixing it before pushing out a full three sites using AJAX-based paging. Last week I had a meeting with an SEO consultant, and they pointed out the exact issue I’ve been describing. Well… given a long weekend… and no interest in working on my actual planned tasks… I decided to implement two features to improve how our sites appear to crawlers.

First, I added some code that swaps out our fancy AJAX entry list for a simple ASP.NET Repeater if the browser doesn’t appear to be one supported by Microsoft Atlas, making our site usable in other browsers (Atlas covers the bulk of our users, but not all of them) and also making our content visible to crawlers. So far I only output the first page of any given entry list, but that takes the results from blank to this:

[current cached version from Live.com](http://duncanmackenzie.net/images/15f14fa3-9d40-457d-ae10-b478d6afefc2.png)
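The switching logic itself doesn’t have to be anything fancy. Here’s a rough sketch of the kind of check I mean; the control names, the `Entry` class, the `GetEntries` call and the exact capability test are placeholders rather than our actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Web.UI;
using System.Web.UI.WebControls;

// Hypothetical code-behind sketch; the control names, Entry class and
// GetEntries() call are stand-ins, not the real site code.
public partial class EntryListPage : Page
{
    protected PlaceHolder ajaxEntryList;   // wraps the script-driven (Atlas) list
    protected Repeater fallbackRepeater;   // plain server-rendered list

    protected void Page_Load(object sender, EventArgs e)
    {
        // Rough capability check: does the browser report the script and DOM
        // support that Atlas relies on? Crawlers and downlevel browsers won't.
        bool supportsAtlas = Request.Browser.EcmaScriptVersion.Major >= 1
                          && Request.Browser.W3CDomVersion.Major >= 1;

        ajaxEntryList.Visible = supportsAtlas;
        fallbackRepeater.Visible = !supportsAtlas;

        if (!supportsAtlas)
        {
            // Only the first page of entries for now, but that is enough
            // for a crawler to find real content on the page.
            fallbackRepeater.DataSource = GetEntries(1, 10);
            fallbackRepeater.DataBind();
        }
    }

    // Stand-in for the site's data access layer.
    private List<Entry> GetEntries(int page, int pageSize)
    {
        return new List<Entry>();
    }
}

public class Entry
{
    public string Title;
    public string Url;
}
```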

Next, I added an XML sitemap, following the spec from sitemaps.org, by outputting a sitemap index at <http:///sitemapindex.ashx> and then a series of sitemaps (by page #) from http:///sitemap.ashx?page= (see Mix’s sitemap index and sitemap as an example). Finally, I put a link to the sitemap index into the robots.txt file for each site.
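If you’re curious what a handler like that looks like, here’s a hypothetical sketch of the per-page sitemap handler; `GetEntryUrls`, the page size and the rest of the details are assumptions, not the code running on our sites. The index handler is similar, just emitting `<sitemapindex>`/`<sitemap>`/`<loc>` elements pointing at each page of the sitemap:

```csharp
using System;
using System.Web;
using System.Xml;

// Hypothetical sketch of a sitemap.ashx-style handler; GetEntryUrls() and
// the page size are stand-ins, not the actual implementation.
public class SitemapHandler : IHttpHandler
{
    private const int PageSize = 1000; // sitemaps.org allows up to 50,000 URLs per file

    public void ProcessRequest(HttpContext context)
    {
        int page;
        if (!int.TryParse(context.Request.QueryString["page"], out page))
            page = 1;

        context.Response.ContentType = "text/xml";
        const string ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

        using (XmlWriter writer = XmlWriter.Create(context.Response.Output))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", ns);

            // One page worth of entry URLs, pulled from the site's content store.
            foreach (string url in GetEntryUrls(page, PageSize))
            {
                writer.WriteStartElement("url", ns);
                writer.WriteElementString("loc", ns, url);
                writer.WriteEndElement();
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }

    public bool IsReusable { get { return true; } }

    // Stand-in for the real data access call.
    private string[] GetEntryUrls(int page, int pageSize)
    {
        return new string[0];
    }
}
```

The robots.txt piece is just a single `Sitemap:` line pointing at the sitemap index URL, so crawlers can find it on their own.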

Between the two, I’m hoping our content will get indexed better by a variety of search engines, resulting in more people finding us when searching for relevant topics. These changes also make us a little more usable for some users, but that is another area where we need to do a lot more work. If these changes improve our accessibility, that’s great, but I’d hate to even suggest that they get us anywhere near our goals in that area.