A bug in my RSS generator, but is it really invalid?

The RSS generator for MSDN (the creator of this feed and many more) has a small problem. Way upstream, when people inside the company enter information about an upcoming headline, they can specify a URL to a download. The intent was for this to be the URL of an actual downloadable file, so when I generate an RSS item from that headline entry, I take that URL and turn it into an enclosure element in the RSS file.

<item>
  <title>Read about Atlas - Ajax for ASP.NET</title>
  <description>ASP.NET "Atlas" is a package of new Web development technologies that integrates an extensive set of client script libraries with the rich, server-based development platform of ASP.NET 2.0.</description>
  <link>http://msdn.microsoft.com/asp.net/future/</link>
  <dc:creator>Microsoft Corporation</dc:creator>
  <category domain="msdndomain:ContentType">Link</category>
  <category domain="msdndomain:Audience">Developers</category>
  <category domain="msdndomain:Hardware">CPU</category>
  <category domain="msdndomain:Operating Systems">Windows</category>
  <category domain="msdndomain:Subject">Web development</category>
  <msdn:headlineImage />
  <msdn:headlineIcon>http://msdn.microsoft.com/msdn-online/shared/graphics/icons/offsite.gif</msdn:headlineIcon>
  <msdn:contentType>Link</msdn:contentType>
  <msdn:simpleDate>Sep 19</msdn:simpleDate>
  <enclosure url="http://go.microsoft.com/fwlink/?LinkId=52384"
    length="17437"
    type="text/html; charset=utf-8" />
  <guid isPermaLink="false">Titan_2519</guid>
  <pubDate>Mon, 19 Sep 2005 18:20:40 GMT</pubDate>
</item>

This generally works fine: I make a HEAD request with that URL, which gives me back the MIME type and the content length, both of which are needed for the enclosure element in the RSS item. Sometimes, though, people put in a URL to the download’s landing page instead of the download itself. There are good reasons for this, as the download page often contains useful information and/or multiple localized versions of the download, but it was not what I expected. In this case, I create the enclosure with the MIME type returned by that URL, which ends up being ‘text/html’, and with a byte size that reflects the size of the landing page.
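For anyone wanting to try the same approach, here is a minimal Python sketch of the HEAD-request step. It is not the actual MSDN generator (whose code isn't shown here); the names `head_info`, `looks_like_landing_page`, and `enclosure_element` are hypothetical, and treating an HTML response as a landing page is an assumed heuristic, not something the generator is known to do.

```python
import urllib.request
from xml.sax.saxutils import quoteattr


def head_info(url, timeout=10):
    """Send an HTTP HEAD request; return (content_type, content_length or None)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "application/octet-stream")
        length = response.headers.get("Content-Length")
        # RSS requires a length on <enclosure>; None means the server didn't say.
        return content_type, int(length) if length is not None else None


def looks_like_landing_page(content_type):
    """Assumed heuristic: an HTML response is probably a landing page, not the file."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def enclosure_element(url, content_type, length):
    """Build an RSS <enclosure> element with properly escaped attribute values."""
    return "<enclosure url={} length={} type={} />".format(
        quoteattr(url), quoteattr(str(length)), quoteattr(content_type)
    )
```

Typical use would be to call `head_info(url)` and emit `enclosure_element(...)` only when `looks_like_landing_page(content_type)` is false, leaving landing-page URLs as ordinary links instead.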

This wasn’t really what I wanted to happen, so I need to figure out a solution on my end. But what I noticed today, and what has me a little puzzled, is that at least two different validators (here and here) report these entries as validation errors. The error they give is that text/html is not a valid MIME type… but according to the RFCs (see section 4.1.2 of this RFC) and other sources, it most certainly is a valid type. So, is there a hidden rule in RSS that enclosures have to fall within some special subset of MIME types, or are both of these validators broken? Sure, in this case it wasn’t really what I wanted, but what if I really did have a text/html document for you to download?
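To check the syntax side of the question, here is a simplified Python sketch of the RFC 2045 (section 5.1) `type "/" subtype *(";" parameter)` grammar. By that grammar both `text/html` and `text/html; charset=utf-8` are well-formed. The sketch ignores RFC 822 comments and folded whitespace, and it says nothing about which types a particular RSS validator chooses to accept; `is_syntactically_valid_mime` is a hypothetical name.

```python
import re

# RFC 2045 token: any printable US-ASCII character except SPACE and tspecials
# ( ) < > @ , ; : \ " / [ ] ? =
TOKEN = r"[!#$%&'*+\-.^_`{|}~0-9A-Za-z]+"
QUOTED_STRING = r'"(?:[^"\\\r\n]|\\.)*"'
PARAMETER = rf"\s*;\s*{TOKEN}=(?:{TOKEN}|{QUOTED_STRING})"
MEDIA_TYPE_RE = re.compile(rf"{TOKEN}/{TOKEN}(?:{PARAMETER})*\s*")


def is_syntactically_valid_mime(value):
    """True if value matches type "/" subtype followed by optional parameters."""
    return MEDIA_TYPE_RE.fullmatch(value) is not None
```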

VB Futures section up on MSDN…

Now, to me, VB 2005 is the “future”, and anything beyond that is really just coffee-break reading… but I guess I wouldn’t be a very good Microsoft person if I didn’t start pushing the version after next of our development tools before the next version has even shipped.

So, with that in mind, check out http://msdn.microsoft.com/vbasic/future/, which, despite my comments, is quite a good pile of info on post-Whidbey VB features and even includes a download that brings LINQ features to VB 2005. Hmm… ok, I guess with that download it seems a bit more ‘current’ to me… hmph…

Pulling from MSDN… the code…

(see this post for an introduction to this topic…)

I’ve wrapped my code up into a user control that you can place anywhere on your page. It handles loading the data, and then you can access its properties to output the HTML headers and body of the pulled content. I’ve just been using output caching on the host page, but caching the body/headers yourself would certainly work as well…

Here is an example of using the control on a bare bones page…

<%@ Page Language="VB" Debug="true" %>
<%@ OutputCache Duration="360" VaryByParam="*" %>
<%@ Register TagPrefix="dm" TagName="Pull" Src="Pull.ascx" %>
<dm:Pull id="pagePull" runat="server"
    QueryParam="pullURL"
    DefaultURL="http://msdn.microsoft.com" />
<html>
<head>
<%=pagePull.PageHeaders%>
</head>
<body>
<%=pagePull.PageBody%>
</body>
</html>

This simple page and the .ascx file are bundled up into a .zip file available here.
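The `OutputCache` directive in the page above keeps one rendered copy of the page per distinct set of request parameters (`VaryByParam="*"`) for 360 seconds (`Duration` is in seconds). The Python class below is only a conceptual sketch of that behavior, not how ASP.NET implements it; the class name and the injectable `clock` are assumptions made so the expiry can be tested.

```python
import time


class OutputCache:
    """Toy analog of <%@ OutputCache Duration="360" VaryByParam="*" %>."""

    def __init__(self, duration=360, clock=time.monotonic):
        self.duration = duration  # seconds, like the Duration attribute
        self.clock = clock
        self._entries = {}  # parameter key -> (expires_at, rendered html)

    def get_or_render(self, params, render):
        # VaryByParam="*": every parameter name/value pair is part of the key.
        key = tuple(sorted(params.items()))
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the cached page
        html = render(params)
        self._entries[key] = (now + self.duration, html)
        return html
```

With this model, two requests for the same `pullURL` within six minutes cost one pull from MSDN, while a different `pullURL` gets its own cached copy.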

More on “pulling” MSDN content into my site…

In my last post, I was talking about pulling my articles from MSDN into the chrome of my site. This type of system could be created using a frameset, but frames are evil, so that isn’t the approach I took. Instead, knowing a bit about the files on MSDN’s web servers, I took advantage of a special XML file that exists for most of our articles. This file is created as part of our publishing process so that we can pull articles into the chrome of our developer centers (like this). It isn’t a straight XHTML file, but it is almost identical to the HTML content of the article itself. Knowing that this file exists, my pull code just munges the original (MSDN) URL of the requested article to figure out the underlying XML file name, then loads that XML. Once I have the XML content, I do a bit of work on the elements to make all the relative links correctly point back to MSDN (for the images, related articles, links into the SDK, etc.) and then output HTML into a placeholder on my own page.
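The link-fixing step can be sketched in Python as follows. Since the behind-the-scenes file format isn’t shown here, this assumes it parses as well-formed XML and that links live in plain `href` and `src` attributes. Relative links are resolved against the article’s original MSDN URL, the same way a browser viewing that page would resolve them, and in-page `#anchors` are left alone. `absolutize_links` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def absolutize_links(xml_text, article_url):
    """Point relative href/src attributes back at the original MSDN article.

    Assumes xml_text is well-formed XML; relative links are resolved against
    article_url exactly as a browser viewing that page would resolve them.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for attribute in ("href", "src"):
            value = element.get(attribute)
            # Leave empty values and in-page anchors (#section) untouched.
            if value and not value.startswith("#"):
                element.set(attribute, urljoin(article_url, value))
    return ET.tostring(root, encoding="unicode")
```

Absolute links (to other sites, or already-absolute MSDN links) pass through unchanged, because `urljoin` returns an absolute URL as-is.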

Given a URL like this, you can remove the pull syntax (used by our developer centers) to come up with the library URL of this article, /library/en-us/dncodefun/html/code4fun12102003.asp, then apply a complex transform to produce the likely URL of the ‘behind-the-scenes’ XML file: /library/en-us/dncodefun/html/code4fun12102003.xml.
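The URL munging might look like this in Python. The `pull` query-parameter name is an assumption about the developer-center URL format (the example URL isn’t reproduced here), and `library_path` and `xml_path` are hypothetical names; the “complex transform” is simply swapping the `.asp` extension for `.xml`.

```python
from urllib.parse import parse_qs, urlsplit


def library_path(requested_url):
    """Return the /library/... path behind an MSDN article URL, or None.

    ASSUMPTION: developer-center pages carry the article path in a query
    parameter named "pull"; plain library URLs use their own path.
    """
    parts = urlsplit(requested_url)
    pulled = parse_qs(parts.query).get("pull")
    path = pulled[0] if pulled else parts.path
    return path if path.lower().startswith("/library/") else None


def xml_path(article_path):
    """The 'complex transform': swap a trailing .asp extension for .xml."""
    stem, dot, extension = article_path.rpartition(".")
    if not dot or extension.lower() != "asp":
        return None
    return stem + ".xml"
```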

It is possible that this XML file doesn’t exist, so it is important to handle that possibility in your code. In my case, if I can’t find the XML, I just redirect to the original MSDN URL. If the original URL doesn’t appear to be well-formed, I just give up completely and redirect to the home page of my own site.
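Those fallback rules can be sketched as a small Python function that returns either the XML to render or a redirect target. `resolve_pull`, `ALLOWED_HOSTS`, `HOME_PAGE`, and the `fetch_xml` callable (assumed to return `None` when the XML file doesn’t exist) are all hypothetical; passing the loader in keeps the decision logic testable without network access.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"msdn.microsoft.com"}  # hypothetical allow-list
HOME_PAGE = "/"  # hypothetical: the home page of my own site


def resolve_pull(article_url, fetch_xml):
    """Return ("content", xml_text) or ("redirect", target_url).

    fetch_xml(url) is a hypothetical loader that returns the XML text, or
    None when the behind-the-scenes file does not exist.
    """
    parts = urlsplit(article_url)
    well_formed = (
        parts.scheme in ("http", "https")
        and parts.hostname in ALLOWED_HOSTS
        and parts.path.lower().endswith(".asp")
    )
    if not well_formed:
        # Can't make sense of the URL at all: give up and go home.
        return ("redirect", HOME_PAGE)

    # Same path with .asp swapped for .xml, dropping any query or fragment.
    xml_url = parts._replace(path=parts.path[:-4] + ".xml", query="", fragment="").geturl()
    xml_text = fetch_xml(xml_url)
    if xml_text is None:
        # No XML twin for this article: send the reader to MSDN instead.
        return ("redirect", article_url)
    return ("content", xml_text)
```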

More details and code to follow…

[Listening to: Last Chance – Jet – Get Born (01:52)]

Doing uploads with BITS

I wrote a few articles on BITS in the past (creating a wrapper, then background copying, then digital grandma), but they were all about downloading files. Starting with BITS 1.5, you can also upload files… is that topic of interest to folks? Just FYI, you need web server support to make this work, as detailed here.

New Coding 4 Fun Article up…

Add a Quick Poll to Your Web Site

Summary: Duncan Mackenzie describes his process to build a “Quick Poll” using Visual Basic and ASP.NET.

Recent discussions have motivated me to add some ‘anti-repeat-voting’ code to this sample… I’ve finished up most of the changes, so grab the sample from the article if you are interested and then watch this space for more information on the additions!