Converting Web Slices to RSS


At the time of writing, IE8 is just around the corner. One of the new features in IE8 that draws a lot of attention, if we listen to the IE team, is Web Slices. I will not discuss whether this will be useful for IE users, but while looking into the technology I suddenly remembered an old xslt published by our friends over at the W3C: the xhtml to RSS xslt. It struck me that Web Slices can be of great help when converting a web page into an RSS or Atom feed.

What is a Web Slice?

Web Slices are not something Microsoft cooked up on its own. They are based on the hAtom microformat, which represents an Atom feed inside an HTML document by assigning a set of defined values to HTML attributes. Mostly it is a matter of setting predefined CSS class names and other metadata on the content that should end up in the feed.

A Web Slice looks something like this (taken from the Web Slices documentation):


<div class="hslice" id="auction-update"> 
    <h2 class="entry-title">Auction Item</h2> 
    <p class="entry-content">Current bid is $66</p> 
</div>  

The CSS class hslice on the div tag marks the start of a Web Slice. The class entry-title on the h2 tag defines the title of the Web Slice, and the class entry-content defines its content (or description).
The id on the same element as the hslice class is also essential, since the browser uses it to identify the Web Slice and find it on the page.

The class names that define the different parts of a Web Slice can be mixed with class names already used on the page, for example class="post hslice".
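To make these class-name rules concrete, here is a minimal Python sketch (standard library only, not part of the xslt further down) that picks the Web Slice out of the snippet above. It assumes well-formed, un-namespaced markup and treats the class attribute as a whitespace-separated list of tokens, so hslice still counts when it sits next to other class names. The helpers `has_class` and `read_slices` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def has_class(element, name):
    # The class attribute is a whitespace-separated list of tokens,
    # so "box hslice" carries the hslice class but "hslice-old" does not.
    return name in element.get("class", "").split()

def read_slices(markup):
    """Return (id, title, content) for every element with the hslice class."""
    root = ET.fromstring(markup)
    slices = []
    for node in root.iter():
        if not has_class(node, "hslice"):
            continue
        # entry-title and entry-content may sit on any element inside the slice.
        title = next((e for e in node.iter() if has_class(e, "entry-title")), None)
        content = next((e for e in node.iter() if has_class(e, "entry-content")), None)
        slices.append((
            node.get("id"),
            "".join(title.itertext()) if title is not None else "",
            "".join(content.itertext()) if content is not None else "",
        ))
    return slices

page = """<body>
<div class="box hslice" id="auction-update">
    <h2 class="entry-title">Auction Item</h2>
    <p class="entry-content">Current bid is $66</p>
</div>
</body>"""

print(read_slices(page))
# [('auction-update', 'Auction Item', 'Current bid is $66')]
```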

What is the W3C xhtml to RSS xslt?

The W3C xhtml to RSS xslt, also known as Site Summaries in xhtml, is an xslt that takes a valid xhtml document as input and transforms it into an RSS feed. It relies on a set of predefined "rules" that the xhtml document must follow.

In the W3C xslt, an RSS "item" is represented by the following markup in xhtml:


<div class="item"> 
    <h2>A title</h2> 
    <p>A description</p> 
</div> 

The W3C xhtml to RSS xslt relies on the semantics of the document. In this xhtml, the CSS class name item marks what should end up as an item in the RSS feed, the content of the h2 tag becomes the title of the RSS item, and the content of the p tag becomes its description.

Doesn't this look very similar to the structure of a Web Slice / the hAtom microformat?
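To see how little separates the two, here is a minimal Python sketch of the W3C rules as described above. It is not the W3C xslt itself: it assumes well-formed, un-namespaced markup, treats every div whose class list contains item as an RSS item, and takes the first h2 and first p inside it as title and description. `read_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def read_items(markup):
    """Map each <div class="item"> to a (title, description) pair."""
    root = ET.fromstring(markup)
    items = []
    for div in root.iter("div"):
        if "item" not in div.get("class", "").split():
            continue
        # The W3C rules rely on the element type: h2 is the title, p the description.
        title = div.find(".//h2")
        description = div.find(".//p")
        items.append((
            "".join(title.itertext()) if title is not None else "",
            "".join(description.itertext()) if description is not None else "",
        ))
    return items

page = """<body>
<div class="item">
    <h2>A title</h2>
    <p>A description</p>
</div>
</body>"""

print(read_items(page))
# [('A title', 'A description')]
```

Compared with the Web Slice case, only the selectors change: element types (h2, p) here versus class names (entry-title, entry-content) there.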

Converting

The key here is putting content into structure, and what I find fascinating about Web Slices is how they bring a kind of "hidden" structure to a web page. This structure can be used for more than displaying Web Slices, which (for now) only work in IE8 anyway. Here is a small xslt that transforms the Web Slices in a valid xhtml document into RSS 2.0:


<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:date="http://exslt.org/dates-and-times"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:atom="http://www.w3.org/2005/Atom"
    extension-element-prefixes="date"
    exclude-result-prefixes="xsl xs xhtml">

<xsl:output method="xml"
    encoding="UTF-8"
    omit-xml-declaration="no"
    indent="yes" />

<!-- Fetch variable passed on to xslt engine -->
<xsl:param name="baseUrl">http://www.trygve-lie.com/</xsl:param>

<!-- Convert xhtml document with WebSlice (hAtom) format to RSS2 -->
<xsl:template match="xhtml:html">
    <rss version="2.0">
        <channel>
            <atom:link rel="self" type="application/rss+xml">
                <xsl:attribute name="href"><xsl:value-of select="xhtml:head/xhtml:link[@type = 'application/rss+xml']/@href"/></xsl:attribute>
            </atom:link>
            <title><xsl:value-of select="xhtml:head/xhtml:title"/></title>
            <description><xsl:value-of select="xhtml:head/xhtml:meta[@name='description']/@content"/></description>
            <link><xsl:value-of select="$baseUrl" /></link>
            <ttl>120</ttl>
            <xsl:for-each select=".//*[contains(@class,'hslice')]">
                <xsl:sort select=".//*[contains(@class,'published')]/@title" order="descending"/>
                <item>
                    <title><xsl:value-of select=".//*[contains(@class,'entry-title')]"/></title>
                    <link><xsl:value-of select=".//*[contains(@rel,'bookmark')]/@href"/></link>
                    <guid><xsl:value-of select=".//*[contains(@rel,'bookmark')]/@href"/></guid>
                    <description><xsl:value-of select=".//*[contains(@class,'entry-content')]"/></description>
                    <pubDate><xsl:call-template name="ISO8601-to-rfc822"><xsl:with-param name="date" select=".//*[contains(@class,'published')]/@title" /></xsl:call-template></pubDate>
                </item>
            </xsl:for-each>
        </channel>
    </rss>
</xsl:template>

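<!-- Convert from ISO 8601 date format to RFC 822 date format -->
<!-- Assumes the date ends with Z or with a +hh:mm / -hh:mm offset -->
<xsl:template name="ISO8601-to-rfc822">
    <!-- Input:  2009-01-25T23:20:30.45+01:00 -->
    <!-- Output: Sun, 25 Jan 2009 23:20:30 +0100 -->

    <xsl:param name="date"/>

    <xsl:value-of select="date:day-abbreviation($date)"/>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="format-number(date:day-in-month($date), '00')"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="date:month-abbreviation($date)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="date:year($date)"/>
    <xsl:text> </xsl:text>
    <!-- RFC 822 requires two-digit hours, minutes and seconds -->
    <xsl:value-of select="format-number(date:hour-in-day($date), '00')"/>
    <xsl:text>:</xsl:text>
    <xsl:value-of select="format-number(date:minute-in-hour($date), '00')"/>
    <xsl:text>:</xsl:text>
    <xsl:value-of select="format-number(floor(date:second-in-minute($date)), '00')"/>
    <xsl:text> </xsl:text>
    <!-- Read the zone from the end of the string, so any number of fractional digits works -->
    <xsl:choose>
        <xsl:when test="substring($date, string-length($date)) = 'Z'">
            <xsl:text>GMT</xsl:text>
        </xsl:when>
        <xsl:otherwise>
            <xsl:variable name="tz" select="substring($date, string-length($date) - 5)"/>
            <xsl:value-of select="concat(substring($tz, 1, 3), substring($tz, 5, 2))"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>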

</xsl:stylesheet>

Feel free to download the Web Slice to RSS 2.0 xslt.

If you are familiar with the W3C xhtml to RSS xslt, you will notice some differences in my xslt. This xslt, of course, looks for the Web Slice / hAtom class names in the xhtml. The W3C xslt requires a strict semantic structure to be followed, but that is not the case with Web Slices / hAtom. Web Slice / hAtom class names can be applied to any part of a document and to any tag, so this xslt scans through the whole xhtml document and picks out the parts carrying them.
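The same whole-document scan can be sketched in Python, mirroring what the xsl:for-each collects for each item: every element with the hslice class, then anywhere inside it the entry-title, the entry-content, a rel="bookmark" link and a published element whose title attribute holds the ISO 8601 date. This is an illustration under assumptions, not a port of the xslt: it expects well-formed xhtml and matches class and rel as whitespace-separated tokens rather than with the substring test that contains() performs. `extract_entries` and the example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET

def _has_token(element, attr, token):
    # class and rel are both whitespace-separated token lists.
    return token in element.get(attr, "").split()

def _first(scope, attr, token):
    # Search the whole subtree, whatever the tag name is.
    return next((e for e in scope.iter() if _has_token(e, attr, token)), None)

def _text(element):
    return "".join(element.itertext()).strip() if element is not None else ""

def extract_entries(xhtml):
    """Collect the hAtom parts of every Web Slice in an xhtml document."""
    root = ET.fromstring(xhtml)
    entries = []
    for hslice in root.iter():
        if not _has_token(hslice, "class", "hslice"):
            continue
        bookmark = _first(hslice, "rel", "bookmark")
        published = _first(hslice, "class", "published")
        entries.append({
            "title": _text(_first(hslice, "class", "entry-title")),
            "description": _text(_first(hslice, "class", "entry-content")),
            "link": bookmark.get("href", "") if bookmark is not None else "",
            "published": published.get("title", "") if published is not None else "",
        })
    return entries

page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="post hslice" id="slice-1">
  <span class="entry-title">First post</span>
  <abbr class="published" title="2009-01-25T23:20:30+01:00">25 Jan</abbr>
  <div class="entry-content">Hello</div>
  <a rel="bookmark" href="http://example.com/first">Permalink</a>
</div>
</body></html>"""

for entry in extract_entries(page):
    print(entry)
```

Because only attributes are inspected, the tag names (span, abbr, div, a) do not matter, which is exactly the freedom hAtom gives over the W3C rules.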

Web Slices / hAtom use the ISO 8601 date format, while RSS 2.0 uses the RFC 822 date format. This is a small drawback, since xslt 1.0 has limited functionality for transforming dates. To solve this I have used the EXSLT Dates and Times extension, which is supported by several xslt engines (I have tested Xalan and Saxon).
If you are able to use xslt 2.0, I strongly advise you to rewrite the date handling in this xslt using the date functions in xslt 2.0.
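If the conversion can happen outside xslt, it is short in Python. This is a swapped-in technique, not part of the xslt: a sketch assuming dates end with Z or a +hh:mm / -hh:mm offset, like the example input in the xslt comments. `iso8601_to_rfc822` is a hypothetical helper built on the standard library's `email.utils.format_datetime`.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Date-time with optional fractional seconds and a Z or +hh:mm / -hh:mm zone.
ISO8601 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)

def iso8601_to_rfc822(value):
    match = ISO8601.match(value)
    if match is None:
        raise ValueError("unsupported date: " + value)
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction, zone = match.group(7), match.group(8)
    if zone == "Z":
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6])))
    # Fractional seconds are parsed but dropped; RFC 822 has no field for them.
    micro = int((fraction or "0").ljust(6, "0")[:6])
    stamp = datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
    # format_datetime zero-pads every field and writes the offset as +hhmm.
    return format_datetime(stamp)

print(iso8601_to_rfc822("2009-01-25T23:20:30.45+01:00"))
# Sun, 25 Jan 2009 23:20:30 +0100
```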

Also note that I pick some values from the head of the xhtml document and use them in the channel header of the RSS feed. I use the "title" tag in the xhtml as the title of the RSS feed, and I take the RSS "description" from the "meta name=description" tag in the xhtml. The value of the "atom:link" comes from the "link" tag in the xhtml that refers to the RSS feed.
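The same three lookups can be sketched in Python with ElementTree's limited XPath support, using a prefix for the xhtml namespace. It mirrors the select expressions in the channel part of the xslt; `channel_head` and the sample URL are hypothetical, and the sketch assumes the document has a head element.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}

def channel_head(xhtml):
    """Collect the values the xslt copies into the RSS <channel>."""
    head = ET.fromstring(xhtml).find("x:head", NS)
    title = head.find("x:title", NS)
    meta = head.find("x:meta[@name='description']", NS)
    feed = head.find("x:link[@type='application/rss+xml']", NS)
    return {
        "title": title.text if title is not None else "",
        "description": meta.get("content", "") if meta is not None else "",
        "self_link": feed.get("href", "") if feed is not None else "",
    }

page = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Test page</title>
<meta name="description" content="Web Slices test page" />
<link rel="alternate" type="application/rss+xml" href="http://example.com/rss.xml" />
</head><body/></html>"""

print(channel_head(page))
```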

Example

While playing around with Web Slices I made a small test page containing some Web Slices. This RSS feed is a transformation of that test page made with the above xslt.

Any benefits?

In what cases can this be of benefit? I see one case where it can be a huge benefit.

The difference between Web Slices and an RSS feed is, in my book, that Web Slices give the user the opportunity to monitor one or several spots on a page, while a traditional RSS feed gives the user the latest content added to a page. The difference lies in how they are sorted: Web Slices are unsorted, since the user subscribes to one or several spots on a page, whereas an RSS feed is sorted by date (newest content first). Otherwise there is little difference in behavior between Web Slices and an RSS feed. The above xslt sorts the content by date, so this is a simple problem to solve.
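One caveat: the xsl:sort in the xslt compares the published dates as plain strings, which gives the right order only as long as every date uses the same format and offset. The Python sketch below compares the actual instants instead. It is an alternative, not what the xslt does; `newest_first` is a hypothetical helper, and it assumes that fractional seconds, if present, have 3 or 6 digits, which `datetime.fromisoformat` requires before Python 3.11.

```python
from datetime import datetime

def newest_first(entries):
    """Sort (title, iso_date) pairs by the instant they describe, newest first."""
    def instant(entry):
        value = entry[1]
        # fromisoformat() only accepts a trailing "Z" from Python 3.11, so map it to +00:00.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        return datetime.fromisoformat(value)
    return sorted(entries, key=instant, reverse=True)

entries = [
    ("a", "2009-01-25T23:20:30+01:00"),  # 22:20:30 UTC
    ("b", "2009-01-25T22:30:00+00:00"),  # 22:30:00 UTC, the newer one
]
print([title for title, _ in newest_first(entries)])
# ['b', 'a']
```

A descending string sort would put "a" first here, because "23" compares greater than "22" even though "a" is the older entry.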

Knowing this, let's say we have total control over our web pages and produce valid xhtml (you do... don't you?), and we want to use Web Slices. We already know that Web Slices have to be applied to the markup and, in a way, introduce a feed into the web page. So why not use the xhtml document as the source for an RSS feed?

Instead of using a separate technique (such as Rome) to produce an RSS feed in parallel with the final web pages, introduce a service that takes the xhtml document as input and transforms it into an RSS feed.

The benefit lies in how easy it becomes to maintain the two feeds: adding content to the Web Slices also adds the same content to the RSS feed.
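Such a service could be thin: fetch the page, pull out the Web Slice entries, and write them back out as RSS. As a sketch of the last step only, here is Python that serializes already-extracted values into RSS 2.0 with ElementTree. It assumes the dates are already in RFC 822 form and leaves out the atom:link and ttl elements the xslt adds; `build_rss` and the example.com values are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_rss(channel, entries):
    """Serialize channel metadata and entries as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    chan = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description"):
        ET.SubElement(chan, name).text = channel[name]
    for entry in entries:
        item = ET.SubElement(chan, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # Like the xslt, reuse the permalink as the guid.
        ET.SubElement(item, "guid").text = entry["link"]
        ET.SubElement(item, "description").text = entry["description"]
        ET.SubElement(item, "pubDate").text = entry["pubDate"]
    return ET.tostring(rss, encoding="unicode")

channel = {"title": "Test page", "link": "http://example.com/",
           "description": "Web Slices test page"}
entries = [{"title": "First post", "link": "http://example.com/first",
            "description": "Hello", "pubDate": "Sun, 25 Jan 2009 23:20:30 +0100"}]
print(build_rss(channel, entries))
```

Building the feed through an XML API rather than string concatenation means characters such as & and < in the slice content are escaped, so the output stays well-formed.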
