How to fix &amp; turning into & when xslt outputs html

Call me strange, but I'm one of those who actually find xml parsing and xsl transformations to be nice and powerful tools for web development. I'm also a strong believer in validating mark-up. But xslt and validation don't always go hand in hand right out of the box.

By default, Java ships with Xalan for handling xslt. Xalan has one annoying bug that affects the value of the href attribute on anchors when the xslt output method is set to html.

In short: &amp; ends up as & when it is written to the href attribute of an anchor element and the xslt output method is set to html. In other words, after the xsl transformation we end up with invalid html.

Problem by example

We have the following xml:


<p>
    The link is: http://www.trygve-lie.com/?foo=a&amp;bar=b<br />
    Click <a href="http://www.trygve-lie.com/?foo=a&amp;bar=b" title="A &amp; B">here</a> to see it!
</p>

If we transform this xml with the output method set to html:


<xsl:output method="html" />

the result will be:


<p>
    The link is: http://www.trygve-lie.com/?foo=a&amp;bar=b<br>
    Click <a href="http://www.trygve-lie.com/?foo=a&bar=b" title="A &amp; B">here</a> to see it!
</p>

Note the bare & in the href. This is invalid html. Also note that &amp; is left untouched in the title attribute.

If we set the output method to xml:


<xsl:output method="xml" />

the &amp; in the href will be correct:


<p>
    The link is: http://www.trygve-lie.com/?foo=a&amp;bar=b<br />
    Click <a href="http://www.trygve-lie.com/?foo=a&amp;bar=b" title="A &amp; B">here</a> to see it!
</p>

But because this is xml (note the <br />), it will also be invalid html.
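The comparison above can be tried outside a container with a few lines of plain JAXP. This is only a minimal sketch: the class name HrefAmpersandDemo and the identity stylesheet are made up for illustration, and what the href looks like in the html run depends on which xslt engine and version your Java installation actually picks up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HrefAmpersandDemo {

    // The sample document from this post.
    static final String XML =
        "<p>\n"
      + "    The link is: http://www.trygve-lie.com/?foo=a&amp;bar=b<br />\n"
      + "    Click <a href=\"http://www.trygve-lie.com/?foo=a&amp;bar=b\""
      + " title=\"A &amp; B\">here</a> to see it!\n"
      + "</p>";

    // Hypothetical identity stylesheet; only the output method varies.
    static String stylesheet(String method) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"" + method + "\" />"
             + "<xsl:template match=\"@*|node()\">"
             + "<xsl:copy><xsl:apply-templates select=\"@*|node()\" /></xsl:copy>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the sample through whatever engine JAXP resolves by default.
    static String transform(String method) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new StringReader(stylesheet(method))));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Engine: "
            + TransformerFactory.newInstance().getClass().getName());
        System.out.println("--- html ---");
        System.out.println(transform("html"));
        System.out.println("--- xml ---");
        System.out.println(transform("xml"));
    }
}
```

Compare the href attribute in the two outputs. If the html run prints a bare & there, the engine in use has the problem described here.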

A lot of application servers (including Tomcat 6.x and Jetty, the two I mostly work with) and other tools (Ant, for example) use the native xslt engine shipped with Java. That engine is Xalan, and the problem occurs when Xalan does the transformation. The applications above inherit the problem from Java.

In other words, if we install Java out of the box, install Tomcat out of the box, deploy jstl out of the box to an empty container in Tomcat, use jstl's x:transform tag in a jsp and try the examples above, we will have this problem.

The same problem appears if we install Java out of the box, install Ant out of the box and write a build script which uses the xslt task to do transformations.

Test yourself

I've made a small test case which displays this problem in an application server by using jstl's x-tags (note: the test case uses jsp 2.0). Just download it, extract the files into a container and access xslt.jsp in a browser. If you have the problem described above, you should get this output.

The solution

The solution is to override the default xslt engine in Java. Xalan has this problem up to the current version (2.7 at the time of writing), but Saxon does not, so one solution is to replace the default engine in Java with Saxon.

Java comes with a feature called the Endorsed Standards Override Mechanism, which allows us to override the default libraries in Java. One solution could be to override the default xslt engine for the whole system by adding Saxon to Java's endorsed directory. The override then applies to all applications on the system.

Fixing Tomcat

Another way is to tell the application server which xslt engine to use. Tomcat 6 (I've not checked older versions) utilizes the Endorsed Standards Override Mechanism (see XML Parsers and JSE 5) by letting us override libraries in Java through an argument passed to the container during startup. This way, only Tomcat is overridden.

To change the xslt engine from Xalan to Saxon in Tomcat we can do the following:

1 - Download Saxon 9.1.x or newer and extract the jar files to a directory of your choice

2 - In your Tomcat startup script, add the following:
-Djava.endorsed.dirs="/path/to/where/your/extracted/saxons/jar/files/"

Tomcat should then run with Saxon as its xslt engine, and jstl will use Saxon for the x tags.

To verify that Saxon is in use, you can run the small test case again and it should provide this output.
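Another quick check is to ask JAXP which factory class it resolves. The following is a minimal sketch, not part of the test case: the class name WhichXsltEngine and its helper methods are made up for illustration, and the only assumption is that Saxon's factory lives in the net.sf.saxon package, as in net.sf.saxon.TransformerFactoryImpl. To check Tomcat itself, the same call has to run inside the container (from a jsp or servlet, for example), since a standalone run only reports what the plain JVM resolves.

```java
import javax.xml.transform.TransformerFactory;

public class WhichXsltEngine {

    // Returns the class name of the factory JAXP resolves in this JVM.
    static String engineClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Assumption: Saxon's factory classes live under net.sf.saxon.
    static boolean isSaxon(String className) {
        return className.startsWith("net.sf.saxon.");
    }

    public static void main(String[] args) {
        String name = engineClassName();
        System.out.println("TransformerFactory: " + name);
        System.out.println(isSaxon(name)
            ? "Saxon is in use"
            : "Saxon is NOT in use");
    }
}
```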

Fixing Ant

We can also apply a similar fix to Ant by telling the xslt task to use another xslt engine. To change the xslt engine from Xalan to Saxon in an Ant task, do the following:

1 - Download Saxon 9.1.x or newer and extract the jar files to a directory of your choice

2 - Apply the following to the xslt task:


<xslt basedir="." destdir="." extension=".html" includes="doc.xml" style="html.xsl" force="true">
  <classpath location="/path/to/where/your/extracted/saxons/jar/files/saxon9.jar"/>
  <factory name="net.sf.saxon.TransformerFactoryImpl"/>
</xslt>

3 - If we use Ant 1.7.1 or older, there is a bug in Ant which causes this to fail. The workaround is to point Ant to Saxon when running the task, so run it as follows:
./ant -lib /path/to/where/your/extracted/saxons/jar/files/saxon9.jar

Ant should then do the transformation with Saxon. Note that Saxon also provides a custom Ant task for using Saxon with Ant.

These methods also apply if Xalan fixes the bug causing this problem and we are stuck on an old Java version with an old version of Xalan. In that case, the same approach can be used to upgrade Xalan.

Disclaimer

There are several other ways of fixing this problem. This "how to" only describes the fundamental problem of &amp; turning into & and how to fix it in a very vanilla environment. Keep in mind that changing the xslt engine can cause major problems in other parts of an application running on top of the application server. Do test!