Jack's Hack for the month of December, 2004:
Leveraging Servlet Filters to Serve Multipart Content
By Luca Passani

Over the past few years we have published a series of articles on serving Multipart messages to mobile devices (see the References section for more info). MIME/Multipart is a mechanism that allows developers to 'package' mark-up and image files into a single object to be sent to a mobile device. Devices that receive a multipart message know how to extract each component and display the complete page in one fell swoop. End users experience a faster system, which improves your application's usability. Vodafone uses this mechanism to serve the menus of its popular Vodafone Live!™ portal to Multipart-enabled devices.

Challenges of Multipart

Unfortunately, serving multipart content introduces a few challenges to the content developer:
WURFL to the Rescue

The first issue is how to tell a Multipart-enabled device from one that is not. An old acquaintance of ours comes to the rescue here: the Wireless Universal Resource FiLe (WURFL) (see the References section for more info). WURFL contains a capability called multipart_support which, given a user-agent string, tells you whether that device supports Multipart. The values of this capability have been derived from published documentation and UAProf profiles, but also from direct observation. Many Nokia devices, for example, have buggy Multipart support; for this reason they have the multipart_support capability set to false.
Servlet Filters: tweaking HTTP responses 'on the fly'

Servlet Filters, introduced with Servlet Spec 2.3, make it possible to intercept and modify HTTP requests and responses. Links to extensive articles and tutorials about filters can be found in the References. For the purpose of this article, a filter is a Java program that 'sits in front' of a servlet or a static resource. A filter has a couple of nice properties:
I suspect some readers already see where I am going: I am going to use filters and the WURFL to decide whether a given device supports multipart. If it does, the filter gets a chance to grab the mark-up in the response, fetch all the image files through HTTP and dynamically build a multipart message for that device, without touching the rest of the application. Of course, a fair amount of complexity has to be dealt with to make such a filter work. The eXtreme Programming (XP) folks believe that code is the ultimate documentation. In this spirit, I'll walk you through the code step by step and explain what happens.

Multipart Filter: a walk through the code

Listing 1 is about initialization. We need a few different packages:

HttpClient from the Apache Software Foundation (to retrieve images through HTTP).
WURFL API (to query the WURFL).
XOM API (to parse the mark-up and retrieve image URLs).
Tagsoup parser (to help parse the mark-up even when it is not well-formed!).

The init() method just makes sure that the WURFL API is initialized (another application may have done so already).

Listing 2 is the part that queries the WURFL to find out whether a given device supports multipart. To be extra sure that we are not sending multipart to an i-mode device, we also check for that explicitly. In theory this should not be necessary, as long as all i-mode devices are correctly configured in the WURFL. i-mode devices do not support multipart, although this may change in the future.

If a device does not support Multipart, the application flow branches off to the end of the method:

chain.doFilter(request, response);

This is the Filter way of saying that nothing special needs to be done and the response can go back unaffected. If the device does support Multipart, we need to intercept the mark-up in the response and act on it. Without going too deep into how Filters work: we can't manipulate the response object directly. Instead, we pass our own response object down the chain and work on what gets written into it.
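The decision made in Listing 2 can be sketched roughly as follows. This is not the listing itself: CapabilityLookup is a hypothetical interface standing in for the WURFL API (whose real class and method names are not shown here), and recognizing i-mode devices by a "DoCoMo" user-agent prefix is an assumption made for the sketch.

```java
/**
 * Sketch of the "should this device get multipart?" decision.
 * CapabilityLookup is a hypothetical stand-in for the WURFL API,
 * NOT the real WURFL Java API.
 */
final class MultipartDecision {

    /** Hypothetical abstraction over a WURFL capability query. */
    interface CapabilityLookup {
        /** Capability value for the device matching the user agent, or null if unknown. */
        String capability(String userAgent, String capabilityName);
    }

    private MultipartDecision() {}

    static boolean shouldServeMultipart(String userAgent, CapabilityLookup wurfl) {
        if (userAgent == null) {
            return false; // no user agent: play it safe and send the normal response
        }
        // Extra safety net against a misconfigured WURFL entry: never send
        // multipart to i-mode. The "DoCoMo" prefix test is an assumption.
        if (userAgent.startsWith("DoCoMo")) {
            return false;
        }
        // Only an explicit "true" enables multipart; null or "false" do not.
        return "true".equals(wurfl.capability(userAgent, "multipart_support"));
    }
}
```

Defaulting to "no multipart" whenever the answer is uncertain matches the spirit of the filter: the worst case is simply the ordinary, unaffected response.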
In Listing 3, I make sure that the response is captured in a BufferedResponse object, which I borrowed from a popular JSP book (The Complete JSP Reference, by Phil Hanna). The advantage of this object is that I can get the mark-up as a simple string. I also need to grab the MIME type, since it could be several things (WML, XHTML or even CHTML) and I'll need it later when I create the multipart. Alas, the ServletResponse.getContentType() method is only available since version 2.4 of the servlet spec, so, to make the filter work on Tomcat 4 (a Servlet 2.3 container), I had to hack my way around the problem and figure out the MIME type from the page's DOCTYPE.
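The DOCTYPE workaround can be sketched roughly as follows. The class name DoctypeSniffer and the exact mapping (WML to text/vnd.wap.wml, XHTML to application/vnd.wap.xhtml+xml, anything else to text/html) are assumptions for illustration; the filter's own code may match DOCTYPEs differently.

```java
import java.util.Locale;

/**
 * Sketch of guessing the MIME type from the DOCTYPE when
 * ServletResponse.getContentType() is unavailable (Servlet 2.3 and earlier).
 * The DOCTYPE-to-MIME mapping is an assumption for illustration.
 */
final class DoctypeSniffer {

    private DoctypeSniffer() {}

    static String mimeTypeFor(String markup) {
        String text = markup.toLowerCase(Locale.ROOT);
        int start = text.indexOf("<!doctype");
        if (start < 0) {
            return "text/html"; // no DOCTYPE at all: assume plain (C)HTML
        }
        // Look only at the DOCTYPE declaration itself, not at the page body.
        int end = text.indexOf('>', start);
        String doctype = end < 0 ? text.substring(start) : text.substring(start, end);
        if (doctype.contains("wml")) {
            return "text/vnd.wap.wml";
        }
        if (doctype.contains("xhtml")) {
            return "application/vnd.wap.xhtml+xml";
        }
        return "text/html"; // HTML or CHTML DOCTYPEs
    }
}
```

Substring matching is crude, but it only has to tell apart the handful of DOCTYPEs a mobile site actually emits; on a Servlet 2.4 container, getContentType() makes the hack unnecessary.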
At this point, it's time to parse the mark-up and figure out the URLs of the pictures. I really had two choices:
The library I chose is XOM (the one used in the WURFL API). While XOM is meant to be easy and to work out of the box for non-expert XML developers, it also offers a lot of power for advanced uses. In this particular case, I had two requirements: 'efficiency' (I need to parse the mark-up for every request coming from a multipart-enabled device) and 'forgiveness' (XHTML code may not be well-formed, and I still want to complete the Multipart). XOM granted both wishes. While XOM's default behavior is to build an object model of the XML document for later traversal, you can override that behavior by providing a custom NodeFactory subclass (a 'strategy' pattern, for those familiar with patterns) that performs extra work while the object model is being built.
I created StreamingImgLister (StreamingImgLister.java) as a subclass of NodeFactory and fed it to the XOM Builder. This mechanism lets XOM build my ArrayList of URLs while scanning the mark-up, so there is no need to traverse the object model afterwards (I'm only interested in the 'img' tag and its 'src' attribute). Without going into too much detail about how NodeFactory works, here is one of the very few NodeFactory methods I had to override in StreamingImgLister.java to let XOM know that I was only interested in 'img' elements:
public Nodes finishMakingElement(Element element) {
    // collect the picture URL of every 'img' element
    if (element.getLocalName().equals("img")) {
        String src = element.getAttributeValue("src");
        if (src != null) {
            urls.add(src);
        }
    }
    // ... (rest of the method omitted; see StreamingImgLister.java)
}
Please refer to the StreamingImgLister code and the XOM documentation for a better understanding of this powerful mechanism.

As far as 'forgiveness' is concerned, the concept is rather alien to the XML world. You have to account for well-formedness, DTDs and entities when parsing XML, or your system will blow up. Alas, the mark-up we find out there is often invalid or even not well-formed. The solution to this problem was provided by a SAX parser called Tagsoup (see the References section for more info): "a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short". Tagsoup does the job of parsing potentially broken mark-up. On top of that, it integrates very well with XOM: while XOM tries to use the Xerces SAX2 parser by default, it lets you plug in a different one. Give XOM the Tagsoup parser and you'll be able to handle grandma's homepage even if she didn't close all her tags! The following lines (extracted from Listing 4) deliver the robust and efficient solution I was looking for (the rest is error handling):
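StreamingImgLister sil = new StreamingImgLister();
// Tagsoup's SAX parser tolerates mark-up that is not well-formed
org.xml.sax.XMLReader parser = new org.ccil.cowan.tagsoup.Parser();
// no validation; sil collects the img URLs while the document is built
Builder builder = new Builder(parser, false, sil);
// ... (error handling omitted)
Document doc = builder.build(mark_up, "");
urls = sil.getUrls();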
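For readers without XOM and Tagsoup at hand, the same "collect while parsing" idea can be sketched with nothing but the JDK's SAX API. This is a swapped-in technique, not the article's code: the hypothetical ImgSrcCollector handler plays the role that StreamingImgLister plays for XOM, but the JDK parser is not forgiving, so it rejects mark-up that is not well-formed, which is exactly the gap Tagsoup fills.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * JDK-only sketch: gather every img/src while the parser streams through
 * the mark-up, without building or traversing a tree. Unlike Tagsoup,
 * the JDK parser throws on mark-up that is not well-formed.
 */
final class ImgSrcCollector extends DefaultHandler {

    private final List<String> urls = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With namespace awareness on, localName is "img" even inside the XHTML namespace.
        if ("img".equals(localName)) {
            String src = atts.getValue("", "src"); // unprefixed attribute: empty namespace URI
            if (src != null) {
                urls.add(src);
            }
        }
    }

    List<String> getUrls() {
        return urls;
    }

    static List<String> collect(String markup)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        ImgSrcCollector handler = new ImgSrcCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(markup)), handler);
        return handler.getUrls();
    }
}
```

Keep in mind that a DOCTYPE pointing at an external DTD may make the JDK parser try to fetch that DTD over the network, one more reason the article's choice of a lenient HTML parser is attractive here.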
We have our URLs; it's time to retrieve the pictures. Just one word of caution: the actual code that accompanies this article also contains a few lines to handle the 'base' tag in the page. I removed them from the article itself because I considered them out of scope. You may want to use this feature in your application; in that case, you'll need to rebuild the filter.
I introduced a class called MultipartPiece to encapsulate information about each downloaded multipart component (typically a picture). The object is initialized with the (potentially relative) picture URL and the page URL. Internally, the object figures out the absolute URL of the picture. Once the resource is downloaded, the object also stores the MIME type and the actual content of the resource, which comes in handy at a later stage.

As far as the actual download is concerned, I used the good HttpClient library from Jakarta Commons (see the References section for more info). In my experience it is nicer to use than the java.net.* classes, and it supports multi-threaded downloads. With HTTP involved, every picture is a separate request with its own latency, so the sensible option was to do the downloads in parallel. For this reason, I spawn a new thread for each URL. Admittedly, one might use a thread pooling mechanism here, but given that many JVMs and operating systems recycle threads, this may be overkill. If you need thread pooling, I suggest that you take a look at the good concurrency libraries by Douglas Lea (which have been adopted in Java 5 and, in turn, retrofitted to work with JDK 1.4, in case you worry about migrating next year). Be warned that this is not a standard use of thread pooling, because you also need to track which threads are serving URLs for a given request and which ones are serving a different request. In other words, I'll gladly leave this enhancement as an exercise for the reader.

As far as the code for the actual thread is concerned, GetThread is a direct subclass of Thread which holds all the information it needs to do its job (the HttpClient instance, a thread ID and the MultipartPiece).
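The thread-pool enhancement left as an exercise can be approached with java.util.concurrent, the Java 5 descendant of Douglas Lea's library. The sketch below is one possible take, not the article's code: instead of tracking which pooled thread serves which request, each request submits its own batch of tasks with invokeAll() and waits only on its own Futures. PooledDownloader is a hypothetical name, and the fetch function is an assumed stand-in for the HttpClient download.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Sketch of downloading one request's pictures on a shared pool.
 * Each request owns its list of Futures, so no per-request bookkeeping
 * of pool threads is needed.
 */
final class PooledDownloader {

    private PooledDownloader() {}

    static List<byte[]> downloadAll(ExecutorService pool, List<String> urls,
                                    Function<String, byte[]> fetch)
            throws InterruptedException {
        List<Callable<byte[]>> tasks = new ArrayList<>();
        for (String url : urls) {
            tasks.add(() -> fetch.apply(url));
        }
        // invokeAll blocks until every task of THIS request has finished.
        List<Future<byte[]>> futures = pool.invokeAll(tasks);
        List<byte[]> results = new ArrayList<>();
        for (Future<byte[]> f : futures) {
            try {
                results.add(f.get());
            } catch (ExecutionException e) {
                results.add(null); // failed download: the caller can fall back to plain mark-up
            }
        }
        return results;
    }
}
```

The pool itself would typically be created once per web application and shut down when the filter is destroyed; how large to make it depends on your traffic.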
Here is the snippet of code that performs
the actual download (part of the run() method, of course):
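try {
    int statusCode = httpClient.executeMethod(method);
    // only a successful response is a usable picture (a 404 page is not)
    if (statusCode == HttpStatus.SC_OK) {
        // getValue() yields just the type, e.g. "image/gif"; Header.toString()
        // would return the whole "Content-Type: image/gif" header line
        multipartpiece.setMimetype(method.getResponseHeader("Content-Type").getValue());
        multipartpiece.setBytes(method.getResponseBody());
        multipartpiece.setComplete(true);
    }
} finally {
    // give the connection back to HttpClient even if the download failed
    method.releaseConnection();
}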
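A class along the lines of MultipartPiece might look like the following sketch. The setters match the calls in the download snippet; everything else (field names, the use of java.net.URI to resolve the picture URL against the page URL) is an assumption, not the class that ships with the article.

```java
import java.net.URI;

/**
 * Hypothetical sketch of a MultipartPiece-like holder: one downloaded
 * component of the multipart message (typically a picture).
 */
final class MultipartPiece {

    private final String relativeUrl; // as written in the mark-up; reused when assembling
    private final String absoluteUrl; // what actually gets downloaded
    // Written by the download thread; Thread.join() makes these writes
    // visible to the request thread that waits for it.
    private boolean complete;
    private String mimetype;
    private byte[] bytes;

    MultipartPiece(String pictureUrl, String pageUrl) {
        this.relativeUrl = pictureUrl;
        // Resolve "img/a.gif" or "/img/a.gif" against the page URL (RFC 3986 rules).
        this.absoluteUrl = URI.create(pageUrl).resolve(pictureUrl).toString();
    }

    String getRelativeUrl() { return relativeUrl; }
    String getAbsoluteUrl() { return absoluteUrl; }

    boolean isComplete() { return complete; }
    void setComplete(boolean complete) { this.complete = complete; }

    String getMimetype() { return mimetype; }
    void setMimetype(String mimetype) { this.mimetype = mimetype; }

    byte[] getBytes() { return bytes; }
    void setBytes(byte[] bytes) { this.bytes = bytes; }
}
```

Keeping the relative URL alongside the absolute one matters because the assembled multipart has to refer to each picture by the same name the mark-up uses in its src attribute.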
We just need to start all the threads (Thread.start()) and wait for
them all to complete (Thread.join()).
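Starting and joining the download threads can be sketched as follows. ParallelFetch is a hypothetical name, and the fetch function is an assumed stand-in for the HttpClient call that GetThread.run() makes. Each thread writes only its own slot of the result array, and join() guarantees those writes are visible to the waiting request thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Sketch of the one-thread-per-URL approach: start every download,
 * then join() them all before assembling the multipart.
 */
final class ParallelFetch {

    private ParallelFetch() {}

    static byte[][] fetchAll(List<String> urls, Function<String, byte[]> fetch)
            throws InterruptedException {
        byte[][] results = new byte[urls.size()][];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            final int slot = i; // each thread writes only its own slot
            Thread t = new Thread(() -> {
                try {
                    results[slot] = fetch.apply(urls.get(slot));
                } catch (RuntimeException e) {
                    results[slot] = null; // failed download: the caller can fall back
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every download; also publishes its result
        }
        return results;
    }
}
```

In a production filter you would probably bound the wait, for example with join(timeoutMillis), so that one slow image server cannot stall the whole page.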
At this point, we have all the bits and we just need to assemble the multipart message from its components. This is a walk in the park compared to what we have been through so far. The response needs to be binary: we open an output stream and push in all of our bits in the format and order we learned from previous tutorials, using the string "foo" as the boundary. The only thing that remains to be seen is the method we delegate to in order to fall back to a normal response if anything goes wrong while building the multipart.

Not much magic here: filter configuration is covered in any of the filter tutorials listed in the References. For your curiosity, here is the filter configuration I used on my installation:

This is basically it, my friends. We now have a filter that we can place in front of any servlet, JSP page or XHTML page we like, for the pleasure of bandwidth-challenged mobile users out there. Enjoy!

The complete code is available for download with this article (unpack the file, cd into the antbuild directory and run "ant" to build; Jakarta Ant is required). If you want to see the filter in action, download the ZIP file prepared for Tomcat, cd into your Tomcat webapps directory, unzip it there and restart Tomcat.
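The assembly step and the fallback described above can be sketched roughly as follows. The sketch assumes a generic MIME multipart layout (RFC 2046) with a multipart/mixed content type, one Content-Type and Content-Location header per part, and the mark-up first; the exact headers and part order that the target browsers expect are described in the earlier tutorials and may differ. MultipartAssembler and Part are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Sketch of assembling a multipart body with the fixed boundary "foo".
 * Assumed layout: generic MIME multipart (RFC 2046), mark-up first, then
 * one part per image, each with Content-Type and Content-Location headers.
 */
final class MultipartAssembler {

    static final String BOUNDARY = "foo";
    /** Value for the response's Content-Type header (the subtype is an assumption). */
    static final String CONTENT_TYPE = "multipart/mixed; boundary=" + BOUNDARY;

    /** One component of the message: the page or a picture. */
    static final class Part {
        final String location; // URL as written in the mark-up (e.g. the img src)
        final String type;     // MIME type, e.g. "image/gif"
        final byte[] body;     // null if the download failed

        Part(String location, String type, byte[] body) {
            this.location = location;
            this.type = type;
            this.body = body;
        }
    }

    private MultipartAssembler() {}

    /** Writes the page part, then every image part, then the closing boundary. */
    static byte[] assemble(Part page, List<Part> images) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePart(out, page);
        for (Part image : images) {
            writePart(out, image);
        }
        out.write(ascii("--" + BOUNDARY + "--\r\n"));
        return out.toByteArray();
    }

    /** Returns the multipart body, or null so the caller sends the original response instead. */
    static byte[] assembleOrNull(Part page, List<Part> images) {
        try {
            return assemble(page, images);
        } catch (IOException | RuntimeException e) {
            return null;
        }
    }

    private static void writePart(ByteArrayOutputStream out, Part part) throws IOException {
        if (part.body == null) {
            throw new IOException("missing content for " + part.location);
        }
        out.write(ascii("--" + BOUNDARY + "\r\n"));
        out.write(ascii("Content-Type: " + part.type + "\r\n"));
        out.write(ascii("Content-Location: " + part.location + "\r\n\r\n"));
        out.write(part.body);
        out.write(ascii("\r\n"));
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

A fixed boundary such as "foo" is only safe as long as the line --foo never occurs inside any part. A null result from assembleOrNull() is the cue for the filter to write the buffered mark-up unchanged.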
References

Multipart on Developer @ Openwave
http://developer.openwave.com/dvl/support/documentation/jacks_hacks/archive/04apr.htm
http://developer.openwave.com/dvl/resources/code_corner/technical_notes/multipart.htm

Servlet Filters
http://www.onjava.com/pub/a/onjava/2003/11/19/filters.html
http://javaboutique.internet.com/tutorials/Servlet_Filters/

Filter chapter from "The JSP 2.0 Complete Reference" by Phil Hanna
http://www.philhanna.com/jspcr2/index.html

WURFL
http://wurfl.sourceforge.net/
http://wurfl.sourceforge.net/java/api.php

XOM by Elliotte Rusty Harold
http://www.cafeconleche.org/XOM/

Tagsoup Parser by John Cowan
http://mercury.ccil.org/~cowan/XML/tagsoup/

Jakarta Commons HttpClient libs
http://jakarta.apache.org/commons/httpclient/

Utilities for Concurrent Programming by Douglas Lea
http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html

Credits

There are a few people I would like to thank for their valuable advice. In no particular order: Michael Abato, Hallvard Trætteberg, Elliotte Rusty Harold, Simone "Simon" Bordet, Roby, Douglas Lea, John Cowan and Wolfgang Hoschek.