December 2004
 
 
Jack's Hack for the month of December, 2004:

Leveraging Servlet Filters to Serve Multipart Content

By Luca Passani

Over the past few years we have published a series of articles on serving Multipart messages to mobile devices (see References section for more info). MIME/Multipart is a mechanism which allows developers to 'package' mark-up and image files into a single object to be sent to a mobile device. Devices which receive a multipart message know how to extract each message component and display the complete page in one fell swoop. End users experience a faster system, which makes your application more usable. Vodafone uses this mechanism to serve its popular Vodafone Live!™ portal menus to Multipart-enabled devices.

Challenges of Multipart
Unfortunately, serving multipart content introduces a few challenges to the content developer:
  • Not all devices support Multipart, so you need to make sure that no multipart message is sent to devices that are unable to extract it.
  • Pictures vary dynamically in a typical application. This makes Multipart only suitable for static pages, unless you decide to develop a complex system on your own.
  • Having to package each multipart message detracts from the simple save-and-reload paradigm which many developers are used to.
This article presents an application that lets Java programmers exploit the power of multipart and, at the same time, avoid the aforementioned issues.

WURFL to the Rescue
The first issue is how to tell a Multipart-enabled device from one that is not. An old acquaintance of ours comes to the rescue here: the Wireless Universal Resource FiLe (WURFL) (see References section for more info). WURFL contains a capability called multipart_support which, given a user-agent string, can tell you if that device supports Multipart.

The values for these properties have been derived from published documentation and UAProf profiles, but also from direct observation. Many Nokia devices, for example, have buggy Multipart support. For this reason they have the multipart_support capability set to false.

Servlet Filters: tweaking HTTP responses 'on the fly'
Servlet Filters, introduced with Servlet Spec 2.3, are a new way to meddle with HTTP requests and responses. Links to extensive articles and tutorials about filters can be found in the references. For the purpose of this article, a filter is a Java program that 'sits in front' of a servlet or a static resource. A filter has a couple of nice properties:
  • It looks at an HTTP request before it reaches its final destination (typically a servlet, but it might be a static XHTML file, as far as the filter is concerned). Optionally, the filter can decide to do something with the request before routing it on to the final servlet.
  • It looks at an HTTP response before it reaches its final destination (typically a web or WAP client). Optionally, the filter can decide to do something with the response before routing it on to the client.
Of course, one can configure the filter to apply only to certain servlets or file types (this happens in web.xml; see References section for more info).

I suspect some readers are already getting what I am trying to do: I am going to use filters and the WURFL to decide if a given device supports multipart. If it does, we get a chance to grab the mark-up in the response, grab all the image files through HTTP and dynamically build a multipart message for that device, without touching the rest of the application.

Of course, there is a lot of complexity that needs to be solved to make such a filter work. The eXtreme Programming (XP) guys believe that code is the ultimate documentation. In this spirit, I'll walk you through the code step by step and explain what happens.

Multipart Filter: a walk through the code
    package com.openwave.developer.multipartfilter;

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import javax.servlet.*;
    import javax.servlet.http.*;

    //grab image files through HTTP
    import org.apache.commons.httpclient.*;
    import org.apache.commons.httpclient.methods.*;

    //Query the WURFL
    import net.sourceforge.wurfl.wurflapi.*;

    //to parse the WML/XHTML response
    import nu.xom.*;

    //to parse any mark-up thrown at us (even when not well formed)
    import org.ccil.cowan.tagsoup.Parser;
      :
    public class MultipartFilter implements Filter {

        protected FilterConfig config;
        private static CapabilityMatrix cm = null;
        private static UAManager uam = null;

        public void init(FilterConfig config) throws ServletException {
            this.config = config;
            //initialize the wurfl, unless already initialized
            if ( !ObjectsManager.isWurflInitialized() ) {
                ObjectsManager.initFromWebApplication(config.getServletContext());
                System.out.println("About to initialize WURFL for use in filter");
            }
        }

        /**
         * Called when the filter is about to be shut down.
         */
        public void destroy() {
            /* noop */
        }

Listing 1: Initializing

Listing 1 is about initializations. We need a bunch of different packages:
  • HttpClient from the Apache foundation (to retrieve images through HTTP).
  • WURFL API (to query the WURFL).
  • XOM API (to parse the mark-up and retrieve image URLs).
  • Tagsoup parser (to help parse the mark-up even when not well-formed!).

The init() method just makes sure that the WURFL API is initialized. Another application may have initialized it already.
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {

        ArrayList urls;                          //urls extracted from markup
        ArrayList mp_bits = new ArrayList(20);   //multipart bits
        String UA;                               //user agent
        boolean do_multipart = true;
        HttpServletRequest http_request = (HttpServletRequest) request;
        String device_id;
        String preferred_mimetype;
        GetThread[] threads = null;
        String base_tag_href = null;
        String markup_mime_type = null;

        //float filterstart = System.currentTimeMillis();
        try {
            cm = ObjectsManager.getCapabilityMatrixInstance();
            uam = ObjectsManager.getUAManagerInstance();

            //Check that capabilities multipart_support and preferred_markup are in the wurfl
            if (!cm.isCapabilityIn("multipart_support")) {
                throw new ServletException("capability 'multipart_support' not found.");
            }
            //Check the capability string
            if (!cm.isCapabilityIn("preferred_markup")) {
                throw new ServletException("capability 'preferred_markup' not found.");
            }

            //get the user agent (allow override thru queryString for debugging)
            if (request.getParameter("UA") != null) {
                UA = "" + http_request.getParameter("UA");
            } else {
                UA = "" + http_request.getHeader("User-Agent");
            }

            device_id = uam.getDeviceIDFromUALoose(UA);
            String multipart = cm.getCapabilityForDevice(device_id, "multipart_support");
            String preferred_markup = cm.getCapabilityForDevice(device_id, "preferred_markup");

            if ( "false".equals(multipart) ) {
                do_multipart = false;
            }
            if ( preferred_markup.startsWith("html_wi_imode") ||
                 preferred_markup.startsWith("html_web") ) {
                //Imode device or web browser: no multipart
                do_multipart = false;
            }
        } catch (Exception e) {
            String err_msg = "Error accessing WURFL: " + e.getMessage() + "\n" + e.toString();
            throw new ServletException(err_msg);
        }

        //let's go for it and try to do the multipart
        System.out.println("Do multipart:"+do_multipart);

Listing 2: Query the WURFL


Listing 2 is the part that queries the WURFL to understand if a given device supports multipart. To be extra sure that we are not sending multipart to an imode device, we also check for that. Theoretically, this should not be necessary, as long as all the imode devices are correctly configured in the WURFL. imode devices do not support multipart, even though this may change in the future.
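
The decision taken in Listing 2 can be isolated into a small function, which makes it easy to try out on its own. This is only a sketch: the class and method names are made up for illustration, and the two arguments stand for the values of the multipart_support and preferred_markup capabilities that the filter reads from the WURFL.

```java
// Hypothetical helper (not part of the article's code) that isolates the
// decision made in Listing 2.
final class MultipartDecision {

    // multipartSupport and preferredMarkup are the capability values that
    // the real filter obtains from the WURFL for the requesting device.
    static boolean shouldSendMultipart(String multipartSupport, String preferredMarkup) {
        // Devices explicitly flagged as unsupported never get multipart.
        if ("false".equals(multipartSupport)) {
            return false;
        }
        // i-mode devices and web browsers never get multipart either.
        if (preferredMarkup != null
                && (preferredMarkup.startsWith("html_wi_imode")
                    || preferredMarkup.startsWith("html_web"))) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The capability values below are sample inputs only.
        System.out.println(shouldSendMultipart("true", "wml_1_1"));
        System.out.println(shouldSendMultipart("true", "html_web_4_0"));
    }
}
```

Unlike Listing 2, the sketch tolerates a null preferred_markup value; whether the WURFL API can actually return null here is not something the article states.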

If a device does not support Multipart, the application flow branches off to the end of the method:
	    chain.doFilter(request, response);

This is the Filter way to say that nothing special needs to be done and the response can go back unaffected.

If the device does support Multipart, we need to intercept the mark-up in the response and act on it.
    if (do_multipart) {

        // Server Response wrapper to buffer response (so we can parse the mark-up)
        BufferedResponseWrapper wrappedResponse =
            new BufferedResponseWrapper(response, config.getServletContext());
        chain.doFilter(request, wrappedResponse);

        String mark_up = wrappedResponse.getBufferAsString();
        //markup_mime_type = wrappedResponse.getContentType(); //not available before V2.4!
        markup_mime_type = "text/html";

        int dtd_idx_wml = mark_up.indexOf("<!DOCTYPE wml PUBLIC");
        int dtd_idx_html = mark_up.indexOf("<!DOCTYPE html PUBLIC");
        if ( dtd_idx_wml != -1) {
            markup_mime_type = "text/vnd.wap.wml";
        }
        if (dtd_idx_html != -1) {
            markup_mime_type = "text/html";
        }
        if (dtd_idx_wml == -1 && dtd_idx_html == -1) {
            System.out.println("Cannot find DTD. This is neither WML nor XHTML");
            abandon_multipart(response,mark_up,markup_mime_type);
            return;
        }

Listing 3: Capture the mark-up


Without going too deep into how Filters work: we can't manipulate the response object directly. Instead, we provide our own response object that we can operate on. In Listing 3, I make sure that the response is captured in a BufferedResponseWrapper object, which I borrowed from a popular JSP book (The Complete JSP Reference, by Phil Hanna). The advantage of this object is that I can get the mark-up as a simple string.
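
The buffering idea can be shown without the servlet API at all. The following is a plain-Java analogy, not the wrapper from the book: Target is a hypothetical stand-in for the servlet or static resource behind the filter, and the real response is reduced to a Writer. It uses a lambda, so it assumes a far more recent Java than the JDK 1.4 the filter itself targets.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for whatever sits behind the filter:
// it simply writes its output to the Writer it is given.
interface Target {
    void render(Writer out) throws IOException;
}

final class BufferingSketch {

    // Hand the target an in-memory buffer instead of the real output,
    // so that the complete mark-up becomes available as a String.
    static String capture(Target target) throws IOException {
        StringWriter buffer = new StringWriter();
        target.render(buffer);
        return buffer.toString();
    }

    // Capture first, then forward to the real output. A real filter would
    // inspect or repackage the mark-up between these two steps.
    static void filter(Target target, Writer realOut) throws IOException {
        String markUp = capture(target);
        realOut.write(markUp);
        realOut.flush();
    }

    public static void main(String[] args) throws IOException {
        Target page = out -> out.write("<html><body>hello</body></html>");
        System.out.println(capture(page));
    }
}
```

In the actual filter the same role is played by a response wrapper handed to chain.doFilter(), as in Listing 3.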

I also need to grab the MIME type, since it could be different things (WML, XHTML or even CHTML) and I'll need it later when I create my multipart. Alas, the ServletResponse.getContentType() method is only available from version 2.4 of the servlet spec, so, to make the filter work on Tomcat 4, I had to hack my way around the problem and figure out the MIME type from the page DOCTYPE.
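
The DOCTYPE check from Listing 3 can be read as a small function from mark-up to MIME type. The sketch below uses hypothetical names (MimeSniffer, guessMimeType) and returns null in the case where Listing 3 gives up on multipart.

```java
// Hypothetical helper mirroring the DOCTYPE sniffing in Listing 3.
final class MimeSniffer {

    // Returns the MIME type implied by the DOCTYPE declaration, or null
    // when the mark-up carries neither of the two recognised DOCTYPEs.
    static String guessMimeType(String markUp) {
        // As in Listing 3, an html DOCTYPE wins if both are present.
        if (markUp.indexOf("<!DOCTYPE html PUBLIC") != -1) {
            return "text/html";
        }
        if (markUp.indexOf("<!DOCTYPE wml PUBLIC") != -1) {
            return "text/vnd.wap.wml";
        }
        return null;
    }

    public static void main(String[] args) {
        String wml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE wml PUBLIC \"-//WAPFORUM//DTD WML 1.1//EN\" "
                + "\"http://www.wapforum.org/DTD/wml_1.1.xml\">\n<wml/>";
        System.out.println(guessMimeType(wml));
    }
}
```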
    //Let's use the power of XOM to parse the mark_up and get the URLs of
    //images and CSS. I use the tagsoup parser to avoid problems with
    //DTDs downloading behind the scenes and dangling entities
    StreamingImgLister sil = new StreamingImgLister();
    org.xml.sax.XMLReader parser = new org.ccil.cowan.tagsoup.Parser();
    Builder builder = new Builder(parser, false,sil);

    try {
        long before = System.currentTimeMillis();
        System.out.println("before: "+before);
        Document doc = builder.build(mark_up,"");
        long after = System.currentTimeMillis();
        System.out.println("after: "+after);
        urls = sil.getUrls();
        //base_tag_href = sil.getBase_tag_href();
        System.out.println("URLS: "+urls.toString());
    }
    // indicates a well-formedness error
    catch (ParsingException ex) {
        System.out.println("mark-up is not well-formed.");
        System.out.println(ex.getMessage());
        //let's give up on multipart and send the mark-up as it is
        abandon_multipart(response,mark_up,markup_mime_type);
        return;
    }
    catch (IOException ex) {
        System.out.println(ex);
        abandon_multipart(response,mark_up,markup_mime_type);
        return;
    }

Listing 4: Parse the mark-up


At this point, it's time to parse the mark-up and figure out the URLs of the pictures. I really had two choices:
  • treat the mark-up as a string and get Regular Expressions to do the work or
  • try to parse the mark-up the XML way using some XML API for Java.
I chose the second option for a couple of reasons: first, I was not sure that RegExps would cover all the cases one might come across (which img attributes? in which positions? can I assume that each 'img' tag appears on a single line?). Secondly, the approach I chose is more sound if we need to extend the parsing mechanism to include objects other than pictures (CSS for example).

The library I chose is XOM (the one used in the WURFL API). While XOM is meant to be easy and work out of the box for non-expert XML developers, it also offers a lot of power for advanced uses. In this particular case, I had two requirements: 'efficiency' (I need to parse the mark-up for every request coming from a multipart-enabled device) and 'forgiveness' (XHTML code may be not well-formed and I still want to complete the Multipart). XOM granted both wishes.

While XOM's default behavior is to build an object model of the XML document to allow traversal at a later stage, one can override such behavior and provide a custom NodeFactory subclass (a 'strategy' pattern, for those who are familiar with patterns) to perform extra activities while building the object model itself. I created StreamingImgLister (StreamingImgLister.java) as a subclass of NodeFactory and fed it into the XOM Builder().

This mechanism allowed me to let XOM create my ArrayList of URLs while scanning the mark-up (no need to traverse the object model. I'm only interested in the 'img' tag and 'src' attribute).

Without going into too much detail about how NodeFactory works, here is one of the very few NodeFactory methods I had to override in StreamingImgLister.java to let XOM know that I was only interested in 'img' elements.
    public Nodes finishMakingElement(Element element) {

	//get picture url
	if (element.getLocalName().equals("img")) {
	    if (element.getAttributeValue("src") != null) {
		urls.add(element.getAttributeValue("src"));
	    }
	}
      :
     }
Please refer to the StreamingImgLister code and the XOM documentation for a better understanding of this powerful mechanism.
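
For readers who want to try the "collect URLs while parsing" idea without installing XOM or Tagsoup, here is a rough equivalent built on the SAX parser that ships with the JDK. It is a different technique from the NodeFactory subclass described above: it only copes with well-formed mark-up, and the class name SaxImgLister is invented for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical JDK-only counterpart of StreamingImgLister: gathers the
// 'src' of every 'img' element while the document is being parsed.
final class SaxImgLister extends DefaultHandler {

    private final List<String> urls = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so the element
        // name arrives in qName rather than localName.
        if ("img".equals(qName)) {
            String src = atts.getValue("src");
            if (src != null) {
                urls.add(src);
            }
        }
    }

    // Parses well-formed mark-up and returns the image URLs in document order.
    static List<String> listImages(String markUp) throws Exception {
        SaxImgLister handler = new SaxImgLister();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markUp)), handler);
        return handler.urls;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><img src=\"logo.gif\"/>"
                + "<p><img src=\"img/banner.png\" alt=\"banner\"/></p></body></html>";
        System.out.println(listImages(page));
    }
}
```

The sample mark-up deliberately carries no DOCTYPE, which keeps the parser from trying to fetch a DTD.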

As far as 'forgiveness' is concerned, that concept is rather alien to the XML world. You have to account for well-formedness, DTDs and entities when parsing XML, or your system will blow up. Alas, the mark-up we find out there is often invalid or not even well-formed.

The solution to this problem was provided by a SAX parser called Tagsoup (see References section for more info):

"a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short"

Tagsoup does the job of parsing potentially broken XML files. On top of that, it integrates very well with XOM. While XOM tries to use the Xerces SAX2 parser by default, it lets you plug in a different one. Give XOM the Tagsoup parser and you'll be able to handle grandma's homepage even if she didn't close all the tags!

The following lines (extracted from Listing 4) deliver the robust and efficient solution I was looking for (the rest is error-handling):

	    StreamingImgLister sil = new StreamingImgLister();
	    org.xml.sax.XMLReader parser = new org.ccil.cowan.tagsoup.Parser();
	    Builder builder = new Builder(parser, false,sil);
                  :
		Document doc = builder.build(mark_up,"");
		urls = sil.getUrls();
We have our URLs. It's time to retrieve them. Just one word of caution. In the actual code which accompanies this article, I also have some lines to handle the 'base' tag in the file. I removed it from the article itself because I considered it out of scope. You may want to use this possibility in your application. In that case, you'll need to rebuild the filter.
    //first initialize the "multipart bits"
    String pageurl = http_request.getRequestURL().toString();
    for (int i=0 ; i < urls.size(); i++) {
        mp_bits.add(new MultipartPiece((String)urls.get(i),pageurl));
    }

    //we retrieve the files with multiple threads to minimize time
    try {
        HttpClient httpClient = new HttpClient(new MultiThreadedHttpConnectionManager());

        // create a thread for each URI
        threads = new GetThread[mp_bits.size()];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new GetThread(httpClient, i + 1,(MultipartPiece)mp_bits.get(i));
        }

        // start the threads
        for (int j = 0; j < threads.length; j++) {
            threads[j].start();
        }

        // wait for them all to finish
        for(int j = 0 ; j < threads.length ; j++) {
            threads[j].join();
        }
        threads = null;
    } catch( Exception e ) {
        System.out.print(e.getMessage());
        //let's give up on multipart and send the mark-up as it is
        abandon_multipart(response,mark_up,markup_mime_type);
        return;
    }

Listing 5: Retrieve the URLs


I introduced a class called MultipartPiece to encapsulate information about each downloaded multipart component (typically a picture). The object is initialized using the (potentially relative) picture URL and the page URL. Internally the object does the job of figuring out the absolute URL of the picture. Once the resource is downloaded, the object has space to store the MIME type and the actual content of the resource. This comes in handy at a later stage.

As far as the actual download is concerned, I used the good HttpClient library from the Jakarta Commons (see References section for more info). In my experience it is nicer to use than the java.net.* libs. It also offers support for multi-threaded downloads.
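
The article does not show how MultipartPiece turns a relative picture URL into an absolute one. One plausible way to do it, given here purely as an assumption, is to resolve the picture URL against the page URL with java.net.URI. The class name UrlResolver and the example.com addresses are illustrative.

```java
import java.net.URI;

// Hypothetical helper: one possible way to compute the absolute URL
// that a class like MultipartPiece needs before downloading a picture.
final class UrlResolver {

    // Resolves a possibly relative resource URL against the page URL.
    // An already absolute resource URL is returned unchanged.
    static String toAbsolute(String pageUrl, String resourceUrl) {
        return URI.create(pageUrl).resolve(resourceUrl).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/app/index.jsp";
        System.out.println(toAbsolute(page, "img/logo.gif"));
        System.out.println(toAbsolute(page, "/pics/banner.png"));
    }
}
```

Note that URI.create() throws IllegalArgumentException on a malformed URL, so a filter using this approach would want to treat that as one more reason to fall back to the plain response.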

HTTP being involved, my only option was to do the downloads in parallel. For this reason, I spawn a new thread for each URL. Admittedly, one might use a thread pooling mechanism here, but given that many JVMs and OSes recycle threads, this may be overkill. If you need to do thread pooling, I suggest that you take a look at the good libraries from Douglas Lea, which have been adopted in Java 5. (These, in turn, have been retrofitted to work with JDK 1.4, in case you worry about migrating next year.) Be warned that this is not a standard use of thread pooling, because you also need to track which threads are serving URLs for a certain request and which ones are serving a different request. In other words, I'll gladly leave this enhancement as an exercise for the reader.
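
As a starting point for that exercise, here is a minimal sketch based on java.util.concurrent, the Java 5 package that grew out of Douglas Lea's library. It is not part of the article's code: the class name, the pool size of 8 and the use of daemon threads are all assumptions. The bookkeeping problem mentioned above is handled by invokeAll(), which returns futures only for the tasks submitted in that call, so each request sees its own downloads even though the pool is shared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical pooled alternative to spawning one GetThread per URL.
final class PooledFetcher {

    // One pool shared by all requests. The size is an arbitrary choice;
    // daemon threads keep the pool from blocking JVM shutdown.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Runs all the downloads belonging to one request and blocks until
    // every one has finished. Results come back in task order.
    static <T> List<T> fetchAll(List<Callable<T>> downloads) throws Exception {
        List<T> results = new ArrayList<>();
        for (Future<T> done : POOL.invokeAll(downloads)) {
            // get() rethrows a failed download as an ExecutionException
            results.add(done.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> downloads = new ArrayList<>();
        // Placeholder tasks: a real filter would perform the HTTP GET here.
        downloads.add(() -> "bytes of logo.gif");
        downloads.add(() -> "bytes of banner.png");
        System.out.println(fetchAll(downloads));
    }
}
```

In a filter, a failed get() would be the cue to fall back to the plain response, and the pool would more naturally be shut down in destroy().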

As far as the code for the actual thread is concerned, GetThread is a direct subclass of Thread which makes space for all the info it needs to perform (httpClient, thread ID and MultipartPiece). Here is the snippet of code that performs the actual download (part of the run() method, of course):
    int statusCode = httpClient.executeMethod(method);
    if( statusCode != -1 ) {
        multipartpiece.setComplete(true);
        //Header.toString() yields the whole header line ("Content-Type: ..."
        //plus CRLF), which is what Listing 6 later writes out as it is
        multipartpiece.setMimetype(method.getResponseHeader("content-type").toString());

        byte[] body = method.getResponseBody();
        multipartpiece.setBytes(body);

        method.releaseConnection();
    }
We just need to start all the threads (Thread.start()) and wait for them all to complete (Thread.join()).

At this point, we have all the bits and we just need to assemble the multipart message from its components. This is a walk in the park compared to what we have been through so far.
    //we got everything now. It's just time to assemble the Multipart response
    //let's check that all the content has been retrieved correctly
    for (int i=0 ; i < mp_bits.size(); i++) {
        MultipartPiece mp = (MultipartPiece)mp_bits.get(i);
        if (!mp.isComplete()) {
            System.out.print(mp.getAbsolute_url()+" was not retrieved, aborting multipart generation");
            //let's give up on multipart and send the mark-up as it is
            abandon_multipart(response,mark_up,markup_mime_type);
            return;
        }
    }

    response.setContentType("multipart/mixed;boundary=\"foo\"");
    ServletOutputStream out = response.getOutputStream();

    out.write("--foo\r\n".getBytes());
    //out.write("Content-Type: text/html\r\n".getBytes());
    out.write("Content-Type: ".getBytes());
    out.write(markup_mime_type.getBytes());
    out.write("\r\n".getBytes());
    out.write("Content-length: ".getBytes());
    out.write(Integer.toString(mark_up.getBytes().length).getBytes());
    out.write("\r\n".getBytes());
    out.write("\r\n".getBytes());
    out.write(mark_up.getBytes());
    out.write("\r\n".getBytes());

    for (int i=0 ; i < mp_bits.size(); i++) {
        MultipartPiece mp = (MultipartPiece)mp_bits.get(i);
        out.write("--foo\r\n".getBytes());
        out.write(mp.getMimetype().getBytes());
        out.write("Content-length: ".getBytes());
        out.write(Integer.toString(mp.getBytes().length).getBytes());
        out.write("\r\n".getBytes());
        out.write(("Content-location: " + mp.getUrl()).getBytes());
        out.write("\r\n".getBytes());
        out.write("\r\n".getBytes());
        out.write(mp.getBytes());
        out.write("\r\n".getBytes());
    }
    out.write("--foo--\r\n".getBytes());
    out.write("\r\n".getBytes());
    out.flush();
    out.close();

Listing 6: Assemble the Multipart
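
The layout that Listing 6 writes can be condensed into a small builder, which also makes the message easy to inspect before it goes anywhere near a device. This is a sketch with an invented class name (MultipartBuilder). It writes into a byte array rather than a ServletOutputStream, and, unlike Listing 6, it does not emit the extra blank line after the closing boundary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that produces the same part layout as Listing 6.
final class MultipartBuilder {

    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    MultipartBuilder(String boundary) {
        this.boundary = boundary;
    }

    // Writes one header or delimiter line terminated by CRLF.
    private void line(String text) throws IOException {
        out.write((text + "\r\n").getBytes(StandardCharsets.US_ASCII));
    }

    // Appends one part: boundary, headers, blank line, body, CRLF.
    // contentLocation may be null, as for the mark-up part in Listing 6.
    MultipartBuilder addPart(String contentType, String contentLocation, byte[] body)
            throws IOException {
        line("--" + boundary);
        line("Content-Type: " + contentType);
        line("Content-length: " + body.length);
        if (contentLocation != null) {
            line("Content-location: " + contentLocation);
        }
        line("");
        out.write(body);
        line("");
        return this;
    }

    // Appends the closing boundary and returns the whole message body.
    byte[] finish() throws IOException {
        line("--" + boundary + "--");
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] message = new MultipartBuilder("foo")
                .addPart("text/html", null, "<html/>".getBytes(StandardCharsets.US_ASCII))
                .addPart("image/gif", "logo.gif", new byte[] {71, 73, 70})
                .finish();
        System.out.println(message.length + " bytes");
    }
}
```

Counting body.length on the byte array, rather than on the String, is what keeps the Content-length header honest when the mark-up contains non-ASCII characters.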


The response needs to be binary. We open an output stream and we push in all of our bits in the format and order which we learned from previous tutorials, using the string "foo" as a separator. The only thing which remains to be seen is the method to which we delegate falling back into a normal response if anything goes wrong while building our multipart. Not much magic here:
    void abandon_multipart(ServletResponse response, String mark_up,
                           String markup_mime_type) throws IOException {
        response.setContentType(markup_mime_type);
        PrintWriter out = response.getWriter();
        out.print(mark_up);
        out.flush();
    }

Listing 7: This is what saves us if something goes wrong.


Filter configuration can be found in any of the filter tutorials in the reference. For your own curiosity, here is the filter I used on my installation:
    <filter>
        <filter-name>Multipart</filter-name>
        <filter-class>com.openwave.developer.multipartfilter.MultipartFilter</filter-class>
    </filter>

    <filter-mapping>
        <filter-name>Multipart</filter-name>
        <url-pattern>*.jsp</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>Multipart</filter-name>
        <url-pattern>*.xhtml</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>Multipart</filter-name>
        <url-pattern>*.wml</url-pattern>
    </filter-mapping>

Listing 8: Filter configuration in web.xml.


This is basically it, my friends.
We now have a filter that we can place in front of any servlet, JSP page or XHTML page we like for the pleasure of bandwidth-challenged mobile users out there. Enjoy!

The complete code is available here (unpack the file, cd into the antbuild directory and run "ant" to build. Jakarta Ant is required).
If you want to see the filter in action, click here to download a ZIP file to deploy in your Tomcat directory (cd into your Tomcat webapps directory, unzip and restart Tomcat).

References
Multipart on Developer @ Openwave
http://developer.openwave.com/dvl/support/documentation/jacks_hacks/archive/04apr.htm
http://developer.openwave.com/dvl/resources/code_corner/technical_notes/multipart.htm

Servlet Filters
http://www.onjava.com/pub/a/onjava/2003/11/19/filters.html
http://javaboutique.internet.com/tutorials/Servlet_Filters/
Filter Chapter from "The JSP 2.0 Complete Reference" by Phil Hanna
http://www.philhanna.com/jspcr2/index.html

WURFL
http://wurfl.sourceforge.net/
http://wurfl.sourceforge.net/java/api.php

XOM by Elliotte Rusty Harold
http://www.cafeconleche.org/XOM/

Tagsoup Parser by John Cowan
http://mercury.ccil.org/~cowan/XML/tagsoup/

Jakarta Commons HttpClient libs
http://jakarta.apache.org/commons/httpclient/

Utilities for Concurrent Programming by Douglas Lea
http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html

Credits
There are a few people I would like to thank for their precious advice. In no particular order: Michael Abato, Hallvard Trætteberg, Elliotte Rusty Harold, Simone "Simon" Bordet, Roby, Douglas Lea, John Cowan and Wolfgang Hoschek.
 
Copyright © 2000-2008 Openwave Systems Inc.