Friday, May 01, 2009

Varnish Reverse Proxy Accelerator (Ca...

Intro:
The need to offer a mechanism to accelerate the performance of web applications could arguably be even more relevant as we move into a a more linked data approach.  With the need to possible offer multiple representations of a single resource URI to address needs for XML, RDF or XHTML+RDFa (or microformats) etc. the need to improve performance and address request from cache when possible increases.

Related:
Past efforts in this area have involved Apache Connectors to Tomcat (Ref: http://tomcat.apache.org/connectors-doc/index.html ) and also reviewing Glassfish.  Glassfish Portfolio (Ref: http://www.sun.com/software/products/glassfish_portfolio/ ) has many elements focused on performance and there is also Galssfish Grizzly (Ref: https://grizzly.dev.java.net/ ) which offers some some impressive sounding performance aspects using NIO. 

Approach:
However, to address needs of maintaining URI persistence and caching what was really needed was a "reverse-proxy accelerator".  Given that set of criteria I arrived soon at using Varnish (Ref: http://varnish.projects.linpro.no/ ).  The target application is hosted on Glassfish but obviously Varnish can reverse proxy for a number of machines and any backend talking over http.  Indeed, it's this ability that is important to me in order to address my desire to be able to alter systems and networking aspects behind Varnish and have it maintain my URI's in a "Cool URIs" style/approach (Ref: http://www.w3.org/Provider/Style/URI ). 

Following the nice tutorial by Jay Kuri, that is located on that site it is rather easy to get Varnish up and running on Linux and doing reverse proxy and caching.    I would like highlight a few things Jay, mentioned in that write up and detail a few aspects I ran into. 

One of the biggest that Jay, mentions is the issue around Varnish not wanting to cache anything that has a cookie reference.  Since almost all web based applications are going to do this as a simple session management approach (even if you are not doing accounts), Varnish by default will not cache your URIs.    Jay, goes on to recommend code like:

    if (obj.http.Cache-Control ~ "max-age") {
        unset obj.http.Set-Cookie;
        deliver;
    }

to override this behavior and respond from cache for content.  As noted this does mean that is starts to become depending on the content provider/web app developer to issue the commands to inform cache system like Varnish about the behavior we expect from them.

Setting behavior in Grails:
Using Grails (Ref: http://www.grails.org ) it's easy to set the format of our return in via the withFormat{} syntax.  Note we would want to make the approapriate entries in our Config.groovy grails.mime.types section (Ref: The "content negotiation" section of http://www.grails.org/1.0+Release+Notes )

So something like:

    withFormat {
      html {
        response.setHeader("Vary", "Accept")
        def nowPlusHour = new Date().time + 3600000
        response.addHeader("Last-Modified",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ', new Date()))
        response.addHeader("Expires",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ',
             new Date(nowPlusHour)))

        [allSites: allSites, allAutoSites: onlyAutoSites]
      }
      rdf {
          def data = modelAsRDFService.asRDF(AgeModel.findAllByLeg(params.id),
             "/loc/sites/${params.id}")

          response.setHeader("Vary", "Accept")
          response.contentType = "application/rdf+xml"
          def nowPlusHour = new Date().time + 3600000
          response.addHeader("Last-Modified",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ', new Date()))
          response.addHeader("Expires",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ',
             new Date(nowPlusHour)))
         
          response.outputStream << data
      }
    }
  }

In this code we have set the LAST-MODIFIED and EXPIRES header entries. Note in this simple example I have simply pushed the expires time ahead by one hour.  You can set this however you wish depending on your view of the relative age your resources can be and still be valid.

The VARY header is set to address some linked data best practices.  This informs the client that the representation of this resource URI can change based on how we call it.  Here it can be requested as HTML  (what is in fact XHTML+RDFa in our case..  a whole topic in itself) and RDF.

There is no builder for RDF, so I simply pass the model to a service that generates and returns this for me.   The Export Plugin (Ref: http://www.grails.org/Export+Plugin ) might be worth looking at if you are looking for various other formats to serialize your data to.  I don't set the content type for the HTML in the code above, on review I likely should though by default it is coming back as text/html so that may be fine.  

Be sure to have your RDF contain a reference to itself and its resource URI if you are doing a linked data approach. 

There is also likely other response header elements one could consider here based on the reverse proxy and caching needs.  A review of http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4 and your accelerator package may reveal other setting of value for your particular environment.

Using curl to validate behavior:
While I really love Firebug, I found it to not be the easiest to use to verify proper cache behavior by Varnish.  Rather, I used (being a bit of CLI lover), curl.  A curl request on a resource URI is made using the -D out option to capture the headers.  Two consecutive headers are shown below:

HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Server: Sun Java System Application Server 9.1
Vary: Accept
Last-Modified: Fri, 1 May 2009 16:10:00 CDT
Expires: Fri, 1 May 2009 17:10:00 CDT
Content-Type: image/png
Content-Length: 20951
Date: Fri, 01 May 2009 19:59:32 GMT
X-Varnish: 868897910
Age: 0
Via: 1.1 varnish
Connection: keep-alive

HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Server: Sun Java System Application Server 9.1
Vary: Accept
Last-Modified: Fri, 1 May 2009 16:10:00 CDT
Expires: Fri, 1 May 2009 17:10:00 CDT
Content-Type: image/png
Content-Length: 20951
Date: Fri, 01 May 2009 19:59:34 GMT
X-Varnish: 868897911 868897910
Age: 2
Via: 1.1 varnish
Connection: keep-alive

Note the two values in the X-Varnish response header entry indicating to us that the content came from the previously cached content.  If we continually saw only one number here would need to investigate why our request is being passed through to the backend server.

Conclusion:
Varnish, with a few considerations related to explicate cache control instructions provides a nice acceleration to web application performance.  As a from the start reverse proxy (vs something like Squid which started life as a forward proxy) it plays a valuable roll in linked data approaches by allowing changes to systems and networks behind the scene while allowing URI persistence to be maintained.  I've also had no issues with it with respect to 303 redirects on generic documents, also important in linked data approaches.