Friday, May 01, 2009

Varnish Reverse Proxy Accelerator (Ca...

Intro:
The need to offer a mechanism to accelerate the performance of web applications could arguably be even more relevant as we move into a more linked data approach.  With the need to possibly offer multiple representations of a single resource URI to address needs for XML, RDF or XHTML+RDFa (or microformats) etc., the need to improve performance and serve requests from cache when possible increases.

Related:
Past efforts in this area have involved Apache connectors to Tomcat (Ref: http://tomcat.apache.org/connectors-doc/index.html ) and also reviewing Glassfish.  Glassfish Portfolio (Ref: http://www.sun.com/software/products/glassfish_portfolio/ ) has many elements focused on performance, and there is also Glassfish Grizzly (Ref: https://grizzly.dev.java.net/ ), which offers some impressive-sounding performance aspects using NIO.

Approach:
However, to address the needs of maintaining URI persistence and caching, what was really needed was a "reverse-proxy accelerator".  Given that set of criteria I soon arrived at Varnish (Ref: http://varnish.projects.linpro.no/ ).  The target application is hosted on Glassfish, but obviously Varnish can reverse proxy for a number of machines and any backend talking over HTTP.  Indeed, it's this ability that is important to me: it addresses my desire to be able to alter systems and networking aspects behind Varnish and have it maintain my URIs in a "Cool URIs" style/approach (Ref: http://www.w3.org/Provider/Style/URI ).

Following the nice tutorial by Jay Kuri that is located on that site, it is rather easy to get Varnish up and running on Linux doing reverse proxying and caching.  I would like to highlight a few things Jay mentioned in that write-up and detail a few aspects I ran into.

One of the biggest issues Jay mentions is that Varnish does not want to cache anything that has a cookie reference.  Since almost all web-based applications are going to set cookies as a simple session management approach (even if you are not doing accounts), Varnish by default will not cache your URIs.  Jay goes on to recommend code like:

    if (obj.http.Cache-Control ~ "max-age") {
        unset obj.http.Set-Cookie;
        deliver;
    }

to override this behavior and respond from cache for that content.  As noted, this does mean that it starts to become the responsibility of the content provider/web app developer to issue the commands that inform cache systems like Varnish about the behavior we expect from them.
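
In practice that boils down to the application explicitly emitting a Cache-Control header the VCL check above can key on.  As a minimal sketch on the application side (more on the Grails specifics below; the controller and domain class names here are purely illustrative):

    class SitesController {
        def list = {
            // "max-age" is what the VCL snippet above keys on; one hour, in seconds
            response.setHeader("Cache-Control", "public, max-age=3600")
            [allSites: Site.list()]
        }
    }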

Setting behavior in Grails:
Using Grails (Ref: http://www.grails.org ) it's easy to set the format of our response via the withFormat{} syntax.  Note we would want to make the appropriate entries in the grails.mime.types section of our Config.groovy (Ref: The "content negotiation" section of http://www.grails.org/1.0+Release+Notes ).
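
For reference, the grails.mime.types entries might look something like the following.  The html and xml entries are the Grails defaults; the rdf entry is the kind of custom type you would add yourself for the RDF representation discussed below:

    grails.mime.types = [ html: ['text/html', 'application/xhtml+xml'],
                          xml: ['text/xml', 'application/xml'],
                          rdf: 'application/rdf+xml',
                          all: '*/*'
                        ]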

So something like:

    withFormat {
      html {
        // Tell caches the representation varies based on the Accept header
        response.setHeader("Vary", "Accept")
        def nowPlusHour = new Date().time + 3600000
        response.addHeader("Last-Modified",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ', new Date()))
        // Allow the response to be considered fresh for one hour
        response.addHeader("Expires",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ',
             new Date(nowPlusHour)))

        [allSites: allSites, allAutoSites: onlyAutoSites]
      }
      rdf {
          // Hand the model off to a service that serializes it as RDF/XML
          def data = modelAsRDFService.asRDF(AgeModel.findAllByLeg(params.id),
             "/loc/sites/${params.id}")

          response.setHeader("Vary", "Accept")
          response.contentType = "application/rdf+xml"
          def nowPlusHour = new Date().time + 3600000
          response.addHeader("Last-Modified",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ', new Date()))
          response.addHeader("Expires",
            String.format('%ta, %<te %<tb %<tY %<tH:%<tM:%<tS %<tZ',
             new Date(nowPlusHour)))

          response.outputStream << data
      }
    }

In this code we have set the Last-Modified and Expires headers. Note in this simple example I have simply pushed the Expires time ahead by one hour.  You can set this however you wish, depending on how old you consider your resources can be while still being valid.

The Vary header is set to address some linked data best practices.  This informs the client (and caches) that the representation of this resource URI can change based on how we call it.  Here it can be requested as HTML (which is in fact XHTML+RDFa in our case...  a whole topic in itself) and RDF.

There is no builder for RDF, so I simply pass the model to a service that generates and returns it for me.   The Export Plugin (Ref: http://www.grails.org/Export+Plugin ) might be worth looking at if you want to serialize your data to various other formats.  I don't set the content type for the HTML in the code above; on review I likely should, though by default it is coming back as text/html so that may be fine.

Be sure to have your RDF contain a reference to itself and its resource URI if you are taking a linked data approach.
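
For what it's worth, a stripped-down sketch of what such a service could look like follows.  This is not the actual modelAsRDFService used above, just an illustration built on Groovy's MarkupBuilder, and the property names and namespaces are placeholders:

    class ModelAsRDFService {

        String asRDF(List models, String resourceUri) {
            def writer = new StringWriter()
            def xml = new groovy.xml.MarkupBuilder(writer)

            xml.'rdf:RDF'('xmlns:rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
                          'xmlns:dc': 'http://purl.org/dc/elements/1.1/') {
                // The document describes itself via rdf:about, keeping the
                // linked data requirement of referencing its own resource URI
                'rdf:Description'('rdf:about': resourceUri) {
                    models.each { m ->
                        'dc:subject'(m.toString())   // placeholder property
                    }
                }
            }
            return writer.toString()
        }
    }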

There are also likely other response header fields one could consider here based on the reverse proxy and caching needs.  A review of http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4 and your accelerator package may reveal other settings of value for your particular environment.

Using curl to validate behavior:
While I really love Firebug, I found it not to be the easiest tool for verifying proper cache behavior by Varnish.  Rather, being a bit of a CLI lover, I used curl.  A curl request on a resource URI is made using the -D option to capture the headers to a file.  The headers from two consecutive requests are shown below:

HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Server: Sun Java System Application Server 9.1
Vary: Accept
Last-Modified: Fri, 1 May 2009 16:10:00 CDT
Expires: Fri, 1 May 2009 17:10:00 CDT
Content-Type: image/png
Content-Length: 20951
Date: Fri, 01 May 2009 19:59:32 GMT
X-Varnish: 868897910
Age: 0
Via: 1.1 varnish
Connection: keep-alive

HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Server: Sun Java System Application Server 9.1
Vary: Accept
Last-Modified: Fri, 1 May 2009 16:10:00 CDT
Expires: Fri, 1 May 2009 17:10:00 CDT
Content-Type: image/png
Content-Length: 20951
Date: Fri, 01 May 2009 19:59:34 GMT
X-Varnish: 868897911 868897910
Age: 2
Via: 1.1 varnish
Connection: keep-alive

Note the two values in the X-Varnish response header on the second request, indicating that the content came from the previously cached response.  If we continually saw only one number here, we would need to investigate why our requests are being passed through to the backend server.
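
If you would rather script the check than eyeball curl output, a small Groovy snippet along these lines (the resource URI is a placeholder) makes two consecutive requests and prints the headers of interest:

    def url = new URL("http://example.org/loc/sites/199")   // placeholder resource URI

    2.times {
        def conn = url.openConnection()
        conn.setRequestProperty("Accept", "text/html")
        // Two numbers in X-Varnish and a non-zero Age indicate a cache hit
        println "X-Varnish: ${conn.getHeaderField('X-Varnish')}  Age: ${conn.getHeaderField('Age')}"
        conn.inputStream.close()
    }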

Conclusion:
Varnish, with a few considerations related to explicit cache-control instructions, provides a nice acceleration to web application performance.  As a reverse proxy from the start (vs. something like Squid, which started life as a forward proxy), it plays a valuable role in linked data approaches by allowing changes to systems and networks behind the scenes while allowing URI persistence to be maintained.  I've also had no issues with it with respect to 303 redirects on generic documents, also important in linked data approaches.

Monday, February 16, 2009

Neptune Database Services and Scripting:

The use of services on the net has an interesting set of issues to address with respect to acting as generic data resources.  Doing mash-ups with RSS or Atom feeds, weather, or other somewhat periodic and serial data has become rather common.  Yahoo Pipes (Ref: http://pipes.yahoo.com ) and other such web applications do a good job dealing with these types of data endpoints.

However, when working with scientific data sets and services like those at CHRONOS (Ref: http://portal.chronos.org/gridsphere/gridsphere?cid=res_dev ), a different set of usage issues comes up.

Let's look at a basic XML-RPC styled service endpoint for the Neptune database at CHRONOS (Ref: http://www.chronos.org ):

http://services.chronos.org/xqe/public/chronos.neptune.samples.advanced?callback=displayNexus&time_range=33-34&fossil_group=Diatoms&include_dated=true&serializeAs=wrs&

Here I have used a service from the XQE engine, setting the time range to 33-34 Ma, the fossil group to Diatoms, indicating that only dated items should be used, and setting the output serialization to web row sets (generic XML).

However, let's say, by way of a recent example, we wanted to locate all the unique taxon names and drill holes for various time ranges and fossil types from the CHRONOS Neptune database.  We now need to loop over a set of time ranges and fossil group names and then locate the unique, not full, set of names and locations.  We can do this with the service above, but not directly.  We need to collect and process a range of calls on that service to achieve this.

One could develop a small program in Java, Ruby, Groovy or some other language.  Dynamic languages like Groovy are especially well suited for such efforts, with a simple script like:
//Upper Oligocene   Sub-Series/Sub-Epoch   23.8    28.5    Oligocene
//Lower Oligocene   Sub-Series/Sub-Epoch   28.5    33.7    Oligocene
//Upper Eocene      Sub-Series/Sub-Epoch   33.7    36.9    Eocene
//Middle Eocene     Sub-Series/Sub-Epoch   36.9    49      Eocene
//Lower Eocene      Sub-Series/Sub-Epoch   49      55.05   Eocene
//Paleocene         Series/Epoch           55.05   65      Paleogene

import org.apache.commons.httpclient.HttpClient
import org.apache.commons.httpclient.methods.GetMethod

def ageRanges = ['23.8-28.5', '28.5-33.7', '33.7-36.9', '36.9-49', '49-55.05', '55.05-65']
def fossilTypes = ["Diatoms", "Planktonic+Foraminifera", "Radiolarians", "Calcareous+Nannoplankton"]

fossilTypes.each { type ->
    ageRanges.each { range ->
        // Build the XQE service URL for this fossil group and time range
        def urlxml = "http://services.chronos.org/xqe/public/chronos.neptune.samples.advanced?" +
            "callback=displayNexus&time_range=${range}&fossil_group=${type}" +
            "&include_synonyms=true&include_dated=true&serializeAs=wrs&"

        def client = new HttpClient()
        def get = new GetMethod(urlxml)
        client.executeMethod(get)

        def xmlSlurped = null
        try {
            xmlSlurped = new XmlSlurper().parseText(get.getResponseBodyAsString())

            def entries = xmlSlurped.data.currentRow
            def legSiteHole = []
            def bug = []
            entries.each { entry ->
                // Leg, site and hole identify the location; column 11 is the taxon name
                legSiteHole += "${entry.columnValue[3]}_${entry.columnValue[4]}_${entry.columnValue[5]}"
                bug += "${entry.columnValue[11]}"
            }

            println "Total tax count is ${bug.size()} and total LSH count is ${legSiteHole.size()}"
            println "In range " + range + " of type " + type + " found " + bug.unique().size() +
                " unique taxa in " + legSiteHole.unique().size() + " unique leg/site/holes"
            println range + " " + bug.unique().size()
        } catch (org.xml.sax.SAXParseException e) {
            println "Exception parsing XML"
        }
    }
}
This script uses the Apache Commons HttpClient, but there are many ways to do this.  The point I am after is not to show best practice in using Groovy to access services (I'm sure there are many aspects of that code people could comment on).  Rather, it is to highlight an issue related to balancing service generality and usefulness.

Note I did attempt to use both the XML and the character-separated-values versions via Yahoo Pipes and also the now defunct Google Mashup Editor.  However, the need for a more programmatic interface to conduct loops, basic array manipulation and such is beyond what these can do easily.  While it might be possible, I did not find it easy or intuitive.  These resources seem more focused on RSS or Atom sources and simple filtering and counting.

Additionally, the Kepler workflow application is a bit of overkill and doesn't yet work with REST or XML-RPC style services as well as I feel it should.  A heavy focus on WS* and other more heavy-duty scientific data flow operations means its lightweight, rapid service mashup capacity is limited (IMHO).

Making a resource or service of general "popular" use may make it of generic interest but can limit its utility on its own to address specific scientific efforts.

Expecting generic services to be of use via packages like Yahoo Pipes or Kepler runs into the implementation aspects of those tools, aspects which may make using them sufficiently complex as to discourage users.

However, expecting people to write code like the example above, even in a language like Groovy that makes calling services and parsing the data rather easy, is also potentially unrealistic.  Not everyone likes to write code.

So there is a dilemma for providers of science data services like CHRONOS.  Make the services too generic and you risk them being of little use to focused research communities.  Address the vertical needs of a specific community or tool interface and you risk making them of limited utility to others.  Additionally, addressing the needs of several vertical communities could be taxing from a human resource point of view.

One does wonder if the creation of a domain specific language (DSL) might have benefit in addressing this.  If there were enough community interest to justify its creation, a service that consumes and processes a DSL would allow a sort of "custom command line interface".

The Sloan Digital Sky Survey  SkyServer (Ref: http://cas.sdss.org/astrodr7/en/ ) addresses this by simply using SQL as that "DSL" language.  The page http://cas.sdss.org/astrodr7/en/tools/search/sql.asp allows one to structure and submit SQL directly to the database.   

However, a more focused DSL might be able to address the needs of specialized research groups while consuming more generic service endpoints; those endpoints would remain exposed for others to use in approaches similar to the code above, via the DSL, or via their own approach.
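
As a rough illustration only (nothing here is an existing CHRONOS API; the class and method names are invented), a small Groovy-based DSL wrapping the generic service above might let a researcher express a query closer to plain language while still producing the same service URL underneath:

    class NeptuneQuery {
        private String group
        private String range

        void fossilGroup(String g) { group = g }
        void timeRange(String r)   { range = r }

        String toUrl() {
            "http://services.chronos.org/xqe/public/chronos.neptune.samples.advanced?" +
                "time_range=${range}&fossil_group=${group}&include_dated=true&serializeAs=wrs&"
        }

        static String neptune(Closure spec) {
            def q = new NeptuneQuery()
            spec.delegate = q
            spec()
            q.toUrl()
        }
    }

    // Usage: the block reads close to the researcher's intent
    println NeptuneQuery.neptune {
        fossilGroup "Diatoms"
        timeRange   "33.7-36.9"
    }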

Also the development effort in making such a DSL might be able to be spread across multiple efforts making it more appealing from a human resource point of view.

Just some ramblings on how to balance generic services (resources) against the needs of specialized communities for more focused and unique services, regardless of whether they are REST, XML-RPC or WS* in nature.

Other references:
Kepler Workflow Application: http://kepler-project.org/
Google Mashup Editor  http://code.google.com/gme/   (defunct)
XPROC  http://xproc.org/   and  http://www.w3.org/TR/xproc/
GRDDL: http://www.w3.org/2004/01/rdxh/spec
DSL in Groovy:  http://docs.codehaus.org/display/GROOVY/Writing+Domain-Specific+Languages

Tuesday, December 02, 2008

REST SOA and multi-parameter databases

Thoughts on REST and SOA's for database access with multiple parameters

Following the idea that a URI represents a "resource", it can become an involved process to create a REST-compliant approach to RDBMS-based resources that allows users to define multiple parameters to request results in a Resource Oriented Architecture (ROA) approach.

One begins to view "Plain Old XML" (POX) defining parameters and constraints, passed over HTTP, as a REST approach.  However, this tends to border on a basic RPC approach, especially when the endpoint that receives the POX represents a method to run.  Thus we are now more Service Oriented Architecture (SOA) than ROA.

Alternately, a URI template approach quickly becomes complex and also very RPC-like when we simply define a syntax for parsing a URI into what is honestly still a method call.  There is also the issue that even something as simple as .../resource/facetA/facetB is different to our caching and general web architecture stack than ../resource/facetB/facetA, even if the result (body) is both associative and commutative in operation with respect to these facets.  Arguably a URI doesn't need to be this way, but that seems to imply a URI template again, and then it seems there is prior knowledge involved server side.

One could argue that an aspect of the uniform interface approach in REST is that resources are manipulated via model transforms.  So a URI represents a resource, and we operate on these resources through the application of transformations.  Obviously this raises visions of resources as XML and transformations being applied via XSL.  Whether this is a valid REST/ROA approach remains an argument, I suspect (or does it?).

However, if the goal of the architecture is to allow a user to generate a result based on arbitrary parameters and constraints, and then allow the capacity to pass a representation of these results along to another user as a resource, then the initial request must result in the generation of a new resource.  This new resource is a representation of the various parameters and constraints requested from the database.

Defining that as a constraint of the architecture we are looking for, there are perhaps a few questions to answer:

  1. If the endpoint we are passing our parameters to is a method, then we are doing RPC and likely should just call ourselves a SOA and move on.
    • If "yes" to 1, could the result of this RPC/SOA call generate a resource that could then be passed along on the ROA side of this architecture?
  2. If we believe a REST approach is useful here, can we define a "resource" that, provided a transform or set of operations, results in a set of resources that represent our "results"?
    • One has to be careful not to "create" something we call a resource that is simply a method anyway and thus violates a ROA approach anyway.
  3. Does a REST approach imply requesting resources with individual parameters/facets, with any subset/intersection or other transform being the duty of the client to perform? The result of any multi-parameter request is then a set of resource links, not the data.  The data the client will use (plot, animate, etc.) comes from those resource links.
    • This puts undue (or not) burden on the client for large data sets
    • This makes passing simple representations of resulting data sets to other users/machines problematic.

However, REST is optimized for large-grain hypermedia data transfer (Ref: http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm ) and thus one arrives at this point wondering if this posting is discussing a fine grained approach to data that is not the forte of REST.

One has to consider WS* if you are allowing for RPC anyway.  If you are going to open the door to RPC, then why not just take on the WS* stack and benefit (and suffer) from the formality of that?  One can argue about the conspiracy of tool developers in the evolution of WS* all you want.  However, in the end you are attempting to get a goal achieved, and if your goal is fine-grained parameter access to a database over HTTP, then perhaps the SOA/WS*/SOAP approach is what you need (even if it's not what you want).

Clearly though WS* brings a level of complexity that is not welcome in many places.  So then, as mentioned at the start of this post, one looks to POX over HTTP via POST as an interesting approach.  

There is nothing that says the result (or an additional by-product) of this service call can't include the generation of a resource, or that resource creation could be explicitly requested when passing a representation/resource along is desired.  The implication is that such a resource is not implicitly generated for each service call.  The "Layered System Constraint" allows us to build up more representations (views) into our resources to accommodate architectural approaches needed in our effort.

A process file representing a workflow to be conducted can be passed to others to allow for "gettable" results, or could be POSTed back to the server (or another server) to generate a "gettable" result.  How then is this "workflow document" constructed?  JSON ala CouchDB views (Ref: http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views?action=show&redirect=Views ), XProc (Ref: http://www.w3.org/TR/xproc/ ), GRDDL (Ref: http://www.w3.org/2004/01/rdxh/spec ) transform links, something else?  The exact nature of the construction isn't important at this level of the discussion.

If application state is the purview of the client, then the client is free to use any method to maintain that state: itself, the involved application server, or some other application server.




Implications for HATEOAS (hypertext as the engine of application state)?

  • links can be implicit or explicit
  • links in mediums other than HTML (examples)

So then, if the issue of multiple parameters is one that is fine-grained and best approached by SOA, while a resource-driven approach is naturally ROA in nature, something like the following can be seen as a zeroth-order start on database exposure in a hybrid SOA/ROA approach.

ROA

Represented by URI's like:

.../res/parameter
(a large number of resources returned using HATEOAS to navigate the result tree to data)

.../res/ID
(a single resource data represented based on request type)

Can use the standard CRUD REST mapping (and all its good and bad points) to manipulate these resources and mimetype/request type approaches to alter the serialization of the data.
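
As one possible sketch of that CRUD mapping in Grails terms (the controller name and action mapping here are hypothetical, not part of any existing service), the verb-to-action wiring could be declared in UrlMappings:

    class UrlMappings {
        static mappings = {
            // One URI, four verbs: the HTTP method selects the CRUD action
            "/res/$id"(controller: "resource") {
                action = [GET: "show", POST: "save", PUT: "update", DELETE: "delete"]
            }
        }
    }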

ROA with rendered views*

Represented by views on the resources like

.../res/age/10
(implying all elements 10Ma or older)

.../res/depth/345
(implying 345 mbsf or deeper)

../res/fossilegroup/nanno
(resources with data from fossil group Nanno)

../res/taxon/X
(resources with data about taxon X)

.../res/ics2004/earlyjurasic
(resource with data from early Jurassic as defined by ICS 2004 timescale)


SOA (WS*, POX via POST, etc.)

Standard SOA method calls via SOAP envelopes. 
(lots of baggage)

POX over HTTP

JSON plus REST

or even just pushing .JS in general over to a server ala Persevere or FeatherDB.
* All these URI examples are based on paleontology/geology approaches associated with the kind of data CHRONOS works with and may be somewhat opaque at first glance.



Resource Oriented Architecture

"Classic" REST in a ROA approach.  URI point to individual resource which return their data.  The "view" of the data may change based on the request type and other elements of the web architecture also are there to address caching (e-tags) and other elements of scaling.  Implementation wise this could be Jersey or CouchDB or many things.   

Resource Oriented Architecture with dynamic creation of views

Adding the ability to create dynamic views is a new level.  While CouchDB supports views, they have to be registered and indexed.  Other packages like Persevere (Ref: http://www.persvr.org/ ) and FeatherDB allow views to be dynamically created and run via REST calls.  The performance and scalability issues of allowing such dynamic view creation would need to be evaluated.  Also, the views themselves tend to be Javascript-based or based on a JSON query/path expression syntax, though this is not always the case, and it should be noted that CouchDB, though requiring prior view creation, supports a wide range of languages for views.
ROA / SOA hybrid

If the goal as stated is to allow fine-grained multi-parameter access to a database while leveraging the scaling and ease-of-use aspects of REST/ROA, then a review of various approaches to a ROA/SOA hybrid is beneficial.  Obviously there is a high degree of coupling between the service (method) and the client in all these cases.

Seq A  (One call, all the data returned)
Classic service call as either XML-RPC, WS*, POX over HTTP (RPC), JSON over HTTP (RPC) or however you need to communicate a request to a method and get your results back.  Whether the method is directly exposed or a more document centric approach where an XML or JSON package is processed is not relevant. 

Seq B  (One call for ID collection then N calls for matching resources)
A method call is made, but rather than data, a collection of IDs for matching resources is returned.  The client then makes further calls for each matching resource.

Seq C (One call for ID collection, One call for collection resource, N calls for matching resources)
A method call is made which results in the creation of a new resource.  The ID of that resource is returned to the client, which then calls it to gather the associated resource IDs that are in turn called to return the requested data.

Comments on these approaches
Obviously each version is getting more and more complex, involving more and more network calls and the inherent latency that creates.  Seq C does result in a new resource that can be retrieved via a REST-compliant GET and can then return a HATEOAS-compliant collection of resources to be retrieved.

Accessing (opening and closing) N resources over the network is also more involved than returning one large document of all the resources combined.  The first approach does lend itself well to a scaled-out architecture though.  One might consider the utility of AtomPub (Ref: http://www.atompub.org/ ) in this architecture.

Not mentioned yet, but important especially when creating new resources dynamically, is the use of HTTP status codes (Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html ).  In particular, 201 Created or 303 See Other may be relevant.  Obviously, using the existing web architecture as much as possible has value.  The general shunning of the web architecture by WS* services has not been to its benefit, in the author's opinion.

Conclusion
The large-grained focus of REST makes it hard to accommodate access to a database with a large number of parameters and criteria exposed to clients for them to construct queries against.

SOA based approaches that are more RPC in nature match this need better but introduce strong coupling between client and server and are generally more complex to create, invoke and evolve. 

An architecture that combines ROA and SOA approaches is not impractical:

  • A ROA (REST) approach for easy and highly scalable access to data based on basic parameters and a number of pre-defined "views" onto the most common queries
  • Service/method calls to support multi-parameter queries against a data source, allowing arbitrarily complex requests
  • Service/method calls that allow new "views" to be registered when it is felt that they represent a new and potentially popular view onto the data, or when a passable, REST-compliant resource creation is desirable

How an implementation of this might be constructed is a future topic.
take care
Doug

Refs used for this document:
http://www.slideshare.net/alan.dean/separating-rest-facts-from-fallacies-presentation/
http://josh-in-antarctica.blogspot.com/2008/11/restful-query-urls-cont.html
http://www.eherenow.com/soapfight.htm
http://www.informationweek.com/blog/main/archives/2008/08/rest_vs_soap_ro.html
http://www.25hoursaday.com/weblog/2008/08/24/RESTfulJSONBringingRESTAndRPCCloserTogether.aspx

Tuesday, November 18, 2008

REST via URIs and Body Representations

I've been thinking about REST URIs and their structure for a while now and have been talking with my friend Josh about them.  As an exercise to codify my own views, I have written up some of my thoughts on this topic.

URI templates:
A first-order approach to REST is the URI used each day via GET
 
Good:
  • Can become bookmarks (thus IM'd, twittered, and emailed to one's heart's content)
  • Simple and easy to code to in a variety of languages and API's
  • Classic resource representation
Bad:
  • Limited ability to pass multiple parameters
  • Limited ability to deal with more than one facet  (at least with the example template below, any template could be established, but then everyone calling needs to know it)
  • Require a template even to pass one parameter
  • Can easily start to try and carry too much info with all sorts of verbs and values

So by example a URI template might look like:

.../resource/facet   (where the facet is some value like ../lithology/sand)

Then a template could define something like

.../resource/facet/[match]   Given a single value after facet, try to match it (equals or substring?)
.../resource/facet/[/min/max]  Given two values, assume they are a numerical range to search for

Of course, the issues already start to build up.  In the single-value case we could just try to match, but if we want a sub-string match then we are assuming we are not in a numerical environment.  Or we are requiring the implementation of the service to check for primitive data types, which may or may not work.  In the second case our min/max has to be sure to address numeric primitives (easy to check), but we may also want to address non-numeric ranges (January/March).
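
To make that ambiguity concrete, a handler for these templates ends up doing guesswork along these lines (pure illustration, not code from any service mentioned here):

    // Given the path segments that follow .../resource/facet/, decide how to interpret them.
    def interpret(List<String> segments) {
        if (segments.size() == 2) {
            // two segments: assume a min/max range, numeric or not (January/March?)
            return [type: 'range', min: segments[0], max: segments[1]]
        }
        def value = segments[0]
        return value.isNumber() ?
            [type: 'numericEquals', value: value.toDouble()] :   // primitive check
            [type: 'substring', value: value]                    // fall back to a text match
    }

    assert interpret(['sand']).type        == 'substring'
    assert interpret(['1.5']).type         == 'numericEquals'
    assert interpret(['49', '55.05']).type == 'range'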

There is also an interesting use of language that can be done here, where resource (singular) requires an id to define a single resource.  Conversely, a plural resources would then indicate an additional facet (either plural or singular) to carry the template forward.

So:
.../resources/sand   (all resources with sand)
.../resource/34/sand  (sand attribute (facet) of resource 34)

The use of singular and plural attributes in URI templates seems logical.  As a complete aside, I have often wondered if a more Latin-style grammar would allow more descriptive URIs to be formed, since word order is less important and meaning is carried in the word itself.  Totally impractical, though, of course.

URI templates with representation in the body
So if our template is not enough, exploit the fact that the web architecture doesn't just pass URIs but additional elements in a request/response, including headers and a body (representation).

So now our template (../resource/facet) might get a POST call with a payload of XML or JSON (pick your flavor I guess; how about Microformats in REST?).  That payload is free to define a wide variety of parameters and actions.

Good:
  • Far more flexible in terms of defining what we want to do with a resource facet
  • A response payload can define downstream (next) actions to take.  Thus giving us a kind of client moderated work flow (ie, 201 created or 303 see other next steps to take)
Bad
  • These workflows really can't be very complex since we are "waiting around" for the completion with our session.  Really only good for events which are "quick". *
  • The XML or JSON schema/language for the payload has to be agreed on beforehand (though this is really true of the URI template too)

As an aspect of this, the service might establish (via the 201 or 303 codes) a new URI that is a GET'able representation of the response (complete with etag and everything).  The use of status codes (Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html ) needs to be an integral part of any REST approach, and REST clients need to realize that if they are going to use the web architecture to get a resource, they must look for and deal with the status codes of that architecture.


However, there are issues with the service provider caching requests and responses:

  • How long does the service provider keep this?
  • A service may be called thousands of times by a single person in a single session; it doesn't scale (does it?) to keep and etag them all
  • Just let the web arch do the caching

Perhaps one might allow this stored representation of the response to be the URI plus the POSTed request, and then, if someone does a GET on that URI representing the earlier POST request, allow whatever web caching there is to address that scaling.  I.e.:

POST request to ../resource/facet
The response could be the actual data with a 201 Created URI for future referencing.  That new URI, say (../resource/request/[id]), could then be called via GET in the future.  All the service provider is storing in that case is the URI plus the request payload, since the results would be rebuilt by the service engine (and then be available for near-term caching via the existing web architecture).  Etags (hash sums) could be used to ensure requests are to a specific resource.
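
Sketched as a client interaction in Groovy (the URIs and payload are invented for illustration), that flow might look like:

    def conn = new URL("http://example.org/resource/lithology").openConnection()
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Content-Type", "application/xml")
    conn.outputStream << "<query><facet>sand</facet><min>10</min><max>40</max></query>"

    // A 201 response carries the data plus a Location header pointing at the new,
    // GET'able resource that represents this request
    println "Status: ${conn.responseCode}"
    def storedRequestUri = conn.getHeaderField("Location")

    // Later (or from another client entirely) the stored representation is just a GET
    def cachedRepresentation = new URL(storedRequestUri).text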

One element I have been working to deal with in REST approaches is the whole "hypermedia as the engine of application state" idea.  It's a guiding principle, and yet it is sometimes hard to understand what exactly it is or whether an approach is in compliance with it.

One approach would seem to be to say that all elements of a response must define downstream references as URIs or other (are there others?) hypermedia-compliant references.  I am not sure if an img tag or microformat or RDFa-embedded content is compliant with this "application state" approach.  Also, how exactly does this impact the use of XML or JSON?  While XML could be embedded into XHTML, or a similar effort done with microformats, does the use of JSON in the response to a REST URI mean that "hypermedia as engine of application state" is not even an option?  JSON embedded in XHTML or microformats perhaps might address this, but then one argument the JSON proponents make is that they are tossing out the cruft.  One could also place the URI into the JSON as a value.  (Josh, I think you are looking into this...  I will be curious to read about it.)

Indeed it's also interesting to look at GRDDL and note how it approaches the walking and extraction of information (RDF) from documents (like XHTML) when looking at issues of REST. 

take care
Doug

Ref:
http://josh-in-antarctica.blogspot.com/2008/10/restful-query-urls.html
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
http://oreilly.com/catalog/9780596529260/
http://www.infoq.com/articles/tilkov-rest-doubts
http://www.infoq.com/articles/mark-baker-hypermedia
http://microformats.org/wiki/rest/json#Proposals
http://www.w3.org/2004/01/rdxh/spec

*  With regard to only being good for quick workflows: one must agree that it is easy to send in an event that may take a long time to run and simply register a message to a queue (an Atom feed perhaps) that has the current status and the next event to undertake on status update.  This turns the burden of monitoring and moving messages through the queue over to the client in this case, but it does allow for a disconnected workflow in compliance with web architectural approaches.

Thursday, November 06, 2008

JSR-311 and JPA with Intellij 8

IntelliJ IDEA 8 recently came out, and I wanted to take the time to try out a couple of interesting technologies as a means to test the new version: specifically JSR-311 (via Jersey) and the Java Persistence API (JPA) via Hibernate.  I created a simple application using these two technologies.  The data store I used was a PostgreSQL database with lithological data.  Note I actually did this with version 8.0 M1 of IntelliJ (build 8664), and I will go on the assumption nothing changed in this regard in the 8.0 final announced just this last day or so.

The first step was to register the database as a data source.  Under the TOOLS menu is the DATA SOURCE... menu item.  Selecting this allows us to register our database as a source through the dialog below:


Once this is done we can go ahead and create our new project.  You wouldn't have to register a data source first, but for the sake of the canonical "10 minute demo" the net seems to love, it makes life a bit easier, as we are about to see.

Go ahead and start a new project with FILE -> NEW PROJECT.  Use the default "Create project from scratch" and name your project.  We will be building a "Java Module", the default option when creating a project.  Pick the default option for the src directory, and when you get the "New Project" window, select options like in figure 2 below:


I did NOT select "Hibernate" at this stage, though I did select it under the pull down for the "JavaEE Persistence" option.  Also, under "Web Application" -> "WebServices" select the "Jersey" option.

The project will start to be fleshed out and you will be prompted to import the database schema (assuming you selected the "Import database schema" option).  Now select the source you created via the "Data Source Properties" dialog; for me this was "Lith".  This database is just a test database for now, so it consists of only a few tables with no defined relations at this time.  For purposes of this simple demo that should be fine.

Go ahead and select a datasource and define a package name for the entities to be generated into. 



You will be prompted that the OR mapping is about to be generated, go ahead and let it start. Depending on your database and your machine this will take a minute or so. 

Once this was generated I was confronted with an error that the org.hibernate.ejb.HibernatePersistence class/package could not be resolved for its reference in the persistence.xml file that was generated for us by this process.  Long story short, I went to http://www.hibernate.org/6.html and downloaded the annotations, entitymanager and core (distribution) packages from there.

Setting up the libraries (as is often the case) was the only real tedious part and in the end the library collection looked like the following figure:
NOTE:   I have to wonder, had I selected "Hibernate" in the "New Project" dialog above (leaving the import and class generation option unchecked), whether IntelliJ would have imported the needed Hibernate libraries for me.
NOTE 2:  Don't forget to add in your database driver too (not that I forgot to do that or anything).



Parallel to all this JPA/Hibernate stuff, IntelliJ has created a simple JSR-311 (Jersey) class for us with the following default structure:

package example;

import com.sun.net.httpserver.HttpServer;
import com.sun.jersey.api.container.httpserver.HttpServerFactory;
import java.io.IOException;

import javax.ws.rs.GET;
import javax.ws.rs.ProduceMime;
import javax.ws.rs.Path;

// The Java class will be hosted at the URI path "/helloworld"
@Path("/helloworld")
public class HelloWorld {
    // The Java method will process HTTP GET requests
    @GET
    // The Java method will produce content identified by the MIME Media type "text/plain"
    @ProduceMime("text/plain")
    public String getClichedMessage() {
        // Return some cliched textual content
        return "Hello World";
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServerFactory.create("http://localhost:9998/");
        server.start();

        System.out.println("Server running");
        System.out.println("Visit: http://localhost:9998/helloworld");
        System.out.println("Hit return to stop...");
        System.in.read();
        System.out.println("Stopping server");
        server.stop(0);
        System.out.println("Server stopped");
    }
}

I won't bother to break down the structure or go into the various annotations used in JSR-311.  You can Google up quite a bit of material on all that of far higher quality than I could produce.  Starting at the Jersey site (https://jersey.dev.java.net/ ) is as good a place as any, as I think it's likely the most evolved JSR-311 implementation at this time.

For simplicity we will leave the main() method alone and modify a few other elements in this class.  First I changed the class level @Path annotation to:

@Path("/lith);

and also added create and close methods for the JPA EntityManagerFactory.  For fun I modified getClichedMessage to parse out the URI path sent to it via a:

@Path("location/{latlong}")  annotaion along with a @PathParam("latlong") String latlong annotation.  The later requires the javax.ws.rs.PathParam import. 

So our final code looks like this  (interesting parts in bold):

package example;

import com.sun.jersey.api.container.httpserver.HttpServerFactory;
import com.sun.net.httpserver.HttpServer;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.Query;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.ProduceMime;
import java.util.List;

@Path("/lith")
public class HelloWorld {

    private EntityManagerFactory emf = null;

    protected void createEMF() {
        emf = Persistence.createEntityManagerFactory("NewPeristenceUnit");
    }

    protected void closeEMF() throws Exception {
        emf.close();
    }


    public String doQuery(String latlong) {
        createEMF();
        StringBuffer sb = new StringBuffer();
        EntityManager em = emf.createEntityManager();
        Query query = em.createQuery("select c."+latlong+" from CmpEntity c");
        List<Double> list = query.getResultList();
        for (Double c : list) {
             sb.append(latlong + ": " + c.toString() + "\n");
        }
        try {
            closeEMF();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sb.toString();
    }

    @GET
    @Path("location/{latlong}")
    @ProduceMime("text/plain")
    public String getClichedMessage(@PathParam("latlong") String latlong) {
        return doQuery(latlong);
    }


    public static void main(String[] args) throws Exception {

        HttpServer server = HttpServerFactory.create("http://localhost:9998/");
        server.start();

        System.out.println("Server running");
        System.out.println("Visit: http://localhost:9998/helloworld");
        System.out.println("Hit return to stop...");
        System.in.read();
        System.out.println("Stopping server");
        server.stop(0);
        System.out.println("Server stopped");
    }
}

A few things to note:
  • The List<Double> list = query.getResultList(); is not good(tm), as we have an unchecked assignment of java.util.List to java.util.List<java.lang.Double>.  Better would be to use something like List<CmpEntity> in this case and alter our query to something like Query query = em.createQuery("select c from CmpEntity c");  However, I ran into some issues with the returns in my entity class being null.  Perhaps I needed to allow certain columns to be null; I am not certain of that, but I was more interested in the pipeline than in resolving the query.  Likely I will resolve this issue as the next step in this experiment.
  • Parallel to the above point, my List<Double> works since I know that the return from the query is a Double.
  • I altered things a bit in the above example so that I could use both /lith/location/longitude and /lith/location/latitude as calling URIs.  This works, of course, since I know ahead of time what the column names were that I wanted to use for this test and that they were of type Double, which addresses the above two points.  This is why I use the @Path and @PathParam annotations in the getClichedMessage method to strip out and pass along elements of the URI.
  • To be clear, you would NOT want (i.e. DO NOT WANT) to use the column names from your mapped data source in your REST URIs.  You would establish your URI template to whatever degree you wished and then use a more logical approach to generating the structure and content of your services and their replies.  This post is talking about the pipeline and is not worried about the very important architectural issue of mapping resources to URIs.  Take a look at my friend Josh's post on that topic and the nice InfoQ "How to GET a Cup of Coffee" posting about all that.

One last important point in all this.  I got an error: "No identifier specified for entity".  When I looked at my generated entity classes, all the columns received a @Basic and @Column(...) annotation.  At least one, however, needs to have an @Id annotation.  Reference the Hibernate FAQ at http://www.hibernate.org/329.html#A6 for more info.  I had a unique integer column I could do this to, and so I altered my affected entity classes with @Id @Column(...) on one of the methods.  You, however, should not see this, as I suspect it was due to the fact I never bothered to set a unique key column on that table in my database (don't tell my DBA).
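
For illustration, the change amounts to promoting one generated column mapping to the identifier.  This is not the actual generated class; the id field and column names below are stand-ins for whatever your schema produces (only the latitude column is real in my case):

    import javax.persistence.Basic;
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;

    @Entity
    public class CmpEntity {
        private int sampleId;
        private Double latitude;

        // Promoting the generated @Basic/@Column getter to @Id gives Hibernate
        // the identifier it was complaining about
        @Id
        @Column(name = "sample_id")
        public int getSampleId() { return sampleId; }
        public void setSampleId(int id) { this.sampleId = id; }

        @Basic
        @Column(name = "latitude")
        public Double getLatitude() { return latitude; }
        public void setLatitude(Double l) { this.latitude = l; }
    }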

That's about it.  At this point you should be able to run the project, make calls and get data out.  The built-in REST client in IntelliJ IDEA 8 is quite nice, though perhaps not as advanced as the nice rest-client project.  In my case I parsed out the URI just for the sake of some fun.  With all the elements connected, one can move on to more interesting "real world" applications of the JSR-311 and JPA APIs used in this simple demo.


Thursday, June 19, 2008

Sproutcore on Ubuntu 8.04

I was interested in the Sproutcore (Ref: http://www.sproutcore.com ) package that got some attention from the Apple developers conference (AppleInsider article / MacBreak Weekly podcast where Sproutcore is talked about).

Getting it to work on Ubuntu (version 8.04 for me) took a few steps I thought I would place here.  Of course, first you need to make sure you have RubyGems and Rails installed:

apt-get install rubygems rails

One item you may miss installing though is the dev package.  Be sure to do:

apt-get install ruby1.8-dev

This will show itself when Ruby tries to build native extensions, with an error like:

extconf.rb:1:in `require': no such file to load -- mkmf (LoadError)

Once you have this done you can do:

gem install sproutcore

and accept all its questions (it will take a while as several packages need to be installed)

At this point you will need to:

cd /var/lib/gems/1.8/gems/sproutcore-0.9.10/bin
chmod 755 *

to make these scripts executable and also add this location to your path with:

export PATH=/var/lib/gems/1.8/gems/sproutcore-0.9.10/bin:$PATH

or just add them to your shell init scripts like .bashrc

At this point you can start down the tutorial examples at: http://www.sproutcore.com/documentation/hello-world-tutorial/

take care
Doug

[update]
After talking with my friend Josh, I thought I should add a little note pointing to information related to Objective-J by 280 North, of 280 Slides development, as another data point in some of this Javascript talk.

Friday, June 06, 2008

5 Groovy / Grails recommendations

So I was asked what my top five recommendations would be for Groovy/Grails resources, and I thought I would post them here to get feedback and other ideas from the community. I decided to break it up into five categories.

1) Web sites
Of course the top two would have to be the main Groovy and Grails sites themselves. Not just because they are the home of the respective projects, but because both truly are good resources with plenty of examples, documentation and links to mailing lists and other resources. Others not associated with elements already placed in the following categories might be the Spring and Hibernate sites.

2) Podcasts
Sven Haiges started the Grails Podcast and was recently joined by Glen Smith. They do a wonderful job with the podcast and it's a great resource for the community.
The Java Posse is the high-water mark in technical podcasts, and I also enjoy Software Engineering Radio.

3) RSS feeds
There are a lot of blogs, rss feeds, etc. that a person can track down. The following three aggregate such feeds and provide a good starting point for locating feeds you resonate with:
http://groovyblogs.org http://groovy.dzone.com/ http://www.groovyongrails.com/

4) Software
Focusing on IDEs, the first I would have to recommend is IntelliJ IDEA, as its Grails/Groovy support is top notch. It's commercial, but well worth the cost, and I believe the best of those out there for Groovy/Grails work, period.

Following this I would (personally) go with NetBeans 6.1, as it also now has Grails support. However, you need to use 6.1 development builds. The plugin home page is at: http://plugins.netbeans.org/PluginPortal/faces/PluginDetailPage.jsp?pluginid=6265

Eclipse also has support, and in fact was the first to do so if I recall correctly. However, I have not heard much about it and suspect its support has not been a high priority. Many love TextMate (for good reason) and its various bundles, and on the non-IDE side of software one could likely do a lot with jEdit as well (I have).

Both TeamCity and Hudson provide build environments and code coverage (by way of plugins, for Hudson) for Grails projects. Hudson requires a little work to set up Grails builds, but there are some nice blog posts on such efforts. Again, TeamCity, produced by the IntelliJ folks, shines in its Grails support right out of the box.

5) Books
Groovy in Action is a wonderful resource to have, as is The Definitive Guide to Grails. There are several new Groovy books coming out, and more are likely to follow as the Groovy/Grails community continues to grow.

Would love to hear other ideas/comments.
Doug