Tuesday, December 02, 2008

REST SOA and multi-parameter databases

Thoughts on REST and SOA for database access with multiple parameters

If a URI represents a "resource", then creating a REST compliant approach to RDBMS based resources becomes an involved process once users are allowed to define multiple parameters when requesting results in a Resource Oriented Architecture (ROA) approach.

One begins to view "Plain old XML" (POX) defining parameters and constraints passed over HTTP as a REST approach. However, this tends to border on a basic RPC approach, especially when the endpoint that receives the POX represents a method to run. Thus we are now more Service Oriented Architecture (SOA) than ROA.

Alternatively, a URI template approach quickly becomes complex and also very RPC like when we simply define a syntax for parsing a URI into what is honestly still a method call. There is also the issue that even something as simple as .../resource/facetA/facetB is different to our caching and general web architecture stack than .../resource/facetB/facetA, even if the operation is both associative and commutative with respect to these facets and so the result (body) is the same. Arguably a URI doesn't need to be this way, but that seems to imply a URI template again, and then it seems there is prior knowledge involved server side.
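The facet ordering issue can be made concrete with a small sketch. It assumes (and this is only an assumption for illustration) that every path segment after the resource name is an unordered facet, so the server can sort them into one canonical form before the cache ever sees the URI. The function name and the "resource" segment are hypothetical.

```python
from urllib.parse import quote, unquote


def canonical_facet_path(path: str, resource: str = "resource") -> str:
    """Return the path with the facet segments after `resource` sorted.

    Assumption: segments after `resource` form an unordered set of facets.
    Raises ValueError if `resource` is not a segment of the path.
    """
    segments = [unquote(s) for s in path.strip("/").split("/") if s]
    idx = segments.index(resource)
    head, facets = segments[: idx + 1], segments[idx + 1:]
    ordered = head + sorted(set(facets))
    return "/" + "/".join(quote(s, safe="") for s in ordered)


a = canonical_facet_path("/resource/facetA/facetB")
b = canonical_facet_path("/resource/facetB/facetA")
```

A server could redirect any non-canonical ordering to the canonical one so that caches only ever store a single URI. Note that this is exactly the kind of server side prior knowledge the paragraph above worries about.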

One could argue that an aspect of the uniform interface approach in REST is that resources are manipulated via model transforms. So a URI represents a resource and we operate on these resources through the application of transformations. Obviously this raises visions of resources as XML and transformations being applied via XSL. Whether this is a valid REST/ROA approach remains open to argument, I suspect (or does it?).

However, if the goal of the architecture is to allow a user to generate a result based on arbitrary parameters and constraints, and then to pass a representation of that result along to another user as a resource, then the initial request must generate a new resource. This new resource is a representation of the various parameters and constraints requested from the database.

Taking that as a constraint of the architecture we are looking for, there are perhaps three questions to answer:

  1. Is the endpoint we are passing our parameters to a method? If so, we are RPC and likely should just call ourselves a SOA and move on.
    • If "yes" to 1, could the result of this RPC/SOA call generate a resource that could then be passed along on a ROA side of this architecture?
  2. If we believe a REST approach is useful here, can we define a "resource" that, given a transform or set of operations, results in a set of resources that represent our "results"?
    • One has to be careful not to "create" something we call a resource that is simply a method and thus violates a ROA approach anyway.
  3. Does a REST approach imply requesting resources with individual parameters/facets, with any subset/intersection or other transform being the duty of the client to perform? The result of any multi-parameter request is then a set of resource links, not the data. The data the client will use (plot, animate, etc.) comes from those resource links.
    • This puts an undue (or not) burden on the client for large data sets.
    • This makes passing simple representations of the resulting data set to other users/machines problematic.
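To illustrate question 3, here is a minimal sketch of the client doing the intersection itself. The link lists stand in for the responses of two single-facet requests; the URIs and the function name are made up for the example.

```python
def intersect_links(*link_sets):
    """Return the links present in every per-facet result, sorted."""
    if not link_sets:
        return []
    common = set(link_sets[0]).intersection(*link_sets[1:])
    return sorted(common)


# Hypothetical responses to two single-facet requests.
age_links = ["/res/101", "/res/102", "/res/103"]
taxon_links = ["/res/102", "/res/103", "/res/250"]

matching = intersect_links(age_links, taxon_links)
```

The client still has to fetch every link in `matching` to get at the data, which is the burden noted in the first bullet under question 3.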

However, REST is optimized for large-grain hypermedia data transfer (Ref: http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm ) and thus one arrives at this point wondering if this posting is discussing a fine grained approach to data that is not the forte of REST.

One has to consider WS* if you are allowing for RPC anyway. If you are going to open the door for RPC, then why not just take on the WS* stack and benefit (and suffer) from the formality of that? One can argue about the conspiracy of tool developers in the evolution of WS* all you want. However, in the end you are trying to achieve a goal, and if your goal is fine grained parameter access to a database over HTTP then perhaps the SOA/WS*/SOAP approach is what you need (even if it's not what you want).

Clearly though WS* brings a level of complexity that is not welcome in many places.  So then, as mentioned at the start of this post, one looks to POX over HTTP via POST as an interesting approach.  

There is nothing that says the result (or an additional by-product) of this service call can't include the generation of a resource, or that resource creation couldn't be explicitly requested when passing a representation/resource is desired. The implication is that such a resource is not implicitly generated for each service call. The "Layered System Constraint" allows us to build up more representations (views) into our resources to accommodate the architectural approaches needed in our effort.

A process file representing a workflow to be conducted can be passed to others to allow for "gettable" results, or it could be POSTed back to the server (or another server) to generate a "gettable" result. How then is this "workflow document" constructed? JSON à la CouchDB views (Ref: http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views?action=show&redirect=Views ), XProc (Ref: http://www.w3.org/TR/xproc/ ), GRDDL (Ref: http://www.w3.org/2004/01/rdxh/spec ) transform links, other? The exact nature of the construction isn't important at this level in the discussion.

If application state is the purview of the client, then the client is free to use any method to maintain that state: itself, the involved application server or some other application server.




Implications for HATEOAS (hypertext as the engine of application state)?


  • links can be implicit or explicit
  • links in media other than HTML (examples)

So then, if the issue of multiple parameters is a fine grained one best approached by SOA, while a resource driven approach is naturally ROA, something like the following can be seen as a zeroth order start on database exposure in a hybrid SOA/ROA approach.

ROA

Represented by URIs like:

.../res/parameter
(a large number of resources returned, using HATEOAS to navigate the result tree to data)

.../res/ID
(a single resource, with its data represented based on request type)

One can use the standard CRUD REST mapping (and all its good and bad points) to manipulate these resources, and MIME type/request type approaches to alter the serialization of the data.

ROA with rendered views*

Represented by views on the resources like

.../res/age/10
(implying all elements 10Ma or older)

.../res/depth/345
(implying 345 mbsf or deeper)

.../res/fossilegroup/nanno
(resources with data from fossil group Nanno)

.../res/taxon/X
(resources with data about taxon X)

.../res/ics2004/earlyjurasic
(resource with data from early Jurassic as defined by ICS 2004 timescale)
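As a rough sketch of how such pre-defined views might resolve against a database, the following maps a few of the view names above onto parameterized SQL using Python's built-in sqlite3 module. The table name, column names and sample rows are all hypothetical; only the URI shapes come from the examples above.

```python
import sqlite3

# Hypothetical mapping: view name -> (column, operator, value converter).
# Column names and operators come only from this fixed table, never from the URI.
VIEWS = {
    "age": ("age_ma", ">=", float),        # 10 means 10 Ma or older
    "depth": ("depth_mbsf", ">=", float),  # 345 means 345 mbsf or deeper
    "taxon": ("taxon", "=", str),
}


def query_view(conn, path):
    """Resolve a path such as '/res/age/10' to a list of matching resource IDs."""
    _, view, value = path.strip("/").split("/")
    column, op, convert = VIEWS[view]  # KeyError for an unknown view
    sql = f"SELECT id FROM samples WHERE {column} {op} ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (convert(value),))]


# Made-up data, just to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, age_ma REAL, depth_mbsf REAL, taxon TEXT)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [(1, 5.0, 120.0, "X"), (2, 12.5, 400.0, "X"), (3, 65.0, 345.0, "Y")],
)

older_than_10 = query_view(conn, "/res/age/10")
deeper_than_345 = query_view(conn, "/res/depth/345")
taxon_x = query_view(conn, "/res/taxon/X")
```

Each view takes exactly one parameter, which is what keeps these URIs resource like. Combining two of them is where the multi-parameter problem discussed in this post starts.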


SOA (WS*, POX via POST, etc.)

Standard SOA method calls via SOAP envelopes. 
(lots of baggage)

POX over HTTP

JSON plus REST

or even just pushing JavaScript in general over to a server à la Persevere or FeatherDB.

* All these URI examples are based on paleontology/geology approaches associated with the kind of data CHRONOS works with and may be somewhat opaque at first glance.



Resource Oriented Architecture

"Classic" REST in a ROA approach. URIs point to individual resources, which return their data. The "view" of the data may change based on the request type, and other elements of the web architecture are there to address caching (ETags) and other aspects of scaling. Implementation-wise this could be Jersey or CouchDB or many other things.

Resource Oriented Architecture with dynamic creation of views

Adding the ability to create dynamic views is a new level. While CouchDB supports views, they have to be registered and indexed. Other packages like Persevere (Ref: http://www.persvr.org/ ) and FeatherDB allow views to be dynamically created and run via REST calls. The performance and scalability issues of allowing such dynamic view creation would need to be evaluated. Also, the views themselves tend to be JavaScript based or based on a JSON query/path expression syntax. This is not always the case, though, and it should be noted that CouchDB, though requiring prior view creation, supports a wide range of languages for views.

ROA / SOA hybrid

If the goal as stated is to allow fine grained multi-parameter access to a database while leveraging the scaling and ease of use aspects of REST/ROA, then a review of various approaches to a ROA/SOA hybrid is beneficial. Obviously there is a high degree of coupling between the service (method) and client in all these cases.

Seq A  (One call, all the data returned)
Classic service call as either XML-RPC, WS*, POX over HTTP (RPC), JSON over HTTP (RPC) or however you need to communicate a request to a method and get your results back. Whether the method is directly exposed, or a more document centric approach is used where an XML or JSON package is processed, is not relevant.

Seq B  (One call for ID collection then N calls for matching resources)
A method call is made, but rather than data, a collection of IDs for matching resources is returned. The client then makes further calls for each matching resource.

Seq C (One call for ID collection, One call for collection resource, N calls for matching resources)
A method call is made which results in the creation of a new resource. The ID of that resource is returned to the client, which then calls it to gather the associated resource IDs, which are in turn called to return the requested data.
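Seq C can be walked through with an in-memory stand-in for the server. This is only a sketch of the call pattern, with no real HTTP involved; the class, the URI layout (/collections/..., /res/...) and the sample records are invented for illustration.

```python
import itertools


class HybridService:
    """In-memory stand-in for a server with a query method and plain resources."""

    def __init__(self, records):
        self.records = records      # resource ID -> data
        self.collections = {}       # collection ID -> list of resource IDs
        self._next_id = itertools.count(1)

    def post_query(self, predicate):
        """SOA side: run the query and store the matches as a new collection resource."""
        matches = sorted(rid for rid, rec in self.records.items() if predicate(rec))
        cid = f"q{next(self._next_id)}"
        self.collections[cid] = matches
        return f"/collections/{cid}"

    def get(self, uri):
        """ROA side: plain GET on a collection or on a single resource."""
        kind, key = uri.strip("/").split("/")
        if kind == "collections":
            return [f"/res/{rid}" for rid in self.collections[key]]
        return self.records[int(key)]


service = HybridService({1: {"age_ma": 5.0}, 2: {"age_ma": 12.5}, 3: {"age_ma": 65.0}})

# Call 1: the method call creates a collection resource and returns its URI.
collection_uri = service.post_query(lambda rec: rec["age_ma"] >= 10)
# Call 2: GET the collection to obtain links to the matching resources.
links = service.get(collection_uri)
# Calls 3..N+2: GET each matching resource.
data = [service.get(link) for link in links]
```

The value of `collection_uri` is the thing that can be handed to another user, who then only needs the GET side of the service.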

Comments on these approaches
Obviously each version is getting more and more complex and involves more and more network calls, with the inherent latency that creates. Seq C does result in a new resource that can be retrieved via a REST compliant GET and can then return a HATEOAS compliant collection of resources to be retrieved.

Accessing (opening and closing) N resources over the network is also more involved than returning one large document of all the resources combined. Accessing the resources individually does lend itself well to a scaled out architecture, though. One might consider the utility of AtomPub (Ref: http://www.atompub.org/) in this architecture.

Not mentioned yet, but important especially when creating new resources dynamically, is the use of HTTP status codes (Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html ). In particular 201 Created or 303 See Other may be relevant. Obviously using the existing web architecture as much as possible has value. The general shunning of the web architecture by WS* services has not been to its benefit in the author's opinion.
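One possible reading of how those two codes could be used, sketched without any web framework: the first POST of a given query answers 201 Created with a Location header for the new resource, and a repeat of the same query answers 303 See Other pointing at the existing one. The handler name, the in-memory registry and the URI scheme are assumptions for the example, not a prescribed design.

```python
from http import HTTPStatus

# Hypothetical registry: normalized query parameters -> URI of the created resource.
_known_queries = {}


def handle_query_post(params: dict):
    """Return (status, headers) for a POSTed multi-parameter query."""
    key = tuple(sorted(params.items()))  # parameter order must not matter
    if key in _known_queries:
        # The same query was seen before: point the client at the existing resource.
        return HTTPStatus.SEE_OTHER, {"Location": _known_queries[key]}
    uri = f"/collections/q{len(_known_queries) + 1}"
    _known_queries[key] = uri
    return HTTPStatus.CREATED, {"Location": uri}


first = handle_query_post({"age": 10, "taxon": "X"})
again = handle_query_post({"taxon": "X", "age": 10})
```

In both cases the client ends up with a URI it can GET and pass along, which is the ROA half of the hybrid.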

Conclusion
The large grained focus of REST makes it hard to accommodate access to a database with a large number of parameters and criteria exposed to clients for constructing queries.

SOA based approaches that are more RPC in nature match this need better but introduce strong coupling between client and server and are generally more complex to create, invoke and evolve. 

An architecture that combines ROA and SOA approaches is not impractical:

  • A ROA (REST) approach for easy and highly scalable access to data based on basic parameters and a number of pre-defined "views" to the most common queries
  • Service/method calls to support multi-parameter calls to a data source to allow arbitrarily complex requests
  • Service/method calls that allow new "views" to be registered when it is felt that they represent a new and potentially popular view to the data or when a passable, REST compliant resource creation is desirable

How an implementation of this might be constructed is a future topic.
take care
Doug

Refs used for this document:
http://www.slideshare.net/alan.dean/separating-rest-facts-from-fallacies-presentation/
http://josh-in-antarctica.blogspot.com/2008/11/restful-query-urls-cont.html
http://www.eherenow.com/soapfight.htm
http://www.informationweek.com/blog/main/archives/2008/08/rest_vs_soap_ro.html
http://www.25hoursaday.com/weblog/2008/08/24/RESTfulJSONBringingRESTAndRPCCloserTogether.aspx