2008 Summer School

Building Web Services

In this exercise, we will gain experience in building and deploying web services, both in Python (using CherryPy) and Java (using Tomcat and Axis). The NVOSS software package contains all the infrastructure we require.

CherryPy

CherryPy is a Python library that allows you to build web applications by just writing normal code and not having to worry about any of the web infrastructure stuff (such as explicitly handling HTTP requests and responses or serialising and deserialising objects).

The basics

The obligatory simplest application, Hello World, looks like this:

import cherrypy

class HelloWorld(object):
  def index(self):
    return "Hello World!"
  index.exposed = True

cherrypy.quickstart(HelloWorld())

You can try the above code example by saving it to a file called HelloWorld.py and then just typing (making sure that you have run the NVOSS setup script first):

> python HelloWorld.py
[25/Jun/2008:14:26:17] HTTP Serving HTTP on http://0.0.0.0:8080/
CherryPy Checker:
The Application mounted at '' has an empty config.

This will start a web server on your machine listening on port 8080. If you now point a web browser at http://localhost:8080, you should see the familiar greeting.

What's happening?

When your browser points at a URL such as http://localhost:8080/mydata/galaxy1.jpg, it is actually making a (HTTP GET) request for the resource on the server identified by the path part of the URL - in this case, the image identified by /mydata/galaxy1.jpg. The server receiving the request knows how to resolve (dereference) the identifier and return the appropriate resource. Note that though normal practice is to map the identifier straight onto the server file system, this does not have to be the case.

When the path identifies something that looks like a directory, e.g. /mydata, the convention is that the server will try to return a resource called index.html relative to the path, i.e. in this case, the resource identified as /mydata/index.html. This holds even when no path is apparently specified, e.g. http://localhost:8080, as this is actually a request for the root resource that maps to /.
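The index.html convention described above can be sketched in a few lines of Python 3 (standard library only). This is only an illustration of the idea, not how any particular web server implements it; the function name resolve and the set known_dirs are made up for the example.

```python
import posixpath

def resolve(path, directories):
    """Map a request path to a resource identifier.

    `directories` is a hypothetical set of paths that the server treats
    as directories; a real server would consult its own storage instead.
    """
    # An empty path is a request for the root resource "/".
    normalized = posixpath.normpath(path) if path else "/"
    if normalized in directories:
        # Directory-like paths fall back to the conventional index.html.
        return posixpath.join(normalized, "index.html")
    return normalized

known_dirs = {"/", "/mydata"}
print(resolve("", known_dirs))
print(resolve("/mydata", known_dirs))
print(resolve("/mydata/galaxy1.jpg", known_dirs))
```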

CherryPy maps incoming request URLs to Python objects (classes and methods) defined in the application code. When you launch your application, you tell CherryPy which object is going to be the root resource, i.e. what gets mapped to /. In the example above, this is an instance of the HelloWorld() class:

root = HelloWorld()

Note that the quickstart() method used in the example is a convenience method that sets the root object in addition to performing other tasks.

You configure the URL-to-object mapping by defining a tree of objects relative to the root object. Any object attached to the root object is said to be published, i.e. it is accessible to the internal mapping routine. When CherryPy receives a URL, it tries to find the object in the tree that best matches the path. For example, if the above tree was extended thus:

root.data = SomeDataApp()
root.data.service = SomeService()

then the URL http://localhost:8080/data/service would get mapped to an instance of SomeService(). However, for the object actually to be callable from the Web it also needs to be exposed. This is done by setting the exposed attribute on the particular object:

<object>.exposed = True

You can see this done on the index method in the above Hello World example.

When the matched object is (an instance of) a class, CherryPy tries, by default, to call a method called index() within the class. This is analogous to a server returning index.html for a "directory" URL, as mentioned above.
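To make the ideas of publishing, exposing and the index() fallback more concrete, here is a deliberately simplified toy dispatcher in plain Python 3. It is not CherryPy's actual dispatcher, which does a good deal more; the names Root, Leaf and dispatch are invented for this sketch.

```python
class Leaf(object):
    def index(self):
        return "leaf index"
    index.exposed = True

    def hidden(self):
        # Published (reachable in the tree) but not exposed.
        return "not reachable"

class Root(object):
    def index(self):
        return "root index"
    index.exposed = True

def dispatch(root, path):
    """Walk the object tree along the path segments and call the match."""
    node = root
    for segment in [s for s in path.split("/") if s]:
        node = getattr(node, segment, None)
        if node is None:
            return "404 Not Found"
    # An object that is not itself callable defers to its index() method.
    if not callable(node):
        node = getattr(node, "index", None)
    # Only objects marked as exposed may be called from the Web.
    if node is None or not getattr(node, "exposed", False):
        return "404 Not Found"
    return node()

root = Root()
root.data = Leaf()

print(dispatch(root, "/"))
print(dispatch(root, "/data"))
print(dispatch(root, "/data/hidden"))
```

The last call finds the hidden method in the tree but refuses to call it because it has no exposed attribute.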

RESTful services

With a RESTful service call, the URL identifies the resource in which we are interested and the HTTP method what we want to do with it, e.g. return it (HTTP GET) or remove it (HTTP DELETE). A natural way of exposing such a service is to map the HTTP method onto a Python method of the same name, i.e. GET, DELETE, etc. CherryPy can be configured to do this:

import cherrypy

class SomeRestfulService(object):
  exposed = True

  def GET(self, *args, **kwargs):
    # Code to return the resource identified by the URI
    pass

  def DELETE(self, *args):
    # Code to delete the resource identified by the URI
    pass

cherrypy.quickstart(SomeRestfulService(),
  config = {'/' : {'request.dispatch': cherrypy.dispatch.MethodDispatcher()}})

Note that the class attribute exposed will expose all methods in the class.

The path part of the URL is broken into a set of tokens using / as a delimiter and this is available through the Python tuple variable *args, e.g. the path a/b/c/1/2/3 becomes the Python tuple ('a', 'b', 'c', '1', '2', '3'). Any query in the URL gets mapped to the Python dictionary **kwargs with the query field as the keyword and query value as the value, e.g. the query ?fruit=apple becomes {'fruit': 'apple'}. Note that a URL containing a fragment, e.g. index#part3 will throw a HTTP 400 Bad Request exception.
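The way a URL is broken up into positional and keyword arguments can be imitated with the Python 3 standard library. This is only a sketch of the mapping described above, not the code CherryPy itself uses; split_request is a name invented for the example.

```python
from urllib.parse import urlsplit, parse_qsl

def split_request(url):
    """Return (args, kwargs) in the form described in the text."""
    parts = urlsplit(url)
    # Path tokens, split on "/", become the positional arguments.
    args = tuple(token for token in parts.path.split("/") if token)
    # Query fields and their values become the keyword arguments.
    kwargs = dict(parse_qsl(parts.query))
    return args, kwargs

args, kwargs = split_request("http://localhost:8080/a/b/c/1/2/3?fruit=apple")
print(args)
print(kwargs)
```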

Each method should return the appropriate response, either a resource or a message giving the success status of the request, i.e. was the appropriate resource created, updated or deleted properly. One way to do this is to use HTTP Response codes, e.g. we might respond with a HTTP 404 for a request to do something with a resource that we cannot identify:

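import os
import cherrypy

class SomeRestfulService(object):
  ...
  def GET(self, *args, **kwargs):
    identifier = '/'.join(args)
    if os.path.exists(identifier):
      # Return the contents of the file identified by the URL path
      contents = open(identifier).read()
      return contents
    else:
      # We cannot identify the resource, so respond with 404 Not Found
      raise cherrypy.HTTPError(404)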

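Example service

Let's imagine that we want to create a service where we can annotate astronomical images, i.e. tag them as on Flickr. We are going to want to be able to upload an image, add a comment to an image, get an image together with its comments, and delete an image and its comments.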

Each of these operations maps onto a different HTTP method:

Operation                           HTTP method
Upload an image                     PUT
Add a comment                       POST
Get an image and its comments       GET
Delete an image and its comments    DELETE

Here is the source code for our service (also available as ImageAnnotationService.py in the $NVOSS_HOME/python/src/webservice directory):

import cherrypy
import os, base64, time

def noBodyProcess():
  cherrypy.request.body = cherrypy.request.rfile
  cherrypy.request.process_request_body = False

cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', 
                                             noBodyProcess)

class ImageAnnotationService(object):
  exposed = True
  imgdir = "FILL THIS IN"
  commentdir = "FILL THIS IN"

  def PUT(self, *args):
    try:
      dataLength = int(cherrypy.request.headers.get('Content-Length') or 0)
      data = cherrypy.request.rfile.read(dataLength)
      imgfile =  self.imgdir + args[0]
      # Write in binary mode so that the image bytes are stored unchanged
      open(imgfile, 'wb').write(data)
      return "<html><body><p>Image saved</p></body></html>"
    except:
      raise cherrypy.HTTPError(500)

  def POST(self, *args):
    try:
      dataLength = int(cherrypy.request.headers.get('Content-Length') or 0)
      data = cherrypy.request.rfile.read(dataLength)
      commentfile =  self.commentdir + args[0]
      comment = "<p>%s: %s</p>\n" % (time.ctime(), data)
      # Append the formatted comment rather than the raw request body
      open(commentfile, 'a').write(comment)
      return "<html><body><p>Comment added</p></body></html>"
    except:
      raise cherrypy.HTTPError(500)

  def GET(self, *args, **kwargs):
    try:
      identifier = '/'.join(args)
      # Read the image in binary mode before base64-encoding it
      imgfile = open(self.imgdir + identifier, 'rb').read()
      encodedImg = base64.b64encode(imgfile)
      comments = open(self.commentdir + identifier).read()
      page = '<html><body>'
      page += '<img src="data:image/jpeg;base64,' + encodedImg + '">'
      page += '<hr/>'
      page += comments
      page += '</body></html>'
      return page
    except:
      raise cherrypy.HTTPError(404)

  def DELETE(self, *args):
    try:
      identifier = '/'.join(args)
      os.remove(self.imgdir + identifier)
      os.remove(self.commentdir + identifier)
    except:
      raise cherrypy.HTTPError(404)

cherrypy.quickstart(ImageAnnotationService(),
  config = {'/' : {'request.dispatch': cherrypy.dispatch.MethodDispatcher(),
    'tools.noBodyProcess.on': True}})

Note that Windows users might need to make sure that there is a double backward slash '\\' at the end of the paths they set for imgdir and commentdir. Note also that the data: URI scheme used in the <img> tag is not supported by IE7.
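The data: URI trick used in the GET method embeds the image directly in the page rather than linking to it. A minimal stand-alone sketch in Python 3 looks like this; the helper names make_data_uri and make_img_tag are invented here, and image/jpeg is used because it is the registered media type for JPEG images.

```python
import base64

def make_data_uri(image_bytes, mime_type="image/jpeg"):
    """Encode raw bytes as a data: URI that can be used in an <img> tag."""
    # b64encode returns bytes in Python 3, so decode to text for the page.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)

def make_img_tag(image_bytes):
    return '<img src="%s">' % make_data_uri(image_bytes)

# Three placeholder bytes stand in for real image data.
print(make_img_tag(b"abc"))
```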

We start the service as per normal:

> python ImageAnnotationService.py
[05/Aug/2008:11:20:34] ENGINE Listening for SIGHUP.
[05/Aug/2008:11:20:34] ENGINE Listening for SIGTERM.
[05/Aug/2008:11:20:34] ENGINE Listening for SIGUSR1.
[05/Aug/2008:11:20:34] ENGINE Bus STARTING
[05/Aug/2008:11:20:34] ENGINE Started monitor thread '_TimeoutMonitor'.
[05/Aug/2008:11:20:34] ENGINE Started monitor thread 'Autoreloader'.
[05/Aug/2008:11:20:35] ENGINE Serving on 127.0.0.1:8080
[05/Aug/2008:11:20:35] ENGINE Bus STARTED

and we can use the command line utility curl to test it (note that Windows users may first need to obtain a copy of curl):

> curl -X PUT --data-binary @$NVOSS_HOME/python/data/whirlpool.jpg \
  "http://localhost:8080/whirlpool" 
> curl -X POST -d "This is an exquisite image" "http://localhost:8080/whirlpool"

and then point a browser at: http://localhost:8080/whirlpool. We'll leave the testing of deleting the image as an exercise for the reader.
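If you would rather drive the service from Python than from curl, the same requests can be prepared with the Python 3 standard library. The sketch below only builds the request objects and assumes the service address used above; build_request is a helper invented for the example, and the image bytes are a placeholder.

```python
from urllib.request import Request

BASE_URL = "http://localhost:8080/"  # assumed address of the running service

def build_request(method, name, body=None):
    """Prepare (but do not send) a request for the named image resource."""
    return Request(BASE_URL + name, data=body, method=method)

upload = build_request("PUT", "whirlpool", b"...image bytes go here...")
comment = build_request("POST", "whirlpool", b"This is an exquisite image")
fetch = build_request("GET", "whirlpool")

for request in (upload, comment, fetch):
    print(request.get_method(), request.full_url)
```

With the service running, each request could then be sent with urllib.request.urlopen(request).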

SOAP services

SOAP services are inherently more complex than RESTful services because of all the business with the SOAP messages and the fact that the operation (method) names are not constrained in any way. However, I've written a little library to take care of all the details and make exposing a SOAP service with CherryPy as straightforward as a RESTful one. It's called CherryPyWebService.py and you just import it as a normal Python module:

from CherryPyWebService import SoapService, wsmethod

The class that represents your service needs to inherit from an object called SoapService:

class MyService(SoapService):

The constructor for the class has to include a bit of boilerplate:

  def __init__(self):
    baseURL = "http://localhost:8080/MyService"
    super(MyService, self).__init__(baseURL)

where you would replace the URL and the service name.

Finally, any method that you want to expose as a service operation just needs a bit of preamble in the form of a decorator called wsmethod, which specifies the input and output types of the method variables and, optionally, the name of the return variable (the default is retval):

  @wsmethod(type1, type2, ..., _returns = type3, _responseVariable = 'answer')
  def someMethod(self, num1, num2, ...):
    ...

Obviously the number of input variables needs to match the number of input types specified and the two orderings also need to be equal. Base types are represented thus:

Type       Representation
integer    int
real       float
long       long
string     str
boolean    bool

If you want to return an array of a particular type then you just declare this by putting "[]" around the specific type, e.g. if you want to return an array of strings:

@wsmethod(int, str, _returns = [str])
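To give a feel for what a decorator of this kind does, here is a toy version in Python 3 that records the declared types on the method and converts incoming values to them. It is not the wsmethod implementation from CherryPyWebService, whose internals are not shown here; the names typed and Demo are invented for the sketch.

```python
import functools

def typed(*arg_types, **options):
    """Hypothetical stand-in for wsmethod: record types, then apply them."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(self, *args):
            # The number of inputs must match the number of declared types.
            if len(args) != len(arg_types):
                raise TypeError("expected %d arguments" % len(arg_types))
            # Convert each incoming value to its declared type, in order.
            converted = [t(a) for t, a in zip(arg_types, args)]
            return func(self, *converted)
        # Keep the declarations where other code (e.g. a WSDL generator)
        # could find them.
        wrapper.input_types = arg_types
        wrapper.return_type = options.get("_returns")
        wrapper.response_variable = options.get("_responseVariable", "retval")
        return wrapper
    return decorate

class Demo(object):
    @typed(int, str, _returns=[str])
    def repeat(self, count, word):
        return [word] * count

print(Demo().repeat("2", "hi"))
```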

If you want to use a custom class as an input or output variable then you just specify the name of the class as its representation. You will also have to either write some code describing the internal structure of the class, so that the CherryPyWebService module knows how to serialize it to and from XML, or provide a custom serializer yourself. For example, let's assume we have a method (called register) which takes a name (string) and a magnitude (integer) and returns a Star object:

  @wsmethod(str, int, _returns = Star)
  def register(self, name, mag):
    ...

where Star is defined as:

class Star():
  def __init__(self, name, mag):
    self.name = name
    self.mag = mag

To let the CherryPyWebService module serialize this, the class has to inherit from a class called Serializer and we have to add an inner class to the class definition called datamodel specifying the type of each subcomponent of the class:

class Star(Serializer):
  class datamodel:
    name = str
    mag = int

  def __init__(self, name, mag):
  ...

We could then define another class called Galaxy such that:

class Galaxy(Serializer):
  class datamodel:
    stars = Array(Star)
...

Note that within the datamodel inner class, you have to declare arrays of objects via the Array() structure rather than square brackets, [].
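The following Python 3 sketch shows roughly how a datamodel inner class can drive serialization to and from XML. It uses the standard library's ElementTree and plain functions rather than the Serializer base class, whose actual behaviour and XML layout may well differ; to_element and from_element are names invented for the illustration.

```python
import xml.etree.ElementTree as ET

class Star(object):
    class datamodel:
        name = str
        mag = int

    def __init__(self, name, mag):
        self.name = name
        self.mag = mag

def to_element(obj, tag):
    """Build an XML element with one child per datamodel field."""
    element = ET.Element(tag)
    model = vars(obj.datamodel)
    # Skip the attributes Python adds to every class (__module__ etc.).
    for field in sorted(k for k in model if not k.startswith("_")):
        child = ET.SubElement(element, field)
        child.text = str(getattr(obj, field))
    return element

def from_element(cls, element):
    """Rebuild an instance, converting each child to its declared type."""
    model = vars(cls.datamodel)
    values = {}
    for child in element:
        values[child.tag] = model[child.tag](child.text)
    return cls(**values)

xml_text = ET.tostring(to_element(Star("Vega", 0), "Star"), encoding="unicode")
print(xml_text)

copy = from_element(Star, ET.fromstring(xml_text))
print(copy.name, copy.mag)
```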

If you want to write your own serializer instead of using the one in the CherryPyWebService module, then you need to provide four methods:

  @classmethod
  def to_xml(cls, values, name = "retval"):
    """Returns an XML representation of the class"""

  @classmethod
  def from_xml(cls, element):
    """Converts the XML element representation into a class instance"""

  @classmethod
  def get_datatype(cls, withNamespace = False):
    """Return the name of the root element in the XML representation of
    the class including any namespace if appropriate"""

  @classmethod
  def schemaEntry(cls):
    """Return an array containing the XML Schema components - elements,
    complexTypes, simpleTypes, etc. - for this class that would be
    found in its XML Schema description."""

Since we expect that quite a few services will be employing VOTable either as an input or output variable, we've already written a serializer for it called VOTableSoap. So if you want to use VOTable, you have to describe it as VOTableSoap:

@wsmethod(VOTableSoap, _returns = VOTableSoap)

VOTableSoap can be used exactly like the Python VOTable class with which you are already familiar.

The WSDL file for the service is autogenerated by the CherryPyWebService module and available using the standard convention of appending ?wsdl to the endpoint address of the service, e.g.:

http://localhost:8080/MyService?wsdl

Example service

Let's imagine that we want to create a service that takes a VOTable containing the apparent magnitude and parallax of a set of stars and adds an extra column containing the absolute magnitude.

Here is the source code for our service (also available as AbsoluteMagnitudeService.py in the $NVOSS_HOME/python/src/webservice directory):

import cherrypy, math, VOTable
from CherryPyWebService import SoapService, VOTableSoap, wsmethod

class AbsoluteMagnitudeService(SoapService):

  def __init__(self):
    baseURL = "http://localhost:8080/AbsoluteMagnitudeService"
    super(AbsoluteMagnitudeService, self).__init__(baseURL)

  @wsmethod(VOTableSoap, _returns = VOTableSoap, _responseVariable = 'VOTABLE')
  def calcAbsoluteMagnitude(self, VOTABLE):
    newField = VOTable.VONode(('', 'FIELD'))
    newField.addAttribute((('', 'name'), 'V_absmag'))
    newField.addAttribute((('', 'datatype'), 'float'))
    newField.addAttribute((('', 'arraysize'), '*'))
    table = VOTABLE.getTables()
    table[0].addNode(newField)
    for row in VOTABLE.getDataRows():
      data = VOTABLE.getData(row)
      absMag = float(data[1]) + 5.0 * math.log10(float(data[2])) - 10
      newTD = VOTable.VONode(('', 'TD'))
      newTD.addNode(str(absMag))
      row.addNode(newTD)
    return VOTABLE

root = AbsoluteMagnitudeService()

if __name__ == '__main__':
  cherrypy.quickstart(AbsoluteMagnitudeService(),
    config = {'/' : {'request.dispatch': cherrypy.dispatch.MethodDispatcher()}})
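The expression in the loop above follows from the distance modulus, M = m + 5 - 5 log10(d), with the distance d in parsecs. For a parallax p in milliarcseconds, d = 1000 / p, which gives M = m + 5 log10(p) - 10. The stand-alone Python function below simply restates that formula so that it can be checked by hand; the function name and the guard against non-positive parallaxes are additions for this sketch and are not part of the service code.

```python
import math

def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax in mas."""
    if parallax_mas <= 0:
        # log10 is undefined here; the service code above does not check this.
        raise ValueError("parallax must be positive")
    return apparent_mag + 5.0 * math.log10(parallax_mas) - 10.0

# Vmag = 4.79 and Plx = 10.01 mas, as in the first row of the sample table
print(absolute_magnitude(4.79, 10.01))
```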

Again we start the service as per normal:

> python AbsoluteMagnitudeService.py
[07/Aug/2008:14:15:37] ENGINE Listening for SIGHUP.
[07/Aug/2008:14:15:37] ENGINE Listening for SIGTERM.
[07/Aug/2008:14:15:37] ENGINE Listening for SIGUSR1.
[07/Aug/2008:14:15:37] ENGINE Bus STARTING
[07/Aug/2008:14:15:37] ENGINE Started monitor thread '_TimeoutMonitor'.
[07/Aug/2008:14:15:37] ENGINE Started monitor thread 'Autoreloader'.
[07/Aug/2008:14:15:38] ENGINE Serving on 127.0.0.1:8080
[07/Aug/2008:14:15:38] ENGINE Bus STARTED

and we will use curl to send a sample SOAP message to the service:

> curl -X POST -d @test.soap "http://localhost:8080/AbsMagService"
<calcAbsMagResponse>
  <VOTABLE version="1.1">
    <RESOURCE>
      <TABLE name="I/239/hip_main" nrows="22951">
        <FIELD arraysize="*" datatype="float" name="V_absmag" />
        <DESCRIPTION>The Hipparcos Main Catalogue\vizContent{timeSerie}</DESCRIPTION>
        <PARAM arraysize="*" datatype="char" name="-ref" value="VOTx29114" />
        <PARAM arraysize="*" datatype="char" name="-out.max" value="50000" />
        <FIELD datatype="int" name="HIP" ucd="meta.id;meta.main" width="6">
          <DESCRIPTION>Identifier (HIP number) (H1)</DESCRIPTION>
          <VALUES null="-2147483648" />
        </FIELD>
        <FIELD datatype="float" name="Vmag" precision="2" ucd="phot.mag;em.opt.V" unit="mag" width="5">
          <DESCRIPTION>? Magnitude in Johnson V (H5)</DESCRIPTION>
        </FIELD>
        <FIELD datatype="float" name="Plx" precision="2" ucd="pos.parallax.trig" unit="mas" width="7">
          <DESCRIPTION>? Trigonometric parallax (H11)</DESCRIPTION>
        </FIELD>
        <DATA>
          <TABLEDATA>
            <TR>
              <TD>-0.207829612603</TD>
              <TD>1168</TD>
              <TD>4.79</TD>
              <TD>10.01</TD>
            </TR>
        ...
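
The contents of test.soap are not reproduced in this exercise. As a rough illustration of what a SOAP 1.1 request looks like, the Python 3 sketch below wraps an operation element in an Envelope and Body. The operation name calcAbsMag and the absence of a namespace on it are assumptions, so check the service's WSDL for the element names it actually expects; build_envelope is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

# The standard SOAP 1.1 envelope namespace
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_envelope(operation, payload=None):
    """Wrap an operation element (and optional payload) in a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    request = ET.SubElement(body, operation)
    if payload is not None:
        request.append(payload)
    return ET.tostring(envelope, encoding="unicode")

# An empty VOTABLE element stands in for a real table here.
votable = ET.Element("VOTABLE", {"version": "1.1"})
message = build_envelope("calcAbsMag", votable)
print(message)
```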

Tomcat and Axis

In Java, we commonly use the combination of the Tomcat servlet container and the Axis library to build SOAP-based web services. RESTful web services can be built using Tomcat and the Jersey library (see the Jersey documentation for more details), which is the reference implementation for JAX-RS (JSR 311), the Java API for RESTful Web Services. Jersey is currently still an early access implementation, as JAX-RS is not yet an approved Java Specification, so we will not consider it any further here.

If we want to write a service whose WSDL file already exists, e.g. it is an IVOA standard or an implementation exists in another language, we can use an Axis tool called WSDL2Java to generate skeleton code that we can then fill in with the necessary business logic. For example, if we wanted to write a Java version of the Python example SOAP service then we could generate the skeleton Java classes with (assuming that the Python service is running):

> java org.apache.axis.wsdl.WSDL2Java -o src --server-side \
--skeletonDeploy true \
"http://localhost:8080/AbsMagService?wsdl"

This will produce a number of classes in a subdirectory under the current working directory. In particular, a skeleton class (src/localhost/AbsMagService/soap/AbsMagService_BindingSkeleton.java) and an implementation template (src/localhost/AbsMagService/AbsMagService_BindingImpl.java) will be generated. We would code in the details of what the service should do into the implementation template. After compiling it, we would be ready to deploy it.

If we do not have a WSDL to hand but do have a Java interface that our service would implement then we can generate a WSDL from this and then skeleton code from the WSDL. A suitable Java interface for the Python example SOAP service might be (this is available as $NVOSS_HOME/java/src/webservice/src/AbsMagService.java):

import net.ivoa.www.xml.VOTable.v1_1.*;

public interface AbsMagService {

    public VOTABLE calcAbsMag(VOTABLE vot);

}

Note that for convenience this makes use of the VOTable code that is generated from the Python service WSDL. In reality, you could substitute whatever other VOTable library you wanted. To generate a WSDL from this, we compile it and then use the Java2WSDL tool:

> cd $NVOSS_HOME/java/src/webservice/src
> javac AbsMagService.java
> java org.apache.axis.wsdl.Java2WSDL -o absmagservice.wsdl \
-l "http://localhost:8080/axis/services/AbsMagService" \
-n "urn:AbsMagService" -y WRAPPED \
--importSchema "file://$NVOSS_HOME/python/src/webservice/votable.xsd" \
-p"net.ivoa.www.xml.VOTable.v1_1" "http://www.ivoa.net/xml/VOTable/v1.1" \
AbsMagService

where -o indicates the name of the output WSDL file, -l indicates the location of the service (where it will be deployed), -n is the target namespace of the WSDL file, -y specifies the WSDL type (doc/lit wrapped in this case), --importSchema tells Java2WSDL to use an already existing XML Schema instead of generating one of its own, -p sets up a mapping from the Java package name for the VOTable code to its XML Schema namespace, and the final argument is our Java interface. Note that Windows users should not specify the drive part (C:\) of the schema location with the file:// scheme and may have to preface their path with an additional /.

From our WSDL we can now generate the skeleton code for the service - just to make sure that there are no version clashes, we'll also delete any previously generated code in the directory:

>  rm -rf net localhost 
> java org.apache.axis.wsdl.WSDL2Java -o . -d Session -s -S true absmagservice.wsdl

where -o specifies where the skeleton classes will be generated, -s and -S are just shorthand for --server-side and --skeletonDeploy respectively and -d indicates how the service will be deployed. The various files should now exist in a subdirectory called AbsMagService_pkg:

> ls AbsMagService_pkg
AbsMagService.java			AbsMagServiceSoapBindingSkeleton.java
AbsMagServiceService.java		AbsMagServiceSoapBindingStub.java
AbsMagServiceServiceLocator.java	deploy.wsdd
AbsMagServiceSoapBindingImpl.java	undeploy.wsdd

The file that we shall put some code into is the implementation template, AbsMagServiceSoapBindingImpl.java:

package AbsMagService_pkg;

import net.ivoa.www.xml.VOTable.v1_1.*;
import org.apache.axis.message.*;

public class AbsMagServiceSoapBindingImpl implements AbsMagService_pkg.AbsMagService{
    public net.ivoa.www.xml.VOTable.v1_1.VOTABLE calcAbsMag(net.ivoa.www.xml.VOTable.v1_1.VOTABLE in0)
        throws java.rmi.RemoteException {
        try {
            RESOURCE resource = in0.getRESOURCE(0);
            TABLE table = resource.getTABLE(0);
            TABLEDATA tableData = table.getDATA().getTABLEDATA();
            TR[] newTR = new TR[tableData.getTR().length];
            int count = 0;
            for (TR r : tableData.getTR()) {
                TD[] tds = r.getTD();
                TD[] newTD = new TD[4];
                for (int i = 0; i < 3; i++) newTD[i] = tds[i];
                double appMag = Double.parseDouble(tds[1].get_any()[0].getAsString());
                double plx = Double.parseDouble(tds[2].get_any()[0].getAsString());
                double absMag = appMag + 5.0 * Math.log10(plx) - 10.0;
                newTD[3] = new TD(new MessageElement[] {new MessageElement(new Text(
                            String.valueOf(absMag)))}, EncodingType.none);
                newTR[count] = new TR();
                newTR[count++].setTD(newTD);
            }
            tableData.setTR(newTR);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
        return in0;
    }
}

We will use ant to compile this, combine all the Java class files into a jar file, copy this to the Axis deployment directory and edit the file containing information about Axis deployed services:

> cd $NVOSS_HOME/java/src/webservice
> ant deploy

We can make sure that our service got deployed properly by looking at the list of deployed services at: http://localhost:8080/axis/services. If the service does not appear at first, it may be necessary to bounce Tomcat and also try redeploying:

> bouncetomcat
> ant deploy

Finally we can test our deployed service with curl again:

> curl -X POST -d @test.soap -H "SOAPAction: calcAbsMag" "http://localhost:8080/axis/services/AbsMagService"
<?xml version="1.0" encoding="UTF-8"?>
  <soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <soapenv:Body>
      <calcAbsMagResponse
        xmlns="http://localhost:8080/AbsMagService/soap/">
        <ns1:calcAbsMagReturn version="1.1" xmlns:ns1="urn:AbsMagService">
          <ns2:RESOURCE xmlns:ns2="http://www.ivoa.net/xml/VOTable/v1.1">
            <ns2:TABLE name="I/239/hip_main">
              <ns2:DESCRIPTION>The Hipparcos Main Catalogue\vizContent{timeSerie}</ns2:DESCRIPTION>
              <ns2:PARAM arraysize="*" datatype="char" name="-out.max" value="50000"/>
              <ns2:FIELD datatype="float" name="Plx" precision="2" ucd="pos.parallax.trig" unit="mas" width="7">
                <ns2:DESCRIPTION>? Trigonometric parallax (H11)</ns2:DESCRIPTION>
              </ns2:FIELD>
              <ns2:DATA>
                <ns2:TABLEDATA>
                  <ns2:TR>
                    <ns2:TD>1168</ns2:TD>
                    <ns2:TD>4.79</ns2:TD>
                    <ns2:TD>10.01</ns2:TD>
                    <ns2:TD>-0.20782961260340826</ns2:TD>
                  </ns2:TR>
                ...

The NVO Summer School is made possible through the support of the National Science Foundation.