JAVOT: a Java Parser for VOTable

This page describes efforts at Caltech to build a parser for the VOTable XML format. Please send comments and questions to Roy Williams, and perhaps also to the VOTable discussion group. So far we have concentrated on reading rather than editing tables, though there is plenty of scope within this framework for such extensions.

The current release of JAVOT can be obtained at http://us-vo.org/VOTable/JAVOT/JAVOT.zip.

All of the packages discussed below have reference documentation at http://us-vo.org/VOTable/JAVOT/JAVOT/doc

1: VOTableUtil, an Automatically Generated API

As a first step, we used an automated code-generation tool called Breeze XML Binder to provide initial functionality. The generated API contains objects derived from the DTD; the exact mapping from the DTD to the API is described in the Breeze documentation on creating Java code. Although generating the API from the DTD requires a purchased copy of the software, the generated jar files may be freely distributed.

There are objects such as Resource and Table, corresponding to the elements RESOURCE and TABLE of the VOTable dialect of XML. The DTD states that each RESOURCE can contain any number of TABLEs -- therefore we expect there to be a method of the Resource objects called getTables() which returns a Vector of Table objects, and another called getTableCount() for the number of these.

The Table object, for example, includes a getField() method, which returns a java.util.Vector of Field objects, and from a Field, we can find the UCD attribute with getUcd(). As an example, this code fragment tries to find which field of a given table has a given UCD:

    for(int i=0; i<table.getFieldCount(); i++){
        Field field = (Field)table.getFieldAt(i);
        String u = field.getUcd();
        if(u != null && u.equals("POS_EQ_RA_MAIN"))
           System.out.println("Field " + i + " is for RA");
    }

The VOTableUtil package is described exactly and formally by the Javadoc documentation.

2: The VOTableWrapper object

This convenience class makes it easy to parse VOTable documents directly from a file or a URL. The code can be used like this:

	String fileOrUrl;	// set this to the file path or URL of a VOTable document
	PrintWriter out = new PrintWriter(System.out, true);
	VOTableWrapper votw = new VOTableWrapper(fileOrUrl, out);

	if(votw.getLastError() != null){
		System.err.println("No VOTable found " + fileOrUrl);
		System.err.println("Last error is " + votw.getLastError());
		System.exit(1);
	}

	Votable v = votw.getVotable();

The PrintWriter is the channel through which diagnostics are reported. It can be null if no diagnostics are wanted. Because we expect this code to be used in servlets, it would not be appropriate to hard-code System.out.

When the VOTableWrapper constructor returns, there may have been an error. It is the user's responsibility to check for errors, which can be done with the getLastError() method.

3: The VOSingleTable object

The following is an API sketch of an additional helper object that is under construction at Caltech. It is semantically equivalent to a single table (not the hierarchy of Resources, Tables, Parameters, Links etc. that VOTable can represent). The automatically constructed Table object carries the table metadata effectively through the collection of Fields, but it is not so good at the data section. A FITS file is represented only by its filename, or by a base64-encoded version of the binary data; even if the data is represented by pure XML (Tabledata), there is still no parsing of strings to double, byte, float etc.

VOSingleTable is constructed from a Table object:

    VOSingleTable(Table t)

and it has methods that extract data from the table in any desired form. Given the datatypes of the fields in the table, the Data section of the table is read, and a binary representation built. First some information about the table:

Now we come to the actual extraction of typed data from the table. There are two principles here. One is that data is copied from the XML tree and stored at maximal density in memory, so that, for example, a float takes only 4 bytes and a bit 1/8 of a byte. The other principle is that any primitive type can be converted to any other type. Thus we can apply the getFloatArray method to a table cell that contains integers (37 is converted to 37.0), and likewise for all the other combinations. Complex types are converted to real by taking the real part; reals are converted to long/integer/short/byte by the standard Java casting convention; integers are converted to logical/bit by mapping zero to false/0 and all others to true/1.
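
The conversion rules in the paragraph above can be sketched in plain Java. This is only an illustration: the class ConversionSketch and its method names are hypothetical and are not part of JAVOT.

```java
// Illustrative sketch only: these helpers are hypothetical, not JAVOT API.
final class ConversionSketch {
    // Complex -> real: keep the real part and discard the imaginary part.
    static double complexToReal(double re, double im) {
        return re;
    }

    // Real -> integer: the standard Java narrowing cast truncates toward zero.
    static int realToInt(double x) {
        return (int) x;
    }

    // Integer -> logical/bit: zero maps to false, everything else to true.
    static boolean intToLogical(long n) {
        return n != 0;
    }

    // Integer -> float: a widening conversion, so 37 becomes 37.0.
    static float intToFloat(int n) {
        return (float) n;
    }

    public static void main(String[] args) {
        System.out.println(intToFloat(37));           // 37.0
        System.out.println(realToInt(-2.7));          // -2
        System.out.println(intToLogical(5));          // true
        System.out.println(complexToReal(1.5, -4.0)); // 1.5
    }
}
```

Note that the Java cast truncates rather than rounds, so a caller that wants rounding would need to apply Math.round before narrowing.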

There is an additional data extraction method getStringArray, which attempts to format each object as a String, for example the float quantity 37.0 may be converted to "37.000000".
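
One way such string formatting could work is shown below. This is a sketch under assumptions: the class StringFormatSketch and the method toStringArray are hypothetical, and the "%f" format is chosen only because it reproduces the "37.000000" example above; the format JAVOT actually uses is not stated here.

```java
import java.util.Locale;

// Illustrative sketch only: not the JAVOT implementation of getStringArray.
final class StringFormatSketch {
    // Format each float with six decimal places (the default for "%f").
    static String[] toStringArray(float[] values) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator on every platform.
            out[i] = String.format(Locale.ROOT, "%f", values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] s = toStringArray(new float[] {37.0f});
        System.out.println(s[0]); // 37.000000
    }
}
```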

When data is read in from pure XML (Tabledata), each table cell (TD element) yields a string that is to be parsed into a sequence of primitives. In general, these are tokenized by whitespace, as explained in section 4.1 of the VOTable document, and each token is parsed to the correct primitive. An exception to this is character data, where separation by whitespace should not be used: an array of 5 characters is read properly if the XML contains "Apple", and the array should not be entered as "A p p l e".
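
The tokenizing rule described above might look like this in Java. The class CellParseSketch and both method names are hypothetical, introduced only for illustration; the sketch handles doubles and characters, not every VOTable datatype.

```java
// Illustrative sketch only: hypothetical helpers, not JAVOT API.
final class CellParseSketch {
    // Numeric cell: split the TD content on runs of whitespace, parse each token.
    static double[] parseDoubles(String cell) {
        String trimmed = cell.trim();
        if (trimmed.isEmpty()) {
            return new double[0]; // an empty cell yields an empty array
        }
        String[] tokens = trimmed.split("\\s+");
        double[] out = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = Double.parseDouble(tokens[i]);
        }
        return out;
    }

    // Character cell: no tokenizing, every character is one array element.
    static char[] parseChars(String cell) {
        return cell.toCharArray();
    }

    public static void main(String[] args) {
        System.out.println(parseDoubles("1.5 2.5  3.5").length); // 3
        System.out.println(parseChars("Apple").length);          // 5
    }
}
```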

We can imagine a matrix of conversion methods, from any of the values of the datatype attribute in the XML document, to any of the methods listed below that convert to whatever Java type is required. This means, for example, that code manipulating "Declination" can get the data as double even though it may have been either double or float in the source table.
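
One row of that conversion matrix, "anything numeric to double", can be sketched as a dispatch on the datatype attribute. The class and method names are hypothetical and the sketch covers only a few datatype values; it is not how JAVOT is known to implement the conversion.

```java
// Illustrative sketch only: hypothetical helper, not JAVOT API.
final class DatatypeDispatchSketch {
    // Parse one token according to its declared datatype, then widen to double.
    static double tokenToDouble(String datatype, String token) {
        switch (datatype) {
            case "double":
                return Double.parseDouble(token);
            case "float":
                // Parse at float precision first, as the source table declared.
                return Float.parseFloat(token);
            case "unsignedByte":
            case "short":
            case "int":
            case "long":
                return Long.parseLong(token);
            default:
                throw new IllegalArgumentException("Unsupported datatype: " + datatype);
        }
    }

    public static void main(String[] args) {
        // The caller gets a double whether the column was float or double.
        System.out.println(tokenToDouble("float", "12.5"));  // 12.5
        System.out.println(tokenToDouble("double", "12.5")); // 12.5
        System.out.println(tokenToDouble("int", "37"));      // 37.0
    }
}
```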

The data extraction methods are of the form:

In each case, the contents of a table cell that correspond to the given ifield, irecord will be converted (if necessary) to the desired type, put into an array and returned.

Note that there are fewer built-in Java types than VOTable types. Thus a DoubleComplex array of 5 elements is mapped to a double array of 10 elements -- the real and imaginary elements interlaced. The Bit datatype is mapped to Java byte and the user should pick out the individual bits -- however it can also be mapped to a boolean array at the expense of memory.
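
The two mappings above can be illustrated with a short sketch. The class PackingSketch and its methods are hypothetical, and the bit order within a byte (most significant bit first) is an assumption made for this example, not something stated on this page.

```java
// Illustrative sketch only: hypothetical helpers, not JAVOT API.
final class PackingSketch {
    // Interlace real and imaginary parts: re0, im0, re1, im1, ...
    static double[] interlace(double[] re, double[] im) {
        if (re.length != im.length) {
            throw new IllegalArgumentException("re and im must have equal length");
        }
        double[] out = new double[2 * re.length];
        for (int i = 0; i < re.length; i++) {
            out[2 * i] = re[i];
            out[2 * i + 1] = im[i];
        }
        return out;
    }

    // Pick bit i out of packed bytes (assumed order: most significant bit first).
    static boolean bitAt(byte[] packed, int i) {
        return ((packed[i / 8] >> (7 - (i % 8))) & 1) != 0;
    }

    // Expand to one boolean per bit: easier to use, but costs more memory.
    static boolean[] unpack(byte[] packed, int nbits) {
        boolean[] out = new boolean[nbits];
        for (int i = 0; i < nbits; i++) {
            out[i] = bitAt(packed, i);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] z = interlace(new double[5], new double[5]);
        System.out.println(z.length); // 10
        boolean[] bits = unpack(new byte[] {(byte) 0xA0}, 4);
        System.out.println(bits[0] + " " + bits[1] + " " + bits[2] + " " + bits[3]);
    }
}
```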

4: Obtaining the Software

The current release of JAVOT can be obtained at http://us-vo.org/VOTable/JAVOT/JAVOT.zip.

Instructions

Now build your own application, using one of the sample programs as a guide.