VOTable Java Parser

This page describes efforts at Caltech to build a parser for the VOTable XML format. Please send comments and questions to Roy Williams, and perhaps also to the VOTable discussion group. So far we have concentrated on reading rather than editing tables, though there is plenty of scope within this framework for such extensions.

VOTableUtil: Automatically Generated API

As a first step, we have used an automated code-generation tool called Breeze XML Binder to produce an initial implementation. The generated API contains objects derived from the DTD, such as Resource and Table. This API is called VOTableUtil, and the complete javadoc is available at http://www.us-vo.org/VOTable/Parser/VOTableUtil/doc/. While generating the API from the DTD can only be done by those who have bought the software, the generated jar files may be freely distributed.

The Table object, for example, includes a getField() method, which returns a java.util.Vector of Field objects, and from a Field, we can find the UCD attribute with getUcd(). As an example, this code fragment tries to find which field of a given table has a given UCD:

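    // Scan every field of the table, looking for the right-ascension UCD
    for(int i=0; i<table.getFieldCount(); i++){
        Field field = (Field)table.getFieldAt(i);
        String u = field.getUcd();
        // The UCD attribute is optional, so guard against null
        if(u != null && u.equals("POS_EQ_RA_MAIN"))
           System.out.println("Field " + i + " is for RA");
    }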

Download here

To try out the VOTableUtil classes, download these files: Make sure your path includes the Java compiler javac and the Java runtime java. Make sure your CLASSPATH includes the three jar files. Compile with:
    javac SampleApp.java
then run with the sample XML file:
    java SampleApp cover.xml
and you should see that the XML has been read, parsed, reformatted, and output.

Adding Value: the VOSingleTable

The following is an API sketch of an additional helper object that is under construction at Caltech. It is semantically equivalent to a single table (not the hierarchy that VOTable can represent). While the automatically constructed Table object carries the table metadata effectively through the collection of Fields, it is not so good at the data section. A FITS file is represented only by its filename, or by a base64-encoded version of the binary data; even if the data is represented by pure XML (Tabledata), there is still no parsing of strings to double, byte, float, etc.

We propose to build an object VOSingleTable. It is constructed from a Table object:

    VOSingleTable(Table t)

and it has methods that extract data from the table in any desired form. Given the datatypes of the fields in the table, the Data section of the table is read, and a binary representation built. First some information about the table:

Now we come to the actual extraction of typed data from the table. The guiding principle here is that any primitive type can be converted to any other type. Thus we can apply the getFloatArray method to a table cell that contains integers (37 converted to 37.0), and likewise for all the other combinations. Complex types are converted to real by taking the real part; reals are converted to long/integer/short/byte by the standard Java casting convention; integers are converted to logical/bit by mapping zero to false/0 and all others to true/1.
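The conversion rules above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class ConversionSketch and its method names are hypothetical and are not part of VOTableUtil or VOSingleTable.

```java
/** Hypothetical helper illustrating the conversion rules; not part of VOTableUtil. */
final class ConversionSketch {
    private ConversionSketch() {}

    /** Complex to real: keep the real part and discard the imaginary part. */
    static double complexToReal(double re, double im) {
        return re;
    }

    /** Real to integer: the standard Java narrowing cast, which truncates toward zero. */
    static int realToInt(double value) {
        return (int) value;
    }

    /** Integer to logical: zero maps to false, every other value to true. */
    static boolean intToLogical(long value) {
        return value != 0;
    }

    /** Integer array to float array, so that 37 becomes 37.0. */
    static float[] intsToFloats(int[] values) {
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i]; // implicit widening conversion
        }
        return out;
    }
}
```

For instance, ConversionSketch.realToInt(-2.7) gives -2 because the cast truncates rather than rounds, and ConversionSketch.intToLogical(5) gives true.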

There is an additional data extraction method getStringArray, which attempts to format each object as a String, for example the float quantity 37.0 may be converted to "37.000000".
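One way such string formatting might be done is with java.lang.String.format, whose "%f" conversion uses six decimal places by default. The class StringFormatSketch below is a hypothetical illustration, not the actual getStringArray implementation, and the choice of "%f" is an assumption.

```java
import java.util.Locale;

/** Hypothetical sketch of formatting floats as strings; not the real getStringArray. */
final class StringFormatSketch {
    private StringFormatSketch() {}

    /** Formats each float with the default "%f" precision of six decimals. */
    static String[] formatFloats(float[] values) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // Locale.ROOT keeps the decimal separator a period on every system
            out[i] = String.format(Locale.ROOT, "%f", values[i]);
        }
        return out;
    }
}
```

With this sketch, formatFloats(new float[] {37.0f}) yields a one-element array containing "37.000000".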

When data is read in from pure XML (Tabledata), each table cell (TD element) yields a string that is to be parsed into a sequence of primitives. In general, the string is tokenized by whitespace, as explained in section 4.1 of the VOTable document, and each token is parsed to the correct primitive. Character data is an exception, where separation by whitespace should not be used: an array of 5 characters is read properly if the XML contains "Apple", and the array should not be entered as "A p p l e".
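The two parsing cases can be sketched as follows. The class CellParseSketch and its methods are hypothetical names chosen for illustration; the sketch handles only the double and char cases and assumes the cell text has already been extracted from the TD element.

```java
/** Hypothetical sketch of parsing the text of one TD cell; not part of VOTableUtil. */
final class CellParseSketch {
    private CellParseSketch() {}

    /** Numeric cell: split on runs of whitespace and parse each token. */
    static double[] parseDoubles(String cell) {
        String trimmed = cell.trim();
        if (trimmed.length() == 0) {
            return new double[0]; // an empty cell yields no values
        }
        String[] tokens = trimmed.split("\\s+");
        double[] out = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = Double.parseDouble(tokens[i]);
        }
        return out;
    }

    /** Character cell: no tokenizing, so "Apple" becomes five chars. */
    static char[] parseChars(String cell) {
        return cell.toCharArray();
    }
}
```

Note that parseChars keeps whitespace as data, so "A p p l e" would be read as nine characters rather than five, which is why character arrays should not be written that way.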

The data extraction methods are of the form:

In each case, the contents of a table cell that correspond to the given ifield, irecord will be converted (if necessary) to the desired type, put into an array and returned.

Note that there are fewer built-in Java types than VOTable types. Thus a DoubleComplex array of 5 elements is mapped to a double array of 10 elements, with the real and imaginary elements interlaced. The Bit datatype is mapped to Java byte, and the user should pick out the individual bits; however, it can also be mapped to a boolean array at the expense of memory.
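Both mappings can be illustrated with a short sketch. The class PackingSketch is hypothetical, and the bit order used in unpackBits (most significant bit first within each byte) is an assumption that should be checked against the VOTable document.

```java
/** Hypothetical sketch of the complex and bit mappings; not part of VOTableUtil. */
final class PackingSketch {
    private PackingSketch() {}

    /** Interlaces n complex values into 2n doubles: re0, im0, re1, im1, ... */
    static double[] interlace(double[] re, double[] im) {
        if (re.length != im.length) {
            throw new IllegalArgumentException("real and imaginary parts differ in length");
        }
        double[] out = new double[2 * re.length];
        for (int i = 0; i < re.length; i++) {
            out[2 * i] = re[i];
            out[2 * i + 1] = im[i];
        }
        return out;
    }

    /** Expands packed bits into booleans; assumes most significant bit first. */
    static boolean[] unpackBits(byte[] packed, int nbits) {
        if (nbits < 0 || nbits > packed.length * 8) {
            throw new IllegalArgumentException("nbits out of range");
        }
        boolean[] out = new boolean[nbits];
        for (int i = 0; i < nbits; i++) {
            int b = packed[i / 8] & 0xFF;            // treat the byte as unsigned
            out[i] = ((b >> (7 - i % 8)) & 1) != 0;  // pick bit i, counting from the MSB
        }
        return out;
    }
}
```

The memory trade-off is visible here: the packed form holds eight bits per byte, while a Java boolean array typically uses at least one byte per element.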