VOTable Java Parser
This page describes efforts at Caltech to build a parser for the
VOTable XML format. Please send comments and
questions to Roy Williams, and perhaps
also to the VOTable discussion group.
So far we have concentrated on reading tables rather than editing them, though
there is plenty of scope within this framework for such extensions.
VOTableUtil: Automatically Generated API
For the first step, we used an automated code-generation tool called
Breeze XML Binder to generate an initial API. The generated API contains objects derived from the DTD, such
as Resource and
Table.
This API is called VOTableUtil, and the complete javadoc is available at
http://www.us-vo.org/VOTable/Parser/VOTableUtil/doc/. Only those who have bought
the software can regenerate the API from the DTD, but the generated jar files
may be freely distributed.
The Table object, for example, includes a getField() method,
which returns a java.util.Vector of Field objects,
and from a Field, we can find the UCD attribute with getUcd().
As an example, this code fragment tries to find which field of a given
table has a given UCD:
for (int i = 0; i < table.getFieldCount(); i++) {
    Field field = (Field) table.getFieldAt(i);
    String u = field.getUcd();   // UCD attribute of this FIELD, or null if absent
    if ("POS_EQ_RA_MAIN".equals(u))
        System.out.println("Field " + i + " is for RA");
}
Download here
To try out the VOTableUtil classes, download these files:
Make sure your path includes the Java compiler javac and the Java
runtime java. Make sure your CLASSPATH includes the three
jar files. Compile with:
javac SampleApp.java
then run with the sample XML file:
java SampleApp cover.xml
and you should see that the XML has been read, parsed, reformatted, and output.
Adding Value: the VOSingleTable
The following is an API sketch of an additional helper object that is under
construction at Caltech. It is semantically equivalent to a single table
(not the full hierarchy that VOTable can represent). The automatically
constructed Table object handles the table metadata well through its
collection of Fields, but it is weak on the data section. A FITS file is
represented only by its filename or by a base64-encoded version of the binary
data; even if the data is represented in pure XML (Tabledata), strings are not
parsed to double, byte, float, etc.
We propose to build an object VOSingleTable. It is constructed from
a Table object:
VOSingleTable(Table t)
and it has methods that extract data from the table in any desired form.
Given the datatypes of the fields in the table, the Data
section of the table is read, and a binary representation built.
First, some information about the table:
- int getFieldCount()
This method retrieves the number of fields (also called columns) in the table;
this is the same as the automatically generated Table method
getFieldCount().
- int getRecordCount()
Returns the number of records in the table. Depending on the implementation of
VOSingleTable, computing this may require reading and parsing a large data
file, or it may be very simple.
- boolean isVariableLength(int ifield)
If the arraysize attribute of the corresponding FIELD
element contains an asterisk, then this is true. It means that the number of
primitives in a column of the table varies from record to record.
- int primitiveSize(int ifield)
A convenience method that essentially implements Table 1 of the VOTable
document, mapping the datatype attribute to the number of bytes per primitive.
For example if the datatype of this field is "double", then this method
returns 8.
- int primitiveCount(int ifield, int irecord)
The number of primitives contained in the table at field ifield
and record irecord. If the field is variable length, the count may differ
from record to record; otherwise the irecord argument is ignored.
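As a rough illustration of the three per-field methods above, here is a minimal sketch that works directly on the raw datatype and arraysize attribute strings of a FIELD. The class FieldShape and the method fixedPrimitiveCount are hypothetical names, not part of VOTableUtil; the byte sizes follow the VOTable datatype table, and "bit" is deliberately left out because a bit occupies less than one byte.

```java
/**
 * Hypothetical helper (not part of VOTableUtil) sketching primitiveSize,
 * isVariableLength and the fixed-length case of primitiveCount, working on
 * the raw datatype and arraysize attribute strings of a FIELD.
 */
class FieldShape {

    // Bytes per primitive for each VOTable datatype.
    static int primitiveSize(String datatype) {
        switch (datatype) {
            case "boolean": case "unsignedByte": case "char":
                return 1;
            case "short": case "unicodeChar":
                return 2;
            case "int": case "float":
                return 4;
            case "long": case "double": case "floatComplex":
                return 8;
            case "doubleComplex":
                return 16;
            default:
                // "bit" is packed one bit per primitive, so it has no whole-byte size.
                throw new IllegalArgumentException("no byte size for datatype: " + datatype);
        }
    }

    // True when the arraysize attribute contains an asterisk, e.g. "*" or "10x*".
    static boolean isVariableLength(String arraysize) {
        return arraysize != null && arraysize.contains("*");
    }

    // Fixed-length case only: a missing arraysize means a single primitive,
    // otherwise the dimensions (e.g. "3x4") are multiplied together.
    static int fixedPrimitiveCount(String arraysize) {
        if (arraysize == null || arraysize.isEmpty()) {
            return 1;
        }
        if (isVariableLength(arraysize)) {
            throw new IllegalArgumentException("variable length: count depends on the record");
        }
        int count = 1;
        for (String dim : arraysize.split("x")) {
            count *= Integer.parseInt(dim.trim());
        }
        return count;
    }
}
```

For a variable-length field the count has to come from the record itself, which is why the sketch refuses to answer in that case rather than guessing.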
Now we come to the actual extraction of typed data from the table. The
guiding principle here is that any primitive type can be converted
to any other type. Thus getFloatArray can be applied to a table cell that
contains integers (37 is converted to 37.0), and likewise for every other
combination. Complex values are converted to real by taking the real part;
real values are converted to long/integer/short/byte by the standard Java
casting conventions; and integers are converted to logical/bit by mapping zero
to false/0 and all other values to true/1.
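The conversion rules can be sketched as a few small helpers. The class and method names below are hypothetical and only illustrate the rules; complex input is assumed to be in the interlaced (real, imaginary) layout described for the complex getters.

```java
/**
 * Hypothetical conversion helpers illustrating the rules above;
 * the names are illustrative, not part of VOTableUtil.
 */
class Conversions {

    // Complex to real: keep only the real part of each (re, im) pair.
    static double[] complexToReal(double[] interlaced) {
        double[] real = new double[interlaced.length / 2];
        for (int k = 0; k < real.length; k++) {
            real[k] = interlaced[2 * k];
        }
        return real;
    }

    // Real to integer: standard Java narrowing cast, which truncates toward zero.
    static int[] realToInt(double[] values) {
        int[] out = new int[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = (int) values[k];
        }
        return out;
    }

    // Integer to logical: zero maps to false, everything else to true.
    static boolean[] intToLogical(long[] values) {
        boolean[] out = new boolean[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = values[k] != 0;
        }
        return out;
    }

    // Integer to float, e.g. 37 becomes 37.0.
    static float[] intToFloat(long[] values) {
        float[] out = new float[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = (float) values[k];
        }
        return out;
    }
}
```

With these rules, 37.9 becomes 37 and -2.7 becomes -2, since Java casts discard the fractional part rather than rounding.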
There is an additional data extraction method getStringArray,
which attempts to format each object as a String, for example the float
quantity 37.0 may be converted to "37.000000".
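One possible way to produce strings like "37.000000" is Java's %f format, which prints six decimal places by default. The helper below is a hypothetical sketch; it passes Locale.ROOT so the decimal separator stays '.' whatever the default locale is.

```java
import java.util.Locale;

/** Hypothetical helper showing one way getStringArray might format floats. */
class StringFormatting {

    // %f uses six decimal places by default, so 37.0f becomes "37.000000".
    static String[] floatsToStrings(float[] values) {
        String[] out = new String[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = String.format(Locale.ROOT, "%f", values[k]);
        }
        return out;
    }
}
```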
When data is read in from pure XML (Tabledata), each table cell
(TD element) yields a string, which is to be parsed into a sequence
of primitives. In general, the string is tokenized on whitespace, as explained
in section 4.1 of the VOTable document, and each token is parsed to the correct
primitive. The exception is character data, which should not be split on
whitespace: an array of 5 characters is read correctly if the XML contains
"Apple", and it should not be entered as "A p p l e".
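A minimal sketch of this TD parsing, assuming the cell text has already been extracted from the XML. The class TableDataCell and its methods are hypothetical. Numeric cells are split on runs of whitespace; character cells are taken verbatim, one element per character.

```java
/** Hypothetical sketch of parsing the text of one TD element. */
class TableDataCell {

    // Numeric cells: split on runs of whitespace, then parse each token.
    static double[] parseDoubles(String tdText) {
        String trimmed = tdText.trim();
        if (trimmed.isEmpty()) {
            return new double[0];   // an empty cell holds no primitives
        }
        String[] tokens = trimmed.split("\\s+");
        double[] out = new double[tokens.length];
        for (int k = 0; k < tokens.length; k++) {
            out[k] = Double.parseDouble(tokens[k]);
        }
        return out;
    }

    // Character cells: no tokenizing; each character of the text is one element.
    static char[] parseChars(String tdText) {
        return tdText.toCharArray();
    }
}
```

So "  1.5 2\n -3e2 " yields three doubles, while "Apple" yields five characters.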
The data extraction methods are of the form:
- boolean[] getLogicalArray(int ifield, int irecord)
- byte[] getBitArray(int ifield, int irecord)
- byte[] getByteArray(int ifield, int irecord)
- short[] getShortArray(int ifield, int irecord)
- int[] getIntArray(int ifield, int irecord)
- long[] getLongArray(int ifield, int irecord)
- char[] getCharArray(int ifield, int irecord)
- char[] getUnicodeCharArray(int ifield, int irecord)
- float[] getFloatArray(int ifield, int irecord)
- double[] getDoubleArray(int ifield, int irecord)
- float[] getFloatComplexArray(int ifield, int irecord)
- double[] getDoubleComplexArray(int ifield, int irecord)
In each case, the contents of a table cell that correspond to the given
ifield, irecord will be converted (if necessary) to the desired type,
put into an array and returned.
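To show the shape of these getters, here is a hypothetical sketch that assumes the Data section has already been parsed into an internal binary form. For simplicity it stores every primitive as a double, indexed as values[irecord][ifield]; a real VOSingleTable would presumably keep each column in its declared type, since long values above 2^53 lose precision as doubles. Only a few of the getters are shown.

```java
/**
 * Hypothetical sketch of the VOSingleTable getter pattern. Every primitive
 * is stored as a double here, which is a simplification.
 */
class SingleTableSketch {
    // values[irecord][ifield] holds the primitives of one cell.
    private final double[][][] values;

    SingleTableSketch(double[][][] values) {
        this.values = values;
    }

    int getRecordCount() {
        return values.length;
    }

    int getFieldCount() {
        return values.length == 0 ? 0 : values[0].length;
    }

    double[] getDoubleArray(int ifield, int irecord) {
        // Return a copy so callers cannot modify the stored table.
        return values[irecord][ifield].clone();
    }

    float[] getFloatArray(int ifield, int irecord) {
        double[] cell = values[irecord][ifield];
        float[] out = new float[cell.length];
        for (int k = 0; k < cell.length; k++) {
            out[k] = (float) cell[k];
        }
        return out;
    }

    long[] getLongArray(int ifield, int irecord) {
        double[] cell = values[irecord][ifield];
        long[] out = new long[cell.length];
        for (int k = 0; k < cell.length; k++) {
            out[k] = (long) cell[k];   // Java cast truncates toward zero
        }
        return out;
    }

    boolean[] getLogicalArray(int ifield, int irecord) {
        double[] cell = values[irecord][ifield];
        boolean[] out = new boolean[cell.length];
        for (int k = 0; k < cell.length; k++) {
            out[k] = cell[k] != 0;     // zero maps to false, anything else to true
        }
        return out;
    }
}
```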
Note that Java has fewer built-in types than VOTable has datatypes.
Thus a DoubleComplex array of 5 elements is mapped to a double array of
10 elements, with the real and imaginary parts interlaced. The Bit datatype
is mapped to Java byte, and the user should pick out the individual bits;
however, it can also be mapped to a boolean array at the expense of memory.
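Both mappings can be sketched briefly. The class PackedTypes and its methods are hypothetical. The interlaced layout puts element k's real part at index 2k and its imaginary part at 2k+1. The bit unpacking assumes the most significant bit of each byte comes first, which is an assumption on our part and should be checked against the VOTable document.

```java
/** Hypothetical helpers for the complex and bit mappings. */
class PackedTypes {

    // Element k of a complex array occupies indices 2k (real) and 2k+1 (imaginary).
    static double[] interlace(double[] re, double[] im) {
        if (re.length != im.length) {
            throw new IllegalArgumentException("real and imaginary lengths differ");
        }
        double[] out = new double[2 * re.length];
        for (int k = 0; k < re.length; k++) {
            out[2 * k] = re[k];
            out[2 * k + 1] = im[k];
        }
        return out;
    }

    // Expands packed bits into one boolean per bit, most significant bit
    // of each byte first (assumed order).
    static boolean[] unpackBits(byte[] packed, int nbits) {
        if (nbits > packed.length * 8) {
            throw new IllegalArgumentException("not enough bytes for " + nbits + " bits");
        }
        boolean[] out = new boolean[nbits];
        for (int k = 0; k < nbits; k++) {
            int b = packed[k / 8] & 0xFF;   // treat the byte as unsigned
            out[k] = ((b >> (7 - (k % 8))) & 1) != 0;
        }
        return out;
    }
}
```

The boolean form uses one byte-sized element per bit, which is the memory cost mentioned above.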