JAVOT: a Java Parser for VOTable
This page describes efforts at Caltech to build a
parser for the VOTable XML
format. Please send comments and questions to Roy Williams, and perhaps also to the VOTable discussion group. So far we
have concentrated on reading rather than editing tables, though there is plenty
of scope within this framework for such extensions.
The current release of JAVOT can be obtained at
http://us-vo.org/VOTable/JAVOT/JAVOT.zip.
All of the packages discussed below have reference documentation at
http://us-vo.org/VOTable/JAVOT/JAVOT/doc
1. VOTableUtil: Automatically Generated API
As a first step, we have
used an automated code-generation tool called Breeze XML Binder to provide the initial
functionality. The generated API contains objects derived from the DTD,
with the exact mapping from the DTD to the API
described in the
Breeze documentation on creating Java code.
While the generation of the API from the DTD can only be done by those who have
bought the software, the generated jar files may be freely distributed.
There are objects such as
Resource
and Table,
corresponding to the elements RESOURCE and TABLE of the VOTable
dialect of XML. The DTD states that each RESOURCE can contain any
number of TABLEs -- therefore we expect there to be a method of the Resource
objects called getTables() which returns a Vector of Table objects,
and another called getTableCount() for the number of these.
The Table object, for example, includes a getField()
method, which returns a java.util.Vector of Field objects, and
from a Field, we can find the UCD attribute with getUcd(). As
an example, this code fragment tries to find which field of a given table has a
given UCD:
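for (int i = 0; i < table.getFieldCount(); i++) {
    // Each column of the table is described by one Field object.
    Field field = (Field) table.getFieldAt(i);
    String u = field.getUcd();
    // getUcd() may return null, so check before comparing.
    if (u != null && u.equals("POS_EQ_RA_MAIN"))
        System.out.println("Field " + i + " is for RA");
}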
The VOTableUtil package is described exactly and formally by
the Javadoc documentation.
2: The VOTableWrapper object
This convenience class makes it easy to parse VOTable documents directly from a file
or a URL. The code can be used like this:
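String fileOrUrl = "mytable.xml";  // placeholder: a file name or URL of a VOTable document
// Diagnostics are written to this java.io.PrintWriter; pass null to suppress them.
PrintWriter out = new PrintWriter(System.out, true);
VOTableWrapper votw = new VOTableWrapper(fileOrUrl, out);
// Always check for an error after the constructor returns.
if (votw.getLastError() != null) {
    System.err.println("No VOTable found " + fileOrUrl);
    System.err.println("Last error is " + votw.getLastError());
    System.exit(1);
}
Votable v = votw.getVotable();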
The PrintWriter is the channel through which diagnostics are passed.
It can be null if no diagnostics are wanted. Because we are expecting
this code to be used in servlets, it would not be appropriate to hard-code
System.out.
When the VOTableWrapper constructor returns, there may have been an error.
It is the user's responsibility to check for errors.
This can
be done with the getLastError() method.
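In a servlet-like setting, one option is to collect the diagnostics in memory instead of sending them to the console. The sketch below uses only java.io. The method parseWithDiagnostics is a hypothetical stand-in for any component that, like the VOTableWrapper constructor, accepts a PrintWriter that may be null; it is not part of JAVOT.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class DiagnosticsCapture {

    // Hypothetical stand-in for a parser that reports diagnostics through an
    // optional PrintWriter. It is NOT part of JAVOT. The return value plays
    // the role of getLastError(): null means no error.
    static String parseWithDiagnostics(String source, PrintWriter out) {
        if (out != null) {
            out.println("Reading " + source);
        }
        if (source == null || source.isEmpty()) {
            return "no source given";
        }
        return null;
    }

    // Runs the stand-in parser and returns everything it wrote as diagnostics.
    static String captureDiagnostics(String source) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer, true);
        parseWithDiagnostics(source, out);
        out.flush();
        return buffer.toString();
    }
}
```

The captured string could then be logged or embedded in a response page, which is the kind of use that motivates passing a PrintWriter rather than writing to System.out.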
3: The VOSingleTable object
The following is an API sketch of an
additional helper object that is under construction at Caltech. It is
semantically equivalent to a single table (not the hierarchy that VOTable can
represent, of Resources, Tables, Parameters, Links etc).
The automatically constructed Table object carries the
table metadata effectively through the collection of Fields, but it is not
so good at the data section. A FITS file is represented only by its filename or by
a base64-encoded version of the binary data; even if the data is represented by
pure XML (Tabledata), there is still no parsing of strings to double,
byte, float etc.
VOSingleTable is constructed from
a Table object:
VOSingleTable(Table t)
and it has methods that extract data from the table in any desired form.
Given the datatypes of the fields in the table, the Data section of the
table is read, and a binary representation built. First some information about
the table:
- Table getTable()
This method
returns the same VOTableUtil.Table that was used to construct the VOSingleTable.
- int getFieldCount()
This method
retrieves the number of fields (also called columns) in the table; this is the
same as the automatically generated Table method
getFieldCount().
- int getRecordCount()
The number of
records in the table may require reading and parsing a large data file, or it
may be very simple, depending on the implementation of VOSingleTable.
- boolean isVariableLength(int ifield)
If the arraysize attribute of the corresponding
FIELD element contains an asterisk, then this is true. It means that
the number of primitives in a column of the table varies from record to
record.
- int primitiveSize(int ifield)
A convenience method that essentially implements Table 1 in the VOTable document,
mapping the datatype attribute to the number of bytes. For example, if
the datatype of this field is "double", then this method returns 8.
- int primitiveCount(int ifield, int irecord)
The number of primitives contained in the table at field
ifield and record irecord. If the field is of variable
length, then this may be different for each record, but if the field is not
variable length, then the irecord argument is ignored.
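The logic behind primitiveSize and isVariableLength can be sketched as follows. This is a minimal illustration, not the JAVOT implementation: the class name PrimitiveInfo is hypothetical, and for simplicity the methods take the datatype and arraysize attribute strings directly rather than a field index. The byte counts follow our reading of Table 1 in the VOTable document and should be checked against it; "bit" is left out because it does not occupy a whole number of bytes.

```java
import java.util.HashMap;
import java.util.Map;

final class PrimitiveInfo {

    // Bytes per primitive for each VOTable datatype (hypothetical helper,
    // values as we read them from Table 1 of the VOTable document).
    private static final Map<String, Integer> SIZES = new HashMap<>();
    static {
        SIZES.put("boolean", 1);
        SIZES.put("unsignedByte", 1);
        SIZES.put("short", 2);
        SIZES.put("int", 4);
        SIZES.put("long", 8);
        SIZES.put("char", 1);
        SIZES.put("unicodeChar", 2);
        SIZES.put("float", 4);
        SIZES.put("double", 8);
        SIZES.put("floatComplex", 8);
        SIZES.put("doubleComplex", 16);
    }

    // Maps a datatype attribute value to its size in bytes.
    static int primitiveSize(String datatype) {
        Integer size = SIZES.get(datatype);
        if (size == null) {
            throw new IllegalArgumentException("No whole-byte size for datatype: " + datatype);
        }
        return size;
    }

    // A field is of variable length if its arraysize attribute contains an asterisk.
    static boolean isVariableLength(String arraysize) {
        return arraysize != null && arraysize.indexOf('*') >= 0;
    }
}
```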
Now we come to the actual extraction of typed data from the
table.
There are two principles here. One is that data is copied from the XML
tree and stored at maximal density in memory, so that for example a float
takes only 4 bytes and a bit 1/8 of a byte. The other principle is that
any primitive type can be converted to
any other type. Thus we can apply the getFloatArray method to a table
cell that contains integers (37 converted to 37.0), and all the other
combinations. Complex types are converted to real by taking the real part; reals are
converted to long/integer/short/byte by the standard Java casting convention; and
integers are converted to logical/bit by mapping zero to false/0 and all others
to true/1.
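The conversion rules above can be illustrated with a few one-line methods. This is a sketch only: the class and method names are hypothetical and are not part of VOSingleTable, and a complex value is assumed to be held as a two-element array of real part followed by imaginary part.

```java
final class PrimitiveConversions {

    // Integer to float: for example, 37 becomes 37.0.
    static float integerToFloat(int value) {
        return (float) value;
    }

    // Complex to real: keep the real part and drop the imaginary part.
    // Assumes the layout {real, imaginary}.
    static double complexToReal(double[] complex) {
        return complex[0];
    }

    // Real to long: the standard Java cast, which truncates toward zero.
    static long realToLong(double value) {
        return (long) value;
    }

    // Integer to logical: zero maps to false, every other value to true.
    static boolean integerToBoolean(long value) {
        return value != 0;
    }
}
```

Note that the Java cast truncates rather than rounds, so a real value of 37.9 becomes the integer 37, and -2.7 becomes -2.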
There is an additional data extraction method getStringArray, which
attempts to format each object as a String, for example the float quantity 37.0
may be converted to "37.000000".
When data is read in from pure XML (Tabledata), each table cell
(TD element) yields a string that is to be parsed into a sequence of
primitives. In general, these are tokenized by whitespace, as explained in
section 4.1 of the VOTable document, and each token parsed to the correct
primitive. An exception to this is character data, where separation by
whitespace should not be used: an array of 5 characters is read properly if the
XML contains "Apple", and the array should not be entered as "A p p l e".
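The difference between the two parsing rules can be sketched as follows. The class and method names are hypothetical and only illustrate the idea; they are not the JAVOT code. Numeric cells are split on whitespace and each token is parsed, while character cells are taken character by character with no tokenizing.

```java
final class CellParsing {

    // Numeric cell: split the TD content on runs of whitespace and parse each token.
    static double[] parseDoubles(String cell) {
        String trimmed = cell.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    // Character cell: no tokenizing, each character of the string is one primitive.
    static char[] parseChars(String cell) {
        return cell.toCharArray();
    }
}
```

With these methods, a cell containing "1.5 2.5 3" yields three doubles, and a cell containing "Apple" yields an array of 5 characters.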
We can imagine a matrix of conversion methods, from any of the values
of the datatype attribute in the XML document, to any of
the methods listed below that convert to whatever Java type is
required. This means, for example, that code manipulating "Declination"
can get the data as double even though it may have been stored as either
double or float in the source table.
The data extraction methods are of the form:
- boolean[] getBooleanArray(int ifield, int irecord)
- byte[] getBitArray(int ifield, int irecord)
- byte[] getByteArray(int ifield, int irecord)
- short[] getShortArray(int ifield, int irecord)
- int[] getIntArray(int ifield, int irecord)
- long[] getLongArray(int ifield, int irecord)
- char[] getCharArray(int ifield, int irecord)
- char[] getUnicodeCharArray(int ifield, int irecord)
- float[] getFloatArray(int ifield, int irecord)
- double[] getDoubleArray(int ifield, int irecord)
- float[] getFloatComplexArray(int ifield, int irecord)
- double[] getDoubleComplexArray(int ifield, int irecord)
In each case, the contents of a table cell that correspond to the given
ifield, irecord will be converted (if necessary) to the desired type,
put into an array and returned.
Note that there are fewer built-in Java types than VOTable types. Thus a
DoubleComplex array of 5 elements is mapped to a double array of 10 elements --
the real and imaginary elements interlaced. The Bit datatype is mapped to Java
byte and the user should pick out the individual bits -- however it can also be
mapped to a boolean array at the expense of memory.
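A caller might pick values out of these packed arrays as sketched below. The class PackedAccess and its methods are hypothetical helpers, not part of VOSingleTable. The interlaced layout (real part at even index, imaginary part at the following odd index) is as described above; the bit ordering within a byte, most significant bit first, is an assumption made for this sketch and should be checked against the actual implementation.

```java
final class PackedAccess {

    // Real part of the k-th complex element in an interlaced array.
    static double realPart(double[] interlaced, int k) {
        return interlaced[2 * k];
    }

    // Imaginary part of the k-th complex element in an interlaced array.
    static double imaginaryPart(double[] interlaced, int k) {
        return interlaced[2 * k + 1];
    }

    // Value of the bit at the given index in a packed byte array.
    // ASSUMPTION: bit 0 is the most significant bit of the first byte.
    static boolean bitAt(byte[] packed, int index) {
        int unsigned = packed[index / 8] & 0xFF;
        int shift = 7 - (index % 8);
        return ((unsigned >> shift) & 1) == 1;
    }

    // Expands the first 'count' bits into a boolean array, trading memory for convenience.
    static boolean[] unpackBits(byte[] packed, int count) {
        boolean[] bits = new boolean[count];
        for (int i = 0; i < count; i++) {
            bits[i] = bitAt(packed, i);
        }
        return bits;
    }
}
```

The unpacked form uses a full boolean per bit, which is the memory cost mentioned above.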
4: Obtaining the Software
The current release of JAVOT can be obtained at
http://us-vo.org/VOTable/JAVOT/JAVOT.zip.
Instructions
- Unzip the file. Bring up a command window, and do cd JAVOT.
- You will need to figure out where your Java installation is.
For Windows users, edit and then execute the script
Setup.bat. For Unix users, edit Setup.source and then
type source Setup.source. The editing of this file involves
the variable JAVA_HOME, the directory where your Java lives.
- Run the CompileAll script, which should need no modification.
Unix users should do sh CompileAll.bat.
This results in the Java sources being compiled and then collected into jar files in the
- Run the RunAll script, which runs the two sample programs,
BuildHtml and TestPrimMatrix. The former parses the table,
then outputs it as HTML. The latter is a regression test: it
reads in a table that has all primitive
types, then writes the output in all primitive types. You can, for example, see how
a DoubleComplex is converted to a Boolean.
Now build your own application, using one of the sample programs as a guide.