Science With the Virtual Observatory
2006 Summer School

VOTables

Michael Fitzpatrick (NOAO)



Introduction and Background

VOTable is an XML-based format for representing tabular data (often astronomical catalogs) in a uniform manner across all VO software and services. XML permits the use of industry-standard tools and software but, more importantly, allows us to capture (and structure) the rich set of metadata associated with the data, whether it describes the columns of a single table or an entire hierarchy of resources.

The immediate ancestors of the format are Astrores, developed at CDS, and the eXtensible Scientific Interchange Language (XSIL). Several of the NVOSS faculty were heavily involved in defining the format, and VOTable was one of the first standards to be adopted by all IVOA partners.



The VOTable Document Format -- An Overview

The formal specification of VOTables is maintained at the IVOA site http://www.ivoa.net/Documents/latest/VOT.html and we'll only review the highlights of the format here.

Key features of VOTables include:

The degree to which a particular service makes use of the flexibility allowed by VOTable varies greatly; in practice, most data and compute services use only a small fraction of it. Users writing general-purpose client software should nevertheless allow for the full range of capabilities in order to build a robust application, and users running VOTable-aware applications should be aware that an application may not support all of the information available in a VOTable. Data providers writing VOTables should remember that not all VOTable parsers or client applications support every feature of the format, and should avoid using "exotic" features for critical data.



A Sample VOTable

<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
  <COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
  <RESOURCE name="myFavouriteGalaxies">
    <TABLE name="results">
      <DESCRIPTION>Velocities and Distance estimations</DESCRIPTION>
      <PARAM name="Telescope" datatype="float" ucd="phys.size;instr.tel"
        unit="m" value="3.6"/>
      <FIELD name="RA" ucd="pos.eq.ra;meta.main" ref="J2000"
        datatype="float" width="6" precision="2" unit="deg"/>
      <FIELD name="Dec" ucd="pos.eq.dec;meta.main" ref="J2000"
        datatype="float" width="6" precision="2" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>010.68</TD><TD>+41.27</TD></TR>
          <TR><TD>287.43</TD><TD>-63.85</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>

Looking at the specification and the example above we find that the data model for a VOTable is composed of:

To help take some of the mystery out of a VOTable, let's break the sample down and explain each element in more detail.


The VOTABLE Element

This is the root element of the XML document tree, so there can be only one occurrence in a file. The children of a VOTABLE will typically include, as part of the metadata:

Additionally, and usually the part we're really interested in, one or more RESOURCE elements.

In simple terms a RESOURCE is a set of related tables. The RESOURCE is recursive (it can contain other RESOURCE elements), which means that the set of tables making up a RESOURCE may define a more complex structure of the data.

A RESOURCE may have either or both of the name and ID attributes; it may also be qualified by type="meta", meaning that the resource is descriptive only (none of its sub-elements contain actual data). Finally, the RESOURCE element may have a utype attribute linking it to some external data model (introduced in version 1.1; see section 4.5 of the spec).
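Because RESOURCE is recursive, client code usually walks it with a recursive function. The sketch below assumes a recent JDK (Java 11 or later) and a VOTable without a default XML namespace, as in the sample above. ResourceWalker is a hypothetical helper built on the JDK's standard DOM parser, not part of VOTWrap or any other VOTable library. It lists each RESOURCE by name (falling back to ID), flags type="meta", and counts the TABLEs directly inside it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: summarises a (possibly nested) RESOURCE hierarchy. */
class ResourceWalker {

    /** Returns one line per RESOURCE, indented two spaces per nesting level. */
    static List<String> describe(String votableXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)));
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    // Visit only direct RESOURCE children, then recurse into each one:
    // this mirrors the recursive definition of RESOURCE.
    private static void walk(Element parent, int depth, List<String> lines) {
        for (Element res : children(parent, "RESOURCE")) {
            String label = res.hasAttribute("name") ? res.getAttribute("name")
                    : res.hasAttribute("ID") ? res.getAttribute("ID")
                    : "(unnamed)";
            // type="meta" marks a purely descriptive resource with no data.
            String meta = "meta".equals(res.getAttribute("type")) ? " [meta]" : "";
            int tables = children(res, "TABLE").size();
            lines.add("  ".repeat(depth) + label + meta + " tables=" + tables);
            walk(res, depth + 1, lines);
        }
    }

    // Direct child elements with the given tag name (no deeper descendants).
    private static List<Element> children(Element parent, String tag) {
        List<Element> found = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                found.add((Element) n);
            }
        }
        return found;
    }
}
```

For a document whose outer RESOURCE, named outer, holds one TABLE and an inner RESOURCE with ID="inner" and type="meta", describe() returns "outer tables=1" followed by "  inner [meta] tables=0".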


The TABLE Element

A TABLE may contain descriptive metadata elements such as

The data itself begins with the <DATA> tag described below. Table cells must appear in the same order as the <FIELD> definitions, and all records must have the same format. Empty cells are simply denoted with an empty XML element (e.g. "<TD/>").
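Because cells follow FIELD order, a reader can pair the n-th TD of each row with the n-th FIELD. Below is a minimal sketch of that pairing using only the JDK's DOM parser. TableDataReader is a hypothetical name; it reads just the first TABLE, handles only TABLEDATA (not BINARY or FITS), and rejects rows whose cell count differs from the number of FIELDs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads the TABLEDATA rows of the first TABLE. */
class TableDataReader {

    /** Each row becomes a map from FIELD name to cell text, in FIELD order. */
    static List<Map<String, String>> readFirstTable(String votableXml) throws Exception {
        Element table = (Element) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)))
                .getElementsByTagName("TABLE").item(0);
        if (table == null) {
            throw new IllegalArgumentException("no TABLE element found");
        }

        // Column order is defined by the order of the FIELD elements.
        List<String> names = new ArrayList<>();
        NodeList fields = table.getElementsByTagName("FIELD");
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }

        List<Map<String, String>> rows = new ArrayList<>();
        NodeList trs = table.getElementsByTagName("TR");
        for (int r = 0; r < trs.getLength(); r++) {
            NodeList tds = ((Element) trs.item(r)).getElementsByTagName("TD");
            // Every record must match the FIELD list.
            if (tds.getLength() != names.size()) {
                throw new IllegalArgumentException("row " + r + " has " + tds.getLength()
                        + " cells but the table declares " + names.size() + " FIELDs");
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < tds.getLength(); c++) {
                // An empty <TD/> has no text content, so it reads as "".
                row.put(names.get(c), tds.item(c).getTextContent());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Keying rows by FIELD name is a convenience choice. If a table repeats a name, a list indexed by column position would be safer.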


The FIELD Element

A FIELD is a description of a table column that may contain additional descriptive elements such as LINK, VALUES or DESCRIPTION. Attributes of the FIELD are used to specify:


The PARAM Element

A PARAM is like a FIELD but holds a constant value. It takes the same attributes as FIELD, plus a value attribute giving the constant, and can be thought of as a global definition that applies to the enclosing element (the TABLE in the sample above, and often the entire RESOURCE). A PARAM is typically used to define a global value (e.g. query parameters, internal parameters used by the service, etc.) and may be given specific units or tied to a more meaningful UCD through its unit and ucd attributes.
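Reading PARAMs is mostly a matter of collecting attributes. Below is a hedged sketch using the JDK DOM parser; ParamReader and its Param class are hypothetical names. It gathers PARAMs from anywhere in the document, so it does not record which RESOURCE or TABLE each one belongs to.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: collects PARAM constants and their metadata. */
class ParamReader {

    /** A PARAM's constant value plus the attributes that qualify it. */
    static final class Param {
        final String value;
        final String unit;
        final String ucd;

        Param(String value, String unit, String ucd) {
            this.value = value;
            this.unit = unit;
            this.ucd = ucd;
        }

        @Override
        public String toString() {
            return unit.isEmpty() ? value : value + " " + unit;
        }
    }

    /** Maps each PARAM name to its value; a later PARAM with the same name wins. */
    static Map<String, Param> readParams(String votableXml) throws Exception {
        NodeList params = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)))
                .getElementsByTagName("PARAM");
        Map<String, Param> out = new LinkedHashMap<>();
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            // getAttribute() returns "" for attributes that are absent.
            out.put(p.getAttribute("name"), new Param(
                    p.getAttribute("value"), p.getAttribute("unit"), p.getAttribute("ucd")));
        }
        return out;
    }
}
```

Applied to the sample VOTable, the Telescope entry has value 3.6, unit m and ucd phys.size;instr.tel, and prints as "3.6 m".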


The INFO Element

An INFO is a restricted class of PARAM. It is mainly an informative value (e.g. the status return, the number of rows in the table, etc.) and supports only the name and value attributes.
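Since an INFO carries only a name and a value, a client can turn all of them into a simple lookup table, for instance to check a status flag. In this sketch InfoReader is a hypothetical helper. The name QUERY_STATUS with value ERROR is used only as an example convention: the names a given service actually uses are defined by its own protocol, so check that specification.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for reading INFO name/value pairs. */
class InfoReader {

    /** Name to value for every INFO element, in document order. */
    static Map<String, String> readInfos(String votableXml) throws Exception {
        NodeList infos = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)))
                .getElementsByTagName("INFO");
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            out.put(info.getAttribute("name"), info.getAttribute("value"));
        }
        return out;
    }

    /** True if the INFO called statusName has the value ERROR (any case). */
    static boolean reportsError(Map<String, String> infos, String statusName) {
        // equalsIgnoreCase(null) is false, so a missing INFO counts as "no error".
        return "ERROR".equalsIgnoreCase(infos.get(statusName));
    }
}
```

A typical call would be InfoReader.reportsError(InfoReader.readInfos(xml), "QUERY_STATUS"), with the status name adjusted to whatever the service in question defines.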

The DATA Element

A TABLE contains at most one DATA element, although a RESOURCE may contain multiple TABLEs. There are three possible formats for the data: TABLEDATA (pure XML, as in the sample above), BINARY, and FITS.

FITS and BINARY formats must contain a <STREAM> element, for instance:

<TABLE>
  <!-- FIELD definitions ... -->
  <DATA>
    <FITS extnum="2">
      <STREAM encoding="gzip" href="ftp://archive.nvo.org/myfile.fits.gz"/>
    </FITS>
  </DATA>
</TABLE>
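How a reader might honour the STREAM encoding attribute can be sketched with JDK classes alone. StreamDecoder is a hypothetical helper that handles only a missing or empty encoding, gzip, and base64. The MIME decoder is used so that line breaks in inline base64 text are skipped. Any other value is rejected rather than guessed at, and locating the FITS extension named by extnum is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: undoes the encoding declared on a STREAM element. */
class StreamDecoder {

    /** Wraps raw stream bytes according to the STREAM encoding attribute. */
    static InputStream decode(InputStream raw, String encoding) throws IOException {
        if (encoding == null || encoding.isEmpty()) {
            return raw;                                  // data stored as-is
        }
        switch (encoding) {
            case "gzip":
                return new GZIPInputStream(raw);         // e.g. myfile.fits.gz
            case "base64":
                // The MIME decoder skips line breaks and other non-alphabet
                // characters, which commonly appear in base64 text inside XML.
                return Base64.getMimeDecoder().wrap(raw);
            default:
                throw new IOException("unsupported STREAM encoding: " + encoding);
        }
    }

    /** Opens a STREAM's href (any URL scheme the JDK supports) and decodes it. */
    static InputStream open(String href, String encoding) throws IOException {
        return decode(URI.create(href).toURL().openStream(), encoding);
    }
}
```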

The GROUP Element

The GROUP element is a relatively new feature meant to allow the logical association of FIELD and PARAM metadata elements. As you can see from the example below, this form of association is not unusual in astronomical tables. However, the GROUP element is not supported by all current VOTable readers/writers (as of this writing, the STIL Java library from Starlink and the C++ Parser from VO-India appear to be the only software supporting the GROUP element).


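<TABLE name="Nutation and Aberration">
  <FIELD ID="Date" name="Date"/>
  <FIELD ID="Nut_Long" name="Nut_Long"/>
  <FIELD ID="Nut_Obl" name="Nut_Obl"/>
  <FIELD ID="Abber_C_1950" name="Abber_C_1950"/>
  <FIELD ID="Abber_D_1950" name="Abber_D_1950"/>
  <FIELD ID="Abber_C_1955" name="Abber_C_1955"/>
  <FIELD ID="Abber_D_1955" name="Abber_D_1955"/>
  <GROUP name="Nutation">
    <FIELDref ref="Nut_Long"/>
    <FIELDref ref="Nut_Obl"/>
  </GROUP>
  <GROUP name="Aberration">
    <GROUP name="Equinox 1950">
      <FIELDref ref="Abber_C_1950"/>
      <FIELDref ref="Abber_D_1950"/>
    </GROUP>
    <GROUP name="Equinox 1955">
      <FIELDref ref="Abber_C_1955"/>
      <FIELDref ref="Abber_D_1955"/>
    </GROUP>
  </GROUP>
  <DATA>
    <!-- ... -->
  </DATA>
</TABLE>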
Nutation and Aberration

        Nutation            Aberration
                            Equinox 1950     Equinox 1955
Date    in Long.  in Obl.   C       D        C       D
Oct 1   +16 231   +02 233   18 639  2 569    18 636  2 594
    2   +16 231   +02 233   18 639  2 569    18 636  2 594
    3   +16 231   +02 233   18 639  2 569    18 636  2 594
    4   +16 231   +02 233   18 639  2 569    18 636  2 594
    5   +16 231   +02 233   18 639  2 569    18 636  2 594

Parsers that do not support GROUP elements generally just ignore them. You can see from the above table that, without the GROUP, all seven columns (Date plus the six grouped quantities) are still defined and the data remain fully accessible; however, the association between the columns is lost. In some cases the UCD can provide a means of defining a FIELD more precisely; in other cases a descriptive and unambiguous name attribute would be a wise choice for a data provider.
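In VOTable 1.1 a GROUP refers to its columns through FIELDref elements, whose ref attribute names the ID of a FIELD declared in the TABLE. Assuming that layout, a reader that does understand GROUP can recover each column's place in the hierarchy. The hypothetical GroupResolver below maps each FIELD to a path of GROUP names and leaves ungrouped columns with an empty path. It uses only the JDK DOM parser, reads only the first TABLE, and identifies FIELDs by ID.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: recovers GROUP membership for each column. */
class GroupResolver {

    /** Maps each FIELD ID to its GROUP path, e.g. "Aberration/Equinox 1950". */
    static Map<String, String> groupPaths(String votableXml) throws Exception {
        Element table = (Element) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)))
                .getElementsByTagName("TABLE").item(0);
        Map<String, String> paths = new LinkedHashMap<>();
        // Start every FIELD with an empty path: ignoring GROUP loses no columns.
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "FIELD".equals(((Element) n).getTagName())) {
                paths.put(((Element) n).getAttribute("ID"), "");
            }
        }
        collect(table, new ArrayList<>(), paths);
        return paths;
    }

    // Depth-first walk of nested GROUPs, recording the path at each FIELDref.
    private static void collect(Element parent, List<String> path, Map<String, String> paths) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) {
                continue;
            }
            Element e = (Element) n;
            if ("GROUP".equals(e.getTagName())) {
                path.add(e.getAttribute("name"));
                collect(e, path, paths);
                path.remove(path.size() - 1);
            } else if ("FIELDref".equals(e.getTagName())) {
                paths.put(e.getAttribute("ref"), String.join("/", path));
            }
        }
    }
}
```

For example, a FIELDref to Abber_C_1950 inside a GROUP named "Equinox 1950", itself inside a GROUP named "Aberration", maps to "Aberration/Equinox 1950".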


Reading a VOTable in Java

In the $NVOSS_HOME/java/dev/readvotable directory you'll find a sample program that uses the VOTWrap interface to parse a VOTable file. Below we show a slightly modified version of that same task, to be used as a basis for discussion during the presentation.

package readvotable;

import edu.jhu.pha.ivoa.*;
import java.io.*;

class ReadVotable {
    public static void main(String[] args) throws Exception {
        // Expect the name of a VOTable file as the only argument.
        if (args.length < 1) {
            System.err.println("usage: ReadVotable <votable.xml>");
            System.exit(1);
        }
        readVot(args[0]);
    }

    public static void readVot(String fname) throws Exception {
        // Parse the document with VOTWrap; the stream is closed when done.
        try (InputStream is = new FileInputStream(fname)) {
            VOTWrap.VOTable vot = VOTWrap.createVOTable(is);

            // Only the first TABLE of the first RESOURCE is examined.
            VOTWrap.Resource res = vot.getResource(0);
            VOTWrap.Table tab = res.getTable(0);
            int fcount = tab.getFieldCount();
            int rcount = tab.getTableData().getTRCount();

            // Header: each column's name and UCD.
            for (int f = 0; f < fcount; f++) {
                VOTWrap.Field field = tab.getField(f);
                System.out.print(field.getName() + ":" + field.getUCD() + " ");
            }
            System.out.println();
            System.out.println("There are " + fcount + " fields on " + rcount + " rows:");

            // Data: cells appear in the same order as the FIELD definitions.
            for (int r = 0; r < rcount; r++) {
                VOTWrap.TR row = tab.getTableData().getTR(r);
                for (int f = 0; f < fcount; f++) {
                    VOTWrap.TD td = row.getTD(f);
                    System.out.print(td.getPCDATA() + "   ");
                }
                System.out.println();
            }
        }
    }
}

Exercise

Beginning with the above program or the original code, modify the task to read a VOTable and output one or more HTML tables (e.g. only the data, or separate tables for the metadata and the tabular data). Java's System.out.print() or println() can be used to write out the additional markup.

Hint: There is a strong analogy between the XML elements defining a RESOURCE, its table column headers and the data for each row, and the representation of these in standard HTML.


An XSLT Transformation Example

XSLT stylesheets provide a powerful means of reading XML files and converting them to some other format (typically text or HTML, but possibly also another XML document). Support for procedures, iteration and conditional expressions makes XSLT in many ways an actual programming language, and for our needs one ideally suited to handling VOTable documents. Stylesheets may be applied from within languages such as Java, or with command-line tools such as xsltproc.

Consider the XSLT file below, which converts a VOTable to the corresponding HTML table as described in the exercise above. Note the mix of HTML tags, the embedded XSLT elements and syntax (i.e. those using the "xsl:" namespace prefix), and in particular the VOTable elements to be processed.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/VOTABLE">

<html><body>
 <xsl:for-each select="RESOURCE/TABLE">
  <table border="1">
  <tr> <xsl:for-each select="FIELD">
     <td><b><xsl:value-of select="@name" /> </b></td>
  </xsl:for-each> </tr>

  <xsl:for-each select="DATA/TABLEDATA/TR">
       <tr>
        <xsl:for-each select="TD">
          <td width="120"><xsl:value-of select="." /></td>
        </xsl:for-each>
       </tr>
  </xsl:for-each>
  </table>
 </xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
produces:

RA      Dec
010.68  +41.27
287.43  -63.85
Exercise:
  • Cut and paste the sample VOTable above into your favorite editor and save it to a file called e.g. sample.xml
  • Cut and paste the sample XSLT stylesheet above and save it to a file called e.g. sample.xsl
  • Apply the stylesheet to the VOTable to produce the HTML table, e.g. on Unix:
    • % xsltproc sample.xsl sample.xml
Homework:
  • Modify the stylesheet to extract the PARAM values into a separate table before printing out the data table. See this file for a modified XSLT solution, and this file for the results, if you get stuck.
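Stylesheets can also be applied from Java instead of xsltproc. Below is a minimal sketch using the standard JAXP javax.xml.transform API that ships with the JDK. ApplyStylesheet is a hypothetical class name, and its main() mirrors the xsltproc command in the exercise.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: a JAXP counterpart of "xsltproc sample.xsl sample.xml". */
class ApplyStylesheet {

    /** Applies an XSLT stylesheet (given as text) to an XML document (given as text). */
    static String transform(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    /** Usage: java ApplyStylesheet sample.xsl sample.xml > sample.html */
    public static void main(String[] args) throws TransformerException {
        if (args.length != 2) {
            System.err.println("usage: java ApplyStylesheet <stylesheet.xsl> <votable.xml>");
            System.exit(1);
        }
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File(args[0])));
        t.transform(new StreamSource(new File(args[1])), new StreamResult(System.out));
    }
}
```

After compiling with javac, running java ApplyStylesheet sample.xsl sample.xml > sample.html should produce HTML equivalent to the xsltproc output, though whitespace and serialization details may differ between XSLT processors.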

The use of XSLT as a means of developing small utility tasks on both the client and server side shouldn't be overlooked: counting the number of rows in a table, extracting a column of interest, looking for an error return flag, or converting a table for web presentation are simple things that can be done in XSLT, and a command-line tool such as xsltproc will often run faster than the equivalent task in a language such as Java because of its lower startup overhead. This makes XSLT well suited to CGI server scripting and to the development of AJAX web applications.
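The XPath expressions behind such XSLT one-liners can also be evaluated from Java through the JDK's javax.xml.xpath API. In the hypothetical XPathUtils sketch below, countRows() is the analogue of count(//TR) in a stylesheet. column() finds a FIELD's position among its siblings and then selects the matching TD of every row; the column name is bound as an XPath variable rather than pasted into the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helpers: small XSLT-style utilities evaluated with JAXP XPath. */
class XPathUtils {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Number of TABLEDATA rows in the whole document. */
    static int countRows(String votableXml) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        Double n = (Double) xp.evaluate("count(//TABLEDATA/TR)", parse(votableXml), XPathConstants.NUMBER);
        return n.intValue();
    }

    /** Cells of the named column in the first TABLE, located via the FIELD's position. */
    static List<String> column(String votableXml, String fieldName) throws Exception {
        Document doc = parse(votableXml);
        XPath xp = XPathFactory.newInstance().newXPath();
        // Bind $name to the requested column so the name is never spliced into the expression.
        xp.setXPathVariableResolver(qname -> fieldName);
        String field = "(//TABLE)[1]/FIELD[@name=$name]";

        Boolean exists = (Boolean) xp.evaluate("boolean(" + field + ")", doc, XPathConstants.BOOLEAN);
        if (!exists) {
            throw new IllegalArgumentException("no FIELD named " + fieldName);
        }
        // 1-based column index = number of FIELDs before this one, plus one.
        int pos = ((Double) xp.evaluate("count(" + field + "/preceding-sibling::FIELD) + 1",
                doc, XPathConstants.NUMBER)).intValue();
        NodeList cells = (NodeList) xp.evaluate("(//TABLE)[1]/DATA/TABLEDATA/TR/TD[" + pos + "]",
                doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            values.add(cells.item(i).getTextContent());
        }
        return values;
    }
}
```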


Tools For Examining VOTables

The introduction to VO applications later in the School will mention a number of tools that make use of VOTables for doing image overlays (Aladin), plotting (VOPlot) or general interaction and visualization (TOPCAT). These tools are all included in the NVOSS software distribution and students are encouraged to experiment with them.


Command-line VOTable Tools

In developing your own applications or services you may never need to parse or create a VOTable yourself. In doing science with VO data and your own legacy code, it may be more convenient to simply convert or manipulate a VOTable into a form that is easier to use. An excellent set of tools for this is the STILTS command-line package available from Starlink.

STILTS contains the following commands:

tcopy - Table format converter
Converts tables between formats.
tpipe - Generic table pipeline processing utility
Powerful command providing row selection, sorting, column rearrangement, algebraic data manipulation, statistical calculations, metadata display, format conversion, etc.
votcopy - VOTable encoding translator
Copies VOTable data leaving the structure intact but changing the data encoding between TABLEDATA, BINARY and FITS.
votlint - VOTable validity checker
Checks whether a VOTable document conforms to the standard, reporting on many aspects beyond simple conformance to the schema/DTD.

Other tools, such as a table concatenator and a crossmatcher, are also available; more information can be found at http://www.star.bris.ac.uk/~mbt/stilts/sun256/cmdUsage.html, and all commands are available in the version included in the NVOSS software. The distribution also contains the jar file with the programmatic interface to each of these procedures.


Converting VOTables to Other Formats

It isn't always possible to read a VOTable directly in the preferred user environment, either because the software meant to ingest the data can't easily be converted to parse XML, or because the user only needs a subset of the data in the table (e.g. just the RA/Dec). XML is well suited to conversion to other formats; however, the hierarchical nature of VOTable doesn't always map cleanly onto another format.

The NVOSS software distribution contains the STILTS utility, which can be used to convert and process VOTables into other formats such as simple text output, CSV, or various flavors of FITS tables. It also contains an "encoding translator" that converts one type of VOTable to another (e.g. from one using BINARY encoding, which might not be supported by your parser of choice, to one using TABLEDATA). Other kinds of processing are also supported, e.g. selecting columns/rows in a table, creating or deleting columns based on an expression using existing columns, and even writing a table directly to a MySQL database.

As an example, to convert a VOTable to a FITS file one would use the following (other output formats such as CSV are handled similarly):

% stilts tcopy messier.xml messier.fits

The filename extensions are used as clues to the format; however, there are arguments for creating specific formats (see http://www.star.bris.ac.uk/~mbt/stilts/sun256/outFormats.html). Note: in many of these conversions the UCD of a column is lost in the output format and only the name attribute is used. Since column names are chosen by the data provider, this complicates reading tables from generalized data sources.
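To make concrete what such a conversion keeps and drops, here is a minimal hand-rolled TABLEDATA-to-CSV converter. It is a sketch, not how STILTS works. VotToCsv is a hypothetical name, and it handles only the first TABLE and only TABLEDATA. The header row uses FIELD names, so UCDs and units are lost, as noted above. Values are quoted in the RFC 4180 style, and lines end in \n rather than CRLF.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical converter: first TABLE's TABLEDATA to CSV (column names only). */
class VotToCsv {

    static String toCsv(String votableXml) throws Exception {
        Element table = (Element) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)))
                .getElementsByTagName("TABLE").item(0);
        if (table == null) {
            throw new IllegalArgumentException("no TABLE element found");
        }
        StringBuilder csv = new StringBuilder();

        // Header line: only the FIELD name survives; ucd, unit, etc. are dropped.
        List<String> header = new ArrayList<>();
        NodeList fields = table.getElementsByTagName("FIELD");
        for (int i = 0; i < fields.getLength(); i++) {
            header.add(((Element) fields.item(i)).getAttribute("name"));
        }
        appendLine(csv, header);

        // One CSV line per TR, cells in FIELD order.
        NodeList trs = table.getElementsByTagName("TR");
        for (int r = 0; r < trs.getLength(); r++) {
            NodeList tds = ((Element) trs.item(r)).getElementsByTagName("TD");
            List<String> cells = new ArrayList<>();
            for (int c = 0; c < tds.getLength(); c++) {
                cells.add(tds.item(c).getTextContent());
            }
            appendLine(csv, cells);
        }
        return csv.toString();
    }

    private static void appendLine(StringBuilder csv, List<String> cells) {
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                csv.append(',');
            }
            csv.append(quote(cells.get(i)));
        }
        csv.append('\n');
    }

    /** RFC 4180-style quoting: wrap in quotes and double any embedded quotes. */
    static String quote(String v) {
        boolean needsQuotes = v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r");
        return needsQuotes ? "\"" + v.replace("\"", "\"\"") + "\"" : v;
    }
}
```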

Likewise, selection within a table can be accomplished using commands such as:

% stilts tpipe cmd='keepcols POS_EQ_RA_MAIN POS_EQ_DEC_MAIN' in.xml out.txt
% stilts tpipe in=survey.fits ofmt=ascii \
    cmd='select "skyDistance(hmsToRadians(RA),dmsToRadians(DEC),hmsToRadians(2,28,11),dmsToRadians(-6,49,45)) < 5 * ARC_MINUTE"'
In the first example we simply select the RA/Dec columns of a table and output them in a text format. The second looks a little more complicated but may be handy in the Project phase of the NVOSS, as it effectively implements a cone search around a given point within a table (in this case, objects within 5 arcmin of (02:28:11, -06:49:45)).
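The select expression in the second example is a great-circle distance test. The sketch below re-implements that test independently so the arithmetic is visible. The method names only mirror those in the STILTS expression and are not STILTS's code, and the haversine formula is our choice of method (STILTS may compute the distance differently).

```java
/**
 * Independent sketch of the cone-search test in the STILTS example.
 * Method names mirror that expression, but this is NOT STILTS's code.
 */
class ConeSearch {

    /** One arcminute in radians. */
    static final double ARC_MINUTE = Math.toRadians(1.0 / 60.0);

    /** Hours, minutes, seconds of right ascension to radians (1 h = 15 deg). */
    static double hmsToRadians(int hours, int minutes, double seconds) {
        return Math.toRadians(15.0 * (hours + minutes / 60.0 + seconds / 3600.0));
    }

    /** Degrees, arcminutes, arcseconds to radians; the sign is taken from degrees. */
    static double dmsToRadians(int degrees, int arcmin, double arcsec) {
        double magnitude = Math.abs(degrees) + arcmin / 60.0 + arcsec / 3600.0;
        // Caveat: this signature cannot express values like -00:30:00.
        return Math.toRadians(degrees < 0 ? -magnitude : magnitude);
    }

    /** Great-circle separation in radians, via the haversine formula. */
    static double skyDistance(double ra1, double dec1, double ra2, double dec2) {
        double sinHalfDDec = Math.sin((dec2 - dec1) / 2);
        double sinHalfDRa = Math.sin((ra2 - ra1) / 2);
        double h = sinHalfDDec * sinHalfDDec
                + Math.cos(dec1) * Math.cos(dec2) * sinHalfDRa * sinHalfDRa;
        // Clamp guards against rounding pushing sqrt(h) slightly above 1.
        return 2 * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** True if (ra, dec) lies strictly within radius of the centre; all in radians. */
    static boolean inCone(double ra, double dec, double raC, double decC, double radius) {
        return skyDistance(ra, dec, raC, decC) < radius;
    }

    public static void main(String[] args) {
        double raC = hmsToRadians(2, 28, 11);
        double decC = dmsToRadians(-6, 49, 45);
        // A point 4 arcmin due north of the centre is inside the 5 arcmin cone.
        System.out.println(inCone(raC, decC + 4 * ARC_MINUTE, raC, decC, 5 * ARC_MINUTE));
    }
}
```

Running main() prints true: a point 4 arcmin due north of (02:28:11, -06:49:45) falls inside the 5 arcmin cone.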

VOTable Resources

Parsers

Java
  • VOTWrap (read): $NVOSS_HOME/java/dev/ivoaclient/src/ivoa
  • JAVOT (read): http://www.us-vo.org/VOTable/JAVOT/
  • SAVOT (read/write/edit): http://cdsweb.u-strasbg.fr/devcorner.gml
  • STIL, the Starlink Tables Infrastructure Library (read/write/edit): http://www.starlink.ac.uk/stil/
  • VOTable Java Parser based on XML Schema (read/write): http://spider.ipac.caltech.edu/staff/jchavez/public/votable_parser.html
  • VOTable Java Streaming Writer (write): http://vo.iucaa.ernet.in/~voi/votableStreamWriter.htm
C++
  • C++ Parser (read): http://vo.iucaa.ernet.in/~voi/cplusparser.htm
Perl
  • VOTable Perl Modules (read/write): http://heasarc.gsfc.nasa.gov/classx/votable/
  • VOTable::DOM (format/print): http://monet.ncsa.uiuc.edu/~rplante/VO/VOTable-DOM.pm

Applications

  • Aladin: interactive software sky atlas for visualizing digitized images and superimposing entries from astronomical catalogs (http://aladin.u-strasbg.fr/)
  • conVOT: converts ASCII or FITS tables to VOTable format (http://vo.iucaa.ernet.in/~voi/conVOT.htm)
  • Mirage: data visualization and exploratory analysis tool (http://www.bell-labs.com/project/mirage/)
  • STILTS: command-line VOTable format conversion and processing (http://www.star.bris.ac.uk/~mbt/stilts/)
  • TOPCAT: table visualization and editing (http://www.starlink.ac.uk/topcat/)
  • Treeview: viewer for hierarchical structures of astronomical data (http://www.starlink.ac.uk/treeview/)
  • VOPlot: visualization tool for VOTable data (http://vo.iucaa.ernet.in/~voi/voplot.htm)
  • VotFilter: an XML filter for OpenOffice Calc to read and write VOTable files (http://services.china-vo.org/vofilter/)
  • RVS: server-side display of n-dimensional astronomical images (http://www.atnf.csiro.au/vo/rvs/)