Science With the Virtual Observatory
VOTable is an XML-based format for representing tabular data (often astronomical catalogs) in a uniform manner across all VO software and services. XML permits the use of industry standard tools and software, but more importantly allows us to capture (and structure) the rich set of metadata associated with the data, columns of the table or an entire hierarchy of resources.
The immediate ancestors of the format are Astrores developed at CDS and the eXtensible Scientific Interchange Language (XSIL). Several of the NVOSS faculty were heavily involved in the definition of the format and VOTable was one of the first standards to be adopted by all IVOA partners.
The formal specification of VOTables is maintained at the IVOA site http://www.ivoa.net/Documents/latest/VOT.html and we'll only review the highlights of the format here.
Key features of VOTables include:
| Pure XML | The table contains only the XML elements allowed by the VOTable spec within a TABLEDATA element. |
| Simple binary | The data in a particular cell may be encoded as a CDATA xml element within a TABLEDATA, or a STREAM element referring to a remote data source if contained within a BINARY element. |
| FITS Binary Tables | A data table may contain a STREAM element defining an access reference to a FITS binary table format file. Support for accessing a specific extension is provided and column metadata may be included as part of the VOTable. |
The degree to which any particular service makes use of the flexibility allowed by VOTable varies greatly; the majority of data and compute services use only a small fraction of it. Authors of general-purpose client software should nevertheless allow for the full range of capability in order to deploy a robust application, and users of VOTable-aware applications should be aware that an application may not support all of the information available in a VOTable. Data providers writing VOTables should likewise keep in mind that not all VOTable parsers or client applications support every feature of the format, and should avoid "exotic" features for critical data.
<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
<RESOURCE name="myFavouriteGalaxies">
<TABLE name="results">
<DESCRIPTION>Velocities and Distance estimations</DESCRIPTION>
<PARAM name="Telescope" datatype="float" ucd="phys.size;instr.tel"
unit="m" value="3.6"/>
<FIELD name="RA" ucd="pos.eq.ra;meta.main" ref="J2000"
datatype="float" width="6" precision="2" unit="deg"/>
<FIELD name="Dec" ucd="pos.eq.dec;meta.main" ref="J2000"
datatype="float" width="6" precision="2" unit="deg"/>
<DATA>
<TABLEDATA>
<TR><TD>010.68</TD><TD>+41.27</TD></TR>
<TR><TD>287.43</TD><TD>-63.85</TD></TR>
</TABLEDATA>
</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>
Looking at the specification and the example above we find that the data model for a VOTable is composed of:
To help take some of the mystery out of a VOTable, let's break down the sample in places and explain each element in more detail.
This is the root element of the XML document tree, so there can be only one occurrence in a file. The children of a VOTABLE will typically include as part of the metadata:
Additionally, and the part that we're usually really interested in, one or more:
In simple terms a RESOURCE is a set of related tables. The RESOURCE is recursive (it can contain other RESOURCE elements), which means that the set of tables making up a RESOURCE may define a more complex structure of the data.
A RESOURCE may have a name attribute, an ID attribute, or both; it may also be qualified by type="meta", meaning that the resource is descriptive only (it does not contain any actual data in any of its sub-elements). Finally, the RESOURCE element may have a utype attribute to link the element to some external data model (introduced in version 1.1, see section 4.5 of the spec).
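Because RESOURCE elements nest, code that looks for tables usually has to recurse. The following is a minimal sketch, assuming only the JDK's built-in DOM parser rather than any VOTable library; the class and method names (ResourceWalkSketch, listTables) and the sample resource names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ResourceWalkSketch {
    /** Returns one "resource/path: tableName" entry per TABLE in the document. */
    static List<String> listTables(String votableXml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)))
                .getDocumentElement();
        List<String> found = new ArrayList<>();
        walk(root, "", found);
        return found;
    }

    /** Looks at direct children only and recurses into nested RESOURCE elements. */
    private static void walk(Element parent, String path, List<String> found) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element child = (Element) node;
            if (child.getTagName().equals("RESOURCE")) {
                // type="meta" marks a resource that is descriptive only.
                boolean meta = child.getAttribute("type").equals("meta");
                walk(child, path + "/" + label(child) + (meta ? "[meta]" : ""), found);
            } else if (child.getTagName().equals("TABLE")) {
                found.add(path + ": " + label(child));
            }
        }
    }

    /** Prefers the name attribute, falls back to ID. */
    private static String label(Element e) {
        if (e.hasAttribute("name")) return e.getAttribute("name");
        if (e.hasAttribute("ID")) return e.getAttribute("ID");
        return "(unnamed)";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<VOTABLE><RESOURCE name='outer'><TABLE name='t1'/>"
                + "<RESOURCE ID='inner' type='meta'><TABLE name='t2'/></RESOURCE>"
                + "</RESOURCE></VOTABLE>";
        for (String entry : listTables(xml)) {
            System.out.println(entry);
        }
    }
}
```

Walking direct children, instead of searching the whole document for TABLE elements, is what preserves the information about which RESOURCE each table belongs to.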
A TABLE may contain descriptive metadata elements such as
The data itself begins with the <DATA> tag described below. Table cells must appear in the same order as the <FIELD> definitions and all records must have the same format. Empty cells are simply denoted with an empty XML tag (e.g. "<TD/>"; element names are case-sensitive).
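The ordering rule is what lets a reader pair each cell with its column definition. As a rough sketch, assuming a document that holds a single TABLE with TABLEDATA rows and using only the JDK DOM parser, the pairing could look like this; TableDataSketch and readRows are hypothetical names, not part of any VOTable library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableDataSketch {
    /** Maps each TR to {field name -> cell text}; an empty <TD/> yields "". */
    static List<Map<String, String>> readRows(String votableXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)));
        // Assumption: one TABLE per document, so every FIELD belongs to it.
        NodeList fields = doc.getElementsByTagName("FIELD");
        NodeList trs = doc.getElementsByTagName("TR");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int r = 0; r < trs.getLength(); r++) {
            NodeList tds = ((Element) trs.item(r)).getElementsByTagName("TD");
            if (tds.getLength() != fields.getLength()) {
                throw new IllegalStateException("Row " + r + " has " + tds.getLength()
                        + " cells, expected " + fields.getLength());
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < tds.getLength(); c++) {
                // Cell c belongs to FIELD c because the orders must match.
                String name = ((Element) fields.item(c)).getAttribute("name");
                row.put(name, tds.item(c).getTextContent().trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<VOTABLE><RESOURCE><TABLE>"
                + "<FIELD name='RA'/><FIELD name='Dec'/>"
                + "<DATA><TABLEDATA>"
                + "<TR><TD>010.68</TD><TD>+41.27</TD></TR>"
                + "<TR><TD>287.43</TD><TD/></TR>"
                + "</TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>";
        for (Map<String, String> row : readRows(xml)) {
            System.out.println(row);
        }
    }
}
```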
A FIELD is a description of a table column that may contain additional descriptive elements such as LINK, VALUES or DESCRIPTION. Attributes of the FIELD are used to specify:
A PARAM is like a FIELD but keeps a constant value. It has the same set of attributes as FIELD and can be thought of as a global definition that applies to the entire RESOURCE. A PARAM is typically used to define a global value for the RESOURCE (e.g. query parameters, internal parameters used by the service, etc) and may be defined in specific units or tied to a more meaningful UCD by using attributes.
An INFO is a restricted class of PARAM. It is mainly an
informative value (e.g. the status return, number of rows in the table, etc)
and is limited to supporting only name and value attributes.
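Since PARAM and INFO both carry a constant in their value attribute, a client can collect them with the same few lines. This is only a sketch using the JDK DOM parser; ParamInfoSketch and constants are made-up names, and the INFO name QUERY_STATUS in the sample is just an illustrative value, not something taken from the example above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParamInfoSketch {
    /** Collects "PARAM:name" and "INFO:name" keys with their constant values. */
    static Map<String, String> constants(String votableXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(votableXml)));
        Map<String, String> found = new LinkedHashMap<>();
        for (String tag : new String[] {"PARAM", "INFO"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String value = e.getAttribute("value");
                // getAttribute returns "" when the attribute is absent.
                String unit = e.getAttribute("unit");
                found.put(tag + ":" + e.getAttribute("name"),
                        unit.isEmpty() ? value : value + " " + unit);
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<VOTABLE><INFO name='QUERY_STATUS' value='OK'/>"
                + "<RESOURCE><TABLE>"
                + "<PARAM name='Telescope' datatype='float' unit='m' value='3.6'/>"
                + "</TABLE></RESOURCE></VOTABLE>";
        constants(xml).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```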
A TABLE may contain at most one DATA element, although a RESOURCE may contain multiple TABLEs. There are three possible formats for the data:
| TABLEDATA | Pure XML table composed of TR elements to define the rows and TD elements for each cell |
| FITS | FITS binary table, may contain an extnum attribute to refer to a particular extension of a stream. Header keywords are typically encoded as PARAMs in the metadata. |
| BINARY | for efficient transfer, data are encoded (e.g. base64 gzip, or dynamic when the MIME type will be specified by the service) |
FITS and BINARY formats must contain a <STREAM> element, for instance:
<TABLE>
<FIELD ....>
<DATA>
<FITS extnum="2">
<STREAM encoding="gzip" href="ftp://archive.nvo.org/myfile.fits.gz"/>
</FITS>
</DATA>
</TABLE>
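A client typically has to look at the child of DATA to decide how to fetch the rows. The sketch below, which assumes the JDK DOM parser and a document with a single DATA element, reports the serialization and the STREAM attributes; StreamRefSketch and describeData are hypothetical names, and no data is actually downloaded.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StreamRefSketch {
    /** Summarizes the first DATA element: its format plus any STREAM details. */
    static String describeData(String tableXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(tableXml)));
        NodeList dataNodes = doc.getElementsByTagName("DATA");
        if (dataNodes.getLength() == 0) {
            return "no DATA element";
        }
        Element format = firstChildElement((Element) dataNodes.item(0));
        if (format == null) {
            return "empty DATA element";
        }
        String kind = format.getTagName();
        if (kind.equals("TABLEDATA")) {
            return "TABLEDATA (inline rows)";
        }
        NodeList streams = format.getElementsByTagName("STREAM");
        if (streams.getLength() == 0) {
            throw new IllegalStateException(kind + " without a STREAM element");
        }
        Element stream = (Element) streams.item(0);
        StringBuilder summary = new StringBuilder(kind);
        if (format.hasAttribute("extnum")) {
            summary.append(" extnum=").append(format.getAttribute("extnum"));
        }
        if (stream.hasAttribute("encoding")) {
            summary.append(" encoding=").append(stream.getAttribute("encoding"));
        }
        // Without an href the encoded bytes sit inside the STREAM element itself.
        summary.append(stream.hasAttribute("href")
                ? " href=" + stream.getAttribute("href") : " inline");
        return summary.toString();
    }

    private static Element firstChildElement(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TABLE><FIELD name='x'/><DATA><FITS extnum='2'>"
                + "<STREAM encoding='gzip' href='ftp://archive.nvo.org/myfile.fits.gz'/>"
                + "</FITS></DATA></TABLE>";
        System.out.println(describeData(xml));
    }
}
```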
The GROUP element is a relatively new feature meant to allow the logical association of FIELD and PARAM metadata elements. As the example below shows, this form of association is not unusual in astronomical tables. However, the GROUP element is not supported by all current VOTable readers and writers; as of this writing, the STIL Java library from Starlink and the C++ parsers from VO-India appear to be the only software supporting it.
<TABLE name="Nutation and Abberation">
  <FIELD name="Date"/>
  <GROUP name="Nutation">
    <FIELD name="Nut_Long"/>
    <FIELD name="Nut_Obl"/>
  </GROUP>
  <GROUP name="Abberation">
    <GROUP name="Equinox 1950">
      <FIELD name="Abber_C_1950"/>
      <FIELD name="Abber_D_1950"/>
    </GROUP>
    <GROUP name="Equinox 1955">
      <FIELD name="Abber_C_1955"/>
      <FIELD name="Abber_D_1955"/>
    </GROUP>
  </GROUP>
  <DATA>
    :
  </DATA>
</TABLE>
Parsers that do not support GROUP elements generally just ignore them. You can see from the table definition above that without the GROUP elements there are still seven columns defined and full access to the data is still possible; however, the association between the columns is lost. In some cases the UCD can provide a means of defining a FIELD more precisely; in other cases a descriptive and unambiguous name attribute would be a wise choice for a data provider.
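To see both views of the same table, the sketch below lists every FIELD in document order (what a GROUP-unaware reader sees) and, when the FIELD sits inside GROUP elements, appends the group path (what a GROUP-aware reader could recover). It assumes the JDK DOM parser and a well-formed version of the snippet above; GroupFlattenSketch and columns are invented names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GroupFlattenSketch {
    /** Returns "name" or "name [outer > inner]" for each FIELD, in column order. */
    static List<String> columns(String tableXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(tableXml)));
        // Descendant search: FIELDs nested in GROUPs are found like any other.
        NodeList fields = doc.getElementsByTagName("FIELD");
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            LinkedList<String> groups = new LinkedList<>();
            Node parent = field.getParentNode();
            // Climb through enclosing GROUP elements, outermost ends up first.
            while (parent instanceof Element
                    && ((Element) parent).getTagName().equals("GROUP")) {
                groups.addFirst(((Element) parent).getAttribute("name"));
                parent = parent.getParentNode();
            }
            String name = field.getAttribute("name");
            columns.add(groups.isEmpty()
                    ? name : name + " [" + String.join(" > ", groups) + "]");
        }
        return columns;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TABLE name='Nutation and Abberation'>"
                + "<FIELD name='Date'/>"
                + "<GROUP name='Nutation'>"
                + "<FIELD name='Nut_Long'/><FIELD name='Nut_Obl'/></GROUP>"
                + "<GROUP name='Abberation'>"
                + "<GROUP name='Equinox 1950'>"
                + "<FIELD name='Abber_C_1950'/><FIELD name='Abber_D_1950'/></GROUP>"
                + "<GROUP name='Equinox 1955'>"
                + "<FIELD name='Abber_C_1955'/><FIELD name='Abber_D_1955'/></GROUP>"
                + "</GROUP></TABLE>";
        for (String column : columns(xml)) {
            System.out.println(column);
        }
    }
}
```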
In the $NVOSS_HOME/java/dev/readvotable directory you'll find a sample program using the VOTWrap interface to parse a VOTable file. Below we show a slightly modified version of that same task to be used as a basis for discussion during the presentation.
package readvotable;

import edu.jhu.pha.ivoa.*;
import java.io.*;

class ReadVotable {
    public static void main(String[] args) throws Exception {
        // The VOTable file name is the only argument.
        if (args.length < 1) {
            System.err.println("Usage: ReadVotable <votable-file>");
            System.exit(1);
        }
        readVot(args[0]);
    }

    public static void readVot(String fname) throws Exception {
        // Close the file automatically once we are done with it.
        try (InputStream is = new FileInputStream(fname)) {
            VOTWrap.VOTable vot = VOTWrap.createVOTable(is);
            // Only the first TABLE of the first RESOURCE is read.
            VOTWrap.Resource res = vot.getResource(0);
            VOTWrap.Table tab = res.getTable(0);
            int fcount = tab.getFieldCount();
            int rcount = tab.getTableData().getTRCount();
            // Print the name and UCD of each column.
            for (int f = 0; f < fcount; f++) {
                VOTWrap.Field field = tab.getField(f);
                System.out.print(field.getName() + ":" + field.getUCD() + " ");
            }
            System.out.println();
            System.out.println("There are " + fcount + " fields on " + rcount + " rows:");
            // Print the cells of each row on one line.
            for (int r = 0; r < rcount; r++) {
                VOTWrap.TR row = tab.getTableData().getTR(r);
                for (int f = 0; f < fcount; f++) {
                    VOTWrap.TD td = row.getTD(f);
                    System.out.print(td.getPCDATA() + " ");
                }
                System.out.println();
            }
        }
    }
}
Beginning with the above program or the original code, modify the task to read a VOTable and output one or more HTML tables (i.e. only the data, or separate tables for the metadata and tabular data). Java's print() or println() methods can be used to write out the additional markup.
Hint: There is a strong analogy between the XML elements defining a
RESOURCE, its table column headers, and the data for each row, and the
representation of these in standard HTML.
XSLT stylesheets provide a powerful means of reading and converting XML files to some other format (typically text or HTML, but possibly also another XML document). Syntax for procedures, iterators and conditional expressions makes XSLT in many ways an actual programming language, and for our needs one ideally suited to handling VOTable documents. Stylesheets may be applied from within languages such as Java, or using command-line tools such as xsltproc.
Consider the XSLT file below that converts a VOTable to the corresponding HTML table as we described in the above exercise. Note the mix of HTML tags, the embedded XSLT elements and syntax (i.e. those beginning with the "xsl:" namespace), and in particular the VOTable elements to be processed.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/VOTABLE">
<html><body>
<xsl:for-each select="RESOURCE/TABLE">
<table border="1">
<tr> <xsl:for-each select="FIELD">
<td><b><xsl:value-of select="@name" /> </b></td>
</xsl:for-each> </tr>
<xsl:for-each select="DATA/TABLEDATA/TR">
<tr>
<xsl:for-each select="TD">
<td width="120"><xsl:value-of select="." /></td>
</xsl:for-each>
</tr>
</xsl:for-each>
</table>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The use of XSLT as a means of developing small utility tasks for use on
both the client and server side shouldn't be overlooked: Simply counting
the number of rows in a table, extracting a column of interest, looking for
an error return flag, or converting a table for web presentation are simple
things that can be done in XSLT and will generally be faster to run than the
equivalent task in a language such as Java due to the reduced startup overhead.
This makes it ideal for CGI server scripting or in the development of AJAX
web applications.
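As one such small utility, here is a hedged sketch of the row-counting case, applied from Java with the JDK's javax.xml.transform API. The stylesheet is a minimal one written for this illustration, and it assumes the VOTable elements are not in an XML namespace (as in the sample document earlier); XsltRowCountSketch and apply are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltRowCountSketch {
    /** Text-output stylesheet that prints the number of TR rows in the document. */
    static final String COUNT_ROWS_XSL =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='count(//TABLEDATA/TR)'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies a stylesheet (given as a string) to an XML document string. */
    static String apply(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<VOTABLE><RESOURCE><TABLE><FIELD name='RA'/>"
                + "<DATA><TABLEDATA>"
                + "<TR><TD>010.68</TD></TR><TR><TD>287.43</TD></TR>"
                + "</TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>";
        System.out.println("rows: " + apply(COUNT_ROWS_XSL, xml));
    }
}
```

The same apply method works with any stylesheet, so the VOTable-to-HTML stylesheet could be passed in place of COUNT_ROWS_XSL.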
The introduction to VO applications later in the School will mention a number of tools that make use of VOTables for doing image overlays (Aladin), plotting (VOPlot) or general interaction and visualization (TOPCAT). These tools are all included in the NVOSS software distribution and students are encouraged to experiment with them.
In developing your own applications or services you may or may not need to ever actually parse or create a VOTable yourself. In doing science with VO data and your own legacy code, it may be more convenient to simply convert or manipulate a VOTable into some form that is easier to use. An excellent set of tools for doing this is STILTS, a command-line package available from Starlink.
STILTS contains the following commands:
tcopy - Table format converter
tpipe - Generic table pipeline processing utility
votcopy - VOTable encoding translator
votlint - VOTable validity checker
Other tools are also available, such as a table concatenator and crossmatcher. More information can be found at http://www.star.bris.ac.uk/~mbt/stilts/sun256/cmdUsage.html and all commands are available with the version included in the NVOSS software. The distribution also contains the jar file with the programmatic interface to each of these procedures.
It isn't always possible to directly read a VOTable in the preferred user environment, either because the software meant to ingest the data can't easily be converted to parse XML, or because the user only needs a subset of the data in the table (e.g. just the RA/Dec). XML is ideally suited to conversion to other formats; however, the hierarchical nature of VOTable doesn't always map cleanly onto another format.
The NVOSS software distribution contains the STILTS utility that can be used to convert and process VOTables to other formats such as simple text output, CSV, or various flavors of FITS tables. It also contains an "encoding translator" that will convert one type of VOTable to another (e.g. one with binary encoding that might not be supported by your parser of choice to one with TABLEDATA). Other forms of processing of the table are also permitted, e.g. to select columns/rows in a table, create/delete columns based on an expression using existing columns, and even to output a table directly to a MySQL database.
As an example, to convert a VOTable to a FITS file (or to CSV, by giving the output file a .csv extension) one would use:
% stilts tcopy messier.xml messier.fits
The filename extensions are used as clues to the format; however, there are arguments for requesting specific formats (see http://www.star.bris.ac.uk/~mbt/stilts/sun256/outFormats.html). Note: in many of the conversions the UCD of a column is lost in the output format and only the 'name' attribute is used. Since the name is chosen by the data provider, this complicates the reading of tables from generalized data sources.
Likewise, selection within a table can be accomplished using commands such as
% stilts tpipe in=in.xml cmd='keepcols "POS_EQ_RA_MAIN POS_EQ_DEC_MAIN"' out=out.txt
% stilts tpipe in=survey.fits ofmt=ascii \
cmd='select "skyDistance(hmsToRadians(RA),dmsToRadians(DEC), \
hmsToRadians(2,28,11),dmsToRadians(-6,49,45)) < 5 * ARC_MINUTE"'
In the first example we simply select the RA and Dec columns of a table and
output them in a text format. The second looks a little more complicated
but may be handy in the Project phase of the NVOSS, as it effectively
implements a cone search around a given point within a table (in this
case, objects within 5 arcmin of 02:28:11, -06:49:45).
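For readers who want to apply the same cut in their own code, the test behind the select expression can be sketched in plain Java. The method names below mirror the functions used in the STILTS expression for readability, but this is an independent illustration based on the haversine formula, not the STILTS implementation; ConeFilterSketch and inCone are invented names.

```java
class ConeFilterSketch {
    /** One arcminute in radians. */
    static final double ARC_MINUTE = Math.toRadians(1.0 / 60.0);

    /** Hours, minutes, seconds of right ascension to radians (1 hour = 15 degrees). */
    static double hmsToRadians(int h, int m, double s) {
        return Math.toRadians((h + m / 60.0 + s / 3600.0) * 15.0);
    }

    /**
     * Degrees, arcminutes, arcseconds to radians. The sign is taken from the
     * degrees value, so declinations between 0 and -1 degree cannot be
     * expressed with this simplified int-based signature.
     */
    static double dmsToRadians(int d, int m, double s) {
        double sign = d < 0 ? -1.0 : 1.0;
        return Math.toRadians(sign * (Math.abs(d) + m / 60.0 + s / 3600.0));
    }

    /** Angular separation in radians between two positions (haversine formula). */
    static double skyDistance(double ra1, double dec1, double ra2, double dec2) {
        double sinHalfDec = Math.sin((dec2 - dec1) / 2.0);
        double sinHalfRa = Math.sin((ra2 - ra1) / 2.0);
        double a = sinHalfDec * sinHalfDec
                + Math.cos(dec1) * Math.cos(dec2) * sinHalfRa * sinHalfRa;
        // Clamp to 1 to guard against rounding just above the asin domain.
        return 2.0 * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True when (ra, dec), in radians, lies within 5 arcmin of 02:28:11, -06:49:45. */
    static boolean inCone(double ra, double dec) {
        double ra0 = hmsToRadians(2, 28, 11);
        double dec0 = dmsToRadians(-6, 49, 45);
        return skyDistance(ra, dec, ra0, dec0) < 5 * ARC_MINUTE;
    }

    public static void main(String[] args) {
        double ra0 = hmsToRadians(2, 28, 11);
        double dec0 = dmsToRadians(-6, 49, 45);
        System.out.println("3 arcmin north: " + inCone(ra0, dec0 + 3 * ARC_MINUTE));
        System.out.println("6 arcmin north: " + inCone(ra0, dec0 + 6 * ARC_MINUTE));
    }
}
```

The haversine form is used here because it stays numerically well behaved for the small separations typical of a cone search.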
| Name | Description | Link |
|---|---|---|
| Aladin | Interactive software sky atlas to visualize digitized images, to superimpose entries from astronomical catalogs | http://aladin.u-strasbg.fr/ |
| conVOT | For converting ASCII or FITS tables to VOTable format | http://vo.iucaa.ernet.in/~voi/conVOT.htm |
| Mirage | Data visualization tool and exploratory analysis | http://www.bell-labs.com/project/mirage/ |
| STILTS | Command-line VOTable format conversion and processing | http://www.star.bris.ac.uk/~mbt/stilts/ |
| TOPCAT | Table visualization and editing | http://www.starlink.ac.uk/topcat/ |
| Treeview | Viewer for hierarchical structures of astronomical data | http://www.starlink.ac.uk/treeview/ |
| VOPlot | Visualization tool for VOTable data | http://vo.iucaa.ernet.in/~voi/voplot.htm |
| VotFilter | An XML filter for OpenOffice Calc to read and write VOTable files | http://services.china-vo.org/vofilter/ |
| RVS | Server-side display of n-dimensional astronomical images | http://www.atnf.csiro.au/vo/rvs/ |