The National Virtual Observatory
2006 Summer School

Building a SkyNode Server

Tom McGlynn (NASA/GSFC)

Introduction.

This talk describes the issues involved in setting up a SkyNode server in the Virtual Observatory. SkyNode's allows outside users sophisticated access to data tables. Currently users will not normally interact directly with a SkyNode. SkyNodes are accessed through SkyPortals. SkyPortals can relay a request to the appropriate SkyNode for simple queries, and can break down a complex request that involves multiple nodes into a series of requests against these nodes.


This diagram shows how the SkyPortal may initiate a request to several SkyNodes. As part of the process of deciding how to handle the full query, current SkyPortals will send metadata and cost requests directly to each of the nodes to assess the best order for the query may be done.

SkyNodes are defined as Web Services, and their interface is defined through a WSDL file that gives the details of the specific services. Currently the WSDL describes SOAP as well CGI POST and GET interfaces to the SkyNode. These differ little in functionality: the differences occur in how the data is transported to the responding task and how the data is wrapped. We shall only consider the SOAP interfaces in the following discussion.

This presentation discusses the general issues of building a SkyNode. These include the needed pre-requisite capabilities, the functions a SkyNode needs to support, conversion of the query inputs to something the local database can use and handling of the special ADQL concepts. We then describe the FullSkyNode package included in the NVOSS2006 software distribution. This package enables users to rapidly set up SkyNodes if a few basic criteria are met.

SkyNode Prerequisites

There are really only two substantial prerequisites for building a SkyNode: a queryable dataset, and a Web server.

The data provider needs to have some data to provide! It is likely that the provider has a relational database system running, but this is not explicitly required. It is typically far easier to put data into one of the free relational databases, than to build the translations that would be required to support queries of a table in some native FITS or other file format. The underlying organization of the data is largely invisible to the outside world.

Current SkyNodes all include tables with positions on the sky. This is not required by the protocol and it is perfectly feasible to create a non-positional node (though perhaps calling it a SkyNodes would be a bit of a misnomer then). A SkyNode may support one or more tables, but it is assumed that one table is the primary positional table. Special positional constraints can be applied to this table, and positional matches with other SkyNodes are possible. For current SkyNodes this table is normally a table of object positions. It is perfectly possible to have a table of observations as the 'primary' table, but this is not currently done. Many SkyNodes have both object and observation tables included, but the object table has always been set as the primary table.

Since SkyNodes run as Web services, the provider must be able to modify the configuration of some Web server to run their SkyNode. The Web server need not be on the same machine as the database. Most databases are easily accessible via network links. A typical configuration may have the Web server on some publicly visible machine (perhaps in a DMZ on sites with more sophisticated security postures), and the database server behind a firewall, visible to the Web server, but not to the general public.

SkyNode Functionality

The current (v1.01) documentation for SkyNodes describes functionality at the requirements level. A more useful to developers, though far less friendly, description of the interface is given by the WSDL returned from SkyNodes. We treat the SkyNode services developed at JHU and STScI as reference nodes. These were the first SkyNodes available are are still likely the most commonly used. E.g., the WSDL returned from the GSC2.2 SkyNode can be be retrieved at the URL http://galex-dev.stsci.edu/GSC23Skynode/nodeb.asmx?wsdl. Even this is not sufficient to build a functioning SkyNode. The discussion later in this document and in the final section describes specific issues that must be addressed by the SkyNode developer that are not discussed in the documentation nor indicated in the WSDL.

The following metadata functions are to be supported by SkyNodes:

Return WSDL describing the node
As with most Web services this is done by appending with ?wsdl to the base URL of the Web Service.
Tables
Return metadata on all of the tables in the SkyNode. The metadata includes the name, description and number of rows in the table.
Table table_name
Return metadata on a specific table.
Columns table_name
Return metadata on all columns of a specified table. The metadata may include the name, description, type and UCD of the column.
Column table_name column_name
Return metadata on a specific column of specific table.
Formats
Return the formats supported by the SkyNode. SkyNodes are required to support VOTables, and the software we describe later uses only this format.
Functions
Return the metafunctions supported by the SkyNode. No use of this function is known.
FootPrint region_specification
Return the overlap between the input footprint and the region specified. If a node does not wish to implement this it can simply return the input region specification (implying all-sky coverage).
GetAvailability
Return a description of when the service is available. No use of this function is known.
In addition to these metadata queries, there are three more complex services.
PerformQuery adql_query format
Execute a query on the local machine and return the results in the requested format.
QueryCost adql_query
Estimate the cost of this query. Normally the cost is an estimate of the number of rows the query will return. (Executing the query is a perfectly valid approach). This method will be used by SkyPortals in estimating how to organize a query most efficiently.
ExecutePlan exec_plan
If appropriate relay the plan to a downline node and ingest the results. Run the query on the local machines and return the results.

A SkyNode that supports only the PerformQuery function is sometimes called a 'basic' SkyNode. A 'full' SkyNode supports all of three of the major services. The software described below supports full SkyNode capabilities. There is an implicit requirement on full SkyNodes that the a request can generate a temporary table in the database. If this is not possible, then a provider may wish to disable the ExecutePlan functionality. Most databases allow for temporary tables which are automatically deleted when the database session in which they were created terminates even for users who cannot create permanent tables.

Handling the metadata functions are relatively straightforward. A data provider may choose to put this in the code, read from configuration files, or query the database (especially for the column and table information). Some of the metadata functions are not currently used.

Handling the last three methods is more complex and we deal with the issues they raise in the following sections. PerformQuery and ExecutePlan run queries on the SkyNode and return results as VOTables or possibly in proprietary formats. The QueryCost function may use some heuristic in estimating the query cost, but the simplest implementation is to execute the query and count the results.

Handling an Execution Plan

The execution plan is a relatively simple structure. Basically it has a set of rows, one for each node that participates in the query. Each row has a node identifier, a URL for the node, and a query to be executed on this node. [There's a little bit more information for the execution plan as a whole, but we don't need to worry about this here.] The SkyPortal sends the execution plan to the URL specified in the first row. That SkyNode is expected to recognize that it is the current node by looking at the node identifier for the node, and see that the id in the execution plan matches the node id.

It is not the role of the SkyNode to do anything with the execution plan. That's fully the purview of the SkyPortal.

Once a node locates itself in the ExecutePlan, the node needs to check to see if there are subsequent nodes. If there are, then the execution plan is sent unaltered to the node specified in the next row. The syntax allows for fallback URLs if the primary URL fails, e.g., if a given SkyNode has multiple mirrors. It is unclear if this have ever been used.

The downline SkyNode should return a table as output. The current SkyNode is expected to ingest this table into the local database. To avoid tables cluttering the database it is best to use temporary tables which will be deleted automatically when the current session terminates.

The node then executes its own query. The local query will only refer to local tables, and to the one uploaded table.

Once the SkyNode is finished with the local query it simply prints out the results (in the agreed format) on the standard output. The upline node, or portal, will be waiting for them.

Handling the Query

The queries included in requests to the SkyNode are sent as an XML document essentially giving a fully parsed version of the ADQL query. Translating this query into something that the local database can handle requires a number of transformations.

Automated tools can fairly quickly regenerate an SQL-like ADQL query. The SkyNode implementation below uses an XSL transformation. The ADQL received by SkyNodes is somewhat simpler than that described in the ADQL documentation. SkyNodes should never see node designators in the ADQL. E.g., a user might enter

select * from SDSS:photoprimary a ...
in the SkyPortal, but when the request comes to the SDSS node, the query will look like
select * from photoprimary a ...
The portal has taken care of making sure that the query sent to each node involves only the tables it has access to.

Also, any XMATCH seen at a SkyNode will have exactly two tables involved. Multiple XMATCH's are broken up into a series of queries at different nodes.

The user may need to address the two ADQL special features: REGION and XMATCH. We discuss these below showing how these can be translated into more or less standard SQL. For very large databases more specialized and optimized solutions are likely to be desirable, but the approach here illustrates issues that any SkyNode needs to address.

Regions

A region may have been specified using a REGION constraint in the WHERE clause of the database. Only circular regions are currently used. The ADQL will have an element in the WHERE clause looking like: REGION('CIRCLE J2000 12.3 15.6 10') This indicates that we are only to search within 10 arcminutes of the J2000 position (RA,Dec)=(12.3,15.6) where the coordinate values are in decimal degrees.

Note that the REGION does not indicate the table or parameters that are involved in this selection. This is up to the SkyNode provider. (Hopefully this will be more explicit in future versions of SkyNodes.)

Assuming RA and Dec columns in decimal degrees this can be replaced using the Haversine formula for the distance between two points of a sphere.

d = 2 asin([sin( (d0-d1)/2 ) ]2 + cos(d0)cos(d1)[ sin( (a0-a1)/2 )]2 )
Rearrange this to be a little simpler:
sin(d/2) = sin( (d0-d1)/2 )2 + cos(d0)cos(d1)sin( (a0-a1)/2 )2
The Haversine formula is much better behaved at small separations than the classic Law of Cosines.

Following the example above sin(10'/2) = .0014..., cos(15.6) = .963... and there are 0.017... radians per degree so the constraint can be expressed in SQL as

( power(sin(0.017*(15.6-dec)/2),2) + 0.963*cos(0.017*dec)*power(sin( 0.017*(12.3-ra)/2), 2)) >= 0.0014
While this constraint is not all that complex (though it looks a lot nastier when the numbers are written to their full precision), it is also not very friendly for using any indices the table might have on RA and Dec. One may get better performance if the additional constraints are added:
ra between min_ra and max_ra and dec between min_dec and max_dec
where the min's and max's define a box that encloses the circle. This kind of simple constraint will be used by query optimizers if the are RA or Dec are keys or have indices in the table.

There are a couple of subtleties here. If the circle wraps over the 360 to 0 transition in RA, then the between on the RA should be replaced with a not between. E.g., if we want the data between RA=359 and 1 then the maximum is 359 and the minimum is 1 and we wish to exclude everything between 1 and 359.

It is fairly easy to compute the min and max Dec values, but getting the min and max RA is a little less obvious. Just assuming that the delta we wish to apply in rights ascension is just Dra=Ddec/cos(dec) turns out to be a bit naive. The maximum range in longitude for a circle of radius r drawn around a point at latitude b0 occurs at a latitude b such that sin(b) = sin(b0)/cos(r) This implies that the maximum offset in RA is

Dra = sqrt(cos(r)2 - sin(b0)2)/cos(b0)
Of course if either of the poles is included in the region then there is no constraint on the RA alone.

Some tables may represent coordinate positions with unit vectors in addition to or in place of standard coordinates.

Px = sin(ra)*cos(dec) Py = cos(ra)*cos(dec Pz = sin(dec)
With these columns the region specification is easier to translate into an SQL constraint. The dot product of two vectors is proportional to the magnitudes of the vectors times the cosine of the angle between them. We can get the unit vector to the center of our region C with components Cx,Cy,Cz. If have the unit vectors for each row in the database Px,Py,Pz then we can replace the Region constraint with
Px*Cx+Py*Cy+Pz*Cz < cos(r)
where r is the radius of the region. The magnitudes of the unit vectors are by 1 by definition.

Handling XMATCH

Handling XMATCH's requires more work, since the SkyPortal makes many implicit assumptions. What needs to be done depends upon where the current SkyNode is in the set of nodes. XMATCH's will only be found when we are processing an ExecutePlan since they always involve multiple nodes.
A brief review of XMATCH
The XMATCH algorithm is described in detail at the OpenSkyQuery portal site. Here we give a short, but perhaps more intuitive description. Consider a set of unit vectors on the unit sphere. Take the average -- in three space -- of those vectors. The average vector will always lie within the sphere. As the points are spread further over the sphere the average vector will get shorter and shorter.

The XMATCH algorithm takes advantage of this and accepts matches from multiple catalogs when the average of the points from all the catalogs being joined is longer than some criterion. The math in the reference above suggest an appropriate value for that criterion. When a catalog has very precise positions, it gets weighted highly in building the average. Lower precision catalogs are weighted less. The weight is essentially the inverse square of the positional accuracy expressed in radians. Since we only deal with averages, not two way joins, the algorithm is clearly symmetric with regard to any number of participating tables. Because of the way the variances add, it turns out to be possible to do the calculation as a series of two-way joins without losing this symmetry.

Adding in the XMATCH columns
An XMATCH is always between the current node and the results uploaded from the immediately downline node. Thus the bottom node never executes an XMATCH. However the bottom node and all nodes except the top node must return six special parameters as columns in the results. The columns must have the following names.
xmatch_a
The sum of the weights of the unit vectors so far
xmatch_ax, xmatch_ay, xmatch_az
The weighted sums of the unit vectors.
pk
An integer (normally intended to be distinct)
id
Another integer (normally intended to be distinct)
These parameters are usually not specified in the selection list, but they must be returned.

For the bottom node if we do not keep the positions as unit vectors, we need to compute the positions from the coordinates (as described in the Region specification) and multiply by the table weight. Note that the table weight is typically a large number, of order 1010 for a table with a positional accuracy of a few arcseconds. For intermediate nodes, we need to add the weighted values from the current node to the values we read from the previous node. (Note the exception in anticorrelations below).

Current SkyPortals do not normally request the unit vectors and weights from the top node, so these need not be supplied there.

Two integer fields are also needed. These PK and ID fields are used to identify rows in the cross-match at some nodes, but the values do not seem to be crucial.

The SkyPortal normally requests an XMATCH_CHISQ value from the top node. This is simply the difference between the magnitude of the vector represented by (xmatch_ax,xmatch_ay,xmatch_az) and the sum of the weights, xmatch_a (see below in Perform the XMATCH for how we might compute this).

Performing the XMATCH
For all nodes except the bottom node, there will be something like XMATCH(a,up) < 3.5 in the query. To replace this with standard SQL is a bit tedious. It is a bit easier if the table has unit vector columns, but basically we need to replace this bit of the ADQL with something like:
power(up.xmatch_a+weight) - power(up.xmatch_ax + weight*sin(a.ra)*cos(a.dec),2) - power(up.xmatch_ay + weight*cos(a.ra)*cos(a.dec),2) - power(up.xmatch_az + weight*sin(a.dec),2) < 3.5*3.5
Here we've left out the conversions from degrees to radian to keep the expression a little clearer. This is just the chi-square calculation.
Anticorrelations
One of the useful capabilities of an XMATCH is to do anticorrelations. When a SkyNode sees an anticorrelation it will always be the local table that will be correlated against, e.g., the node will see XMATCH(up,!a), never XMATCH(a, !up).

An anticorrelation is most conveniently done in two steps. First do the standard XMATCH, and then return all the uploaded results that were not matched in the first step. This can be done by saving the first result as a temporary table, or by using a subselect.

In the initial query we replace the entire selection list with a unique key for the uploaded table information. Presumably this could be the mysterious PK or ID fields, but it is probably best to generate a key as one uploads these data directly to be sure of uniqueness. So

select a,bunch,of,fields,... from localtable a, upload up where XMATCH(up,!a) < 3.5 AND other criteria
is replaced by the query
select _upload_key_ from localtable a, upload up where XMATCH(up,a) < 3.5 and other criteria

These results (i.e., the column of matched keys) can be stored in a temporary table (using syntax that depends upon the local database). Then we execute a second query

select a,bunch,of,fields from upload up where _upload_key not in (select _upload_key from upload)
If you don't like temporary tables you can put the entire first query in the subselect in the second.

Note that all the the remaining criteria in the WHERE clause are dropped from the second query. They should have already been applied. Also, there should be no columns that reference the local table in the selection list. It is not clear how such columns could be filled. Anticorrelations are filters that drop rows from the database but never add anything.

Note that this means that when we do an anticorrelation we do not (since we cannot) add information from the current table to the xmatch_a, xmatch_ax, ... columns. The values in the uploaded table are passed on unchanged if there is a higher level node.

The FullSkyNode Software

A Java-based SkyNode package is supplied in the Summer School software distribution in the $NVOSS_HOME/java/src/fullskynode directory. This package supports all of the major functions of the SkyNode: it can query the local database or handle an execution plan sent from another SkyNode. This section discusses how to adapt this software to easily bring up new SkyNodes.

This software implements all major functions for SkyNodes and is intended to replace all the FullSkyNode package released last year, and the BasicSkyNode package released in the first summer school. It does not use any Java environment outside of what is provided with the Java 1.5 JRE. Neither AXIS nor any other Web services framework is required. Java 1.5 has substantially reduced the time needed to start up applications and running as a CGI rather a servlet does not significantly slow this application.

The SkyNode specification makes a distinction between basic and full nodes. This package supports the full SkyNode implementation.

Additional Prerequisites

The software package requires the following to be able to work:

Organization of the software installation

The FullSkyNode package requires the installation of 5 files. We describe each in turn. The ANT script provided in the source directory is set to install a SkyNode that points to the MySQL database built elsewhere in the Summer School. For a production installation a provider may wish to modify this script to do the installation. The notes indicate how users might tweak the provided files for different circumstances. The Summer School Software package comes with Ant scripts that will compile and install the five files described above as a CGI script using the TOMCAT server also included in the package. Note that although this software is in Java it runs as a CGI command not a Java servlet. It does not need a TOMCAT server, we have just used that one because it is available. It can run with Apache or any Web server that can initiate CGI scripts.

The ANT script and the vo.settings file supplied with the distribution can be modified to adapt the default settings to a given SkyNode. A USAGE file indicates how a user can set up a SkyNode with the MySQL SPOCS database set up earlier in the Summer School. Note that the ANT install script will transform the vo.settings, skynode.wsdl and sn.pl found in the source directory as it installs them into the CGI directory. Fields of the form $XXXX$ are replaced. The XSL file is not currently edited by the ANT script.

Run Time Configuration

The Full SkyNode uses the fields of the SkyNode class to control its actions. Users can override the defaults at run time using a vo.settings files which is found in the same directory as the CGI script. It can be edited there, or they they can edit the vo.settings file found in the FullSkyNode source directory before the SkyNode is installed using ANT.

Each line of the settings file is either blank, a comment (begins with #) or overrides a single setting. A line looks like:

key=value[,value,...]
where key is the case-insensitive name of the field to be overriden, and value is the value to be overriden. If the field is an array, multiple values are be separated by commas. Commas are not allowed within the individual values. We use mixed case in giving the keywords to clarify their meaning, but the keys in the settings file are case insensitive. The case of the values may be significant.

Key settings include:

WSDLFile
The WSDL service description file. This is the path to the WSDL file from directory in which the CGI script is run. This file is only used when a request comes for the WSDL description (i.e, http://some.node/../sn.pl?wsdl). It is not used within the program at all, nor are any classes built from it.
XSLFile
The XSL file used to translate from the XMLified query to an ADQL string. Users can modify this file to adapt to the local SQL dialect, or they may prefer (as is done in the current software) to manipulate the ADQL string version.
MaxString
The Maximum length of a string. This is used when ingesting string values from a downline node. Strings will be truncated at this length.
AdqlURI
The URI used to identify the very of ADQL used. This may be needed in future versions if the ADQL is manipulated before being sent to downline nodes. It is not currently used.
VOTableURI
The URI used to identify the version of the VOTable used. This is used when the VOTable result is sent upline. Currently it indicates the version 1.1 of the VOTable standard is being used.
jdbcURL
The JDBC URL that identifies the database to connect to. The syntax of this URL varies among databases. Here are some of the variants (note that a given system may support more than one variant).
  • Oracle: jdbc:oracle:thin:@<SERVERNAME>:<PORT>:<DBNAME>
  • Sybase: jdbc:sybase:Tds:<SERVERNAME>:<PORT>/<DBNAME>
  • SQL Server: jdbc:microsoft:sqlserver://<SERVERNAME>:<PORT>
  • MySQL: jdbc:mysql://<SERVERNAME>:<PORT>/<DBNAME>
  • Postgres: jdbc:postgresql://<SERVERNAME>:<PORT>/<DBNAME>
JDBCDriver
The class name of the JDBC Driver class. This is specific to a given database.
  • SQL Server: com.microsoft.jdbc.sqlserver.SQLServerDriver
  • MySQL: org.gjt.mm.mysql.Driver
  • Oracle 8i: oracle.jdbc.driver.OracleDriver
  • Oracle 9i: oracle.jdbc.OracleDriver
  • Postgres: org.postgresql.Driver
  • Sybase com.sybase.jdbc2.jdbc.SybDriver
jdbcName
The account name for the database account used in queries. Note that this is the database account name, not the system account name.
jdbcPwd
The password for the account used in queries. Note that this is the password for the database account, not the system account.
TempTableName
The name used in creating temporary tables. Special strings may be needed. Some database systems use "create temporary table" to create a temporary table. Others use special names. The software tries the "create temporary table" syntax first and if that fails assumes that a name beginning with '#' is to be a temporary table.
TargetNode
The Identifier of the SkyNode. This is given as the ShortName in the registry entry for the SkyNode. It must be unique. Note that the SkyPortal will normally send this as uppercase.
raName
The name of the RA column in the database. This column is assumed to have the decimal J2000 (or ICRS) right ascension for the given row. This is the RA column that will be used in resolving REGION and XMATCH constraints.
decName
The name of the Dec column in the database. This column is assumed to have the decimal J2000 (or ICRS) declination of the table. This is the declination column that will be used in resolving REGION and XMATCH constraints.
TableName
An array of table names that are to be part of the SkyNode. If more than one table name is to be specified, separate them with commas.
TableDescription
An array of table descriptions that are to be part of the SkyNode If more than one table is to be specified, separate the values with commas. Note that a comma cannot be escaped to be part of the description.
TableSize
An array giving the size of the tables. This is often specified as -1 for each table.
PositUncert
The positional uncertainty of objects in the SkyNode. In arcseconds. This value is used in a XMATCH to allow simpler, index friendly, search criteria to be added to the XMATCH. It's value should be coupled with the TableWeight. Note that this is given in arcseconds, but the TableWeight is the inverse square of this in radians. 1 radian=200000 arc seconds.
TableWeight
The weight of the SkyNode in an XMATCH. This is equal to 1/PositUncert^2 when the PositUncert is expressed in radians. Increasing the TableWeight has the effect of decreasing the match radius.
LogFile
The name of a file in which details (and some debugging information) will be stored. By default this file "sn.log". If you are having problems with the SkyNode look at the log to see where it might be having difficulties. The code will run fine if the log file cannot be created, but you will have a more difficult time resolving problems.
Example vo.settings file
# This is a template for the vo.settings file used for the run-time # configuration of the SkyNode. The case of the key fields (to # the left of the =) does not matter, but the value fields to the # right may be case sensitive. # # Fields beginning and ending with $ will be updated by the build # script automatically as part of installation. # Other options may be set. See the fields of the SkyNode.java class. # The node by which this SkyNode will be known by portal services. # (should match the ShortName in the VO registry entry for the SkyNode) TargetNode=TSTHSRC3 # The names of the tables to be included in the SkyNode (comma separated) TableName=spocs, spocs_ext # The descriptions of the tables to be included in the SkyNode (comma separated) tabledesc=SPOCS Table, SPOCS Extension # The number of rows in the tables to be included in the SkyNode (comma separated) tableSize=-1,-1 # The JDBC URL that points to the SkyNode jdbcURL=jdbc:mysql://localhost/SPOCS # The directory portion of the SkyNode URL cgibase=/cgi-bin/vo/sn3/ # The directory in which the SkyNode CGI script is to reside cgihome=/skyview/htdocs/cgi-bin/vo/sn3/ # The XMATCH positional uncertainty in arc-seconds. posituncert=10 # The XMATCH table weight. This should be roughly the # inverse square of the positional uncertainty expressed in radians. # Note that 1" = 5e-6 radians. tableweight=1.e9 # The database name used to access the SkyNode. jdbcName=skynode # The database password used to access the SkyNode # It is recommended that this be changed here and in the adduser.sql file. jdbcPwd=nvo

Configuring Multiple SkyNodes

Given the constraints on how tables are used in SkyNodes, a data provider may wish to set up multiple nodes pointing to different tables in the database. The easiest way to accomplish this with the software provided is to create a separate directory in the CGI area for each SkyNode. The CGI script, settings and WSDL file will need to be customized for each node, but the XSL file and the JAR file can be shared by all of them. The settings files need to indicate which tables each node is to use and may have different table weights if the tables have different positional accuracies. The WSDL files need the appropriate URL.

Modifying the code.

The FullSkyNode software may be adapted to address specific issues in a given installation. The Java code is available and may be used and modified without any restrictions. Note that there are likely errors in the system and a few features have yet to be implemented. The Java classes used in the library are in the net.ivoa.skynode package with a couple of utilities from the net.ivoa.util package. The only other Java code needed not found in the system libraries in Java 1.5 is the JDBC JAR file for the database that is to be queried.

The local Java classes (in the net.ivoa.skynode package) are listed below.

SkyNode
This class is primarily definitions of constants used throughout the SkyNode software. There are also static methods that update the compiled settings with values from the vo.settings file.
FullNode
This is the main class for the SkyNode. It initiates the process of parsing and responding to a request. It handles the request for the SkyNode WSDL and provides a central logging facility.
SoapParser
This class parses the SOAP request and passes the elements to be processed by other class.
SoapWrapper
This class provides the appropriate SOAP wrappers for results.
DBQuery
This class handles the low level communications with the database using JDBC. It also handles database dependent features. Currently this includes handling temporary tables and how a query should be formatted to return only a limited number of rows.
Query
This class is responsible for translating the query from the input ADQL into SQL that the database can handle. It translates the REGION and XMATCH constraints.
MetaQuery
This class handles the metadata queries of the database.
Ingester
This class reads a VOTable from a downline node and ingests it into a table in the local database.
Encoder
This class writes a VOTable given the results of the current query.
DOMUtil
This class provides a few utility functions that for handling XML representations of data.
XSLTransform
This class invokes an XSL transformation on an XML document.
BreakError
This class is used to terminate SAX parsing of an XML document.
FootPrintQuery
This class responds to the FootPrint request. The current implementation simply returns the input region.
PlanExecuter
This class processes an Execution Plan.
QueryCost
This class returns the cost of a query -- currently just the number of rows in the query.

Some changes can be relatively trivial. E.g., if the Right Ascension is stored in decimal hours rather than decimal degrees, then a factor of 15 needs to be added several places in the code.

If the positions are stored in galactic coordinates, then the changes required are rather more substantial. While it's very easy to handle the REGIONS, since we can simply translated the J2000 input to Galactic coordinates, doing an XMATCH is rather more complex. Users may note that the database is perfectly happy doing the complex arithmetic transformations, so it would be perfectly feasible to write the SQL that transforms the L and B in the RA and Dec (or perhaps does this for the unit vectors). Alternatively the data provider may choose to add RA and Dec columns to the table.

If positions are already stored in unit vector fashion, then the handling of REGION and XMATCH can be substantially simplified.

The code uses XSL transformations in Java to transform from an XML to a string representation for the query. A user can update this XSL to more closely mimic the local database. E.g., the supplied code might translated something at select top 10 ... but this XSL file can be modified to translate that to the MySQL compatible syntax of select ... limit 10

Future Developments In SkyNodes

The SkyNode and ADQL standards are the subject of intense interest currently in the both the NVO and the IVOA generally. A number of suggestions have been made as to how to improve and generalize the SkyNodes. These include new ADQL functions (e.g., a function to make it easy to specify the angular distance between two points), sophisticated region handling, more complex query topologies than the simple linear topology of the current query plans, and changes to the way the query is encoded. These should greatly enhance the power of SkyNodes.

It is unlikely that there will be substantial changes before the end of the year, and we intend that should there be changes, new versions of the FullSkyNode code will be available to adapt to these changes with minimal effort on the data providers part. E.g., a SkyNode may need to update the JAR file used.