The National Virtual Observatory
This session is an introduction to the Open SkyQuery service.
Open SkyQuery is a VO service that allows you to query a set of distributed astronomical catalogs in a federated fashion, e.g. finding all objects in a particular region of the sky that have been detected by SDSS, 2MASS and GALEX, where the respective catalogs are standalone resources.
For a catalog to be usable by the Open SkyQuery service, it needs to have a SkyNode interface. For the catalog to be available to the service, the SkyNode has to be registered in the VO registry.
The SkyNode service supports the operations that the Open SkyQuery service requires: querying catalog metadata (e.g. column names) and formats, estimating query costs, executing specified queries, and returning the results in a requested format.
SkyNodes were explicitly designed to support distributed queries. When the Open SkyQuery service receives a query, all the relevant SkyNode services are first queried for the number of rows that meet the query region constraints. A query plan is then created so that the SkyNode with the fewest matching rows is executed first. It sends its results on to the next-smallest SkyNode, which performs the first crossmatch, and so on along the chain.
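The ordering step described above can be sketched in a few lines of Python. This is only an illustration: the function name `plan_query` and the row counts are made up for the example, and a real query plan carries more information than a list of names.

```python
from typing import Dict, List


def plan_query(row_estimates: Dict[str, int]) -> List[str]:
    """Order SkyNodes so the one with the fewest matching rows runs first."""
    # Sort by estimated row count; ties are broken by name for determinism.
    return sorted(row_estimates, key=lambda name: (row_estimates[name], name))


# Hypothetical row-count estimates returned by each SkyNode for the region.
estimates = {"SDSS": 12000, "TWOMASS": 3400, "FIRST": 150}
plan = plan_query(estimates)

# Each node passes its partial results to the next one in the list.
print(" -> ".join(plan))
```

Starting with the smallest result set keeps the amount of data shipped between SkyNodes as low as possible at each step.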
Open SkyQuery and SkyNodes are queried using ADQL (Astronomical Dataset Query Language), which is essentially a subset of SQL (SQL92) with additional support for spatial queries. A typical ADQL query looks like:
SELECT o.objid, o.ra, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE REGION('Circle J2000 181.3 -0.76 6')
AND XMATCH(o, t) < 2.5
AND o.type = 3
The REGION extension allows the specification of regions of the sky, in this case a circle of radius 6 arcminutes centred on RA = 181.3, Dec = -0.76. The syntax is controlled by the STC (Space-Time Coordinates) standard.
The XMATCH extension specifies a chi-squared likelihood that objects from the different catalogs are the same object, based on their positional coincidence. The algorithm employed to determine the likelihood is specified by the SkyNode specification as:
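To make the meaning of the circle concrete, here is a small Python sketch that parses a region string of the form used in the sample query and tests whether a position falls inside it. The helper names are hypothetical and the parser handles only the `Circle J2000 <ra> <dec> <radius>` form shown above, not the full STC syntax.

```python
import math


def parse_circle(region: str):
    """Parse 'Circle J2000 <ra> <dec> <radius>' into (ra_deg, dec_deg, radius_arcmin)."""
    shape, frame, ra, dec, radius = region.split()
    if shape.lower() != "circle" or frame.upper() != "J2000":
        raise ValueError("only 'Circle J2000' regions are handled in this sketch")
    return float(ra), float(dec), float(radius)


def separation_arcmin(ra1, dec1, ra2, dec2):
    """Angular separation in arcminutes via the haversine formula (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1 to guard against rounding just above the valid asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 60


def in_region(region: str, ra: float, dec: float) -> bool:
    """True if (ra, dec) lies within the circle described by the region string."""
    ra0, dec0, radius = parse_circle(region)
    return separation_arcmin(ra0, dec0, ra, dec) <= radius


region = "Circle J2000 181.3 -0.76 6"
print(in_region(region, 181.35, -0.76))  # about 3 arcminutes from the centre
print(in_region(region, 181.50, -0.76))  # about 12 arcminutes from the centre
```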
If the measurement errors on the positions are Gaussian then χ² represents the inverse statistical variance of (x, y, z). The XMATCH function returns 1/χ after χ² has been minimized for the most likely positions. Thus the constraint in the sample query above means that a combination of detections will be rejected if its standard deviation is more than 2.5.
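The exact formula from the SkyNode specification is not reproduced here, so the following Python sketch rests on an assumption: that χ² is the sum over detections of the squared distance between each detection's unit vector (x, y, z) and a common best position, weighted by 1/σ² with σ the positional error in radians. Under that assumption the best position is the normalized weighted mean of the unit vectors. The function names and the sample errors are illustrative only.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a Cartesian unit vector (x, y, z)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def min_chi(detections):
    """detections: list of (ra_deg, dec_deg, sigma_arcsec).

    Returns the square root of the minimized chi-squared (assumed form, see text).
    """
    weighted = []
    ax = ay = az = 0.0
    for ra, dec, sigma_arcsec in detections:
        sigma = math.radians(sigma_arcsec / 3600.0)
        w = 1.0 / sigma ** 2                      # weight = inverse variance
        x, y, z = unit_vector(ra, dec)
        weighted.append((w, x, y, z))
        ax, ay, az = ax + w * x, ay + w * y, az + w * z
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    bx, by, bz = ax / norm, ay / norm, az / norm  # most likely common position
    chi2 = sum(w * ((x - bx) ** 2 + (y - by) ** 2 + (z - bz) ** 2)
               for w, x, y, z in weighted)
    return math.sqrt(chi2)


def is_match(detections, threshold=2.5):
    """Accept the combination if it lies within `threshold` standard deviations."""
    return min_chi(detections) < threshold


# Two detections 1 arcsecond apart with errors of 0.3 and 0.4 arcseconds.
pair = [(181.3, -0.76, 0.3), (181.3, -0.76 + 1.0 / 3600.0, 0.4)]
print(round(min_chi(pair), 3), is_match(pair))
```

For two detections this reduces to the separation divided by the quadrature sum of the two errors, here 1 / 0.5 = 2, so the pair passes a 2.5 threshold.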
We'll start our tour of Open SkyQuery with a look at the Simple Query Form, which we reach by clicking on Simple Query in the menu bar.
This is designed to support simple queries against a single catalog and crossmatches involving up to three catalogs. So let's imagine that we are interested in finding objects that have been detected in the optical and the radio but not in the near-infrared. Looking at the form, you see a list of the available catalogs (SkyNodes) on the left. They are colour-coded according to wavelength regime, and clicking on the little (i) icon to the right of each item will bring up a pop-up box giving more information about that particular catalog. Clicking on the name of a catalog will bring up a list of the tables in that catalog, and clicking on a table will then list the fields within it.
For our purposes, SDSS, FIRST and TWOMASS are probably a good bet for the catalogs that we might want to use so we select these and hit Next.
The next form allows us to generate our query. We uncheck the TWOMASS catalog as we are interested in non-detections from that catalog. If we wanted we could also change the value of sigma to get more or less stringent crossmatching but 3.5 sigma is a reasonable value so we'll leave it as is. Since this is intended to be exploratory science, we can leave the query region as is but let's increase the search radius to 30 arcmins to ensure that we get a decent number of objects.
We could also place some constraints on preselected photometric parameters from each catalog but we won't here. We click on Generate Query and get the ADQL syntax corresponding to our query showing both the XMATCH and Region extension functions. We're happy with this so we can go ahead and click on Submit Query. If we were not and wanted to change it in any way then we would need to click on Update Query before submitting it.
We get a page back showing the results of our query which we can save in the format that is most useful to us.
Now that we've familiarised ourselves with the basics, let's have a look at the Advanced Query Form which we reach by clicking on Advanced Query in the menu bar.
This allows us to build our own queries from scratch or edit existing ones. The list on the right gives nine sample queries that we can adapt for our own ends. Clicking on a query will bring up its ADQL representation in the main window which we can then edit by clicking on the Edit tab.
Let's select the Brown Dwarf Search as our example and change the region it's going to search over. We click on Submit when we have finished editing.
When we submit a query, the right panel changes to show its status. Each SkyNode that is participating in the query is listed and has a different colour associated with it, depending on its current status, e.g. pink if the SkyNode is waiting for results from another SkyNode downstream in the query execution plan and blue if it has completed its part of the overall query. When the full query has completed, we can view the results and save them in the format of our choice or see them plotted in the browser via a VOPlot applet.
We can also build queries by using the + next to each SkyNode in the left panel. This puts a piece of ADQL syntax into the query box and if the Build tab is selected, each component of the query is clickable, bringing up appropriate controls for it, e.g. clicking on SELECT brings up a control to add extra columns to the query. The Region button at the bottom of the panel will add a Region specification to the query and this can then be clicked on to change the syntax to how you want it.
You're not limited to using existing catalogs for crossmatching purposes, as Open SkyQuery allows you to upload your own data. Clicking on the Import tab on the menu bar brings up the XMatch Table Import Form.
You can import data stored locally in a file or cut and paste directly into the text window. You need to specify the format of the data and also of the object positions. Once the data has been imported, you will see an additional MyData SkyNode in the left-hand panel, under which will be a table containing the data you provided with the name you specified. This SkyNode can then be used in a query like any other SkyNode. Note, however, that MyData does not support Region clauses: if a Region clause is used in a query containing MyData then it will be applied to the other SkyNodes in the query but MyData will ignore it.
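If your own positions are stored in sexagesimal form, it can be convenient to convert them to decimal degrees before importing. The following Python sketch shows one way to do this. It is not part of Open SkyQuery, the function name is hypothetical, and it assumes colon-separated values such as 12:05:12.0.

```python
def sexagesimal_to_degrees(value: str, hours: bool = False) -> float:
    """Convert 'DD:MM:SS' (or 'HH:MM:SS' if hours=True) to decimal degrees."""
    text = value.strip()
    # Take the sign from the string so that values like '-00:45:36' stay negative.
    sign = -1.0 if text.startswith("-") else 1.0
    parts = [float(p) for p in text.lstrip("+-").split(":")]
    while len(parts) < 3:
        parts.append(0.0)          # allow 'DD:MM' or plain 'DD'
    first, minutes, seconds = parts
    degrees = first + minutes / 60.0 + seconds / 3600.0
    if hours:
        degrees *= 15.0            # 1 hour of right ascension = 15 degrees
    return sign * degrees


ra = sexagesimal_to_degrees("12:05:12.0", hours=True)
dec = sexagesimal_to_degrees("-00:45:36")
print(round(ra, 6), round(dec, 6))
```

These two values correspond to the centre of the circle used in the sample query earlier (RA = 181.3, Dec = -0.76).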
Finally we'll mention that Open SkyQuery has a SOAP-based web services interface and so can be used programmatically from a client. More information can be found here and in the NVO Book.
There are a number of current limitations on Open SkyQuery that it is worth being aware of:
Queries are limited to a maximum of 5000 rows (see here for more details). As the order in which records are returned from catalogs can change from query to query, it is advisable to specify the region and/or other constraints such that the number of objects returned is within this limit.
It is only possible to specify one Region per query.
Only files smaller than 100Mb in size can be uploaded.
Here is some recommended further reading:
On Open SkyQuery: Chapter 13 of the NVO Book and the Open SkyQuery website
On SkyNodes: Chapters 53 and 54 of the NVO Book
On ADQL: Chapter 36 of the NVO Book
The NVO Summer School is made possible through the support of the National Science Foundation.