2008 Summer School

DAL Clients: Scripting Data Access with Python

The exercises presented here are companions to the presentation Data Access Layer Clients (Python) (PDF, PPT). An extensive discussion of the Data Access Layer can be found in Section 5 of the NVO Book.

Part 1. Simple Cone Search

Additional Reading: See Chapters 45 and 46 of the NVO Book for more information on accessing Cone Search services.

Exercise 0. Find some Cone Search services.

We're going to start by finding some Cone Search services interactively via the Registry Portal. The simple keyword search box works well for this if we include "ConeSearch" as one of the keywords. For example, try entering "ConeSearch nearby galaxies" and clicking the Search button.

Resources that are marked "Catalog" under the "categories" column support the Cone Search interface. What we need to access these ourselves (i.e. not through the portal) is the accessURL. This column is not shown by default; to see it, go to the bottom of the results page where the full set of columns is listed. Click on the "accessURL" name (and wait for the page to update). Then return to the top of the table and scroll to the far right to see the accessURLs.

Notice also the value of the "capabilityClass" column. This column is how we matched against our "ConeSearch" input keyword.

The other way to see the access URL is to click on the "Full Record" link at the left of the desired resource. When you see the record, click on the "[+]" next to the "Simple Cone Search" section near the bottom of the page. The base URL shown under "Available endpoints for the standard interface" is the access URL. We will select, copy, and paste this URL to use it in our next exercise.

Exercise 1: Your First Cone Search Client

  1. With your mouse, select and copy the access URL, then paste it into the Location box at the top of your browser.
    Never done this before? Here's what you do:
    • Click or double-click on the URL in the Location box at the top of your browser.
    • Hit the backspace key to empty the box.
    • Click and drag your mouse over all of the characters that make up the access URL.
    • Right-click and select "Copy" from the pop-up menu.
    • In the Location box back at the top, right-click and select "Paste".
    The base URL will appear.
  2. Type in the search parameters at the end of the access URL. Here's a cone roughly on NGC 4258: "RA=184.5&DEC=47.2&SR=0.25"
    Note: A DAL access URL must end with either a "?" or an "&" character before you add query parameters. If yours doesn't, append a "?" when the URL does not already contain a "?" somewhere, or an "&" when it does.
  3. Hit Return! Do you eventually see a VOTable?
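
The "?" and "&" rule from the note in step 2 can also be applied in code. The following is a minimal sketch in Python 3 that uses only the standard library. The helper name append_query and the example.org URLs are placeholders invented for illustration; they are not part of VOlib or of any real service.

```python
from urllib.parse import urlencode

def append_query(access_url, params):
    """Append query parameters to a DAL access URL (hypothetical helper).

    params is a list of (name, value) pairs so that their order is kept.
    """
    if access_url.endswith("?") or access_url.endswith("&"):
        sep = ""     # the URL is already ready for more parameters
    elif "?" in access_url:
        sep = "&"    # a query string has been started, so continue it
    else:
        sep = "?"    # no query string yet, so start one
    return access_url + sep + urlencode(params)

# The cone from step 2, roughly on NGC 4258
cone = [("RA", 184.5), ("DEC", 47.2), ("SR", 0.25)]

# Placeholder base URLs covering the three cases in the note
print(append_query("http://example.org/cone", cone))
print(append_query("http://example.org/cone?CAT=fp_psc", cone))
print(append_query("http://example.org/cone?CAT=fp_psc&", cone))
```

All three calls produce a URL whose query string ends in RA=184.5&DEC=47.2&SR=0.25, which is the form you typed by hand in step 2.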

Exercise 2: Accessing a Cone Search Service from Python using VOlib

The VOlib Python library is a simple, easy-to-use set of modules for accessing "simple" services and parsing the output. These modules are located in the python/src/volib subdirectory of the software distribution; however, once you have loaded the environment, this library is available automatically regardless of your current directory. Here's some of what's in the library:

VOTable -- provides both a generic XML parser (VOTable.VOXML) and a VOTable parser (VOTable.VOTable)
Sesame -- queries the CDS object name resolver
ConeSearch -- queries a Cone Search service
SIAP -- queries a Simple Image Access service
SSAP -- queries a Simple Spectral Access service

An example of using the generic XML parser can be found in dalclients-xml.py. To run it, first save it to disk. Then pass it a VOTable on the command line by typing...
python dalclients-xml.py votable_filename
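
To see what a generic XML parse of a VOTable involves, here is a minimal sketch in Python 3 that uses the standard library's xml.etree.ElementTree instead of VOlib. It is not a reproduction of dalclients-xml.py, which is not listed here. The helper names and the tiny sample table are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up VOTable used only to exercise the parser below
SAMPLE = """<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="id" ucd="ID_MAIN" datatype="char" arraysize="*"/>
      <FIELD name="ra" ucd="POS_EQ_RA_MAIN" datatype="double" unit="deg"/>
      <FIELD name="dec" ucd="POS_EQ_DEC_MAIN" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>src-1</TD><TD>184.51</TD><TD>47.21</TD></TR>
        <TR><TD>src-2</TD><TD>184.48</TD><TD>47.18</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

def local(tag):
    """Strip an XML namespace such as '{http://...}FIELD' down to 'FIELD'."""
    return tag.rsplit("}", 1)[-1]

def read_votable(text):
    """Return (fields, rows): FIELD attribute dicts and lists of TD strings."""
    root = ET.fromstring(text)
    fields, rows = [], []
    for el in root.iter():
        name = local(el.tag)
        if name == "FIELD":
            fields.append(dict(el.attrib))
        elif name == "TR":
            rows.append([td.text for td in el if local(td.tag) == "TD"])
    return fields, rows

fields, rows = read_votable(SAMPLE)
for f in fields:
    print(f.get("name"), f.get("ucd"), f.get("unit"))
for row in rows:
    print(row)
```

Real services often declare an XML namespace on the VOTABLE element, which is why the sketch compares local tag names rather than full tags.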

We will now use the ConeSearch module to query a cone search service. The dalclients-cs.py script queries the 2MASS point source catalog and looks like this:

import VOTable
import ConeSearch
cs = ConeSearch.ConeSearch("http://irsa.ipac.caltech.edu/cgi-bin/Oasis/CatSearch/nph-catsearch?CAT=fp_psc&")

try:
    vot = cs.getVOTable(RA=184.5, DEC=47.2, SR=0.25)
    desc = vot.getDescription()
    print desc

    fields = vot.getFields()
    print "Table Columns:"
    print "Name         \tUCD             \tunit"
    print "-------------\t----------------\t-----"
    for f in fields:
        print f.getAttribute("name"),"      \t",f.getAttribute('ucd'),'   \t',f.getAttribute('unit')
    print
    
    # getColumnIdx() will find the column where a string matches the name, 
    #  ID, UCD, or UType.  
    id = vot.getColumnIdx('ID_MAIN')
    ra = vot.getColumnIdx('POS_EQ_RA_MAIN')
    dec = vot.getColumnIdx('POS_EQ_DEC_MAIN')

    # print the name of the column and the position of the column in a row
    print "Id (%d)      \tRA (%d)   \tDEC (%d)" % (id, ra, dec)
    print "------       \t----------\t--------"

    # now print each row
    for row in vot.getDataRows():
        data = vot.getData(row)
        print "%s\t%s\t%s" % (data[id], data[ra], data[dec])
    
except Exception, e:
    print "Sorry, there was a failure: " + str(e)

Now try a few things yourself:

  1. Run the script yourself
  2. Change the service access URL to a different service
  3. One reason to use Python to query the service is that you might want to do some analysis on the data. Try adding a column to the printed table that computes, say, the J-K color (the difference between the J and K magnitudes).
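
One possible approach to the last item is sketched below in Python 3. It works on plain lists of strings, like those returned by vot.getData(row) in the script above, and assumes you have already found the indices of the J and K magnitude columns with vot.getColumnIdx(). Which column name or UCD to look up depends on the service and is not shown here. The function name add_color and the sample values are invented for illustration.

```python
def add_color(rows, j_idx, k_idx):
    """Return new rows with a J-K color appended as the last column.

    Cells that are empty or not numeric yield None instead of a color.
    """
    out = []
    for row in rows:
        try:
            color = round(float(row[j_idx]) - float(row[k_idx]), 3)
        except (TypeError, ValueError):
            color = None   # a magnitude is missing for this source
        out.append(list(row) + [color])
    return out

# Made-up rows: identifier, J magnitude, K magnitude
rows = [
    ["src-1", "12.50", "11.90"],
    ["src-2", "14.10", ""],       # no K measurement
]

print("Id\tJ\tK\tJ-K")
for rec in add_color(rows, 1, 2):
    print("%s\t%s\t%s\t%s" % tuple(rec))
```

Catalogs do not always report every magnitude for every source, so the sketch records None rather than stopping at the first empty cell.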

Part 2. Simple Image Access

Exercise 1: Finding and Testing SIA Services

Try searching for Simple Image Access services in the registry, this time using "SimpleImageAccess" as the keyword that selects these services. Then try querying one of them from your browser. Here's an example:

http://adil.ncsa.uiuc.edu/cgi-bin/voimquery?survey=f&POS=184.5,47.2&SIZE=0.5
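
To make the structure of this query explicit, the sketch below (Python 3, standard library only) takes the example URL apart. In a Simple Image Access query, POS holds the right ascension and declination of the region center in decimal degrees, separated by a comma, and SIZE is the size of the region in degrees. The survey parameter appears to be specific to this particular service; that reading is an assumption.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://adil.ncsa.uiuc.edu/cgi-bin/voimquery?survey=f&POS=184.5,47.2&SIZE=0.5"

parts = urlsplit(url)
query = parse_qs(parts.query)   # maps each parameter name to a list of values

# POS is "RA,DEC"; SIZE is a single number here
ra, dec = [float(v) for v in query["POS"][0].split(",")]
size = float(query["SIZE"][0])

# Everything before the "?" identifies the service itself
base = parts.scheme + "://" + parts.netloc + parts.path

print("service:", base)
print("center: RA=%s DEC=%s" % (ra, dec))
print("size:", size)
print("other parameters:", [k for k in query if k not in ("POS", "SIZE")])
```

No network access is needed to run this; it only parses the text of the URL.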

Exercise 2: Accessing a Simple Image Access Service from Python

We can use the SIAP module to access images from an archive. Try running dalclients-sia.py, which looks like this:

import SIAP
import urllib

sia = SIAP.SIAP("http://archive.stsci.edu/siap/search.php?id=galex_atlas&")

try:
    vot = sia.getVOTable(RA=184.5, DEC=47.2, SIZE=0.5)
    desc = vot.getDescription()
    print desc

    # getColumnIdx() will find the column where a string matches the name, 
    #  ID, UCD, or UType.  
    id = vot.getColumnIdx('VOX:Image_Title')
    ra = vot.getColumnIdx('POS_EQ_RA_MAIN')
    dec = vot.getColumnIdx('POS_EQ_DEC_MAIN')
    fmt = vot.getColumnIdx('VOX:Image_Format')
    url = vot.getColumnIdx('VOX:Image_AccessReference')

    # print the name of the column and the position of the column in a row
    print "Id (%d)      \tRA (%d)   \tDEC (%d)\tFormat (%d)\tURL (%d)" % \
        (id, ra, dec, fmt, url)
    print "------       \t----------\t--------\t-----------\t-------------"

    # now print each row
    for row in vot.getDataRows():
        data = vot.getData(row)
        print "%s\t%s\t%s\t%s\t%s" % (data[id], data[ra], data[dec], data[fmt], data[url])

    # fetch an image
    data = vot.getData(vot.getDataRows()[0])

    # open in binary mode so the image bytes are written unchanged
    out = open(data[id], 'wb')
    try:
        imagestream = urllib.urlopen(data[url])
        out.write(imagestream.read())
    finally:
        out.close()
    
except Exception, e:
    print "Sorry, there was a failure: " + str(e)

Notice that after printing the table, we fetched the first image using the urllib module.

Now try a few things yourself:

  1. Run the script yourself
  2. Change the service access URL to a different service
  3. Change the query to look for images in computer graphics formats by adding a "FORMAT='GRAPHIC'" parameter.

Consider these questions:

  1. How would we download all FITS images returned in the table?
  2. Does the script need to be made more robust for your chosen service?
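
As one possible starting point for both questions, the sketch below (Python 3) selects the rows whose format is image/fits and builds a safe file name for each, rather than using the image title directly as the script above does. It works on plain row lists such as those returned by vot.getData(row). All function names and the naming scheme are hypothetical, and the actual transfer is passed in as a function so the logic can be tried without contacting a service.

```python
import re

def fits_rows(rows, fmt_idx):
    """Keep only rows whose format cell is image/fits (case-insensitive)."""
    return [r for r in rows
            if (r[fmt_idx] or "").strip().lower() == "image/fits"]

def safe_filename(title, number):
    """Build a file name from an image title (hypothetical naming scheme)."""
    # Replace anything that is not a letter, digit, dot, dash or underscore
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", title or "").strip("_") or "image"
    return "%s_%03d.fits" % (stem, number)

def download_all(rows, title_idx, fmt_idx, url_idx, fetch):
    """Fetch every FITS row; fetch(url, filename) does the actual transfer."""
    saved = []
    for n, row in enumerate(fits_rows(rows, fmt_idx), 1):
        name = safe_filename(row[title_idx], n)
        try:
            fetch(row[url_idx], name)
            saved.append(name)
        except IOError as err:
            # one failed image should not stop the remaining downloads
            print("skipping %s: %s" % (row[url_idx], err))
    return saved

# Dry run with made-up rows and a stand-in for the real transfer
demo = [
    ["NGC 4258 / NUV", "image/fits", "http://example.org/a"],
    ["NGC 4258 preview", "image/jpeg", "http://example.org/b"],
]
print(download_all(demo, 0, 1, 2, lambda url, name: None))
```

For a real download, urllib.request.urlretrieve accepts a URL and a file name in that order, so it could be passed as the fetch argument.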

Part 3. Simple Spectral Access

How could you adapt the dalclients-sia.py script to use the SSAP module and fetch spectra?

 


The NVO Summer School is made possible through the support of the National Science Foundation.