National Virtual Observatory
VIM: Visual Integration and Mining

What is VIM?

VIM is a tool for researching multiple sky positions simultaneously. Each source becomes a row in a table, with catalog, image cutouts, and spectral information. This information is drawn from all the published surveys of the astronomical literature.

 

VIM assumes that an astronomer wants information about a specific set of positions (points in the sky). The two types of information are catalogs and image surveys. For catalogs, proximity searches are performed against archived catalogs, finding which of the input positions lie near enough to a catalog member that the two might be the same physical object. For images, cutouts can be generated from the major surveys. Most of the world's quantitative astronomical data is available to VIM, as with any Virtual Observatory application, including primary catalogs (SDSS, 2MASS, NED, etc.) and also the long tail: the holdings of data centers such as NASA, CDS, and ESO, which have almost any published catalog.

 

Sky positions can be input in two ways: in the simple format "RA, Dec, ID" for each position, or as a VOTable file. Such files can be created and downloaded at places such as HEASARC or VizieR, and you can read a step-by-step guide to this. You can also make a VOTable from text with the NVO Table Wizard. When a set of positions is uploaded, it is allocated persistent storage (a "workbench") for itself and for the tables of catalog matches that will share that space. The workbench can have a password attached, which is then needed to write-enable the workbench.
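
As an illustration of the simple "RA, Dec, ID" input format, the sketch below parses such lines into records. It is not VIM's own parser: the names Position and parse_positions are hypothetical, and it assumes that RA and Dec are given in decimal degrees, which the text above does not state.

```python
from typing import NamedTuple


class Position(NamedTuple):
    ra_deg: float   # right ascension, decimal degrees (assumed unit)
    dec_deg: float  # declination, decimal degrees (assumed unit)
    ident: str      # free-form source identifier


def parse_positions(text: str) -> list[Position]:
    """Parse lines of the form 'RA, Dec, ID', skipping blank lines."""
    positions = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split on the first two commas only, so the ID may contain commas.
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'RA, Dec, ID', got {line!r}")
        ra, dec = float(parts[0]), float(parts[1])
        if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
            raise ValueError(f"line {lineno}: position out of range")
        positions.append(Position(ra, dec, parts[2]))
    return positions


sample = """
10.6847, 41.2690, M31
83.8221, -5.3911, M42
"""
positions = parse_positions(sample)
```

Rejecting out-of-range coordinates at upload time is a design choice of this sketch; it keeps a malformed line from silently producing empty proximity matches later.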

 

For any table in the VIM workbench, the user can select which part of it they see. For each table there is a data-dictionary display, showing the column names, descriptions, etc., and allowing the user to select which columns should be visible. The rows of the position table are split into pages for display purposes. Thus very wide and very long tables can be stored on the server, while only a small view of them is sent to the client.
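
The idea of sending only a small view of a wide, long table can be sketched as a column selection combined with a page slice. This is a minimal illustration, not VIM's code; the function name table_view, the list-of-dicts table layout, and the page size are all assumptions.

```python
def table_view(rows, visible_columns, page, page_size=25):
    """Return one page of `rows`, restricted to the chosen columns.

    rows: list of dicts, one per source; page: zero-based page index.
    """
    start = page * page_size
    return [
        {col: row.get(col) for col in visible_columns}
        for row in rows[start:start + page_size]
    ]


# A 100-row table with four columns; the client asks for two columns of page 2.
rows = [
    {"id": f"src{i}", "ra": i * 0.1, "dec": -i * 0.1, "mag": 15 + i % 5}
    for i in range(100)
]
view = table_view(rows, ["id", "mag"], page=2, page_size=10)
```

Only the ten requested rows and two requested columns leave the server, regardless of how large the stored table is.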

 

Data is added to the system through proximity searches, cutout images, or spectra. The user can select from a cache of primary catalogs, or visit the VO registry system to find and select other catalogs, which can then be added to the cache. The cache can be viewed with Catalogs/View Catalog Cache. The user can initiate a proximity search with any catalog, finding all catalog entries within a specified distance of each of the input positions. These can then be merged with the position catalog. Rows to keep can be selected with arithmetic predicates, and new columns can be created from arithmetic expressions.
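
The sequence described above (proximity search, merge, row selection by predicate, new column by expression) can be sketched in a few lines. This is an illustration only, under stated assumptions: tables are lists of dicts, coordinates are in decimal degrees, the matching is a brute-force loop rather than whatever indexed method VIM uses, and all function and column names are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def proximity_search(positions, catalog, radius_deg):
    """Merge each input position with every catalog entry within radius_deg."""
    merged = []
    for pos in positions:
        for entry in catalog:
            sep = angular_separation_deg(pos["ra"], pos["dec"],
                                         entry["ra"], entry["dec"])
            if sep <= radius_deg:
                # Prefix catalog columns so they cannot clash with input columns.
                cat_cols = {f"cat_{k}": v for k, v in entry.items()}
                merged.append({**pos, **cat_cols, "sep_deg": sep})
    return merged


positions = [{"id": "A", "ra": 10.0, "dec": 20.0},
             {"id": "B", "ra": 150.0, "dec": -30.0}]
catalog = [{"name": "c1", "ra": 10.0005, "dec": 20.0005, "g": 18.2, "r": 17.6},
           {"name": "c2", "ra": 200.0, "dec": 5.0, "g": 16.0, "r": 15.9}]

merged = proximity_search(positions, catalog, radius_deg=5.0 / 3600.0)  # 5 arcsec

# New column from an arithmetic expression, then rows kept by a predicate.
for row in merged:
    row["g_minus_r"] = row["cat_g"] - row["cat_r"]
kept = [row for row in merged if row["g_minus_r"] > 0.5]
```

The nested loop costs one separation per position and catalog entry, which is acceptable for a handful of sources but not for large catalogs; a real service would use a spatial index.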

 

In this way, by integrating with published data, the user can identify the nature of the input positions, finding the outliers, clusters, and correlations. All this can be done with a web-based system that scales to thousands of positions. As you work with Vim, a transcript of what you have done is created (top right of the display). This transcript is a script: if you have terminal access to a machine with Vim installed, for example your laptop, it can be used to rerun the data fetching and computing, or it can be edited to do a larger task.

Why Choose Vim?

Vim illustrates several new ideas that will extend and generalize, not just within the virtual astronomical observatory (VAO), but more generally to any of the other 'VxO' infrastructures that are growing up now to fully and richly virtualize all scientific data.

 

A key concept of the VxO is that data can be 'published, found, and bound'; meaning that there is a single, global library of data resources; anyone can build and publish data to that library; there are rich search tools for resources in that library; and the machines can automatically connect and utilize the found resources. Early in the NVO project, federation became important, leading to DataScope and other tools that can find out 'everything' about an astronomical source. After seven years, the global registry is well-established and has thousands of resources registered.

 

Another dimension of the VO concept is scalability, the recognition that modern astronomy is not only about studying single objects, but about populations of objects, where the number can range from a handful to a billion. Again, early in the NVO project, this was exploited with the OpenSkyQuery tool, where complex joint queries can be run against astronomical databases that may be large and geographically separated. A sophisticated architecture was built (Skynode), and some success was reported. However, it is technically difficult for a user to create these queries, there have been difficulties with international acceptance of the required protocol, and there are awkward steps when the number of sources is scaled up.

 

VIM: Multiple sources, Multiple catalogs

 

Vim is a software architecture and toolbox, developed at Caltech, that allows users to investigate specified sources in multiple catalogs, with the idea that both the number of sources and the number of catalogs can be scaled up, and scaled not just in terms of the computer, but in terms of the human experience. The sources are specified by a set of identified sky positions, and the catalogs can be any of the data services registered with the global registry. Here, we use the word 'catalog' generally, implying not only source catalogs, but also image and spectral surveys, observing logs, light curve archives, etc. Vim gives users persistent storage, called a 'workbench', which stores the original source table, then mashes up information about the sources from VO resources, so that the source table gets rich annotation. Data is fetched from the VO through several protocols, so that all resources can be used through slow protocols (e.g. Cone Search), yet a resource with a fast protocol (e.g. Skynode, TAP, SDSS) can be fully exploited. Vim offers a short list of 'primary' catalogs (e.g. SDSS, NED), together with access to the NVO registry for keyword and other semantic searching, as well as to the NVO Inventory service to find catalogs that are spatially correlated.
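
To illustrate why Cone Search counts as a slow protocol, the sketch below builds one request URL per input position. It uses the RA, DEC, and SR query parameters (decimal degrees) of the IVOA Simple Cone Search standard; the base URL is a placeholder, the function name is hypothetical, and nothing here is taken from VIM's own code.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not a real service.
BASE_URL = "https://example.org/scs"


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search request URL for one sky position."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return f"{base_url}{sep}{query}"


# One HTTP request is needed per position, so N positions cost N round trips.
sources = [(10.6847, 41.269), (83.8221, -5.3911)]
urls = [cone_search_url(BASE_URL, ra, dec, 0.01) for ra, dec in sources]
```

A bulk protocol, by contrast, can accept the whole position table in a single query, which is why the text distinguishes the two cases.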

 

Scaling: the click and script paradigm

 

Some users want to compare a handful of sources; some want to work with thousands or millions. Vim is built to enable users to *start* by playing with a handful of sources: browsing the different catalogs and image surveys, understanding what the data means by reading rich metadata and the literature, visualizing, building a strategy for bad and missing data, and so on. All this is done with a web browser that shows a view of a long, wide table, together with a set of tools that can operate on the table and fetch data. As the user works, a script is automatically built, each command corresponding to mouse clicks in the browser; running that script against the Vim API library will then reproduce exactly what happened in the interactive session. However, the real utility of the script is that the user can edit it, removing dead ends, and extending and exploiting promising paths. Huge tables can be handled by the web interface, since it shows only a subset of the full data; huge tables can also be handled by the underlying infrastructure, which is based on bulk access to data and on a codebase that can handle tables with a billion rows (Stilts). For information on scripting, see Scripting With VIM.
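
The click and script idea can be sketched as a session object that records every command it runs, so that the same steps can be replayed on a larger table. This is a toy model, not the Vim API: the class Session, the function replay, the two example commands, and the textual script syntax are all hypothetical.

```python
class Session:
    """Runs named commands on a table and records a replayable transcript."""

    def __init__(self, commands):
        self.commands = commands   # name -> function(table, **kwargs) -> table
        self.transcript = []       # (name, kwargs) pairs in execution order

    def run(self, table, name, **kwargs):
        self.transcript.append((name, kwargs))
        return self.commands[name](table, **kwargs)

    def script(self):
        """Render the transcript as editable text, one call per line."""
        lines = []
        for name, kwargs in self.transcript:
            args = ", ".join(f"{k}={v!r}" for k, v in kwargs.items())
            lines.append(f"{name}({args})")
        return "\n".join(lines)


def replay(commands, transcript, table):
    """Apply recorded commands, in order, to a new table."""
    for name, kwargs in transcript:
        table = commands[name](table, **kwargs)
    return table


def keep_brighter_than(table, column, limit):
    return [row for row in table if row[column] < limit]


def add_difference(table, new, a, b):
    return [{**row, new: row[a] - row[b]} for row in table]


commands = {"keep_brighter_than": keep_brighter_than,
            "add_difference": add_difference}

# Interactive phase: a handful of sources.
small = [{"g": 18, "r": 17}, {"g": 21, "r": 20}]
session = Session(commands)
t = session.run(small, "keep_brighter_than", column="g", limit=20)
t = session.run(t, "add_difference", new="g_r", a="g", b="r")

# Scripted phase: the same steps on a much larger table.
large = small * 1000
result = replay(commands, session.transcript, large)
```

Because the transcript is plain data, a dead-end step can be deleted or a parameter changed before replaying, which mirrors the editing workflow described above.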

 

Cloudable: from laptop to server

 

Vim can run as a service, with workbenches on a remote server, or it can run on a desktop or laptop, with data on the local disk. The server-based version allows users to assess and play with Vim before the commitment of a download and install; it means that powerful machines with big data and fast pipes can be brought to bear on big datasets; it also means that URL links to data can be sent to colleagues. The laptop version, however, can build workbenches without fear of purging by server administrators; it can exploit the scripting capability of Vim without the need to get an account at a computer center; it can run data fetching for hours or days without fear of timeouts. A local installation also means that programmer-users can build new tool components for the Vim toolbox, or modify the existing ones. For information on installing, see the VIM Installation Page.

 

Sharing made simple

 

Each Vim workbench has a unique URL that contains a 32-digit random string, the 'bench ID', and so workbenches are secure, because a valid URL cannot be invented (security through obscurity). However, the workbench URL can be put in an email or blog, and thereby shared very simply; others just click on the URL and the whole visualization and toolbox appears in the browser. There is also a read-only mode for any Vim workbench that can only be unlocked with a password; this means that the creator of a workbench can, if they wish, share its content in a read-only mode, retaining the right to change the content.
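
A sketch of the sharing scheme follows: an unguessable bench ID in the URL for reading, plus an optional password for writing. It assumes the 32 digits are hexadecimal and uses Python's secrets module and a salted PBKDF2 hash; how Vim actually generates IDs or stores passwords is not stated in the text, and the URL prefix and class name are placeholders.

```python
import hashlib
import hmac
import secrets


def new_bench_id() -> str:
    """32 random hex digits (128 bits) from the operating system's random source."""
    return secrets.token_hex(16)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Workbench:
    def __init__(self, password=None):
        self.bench_id = new_bench_id()
        self._salt = secrets.token_bytes(16)
        self._hash = hash_password(password, self._salt) if password else None

    def url(self) -> str:
        # Placeholder host and path; anyone holding this URL can read the bench.
        return f"https://example.org/vim/bench/{self.bench_id}"

    def can_write(self, password=None) -> bool:
        """Writing needs the password if one was set; otherwise it is open."""
        if self._hash is None:
            return True
        if password is None:
            return False
        candidate = hash_password(password, self._salt)
        return hmac.compare_digest(self._hash, candidate)


bench = Workbench(password="s3cret")
shared_link = bench.url()
```

With 128 random bits, guessing a valid ID is impractical, but anyone who sees the link can read the workbench, so the password is what protects the content from modification.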

 

Architecture: tools in a toolbox

 

Vim is built as a toolbox of tools. A tool, in this context, is a method that modifies the data in the workbench; it can be called by either a web form or a function call, and its code is separate from other tools, communicating only through the workbench. A tool consists of: a way to generate its web form control; a way to read the values that the user put in that form; a way to generate the script equivalent of the invocation; and 'business logic' that takes those values and computes on the workbench. A Vim application is a combination of these tools with the web-display technology; Vim uses the Yahoo User Interface to generate sophisticated menus, grids, and table displays. The astronomy-based toolbox described above is not the only one; the Vim download also includes a tiny reference implementation of a toolbox that can only add and multiply numbers. In this way programmers can use the display and scripting capabilities of Vim, cleanly separated from the tables of stars that are the business of the astronomy toolbox.
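
The four parts of a tool listed above can be sketched as an abstract interface with one small implementation, loosely modeled on the add-and-multiply reference toolbox. This is not the code shipped with Vim: the class names, method names, the dict used as a workbench, and the script syntax are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Tool(ABC):
    """The four parts of a tool: form, form reader, script line, business logic."""

    @abstractmethod
    def form_html(self) -> str:
        """Generate the web form control."""

    @abstractmethod
    def read_form(self, form: dict) -> dict:
        """Convert submitted form fields into typed values."""

    @abstractmethod
    def to_script(self, values: dict) -> str:
        """Generate the script line equivalent to this invocation."""

    @abstractmethod
    def apply(self, workbench: dict, values: dict) -> None:
        """Business logic: compute on the workbench."""


class AddTool(Tool):
    def form_html(self):
        return ('<form><input name="x"><input name="y">'
                '<input type="submit" value="Add"></form>')

    def read_form(self, form):
        return {"x": float(form["x"]), "y": float(form["y"])}

    def to_script(self, values):
        return f"add(x={values['x']!r}, y={values['y']!r})"

    def apply(self, workbench, values):
        workbench["result"] = values["x"] + values["y"]


# Tools communicate only through the workbench, here a plain dict.
workbench = {}
tool = AddTool()
values = tool.read_form({"x": "2", "y": "3"})   # as if submitted from the form
tool.apply(workbench, values)
script_line = tool.to_script(values)
```

Keeping the form, the script line, and the logic in one class means that a web click and a scripted call go through the same apply method, which is what makes the recorded script reproduce the interactive session.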

 

 



Developed with the support of the National Science Foundation
under Cooperative Agreement AST0122449 with the Johns Hopkins University
The NVO is a member of the International Virtual Observatory Alliance

This NVO Application is hosted by Caltech
