NESSSI Developers Guide

Joseph C. Jacob, Conrad Steenberg, Matthew Graham, Roy Williams

California Institute of Technology

Introduction

NESSSI stands for the NVO (National Virtual Observatory) Extensible, Scalable, Secure Service Infrastructure. The infrastructure is extensible because a developer can deploy into a NESSSI server a service that takes full advantage of national-scale supercomputing infrastructure and yet integrates well with the NVO, in the following senses:

The NESSSI application server is built from the Clarens secure service container, which was developed by the High-Energy Physics community for distributed analysis of LHC data. The container enables secure, asynchronous services to run on a Grid cluster, such as one of the TeraGrid clusters. Three roles are involved: the Client, who requests a service; the Developer, who writes and deploys the service; and Root, the system administrator of the installation.

The purpose of this document is to provide all the information a service developer needs to deploy a Clarens service, along with information for the system administrator of the Clarens installation. First we discuss how each of the three roles sees the service.

From the Client's point of view

The client is responsible for creating a request for service, together with an X.509 certificate (or other credentials; see the description of graduated security below), wrapping both in a secure SOAP envelope, and delivering this over HTTPS to the Nesssi server. A web server could be connected to a PURSE/MyProxy installation so that a human fills in forms on the web server, which in turn generates the request and certificate and acts as the client in its interactions with the Nesssi server. The client could also be run from the command line on a machine that has a certificate stored on it and uses the open-source Clarens client library.

From the Developer's point of view

The developer has written code that is driven by a string argument and produces some output files. The developer is able to restart the Nesssi server, which is required whenever the code changes. The developer also has various environment variables defined to allow interaction with the Nesssi server (see below).

From Root's point of view

The Nesssi server, also called Clarens, runs on an Apache web server installation and acts both as a secure web server (HTTPS) and as an open web server (HTTP). Requests and certificates arrive via the secure channel, and the nature of the certificate determines both the user account that runs the request and the amount of CPU time that can be devoted to it. There is a "sandbox" area on the file system where the service creates output files, and this area is exposed through the open web server. A developer who wishes to install a service needs a symbolic link from the Clarens root directory to the directory containing the code, and also needs the ability to restart the server. There is also a "gridmap" file listing certificate DNs and their corresponding usernames: if a request arrives with a DN listed there, it is executed under the corresponding username. Developers should be interviewed to make sure they understand how to write a service that checks its inputs carefully, to prevent buffer-overflow attacks and other CGI-style vulnerabilities.

What is a NESSSI service?

A service consists of a Python file called __init__.py in which a number of functions are defined, followed by a dictionary (methods_list in the sample service below) that associates an exposed name with each function. A client uses that name to invoke the function. A service may also be a directory tree of Python modules, each called __init__.py and each containing named functions.
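The relationship between a dotted client call and the directory tree can be pictured with the following sketch. It assumes, hypothetically, that every component of the dotted name except the last is a subdirectory below $CLARENS and that the last component is a key in that directory's methods_list; resolve_method and the path /opt/clarens are illustrative only and are not part of Clarens or Nesssi.

```python
import os
import re

# A dotted-name component must look like a Python identifier; this also
# rules out "..", "/" and other path tricks in client-supplied names.
_COMPONENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def resolve_method(clarens_root, dotted_name):
    """Map a dotted client call such as 'apple.xray.juliet.init' to the
    __init__.py expected to define it and the methods_list key to look up.

    Hypothetical sketch of the naming convention, not Clarens code.
    """
    parts = dotted_name.split(".")
    if len(parts) < 2:
        raise ValueError("expected <service>[.<subservice>...].<method>")
    for part in parts:
        if not _COMPONENT.fullmatch(part):
            raise ValueError("invalid name component: %r" % (part,))
    # Every component except the last names a directory in the service tree.
    module_file = os.path.join(clarens_root, *parts[:-1], "__init__.py")
    # The last component is the exposed name registered in methods_list.
    return module_file, parts[-1]


if __name__ == "__main__":
    path, key = resolve_method("/opt/clarens", "apple.xray.juliet.init")
    print(path)  # /opt/clarens/apple/xray/juliet/__init__.py
    print(key)   # init
```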

Here is an example:

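  import ClarensDpe as Nesssi   # Nesssi client library

  connection = Nesssi.client('https://tg-www.cacr.caltech.edu:8443/clarens/', debug=0)
  # Start a job; the string is passed unchanged to the service
  session_id = connection.apple.xray.juliet.init("-arg1 54 -arg2 23")
  print("Your session ID is %s." % session_id)
  # Ask the service for the status of that job
  message = connection.apple.xray.juliet.monitor(session_id)
  print(message)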

In the code above, the init function is called with a string argument, which the developer has chosen to write as a set of flag-value pairs (-arg1 54, -arg2 23). The return from this job initiation is a "sessionID", a 32-digit hex number. The client uses the sessionID to monitor the status of the job and to retrieve output. On the server side, the sessionID is associated with a "sandbox" area of the file system, where diagnostics and output can be written for retrieval by the client as web content.
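On the server side the service receives that string unchanged, so parsing it is the developer's job. A minimal sketch of parsing the flag-value convention used above follows; parse_flag_args is a hypothetical helper, not part of the Nesssi API, and it relies on Python's standard shlex module so that quoted values containing spaces survive.

```python
import shlex


def parse_flag_args(request):
    """Parse a request string such as "-arg1 54 -arg2 23" into a dict.

    Hypothetical helper: the flag-value convention is the service
    developer's own choice, not something Nesssi enforces.
    """
    tokens = shlex.split(request)  # raises ValueError on unbalanced quotes
    if len(tokens) % 2 != 0:
        raise ValueError("flags and values must come in pairs")
    params = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("-") or len(flag) < 2:
            raise ValueError("expected a flag such as -arg1, got %r" % (flag,))
        params[flag[1:]] = value
    return params


if __name__ == "__main__":
    print(parse_flag_args("-arg1 54 -arg2 23"))  # {'arg1': '54', 'arg2': '23'}
```

The values stay strings; converting and range-checking them is a separate step, as discussed under Developer Responsibilities.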

Developer Responsibilities

Once the service code is in the developer's space, the developer should ask the administrator of the Nesssi installation (Root) to make a symbolic link to that directory. The developer must restart the Nesssi server for it to see the code, and again whenever the service code changes. Because developers do not normally have root privileges, there is a special mechanism for this: a system administrator with Root privileges will make you a member of the Nesssi Service Developer Group, a Unix group that all developers join, and members of that group can restart the Nesssi server with sudo clarens-restart.

When the code runs, it sees the client's certificate, which is used to decide (a) the account under which the service runs, and (b) whether the request is too large for that type of certificate. If the certificate DN has a match in the gridmap file of the host cluster, the service runs as the corresponding account. Your service can also look for particular strings in the DN to decide whether to run as a community account or some other type of account. Note that only the system administrator with Root privileges can add or remove entries in the gridmap file.
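Nesssi itself performs the DN-to-account mapping; the sketch below only illustrates the decision described in this paragraph. It assumes the usual Globus grid-mapfile layout (a quoted DN followed by a local account name), and the function names, the community_marker string, and the community account name are hypothetical.

```python
import shlex


def load_gridmap(lines):
    """Build a {DN: local_user} dict from grid-mapfile lines such as
        "/C=US/O=Example/CN=Jane Doe" jdoe
    Blank lines and lines starting with '#' are ignored."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) >= 2:
            # Several accounts may be listed, comma-separated; take the first.
            mapping[fields[0]] = fields[1].split(",")[0]
    return mapping


def choose_account(dn, gridmap, community_marker, community_account):
    """Return the account to run as, or None to refuse the request."""
    if dn in gridmap:
        return gridmap[dn]            # an explicit gridmap entry wins
    if community_marker in dn:
        return community_account      # e.g. any DN from a trusted organization
    return None


if __name__ == "__main__":
    gridmap = load_gridmap([
        "# DN    account",
        '"/C=US/O=Example Grid/CN=Jane Doe" jdoe',
    ])
    print(choose_account("/C=US/O=Example Grid/CN=Jane Doe", gridmap,
                         "/O=Example VO/", "voguest"))   # jdoe
    print(choose_account("/C=US/O=Example VO/CN=Sam Roe", gridmap,
                         "/O=Example VO/", "voguest"))   # voguest
```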

When Nesssi initializes the service, it creates a 'sessionID' (a 32-digit hex number) together with a directory of the same name in a temporary area. This directory is the so-called 'sandbox'; it is owned by the user account that runs the service, and files created in it follow that user's umask.
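Nesssi creates the session ID and sandbox itself, and this guide does not say how the ID is generated. The following is only a sketch of the idea, assuming a sandbox root such as $CLARENS_SANDBOXES; it uses Python's secrets module (Python 3.6 or later) to produce an unguessable 32-hex-digit ID, and create_sandbox is a hypothetical name.

```python
import os
import secrets
import tempfile


def create_sandbox(sandbox_root):
    """Create a random 32-hex-digit session ID and a sandbox directory of
    the same name under sandbox_root.  Hypothetical sketch only."""
    session_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex digits
    sandbox = os.path.join(sandbox_root, session_id)
    # Permissions are filtered by the process umask; makedirs fails loudly
    # if a directory with this ID already exists.
    os.makedirs(sandbox)
    return session_id, sandbox


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:   # stands in for $CLARENS_SANDBOXES
        session_id, sandbox = create_sandbox(root)
        print(session_id, os.path.isdir(sandbox))
```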

The service receives the request as a string argument to the function. This string should be carefully parsed and checked for buffer-overflow exploits and other attacks; deploying a service is like installing code in a cgi-bin directory. The service may run Unix commands through the usual os.system() and os.popen() calls and can, for example, launch jobs into the cluster's batch queues. All of these run under the account decided above. This initialization service returns the sessionID and a URL that points into the sandbox area, so that the client can examine its contents.
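As a sketch of the careful checking this paragraph calls for, the helpers below accept only short plain decimal numbers and quote every value with the standard shlex.quote before it reaches a shell. The helper names are hypothetical; the sketch uses subprocess from the Python standard library in place of os.system() and os.popen(). Whitelisting what is allowed is generally safer than trying to list known attacks.

```python
import re
import shlex
import subprocess

# Plain decimal numbers only, e.g. "54", "-3", "2.5".
_NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def checked_number(value, max_len=32):
    """Return value as text if it is a short plain number, else raise."""
    text = str(value)
    if len(text) > max_len or not _NUMBER.fullmatch(text):
        raise ValueError("rejected argument: %r" % (text,))
    return text


def build_command(program, *values):
    """Validate each user-supplied value, then quote it for the shell."""
    args = [shlex.quote(checked_number(v)) for v in values]
    return " ".join([shlex.quote(program)] + args)


if __name__ == "__main__":
    cmd = build_command("echo", "54", "23")
    print(cmd)  # echo 54 23
    print(subprocess.check_output(cmd, shell=True).decode().strip())  # 54 23
```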

Subsequent service requests can quote the sessionID to monitor the contents of the sandbox. Each file in the sandbox has a URL and can therefore be retrieved. The contents of the sandbox are available to anyone who has the URL; however, the URL contains the 32-character session ID, and since session IDs are not published, it would be very difficult for anyone to find and read Nesssi output without being given the session ID.
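The sketch below lists the files in a sandbox with a URL for each. The URL layout <base_url>/<session_id>/<file> and the host example.org are assumptions for illustration, since this guide does not specify how the open web server maps sandboxes to URLs; sandbox_file_urls is a hypothetical name. The constant records the arithmetic behind the "very difficult to guess" argument: 32 hex digits give 16^32 = 2^128 possible IDs, provided the IDs are generated randomly.

```python
import os
import tempfile
from urllib.parse import quote

# 32 hex digits allow 16**32 == 2**128 distinct session IDs, so a randomly
# generated, unpublished ID is impractical to guess.
SESSION_ID_SPACE = 16 ** 32


def sandbox_file_urls(base_url, session_id, sandbox):
    """Map each regular file in the sandbox to its URL, assuming
    (hypothetically) that files are served at <base_url>/<session_id>/<name>."""
    urls = {}
    for name in sorted(os.listdir(sandbox)):
        if os.path.isfile(os.path.join(sandbox, name)):
            urls[name] = "/".join([base_url.rstrip("/"), session_id, quote(name)])
    return urls


if __name__ == "__main__":
    session_id = "0123456789abcdef0123456789abcdef"   # made-up example ID
    with tempfile.TemporaryDirectory() as sandbox:
        with open(os.path.join(sandbox, "job output.txt"), "w") as f:
            f.write("sleepyadd 3 + 5 = 8\n")
        urls = sandbox_file_urls("http://example.org/sandboxes", session_id, sandbox)
        print(urls["job output.txt"])
        # http://example.org/sandboxes/0123456789abcdef0123456789abcdef/job%20output.txt
```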

When developers want a service to be deployed, they need a symbolic link from the Nesssi main directory to their code, and they need to be made members of the Developers group so that they can restart Nesssi during the edit/test cycle of development. At this stage, developers should state their intentions: what services will be offered, what types of certificates will be accepted, which parameter in the service controls the resources used, and what the policies are with respect to that parameter. Developers should also make the code that reads and parses the request arguments available for audit.

Nesssi services are authenticated with graduated security, which means that the amount of authentication required matches the size of the service request, no more and no less. For instance, very small requests may be made anonymously, while medium-sized requests require a weak certificate issued by an organization other than a grid computing organization. An example of a weak certificate is one issued by the National Virtual Observatory, or a Hot Grid certificate. Large service requests require a strong certificate issued by an official grid computing organization, such as an X.509 certificate issued by the TeraGrid or the DOE. A service request may be processed under a different user name depending on the authentication level of the requestor (anonymous, weak certificate, or strong certificate).
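The graduated-security rule can be sketched as a table of CPU-time ceilings per authentication level. The three levels come from the text above; the numeric limits and function names are hypothetical, because the actual thresholds are a policy decision for each installation and service.

```python
ANONYMOUS, WEAK, STRONG = "anonymous", "weak", "strong"

# Hypothetical CPU-time ceilings (seconds) per authentication level; the real
# numbers are a policy decision for each installation and service.
CPU_LIMITS = {ANONYMOUS: 60, WEAK: 3600, STRONG: 86400}


def authorize(auth_level, cpu_seconds, limits=CPU_LIMITS):
    """True if a request needing cpu_seconds may run at auth_level."""
    if auth_level not in limits:
        raise ValueError("unknown authentication level: %r" % (auth_level,))
    return cpu_seconds <= limits[auth_level]


def required_level(cpu_seconds, limits=CPU_LIMITS):
    """Weakest credential that covers the request ("no more and no less"),
    or None if the request exceeds every limit."""
    for level in (ANONYMOUS, WEAK, STRONG):
        if cpu_seconds <= limits[level]:
            return level
    return None


if __name__ == "__main__":
    print(required_level(30))       # anonymous
    print(required_level(600))      # weak
    print(authorize(WEAK, 7200))    # False
```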

About the Server

The Nesssi service container is built from the Clarens secure web server. Clarens is an open source web services framework that supports a number of authentication mechanisms, including X.509, grid proxy, and Clarens proxy certificates.  A client for a Nesssi service may be a web browser or a stand-alone program in C, C++, Java, or Python. More information on Clarens may be found at http://clarens.sourceforge.net.

Clarens is implemented as a mod_python extension to the Apache web server. It provides certificate-based (proxy) access control, virtual organization management, and an API for service developers. If a user of a service has an X.509 certificate, the certificate's distinguished name (DN) is mapped to a local user on the service host, and the job is then executed as that local user.

We assume that Nesssi has been installed and is running, which means that a port is open for the HTTPS protocol, usually port 8443. Client programs can connect to the Nesssi service container at:

https://<machine>:8443/clarens/

There is also an install directory for Nesssi, and we abbreviate this with the variable $CLARENS.

Important file and directory locations

Some important files and directories for Nesssi services include the following:

$CLARENS: the install directory, containing the local configuration and links to developer code.
$CLARENS_SANDBOXES: the directory where sandboxes are created and maintained.
$CLARENS_ERROR: the Clarens error log, a good place to look when debugging service code.
$CLARENS_RESTART: the location of the executable for restarting the server.
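This guide writes these locations as shell-style variables. Assuming they are actually exported in the developer's environment, a small helper can read them and show the end of the error log while debugging. The helper names nesssi_paths and tail are hypothetical.

```python
import os
from collections import deque

NESSSI_VARS = ("CLARENS", "CLARENS_SANDBOXES", "CLARENS_ERROR", "CLARENS_RESTART")


def nesssi_paths(environ=None):
    """Return {variable: value} for the Nesssi locations, raising if any
    of them is not defined in the environment."""
    environ = os.environ if environ is None else environ
    missing = [name for name in NESSSI_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("not defined: " + ", ".join(missing))
    return {name: environ[name] for name in NESSSI_VARS}


def tail(path, n=20):
    """Return the last n lines of a text file, such as the error log."""
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    try:
        paths = nesssi_paths()
    except RuntimeError as exc:
        print(exc)
    else:
        print("".join(tail(paths["CLARENS_ERROR"])))
```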

Service installation

Nesssi services are written in Python and are linked as $CLARENS/<ServiceName>, where <ServiceName> is the name of the service. Usually <ServiceName> is a symbolic link that points to an external service source directory, so that any system user can retain ownership of the source code. Two files are required in this subdirectory: the service code itself, in a file named __init__.py, and an access control file named .clarens_access. A sample is available as part of the Nesssi distribution.
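Before asking for a restart, a developer can check that the linked service directory contains both required files. The sketch below is a hypothetical convenience, not part of Nesssi; the demonstration builds a throwaway layout in a temporary directory, with the symbolic link standing in for the one Root would create.

```python
import os
import tempfile

REQUIRED_FILES = ("__init__.py", ".clarens_access")


def check_service_dir(clarens_root, service_name):
    """Return a list of problems with $CLARENS/<service_name>; an empty list
    means the directory (or the directory its symlink points to) exists and
    contains both required files."""
    service_dir = os.path.join(clarens_root, service_name)
    if not os.path.isdir(service_dir):     # isdir() follows symbolic links
        return ["%s is not a directory or a link to one" % service_dir]
    return ["missing %s" % name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(service_dir, name))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "src")        # developer's own directory
        clarens = os.path.join(tmp, "clarens")   # stands in for $CLARENS
        os.makedirs(source)
        os.makedirs(clarens)
        open(os.path.join(source, "__init__.py"), "w").close()
        # Root would create this link, e.g. ln -s <source> $CLARENS/sleepyServices
        os.symlink(source, os.path.join(clarens, "sleepyServices"))
        print(check_service_dir(clarens, "sleepyServices"))  # ['missing .clarens_access']
```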

The service code module, __init__.py

As mentioned above, the service code module should be written in Python and reside in $CLARENS/<ServiceName>/__init__.py, where <ServiceName> is the name of the service. This file should import the nesssi.py module and should contain a function that implements the service, plus a dictionary that maps the service name exposed to client code to this function. The function must take 3 arguments, all of which are assigned automatically by Nesssi when it accepts the client-side call and makes the server-side call to this function. The arguments are:

req: a Nesssi request object
method_name: the name of the function being called
args: the user-supplied arguments

The service implementation must define the CPU time required by the job, which is used in the graduated security policy (see the API description for run_job() below).  Finally, the service implementation must launch the job with a call to run_job().

Service developers API

The nesssi.py module provides an API that service developers may use to build their services.  The useful methods and fields provided are:

user_name = getSystemUser(req, pbs_time)
Given a Nesssi request object and the estimated time in seconds needed to handle the request, returns the system user that will run the job.

[session_id, sandbox] = getSessionInfo(req, method_name, pbs_time)
Given a Nesssi request object, a method name, and the processing time in seconds needed to handle the request, returns the session identification string and sandbox directory that have been assigned to the job.

sandbox = getSandbox(session_id)
Given a session identification string, returns the sandbox directory for the request.

output = getStatus(session_id)
Given a session identification string, returns a string containing status information about the request, including the PBS queue status and the sandbox directory listing.

status = runCmd(req, method_name, call_args, cmd)
This runs a command in a Unix shell on the service host. The first 2 arguments (req and method_name) are simply the required arguments to the service function, as described above. The third argument (call_args) is a one-element list containing the processing time in seconds needed to handle the request. The fourth argument (cmd) is a string containing the command to run with its arguments (usually an external script or program that handles the service request).

status = runJob(req, method_name, call_args, cmd)
This submits the job to the PBS queue for execution. The first 2 arguments (req and method_name) are simply the required arguments to the service function, as described above. The third argument (call_args) is a list of 1 or 2 elements:

the processing time in seconds needed to handle the request (required)

the session identification string (optional; if omitted, Nesssi assigns one)

The fourth argument (cmd) is a string containing the command to run with its arguments (usually an external script or program that handles the service request).

status = jobMonitor(req, method_name, args)
A generic service implementation for job monitoring; writes a status message showing queue status and sandbox contents back to the client.  The first 2 arguments (req and method_name) are simply the required arguments to the service function, as described above.  The third argument (args) is a list with a single element, the session ID for the job to be monitored.
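The client example later in this guide calls a monitor method alongside sleepyAdd. Because jobMonitor already has the required (req, method_name, args) signature, one plausible way to provide that method, assumed here rather than confirmed by the Nesssi distribution, is to register it directly in methods_list. The names follow the API list above; the fallback stand-ins are hypothetical and exist only so the module can be imported and tried without a Nesssi server.

```python
# Hypothetical sketch: a service module exposing a job-submission method and
# the generic monitor from the API above.
try:
    from nesssi import runJob, jobMonitor
except ImportError:
    # Stand-ins so the module can be imported and tried outside a Nesssi
    # server.  They only mimic the call signatures; they are not the real API.
    def runJob(req, method_name, call_args, cmd):
        return "would submit %r with %r" % (cmd, call_args)

    def jobMonitor(req, method_name, args):
        return "status of session %s" % args[0]


def sleepy_add(req, method_name, args):
    # Simplified sleepyAdd: numeric conversion rejects non-numeric input.
    tme, n1, n2 = int(args[0]), float(args[1]), float(args[2])
    call_args = [tme // 60 + 1]      # processing-time estimate, as in the sample service below
    cmd = "sleep %d; echo %s" % (tme, n1 + n2)
    return runJob(req, method_name, call_args, cmd)


# 'monitor' lets clients call <service>.monitor(session_id).
methods_list = {"sleepyAdd": sleepy_add, "monitor": jobMonitor}


if __name__ == "__main__":
    print(methods_list["sleepyAdd"](None, "sleepyAdd", [20, 3, 5]))
    print(methods_list["monitor"](None, "monitor", ["0123abcd"]))
```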

Restarting the Nesssi server

Any change to a service requires restarting the Nesssi server in order for it to take effect.  To restart the Nesssi server, ask the system administrators to add you to the Developers group (called tgcl at Caltech).  Then you may add /usr/local/adm/bin to your path and run:

	% sudo clarens-restart

You should see something like the following:

     OpenPKG: stop: apache2.
     OpenPKG: start: apache2.  

Sample service: sleepyAdd

Below is the source listing for a sample service definition called sleepy_add, which launches a job in the PBS batch queue that sleeps for a user-specified number of seconds and then adds two user-specified numbers together. The dictionary at the bottom of the listing maps the exposed service name sleepyAdd, which the client code calls, to the service function sleepy_add.

     # Import Grist service API functions from gridsession.py
     from gridsession import *
     #
     # Function to implement the sleepyAdd service.  This is called by Nesssi
     # using the 3 arguments given.
     #
     def sleepy_add(req, method_name, args):
         #
         # Parse user-supplied arguments.  Converting them to numbers ensures
         # that only numeric text can reach the shell command built below;
         # anything else raises ValueError.
         #
         tme = int(args[0])       # sleep time in seconds
         n1 = float(args[1])      # first number
         n2 = float(args[2])      # second number
         if tme < 0:
             raise ValueError("sleep time must not be negative")
         #
         # Graduated security implementation: estimate the processing time
         # from the requested sleep time.
         #
         pbs_tme = tme // 60 + 1
         call_args = [pbs_tme]
         #
         # Specify the command to be executed.  This is a complicated way to
         # sleep and add two numbers on a single line.
         #
         cmd = "date; sleep %d; date; echo \"%s %s\" | awk '{print \"sleepyadd\", $1, \"+\", $2, \"=\", $1+$2}'" % (tme, n1, n2)
         #
         # Ask Nesssi to submit the job to the PBS batch queue.
         #
         retval = run_job(req, method_name, call_args, cmd)
         return retval
     #
     # Method dictionary mapping exposed client service names to
     # function names.
     methods_list = {'sleepyAdd': sleepy_add}

Client-side call to a Nesssi web service

Required client software

The client software needed for Nesssi web services consists of a recent Python installation and three additional files:

Example:  sleepyAdd service

The service-specific Python client imports the Nesssi client library and uses it to connect to the Nesssi server, as follows:

	import ClarensDpe as Nesssi
	svrObj = Nesssi.client('https://tg-www.cacr.caltech.edu:8443/clarens/', debug=0)

The svrObj object can then be used to call the service, which returns a unique session identification string, as follows:

	# sleep for 20 secs, then add 3 and 5
	sessionID = svrObj.sleepyServices.sleepyAdd(20, 3, 5) 

The sessionID string can then be used to monitor the status of the job, as follows:

	statusMsg = svrObj.sleepyServices.monitor(sessionID)
	print(statusMsg)

Certificates and proxies

If you have a strong certificate, make sure it is in your ~/.globus directory as two files, usercert.pem and userkey.pem. If you have your certificate only as a PKCS#12 file (here called certkey.p12), you can split it into these two files by running the following commands:

mkdir -p ~/.globus
openssl pkcs12 -clcerts -in certkey.p12 -nokeys -out ~/.globus/usercert.pem
openssl pkcs12 -nocerts -in certkey.p12 -out ~/.globus/userkey.pem
chmod 400 ~/.globus/userkey.pem
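Globus-style tools generally refuse a private key that other users can read, so after splitting the file it is worth confirming that userkey.pem is private (for example with chmod 400). The helper below is a hypothetical convenience, not part of Nesssi or Clarens; it checks that both files exist in ~/.globus and that the key grants no group or other permissions.

```python
import os
import stat


def check_globus_credentials(globus_dir=None):
    """Return a list of problems with usercert.pem/userkey.pem; an empty
    list means both exist and the key has no group or other permissions."""
    if globus_dir is None:
        globus_dir = os.path.expanduser("~/.globus")
    problems = []
    for name in ("usercert.pem", "userkey.pem"):
        path = os.path.join(globus_dir, name)
        if not os.path.isfile(path):
            problems.append("missing " + path)
    key = os.path.join(globus_dir, "userkey.pem")
    if os.path.isfile(key):
        mode = stat.S_IMODE(os.stat(key).st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("%s has mode %o; try chmod 400" % (key, mode))
    return problems


if __name__ == "__main__":
    for problem in check_globus_credentials():
        print(problem)
```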

Run the clarens-proxy-init program once at the start of your session to get a short-term Nesssi proxy certificate. You will be asked for your certificate pass phrase once and will not have to enter it again during the session. If you don't do this, you will be asked for your pass phrase each time the client code connects to Nesssi.