Webserviceshell 2.2 User Documentation

25 Aug 2016

Introduction

The Web Service Shell (WSS) is a web service that can be configured via simple properties files to utilize external resources (either command-line programs or Java classes) to fulfill web service requests.

WSS core features:

  • manage HTTP connection with the client
  • validate request parameters
  • log service requests, state, and errors
  • route one or more endpoint queries to execution of respective command-line programs or Java classes
  • handle, via the servlet container, HTTP authentication

WSS 2.x change overview (see the Issues page for more details):

  • Version 2.x
    • enable queries to multiple endpoints
    • add more flexibility for service naming convention
    • add selected new configuration options, standardize naming convention
    • generalize the ability to proxy additional content
    • add more flexibility in content handling from Java endpoints
  • Version 2.1
    • Add new RabbitMQ usage logging option
  • Version 2.2 to 2.2.3
    • Fix issues related to initial testing and production operations

WSS is written in Java and is delivered as a deployable web application in the form of a .war file. When the war file is installed into a Tomcat or other servlet container, WSS reads its respective configuration files and is ready to handle client requests. Java programming is not needed to configure and operate a web service using WSS.

WSS Concepts

The primary objective of WSS is to enable internet access to data. WSS handles the HTTP communication with clients while executing task-specific programs to retrieve the desired data, effectively separating the HTTP component of a web service from the data extraction work.

While WSS can be extended by writing custom Java processor code, it is designed to avoid the need for Java coding and, by configuration, to use existing command-line scripts for data access. Two general-use processors are provided with WSS to accomplish this:

  • edu.iris.wss.endpoints.CmdProcessor
  • edu.iris.wss.endpoints.ProxyResource

The CmdProcessor processor executes command-line programs and streams requested data back to a client. The ProxyResource processor returns data as defined by a couple of configuration parameters, one for the URL of a respective data source and another to designate the output media type.

Both CmdProcessor and ProxyResource can be configured multiple times in one WSS installation to implement multiple endpoints and deliver a suite of data products to HTTP clients.

Conventions and Configuration Concepts

A few conventions are needed to deploy a WSS web service in a servlet container like Tomcat. These conventions control the name of the service and its respective URL, and they enable each WSS service to load its respective configuration information.

Terms

For example, to create a service with a URL of this form:

http://service.iris.edu/fdsnws/event/1/query?minmagnitude=8.5&format=geocsv

To refer to specific parts of this URL, this document uses the following names (on the left) for the respective parts, in this case:

  • protocol - http
  • host - service.iris.edu
  • service (or application) - fdsnws/event/1
  • endpoint - query
  • client interface (i.e. query) parameters - minmagnitude and format (i.e. each term after the '?' or '&' character and to the left side of the '=' character)

The general concept, for configuring and deploying each WSS service, is to:
  • when deploying the WSS war file, change its name to reflect the desired service name
  • create configuration files whose names match the deployed WSS war file name
  • put the appropriate entries in the respective config files to get the desired endpoint name(s)
  • put the appropriate entries in the respective config file to get the desired client interface parameter(s) names

Note: Some references external to this document define "endpoint" as the full URL up to the '?' character (e.g. http://service.iris.edu/fdsnws/event/1/query). In this document, however, "endpoint" refers only to the portion of the URL between the service (i.e. "fdsnws/event/1") and the client interface parameters (i.e. "minmagnitude=8.5&format=geocsv"), so in this case the endpoint is "query".

The host and protocol are usually set for a given site and are not discussed further here.
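
As an illustration of these terms, the following Python sketch splits a request URL into the named parts. The function name split_wss_url is hypothetical and is not part of WSS. The service name is passed in explicitly because an endpoint name may itself contain slashes, so the boundary between service and endpoint cannot be inferred from the URL alone.

```python
from urllib.parse import urlsplit, parse_qsl


def split_wss_url(url, service):
    """Split a request URL into the parts named in this document.

    `service` is the service name (for example "fdsnws/event/1").
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    prefix = service.strip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"URL path {path!r} is not under service {service!r}")
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "service": service.strip("/"),
        "endpoint": path[len(prefix):],
        # client interface parameters: names left of '=', values right of it
        "parameters": dict(parse_qsl(parts.query)),
    }


info = split_wss_url(
    "http://service.iris.edu/fdsnws/event/1/query?minmagnitude=8.5&format=geocsv",
    "fdsnws/event/1",
)
print(info["endpoint"], info["parameters"])
```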

Service Name

Service naming convention - The service name is derived from the deployed war file name. For example, to get a service name of "fdsnws/event/1", the WSS war file must be renamed to fdsnws#event#1.war, which, when deployed, results in the following URL:

http://service.iris.edu/fdsnws/event/1/

where fdsnws/event/1 is determined by replacing each hash character '#' in the WSS war file name with a slash character '/'. Note: using '#' is not required if your naming conventions do not need a service name with '/' in it.
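
The renaming rule can be expressed as a small Python sketch. The helper name service_path_from_war is hypothetical; it only mirrors the rule described above and does not interact with a servlet container.

```python
def service_path_from_war(war_file_name):
    """Return the service name implied by a deployed war file name."""
    if not war_file_name.endswith(".war"):
        raise ValueError("expected a file name ending in .war")
    base = war_file_name[: -len(".war")]
    # each '#' in the war file name becomes a '/' in the service name
    return base.replace("#", "/")


print(service_path_from_war("fdsnws#event#1.war"))
```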

Endpoint Names

endpoints - Endpoints are defined in a service.cfg file. An endpoint name may include slashes or dots. For example, endpoints named "query", "v2/swagger" and "application.wadl" result in these URLs, respectively:

http://service.iris.edu/fdsnws/event/1/query
http://service.iris.edu/fdsnws/event/1/v2/swagger
http://service.iris.edu/fdsnws/event/1/application.wadl

Client Interface Parameters

Interface parameter convention - Client interface parameters for a given endpoint are defined in a param.cfg file (discussed in more detail below). For example, to define the parameters "minmagnitude" and "format" for the "query" endpoint, add the following lines to the param.cfg file:

query.minmagnitude=NUMBER
query.format=TEXT

WSS will then be configured to accept a client request using this URL:

http://service.iris.edu/fdsnws/event/1/query?minmagnitude=8.5&format=geocsv

In summary -
  • A war file name may be chosen as needed to get a desired service name, but respective configuration file names must correspond to that war file name.
  • Endpoint names may be chosen as needed, but the contents of respective WSS configuration files must contain the respective parameters for that endpoint name.

WSS Configuration

Each WSS service uses a respective set of configuration files to define endpoints and interface parameters, as well as manage operation of WSS. The following naming convention is used so that multiple web services can be deployed in the same container.

configuration file name convention - WSS tries to open these configuration files when starting. These file names are derived from the deployed .war file name and are determined by replacing any hash character ‘#’ with a dot character ‘.’ and appending the respective file type (e.g. -service.cfg, -param.cfg, -log4j.properties). For the fdsnws#event#1.war file, the required configuration files are:
  • fdsnws.event.1-service.cfg - contains operational and endpoint parameters
  • fdsnws.event.1-param.cfg - contains client interface parameters for each endpoint
  • fdsnws.event.1-log4j.properties - contains logging configuration
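
Before deploying a new service, it can be useful to check that all three files exist. The Python sketch below derives the expected file names from a war file name and reports which ones are missing. The function names and the config_dir argument are hypothetical; this document does not state where WSS looks for its configuration files, so the folder must be supplied by the caller.

```python
from pathlib import Path

CONFIG_SUFFIXES = ("-service.cfg", "-param.cfg", "-log4j.properties")


def config_file_names(war_file_name):
    """Return the configuration file names expected for a war file name."""
    if not war_file_name.endswith(".war"):
        raise ValueError("expected a file name ending in .war")
    # each '#' in the war file name becomes a '.' in the config file names
    base = war_file_name[: -len(".war")].replace("#", ".")
    return [base + suffix for suffix in CONFIG_SUFFIXES]


def missing_config_files(war_file_name, config_dir):
    """Return the expected configuration files not present in config_dir."""
    folder = Path(config_dir)
    return [
        name
        for name in config_file_names(war_file_name)
        if not (folder / name).is_file()
    ]


print(config_file_names("fdsnws#event#1.war"))
```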

Note: When deployed, WSS will try to start up even when there are errors in the configuration files. For new web services, review the respective log files to confirm that the desired endpoints and client parameters are fully loaded.

Built in endpoints

These endpoints are built in to WSS and are available to a client without configuration. The wssstatus endpoint is only available to clients on the local network, because the status may contain site-sensitive information and therefore should not be generally available on the internet.

Endpoint | Default Value | Comments
/ | Internally generated information page | Use parameter rootServiceDoc in service.cfg to specify a URL with this service's specific documentation.
wssstatus | status information about this WSS and its configuration | Blocked to external request, it is only available on the local network.
wssversion | current version of Web Service Shell | The wssversion value is set in code when WSS is released.
whoami | remote address of requestor | The address returned may be affected by network topology.
version | "notversioned" | Use parameter "version" in service.cfg to set a number or other identifier as needed for this service.

Service configuration file (servicename-service.cfg)

The service.cfg file is composed of two types of parameters: global and endpoint parameters. Global parameters apply to the application as a whole while endpoint parameters can be configured multiple times.

Global Parameters

Parameter Name | Default Value | Description
appName | "notnamed" | Indicates this service or data application name. It can be different from the war file name. Note: this value is also used in output to "usage" log files as field "application".
version | "notversioned" | Indicates the version of this web service. It is the response to a "version" endpoint request.
corsEnabled | true | When true, the HTTP response header includes Access-Control-Allow-Origin: *
rootServiceDoc | null | A URL providing information about the application, documentation on usage, etc. If a value is not provided, a generic html page is generated.
loggingMethod | "LOG4J" | Choices are LOG4J, JMS, or RABBIT_ASYNC.
loggingConfig | null | A file name or URL referencing an IRIS RabbitMQ configuration file. Only required if loggingMethod is set to RABBIT_ASYNC.
sigkillDelay | 30 | Time in seconds before a handler process is sent a SIGKILL. The time starts after handlerTimeout (see the Endpoint Parameters table). See section "Command-Line Process Time Limits".
singletonClassName | null | Optional. A user-provided Java class that is instantiated once when the service starts. A service can use this class to provide capability that may be needed by individual endpoints in the application.

Setting up Endpoints

The flexibility to set multiple arbitrary endpoints within the service.cfg file is a primary benefit of the 2.x version of the Web Service Shell. An endpoint is defined and configured within the service.cfg file by prefixing the parameters described in this section with the endpoint name, followed by a dot.

For example, to specify a "query" endpoint to execute programABC:

query.handlerProgram=/usr/folder1/programABC
query.handlerTimeout=45

plus other endpoint parameters as needed (defined in the following table).

To add another endpoint, "answer", which executes programDEF, add the following:

answer.handlerProgram=/usr/folder1/programDEF
answer.handlerTimeout=160

etc.
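
To check which endpoints a service.cfg file defines, the entries can be grouped by their prefix. The Python sketch below is an assumption about how such a grouping could be done outside of WSS, not a description of the WSS parser. It treats a key without a dot as a global parameter and splits other keys at the last dot, so that endpoint names containing dots (such as "application.wadl") stay intact. It does not handle values continued across lines with a backslash.

```python
def group_service_cfg(text):
    """Group service.cfg entries into global and per-endpoint parameters."""
    global_params, endpoints = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value entry
        key, value = key.strip(), value.strip()
        # split at the LAST dot: "application.wadl.proxyURL" ->
        # endpoint "application.wadl", parameter "proxyURL"
        endpoint, dot, param = key.rpartition(".")
        if dot:
            endpoints.setdefault(endpoint, {})[param] = value
        else:
            global_params[key] = value
    return global_params, endpoints


sample = """
appName=my-service-app-name
query.handlerProgram=/usr/folder1/programABC
query.handlerTimeout=45
answer.handlerProgram=/usr/folder1/programDEF
answer.handlerTimeout=160
"""
global_params, endpoints = group_service_cfg(sample)
print(sorted(endpoints))
```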

Endpoint Parameters

The term "formatType" refers to the name of a media type that may be provided by an endpoint. Media type is a case insensitive string and one or more may be defined in the formatTypes parameter.

The term "${UTC}" can be used where noted to insert a UTC time string in ISO 8601 format.

The term "${appName}" can be used to insert the appName parameter string.

Parameter Name | Default Value | Description
endpointClassName | edu.iris.wss.endpoints.CmdProcessor | A Java class that extends the IrisProcessor class. The class is instantiated for each request on this endpoint. Two IrisProcessor classes are provided with Web Service Shell: edu.iris.wss.endpoints.CmdProcessor and edu.iris.wss.endpoints.ProxyResource. CmdProcessor executes a handlerProgram, and ProxyResource delivers the content specified in the proxyURL parameter.
handlerProgram | "nonespecified" | A fully qualified executable file name. It must be specified when endpointClassName is set to edu.iris.wss.endpoints.CmdProcessor.
handlerTimeout | 30 | Inactivity time in seconds before WSS sends a handlerProgram process a SIGTERM. See section "Command-Line Process Time Limits".
handlerWorkingDirectory | /tmp | If specified, it must be a valid folder with write access. It is required to start a handlerProgram process, and the handlerProgram may write content into it.
usageLog | true | When true, log information is written to the usageLogger logger defined in the log4j.properties file, or to JMS or RabbitMQ, depending on the value of loggingMethod.
formatTypes | BINARY: application/octet-stream | A list of (formatType, media type) pairs provided by this endpoint. When provided, the first pair in the list becomes the default. The BINARY formatType pair is retained in the list and is always available for selection. Note: a client selects a media type by using the "format" interface parameter and specifying one of the defined formatTypes.
formatDispositions | empty string | A list of (formatType, HTTP Content-Disposition value) pairs that override the Content-Disposition header set by the WSS default or by the addHeaders parameter. A Content-Disposition filename value may include ${UTC} and/or ${appName} to avoid name conflicts when files are downloaded by a client.
addHeaders | empty string | A list of HTTP headers that are added to the default headers or that override any default header with the same name. ${UTC} and/or ${appName} may be used in the filename value.
postEnabled | false | When false, POST requests are ignored. When true, the POST body is passed to the respective handlerProgram or Java processor endpoint.
logMiniseedExtents | false | When true, additional Miniseed channel information is collected and written to the usage log. Only applies to edu.iris.wss.endpoints.CmdProcessor.
use404For204 | false | When true and a response has no data, return an HTTP code 404 instead of 204.
proxyURL | "noproxyURL" | When used, it must point to a valid URL. It is required when endpointClassName is set to edu.iris.wss.endpoints.ProxyResource.
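
The formatTypes rules (first pair is the default, BINARY is always available) can be illustrated with a short Python sketch. The function names are hypothetical and the parsing is a simplification. Treating formatType names as case insensitive is an assumption made for this example.

```python
def parse_format_types(value):
    """Parse "name: media/type, ..." into a dict keyed by lower-case name."""
    types = {}
    for pair in value.split(","):
        name, sep, media = pair.partition(":")
        if sep and name.strip():
            types[name.strip().lower()] = media.strip()
    # BINARY is retained and always available for selection
    types.setdefault("binary", "application/octet-stream")
    return types


def select_media_type(format_types, requested=None):
    """Return the media type for the requested formatType, or the default."""
    if requested is None:
        # dicts keep insertion order, so the first configured pair is the default
        return next(iter(format_types.values()))
    try:
        return format_types[requested.lower()]
    except KeyError:
        raise ValueError(f"unknown format {requested!r}") from None


types = parse_format_types("mseed:application/vnd.fdsn.mseed, text: text/plain")
print(select_media_type(types), select_media_type(types, "text"))
```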

Managing HTTP headers for a client response

HTTP headers can be set both by endpoint parameter and by a command-line handler. WSS sets headers in the following order:
  1. WSS default headers
    • When corsEnabled is true, return this header
      access-control-allow-origin: *
    • When formatType is MSEED, MINISEED, or BINARY, return this header
      content-disposition: attachment; filename=service${UTC}.${formatType} (note: no file type for BINARY)
      otherwise, return this header
      content-disposition: inline; filename=service${UTC}.${formatType}
  2. addHeaders - adds new headers, or overrides default headers.
  3. formatDispositions - overrides content-disposition header, but only for the respective formatType.
  4. command-line handler - can add to, or override previous headers - see section "Additional CmdProcessor Features".
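
The ordering above amounts to merging header sets so that later sources override earlier ones. The Python sketch below illustrates only that precedence. The function names are hypothetical, the default headers are supplied by the caller instead of being computed, and comparing header names case insensitively is an assumption based on normal HTTP practice.

```python
def merge_headers(*layers):
    """Merge header dicts in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for name, value in layer.items():
            merged[name.lower()] = value
    return merged


def response_headers(defaults, add_headers, format_dispositions,
                     format_type, handler_headers):
    """Apply the four header sources in the order listed above."""
    layers = [defaults, add_headers]
    # formatDispositions applies only to the formatType of this response
    disposition = format_dispositions.get(format_type.lower())
    if disposition is not None:
        layers.append({"content-disposition": disposition})
    layers.append(handler_headers)
    return merge_headers(*layers)


headers = response_headers(
    defaults={"Access-Control-Allow-Origin": "*",
              "Content-Disposition": "inline; filename=default.text"},
    add_headers={"Cache-Control": "no-cache"},
    format_dispositions={"text": 'inline; filename="mypart.txt"'},
    format_type="text",
    handler_headers={"Test-Header": "value-for-test-hdr"},
)
print(headers["content-disposition"])
```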

Param configuration file (servicename-param.cfg)

The contents of the param.cfg file specify the names and types of client interface parameters that a given endpoint supports. Web Service Shell will perform type checking of parameters from client requests before executing respective handlers or Java classes. This enables a quick response to a client for simple parameter format errors in a request URL.

The data types recognized by Web Service Shell are NUMBER, TEXT, BOOLEAN, DATE, and NONE. If the type "NONE" is used for a parameter type, WSS will return a "No value permitted" exception for that parameter.
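
The kind of check involved can be sketched in Python as follows. This is not the WSS implementation: the accepted spellings for BOOLEAN and the accepted DATE layout are assumptions made for this example (here, "true"/"false" in any case and an ISO 8601 string that datetime.fromisoformat accepts), and the function name check_param is hypothetical.

```python
from datetime import datetime


def check_param(value, type_name):
    """Return None if value fits type_name, otherwise an error message."""
    if type_name == "TEXT":
        return None
    if type_name == "NONE":
        # assumption: a NONE parameter may appear, but must carry no value
        return None if value == "" else "No value permitted"
    if type_name == "NUMBER":
        try:
            float(value)
        except ValueError:
            return f"{value!r} is not a number"
        return None
    if type_name == "BOOLEAN":
        if value.lower() in ("true", "false"):
            return None
        return f"{value!r} is not a boolean"
    if type_name == "DATE":
        try:
            datetime.fromisoformat(value)
        except ValueError:
            return f"{value!r} is not a date"
        return None
    raise ValueError(f"unknown parameter type {type_name!r}")


print(check_param("8.5", "NUMBER"), check_param("abc", "NUMBER"))
```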

Reserved Parameters

A few interface parameters have special meaning in Web Service Shell and can only be used as defined here. The reserved interface parameters are:
  • format
  • aliases
  • username
  • stdin
  • nodata
  • output

WSS uses the "format" parameter to determine the response media type. The value of format must be one of the formatType names defined in the formatTypes parameter. By default, the format parameter does not need to be defined and the response will be the default media type. Add the format parameter to allow a client to explicitly select a media type.

The "aliases" parameter can be used to specify one or more short names for interface parameters on a given endpoint. The aliases parameter itself is not accessible by a client. See the usage example below.

"username" is added by WSS to a handlerProgram argument list when a user has been successfully authenticated.

"stdin" is added by WSS to a handlerProgram argument list when a POST request has been made. It indicates to the handler that the handler should read stdin to get the POST data.

"nodata" is predefined in WSS and can be used on any query to indicate to WSS to return an HTTP code 404 instead of 204. The accepted values are "204" or "404". When this parameter is used, it overrides the configuration parameter "use404For204".

"output" is a deprecated parameter and should not be used.

param.cfg parameters

Client interface parameter names and types are defined as needed for respective endpoints in the param.cfg file. Each parameter must be prepended with an endpoint name. The endpoint name must be defined in the service.cfg file or the parameter will be ignored.

For example, to specify interface parameters "format", "minlongitude", "maxlongitude" for endpoint "query" enter the following lines in a param.cfg file:

query.format=TEXT
query.minlongitude=NUMBER
query.maxlongitude=NUMBER

The "aliases" parameter can be used to define shorter names for a respective interface parameter name. Each endpoint may have an "aliases" list that relates a defined parameter to one or more alternate parameter names. For example, to define mnlg and minlong as short names for minlongitude; and mxlong for maxlongitude, add the following:

query.aliases = \
minlongitude: (mnlg, minlong), \
maxlongitude: mxlong
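
The effect of an aliases list can be illustrated with a Python sketch that parses the value (after the continuation lines are joined) and maps alias names in a request back to the full parameter names. The function names and the regular expression are assumptions for this example and may not cover every form that WSS accepts.

```python
import re

# matches "name: alias" or "name: (alias1, alias2, ...)"
ALIAS_ENTRY = re.compile(r"(\w+)\s*:\s*(?:\(([^)]*)\)|(\w+))")


def parse_aliases(value):
    """Return a dict mapping each alias to its full parameter name."""
    lookup = {}
    for match in ALIAS_ENTRY.finditer(value):
        name, group, single = match.groups()
        names = group.split(",") if group is not None else [single]
        for alias in names:
            alias = alias.strip()
            if alias:
                lookup[alias] = name
    return lookup


def resolve_aliases(params, lookup):
    """Replace alias names in a request with the full parameter names."""
    return {lookup.get(key, key): value for key, value in params.items()}


lookup = parse_aliases("minlongitude: (mnlg, minlong), maxlongitude: mxlong")
print(resolve_aliases({"mnlg": "10", "mxlong": "20"}, lookup))
```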

Log4j properties file (servicename-log4j.properties)

WSS uses log4j to log two types of messages: operation messages and data usage messages. Use the log4j properties file to specify the name and location of the respective log files. Note: the usage log file is only used when loggingMethod is set to LOG4J, otherwise usage log messages are sent to their respective JMS or RabbitMQ destination.

If the respective log4j properties file is not found when WSS starts, WSS will try to write log messages to files wss.log and wss_usage.log in the container log folder.

Additional CmdProcessor Features

The Java class, edu.iris.wss.endpoints.CmdProcessor, provided with WSS has additional features that may be useful when developing services.

Command-Line Process Time Limits

Once a command-line process starts, CmdProcessor waits handlerTimeout seconds for the process to return data or error messages. Once handlerTimeout is exceeded, CmdProcessor sends the process a SIGTERM. A process may catch the SIGTERM, do cleanup as needed (such as closing databases or writing messages to standard error), and terminate. WSS then waits sigkillDelay seconds for the process to end. If the process does not end, WSS sends a SIGKILL.
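
The two-stage shutdown can be approximated in Python as shown below, which may help when testing how a handler reacts to SIGTERM. This is a simplified sketch, not the WSS implementation: it limits the total run time of the process, whereas handlerTimeout is described as an inactivity time. The function name run_with_limits is hypothetical.

```python
import subprocess
import sys


def run_with_limits(command, handler_timeout, sigkill_delay):
    """Run command; send SIGTERM after handler_timeout, then SIGKILL."""
    proc = subprocess.Popen(command)
    try:
        proc.wait(timeout=handler_timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX systems
    try:
        proc.wait(timeout=sigkill_delay)
        return "sigterm"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX systems
        proc.wait()
        return "sigkill"


# a child process that would run far longer than the allowed time
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_limits(slow, handler_timeout=0.5, sigkill_delay=5))
```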

CmdProcessor to Command-Line Interface

CmdProcessor passes request parameters to the command-line process as arguments. Each interface parameter and its value is passed in this form:

--interfaceParameter value

If the client request is a POST, the parameter --STDIN (with no value) is added to the arguments, and the POST body is written to the process's standard input.

If an error occurs, CmdProcessor checks standard error and includes any information read from standard error in an error response to the client.

Once CmdProcessor detects any data from the process on standard output, it returns an HTTP 200 to the client and hands off the data stream to the web service container, which then streams the data to the client.
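
A handler written in Python could collect its parameters with a sketch like the following. It assumes that the "--interfaceParameter value" pairs arrive as command-line arguments, as the descriptions of the reserved parameters "username" and "stdin" suggest, and that every parameter other than --STDIN is followed by exactly one value. The function name parse_handler_args is hypothetical.

```python
def parse_handler_args(argv):
    """Return (params, read_stdin) from a list of handler arguments."""
    params = {}
    read_stdin = False
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("--"):
            raise ValueError(f"unexpected token {token!r}")
        name = token[2:]
        if name == "STDIN":
            read_stdin = True  # POST body is available on standard input
            i += 1
            continue
        if i + 1 >= len(argv):
            raise ValueError(f"missing value for {token!r}")
        params[name] = argv[i + 1]
        i += 2
    return params, read_stdin


# in a real handler, pass sys.argv[1:] instead of this literal list
params, read_stdin = parse_handler_args(
    ["--minmagnitude", "8.5", "--format", "geocsv", "--STDIN"]
)
print(params, read_stdin)
```

When read_stdin is true, a handler would typically read the POST body with sys.stdin.buffer.read() before producing output.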

Setting HTTP Headers From a Command-Line Process

CmdProcessor has the ability to set response HTTP headers. The CmdProcessor checks for header values when a process starts writing and then returns any headers found to WSS. WSS adds or overrides any previously set headers before returning an HTTP response, see section "Managing HTTP headers for a client response".

The following restrictions apply:

  • A command-line process that needs to set headers must write the header information before writing any other output
  • The header information must be text in the following form:
    • starts with "HTTP_HEADERS_START"
    • followed by each header in regular HTTP format, terminated with linefeed or carriage return linefeed
    • ends with "HTTP_HEADERS_END"
  • The marker strings must not be followed by linefeed or carriage return linefeed.

For example, to provide headers "Content-Disposition", "Access-Control-Allow-Origin", and "Test-Header", the output looks like this (\n indicates linefeed):

HTTP_HEADERS_STARTContent-Disposition : inline\nAccess-Control-Allow-Origin : http://host.example\nTest-Header : value-for-test-hdr\nHTTP_HEADERS_END
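
A handler written in Python could produce this block with a sketch like the following. The function names are hypothetical. The "name : value" spacing follows the example above, and the payload is written immediately after the end marker with no line break in between.

```python
import io


def header_block(headers):
    """Format headers between the start and end marker strings."""
    lines = "".join(f"{name} : {value}\n" for name, value in headers.items())
    # no linefeed directly after either marker string
    return "HTTP_HEADERS_START" + lines + "HTTP_HEADERS_END"


def write_response(stream, headers, payload):
    """Write the header block first, then the payload bytes."""
    stream.write(header_block(headers).encode("ascii"))
    stream.write(payload)


# in a real handler, use sys.stdout.buffer instead of BytesIO
out = io.BytesIO()
write_response(
    out,
    {"Content-Disposition": "inline", "Test-Header": "value-for-test-hdr"},
    b"data...",
)
print(out.getvalue())
```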

Environmental Information for Command-Line Processes

When CmdProcessor starts a command-line process, the following environment variables are set.

REQUESTURL - the client request URL
USERAGENT - value from request HTTP header User-Agent
IPADDRESS - client IP, may be affected by network topology
APPNAME - value from appName configuration parameter
VERSION - value from version configuration parameter
CLIENTNAME - dummy string, not currently in use
HOSTNAME - host name of WSS server

If an authenticated user is present, this value is also set:

AUTHENTICATEDUSERNAME - authenticated user name
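
A handler written in Python can read these values from its environment. The sketch below collects whichever of the listed names are present. The function name wss_environment is hypothetical, and the sample values in the usage lines are made up for illustration.

```python
import os

WSS_ENV_NAMES = (
    "REQUESTURL", "USERAGENT", "IPADDRESS", "APPNAME",
    "VERSION", "CLIENTNAME", "HOSTNAME", "AUTHENTICATEDUSERNAME",
)


def wss_environment(environ=None):
    """Return the WSS-provided values found in the environment."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in WSS_ENV_NAMES if name in environ}


# illustrative values only; a real handler would call wss_environment()
sample_env = {"APPNAME": "my-service-app-name", "VERSION": "1.0.0", "PATH": "/usr/bin"}
print(wss_environment(sample_env))
```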

Changes from version 1.x

Web Service Shell has been upgraded to:

  • enable more flexibility in naming a WSS application
  • support more than one user defined endpoint per WSS application
  • generalize proxying capability
  • enable setting HTTP headers from a command-line process
  • add detailed error description as needed

Additionally, unused parameters have been removed and a few parameter names have been changed.

Converting an existing Web Service Shell application to 2.x

The general steps needed to change a Web Service Shell 1.x configuration to 2.x are:

  1. service.cfg file
    • change web service part of config file name as needed
    • create global parameters
    • create endpoint for "query"
    • create endpoints for "application.wadl" or "v2/swagger" as needed and set respective proxyURL
  2. param.cfg file
    • change web service part of config file name as needed
    • change parameter names by prepending endpoint name
  3. log4j.properties
    • change web service part of config file name as needed
    • change output log file names as needed

Changes for endpoints provided by Web Service Shell

Previous Endpoint | New Endpoint | Comment
/ | same as 1.x |
status | wssstatus | endpoint rename
application.wadl | builtin implementation removed | A general solution is now provided. Add an endpoint configuration with endpointClassName set to edu.iris.wss.endpoints.ProxyResource, add proxyURL and set the proxy content, and add formatTypes and set the default media type.
wssversion | same as 1.x |
whoami | same as 1.x |
version | same as 1.x |
catalogs | removed | Removed from WSS code and will be re-implemented as endpoint in Event service
contributors | removed | same as catalogs
counts | removed | same as catalogs

service.cfg related changes - global parameters

Previous Parameter | New Parameter | Comment
appName | same as 1.x |
version | same as 1.x |
corsEnabled | same as 1.x |
rootServiceDoc | same as 1.x |
loggingMethod | same as 1.x | with addition of RABBIT_ASYNC option
- | loggingConfig | new with 2.1, only applies to loggingMethod RABBIT_ASYNC
sigkillDelay | same as 1.x |
singletonClassName | same as 1.x |
rootServicePath | removed | not needed
jndiUrl | removed | not needed
connectionFactory | removed | not needed
topicDestination | removed | not needed
argPreProcessorClassName | removed | not needed

service.cfg related changes - endpoint parameters

Previous Parameter | New Parameter | Comment
streamingOutputClassName | endpointClassName | parameter renamed
handlerProgram | same as 1.x |
handlerTimeout | same as 1.x |
handlerWorkingDirectory | same as 1.x |
usageLog | same as 1.x |
outputTypes | formatTypes | parameter renamed
- | formatDispositions | new parameter
- | addHeaders | new parameter
postEnabled | same as 1.x |
- | logMiniseedExtents | new parameter - extra processing and usage logging for Miniseed channel information is no longer performed by CmdProcessor unless this parameter is added
use404For204 | same as 1.x |
- | proxyURL | new parameter, must be used when endpointClassName is set to edu.iris.wss.endpoints.ProxyResource

param.cfg configuration changes

Configuring interface parameters has not changed except to add an endpoint name. Each parameter must now be prepended with an endpoint name. The endpoint name must be defined in the service.cfg file or the parameter will be ignored.

For example, to change parameters for endpoint "query":

original form | is now
format=TEXT | query.format=TEXT
minlongitude=NUMBER | query.minlongitude=NUMBER
maxlongitude=NUMBER | query.maxlongitude=NUMBER

The "aliases" parameter is handled like other parameters, i.e. prefix "aliases" with the endpoint name.

query.aliases = \
minlongitude: (mnlg, minlong), \
maxlongitude: mxlong

Additional endpoints are specified in the same way, for example, to add parameters for an endpoint "answer":

answer.format=TEXT
answer.minlongitude=NUMBER
answer.maxlongitude=NUMBER
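
For larger 1.x param.cfg files, the prefixing can be scripted. The Python sketch below is one possible approach and is not a tool provided with WSS. It prefixes each entry with an endpoint name and leaves comments, blank lines, and continuation lines (those following a line that ends with a backslash) unchanged. The function name prefix_params is hypothetical.

```python
def prefix_params(text, endpoint):
    """Prefix each param.cfg entry in text with the endpoint name."""
    converted = []
    continued = False  # True when the previous line ended with a backslash
    for line in text.splitlines():
        stripped = line.strip()
        is_entry = bool(stripped) and not stripped.startswith("#") and not continued
        converted.append(f"{endpoint}.{stripped}" if is_entry else line)
        continued = stripped.endswith("\\")
    return "\n".join(converted)


old_cfg = "\n".join([
    "format=TEXT",
    "minlongitude=NUMBER",
    "aliases = \\",
    "minlongitude: (mnlg, minlong), \\",
    "maxlongitude: mxlong",
])
print(prefix_params(old_cfg, "query"))
```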

Parameter File Examples

service.cfg

# Service Documentation and context path baselines
# ---------------- globals
appName=my-service-app-name
version=1.0.0

# CORS is enabled by default, set to false to disable CORS processing
##corsEnabled=false

# a URL providing information about the application, documentation on usage, etc.
#rootServiceDoc=http://service/fdsnws/dataselect/docs/1/root/

# LOG4J or JMS or RABBIT_ASYNC
loggingMethod=LOG4J
##loggingConfig=local_file_system or URL for  rabbitconfig-publisher.properties file

# time in seconds
sigkillDelay=30

# -------- endpoint --------
query.endpointClassName=edu.iris.wss.endpoints.CmdProcessor
query.handlerProgram=/user/home/tomcat/bin/extractdata-handler

# time in seconds for command line processes
query.handlerTimeout=300

query.handlerWorkingDirectory=/tmp

# usageLog is true by default, set this to false to disable usage logging
#query.usageLog=false

# formatTypes - specifies a list of "formatType: mediaType" pairs
query.formatTypes = \
    mseed:application/vnd.fdsn.mseed, \
    miniseed:application/vnd.fdsn.mseed, \
    text: text/plain,\
    csv: text/csv,\
    json: application/json, \
    xml: application/xml

# Content-Disposition overrides for respective formatTypes "text" and "miniseed"
query.formatDispositions= \
  text: inline; filename="mypart_${appName}_${UTC}.txt", \
  miniseed: attachment; filename="a_miniseed_file.mseed" 

# Disable or remove this to disable POST processing
query.postEnabled=true

# false by default, enables additional miniseed processing and logging
##query.logMiniseedExtents=true

# Enable this to return HTTP 404 in lieu of 204, NO CONTENT
query.use404For204=true

# -------- endpoint --------

application.wadl.endpointClassName=edu.iris.wss.endpoints.ProxyResource

# required when endpointClassName is set to edu.iris.wss.endpoints.ProxyResource
application.wadl.proxyURL=http://service/fdsnws/dataselect/docs/1/wadl
application.wadl.formatTypes = \
    xml: application/xml

param.cfg

query.format=TEXT
query.type=TEXT

query.minlongitude=NUMBER
query.maxlongitude=NUMBER
query.minlatitude=NUMBER
query.maxlatitude=NUMBER

log4j.properties

log4j.rootLogger=INFO, ShellAppender

log4j.appender.ShellAppender=org.apache.log4j.RollingFileAppender
log4j.appender.ShellAppender.File=${catalina.base}/logs/my_app_1.log
log4j.appender.ShellAppender.MaxFileSize=10MB
log4j.appender.ShellAppender.MaxBackupIndex=5
log4j.appender.ShellAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.ShellAppender.layout.ConversionPattern=%d %-5p [%t]: %c{1} %x - %m%n

log4j.category.UsageLogger=INFO, UsageAppender
log4j.additivity.UsageLogger=false

log4j.appender.UsageAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.UsageAppender.File=${catalina.base}/logs/my_app_usage_1.log
log4j.appender.UsageAppender.DatePattern='_'yyyy-MM-dd
log4j.appender.UsageAppender.layout=edu.iris.wss.utils.WssLog4JLayout
log4j.appender.UsageAppender.layout.ConversionPattern=%m%n

rabbitconfig-publisher.properties

# Host that's the broker. This will normally be the load balancer
broker=broker1,broker2

# The virtual host within the broker
virtualhost=test

# Internal buffer size for the async publishers
buffersize=10000

# Persistent or not
default_persistence=true

# The exchange name that receives these messages
exchange=ws_logging

# Credentials
user=hellorabbit
password=hellopw

# Probably never normally reconnect
reconnect_interval=-1

# How often to wait between failed connection attempts in msec
retry_interval=4000