April 15, 2003
Abstract:
Web services involve one machine calling a function on another machine through the internet. Although there is nothing new about it, there is now a standard for exchanging data. That standard is called SOAP: Simple Object Access Protocol. Software is now emerging that implements the SOAP standard, both on the client and on the server side. The objective of that software is to make it easy for programmers to create SOAP applications. It is not hard to write your own code that implements a SOAP application. By using some readily available open source tools, it is actually quite simple to make a SOAP server. In this paper, I will show a SOAP server that creates an actual function call in an interpreted language by using an XSLT transformation. The programming language used to make the SOAP server is PHP, because of its availability in an Apache web server but any other interpreted language like Perl may be used as well. The remote procedure call that arrives as a SOAP message from the client is transformed into a piece of PHP code by using an XSLT style sheet. The generated piece of code is subsequently executed by the PHP interpreter.
The concept of network services has been around for a few decades. Ever since Sun came up with the implementation of RPC (Remote Procedure Call) one machine has been able to call a function on another machine across a network. Although the idea is simple enough, actually using a remote procedure or serving one across the internet has never been a trivial job. Certainly when different platforms or programming languages are involved, having one machine call a function on another machine invariably creates more problems than it is worth. This situation may have improved a bit with the use of Java or CORBA but still did not seem to provide the ultimate solution.
The latest step in the right direction is a standard based on XML which allows different machines to exchange information in a platform independent manner. That standard is called SOAP: Simple Object Access Protocol. To call a function on another computer, a programmer puts the name of the function, along with its parameters in a prescribed XML message and sends that message to the destination. The destination must of course understand the message and must be prepared to execute the requested function. If all works out, the programmer receives a reply, also in an XML message, from the remotely executed function. Any protocol can be used to deliver the SOAP messages but the most popular are HTTP and SMTP.
The web service server, i.e. the computer that performs its little trick
on behalf of other computers on the internet, needs to dissect the SOAP
message to determine what function to execute and what the parameters are.
Once this information is extracted, the function can be called with the
appropriate parameters.
The result of this operation is repacked into a SOAP message that is returned
as a reply to the calling client.
As you can see on, for example, http://www.xmethods.net/, there
is a long list of web services that are available and the list is growing
at a steady pace.
As stated before, SOAP is a standard that uses XML as its base syntax. The next section will therefore briefly explain the basic syntactical elements of XML. And, since we're talking about XML anyway, we will also discuss XSLT at an elementary level. XSLT (eXtensible Stylesheet Language Transformations) is another standard based upon XML that this paper leans on. We will get on target in section 3, in which I will briefly explain what a SOAP message looks like. In section 4 we will shift our attention to actually creating web services. In that section, I will discuss a few of the existing web service programming environments. Those in the open source world certainly deserve our attention. Apart from those existing projects, a different way to provide a web service is presented in section 5. In that section we will go into the details of building a SOAP based web service by generating a piece of script that is subsequently executed. And, of course, any respectable paper needs a few conclusions and recommendations for future work, so I reserved the final section for those.
The syntax of XML looks a bit like that of HTML, although the syntactical rules are more strict. An XML document is a collection of self-describing data, organized in a tree structure. The tree is built with elements which may contain other elements, attributes and textual content. An element is any piece of the document that is enclosed in corresponding tags. Just like in HTML there is an open tag and the close tag to accompany it. Unlike HTML, every element must have a close tag. Here is a small example:
    <?xml version='1.0'?>
    <root>
      <element attribute='example'>
        Some content...
      </element>
      <element>
        Content of the second element.
      </element>
    </root>

As you can see from the example above, an XML document normally starts with
the <?xml ...?> declaration and has exactly one root element.
Elements can also have attributes, which are put inside the open tag.
The example shows the syntax of an attribute.
The value of an attribute must always be present and must be enclosed in quotes.
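To make the tree structure concrete, here is a minimal sketch that reads the example document with a standard XML parser. It uses Python's standard library purely for illustration (the paper itself works with PHP), and the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# The small document from the example above.
DOCUMENT = """<?xml version='1.0'?>
<root>
  <element attribute='example'>
    Some content...
  </element>
  <element>
    Content of the second element.
  </element>
</root>"""


def describe(xml_text):
    """Return the root tag and a (tag, attributes, text) tuple per child."""
    root = ET.fromstring(xml_text)
    children = [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]
    return root.tag, children


if __name__ == "__main__":
    tag, children = describe(DOCUMENT)
    print(tag)
    for child in children:
        print(child)
```

The parser rejects a document with a missing close tag or an unquoted attribute value, which is the stricter behaviour compared to HTML that the text describes.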
The names of the tags in XML do not have any meaning until the designer
of the XML application gives them a meaning.
For example, in HTML, the <H1> tag has the meaning
of 'level 1 heading' and is usually rendered in big bold letters.
If you wrote the same <H1> tag in XML, it might
very well mean the chemical formula for a single hydrogen atom or whatever
is relevant to your application.
In XML, everyone can create their own tags.
This will lead to problems when XML applications from different designers
are combined.
After all, chances are that tags with the same name but entirely different
semantics will be used, leading to a clash in these names.
To prevent name clashes when XML documents from different sources
are combined, the W3C invented the concept of namespaces.
Each tag and attribute name can be prepended by a namespace prefix.
The namespace prefix and the tag or attribute name are separated with a
colon (:).
The namespace prefix must be declared to refer to a globally unique
namespace name, by using the reserved attribute xmlns.
The XML example below shows the previous example, extended with
namespace declarations:
    <?xml version='1.0'?>
    <root xmlns:first='http://www.andromeda.nl/'
          xmlns:second='http://some.unique.url/'>
      <first:element first:attribute='example'>
        Some content...
      </first:element>
      <second:element>
        Content of the second element.
      </second:element>
    </root>

By using namespaces, the uniqueness of the tag and attribute names is guaranteed.
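The effect of a namespace declaration can be observed with any namespace-aware parser. The sketch below, again in Python for illustration only, shows that the two element tags end up with different fully qualified names, and that the prefix itself is just a local abbreviation. The function name expanded_names is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version='1.0'?>
<root xmlns:first='http://www.andromeda.nl/'
      xmlns:second='http://some.unique.url/'>
  <first:element first:attribute='example'>Some content...</first:element>
  <second:element>Content of the second element.</second:element>
</root>"""


def expanded_names(xml_text):
    """Return the namespace-qualified tag of every child of the root."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


if __name__ == "__main__":
    # ElementTree writes a qualified name as '{namespace name}local name'.
    for name in expanded_names(DOCUMENT):
        print(name)

    # The prefix used for a lookup need not be the one used in the document;
    # only the namespace name it is bound to matters.
    root = ET.fromstring(DOCUMENT)
    found = root.find("a:element", {"a": "http://www.andromeda.nl/"})
    print(found.text)
```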
An XML document can be turned into just about anything by running it through an XSLT (eXtensible Stylesheet Language Transformations) processor. An XSLT processor takes two inputs: the XML document to transform and the style sheet that specifies the transformation. The output will be whatever the style sheet defines. In the style sheet, patterns (actually XPath expressions) are used to match specific parts of the source XML document and replace that part with something else in the output. The following example shows how you might replace the 'element' element from the first XML example with the text 'Replaced text':
    <xsl:template match='element'>
      Replaced text
    </xsl:template>

The style sheet declares what output should be produced when a pattern in the XML document is matched. The XSLT transformation process is rule based, with the templates being the basis of the rule system. In that sense, XSLT is a bit like awk, in which the input is matched against a set of rules and the actions occur when a rule 'fires'. Inside an xsl:template element, several constructs are available to control the transformation process. For example, the xsl:apply-templates element instructs the XSLT processor to scan the child nodes of the matched node and look for templates that match them. Suppose you have a simple XML document with book as the root element and heading elements to separate your chapters. The following XSLT templates will transform that structure into HTML:
    <xsl:template match="book">
      <html>
        <head> </head>
        <body>
          <xsl:apply-templates/>
        </body>
      </html>
    </xsl:template>

    <xsl:template match="heading">
      <h2><xsl:apply-templates/></h2>
    </xsl:template>

There is a lot more to tell about XSLT, but that would go beyond the scope of this paper. For complete coverage of XSLT, see [5].
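For readers who have not met a rule-based transformation before, the following sketch imitates the mechanism in a few lines of Python. It is not XSLT and does not use an XSLT processor; it only mimics the idea that a template 'fires' for a matching element and that apply-templates descends into the children. The names transform, apply_templates and TEMPLATES, and the sample book, are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(node, templates):
    """Process the text and child elements of node in document order."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(transform(child, templates))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def transform(node, templates):
    """Fire the rule registered for this element name.

    Without a matching rule, fall back to processing the children and
    copying text through, comparable to XSLT's built-in rules.
    """
    rule = templates.get(node.tag)
    if rule is None:
        return apply_templates(node, templates)
    return rule(node, templates)


# One rule per element name, mirroring the two templates in the text.
TEMPLATES = {
    "book": lambda node, t: (
        "<html><head></head><body>" + apply_templates(node, t) + "</body></html>"
    ),
    "heading": lambda node, t: "<h2>" + apply_templates(node, t) + "</h2>",
}

BOOK = "<book><heading>Introduction</heading>Some text.</book>"

if __name__ == "__main__":
    print(transform(ET.fromstring(BOOK), TEMPLATES))
```

A real XSLT processor matches on XPath patterns rather than on plain element names, so this is only a model of the control flow.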
SOAP (Simple Object Access Protocol) is a communication protocol for
exchanging information in a platform independent way.
Most web services available on the internet work by exchanging messages
in the SOAP format, usually through the HTTP protocol.
A client application uses the POST method to send a SOAP message to the
server application and receives a SOAP response message.
The SOAP specification (see [6]) describes a SOAP message as having
three elements.
Figure 1 shows the structure of a SOAP message.
The root element is the Envelope, which contains a Header and a Body element.
The Header element is optional, while the Envelope and Body
elements are mandatory.
The SOAP elements must be in the namespace declared with the namespace
name http://schemas.xmlsoap.org/soap/envelope/, as shown
in the example below:
    <SOAP:Envelope xmlns:SOAP='http://schemas.xmlsoap.org/soap/envelope/'>
      <SOAP:Header>
      </SOAP:Header>
      <SOAP:Body>
        <!-- The payload of the message comes here -->
      </SOAP:Body>
    </SOAP:Envelope>

The Header element can be used to provide additional
information about the message, such as authentication or transaction
information.
The part we are interested in is the Body element, which
must contain the name of the function or procedure the client wishes
to call, followed by the arguments passed to that function.
Here is a simplified example of a SOAP message that requests the currency
exchange rate from US dollars into euros:
    <SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
      <SOAP:Body>
        <ns1:getRate xmlns:ns1="urn:xmethods-CurrencyExchange">
          <country1>usa</country1>
          <country2>euro</country2>
        </ns1:getRate>
      </SOAP:Body>
    </SOAP:Envelope>

This is a web service that is actually available on services.xmethods.net.
The function we're trying to call here is:

    getRate("usa", "euro");

The two arguments to this function are country1 = "usa" and country2 = "euro".
The name of the function is the immediate child element of the SOAP
Body element: ns1:getRate.
Note the namespace prefix, which is declared on the same element with the
xmlns:ns1 attribute.
The arguments to the function are the child elements of the function
element.
The name of the element is the name of the parameter and the value
of the parameter is in the content of the parameter's element.
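Seen from the client side, assembling such a request is a matter of nesting three elements. The sketch below builds the getRate message with Python's standard library; Python is used only for illustration, the function name build_request is hypothetical, and the namespace prefix the library picks for the function element may differ from ns1.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(method, method_ns, params):
    """Build a SOAP request for method with (name, value) parameter pairs."""
    # Ask the serializer to use the prefix 'SOAP' for the envelope namespace.
    ET.register_namespace("SOAP", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    # The function name is the single child of Body, in its own namespace.
    call = ET.SubElement(body, "{%s}%s" % (method_ns, method))
    # Each parameter becomes a child element named after the parameter.
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request(
        "getRate",
        "urn:xmethods-CurrencyExchange",
        [("country1", "usa"), ("country2", "euro")],
    ))
```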
Several tools, mainly libraries of classes and functions, are available to create SOAP based client or server applications. Most of these SOAP development platforms provide support for Java and C++, although libraries are also available for other programming languages, such as Perl, Tcl, Python and PHP.
The best-known SOAP project is probably DotGNU (http://dotgnu.org/).
DotGNU is a Free Software project to create a platform for web services
in a wide variety of programming languages.
The core component of DotGNU is DGEE, the DotGNU Execution
Environment, which provides the basic functionality for accepting web
service requests.
Portable.NET is another project under the DotGNU
umbrella that provides a suite of web services software tools.
Another prominent platform for web services is Apache AXIS,
the successor to the Apache SOAP project.
Both the client side and the server side of the web services functions
are entirely written in Java and you will need a Java application
server such as Tomcat to run a web service created
with AXIS.
Apache AXIS is available on http://ws.apache.org/axis/.
Running a quick search on freshmeat or sourceforge will reveal a few more projects related to SOAP. To name just a few:
http://gsoap2.sourceforge.net/ - A cross-platform set of generator tools to create web service applications in C and C++.
http://easysoap.sourceforge.net/ - A lightweight client library for SOAP applications.
http://dietrich.ganx4.com/nusoap/ - Provides server and client SOAP applications in PHP.
From the previous sections, you may have gathered that SOAP messages are relatively simple pieces of XML text. Then, referring back to section 2, XML can be turned into any other kind of text by transforming the XML data through an XSLT style sheet. And since a program's source code is just another piece of text, it should be rather trivial to transform the SOAP message into a program or a program fragment. As you will see in this section, this is actually the case. All you need to build a web service with this recipe is three packages of readily available open source software: an XSLT processor, an interpreter for a scripting language and a web server.
The key ingredient here is the XSLT processor.
Without an XSLT processor, we wouldn't be able to transform the SOAP
message into anything, let alone a program's source code.
The example presented here uses the one packaged with the XSLT C library
for Gnome from http://xmlsoft.org/, but any other XSLT processor, like
Xalan from http://xml.apache.org/, will do just fine as well.
The second part is an interpreter for some programming language.
Using an interpreted language allows us to generate a program and
execute the program (or program fragment) all in one go.
Nearly all popular scripting languages are capable of doing this job, so
we could use Perl, Python, Tcl or even an ordinary shell script.
In this example, I use PHP because it integrates nicely into the
Apache web server.
Strictly speaking, the web server is not necessary, but it is the easiest way
to make a web service available on the internet, and HTTP is the protocol
most often used by web service clients.
When a web services client wishes to invoke a function on a
server, the client will pack the name of the function and its
parameters into a SOAP message.
The client then sends the SOAP message to the server by using
the POST method of the HTTP protocol.
Fortunately, all the mucking about with the HTTP protocol is handled
quite nicely by the web server and the PHP module.
All the web service PHP script needs to do is access the data in
the global PHP variable $HTTP_RAW_POST_DATA.
This is where PHP will store the POSTed data if the data did
not come from an HTML form (i.e. the data is not URI encoded).
If you implemented a web service with a program that is invoked
through CGI, you would read the SOAP message from standard input (stdin).
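As an illustration of the CGI alternative, the sketch below reads the POSTed message from standard input, using the REQUEST_METHOD and CONTENT_LENGTH environment variables that CGI defines. It is written in Python rather than PHP; the function name read_soap_request and the assumption that the message is UTF-8 encoded are mine.

```python
import io
import os
import sys


def read_soap_request(environ=None, stream=None):
    """Return the raw POST body of a CGI request as text.

    environ and stream default to the process environment and stdin;
    they are parameters only so the function can be tried without a server.
    """
    environ = os.environ if environ is None else environ
    stream = sys.stdin.buffer if stream is None else stream
    if environ.get("REQUEST_METHOD") != "POST":
        raise ValueError("a SOAP request must be sent with POST")
    # CGI passes the size of the request body in CONTENT_LENGTH.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Assumption: the client sent the message in UTF-8.
    return stream.read(length).decode("utf-8")


if __name__ == "__main__":
    body = b"<SOAP:Envelope/>"
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
    print(read_soap_request(fake_env, io.BytesIO(body)))
```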
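Now that we have our SOAP message in a string variable in our script, we need to transform the content of this string into a piece of script. There are three steps to accomplish this task: create an XSLT style sheet that turns the SOAP message into a function call, apply that style sheet to the message with an XSLT processor, and execute the generated code.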
The objective of the style sheet is to extract the name of the
requested function and to construct a list of arguments to pass
to that function.
The name of the function is easy to find.
It is the name of the element which is the immediate child node
of the Body node in the SOAP message.
So, we match a template on the Body's children (of which there is
only one) and call another template to extract the list of
arguments:
    <xsl:template match="SOAP-ENV:Body/*">
      <xsl:call-template name='method'/>
    </xsl:template>
The called template outputs the name of the function and creates the list of arguments. Generating the argument list is a slightly more complicated operation:
    <xsl:template name='method'>
      <xsl:value-of select='local-name()'/> (
      <xsl:for-each select='*'>
        '<xsl:apply-templates/>'
        <xsl:if test='position()!=last()'>,</xsl:if>
      </xsl:for-each>
      );
    </xsl:template>

Note that the context of this template is the immediate child of the
Body node. The name of this node is the name of the function
requested by the client.
The output of the template starts with this name through the
xsl:value-of construct.
The hard part in this template is to create a comma-separated list of
the arguments' values.
The values we want are in the textual content of the child nodes.
An xsl:for-each construct is used to scan this list of
children and output the content of each child.
The select='*' expression restricts the loop to child elements, so white
space between the parameter elements does not produce empty arguments.
Printing a node's content to the output is the default action of the
xsl:apply-templates statement, so we don't need to create
a special template for that.
The comma that separates each argument from the next one is created with
the xsl:if statement on the next line.
The xsl:if is needed to prevent the transformation from creating
a comma after the last argument.
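To check what text such a style sheet is expected to produce, it can help to write the same extraction procedurally. The sketch below is a Python stand-in for the transformation, not the XSLT itself: it takes the single child of Body, uses its local name as the function name and quotes the text of each child element. The names soap_to_call and local_name, and the parameter names format and flag in the sample message, are hypothetical. Like the style sheet, it performs no escaping of quotes inside the values.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts before a tag."""
    return tag.rsplit("}", 1)[-1]


def soap_to_call(message):
    """Turn a SOAP request into the text of a function call statement."""
    root = ET.fromstring(message)
    body = root.find("{%s}Body" % SOAP_NS)
    if body is None or len(body) != 1:
        raise ValueError("expected exactly one element inside the SOAP Body")
    method = body[0]
    # One quoted argument per child element, in document order.
    # No escaping is done here, exactly as in the style sheet.
    args = ["'" + "".join(arg.itertext()) + "'" for arg in method]
    return local_name(method.tag) + " ( " + " , ".join(args) + " );"


MESSAGE = """<SOAP:Envelope
    xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
    <getthedate>
      <format>l d M Y</format>
      <flag>1</flag>
    </getthedate>
  </SOAP:Body>
</SOAP:Envelope>"""

if __name__ == "__main__":
    print(soap_to_call(MESSAGE))
```

Iterating over the element yields child elements only, so the white space used to indent the message does not show up as extra arguments.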
The next step is to apply this style sheet to the SOAP message by invoking
an XSLT processor.
Recent versions of PHP include classes and functions to do just that.
However, XSLT support in PHP is still in an experimental stage and
most distributions, such as Red Hat Linux, leave it out of the PHP module.
To perform the transformation in a PHP script, we either have to compile
our own PHP module or spawn a separate process for an external XSLT processor.
The latter option is probably the easiest solution until the XSLT
classes are included in the mainstream distributions.
The script below shows how the SOAP message is stored in a temporary
file before invoking xsltproc, the XSLT processor from
the Gnome XSLT library:
    // Create a temporary file and store the SOAP message.
    $xmlfile = tempnam("/tmp", "PHP-SOAP");
    $fd = fopen($xmlfile, "w");
    fwrite($fd, $HTTP_RAW_POST_DATA);
    fclose($fd);

    $xslfile = 'soap2php.xsl';

    // Invoke the external XSLT processor and capture its output.
    exec('/usr/bin/xsltproc ' . $xslfile . " " . $xmlfile, $output);

The XSLT transformation generates the PHP function call statement in a few lines of output. Here is an example:

    getthedate ( 'l d M Y' , '1' );

This output is captured in an array of strings, one array element for each line.
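The same store-and-spawn pattern can be written in other scripting languages as well. Below is a rough Python equivalent, given only as a sketch: the helper names are hypothetical, the path /usr/bin/xsltproc and the style sheet name soap2php.xsl are taken from the PHP fragment above, and the transform function will only work on a machine where xsltproc and that style sheet are actually present.

```python
import os
import subprocess
import tempfile


def store_message(message, directory=None):
    """Write the SOAP message to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="PHP-SOAP", suffix=".xml", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(message)
    return path


def xsltproc_command(xslfile, xmlfile, xsltproc="/usr/bin/xsltproc"):
    """Return the command line as a list: xsltproc STYLESHEET DOCUMENT."""
    # A list (instead of one shell string) avoids quoting problems.
    return [xsltproc, xslfile, xmlfile]


def transform(message, xslfile="soap2php.xsl"):
    """Run the external XSLT processor and return its output lines."""
    xmlfile = store_message(message)
    try:
        completed = subprocess.run(
            xsltproc_command(xslfile, xmlfile),
            capture_output=True, text=True, check=True,
        )
    finally:
        # Unlike the PHP fragment, clean up the temporary file afterwards.
        os.remove(xmlfile)
    return completed.stdout.splitlines()
```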
One of the nifty features of PHP (and most other scripting languages
as well) is its ability to execute a piece of script from a text string
inside the script.
So, we crunch the $output array into a single string
and pass that string to the eval() function, as shown below:

    $output = implode("", $output);
    $result = eval("return " . $output);

The return value of eval is whatever we return from our
SOAP function.
The web service client expects this result wrapped inside a proper SOAP
message.
The simplest way to accomplish this is to have the called function
create the whole SOAP message and return it as a string value.
The final phase of the SOAP server generates the proper output by
sending the HTTP headers that announce an XML document, followed by the
SOAP message itself:

    Header("HTTP/1.0 ". "200 ok");
    Header("Content-Type: text/xml; charset=\"utf-8\"");
    echo $result;

The full code for this example is available on
http://www.andromeda.nl/WebEngineering/soap-xslt.tar.bz2
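For a function that returns a plain value instead of a ready-made SOAP message, the wrapping could be done by the server script itself. The following Python sketch shows one way to do that for a CGI program. The element names follow the common convention of appending Response to the function name and calling the value return; those names, the namespace urn:example-service and both function names are assumptions of this sketch, not part of the example code in the archive.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_response(method, value, method_ns="urn:example-service"):
    """Wrap a plain return value in a SOAP response envelope."""
    ET.register_namespace("SOAP", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    response = ET.SubElement(body, "{%s}%sResponse" % (method_ns, method))
    ET.SubElement(response, "return").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def cgi_reply(soap_message):
    """Return the bytes a CGI program would write: headers, blank line, body."""
    body = soap_message.encode("utf-8")
    headers = (
        'Content-Type: text/xml; charset="utf-8"\r\n'
        + "Content-Length: " + str(len(body)) + "\r\n\r\n"
    )
    return headers.encode("ascii") + body


if __name__ == "__main__":
    # The value 0.5 is a made-up result for illustration.
    print(cgi_reply(soap_response("getRate", 0.5)).decode("utf-8"))
```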
In contrast to some other protocols that start with an 'S', the 'S' in SOAP actually does stand for 'Simple'. Once you discover how simple SOAP is, you may conclude that it is not so hard to use web services. A moderately skilled programmer can quickly build a SOAP application, either as a server or as a client. The example in the previous section shows that an XSLT style sheet with just a few lines of code can create a program fragment from a SOAP message. Theoretically, this style sheet can generate any PHP function call with any list of arguments. However, this example is a bit simplified and suffers from a few drawbacks. In particular, the flexibility and robustness need a bit of work.
The first problem you may want to solve before using this method in a
real-life application is the order of the arguments.
PHP requires a fixed order of arguments to functions, while in SOAP
they are identified by their names.
If the SOAP message passes the arguments in a different order,
the function will probably not work as expected.
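One possible remedy is to keep, for each exported function, the list of parameter names in the order the function expects them, and to reorder the incoming arguments accordingly. The sketch below illustrates the idea in Python; the function name order_arguments is hypothetical and the parameter names are those of the getRate example.

```python
def order_arguments(parameter_names, supplied):
    """Return the supplied values in the order the function declares them.

    parameter_names: parameter names in the order the function expects.
    supplied: dict mapping parameter name to value, as found in the message.
    """
    missing = [name for name in parameter_names if name not in supplied]
    unexpected = [name for name in supplied if name not in parameter_names]
    if missing or unexpected:
        raise ValueError(
            "missing: %s, unexpected: %s" % (missing, unexpected)
        )
    return [supplied[name] for name in parameter_names]


if __name__ == "__main__":
    # The client sent country2 before country1.
    sent = {"country2": "euro", "country1": "usa"}
    print(order_arguments(["country1", "country2"], sent))
```

The same check also catches a wrong number of arguments or misspelled parameter names before any function is called.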
As a second problem, the response to the client which contains the return
value of the function must be another SOAP message.
This requires the function that provides the actual web service to return
a complete SOAP message in a string.
As you may have noticed, there are not many 'standard' functions that return
a SOAP message as their return value, so this limits the availability
of functions that can act as web services to a few specially crafted ones.
On the other hand, this may be a good thing.
After all, you wouldn't want a client application to request the execution
of an "exec('rm -rf *')"!
This leads to the most important issue that is yet to be resolved: detecting and properly handling anomalous situations. Before actually executing a function on behalf of a web service client, we have to ask ourselves if we want to make this function available to that client. Is this particular client authorized to have the function executed? Do we have the correct number of arguments and are the arguments passed with their proper names? Do the arguments contain sensible information? Finally, how do we react if we discover a wrong answer to any of these questions?
To start with the last question, the SOAP specification provides a means to respond to erroneous situations. Instead of a response from the function the SOAP reply would contain the Fault element which states the nature of the error. Discovering the error should start by validating the contents of the SOAP message sent by the client, for example with an XML Schema validation. After validating the message itself, the content must still be checked against application specific criteria. Before making a web service available on the internet, you would at least have to add a validation and a fault response to the web service scripts discussed in this paper.
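As a sketch of what such a check and fault response might look like, the Python fragment below refuses any function that is not on an explicit list and answers with a Fault element instead. The faultcode and faultstring children are those of the SOAP 1.1 envelope namespace used throughout this paper; the list ALLOWED_FUNCTIONS, its content and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical list of functions this server is willing to execute.
ALLOWED_FUNCTIONS = {"getthedate"}


def soap_fault(code, message):
    """Build a SOAP message whose Body holds a Fault instead of a result."""
    ET.register_namespace("SOAP", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    fault = ET.SubElement(body, "{%s}Fault" % SOAP_NS)
    # 'Client' signals a problem with the request, 'Server' with the service.
    ET.SubElement(fault, "faultcode").text = "SOAP:" + code
    ET.SubElement(fault, "faultstring").text = message
    return ET.tostring(envelope, encoding="unicode")


def check_function(name):
    """Return None if the call may proceed, otherwise a Fault message."""
    if name not in ALLOWED_FUNCTIONS:
        return soap_fault("Client", "Unknown function: " + name)
    return None


if __name__ == "__main__":
    print(check_function("exec"))
```

Performing this check before the generated code is evaluated keeps a request for an arbitrary function from ever reaching the interpreter.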
[1] Mark Birbeck et al., Professional XML, 2nd Edition, Wrox Press, 2001
[2] Michael Kay, XSLT 2nd Edition Programmer's Reference, Wrox Press, 2001
[3] XML Base, W3C Recommendation, http://www.w3.org/TR/xmlbase/
[4] Namespaces in XML, http://www.w3.org/TR/REC-xml-names/
[5] XSL Transformations (XSLT) Version 1.0, W3C Recommendation, http://www.w3.org/TR/xslt
[6] SOAP Version 1.2, W3C Candidate Recommendation, http://www.w3.org/TR/soap12-part0/