Copyright © Snyders.US. All rights reserved.
Just because you can use web services, doesn't mean you should!
When analyzing scalability and performance in SOA, it is important to understand SOAP encoding styles.
In the Java world, there are three basic choices:
SOAP RPC is simple for the developer, and is the default for most of the Java application servers. With SOAP RPC, you make a call to a remote object, passing along any necessary parameters. The SOAP stack serializes the parameters into XML, moves the data to the destination using transports such as HTTP and SMTP, receives the response, deserializes the response back into objects, and returns the results to the calling method. SOAP RPC handles all the encoding and decoding, even for very complex data types, and binds to the remote object automatically.
RPC-literal encoding can be used when the data is already in XML format. SOAP RPC also allows literal encoding of the XML data as a single field that is serialized and sent to the Web service host. There is only one parameter--the XML tree, so the SOAP stack only needs to serialize one value. The SOAP stack still deals with the transport issues to get the request to the remote object. The stack binds the request to the remote object and handles the response.
SOAP document-style encoding sends an entire XML document to a server without even requiring a return value. The message can contain any sort of XML data that is appropriate to the remote service. In SOAP document-style encoding, the developer handles everything, including determining the transport (HTTP, MQ, SMTP), marshaling and unmarshaling the body of the SOAP envelope, and parsing the XML in the request and response to find the needed data.
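To make the difference between the styles concrete, here is a rough Java sketch of what an RPC-style body and a document-style body might look like on the wire. The operation name, parameter names, and namespaces (getOrderStatus, purchaseOrder, urn:example:...) are hypothetical and chosen only for illustration; real messages depend on the service's WSDL and the SOAP stack in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SoapBodyShapes {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // RPC style: the body names an operation and carries one child per parameter.
    // Operation and parameter names are hypothetical.
    static final String RPC_STYLE =
        "<soap:Envelope xmlns:soap='" + SOAP_NS + "'><soap:Body>"
        + "<m:getOrderStatus xmlns:m='urn:example:orders'>"
        + "<customerId>C-17</customerId><orderId>42</orderId>"
        + "</m:getOrderStatus>"
        + "</soap:Body></soap:Envelope>";

    // Document style: the body carries a whole business document (hypothetical schema).
    static final String DOCUMENT_STYLE =
        "<soap:Envelope xmlns:soap='" + SOAP_NS + "'><soap:Body>"
        + "<po:purchaseOrder xmlns:po='urn:example:po'>"
        + "<po:item sku='A1' quantity='2'/><po:item sku='B7' quantity='1'/>"
        + "</po:purchaseOrder>"
        + "</soap:Body></soap:Envelope>";

    // Returns the local name of the first element inside <soap:Body>.
    static String bodyPayloadName(String envelope) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(envelope)));
            Node body = doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
            if (body == null) {
                throw new IllegalArgumentException("no SOAP Body element found");
            }
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getLocalName();
                }
            }
            throw new IllegalArgumentException("SOAP Body is empty");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a parsable SOAP envelope", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bodyPayloadName(RPC_STYLE));      // the operation being called
        System.out.println(bodyPayloadName(DOCUMENT_STYLE)); // the root of the document
    }
}
```

In the RPC-style message the first element in the body is the method being invoked. In the document-style message it is simply the root of whatever XML document the service expects.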
As the graph below indicates, the size of the payload matters in SOA. Moreover, the selection of the proper SOAP encoding style can mitigate performance issues with large XML payloads.
Everyone is in love with XML and it makes sense in an HTTP world (as in, hyper TEXT transfer protocol). But there is no gatekeeper for XML and it looks like everyone will write their own XML schema. As these schemas evolve, the code that parses the documents becomes a dependency that must be maintained. XML is verbose, XML APIs can be difficult to use efficiently, and there is no common industry standard for compression.
But XML has many advantages in an SOA:
1) XML offers self-describing data;
2) XML is well suited for versioning and domain-specific namespaces.
XML is not a language for semantics. XML is messy, and there are as many ways to represent elements as there are developers. For example, to document order history, one can create one top-level <orders> tag and append each <order> instance under that--or create an <order> tag at the top level for each respective order. This is just one example of the plethora of XML techniques. To prevent SOA performance and scalability problems, the XML payload size is one of the most significant architectural parameters. The image here illustrates what is considered a medium-large XML payload.
Service Oriented Architectures have proven to be the business platform of the future. But there can be a dark side to SOA: poor performance and scalability issues. This article describes two approaches to prevent SOA performance-scalability problems: mid-tier service caching and selection of the optimal XML parser. In a back-to-basics approach, design heuristics for distributed systems are also reviewed.
We are constantly bombarded with hyperbole about SOA and how it will transform everything into gold. Of course, there's no panacea:
And there's a dark side to SOA: poor performance and scalability problems. Everyone wants a service-oriented platform with the goal of business modularity. But transforming the legacy of tightly-coupled software components into XML messaging is not without pitfalls. The challenge is to break that tightly-coupled, fast approach into loosely-coupled XML without negatively affecting response time and service level agreements.
John R. Snyder
Originally Published in 2007
SOAP encoding styles tell a SOAP stack how to serialize data into XML for transmission. SOAP was intended to be an RPC protocol for simple method calls with few parameters; it was not designed to be used with large documents. Document-style encoding came into the SOAP specification as an alternative to RPC, when architects realized they needed a better approach for huge payloads. For example, sending the medical history of a patient might require 10MB of data in 1,200 elements.
The use of document-style encoding requires the implementer to handle the details of parsing the XML. In situations where the payload is large, this is most likely the best approach. In specialized domains in particular, the implementer will have the best knowledge of the schema and will be able to parse the XML tree more efficiently than a generic parser. As a rule of thumb drawn from empirical data, use SOAP document-style encoding when payloads will be greater than 10KB.
Use a parser appropriate to the payload
Software developers have a wide selection of XML parsers:
1) Streaming API for XML (StAX)
2) XML binding compiler
3) Java XML data binding frameworks (JAXB, the Java Architecture for XML Binding, and JiBX)
4) Document Object Model (DOM) techniques.
Frank Cohen, in his book "FastSOA", offers these common-sense recommendations:
1) For moderate to large XML documents (typically 10KB to 5MB), an XML binding compiler such as JAXB is the best performer. This is due to the direct element access.
2) For XML documents that contain many elements but are flat (not more than 2-3 levels deep), the Streaming API for XML (StAX) approach works best.
3) With medium-sized XML documents that have complexity, use a parser with the DOM approach (evaluate every element).
Frank Cohen offers an interesting analogy for the three approaches listed above (respectively):
A) TV dinner: as the elements come all at once but the parser goes to compartments (tags) to get the data.
B) Sushi bar: as the elements come in a stream, and it’s up to the parser to select the tags it wants before they pass.
C) Buffet: visiting all of the elements.
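The "sushi bar" and "buffet" styles can be compared with a minimal Java sketch that uses only the StAX and DOM APIs shipped with the JDK. The sample payload and its element names (orders, order, id, total) are assumptions made for illustration, and this is not taken from Cohen's book. A binding compiler such as JAXB is left out because it needs generated classes and, on recent JDKs, an extra library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParserStyles {
    // Hypothetical sample payload; element names are assumptions for illustration.
    static final String ORDERS =
        "<orders>"
        + "<order><id>1</id><total>19.99</total></order>"
        + "<order><id>2</id><total>5.00</total></order>"
        + "</orders>";

    // "Sushi bar": pull events off the stream and keep only the <id> values.
    static List<String> idsWithStax(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "id".equals(reader.getLocalName())) {
                    ids.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return ids;
    }

    // "Buffet": build the whole tree in memory first, then collect the <id> nodes.
    static List<String> idsWithDom(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("id");
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent());
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsWithStax(ORDERS)); // prints [1, 2]
        System.out.println(idsWithDom(ORDERS));  // prints [1, 2]
    }
}
```

Both methods return the same values. The difference is that the StAX version never holds more than the current event in memory, while the DOM version materializes every element before the first one is read.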
Implementing SOA on the legacy of Web applications
The domain model pattern is from the Chris Richardson book "POJOs in Action". Only small (less than 5KB), flat, stable XML content can achieve decent performance in this approach. While it is easy to retain the architecture left-over from the browser application, it may not be the most sagacious choice.
Performance problems will appear when message size and complexity grow, due to:
1) XML-Java mapping requires additional processing time;
2) Every request creates a new service call (no caching);
3) EJB-XML mapping and transformation require complex coding and additional processing time;
4) Calls into the database require object-relational mapping--and additional processing time.
XML technologies offer service acceleration
Caching is a recognized technique for performance enhancement. Everyone uses caching--nothing new. What is new is the use of native XML database technology and XQuery to provide services without the overhead of XML-object translations.
XML technologies in the mid-tier provide performance-scalability advantages:
1) An XML service database holds cached message payloads, for example, using a time-to-live value for frequent requests;
2) An XML policy database contains business logic and policies that make decisions on where to route requests for efficiency (workflow based on a taxonomy);
3) A direct-view, aggregate-view, archive-view database in XML format.
XML databases and XQuery:
1) Offer a 100% native XML environment;
2) Avoid object-relational-XML mapping;
3) Reduce the need for expensive application servers and network bandwidth;
4) Automatically adapt to XML schema changes, reducing software maintenance costs.
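The time-to-live idea behind the cached message payloads can be sketched in a few lines of Java. This is only an illustration of the caching policy: it swaps in a plain in-memory map where the article describes a native XML database, and the class name PayloadCache, the request key format, and the backend function are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

class PayloadCache {
    // One cached XML response plus the moment it stops being valid.
    private static final class Entry {
        final String payload;
        final long expiresAtMillis;

        Entry(String payload, long expiresAtMillis) {
            this.payload = payload;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so the expiry logic can be tested

    PayloadCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached payload while it is fresh; otherwise calls the backend
    // service once and stores the new payload with a fresh time to live.
    // Note: concurrent misses for the same key may each call the backend.
    String get(String requestKey, Function<String, String> backend) {
        long now = clock.getAsLong();
        Entry cached = entries.get(requestKey);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.payload;
        }
        String fresh = backend.apply(requestKey);
        entries.put(requestKey, new Entry(fresh, now + ttlMillis));
        return fresh;
    }

    public static void main(String[] args) {
        PayloadCache cache = new PayloadCache(30_000, System::currentTimeMillis);
        // Hypothetical backend: stands in for an expensive service call.
        Function<String, String> backend = key -> "<order id='" + key + "'/>";
        System.out.println(cache.get("42", backend)); // miss: calls the backend
        System.out.println(cache.get("42", backend)); // hit: served from the cache
    }
}
```

A request that arrives within the time to live never reaches the back-end service, so the XML-object mapping work listed earlier is skipped entirely for that request.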
Localize Related Concerns
Invoking a method on an object in a different process (especially on another machine) can be hundreds of times slower than invoking a call in process. Therefore, components that frequently interact should be kept as close together as possible.
Don’t overlook the obvious. It looks like the free lunch of ever-increasing hardware performance gains is over. Software developers need to be cognizant of performance issues again.
Use Chunky Instead of Chatty Interfaces
One of the philosophies of object-oriented programming is to create objects that are atomic and single purpose. However, in a distributed computing architecture, this approach can be inefficient and even disastrous to application performance.
An example will help to illustrate the concept. The following is an illustration of a chatty interface, using an approach that would be appropriate for a client that is in process:
Public Property FirstName() As String
Public Property LastName() As String
Public Property Email() As String
…and so on for each element
Public Sub Create()
Public Sub Save()
Here is an example of that same class, re-designed for good performance in a distributed system:
Public Sub Create(ByVal FirstName As String, ByVal LastName As String, ByVal Email As String, …and so on…)
Public Sub Save(ByVal FirstName As String, ByVal LastName As String, ByVal Email As String, …and so on…)
Admittedly, the second approach is not very elegant and flies in the face of object-oriented teachings. However, the design minimizes expensive out-of-process method calls and is therefore chunky.
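The cost difference between the two interfaces can be made visible with a small Java sketch that counts round trips. The RemoteCustomer class below is a hypothetical stand-in for a remote proxy: nothing actually crosses the network, and each method simply increments a counter where a real call would pay for one out-of-process hop.

```java
import java.util.HashMap;
import java.util.Map;

class ChunkyVsChatty {
    // Stand-in for a remote object: every method call counts as one round trip.
    static class RemoteCustomer {
        int roundTrips = 0;
        final Map<String, String> fields = new HashMap<>();

        // Chatty style: one call per field, then a separate save.
        void setFirstName(String value) { roundTrips++; fields.put("FirstName", value); }
        void setLastName(String value)  { roundTrips++; fields.put("LastName", value); }
        void setEmail(String value)     { roundTrips++; fields.put("Email", value); }
        void save()                     { roundTrips++; }

        // Chunky style: all fields travel together in a single call.
        void save(String firstName, String lastName, String email) {
            roundTrips++;
            fields.put("FirstName", firstName);
            fields.put("LastName", lastName);
            fields.put("Email", email);
        }
    }

    static int chattyRoundTrips() {
        RemoteCustomer customer = new RemoteCustomer();
        customer.setFirstName("Ada");
        customer.setLastName("Lovelace");
        customer.setEmail("ada@example.com");
        customer.save();
        return customer.roundTrips;
    }

    static int chunkyRoundTrips() {
        RemoteCustomer customer = new RemoteCustomer();
        customer.save("Ada", "Lovelace", "ada@example.com");
        return customer.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("chatty: " + chattyRoundTrips()); // prints chatty: 4
        System.out.println("chunky: " + chunkyRoundTrips()); // prints chunky: 1
    }
}
```

With only three fields the chatty version already makes four calls to the chunky version's one, and the gap widens with every additional property.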
Just because you can use Web services, doesn't mean you should.
SOAP provides a standard XML-based message format to represent a method call and its response. SOAP was originally designed to be protocol neutral; in practice, however, Web services almost always use HTTP as the transport protocol. For example, if a Web service is constructed only of parameters with simple types, it is possible to invoke it with simple HTTP GET and POST verbs instead of SOAP. The SOAP specification also allows the serialization of complex types. Web services are a good choice when the requirements call for a wide variety of computing platforms, or in a situation where the data must travel through remote firewalls.
In some situations, HTTP may not be the best transport protocol. For example, unlike the text-based HTTP Web service, .NET remoting can manage both synchronous and asynchronous RPC conversations across application domains using a binary payload over TCP. This brings benefits of security and performance. In addition, .NET remoting uses serialization mechanisms that maintain type fidelity, whereas Web services use a serialization approach that maintains XML schema conformity.
A .NET channel is a remoting framework abstraction that hides the complexities of the underlying wire protocol. The channels provided in the .NET framework are the HTTP channel (represented by the HttpChannel class) and the TCP channel (represented by the TcpChannel class). Each channel contains a formatter object that serializes the method call into a payload appropriate for the respective network protocol.
The TCP channel uses the binary formatter by default to convert method calls into a proprietary binary format. The HTTP channel uses the SOAP formatter by default. Interestingly, the HTTP channel can also use the binary formatter.
The TCP channel/binary formatter combination is faster than the HTTP channel/binary formatter combination. In terms of raw performance, the .NET remoting plumbing provides the fastest communication when you use the TCP channel and the binary formatter.
Similarly, it may be advantageous to retain C++ or other sockets-based components because of their inherent performance advantages.
The takeaway is to evaluate each situation with an engineering perspective and select the best overall solution architecture.