One of the frequent concerns about deploying any payment solution is “will it be able to process my transactions in a timely manner?” This is both an easy and a hard question to answer. In some instances, a bad application design leads to poor performance. In others, faulty integration of one or more of the other system components causes the performance bottleneck. Generally, there are several major components in the processing of a transaction that can significantly affect throughput and response-time performance.
These major components are the network, the server hardware, the encryption hardware, the system software, and the application software. How well these pieces are integrated will always have a major impact on the overall performance of any given payment solution. For example, if a throughput rate of 25 transactions per second (TPS) is required and the hardware encryption device selected is only capable of 12 encryptions or decryptions per second, then the encryption device will be the bottleneck and no amount of software tuning can or will improve the throughput above 12 TPS. Unless the throughput of the encryption device is known, such a performance degradation may manifest itself as a software performance issue rather than a system integration issue.
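The arithmetic behind the encryption example can be sketched simply: end-to-end throughput is capped by the slowest stage in the transaction path. The stage names and capacity figures below are hypothetical, chosen to mirror the 12-TPS device in the example above.

```python
# Hypothetical per-stage capacities in transactions per second; illustrative only.
stage_tps = {
    "network": 200,
    "application": 60,
    "database": 40,
    "encryption_device": 12,  # the hardware encryption device from the example
}

def pipeline_throughput(stages: dict) -> tuple:
    """Return the bottleneck stage and the end-to-end throughput it imposes."""
    bottleneck = min(stages, key=stages.get)
    return bottleneck, stages[bottleneck]

name, tps = pipeline_throughput(stage_tps)
print(f"bottleneck: {name} at {tps} TPS")  # bottleneck: encryption_device at 12 TPS
```

No amount of tuning the other stages raises the minimum; only replacing or paralleling the bottleneck stage does.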
A payment processing software provider, unless it dictates specific requirements for some or all system components, has limited control over the performance effect of all but the application software. Software providers will normally make recommendations for these components but cannot predicate the performance of their software products on those recommendations being followed. Most payment applications, if they measure performance at all, do so from the time a request message arrives at the application level to the time the response message departs the application level. Some may measure only overall response time; others will measure internal processing and external “wait” time separately.
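Separating internal processing time from external “wait” time can be sketched by timestamping around the outbound call. The `call_external_host` stub below is a hypothetical stand-in for a gateway or host round trip; a real application would call its actual interface there.

```python
import time

def call_external_host(request):
    """Stand-in for a host/gateway round trip; sleeps to simulate network wait."""
    time.sleep(0.01)
    return {"approved": True}

def process_transaction(request):
    t0 = time.perf_counter()
    # internal work: parse and validate the request (placeholder)
    t1 = time.perf_counter()
    response = call_external_host(request)       # external "wait" time
    t2 = time.perf_counter()
    # internal work: log and format the response (placeholder)
    t3 = time.perf_counter()
    internal_s = (t1 - t0) + (t3 - t2)
    wait_s = t2 - t1
    return response, internal_s, wait_s
```

Reporting the two numbers separately makes it immediately visible whether a slowdown lives in the application or in something it is waiting on.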
Historically, in order to have some semblance of control over as many factors as possible, payment applications were written for specific platforms. Usually these were proprietary fault-tolerant systems whose cost/performance ratios were degraded by the inherent overhead of providing hardware- and system-software-level fault tolerance. In these instances, performance – up to a limiting point – could be bought at a premium price. Frequently these fault-tolerant platforms required special software design and coding techniques to properly and fully take advantage of the fault tolerance. And if other components in the path, such as communications routers and connections, are not redundant, the premium paid for fault tolerance is all for naught.
Generally, it pays to initially set aside specialty hardware arrangement considerations and focus first on the payment application design itself. Pursuit of high performance using a poorly written application on a premium proprietary platform will be a truly expensive undertaking. When confining the performance issue to the payment application itself, there are a few key software attributes that contribute to the tangible performance characteristics of an online transaction processing (‘OLTP’) application. These key attributes are code path length, code efficiency, database design, and encryption approach.
Code path length is relatively objective and refers to the lines of code which translate eventually into the number of machine instructions executed in the process path of a transaction. Longer paths tend to produce longer response times and lower levels of performance and, obviously, shorter paths produce the converse.
Code path efficiency is more subjective and refers to the art (or science, if that is your viewpoint) of finding the logic design that requires the fewest lines of code to perform a particular function and the fewest functions to complete a transaction processing flow. Generally, but not always, the more experienced designer and coder will produce better software. For payment processing, however, experience with the nuances of OLTP (which some prefer to call “real-time processing”) in general, and payments OLTP in particular, is another efficiency factor.
Database design and how it affects any kind of application is a well-published subject that we do not need to cover here. Suffice it to say, a poor database design, or a poor implementation of a good one, will have a significant impact on an OLTP application that is sweating minute changes in the milliseconds a process path takes. Again, OLTP and payments experience goes a long way toward keeping database design from becoming a performance issue. Simply stated, data reads and writes must be efficient and kept to a minimum.
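“Kept to a minimum” often comes down to round trips. A minimal sketch, using an in-memory SQLite table with hypothetical names, of fetching everything a batch of transactions needs in one query instead of one query per transaction:

```python
import sqlite3

# Hypothetical account table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (pan TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("4111", 500), ("4222", 300), ("4333", 900)])

pans = ["4111", "4333"]

# One round trip for the whole batch instead of len(pans) separate reads.
placeholders = ",".join("?" * len(pans))
rows = conn.execute(
    f"SELECT pan, balance FROM accounts WHERE pan IN ({placeholders})", pans
).fetchall()
balances = dict(rows)  # both balances fetched in a single query
```

Each avoided round trip is latency removed from the process path, which is exactly where an OLTP application sweats its milliseconds.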
Encryption of in-flight or stored card data is a necessary security step in processing a payment. Without a doubt, it is an expensive process often best relegated to an off-server “single purpose computing device.” However, there are some board-level implementations that do work well if they do not steal primary server CPU cycles. In either case, once again, OLTP experience will mitigate the risk of a poorly implemented encryption design.
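One common way to keep an expensive encryption step from stalling the transaction path is to hand it to a pool of workers and collect the result only when it is needed. This is a sketch, not a recommendation of a specific design; the XOR “cipher” is a placeholder for a real HSM or crypto-library call and is not cryptography.

```python
from concurrent.futures import ThreadPoolExecutor

def encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Placeholder transform standing in for an HSM call; NOT real cryptography.
    return bytes(b ^ key for b in data)

# Pool sized (hypothetically) to the encryption device's parallelism.
pool = ThreadPoolExecutor(max_workers=4)

future = pool.submit(encrypt, b"track data")
# ...the request thread continues with other work here...
ciphertext = future.result()  # collect the result only when actually needed
```

The design point is that the request thread overlaps other work with the encryption wait, rather than burning primary CPU cycles or blocking on the device.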
Scalability is generally defined as the property of an application to improve its performance with a change in scale of its hardware environment. The hardware change usually involves either faster processors or more processors; it may also include memory or disk components with higher transfer rates. For OLTP applications, the environmental impact zone expands to include internal and external connectivity as well as database considerations. It does no good to scale the application if the encryption, database or communications processes cannot scale accordingly.
Many will look solely to an application for scalability when, in reality, it is also very much a system integration issue. Conversely, any application, including an OLTP application, can be designed to be non-scalable. One simple way is to make the application single-threaded … pretty much the “kiss of death” for an OLTP payment processing environment.
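Why single-threading is the “kiss of death” can be shown in a few lines: when each transaction spends most of its time waiting on an external host, a single thread serializes those waits, while a pool of workers overlaps them. The 10 ms wait and worker count below are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(txn_id):
    time.sleep(0.01)          # simulated external host/database wait
    return txn_id

txns = range(50)

start = time.perf_counter()
serial = [handle(t) for t in txns]                 # single-threaded: waits stack up
serial_elapsed = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:   # multi-threaded: waits overlap
    parallel = list(pool.map(handle, txns))
parallel_elapsed = time.perf_counter() - start

print(f"serial {serial_elapsed:.2f}s vs parallel {parallel_elapsed:.2f}s")
```

With 50 transactions at 10 ms of wait each, the single thread needs roughly half a second no matter how fast the CPU is; ten workers finish in a fraction of that.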
Assuming an OLTP application is multi-threaded and has an efficient code path, where can scalability go wrong? There is a multitude of ways. For example, improperly or poorly configured network routers can overload one processing path while underutilizing another path. Poorly configured servers with insufficient memory or processor power will invariably lead to poor performance. Using server clusters incorrectly can lead to load balancing and reliability issues. If they are not set up properly, database servers will severely impact an OLTP application’s ability to achieve even marginal performance.
So, where does scalability really come from, or how is it best achieved? Scalability fundamentally derives from the ability of an application to take advantage of a faster server, more servers, or both. That means the application will produce improved performance via increased transaction capacity, reduced response time, or both. Assuming a reasonable design and execution, most applications (our earlier single-threaded example notwithstanding) will show roughly linear improvements in performance relative to the change in server count and/or processing power. Running on multiple servers does require that the application be replicable in some manner or that the servers themselves provide the replication transparently.
The multi-server approach to scalability will eventually become both complex and expensive as the number of servers increases. Increasing the processing power of a server seldom, if ever, creates any additional complexity. And, for commodity servers, Moore’s Law (roughly a doubling of computing power every 18 months to two years) comes into play, so the additional expense of more processing power is not going to be significant.
Another approach is using virtualization technology to create replication. A Virtual Machine (‘VM’) will, in most cases, create a veneer of replication even when the application is not conducive to duplication. However, the VM approach possesses a fatal flaw: it creates potential single points of failure for multiple instances of the application. This weakness can be mitigated by running the VM on a proprietary fault tolerant hardware platform or in some form of clustered environment. Obviously, this two-part approach adds additional hardware costs on top of the costs for the virtualization technology. And it is a complex solution that creates a number of integration issues to be resolved.
For any payment processing application, replication creates a number of integration and configuration issues. As additional copies of the application are created, communications connections between the application and encryption devices, terminals (POS, ATM, Mobile, Kiosk, etc.) and gateways must be replicated. As connections are replicated, a decision must be made as to whether they are real or virtual connections. Set against those choices are the decisions made for the various connection types with regard to failure points and backup paths. If a single physical communication line outage takes out three virtual connections to three separate application instances, then that line is, by default, a single point of failure for all three instances.
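The shared-line hazard is easy to audit mechanically: map each instance's virtual connection to the physical line carrying it and flag any line serving more than one instance. Instance and line names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: which physical line carries each instance's connection.
virtual_connections = {
    "app_instance_1": "line_A",
    "app_instance_2": "line_A",
    "app_instance_3": "line_A",   # three instances share one physical line
    "app_instance_4": "line_B",
}

instances_per_line = defaultdict(list)
for instance, line in virtual_connections.items():
    instances_per_line[line].append(instance)

# Any line carrying more than one instance is a shared point of failure.
shared_points_of_failure = {
    line: apps for line, apps in instances_per_line.items() if len(apps) > 1
}
print(shared_points_of_failure)  # line_A carries three instances
```

An audit like this, run against the actual network configuration, surfaces the cases where replication has quietly failed to buy any redundancy.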
When factoring the communications connections along with encryption device and database connections and communications routers into the decision cycle, the complexity of the integration and configuration process increases exponentially. And we haven’t even begun to talk about the overhead of these extra connections. Stated simply, over-replication will often create more problems than it solves.
So, it is readily apparent that scalability for an OLTP application is far more than just a matter of tossing more and faster hardware into the performance pot. For OLTP applications in general and payment processing applications specifically, scalability is a delicate balance of server power and application replication. Knowing where that balance occurs comes from years of experience designing, supporting and managing payment processing environments. In another post, I will talk about how OLS created a practical solution for the performance and scalability issues for a payments processing application.