Is that Transaction Result Code Hard or Soft?

Soft Declines vs. Hard Declines

We were discussing result codes (aka response codes) on a call today - specifically the differences between "soft" and "hard" declines in the context of reviewing a payment interface and determining which transactions could be Store-and-Forwarded ('SAF').

Result codes are returned in Field (or Data Element) 39 of an ISO-8583 message.

We use the term "soft" decline when a subsequent transaction request containing the same pertinent information could receive a different result.

These typically stem from a transient system or payment network issue and are temporary in nature.

Examples of some result codes that come to mind:

  • “19” Re-Enter Transaction
  • “91” Issuer Unavailable or Switch Inoperative
  • “96” System Malfunction or System Error

Hard declines contrast with soft declines in that on a subsequent transaction request the response is repeatable; you will receive the same result.
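To make the SAF decision concrete, here is a minimal sketch that treats a small set of Field 39 values as soft (retryable). The class and the code list are illustrative only, not an exhaustive mapping:

```java
import java.util.Set;

public class DeclineDemo {
    // Field/DE 39 values indicating a transient condition; a
    // subsequent attempt may succeed, so the transaction is a
    // candidate for Store-and-Forward. Illustrative list only.
    static final Set<String> SOFT = Set.of("19", "91", "96");

    static boolean isSoftDecline(String resultCode) {
        return SOFT.contains(resultCode);
    }

    public static void main(String[] args) {
        System.out.println(isSoftDecline("91")); // true  - Issuer Unavailable
        System.out.println(isSoftDecline("14")); // false - Invalid Card Number is a hard decline
    }
}
```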

Dev Chats

I lead development and IT teams. Many of us work remotely, and we leverage group chats for effective team communication. We tend to use Skype for this, but you can use other tools: Campfire, HipChat, IRC, Jabber, etc. Also, have a backup plan for when your primary venue is not available. If you are used to working this way, an IM outage is like losing the internet. I'm serious.

The key to dev chats is that they are asynchronous in nature. This is important. If you don't answer a chat, that means you are not there or are engaged in something else. You are not required to respond and state that you are busy or on a call; you simply don't respond. Messages can be queued for later consumption and review, and most IM programs keep a list of unread chats that can be reviewed. You are expected to configure your notifications; actually, disable and mute them. I despise being on conference calls where IM beeps and dings occur in the background. I'm not sure how anyone can concentrate with all of those auditory and visual distractions.

We have the following chats:

Off-Topic (OT)

The OT chat is the water-cooler talk. We broadcast "Lunch Time", "Good Morning", "Have a good evening", cool links from Hacker News or Reddit's /r/programming, and other general things that keep us human and feeling a part of a team. It helps with culture as well. This is an optional chat, and it may be segmented by teams or departments, but it is very important to have, especially if you work from your basement and don't get out for lunch or coffee very much.

Project Chats

Project chats tie to a project, and lots of discussion occurs here: daily standing-meeting chats, priority expectations, requirements clarifications, testing and support, and general questions all live here. Each project chat has a distinctive name and a cool icon to make it fun. Many discussions here are promoted to YouTrack tickets. General status updates are shared with the team here as well; these are much better than synchronous phone calls with project status updates. There are also plenty of "is this done yet" and "what is the ETA on xyz" questions.

Certain projects have additional chats with -MGMT, -DEV, or -IT-OPS suffixes. Managers can discuss project-related tasks with dev leads and PMs without distracting the rest of the teams, and devs can really geek out on more technical conversations when required. Guidance on splitting these out depends on team size; IT-OPS cares about deploying and supporting apps based on concrete release notes, not necessarily who committed what fix for a given ticket. So we try to segregate folks when it makes sense to.

Private Chats

Private chats are generally discouraged, especially if the topic makes sense to discuss in a project chat. I discourage them for the following reasons: if they are private, others can't learn from them or share their experiences or advice. If you are embarrassed or too shy to make a mistake in a group chat, then you are afraid to learn. If you are worried about verbosity, keep it in the project chat unless others ask you to take the conversation offline.

Tips

  • Use gists, screencasts, and clickable links or permalinks to ticketing systems, code repos, and documents to make navigation easy.
  • <Ctrl-F> and search are your friends here. Search first, then ask; folks will paste repeated questions that have already been answered.
  • Promote ideas and conversations to appropriate venues; "Can we make a ticket for that?" is often typed to facilitate this.
  • Don't be afraid to get on a phone call to discuss something.
  • Things can be missed in chats. If something is important, make sure you have a tickler or enter it as a task or issue, vs. assuming someone will remember to review chat history to address it.
  • Having a second monitor with dev chats on the side, off in the periphery, is also a good tip.
  • Get face to face when you can as well, either physically or virtually; group video chats and hangouts are fun to do from time to time.

@dbergert

Compression

data.jpeg

Sometimes you don’t get to define the requirements, they sometimes appear to serve a higher purpose that you can’t begin to understand. All you know is that they are requirements, and there were decisions made for various reasons. Sometimes you have to play the cards that you are dealt. But it is still your choice in how to play them.

I’m talking about message formats here. In a specific transaction processing system, there are two requirements that we must adhere to:

  1. Accept an 8,000 to 10,000 byte incoming fixed-format message.
  2. Log the raw message requests and responses for all interface connections.

Regarding #1, I’d prefer to see a variable message format here instead, but I understand the need of an existing system to talk in the language it is used to. Item #2 had me very concerned when I first heard of it. With my PCI background, I was ready to put my foot down and call people crazy (imagining the request to log raw messages that contained track data, PIN blocks, and card verification numbers). To my surprise this was not for a financial transaction processing system but for one of a different purpose: one that exists in a highly regulated world with data retention requirements and a need for integrity of the raw transaction messages for compliance and legal reasons.

The challenge with logging the raw messages was their sheer size: 10K each. When you are looking at 4-6 legs of a transaction (client request, client response, endpoint request, endpoint response, and other transaction paths that sometimes seem recursive), we have 50K of logging for a single transaction. Times 3 to 5 million transactions per day, that is 150 GB to 250 GB per day of logging!

The easiest solution was to look into compression. How much time would compressing the data stream before logging it take? Would this impact transaction processing time? How were the raw messages used? If we compress the message, what needs to occur in the applications on the other end, what language and platform are they written in, and what is a portable algorithm?

It turns out that these messages contain many repeating unused fields with default values - these compress very well:

image001.png

Enter gzip: on our platform, Java's GZIPOutputStream (and GZIPInputStream to read it back), and for our clients' tools, the .NET GZipStream.
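As a rough sketch of the idea (illustrative code, not our production logger), compressing a padded fixed-format message with GZIPOutputStream before logging looks like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipLogDemo {
    // Compress a raw message buffer before logging it.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A fixed-format message padded out with repeating default
        // values, which is what makes these compress so well.
        StringBuilder sb = new StringBuilder("0200|1234567890123456|000000010000");
        while (sb.length() < 10000) sb.append('0');
        byte[] raw = sb.toString().getBytes(StandardCharsets.ISO_8859_1);
        byte[] comp = compress(raw);
        System.out.println(raw.length + " -> " + comp.length);
    }
}
```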

How did this work out ?

raw_size comp_size Compression %
------------------------------------------
3975 393 90.1
10599 484 95.4

How much disk storage, SAN space, and how many upgrades were saved? ;) Priceless.

Measuring External Duration of Endpoints

stopwatch.jpeg

We performed load testing of a new application with a client recently, and a recurring question came up: "How long was the transaction in OLS.Switch and how long was it at the endpoint?"

It is an important question, one that is used to monitor application performance as well as to assist in troubleshooting, and one we can clearly answer: the transaction took a total of 5.6 seconds, and we waited up to our configured endpoint timeout of 5 seconds before we timed out the transaction. Or: the transaction took 156 ms, 26 ms of those against a local response simulator.

In our application we use a profiler to trace the execution time of each of our Transaction Participants, which we see in our application logs as:

A normal transaction:

open [0/0]
parse-request [7/7]
create-***-tranlog [9/16]
populate-****-tranlog [1/17]
validate-***** [42/59]
validate-***** [1/60]
validate-**** [0/60]
create-*****-request [24/84]
query-** [26/110]
prepare-**-response [40/150]
close [6/156]
send-response [0/156]
end [157/157]

A timed-out transaction:

open [2/2]
parse-request [23/25]
create-***-tranlog [91/116]
populate-***-tranlog [1/117]
validate-*** [67/184]
validate-***-card [31/215]
validate-** [1/216]
create-****-request [32/248]
query-*** [5000/5248]
prepare-***-response [67/5315]
close [284/5599]
send-response [0/5599]
end [5600/5600]

(* note these traces are from a test app running on my macbook and are for illustrative purposes only *)

While we can answer the question by reviewing application logs, it is harder to perform analysis across a series of transactions, specifically for external duration. We can currently do that for total duration, which is valuable from the device perspective: how long a transaction took to process.
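The idea can be sketched as timing the endpoint leg separately from the total duration. This is a toy illustration (a sleep stands in for the remote endpoint call; the names are made up):

```java
public class DurationDemo {
    static long externalMillis;

    // Hypothetical endpoint call; the sleep stands in for network
    // latency at the remote host.
    static void queryEndpoint() throws InterruptedException {
        long start = System.nanoTime();
        try {
            Thread.sleep(25);
        } finally {
            // External duration: time spent waiting on the endpoint only.
            externalMillis = (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        queryEndpoint();
        // Total duration: everything from open to end.
        long totalMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total=" + totalMillis + "ms external=" + externalMillis + "ms");
    }
}
```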

Logging the external duration along with our total duration for switched-out transactions, we now have:

duration.png

CaseSwitch - Source Port Routing

We have implemented a new component in our Java and jPOS fueled Payment Switch, OLS.Switch, which we call the CaseSwitch. The vast majority of our switching algorithms are based either on the determination of CardType, which dictates which outbound endpoint we send a transaction to, or on card BIN ranges. An example of a BIN range:

BinRanges.png

If a CardNumber’s BIN, or IIN, matches our BIN range configurations, we will select the appropriate EndPoint. In this example, if we have a VISA or MC card we switch it out to an FDR Gateway. If we were connecting to a MasterCard MIP or Visa VAP or DEX, then we would have MC and VISA EndPoints defined with our BankNet and VisaNet interfaces and switch the transactions to those endpoints.

An example of a Card Type: we have certain transaction types where we know where they go because of their Card Type. Many of these are internal authorization hosts, such as implementations of Authorized Returns, MethCheck, Loyalty, and Couponing. Others are transactions where the transaction type also dictates the card type, such as those to GreenDot, InComm, and other external hosts where a BIN range lookup is unnecessary.

Source (Port) Based Routing

We recently had a requirement for source-based routing, where the source port dictates the outbound transaction path(s). In our Server we accept the incoming transaction and then place a Context variable we call PORT that tells us which Server port the transaction came in on. Once we have that additional data, we can perform a logic branch in our Transaction Manager that looks like this. This allows us to define transaction paths based on the incoming port of the server, so in this example:



<property name="case 5001" value="LookUpResponse Log Close Send Debug" />
<property name="case 5002" value="QueryRemoteHost_xxx Log Close Send Debug" />
<property name="case 5005" value="QueryRemoteHost_yyy Log Close Send Debug" />
<property name="default" value="Log Close Debug" />

  • Port 5001: we perform an authorization locally.
  • Port 5002: we switch out the transaction and reformat it to endpoint xxx's message format and interchange requirements.
  • Port 5005: we switch out the transaction and reformat it to endpoint yyy's message format and interchange requirements.
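In Java terms, the logic branch behaves like a switch over the PORT context value. A minimal sketch (the participant chains come from the configuration above; the class itself is illustrative, not OLS.Switch code):

```java
public class CaseSwitchDemo {
    // Pick a participant chain based on the incoming server port,
    // mirroring the <property name="case NNNN" .../> configuration.
    static String route(int port) {
        switch (port) {
            case 5001: return "LookUpResponse Log Close Send Debug";
            case 5002: return "QueryRemoteHost_xxx Log Close Send Debug";
            case 5005: return "QueryRemoteHost_yyy Log Close Send Debug";
            default:   return "Log Close Debug";
        }
    }

    public static void main(String[] args) {
        System.out.println(route(5001)); // LookUpResponse Log Close Send Debug
        System.out.println(route(9999)); // Log Close Debug
    }
}
```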

Signed Overpunch, or Zoned Decimal, or what are these weird characters in numeric fields?

cobol.jpg

We interface to many different systems, and sometimes we get to talk to IBM Mainframes or message formats that use Signed Overpunch.

There we see numeric values like "100000{", "100999I", or "100495N".

Signed overpunch is used to save a byte: the last character indicates both sign (+/-) and value.

These types are defined in a COBOL copybook; the definition looks like:

S9(3)V9(4);

which equates to:

100000{ = 100.0000

100999I = 100.9999

100495N = -100.4955

Here is a snippet of Java Code that we use to handle this:

public static final char[] gt_0 = {
    '{', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'
};
public static final char[] lt_0 = {
    '}', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R'
};

protected static String convertToCobolSignedString(String aString) {
    int aInt = Integer.parseInt(aString);
    // Pick the positive or negative overpunch table.
    char[] conv = (aInt >= 0) ? gt_0 : lt_0;
    // Math.abs guards against a negative remainder (and thus a
    // negative array index) for negative inputs.
    int lastDigit = Math.abs(aInt % 10);
    // Use the absolute value: the sign is carried by the overpunch
    // character, not by a leading '-'.
    StringBuilder sb = new StringBuilder(Integer.toString(Math.abs(aInt)));
    sb.setCharAt(sb.length() - 1, conv[lastDigit]);
    return sb.toString();
}
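Going the other direction, here is a small sketch that decodes an overpunched value back into a signed number. This is our own illustration (not the production helper), and it assumes the input ends in a valid overpunch character:

```java
public class OverpunchDemo {
    static final String POS = "{ABCDEFGHI";   // last digit 0-9, positive
    static final String NEG = "}JKLMNOPQR";   // last digit 0-9, negative

    // Decode a signed-overpunch string into a long of implied-decimal digits.
    static long decode(String s) {
        char last = s.charAt(s.length() - 1);
        int sign = 1, digit = POS.indexOf(last);
        if (digit < 0) {
            sign = -1;
            digit = NEG.indexOf(last);
        }
        long value = Long.parseLong(s.substring(0, s.length() - 1)) * 10 + digit;
        return sign * value;
    }

    public static void main(String[] args) {
        // With S9(3)V9(4), divide by 10^4 to place the implied decimal point.
        System.out.println(decode("100000{") / 10000.0); // 100.0
        System.out.println(decode("100999I") / 10000.0); // 100.9999
        System.out.println(decode("100495N") / 10000.0); // -100.4955
    }
}
```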

Continuous integration

Testing acquirer side implementations is hard. The incoming message formats and communication protocols from the card acceptors (payment terminals, point-of-sale machines, store controllers) are known, and the endpoint's message formats and communication protocols are also known. The challenge is testing and validating the translated incoming messages to the various outbound endpoints, their communication protocols, and message formats. Some endpoints provide simulators (very, very few); others will allow test access over leased lines and communication equipment on a separate test IP/port combination. This works well for our customers to perform user acceptance and certification against these endpoints, but it isn't viable for regression testing during development phases before code delivery.

We have solved some of this with various custom-built response simulators that have basic logic, typically keyed off the transaction amount to provide alternating response messages. These response messages are built from message specs or from captured network traffic on test systems. We can only be sure we are simulating basic transaction types and request and response channels, however. Oh, and then there is always this problem.

Issuer side implementations are easier to test: you can feed the authorization host both simulated network and local transaction sets to test implemented authorization rules and other features.

testing.jpg

In 2009 we built and launched a new Issuing Payment Switch and tested it using Continuous Integration techniques. This system has 3 primary interfaces.

  1. Network - connected to an association’s network to receive incoming transactions based on BIN ranges.
  2. Local - Card Management style interface - Manage Cardholder, Cards, and Accounts on the system - and allow local transaction sets to be performed.
  3. Flat file generation (Authorized File, Financial File, and a Card Status and Balances File) and flat file processing (clearing/settlement/reconciliation files).

Continuous Integration as defined by Martin Fowler:

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. This article is a quick overview of Continuous Integration summarizing the technique and its current usage.

CI’s general steps:

  1. Maintain a code repository
  2. Automate the build
  3. Make the build self-testing
  4. Everyone commits every day
  5. Every commit (to mainline) should be built
  6. Keep the build fast
  7. Test in a clone of the production environment
  8. Make it easy to get the latest deliverables
  9. Everyone can see the results of the latest build
  10. Automate Deployment

Our CI model is based on an implementation that is scheduled multiple times a day. It checks out the code from our software repository, compiles it, builds a new database and schema with the required setup data, starts our software services, performs unit tests, and shuts down the software services. We then receive an email that tells us whether the code compiled, along with the results of the unit tests and which ones were successful and unsuccessful. The email attachments contain the zipped run log of the transactions and a unit test report.
Our unit tests are written using the Groovy programming language, and we leverage the TestNG testing framework. We act as a network client to our switch, which was built and run from the current source, and perform both network- and local-side testing. The system is also set up using some of the local transaction sets. Here is a short list of a few of the transaction types:
Local:

  • Add Cardholder
  • Add Card
  • Add Account
  • Debit Account (Load Funds)
  • Set Cardholder/Card/Account Status (Active/Lost/Stolen/Suspended/etc)
  • Local Debit and Credits
  • Balance Inquiry
  • Expire Authorization
  • View Transaction History

Network:

  • Authorization
  • Completions
  • Chargebacks
  • Representments
  • Force Posts
  • Returns
  • Reversals

The combination of local and network transaction types are tested against various test cases.
If we set up a cardholder with AVS information and run an AVS authorization, do we get the expected results, and for each AVS result code? Does an authorization on a statused card get approved? Do transactions with amounts greater than, equal to, or less than the cardholder’s available balance get authorized or declined properly? Authorization on a card not found? You get the idea.
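The balance test above can be sketched as follows. This is a hypothetical authorization rule written for illustration, not OLS.Switch code; amounts are in minor units:

```java
public class AuthRuleDemo {
    // Approve only if the card is active and the amount does not
    // exceed the available balance.
    static String authorize(boolean cardActive, long availableBalance, long amount) {
        if (!cardActive) return "declined: card status";
        if (amount > availableBalance) return "declined: insufficient funds";
        return "approved";
    }

    public static void main(String[] args) {
        System.out.println(authorize(true, 10_000, 9_999));  // approved
        System.out.println(authorize(true, 10_000, 10_000)); // approved
        System.out.println(authorize(true, 10_000, 10_001)); // declined: insufficient funds
        System.out.println(authorize(false, 10_000, 1));     // declined: card status
    }
}
```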
We build and test our issuer platform a few times a day. Each developer can also run the test suite locally in their development environment; this ensures that future changes don't impact existing functionality. On a test failure, relevant information is included in the autotest emails so issues can be identified by our business analysts and developers without logging into test systems.
Oh, and Please don’t break the build ;)

You don't know until you know (or go into Production)

images-2.jpg

Over the last six months we have been busy building and implementing an OLS.Switch issuer implementation with one of our customers and their banking and payment processing partners. It has been a process of reviewing and implementing message specifications, business processing requirements, authorization rules, clearing, settlement, flat file, and reporting requirements. We also filter external messages into our IMF (Internal Message Format, based on ISO 8583 v2003), built an interface for Card Management functions via our local APIs and message sets, and built client simulators, trying to faithfully reproduce what happens when you are connected to a real system.

Testing on test systems is the next step: replacing our client simulators with other "test" systems that are driven by simulators at the processing gateway we interface to. Those simulators have limitations in their configured test suites or test scripts; some require manual entry to send original data elements for subsequent transaction types (e.g., completions and reversals). We generate clearing and settlement files and match those to on-line test transactions and our use cases.

After on-line testing, you connect to an “Association” test environment to do “Certification” and run a week’s worth of transactions through a wider test bed. Then you are certified, your BIN goes live, and you enter a production pilot mode, where you watch everything like a hawk.

You can do all of the simulated testing you want for both on-line transactions and off-line clearing and settlement files; when you connect to the real world and do your first pilot transaction, that is most likely where you will see something that wasn't simulated, tested, or even included in certification. It happens. You need to be proactive: set up reviews and manual interventions, and perform file generation when you have staff available to review the output before it is released for further processing.

What have we seen :

  • Test environments that are not as robust as production or not setup with up-to-date releases.
  • Certain real-world examples are hard to simulate - reversals, time-outs.
  • Thinly-trafficked transactions (chargebacks, representments): people can’t even define these, much less create them in test.
  • Poor or incorrect documentation of message specifications.
  • You receive Stand-In Advices or other transactions on-line that you don’t see in testing or certification.

Production pilot is a very important phase of testing. It is where you discover and address the < 1% of issues nobody catches in prior project life-cycles. What can happen, WILL happen. What you think might occur infrequently will bite you sooner, not later.

Protect Debug Info Transaction Participant

jPOS-EE has a very handy transaction participant called “Debug”; its main purpose is to dump the contents of jPOS’s Context. While this is very helpful in test mode and during development, the Context remains “un-protected” and all of the data remains in the clear. Even the ProtectedLogListener and FSDProtectedLogListener will not protect data in the Context. Enter the ProtectDebugInfo transaction participant, a configurable implementation I wrote based on some of Alejandro’s ideas, one that lives in most of OLS’s payment products in various specific iterations. Its configuration looks like:

ProtectDebugInfo.png

Protecting your q2.log in this truncated example:

account-number: ‘599999______0001’ 599999______0001
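The masking shown above can be sketched as follows. This is a simplified illustration of the idea (keep the first six and last four digits, mask the middle), not the actual participant:

```java
public class ProtectDemo {
    // Mask a PAN, keeping the first six and last four digits,
    // similar to what a protect-debug-info participant would emit.
    static String protect(String pan) {
        if (pan == null || pan.length() < 11) return pan;
        StringBuilder sb = new StringBuilder(pan.substring(0, 6));
        for (int i = 6; i < pan.length() - 4; i++) sb.append('_');
        return sb.append(pan.substring(pan.length() - 4)).toString();
    }

    public static void main(String[] args) {
        System.out.println(protect("5999990000000001")); // 599999______0001
    }
}
```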

Put ‘request’, ‘response’ tranlog columns in new table

My partner in crime, Andy Orrock, wrote about a feature (more of an enhancement) that we have implemented in our OLS.Switch product in a recent blog post titled “Put ‘request’, ‘response’ tranlog columns in new table”. I wanted to add some of my own commentary on this change. Please read Andy’s post first before continuing.

As a payment switch, there are times (especially in development and testing) when you will want to log or see what the switch is sent from a terminal or POS system, or what is sent to and received from an authorization end-point. This feature is very handy during integration to new end-points, different message formats, changes with additional data elements, and initial testing and certification efforts in test environments. In production this is very, very bad, because raw messages contain card numbers, track data, CVV2 data, PIN blocks, and all of the other “bad” stuff one is prohibited from storing according to PCI. OLS.Switch has this feature turned off by default, and recommends its use only as a last resort for troubleshooting production problems.

Let me rip the introduction paragraph and a few bullets from our PABP Implementation Guide:

Secure Troubleshooting Procedures

OLS.Switch is configured to use various techniques to either protect or wipe sensitive cardholder and authentication data to prevent storage of prohibited data, or to use encryption to render the card number unreadable.

There may be instances in which sensitive cardholder information or sensitive authentication data needs to be viewed for troubleshooting purposes. Sensitive authentication information must only be collected when needed to solve a specific problem. The following are secure troubleshooting procedures designed to allow limited, controlled access for troubleshooting purposes; all steps must be followed. You must be authorized and approved to make these system configuration changes. Furthermore, it is recommended that your company’s internal Change Management and Problem Management policies and procedures are followed in conjunction with these procedures.


x. Determine if troubleshooting can be performed in a test environment with test card numbers. Perform troubleshooting in that environment first.

….

x. Only collect the limited amount of information in the troubleshooting log that is required to address the specific problem.

(There are lots of other steps and controls to verify that any changes are set back to default, appropriate destruction of captured data is handled, etc, etc, etc.)

Logging raw messages is a dangerous feature; much care needs to be taken with it, and it is rightfully heavily scrutinized by knowledgeable PCI auditors. While it is not an issue in a test environment using test card numbers, a system misconfiguration, human mistake, or “forgotten” changed setting in production could prove disastrous. OLS has added some “controls” around this feature.
Previously there were columns in our TranLog called REQUEST and RESPONSE. In order to enable this type of logging, an entity such as a store or specific terminal (Terminal ID) would need to be configured and enabled to do so, and would need to follow all of our “user controls” and recommended procedures (including preventive and detective controls) in our PABP guide. For the record, none of our clients on our production systems have any data in the REQUEST and RESPONSE columns of the TranLog. I’m happy that it is not a widely used feature in production.

With the new release we now have a single related table called raw_request that has a relationship with a transaction in the TranLog - a much cleaner and normalized approach. In addition, there is a system-wide parameter called auditTrace for each OLS.Switch module that must be enabled by setting its value to true; it defaults to false. These system-wide parameters are based on configuration files, and we recommend that our clients use File Integrity Monitoring to detect and alert on any changes to application configuration files. Once the system-wide parameter for a module is enabled, a specific store or terminal still needs to be configured and enabled; it is a two-step process. This approach also makes it easier to detect if the system is configured in a “non-compliant” fashion - we have monitoring tasks that scan the raw_message table and alert if the row count is non-zero. Also, if there is any database replication or archiving, moving this data to a separate table ensures that troubleshooting data remains contained and isn’t disseminated.

This feature is a necessary evil that most of our customers ask for or have in other payment switches (we do have the ability to remove the raw_message table and functionality completely). We hope that adding further preventive controls (making it harder to enable, user controls requiring dual control, and detailed secure troubleshooting policies and procedures to follow) and detective controls (user controls to detect application configuration changes and to monitor row counts of the raw_message table) ensures that enabling this functionality is an intentional change on the customer’s part.

Also: the following paragraph by Andy shows off our different biases:

One follow-up to this Release Note: I asked Dave how we should set ‘auditTrace’ in production – my thought was to set it to ‘true,’ thinking we’d be at the ready to turn on tracing without a service re-cycle. Dave strongly disagreed and stated: OLS.Switch ought to be “Secure by Default” in production. I really liked that.

Dave = Security Focused.
Andy = Operations and Timely Troubleshooting.
