Automating Pingdom probe source IPs in Cisco ASA using Terraform

Pingdom is a monitoring service that uses “a global network of 100+ servers from all over the world as often as every minute.” These servers are called probes.

When a website needs to be monitored but access to it is restricted by a firewall to specific source IP addresses (also called whitelisting), a list of the Pingdom probe addresses needs to be maintained whenever Pingdom changes or adds a probe. Their help pages describe the resulting issue:

One of the most common reasons for Pingdom reporting an outage is not that a site or server is down, but that our servers are being blocked by a firewall or access control list.

If a Pingdom probe is blocked, there is a good chance that it will alarm and notify you that your website is down.

Pingdom provides a list of those probes' IP addresses at https://my.pingdom.com/probes/ipv4

Terraform is a tool that allows you to automate the provisioning of infrastructure. I noticed that Terraform has a Cisco ASA provider and gave it a quick whirl.

What I want to do is take the list of Pingdom probe IP addresses and update a network object group in my ASA firewall that is referenced by a firewall rule allowing access to the website.

Here is my Terraform file, cisco-asa-pingdom-probes.tf, which implements this:

provider "ciscoasa" {
api_url = "https://x.x.x.x"
username = "xxxxxxxxxx"
password = "xxxxxxxxxx"
ssl_no_verify = false
}

resource "ciscoasa_network_object_group" "pingdom" {
name = "tf_pingdom_probes"
members = "${split("\n", trimspace(data.http.pingdom_ranges.body))}"

}

data "http" "pingdom_ranges" {
url = "https://my.pingdom.com/probes/ipv4"
}

To run it, you use the following commands:

$ terraform init

$ terraform apply

Here is the output of terraform apply:

(screenshot: terraform apply output)

Here the object group is shown in the Cisco ASDM tool:

(screenshot: the object group in Cisco ASDM)

Next steps would be to schedule this job to run on a recurring basis to keep those probe IP addresses up-to-date.

Is that Transaction Result Code Hard or Soft?

Soft Declines vs. Hard Declines

We were discussing Result Codes (aka Response Codes) on a call today, specifically "Soft" and "Hard" Declines and the differences between them, in the context of reviewing a payment interface and determining which transactions could be Store-and-Forwarded ("SAF").

Result codes are returned in Field (Data Element) 39 of an ISO-8583 message.

We use the term "Soft" decline when a subsequent transaction request containing the same pertinent information could receive a different result.

These typically occur from a transient system or payment network issue and are temporary in nature.

Examples of some result codes that come to mind:

  • “19” Re-Enter Transaction
  • “91” Issuer Unavailable or Switch Inoperative
  • “96” System Malfunction or System Error

Hard declines contrast with Soft declines in that, on a subsequent transaction request, the response is repeatable; you will receive the same result.
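To make the distinction concrete, here is a minimal Java sketch of how a SAF queue might use it. The code list is just the illustrative subset above, not a complete mapping from any interchange specification:

import java.util.Set;

public class DeclineClassifier {

    // Soft declines from the examples above: a retry may yield a different result.
    // (Illustrative subset only; the full list comes from the interchange spec.)
    private static final Set<String> SOFT_DECLINES = Set.of(
        "19",  // Re-Enter Transaction
        "91",  // Issuer Unavailable or Switch Inoperative
        "96"   // System Malfunction or System Error
    );

    // A transaction is a SAF candidate only when the decline is soft.
    public static boolean isRetryCandidate(String resultCode) {
        return SOFT_DECLINES.contains(resultCode);
    }

    public static void main(String[] args) {
        System.out.println(isRetryCandidate("91")); // true  - soft decline, retry may succeed
        System.out.println(isRetryCandidate("14")); // false - "14" (Invalid Card Number) repeats on retry
    }
}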

Dev Chats

I lead development and IT teams. Many of us work remotely, and we leverage group chats for effective team communication. We tend to use Skype for this, but you can use other tools: Campfire, HipChat, IRC, jabber, etc. Also, have a backup plan for when your primary venue is not available. If you are used to working this way, an IM outage is like losing the internet. I'm serious.

The key to dev chats is that they are asynchronous in nature. This is important. If you don't answer a chat, that means you are not there or are engaged in something else. You are not required to respond and state that you are busy or on a call; you simply don't respond. Messages can be queued for later consumption and review, and most IM programs have a list of unread chats. You are expected to configure your notifications: actually, disable and mute them. I despise being on conference calls where IM beeps and dings occur in the background. I'm not sure how you can concentrate with all of those auditory and visual distractions.

We have the following chats:

Off-Topic (OT)

The OT chat is the water-cooler talk. We broadcast "Lunch Time", "Good Morning", "Have a good evening", cool links from HackerNews or Reddit's /r/programming, and other general things that keep us human and feeling a part of a team. It helps with culture as well. This is an optional chat, and it may be segmented by teams or departments, but it is very important to have, especially if you work from your basement and don't get out to lunch or coffee very much.

Project Chats

Project chats tie to a project, and lots of discussion occurs here: daily standing meetings/chats, priority expectations, requirements clarifications, testing/support, and general questions. Each project chat has a distinctive name and a cool icon to make it fun. Many discussions here are promoted to YouTrack tickets, and general status updates are shared here with the team as well. These are much better than synchronous phone calls with project status updates. There are also plenty of "is this done yet" and "what is the ETA on xyz" questions.

Certain projects have additional chats with -MGMT, -DEV, or -IT-OPS suffixes. Managers can discuss project-related tasks with dev leads and PMs without distracting the rest of the team, and devs can really geek out on more technical conversations when required. Whether to split these out depends on the team size. IT-OPS cares about deploying and supporting apps based on concrete release notes, not necessarily who committed what fix for a given ticket, so we try to segregate folks when it makes sense to.

Private Chats

Private chats are generally discouraged, especially if the topic makes sense to discuss in a project chat. I discourage them for the following reasons: if they are private, others can't learn from them or share their experiences or advice. If you are embarrassed or too shy to make a mistake in a group chat, then you are afraid to learn. If you are worried about verbosity, keep it in the project chat unless others ask you to take the conversation offline.

Tips

  • Use gists, screencasts, and clickable links or permalinks to ticketing systems, code repos, and documents to make navigation easy.
  • <Ctrl-F> and search are your friends. Search first, then ask; folks will paste repeated questions that have already been answered.
  • Promote ideas and conversations to appropriate venues; "Can we make a ticket for that?" is often typed to facilitate this.
  • Don't be afraid to get on a phone call to discuss something.
  • Things can be missed in chats. If it is important, make sure you have a tickler or enter it as a task or issue, vs. assuming someone will remember to review history to address something.
  • Having a second monitor with dev chats on the side, off in the periphery, is also a good tip.
  • Get face to face when you can as well, either physically or virtually; group video chats and hangouts are fun to do from time to time.

@dbergert

Compression

Sometimes you don't get to define the requirements; they appear to serve a higher purpose that you can't begin to understand. All you know is that they are requirements, and that decisions were made for various reasons. Sometimes you have to play the cards that you are dealt, but it is still your choice how to play them.

I'm talking about message formats here. In a specific transaction processing system there are two requirements that we must adhere to:

  1. Accept an 8,000-10,000 byte incoming fixed message format.
  2. Log the raw message requests and responses for all interface connections.

Regarding #1, I'd prefer to see a variable message format here instead, but I understand the need of an existing system to talk in the language it is used to. Item #2 had me very concerned when I first heard of it. With my PCI background, I was ready to put my foot down and call people crazy, imagining a request to log raw messages that contained track data, PIN blocks, and card verification numbers. To my surprise this was not for a financial transaction processing system but for one with a different purpose, one that exists in a highly regulated world with data retention requirements and a need to preserve the integrity of the raw transaction messages for compliance and legal reasons.

The challenge I had with logging the raw messages was their sheer size: 10K. When you are looking at 4-6 legs of a transaction (client request, client response, endpoint request, endpoint response, and other transaction paths that sometimes seem recursive), we have 50K of logging for a single transaction. Times 3 to 5 million transactions per day, that is 150 GB to 250 GB of logging per day!

The easiest solution was to look into compression. How much time would compressing the data stream before logging it take? Would this impact transaction processing time? How were the raw messages used? If we compress the message, what needs to occur in the applications on the other end, what language and platform are they written in, and what is a portable algorithm?

It turns out that these messages contain many repeating unused fields with default values, and these compress very well.

Enter gzip: on our platform, Java's GZIPOutputStream and GZIPInputStream, and for our clients' tools, the .NET GZipStream.
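As a rough illustration of the write side (class and method names here are a sketch, not our actual logging code), compressing a raw message with java.util.zip before it is logged looks like this; GZIPInputStream reverses the process when the logged message needs to be read back:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class RawMessageLog {

    // Gzip-compress a raw message before logging it.
    public static byte[] compress(byte[] rawMessage) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(rawMessage);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A fixed-format message padded with repeating default values compresses very well.
        byte[] raw = new byte[10000];
        Arrays.fill(raw, (byte) ' ');
        byte[] compressed = compress(raw);
        System.out.printf("raw=%d compressed=%d (%.1f%% saved)%n",
            raw.length, compressed.length,
            100.0 * (raw.length - compressed.length) / raw.length);
    }
}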

How did this work out?

raw_size   comp_size   Compression %
------------------------------------
    3975         393            90.1
   10599         484            95.4

How much disk storage, SAN space, and upgrade cost was saved? Priceless ;)

Measuring External Duration of Endpoints

We performed load testing of a new application with a client recently, and one question repeatedly came up: "How long was the transaction in OLS.Switch and how long was it at the endpoint?"

It is an important question, one used to monitor application performance as well as to assist in troubleshooting, and one we can clearly answer: the transaction took a total of 5.6 seconds, and we waited up to our configured endpoint timeout of 5 seconds before we timed out the transaction. Or: the transaction took 156 ms, and 26 ms of those were spent against a local response simulator.

In our application we use a profiler to trace the execution time of each of our Transaction Participants. Each entry below shows [time in this participant / cumulative elapsed time] in milliseconds, as seen in our application logs.

A normal transaction:

open [0/0]
parse-request [7/7]
create-***-tranlog [9/16]
populate-****-tranlog [1/17]
validate-***** [42/59]
validate-***** [1/60]
validate-**** [0/60]
create-*****-request [24/84]
query-** [26/110]
prepare-**-response [40/150]
close [6/156]
send-response [0/156]
end [157/157]

A timed-out transaction:

open [2/2]
parse-request [23/25]
create-***-tranlog [91/116]
populate-***-tranlog [1/117]
validate-*** [67/184]
validate-***-card [31/215]
validate-** [1/216]
create-****-request [32/248]
query-*** [5000/5248]
prepare-***-response [67/5315]
close [284/5599]
send-response [0/5599]
end [5600/5600]

(* note: these traces are from a test app running on my MacBook and are for illustrative purposes only *)

While we can answer the question by reviewing application logs, it is harder to perform analysis across a series of transactions, specifically for external duration. We can currently do that for total duration, which is valuable from the device perspective: how long a transaction took to process end to end.

By logging the external duration along with our total duration for switched-out transactions, we now have:

(screenshot: total vs. external duration report)
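The measurement itself is straightforward. Here is a hedged Java sketch (illustrative names, not the actual OLS.Switch profiler) of timing just the endpoint leg so it can be logged next to total duration:

public class EndpointTimer {

    // Times only the external (endpoint) leg of a transaction, in milliseconds.
    public static long timeExternal(Runnable endpointCall) {
        long start = System.nanoTime();
        endpointCall.run();                              // e.g. query the remote host
        return (System.nanoTime() - start) / 1_000_000;  // elapsed ms
    }

    public static void main(String[] args) {
        long externalMs = timeExternal(() -> {
            try { Thread.sleep(26); } catch (InterruptedException ignored) { }
        });
        // Logged alongside total duration, e.g.: total=156 ms, external=26 ms
        System.out.println("external duration: " + externalMs + " ms");
    }
}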

CaseSwitch - Source Port Routing

We have implemented a new component in our Java and jPOS fueled payment switch, OLS.Switch, which we call the CaseSwitch. The vast majority of our switching algorithms are based either on the determination of CardType, which dictates which outbound endpoint we send a transaction to, or on Card BIN Ranges. An example of a BIN Range:

(screenshot: BIN range configuration)

If a CardNumber's BIN or IIN matches our BIN Range configurations, we select the appropriate EndPoint. In this example, if we have a VISA or MC card we switch it out to an FDR Gateway. If we were connecting to a MasterCard MIP or Visa VAP or DEX, then we would have MC and VISA EndPoints defined with our BankNet and VisaNet interfaces and switch the transactions to those endpoints.

An example of a Card Type: we have certain transaction types where we know where they go because of their Card Type. Many of these are internal authorization hosts, such as implementations of Authorized Returns, MethCheck, Loyalty, and Couponing. Others are transactions where the transaction type also dictates the card type, such as those to GreenDot, InComm, and other external hosts where a BIN Range lookup is unnecessary.

Source (Port) Based Routing

We recently had a requirement for Source-Based Routing, where the source port dictates the outbound transaction path(s). In our Server we accept the incoming transaction and then place a Context variable we call PORT that tells us which Server Port the transaction came in on. Once we have that additional data, we can perform a Logic Branch in our Transaction Manager that looks like the configuration below, which lets us define transaction paths based on the incoming port of the server. So in this example:

<property name="case 5001" value="LookUpResponse Log Close Send Debug" />
<property name="case 5002" value="QueryRemoteHost_xxx Log Close Send Debug" />
<property name="case 5005" value="QueryRemoteHost_yyy Log Close Send Debug" />
<property name="default" value="Log Close Debug" />

  • Port 5001: we perform an authorization locally.
  • Port 5002: we switch out the transaction and reformat it to endpoint xxx's message format and interchange requirements.
  • Port 5005: we switch out the transaction and reformat it to endpoint yyy's message format and interchange requirements.
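For illustration, a selector along these lines can be written as a jPOS GroupSelector. This is a minimal sketch assuming the PORT context variable described above, not the actual CaseSwitch source:

import java.io.Serializable;

import org.jpos.core.Configurable;
import org.jpos.core.Configuration;
import org.jpos.core.ConfigurationException;
import org.jpos.transaction.Context;
import org.jpos.transaction.GroupSelector;

public class CaseSwitch implements GroupSelector, Configurable {
    private Configuration cfg;

    public void setConfiguration(Configuration cfg) throws ConfigurationException {
        this.cfg = cfg;
    }

    public int prepare(long id, Serializable context) {
        return PREPARED | NO_JOIN;   // selection happens in select(), nothing to do here
    }

    public void commit(long id, Serializable context) { }
    public void abort(long id, Serializable context) { }

    // Pick the participant group list based on the incoming server port.
    public String select(long id, Serializable context) {
        Context ctx = (Context) context;
        String port = (String) ctx.get("PORT");
        // fall back to the "default" property when no "case <port>" entry matches
        return cfg.get("case " + port, cfg.get("default"));
    }
}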

Signed Overpunch or Zoned Decimal, or: what are these weird characters in numeric fields?

We interface to many different systems, and sometimes we get to talk to IBM mainframes or message formats that use Signed Overpunch, where we see numeric values like "100000{", "100999I", or "100495N".

Signed Overpunch is used to save a byte: the last character indicates both the sign (+/-) and a digit value.

In a COBOL copybook such a type is defined like:

S9(3)V9(4);

which equates to:

100000{ = 100.0000

100999I = 100.9999

100495N = -100.4955

Here is a snippet of Java code that we use to handle this:

public static final char[] gt_0 = {
    '{', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'
};
public static final char[] lt_0 = {
    '}', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R'
};

protected static String convertToCobolSignedString (String aString) {
    int aInt = Integer.parseInt(aString);
    char[] conv = (aInt >= 0) ? gt_0 : lt_0;
    int lastDigit = Math.abs(aInt % 10);    // abs() avoids a negative array index for negative values
    // the sign is carried by the overpunch character, so drop any leading '-'
    StringBuilder sb = new StringBuilder(Integer.toString(Math.abs(aInt)));
    sb.setCharAt(sb.length() - 1, conv[lastDigit]);
    return sb.toString();
}
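Going the other direction, decoding a received overpunched field back into a signed number is a matter of mapping the trailing character to its digit and sign. A small illustrative companion to the snippet above:

// Decode a signed-overpunch string such as "100999I" back to a signed long.
protected static long convertFromCobolSignedString (String s) {
    char last = s.charAt(s.length() - 1);
    for (int i = 0; i < 10; i++) {
        if (gt_0[i] == last)   // positive values overpunch with { A-I
            return Long.parseLong(s.substring(0, s.length() - 1) + i);
        if (lt_0[i] == last)   // negative values overpunch with } J-R
            return -Long.parseLong(s.substring(0, s.length() - 1) + i);
    }
    throw new IllegalArgumentException("not a signed-overpunch value: " + s);
}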

Continuous Integration

Testing acquirer-side implementations is hard. The incoming message formats and communication protocols from the card acceptors (payment terminals, point-of-sale machines, store controllers) are known, and the endpoints' message formats and communication protocols are also known. The challenge is testing and validating the translated incoming messages to the various outbound endpoints, their communication protocols, and message formats. Some endpoints provide simulators (very, very few); others allow test access over leased lines and communication equipment on a separate test IP/port combination. This works well for our customers to perform user acceptance and certification against these endpoints, but it isn't viable for regression testing during development phases before code delivery.

We have solved some of this with various custom-built response simulators that have basic logic, typically keyed on the transaction amount to provide alternating response messages. These response messages are built from message specs or from captured network traffic on test systems. We can only be sure we are simulating basic transaction types and request and response channels, however. Oh, and then there is always this problem.

Issuer-side implementations are easier to test: you can feed the authorization host both simulated network and local transaction sets to test implemented authorization rules and other features.

In 2009 we built and launched a new Issuing Payment Switch and tested it using Continuous Integration techniques. This system has 3 primary interfaces:

  1. Network - connected to an association's network to receive incoming transactions based on BIN ranges.
  2. Local - Card Management style interface - Manage Cardholder, Cards, and Accounts on the system - and allow local transaction sets to be performed.
  3. Flat file generation (Authorized File, Financial File, and a Card Status and Balances File) and flat file processing: clearing/settlement/reconciliation files.

Continuous Integration, as defined by Martin Fowler:

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. This article is a quick overview of Continuous Integration summarizing the technique and its current usage.

CI’s general steps:

  1. Maintain a code repository
  2. Automate the build
  3. Make the build self-testing
  4. Everyone commits every day
  5. Every commit (to mainline) should be built
  6. Keep the build fast
  7. Test in a clone of the production environment
  8. Make it easy to get the latest deliverables
  9. Everyone can see the results of the latest build
  10. Automate Deployment

Our CI model is based on an implementation that is scheduled multiple times a day. It checks out the code from our software repository, compiles it, builds a new database and schema with the required setup data, starts our software services, performs unit tests, and shuts the software services down. We then receive an email that tells us whether the code compiled, along with the results of the unit tests: which ones were successful and which were unsuccessful. The email attachments contain the zipped run log of the transactions and a unit test report.
Our unit tests are written in the Groovy programming language, and we leverage the TestNG testing framework. We act as a network client to our switch, which was built and run from the current source, and perform both network- and local-side testing. The system is also set up using some of the local transaction sets. Here is a short list of a few of the transaction types:
Local:

  • Add Cardholder
  • Add Card
  • Add Account
  • Debit Account (Load Funds)
  • Set Cardholder/Card/Account Status (Active/Lost/Stolen/Suspended/etc)
  • Local Debit and Credits
  • Balance Inquiry
  • Expire Authorization
  • View Transaction History

Network:

  • Authorization
  • Completions
  • Chargebacks
  • Representments
  • Force Posts
  • Returns
  • Reversals

Combinations of local and network transaction types are tested against various test cases.

If we set up a cardholder with AVS information and run an AVS authorization, do we get the expected results for each AVS result code? Does an authorization on a statused card get approved? Do transactions with amounts greater than, equal to, or less than the cardholder's available balance get authorized or declined properly? Authorization on a card not found? You get the idea.

We build and test our issuer platform a few times a day, and each developer can also run the test suite locally in their development environment. This ensures that future changes don't impact existing functionality. On a test failure, relevant information is included in the autotest emails so issues can be identified by our business analysts and developers without logging into test systems.
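For a flavor of what one of these cases looks like, here is a hedged Java/TestNG sketch. Our real tests are written in Groovy and drive the switch over the network, so the helpers below are simplified in-memory stand-ins:

import static org.testng.Assert.assertEquals;

import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

import org.testng.annotations.Test;

public class AuthorizationBalanceTest {

    // Simplified stand-in for the Local API that creates and funds a card.
    private final Map<String, BigDecimal> balances = new HashMap<>();

    private void setupCardWithBalance(String pan, String amount) {
        balances.put(pan, new BigDecimal(amount));
    }

    // Simplified stand-in for the network client: the real test sends an
    // ISO-8583 authorization to the freshly built switch and returns field 39.
    private String authorize(String pan, String amount) {
        BigDecimal available = balances.getOrDefault(pan, BigDecimal.ZERO);
        return available.compareTo(new BigDecimal(amount)) >= 0 ? "00" : "51";
    }

    @Test
    public void authUnderAvailableBalanceIsApproved() {
        setupCardWithBalance("6001000000000001", "100.00");
        assertEquals(authorize("6001000000000001", "99.00"), "00"); // approved
    }

    @Test
    public void authOverAvailableBalanceIsDeclined() {
        setupCardWithBalance("6001000000000002", "100.00");
        assertEquals(authorize("6001000000000002", "100.01"), "51"); // insufficient funds
    }
}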
Oh, and Please don’t break the build ;)

You don't know until you know (or go into Production)

Over the last six months we have been busy building and implementing an OLS.Switch Issuer implementation with one of our customers and their banking and payment processing partners. It has been a process of reviewing and implementing message specifications, business processing requirements, authorization rules, and clearing, settlement, flat file, and reporting requirements. We also filter external messages into our IMF (Internal Message Format, based on ISO-8583 v2003), built an interface for Card Management functions via our local APIs and message sets, and built client simulators that try to faithfully reproduce what happens when you are connected to a real system.

Testing on test systems is the next step: replacing our client simulators with other "test" systems that are driven by simulators at the processing gateway we interface to. Those simulators have limitations in their configured test suites and test scripts; some require manual entry to send original data elements for subsequent transaction types (e.g. completions and reversals). We generate clearing and settlement files and match those to on-line test transactions and our use cases.

After on-line testing, you connect to an "Association" test environment to do "Certification" and run a week's worth of transactions through a wider test bed. Then you are certified, your BIN goes live, and you enter a production pilot mode, where you watch everything like a hawk.

You can do all of the simulated testing for both on-line transactions and off-line clearing and settlement files that you want; when you connect to the real world and do your first pilot transaction, that is most likely when you will see something that wasn't simulated, tested, or even included in certification. It happens. You need to be proactive: set up reviews and manual interventions, and perform file generation when you have staff available to review the output before it is released for further processing.

What have we seen?

  • Test environments that are not as robust as production or not setup with up-to-date releases.
  • Certain real-world events that are hard to simulate: reversals, time-outs.
  • Thinly-trafficked transactions (chargebacks, representments)... people can't even define these, much less create them in test.
  • Poor or incorrect documentation of message specifications.
  • You receive Stand-In Advices or other transactions on-line that you don’t see in testing or certification.

Production pilot is a very important phase of testing. It is where you discover and address the < 1% of issues nobody catches in earlier project life-cycle phases. What can happen, WILL happen. What you think might occur infrequently will bite you sooner, not later.

Protect Debug Info Transaction Participant

jPOS-EE has a very handy transaction participant called "Debug". Its main purpose is to dump the contents of the jPOS Context. While this is very helpful in test mode and during development, the context remains "un-protected" and all of the data stays in the clear. Even the ProtectedLogListener and FSDProtectedLogListener will not protect data in the context.

Enter the ProtectDebugInfo transaction participant, a configurable implementation I wrote based on some of Alejandro's ideas, one that lives in most of OLS's payment products in various specific iterations. Its configuration looks like:

(screenshot: ProtectDebugInfo configuration)

Protecting your q2.log in this truncated example:

account-number: ‘599999______0001’ 599999______0001
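As a rough sketch of the idea (illustrative only, with hypothetical property names, not the actual OLS implementation), a participant like this can mask configured Context entries so that a downstream Debug participant dumps them with the middle digits wiped:

import java.io.Serializable;

import org.jpos.core.Configurable;
import org.jpos.core.Configuration;
import org.jpos.core.ConfigurationException;
import org.jpos.transaction.Context;
import org.jpos.transaction.TransactionParticipant;

public class ProtectDebugInfo implements TransactionParticipant, Configurable {
    private String[] protectedKeys;

    public void setConfiguration(Configuration cfg) throws ConfigurationException {
        // hypothetical config: one <property name="protect" value="..."/> per context key
        protectedKeys = cfg.getAll("protect");
    }

    public int prepare(long id, Serializable context) {
        Context ctx = (Context) context;
        for (String key : protectedKeys) {
            Object value = ctx.get(key);
            if (value instanceof String)
                ctx.put(key, mask((String) value));   // e.g. 599999______0001
        }
        return PREPARED | NO_JOIN;
    }

    public void commit(long id, Serializable context) { }
    public void abort(long id, Serializable context) { }

    // Keep the first six and last four characters, blank out the middle.
    private static String mask(String s) {
        if (s.length() <= 10)
            return s;
        StringBuilder sb = new StringBuilder(s);
        for (int i = 6; i < s.length() - 4; i++)
            sb.setCharAt(i, '_');
        return sb.toString();
    }
}

A participant like this would need to run late in the flow, just before Debug, since it rewrites the context destructively.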
