In 2001 I helped with a reporting system strategy project for a regional bank. One day, the project manager and business expert, Lyn San, got up to the whiteboard and said, “Kip, why don’t we suggest a system that looks like this?” She then began to draw boxes, each one representing a system component performing some function in a fairly standard reporting or analytical system architecture.

As she finished, I asked, “Lyn, why would we suggest so many boxes?” She was a bit stumped and said, “Well, isn’t that the way these systems are built?” I said, “Yes, typically, but why?”

I have since concluded that the simplest explanation is that we don’t have unlimited computing capacity, and we can’t manage unlimited complexity. Those two reasons are why we don’t perform all processing in one big box – that big box being the operational systems.

Arrangement Ledger Performance

Yet the entire premise of this book is that if we discover the underlying fundamental computing patterns, and automate those efficiently, we can achieve much greater scale in analytical applications than we do today.

As an example of the most recent performance statistics for a large global bank’s implementation of the platform, consider one process in which SAFR was configured to read four files:

  • 80 million SAL records
  • 33.5 million AL key records (attributes which describe the balances below)
  • 82.5 million AL daily balances
  • 165.5 million AL monthly balances

That makes a total of 361.3 million input records. The process then produced nine different outputs, ranging in size from 125,000 records to 248.3 million records,1 each of which summarized all of the input balances, writing 311.9 million output records in total. Using SAFR piping processes, it was possible to perform only one look-up from each daily and monthly record to the key, and from the key to the SAL record. Thus the process performed a limited number of SAFR joins: 281.3 million in all.
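
A rough sketch of this single-pass pattern may help. The record layouts and field names below are my own illustrative assumptions, not the bank’s actual SAFR view definitions: the reference data is held in memory, each balance record is resolved through two keyed lookups, and several summarized outputs are built in the same pass.

```python
from collections import defaultdict

# Illustrative sketch only: record layouts and field names are assumed,
# not taken from the bank's actual SAFR view definitions.

def single_pass_summarize(sal_records, al_key_records, balance_records):
    """Stream the balance records once, resolving each through two
    in-memory lookups (balance -> AL key -> SAL), and build several
    summarized outputs in that same pass."""
    # Reference data held in memory, keyed for single-probe lookups ("joins").
    sal_by_id = {r["sal_id"]: r for r in sal_records}
    key_by_id = {r["key_id"]: r for r in al_key_records}

    by_product = defaultdict(float)      # one summarized output
    by_cost_center = defaultdict(float)  # another summarized output

    for bal in balance_records:          # daily and monthly balances
        key = key_by_id[bal["key_id"]]   # lookup 1: balance -> AL key
        sal = sal_by_id[key["sal_id"]]   # lookup 2: AL key -> SAL record
        by_product[sal["product"]] += bal["amount"]
        by_cost_center[key["cost_center"]] += bal["amount"]

    return by_product, by_cost_center
```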

This process completed in 120 CPU minutes. Taking the 361.3 million input records and 311.9 million output records together, 673.2 million records processed in all, SAFR averaged 5.6 million combined input and output records per CPU minute.2


In a separate exercise, David Heap, a well-respected IBM System z engineer, performed a detailed analysis of a SAFR process that read 11.186 billion input records, representing 1.551 terabytes of data. The process performed 19.709 billion lookups, or in-memory joins, and wrote 412 million output records for five different views. This required 4 hours and 45 minutes of elapsed time and 20 hours and 45 minutes of CPU time. After calculating the total instructions available on this specific machine, David commented: “This is …2,533 instructions per record – a VERY short instruction path-length!!”3
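
The arithmetic behind that comment, quoted in full in footnote 3, can be restated directly; the short sketch below simply reproduces David’s own figures.

```python
# Reproducing the arithmetic quoted in footnote 3.
cpus_used = 4.37                    # 20h45m of CPU time over 4h45m elapsed
z9_capacity_mips = 11_376           # stated capacity of the 30-CPU z9 EC 730
mips_consumed = cpus_used / 30 * z9_capacity_mips      # ~1,657 MIPS
mips_hours = mips_consumed * 4.75                      # ~7,871 MIPS-hours
instructions = mips_hours * 3_600 * 1_000_000          # MIPS-hours -> instructions
records = 11.186e9
print(round(instructions / records))                   # ~2,533 instructions/record
```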

In other words, substantially more processing is possible than we typically think can be done.

Search Engines and Scale

The problem, in a nutshell, is one of accumulation of history. As noted in The Problem, Internet search giants have solved the problem of finding a needle in a haystack. Search technology allows us to do that very efficiently. But finding a needle in a haystack is not the same as updating yesterday’s balance with today’s activity, and then combining that updated balance with a host of other balances to create a picture of accumulated results. There is no substantial accumulation process in an Internet search.
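
To make the distinction concrete, here is a minimal sketch, using invented data, of the two access patterns: a keyed search touches one entry regardless of how much history exists, while accumulation must fold every business event into a running balance.

```python
# Invented data, purely to illustrate the two access patterns.
index = {"acct-123": "Pat's savings account"}          # search: keyed retrieval
events = [("acct-123", 100.00), ("acct-123", -25.50),
          ("acct-456", 40.00)]                          # accumulation: business events

# Finding the needle: one probe, no matter how much history exists.
print(index["acct-123"])

# Accumulating results: every business event must be folded into a balance.
balances = {}
for account, amount in events:
    balances[account] = balances.get(account, 0.0) + amount
print(balances)   # {'acct-123': 74.5, 'acct-456': 40.0}
```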

The Internet search process is certainly applicable at the front end of the analytic supply chain, to find customers or balances or other information. It is also critical at the far back end of the supply chain, to find the appropriate accumulated balance. But it has precious little to do with the process in the middle: turning business events into accumulated balances. This is a very different problem, yet it is performed in business computing systems everywhere. It deserves as much attention as the intense focus on needle-in-a-haystack search technology over the last 15 years.

I have recently been surprised by the number of companies beginning serious initiatives to overhaul the entire flow of data through their systems, from operational systems to reporting. Some are attempting to go so far as to eliminate all other stores of detailed data outside a single primary environment. Knowing what it takes to gain consistent views of data across a large enterprise, I find these efforts amazing.

Yet, as I analyze the direction of IT in this regard, such efforts seem consistent with where we must go. While some companies are working to make technology ever smaller and more distributed, the other end of the spectrum must progress just as quickly, driving down per-unit costs in consolidated environments through economies of scale and increasing the insights that data provides. And once a company establishes a data backbone with the scale to track daily balances for individual customer-contracts, there is one further step we may be able to take to make the system cost-effective.

Source System Integration

In a 30-minute conversation with Rick and Dave Willis in July 2010, Dave observed that the platform we had constructed was effectively reprocessing the entire bank’s balances every day. I asked him whether he thought it possible to remove the need for balances to be maintained in the source systems altogether, and simply have those systems refer to the finance system for balances throughout the day. He said he thought that was possible, and that it would dramatically simplify the source systems.

I think this may be possible, particularly for financial services firms. As we noted in The General Ledger, the systems of financial services firms are by their nature much more similar to a general ledger than those of other industries. Some of their products keep track of amounts people have paid and what the firm has loaned, insured, or underwritten; these are simply forms of accounts. Because money is digitized, the business is a digital business. So thinking about their source systems in that manner makes sense.

But it seems to me the problem has to be approached by recognizing the differences among some of the traditional “boxes” we have placed on financial services data flow supply chain diagrams, as outlined in the following chart.

| Processing Characteristics | Transaction Processing | Operational Ledgers | Integrated Finance Detailed Ledger | Thin General Ledger |
|---|---|---|---|---|
| Frequency | Real time (Market Hours or 24×7) | Intraday | Period Close (3x/day, daily, monthly, quarterly, yearly) | Period Close (3x/day, daily, monthly, quarterly, yearly) |
| Level of Detail | Trade, Deal or Transaction Level | Product, “Leg,” Lot, Policy or Customer Account Level | Product, “Leg,” Lot, Policy or Customer Account Level | Aggregated Product/COA Level |
| Aggregation/Integration | None | Settlement, Lot, Policy or Customer Account | Financial reporting level | Financial Accounting reporting level |
| History Required | 1 Day History | Multiple Days, Weeks or perhaps a Few Months | Multiple Years | Multiple Years |
| Perspective | Quantity & Amount for Customer/Agent/Acct. Rep/Trader Focus or View | Quantity & Amount for Settlement/Customer/Cash/Statement Focus, Operational Efficiency | Primarily Amount, for Internal & External View of Financial Performance | Nearly exclusively Amount, Internal & External View of Financial Performance |

Processing Characteristics of Typical Financial Services System Layers

These categories show a daunting array of requirements if everything were to be done in one system. Yet if that is the goal, a more practical approach, I submit, is to analyze the entire life cycle, from business event generation through to reporting processes, as a supply chain, and to execute the appropriate layers of consolidated functions at higher scale.

Taking Dave’s example, the periodic posting process now lives in multiple systems: we post the transaction detail in the operational systems, we post in the finance system, we “post” (in a sense) in the risk system, and we post again in reporting applications. All of these posting processes could be consolidated into one posting engine, with run times measured in minutes, working from the last master file and the new transactions to produce a new master file. Transactions arriving during the posting run would be logged for the next cycle. When the new master file is available, all functions would cut over to use it.
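
As a rough sketch of that idea, and not of SAFR itself, a consolidated posting cycle might look like the following, where the names and structures are purely illustrative:

```python
def post_cycle(last_master, transactions):
    """A minimal sketch of a consolidated posting engine (not SAFR itself):
    merge the prior master file with the cycle's transactions to produce a
    new master file. Names and structures here are purely illustrative."""
    new_master = dict(last_master)            # start from the prior balances
    for key, amount in transactions:          # one pass over the cycle's events
        new_master[key] = new_master.get(key, 0.0) + amount
    return new_master

# Transactions captured after the cutoff are simply logged and become the
# input to the next posting cycle; consumers cut over once the new master exists.
master_t0 = {"loan-001": 5_000.00}
cycle_1 = [("loan-001", -250.00), ("dep-002", 1_000.00)]
print(post_cycle(master_t0, cycle_1))   # {'loan-001': 4750.0, 'dep-002': 1000.0}
```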

So if we are to undertake a consolidated operational and reporting environment, we must (1) consolidate the existing supply chain functional processes into single layers, (2) design the processes and data structures to preserve all business events, both customer-facing events and journal and other reporting events, (3) optimize the indexed access required at both ends of the supply chain (transaction capture and report analysis), and (4) perform each function at the appropriate time and with great efficiency. For the posting and aggregating functions, this would be done periodically, as close to analysis as possible, using many of the features of the SAFR architecture. In this way the data can be shared while complexity and capacity are still managed.


I am not sure what the results of such a system would be. Does it turn McCarthy’s idea on its head: instead of dismantling the finance systems altogether, does it make the finance system the center of all needs for balances in the organization? Or does it do exactly what he envisioned, and nearly eliminate the finance system master files altogether? It will be interesting to see where we go from here.


1 One extract reformatted each daily and monthly record with key values; thus the output of that one extract equaled the two balance files combined.
2 Results spreadsheet in author’s possession, unpublished.
3 David’s full calculation was as follows: “This run …started at 09:09 am on 5 May 2010. Elapsed clock time was 4 hours 45 minutes. CPU hour consumption was 20 hours 45 minutes. This was the equivalent of 4.37 fully loaded CPUs on the z9 EC 730. (The 30 CPU z9 can deliver approx 11,376 MIPS [millions of instructions per second] if running at 100% capacity.) So 4.37 CPUs = (4.37/30)*11,376 = 1,657 MIPS consumed. This gives a TOTAL MIPS burn of 1,657 MIPS * 4.75 hours = 7,871 MIPS-Hours for this [SAFR Extract] job step…. In this case, I get {7,871 MIP-hours * 3,600 seconds = 28.335 Million Mip-seconds} / {Total input records of 11.186 Billion records}. This is 28.335 Million [MIP-seconds] per 11,186 [million] records, or 2,533 instructions per record.” E-mail from David Heap to Randall Ness, May 30, 2010, in author’s possession.