Chapter 20. Parallelism and Platform

A few months after I joined the SAFR project, Rick allowed a major US retailer to challenge the system with the most difficult problem of its choosing. These benchmarks showed that a scan process, the heart of organizing report production to deliver results at the last possible minute, could be performed very rapidly.

Major Retailer

A major US retailer had a customer marketing database containing all of the retailer’s point-of-sale credit transactions from the previous two years, organized by household. A series of COBOL programs, the standard tool for this type of work in 1993, would scan this data and score each household’s buying activity against various criteria of attractiveness for marketing mailings. The output of this weekly process was used to drive mailing list applications.

The following are the results of the November 1993 runs. The data was stored on 80 tapes.1

Figure 39. Major Retailer CMDB Benchmark Results

The results2 were astounding to everyone involved: a 96% reduction in wall time, from 28 hours to just over one hour; a drop in CPU time from 22 minutes to less than a minute; and a drop in CPU (computing) costs from US $18,000 to less than US $1,000.

The reduction in wall, or elapsed, time is due to parallelism. Parallelism means designing a computer system to use more than one CPU at a time. Think of it as using multiple computers at once, although in large computers each machine actually contains multiple CPUs. Suppose that instead of building a road from one end to the other, a city deployed multiple work crews to work on different segments of the road at the same time. At best, the project’s total duration would be the estimated single-crew time divided by the number of crews deployed.


CPU time is the accumulated time used by all processors or computers. In our road work example, the total work time spent on the project is the sum of every crew’s time. The city would have to pay each crew for its time on the project, even though the crews all work at the same time. So although parallelism shortens the time needed to complete the project, it may not reduce the total time worked or the cost at all. In fact, if the plan for parallelism is poor, the project may actually take longer and cost more: if one crew is responsible for grading and another for paving the same stretch, they would be working on top of each other, saving neither time nor money.
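The road-crew arithmetic fits in a few lines of Python. This is a minimal sketch under idealized assumptions: the work divides perfectly into equal segments, and `coordination_hours` is a hypothetical fixed overhead each crew pays for setup and hand-offs, standing in for the cost of a poor plan. The function name and numbers are illustrative only.

```python
def crew_schedule(serial_hours, crews, coordination_hours=0.0):
    """Return (elapsed_hours, total_crew_hours) for an evenly split job.

    Assumes the job divides perfectly across crews; coordination_hours is a
    hypothetical per-crew overhead that every crew pays in full.
    """
    if crews < 1:
        raise ValueError("need at least one crew")
    per_crew = serial_hours / crews + coordination_hours
    elapsed = per_crew          # crews work simultaneously
    total = per_crew * crews    # but every crew-hour is still paid for
    return elapsed, total


print(crew_schedule(40, 1))       # (40.0, 40.0)  one crew, no parallelism
print(crew_schedule(40, 4))       # (10.0, 40.0)  4x faster, same total work
print(crew_schedule(40, 4, 2.0))  # (12.0, 48.0)  overhead raises total cost
```

Elapsed time falls as crews are added, while total crew-hours, the analogue of CPU time, stay the same or rise.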

The reduction in cost and CPU time for this application is due purely to the fact that the machine code SAFR used to solve the problem contained far fewer instructions than the machine code of the COBOL program. Rick called this instruction path length. If the COBOL program needed 100 instructions to accomplish the task, SAFR accomplished the same work in fewer than four. There is no way around this. Not all of this reduction is necessarily attributable to poor code patterns generated by the COBOL compiler; some portion may be attributable to a poorly written program that performed tasks not essential to solving the problem at hand. But consistent tests like this have shown that SAFR was 30% to 60% more efficient than COBOL for particular computing patterns, and those patterns exist in thousands, perhaps millions, of business processes.3
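Instruction path length maps directly onto CPU time. The sketch below assumes a simple model in which every record costs a fixed number of instructions. The record count, the per-record instruction counts, and the MIPS rating are all hypothetical, chosen only to preserve the 100-to-4 ratio described above.

```python
def cpu_seconds(records, instructions_per_record, mips):
    """Estimate CPU seconds, assuming a fixed instruction cost per record.

    mips is millions of instructions per second; all inputs are hypothetical.
    """
    return records * instructions_per_record / (mips * 1_000_000)


RECORDS = 1_000_000_000  # hypothetical volume
MIPS = 100               # hypothetical processor rating

cobol = cpu_seconds(RECORDS, 2_000, MIPS)  # hypothetical path length
safr = cpu_seconds(RECORDS, 80, MIPS)      # same 100:4 ratio as the text
print(cobol, safr, safr / cobol)           # 20000.0 800.0 0.04
```

Because the model is linear, the CPU-time ratio equals the path-length ratio regardless of processor speed. A faster machine shrinks both numbers but not the gap between them.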

Being able to scan 1 billion records in one hour was an unheard-of level of performance at the time, and it is still impressive today. It proved that organizing the report manufacturing process to use far more detail than the subsystem architecture uses is technically feasible.

The retailer couldn’t believe the results, so it asked to have SAFR applied to another problem of its choosing. It selected a system called Delinquent Account Collector Assignment. This nightly process evaluated all accounts one or more days past due and determined the follow-up action to be taken and the collector assigned to it. Approximately 80,000 unique criteria were used in making the determination. Wall, or elapsed, time was a critical issue in this application because the results had to be available to the collectors each morning, and the production system ran for several hours each night. Here are the results:

Figure 40. Major Retailer Delinquent Account Collector Benchmark Results

Again, the reductions in wall time, CPU time, and cost were similar to those in the prior application. Over the years, these types of results were repeated again and again and again.

Death of the Mainframe

Around this time, the IT industry was embracing a new processing model called client-server. Client-server used a combination of PCs with larger servers to perform much of the computing. Many in the industry proclaimed the death of the mainframe.4

One of the drivers for this was the cost of mainframe computing. I remember joking with Rick at the time that one of the mainframe’s problems was its ability to track its own costs: because it showed up as a line item on the IT financial report, people knew how much it cost, whereas PC and server costs tended to be buried amid other numbers in the report. Thus, to reduce cost, people attacked the mainframe because it was visible.

Yet in truth, mainframes at the time were very expensive compared with getting similar computing capacity from PCs and servers. These less expensive machines also used less expensive operating systems. One operating system that came to be used extensively was UNIX. You’ll remember that the UNIX operating system is well suited to on-line processes, whereas the mainframe operating system tends to emphasize batch processes.5

This trend toward client-server computing tended to emphasize the processing characteristics at both ends of the subsystem architecture: the input and the final output. It further de-emphasized the need for the process in the middle, the process of turning business events into report values.

Rick responded to this trend first with a series of papers that pointed out the need for these functions, some coauthored by Eric.6 At times the discussion led to what some would call a holy war, with mainframe supporters allied against supporters of other platform architectures. The high point in this battle came when IBM asked Rick to install SAFR on the IBM Executive Briefing Center machine in Poughkeepsie, NY, and show what it could do. The IBM engineers who monitored the system said they had never seen a system more fully utilize all aspects of a mainframe, from the disk processing system getting data into and out of memory to the CPU processes. Not even IBM’s own software exploited the hardware more effectively in solving a business problem.

But our product had just come to market at a time when that market was being declared dead. At times the team felt like lone voices in the wilderness. With time, and as the UNIX operating system matured, Rick proved during an engagement at a global investment bank that the concepts were not platform dependent.

Global Investment Bank

Beginning in 1998, Rick was engaged on a project for a worldwide investment bank to manage its Detail Repository, a central component of the bank’s Global General Ledger ERP implementation. The repository provided trade-level detail analysis for internal and external global reporting. The project ported SAFR to UNIX to build the detailed financial reporting environment.

Approximately 1.3 million detailed transaction records from twenty-two different feeder systems were loaded into the Detail Repository nightly. These were trade-level detail records from Europe, Asia Pacific, and North America. The UNIX version of SAFR scanned the repository’s 51 million records in 22 entities and 269 physical partitions. It extracted 20 million records, which were aggregated into approximately 480,000 summary balances. These summary records were sent to the GL for balance sheet and summary profit and loss reporting. This process ran in approximately 3 hours of elapsed time and 5½ hours of CPU time and produced 30 different outputs.
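The pattern described here can be sketched in Python: scan many partitions, extract the relevant detail, and roll it up into summary balances. This is not SAFR’s implementation. It is a minimal illustration that assumes a hypothetical record layout `(entity, account, currency, amount)` and a hypothetical extract rule that skips zero amounts. Each partition is scanned and pre-aggregated independently, which is what lets the work run in parallel, and the partial results are merged at the end. The figures above hint at this kind of overlap: 5½ CPU hours in about 3 elapsed hours means that, on average, nearly two processors were busy at once.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition):
    """Scan one partition: filter detail rows and pre-aggregate by key."""
    partial = defaultdict(float)
    for entity, account, currency, amount in partition:
        if amount == 0:  # hypothetical extract criterion
            continue
        partial[(entity, account, currency)] += amount
    return partial


def summarize(partitions, workers=4):
    """Scan partitions concurrently, then merge the partial summaries."""
    totals = defaultdict(float)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(scan_partition, partitions):
            for key, amount in partial.items():
                totals[key] += amount
    return dict(totals)


# Hypothetical detail records spread over three partitions.
partitions = [
    [("US", "1000", "USD", 10.0), ("US", "1000", "USD", 5.0)],
    [("UK", "2000", "GBP", 7.5), ("US", "1000", "USD", 0.0)],
    [("UK", "2000", "GBP", 2.5)],
]
print(summarize(partitions))
# {('US', '1000', 'USD'): 15.0, ('UK', '2000', 'GBP'): 10.0}
```

A thread pool keeps the example simple. For CPU-bound Python code, `ProcessPoolExecutor` offers the same `map` interface with true parallelism. A production ledger would also use `decimal.Decimal` rather than floats for amounts.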

A second Detail Repository process used the UNIX version of SAFR and custom programs to satisfy intricate regulatory requirements. This system was executed 19 times, with each execution handling a subset of the core business requirements. During this nightly process SAFR read 71 million records (40 gigabytes), extracted 59 million records (30 gigabytes), and performed 229 million table joins. The output was created in 12 CPU hours and 8 wall-clock hours. In comparison, legacy applications required 24 hours to complete only a limited number of these processes.
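One common way to perform large numbers of table joins inside a single-pass scan is to load the reference tables into memory first and then join each detail record by key lookup. The sketch below assumes that approach. The tables, product codes, weights, and regions are hypothetical. They are not the bank’s actual risk weights or a description of SAFR’s internal design.

```python
# Hypothetical reference tables, loaded into memory once, before the scan.
RISK_WEIGHTS = {"LOAN": 1.0, "MORTGAGE": 0.5, "CASH": 0.0}
COUNTRY_REGION = {"US": "North America", "CH": "Europe", "SG": "Asia Pacific"}


def extract(records):
    """Scan detail records once, joining each to both reference tables.

    Returns the extracted rows and the number of join lookups performed.
    """
    rows, joins = [], 0
    for product, country, exposure in records:
        weight = RISK_WEIGHTS.get(product)    # join 1: product -> weight
        region = COUNTRY_REGION.get(country)  # join 2: country -> region
        joins += 2
        if weight is None or region is None:
            continue  # unmatched key; a real system would log or default it
        rows.append((region, product, exposure * weight))
    return rows, joins


detail = [
    ("LOAN", "US", 1000.0),
    ("MORTGAGE", "CH", 500.0),
    ("SWAP", "SG", 10.0),  # no weight for SWAP, so it is dropped
]
rows, joins = extract(detail)
print(rows)   # [('North America', 'LOAN', 1000.0), ('Europe', 'MORTGAGE', 250.0)]
print(joins)  # 6
```

Each lookup is a constant-time hash probe, so the cost per join stays flat as detail volume grows. This is one reason such designs can sustain very large join counts, at the price of holding the reference tables in memory.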

Outputs from these processes were used for US tax and regulatory, Asia-specific regulatory management, and Swiss regulatory reporting. They included information on:

  • Capital Allocations
  • Average Balancing and Multi-Currency Revaluation
  • Syndicated Loan Netting
  • Federal and Swiss Regulatory Collateral Allocation
  • Residual Maturity
  • Interest Rate Selection
  • Product Risk Weighting
  • Specific Reserve Allocation
  • Un-utilized Facility Calculation

The process outputs included files used in additional processing or to feed other systems, delimited files, tabular reports, and inputs to a cube-based reporting system.

The following figure depicts the system architecture for this implementation:

Figure 41. Global Investment Bank Solution Architecture Diagram

UNIX Benchmark

In the spring of 2002, the team conducted a head-to-head benchmark of the UNIX version of SAFR against the mainframe version. The test consisted of running 10 concurrent SAFR extract processes in parallel, each writing to its own extract file. Each thread read a 100-megabyte source file of 1 million rows and wrote a 300-megabyte output file of 1 million rows.

Each output row contained 6 sort key columns, 10 columns of fields from the source file, and 16 columns of fields derived from joins. The results show that the UNIX and mainframe versions of the software performed similarly, and that performance increased roughly linearly with platform power.7
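The benchmark parameters imply some simple arithmetic about row widths and total data volume. The helper below is hypothetical and assumes decimal megabytes (1,000,000 bytes). It shows that each output row was three times as wide as its source row, reflecting the sort keys and derived join columns added to every row.

```python
MB = 1_000_000  # assumption: decimal megabytes


def benchmark_volume(threads, rows_per_thread, in_mb, out_mb,
                     sort_keys=6, source_cols=10, derived_cols=16):
    """Derive per-row and aggregate sizes from the benchmark parameters."""
    return {
        "input_bytes_per_row": in_mb * MB // rows_per_thread,
        "output_bytes_per_row": out_mb * MB // rows_per_thread,
        "total_input_mb": threads * in_mb,
        "total_output_mb": threads * out_mb,
        "output_columns": sort_keys + source_cols + derived_cols,
    }


print(benchmark_volume(threads=10, rows_per_thread=1_000_000,
                       in_mb=100, out_mb=300))
# {'input_bytes_per_row': 100, 'output_bytes_per_row': 300,
#  'total_input_mb': 1000, 'total_output_mb': 3000, 'output_columns': 32}
```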

Figure 42. UNIX vs. Mainframe Platform Benchmark Results

The test showed that the UNIX platform could actually perform the functions faster than the mainframe, although the mainframe used less CPU time.

In 1999 Doug, Jay, and Al Sung, who performed the z/OS to UNIX port, investigated porting SAFR to a new-generation Intel processor called Itanium. They worked with Intel to explore having a much smaller computer perform data analysis like no other small machine.8


The issue isn’t really the platform; it’s the need for a process between event capture and reporting. My experience is that people who work with mainframes understand batch processes, particularly long-running batch processes, better than those who work with UNIX.

It seems to me that writing a batch program is a dying art, or science, or craft, or whatever you care to call it; fewer and fewer people can write these programs. One reason is the overwhelming emphasis on on-line systems and the desire to reduce latency in processes: the lag between the time transactions are created and the time reports are produced.

Does this mean that batch processing will someday go away? Over my career, I have heard many people say yes, but I don’t believe it. Innovation usually does not eliminate earlier classes of technology but rather adds to them. Batch processes are not in danger of being eliminated soon, for a number of reasons.

As we have noted, reporting processes by their nature deal with history: business events that happened hours, days, weeks, months, or even years ago. They also require the accumulation of those events. Financial reporting processes in particular also include cutoff points. Year-end reports, a whole raft of them, are produced including business events up to midnight on December 31. All of these reports use the same data and should be produced consistently. A batch program facilitates these requirements nicely.
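The cutoff requirement can be illustrated with a small sketch. Freeze the set of events that precede the cutoff once, then derive every report from that same snapshot so the reports reconcile with each other. The event layout, the year, and the two report functions are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical business events: (timestamp, account, amount).
EVENTS = [
    (datetime(2008, 12, 30, 9, 15), "A", 100.0),
    (datetime(2008, 12, 31, 23, 59), "B", 50.0),
    (datetime(2009, 1, 1, 0, 0), "A", 75.0),  # after the cutoff: excluded
]


def snapshot(events, cutoff):
    """Freeze the events before the cutoff; every report reads this."""
    return tuple(e for e in events if e[0] < cutoff)


def balances_by_account(snap):
    totals = defaultdict(float)
    for _, account, amount in snap:
        totals[account] += amount
    return dict(totals)


def grand_total(snap):
    return sum(amount for _, _, amount in snap)


# Midnight ending December 31, expressed as the first instant of January 1.
cutoff = datetime(2009, 1, 1)
snap = snapshot(EVENTS, cutoff)
by_account = balances_by_account(snap)
total = grand_total(snap)
# Both reports come from the same snapshot, so they reconcile.
assert sum(by_account.values()) == total
print(by_account, total)  # {'A': 100.0, 'B': 50.0} 150.0
```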

It seems to me they are unlikely to go away for a more fundamental reason: the most advanced information processing mechanism on the planet, the human brain, sleeps every night. The Earth turns on its axis as it revolves around the Sun, and half the world is dark at any one time. Sleep and dreaming are a type of batch processing. If evolutionary biology hasn’t eliminated batch processes, they are unlikely to go away in my lifetime.

So although the initial capture of business events and the ultimate display of accumulated business events will likely always entail some type of on-line processing, the reporting processes in the middle will likely continue to have a batch component. ETL processes are by and large batch. What many people really mean when they argue that batch should be done away with is that batch processes take too long. Because on-line systems are driven by clicks from people sitting in front of screens, their processing cycles from the start to the end of each transaction must be short.

Yet if the time needed for the report production process were driven down to an acceptable level, the number of new report “configurations” that became available could dramatically change the nature and structure of the reporting solution.


1 Data for the major US retailer benchmarks are from an interview with Jay Poulos on May 18, 2009. As part of the project, the retailer’s IT personnel reviewed system outputs to verify results.
2 The differences in the number of records read arose because SAFR did not ignore voided records or households with more than 1,000 purchases. Thus SAFR did more work than the production system.
3 In 2009, translating a program from COBOL to assembler reduced its load module size from 13,808 bytes to 5,416 bytes, a 60% reduction. Load module size is not exactly the same as the number of machine instructions in the program, but it is indicative. One of the major efficiencies in the retail benchmark likely came from using the BSAM access method rather than COBOL’s default QSAM access method. The access method is a function of the operating system, not of COBOL.
4 “It was around that same time that some industry observers were declaring the impending death of the mainframe. One such analyst wrote in the March 1991 issue of InfoWorld, for example, ‘I predict that the last mainframe will be unplugged on March 15, 1996.’” IBM Online Archives, Exhibits, Mainframes, Page 4, or at, (Accessed May 2009).
5 See Servers and Online Processes in the Types of Computers and Processes chapter.

6 Roth, Richard K., and Eric L. Denna, Ph.D. “Making Good on a Promise: Platform Assignment is the Key to Leveraging Data Warehouses.” Manufacturing Systems, February 1996 (© Chilton Publications).
Roth, Richard K., and Eric L. Denna, Ph.D. “Platform Assignment Principles for Decision Support Systems and Data Warehouses.” Price Waterhouse white paper, November 1995.

7 Mainframe power is measured in millions of instructions per second (MIPS) rather than in megahertz or gigahertz.
8 Notes from interview with Jay Poulos and Doug Kunkel, June 11, 2009.